Writing a Game Engine in 2017

Writing a game engine in 2017, what does that look like? Should a developer download Unity? GameMaker? Love2D? Almost certainly, correct? Why would anyone want to create a game engine… In 2017? This will be my last post for some years, so let’s see if I can make it a decent one.

There are only a couple reasons to create a custom piece of technology, game engines included:

  • It is personally fun to the creator
  • Some kind of advantage can be utilized

Really these can be coalesced into a single piece, as having fun while doing the hard work of making a game is a huge advantage. The types of advantages that come along with a piece of technology are:

  • Specialization
  • Performance
  • Breakthroughs/Innovations
  • Control

Again these can probably all be coalesced into the Breakthroughs/Innovations category. Custom tech should exist to give the creator an advantage somehow, and the types of advantages that really matter in terms of product success are the innovations and breakthroughs.

Usually the breakthroughs are in some kind of new graphics technique, new kind of AI, or new utilization of hardware. However, breakthroughs can be made in any area of game development, not just the vertical scaling of certain common aspects. A breakthrough in expression, for example, can yield tremendous success. Content iteration time, highly efficient or specialized tools are a couple other examples of great places to gain a competitive advantage. Here is a long reddit post I wrote on this topic, please do read it.

One last thing is that making a custom game engine will require quite a bit of expertise, and writing a good one will require a lot of experience. It’s not necessary to make a good game engine to make a good product, but it certainly helps! At the very least writing a game engine and finishing a product (finishing a game) with that engine will immediately make an engineer go from no-hire to immediately-hire. That said I’ll link to a document I was commissioned to write that covers a bunch of game engine and programming related topics, so any newer programmers can take a look and use it for a reference.

Runtime Compiled C++

I would like to share one kind of breakthrough more or less popularized by Casey Muratori from his Handmade Hero YouTube series. The idea is to use C/C++ for everything. Personally I wouldn’t dare use pure C because function/operator overloading is a very good feature. Then place game implementation, all of it, into a DLL. The entry point holds memory allocation features, and manages the game DLL. At run-time the game DLL can be hotloaded and swapped out whenever the user likes. C++ becomes as if it were a scripting language.

To make this work a lot of care needs to be placed on making sure compile times stay very low. Here is a thread where I talked a bit about compile times for larger projects. Casey Muratori has released a ctime utility for profiling compile times. Perfect :)

Finally, here is an old post of mine talking about some pros and cons of this style. Please do read this post. It contains the majority of the pros and cons, and talks about using code “as an editor”, as in, a general purpose editor (like level editor, or sequence editor, or cut-scene editor).

And last I have a working example for Windows here on Github. In hindsight I would probably recommend implementing this with SDL. Since the entrypoint for this style of game engine does not need to be recompiled at run-time, any type of heavy-weight libraries that can be unknown to the game implementation can safely hide in the entrypoint without affecting compile times much. SDL has some tools for dealing with loading/unloading dynamic libraries that can be used for cross-platform development.
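As a rough illustration of the entry point's job, here is a minimal sketch of the reload step. Everything named here (GameAPI, Platform, GameCode, ReloadGame, and the exported symbol "GetGameAPI") is hypothetical and is not taken from the linked example. The platform hooks are left as function pointers so the sketch stays platform neutral; a real host would point them at LoadLibrary/GetProcAddress/FreeLibrary on Windows, or at SDL's SDL_LoadObject/SDL_LoadFunction/SDL_UnloadObject.

```cpp
#include <cassert>

// Functions the game library is assumed to export (hypothetical layout).
struct GameAPI {
    void (*update)(float dt);
};

// Platform hooks supplied by the entry point.
struct Platform {
    void* (*load)(const char* path);
    void* (*find)(void* lib, const char* symbol);
    void  (*unload)(void* lib);
};

// What the entry point remembers about the currently loaded game code.
struct GameCode {
    void*   lib;
    GameAPI api;
};

// Signature of the one symbol the host looks up by name (an assumption).
typedef void (*GetGameAPIFn)(GameAPI* out);

// Drops the old library (if any), loads the new one, and re-fetches every
// function pointer. No pointer into the old library may be used afterwards.
bool ReloadGame(const Platform& platform, GameCode* game, const char* path)
{
    assert(game != nullptr);
    if (game->lib) platform.unload(game->lib);
    game->lib = nullptr;
    game->api = GameAPI{};

    void* lib = platform.load(path);
    if (!lib) return false;

    GetGameAPIFn get = reinterpret_cast<GetGameAPIFn>(platform.find(lib, "GetGameAPI"));
    if (!get) {
        platform.unload(lib);
        return false;
    }

    get(&game->api);   // the library fills in fresh function addresses
    game->lib = lib;
    return true;
}
```

The host would call ReloadGame whenever it decides the library on disk is newer, then keep calling game.api.update each frame. Memory allocated by the entry point survives the swap, which is the whole point of keeping allocation out of the game library.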

Game Objects/Entities in Runtime C++

The biggest limitation for most programmers would be the total destruction of all function pointers, including the pointers to vtables that most compilers use to support the C++ virtual keyword. Since most compilers will likely implement vtables by storing a pointer in C++ objects to some kind of static table of function pointers, reloading dynamic libraries can move the location of vtables. This can leave dangling pointers in leftover C++ objects.

This can be solved with complicated C++ serialization techniques and reflection.

Or, this can be ignored completely.

Back in the days of GBA some games would want to make use of polymorphism (I recommend clicking the link, it is not Wikipedia). Polymorphism is an incredibly useful concept, and the typical way to implement polymorphism in C++ is with the virtual keyword. In C function pointers are used. With runtime C++ neither of these is an option (or so it seems).

When a dynamic library is unloaded and reloaded all functions are assigned new locations in memory. The nice thing is that the static variables and static file-scope bits of data are all cleared to zero if uninitialized, and initialized if they have initializers. This is nice since function pointers in these data sections will be properly initialized to new function addresses upon load!

By taking a page out of old GBA C-implementation techniques we can reimplement polymorphism in a very straightforward way.

The idea was to have entities contain a virtual table (or vtable) of function pointers. However, there is no good reason to duplicate virtual tables into every single entity in the game, and instead this data really should be made static. Each entity can contain an integer identifier to describe what kind of entity it is. This ID can be used to index into a static array of vtables.


Here is roughly how I define what my entities look like in my own personal code:

Anything that is universal to all entities no matter what can go into the Entity struct. For now I just have doubly linked list pointers. I may add more later, I may not. It is unimportant. The important part is the id.

The id is used to index into a table that looks like this:

Every entity can Update or Draw itself in a polymorphic way. Making any entity update or draw itself is a matter of:

Personally I didn’t bother with type-safety. However some type-safety can be added if anyone cares enough.

Instead of each entity holding a pointer to a vtable like the C++ compilers do, they just hold an integer that indexes into a global array. The global array is properly initialized upon dynamic library reloading. It all works perfectly.
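For anyone who wants something concrete, here is a minimal sketch of the scheme described in this section. All names (EntityID, Entity, VTABLE, g_table, Crate, Player, UpdateEntity, DrawEntity) are made up for illustration and are not my actual code. Like the text says, there is no type-safety: the casts assume the Entity struct is the first member of each concrete type.

```cpp
#include <cassert>

enum EntityID { eCrate, ePlayer, EntityID_Count };

// Data shared by every entity. The id selects a row in g_table.
struct Entity {
    EntityID id;
    Entity*  next;   // doubly linked list of live entities
    Entity*  prev;
};

struct VTABLE {
    void (*Update)(Entity* e, float dt);
    void (*Draw)(Entity* e);
};

// Concrete entity types keep Entity as their first member so an Entity*
// can be cast back to the concrete type.
struct Crate  { Entity base; float x; };
struct Player { Entity base; float x; int drawCalls; };

static void CrateUpdate(Entity* e, float dt)  { reinterpret_cast<Crate*>(e)->x += 1.0f * dt; }
static void CrateDraw(Entity*)                {}
static void PlayerUpdate(Entity* e, float dt) { reinterpret_cast<Player*>(e)->x += 5.0f * dt; }
static void PlayerDraw(Entity* e)             { reinterpret_cast<Player*>(e)->drawCalls++; }

// File-scope table. It is re-initialized with fresh function addresses every
// time the library is loaded, so entities never hold a stale pointer.
static const VTABLE g_table[EntityID_Count] = {
    { CrateUpdate,  CrateDraw  },   // eCrate
    { PlayerUpdate, PlayerDraw },   // ePlayer
};

void UpdateEntity(Entity* e, float dt)
{
    assert(e->id < EntityID_Count);
    g_table[e->id].Update(e, dt);
}

void DrawEntity(Entity* e)
{
    assert(e->id < EntityID_Count);
    g_table[e->id].Draw(e);
}
```

The entities themselves only store plain data plus an integer, so they can live in memory owned by the entry point and survive a reload untouched.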

Some years ago in university I actually wrote out this kind of vtable stuff for a school club, as an example to students in my following year. Turns out it was incredibly useful for modern day game engine implementation. Feel free to take a peek if an example is of interest (albeit an older example).

Run-time Compiled C++ is also a good option! This github repo implements a much more full-featured style than the style this article describes. At the very least this code serves as a really cool learning resource. I have peered over this github repository quite a few times in the past years.

Smaller Compile Times

In order for this style to work compile times must be kept to a minimum. To facilitate this I have implemented a slew of useful libraries called tinyheaders. Feel free to use them in your own game engine. They accomplish tasks like multiplayer netcode, playing/looping sounds, collision detection, and a bunch of other odds and ends. Writing code like that shown in tinyheaders, or on Handmade Hero, is important to keep compile times at bay.

Truth be told almost an entire game engine can be constructed from these tinyheaders alone!

Unfortunately for many, this rules out the use of most C++ libraries, especially the ones making liberal use of templates. Libraries like Boost, as the prime example, will not work in this case.

Another important trick is the unity build. I have heard rumors of ridiculous claims, that some game engines when compiled as a unity build gain a 10% boost in performance. I have heard that unity builds are a more natural compilation scheme for C/C++. In my experience using a unity build makes compilation faster, if the entire project is written with compile times in mind. Code that compiles quickly should be aspired to from project inception! Keeping compile times under a few seconds is very important for iteration times.

Hotloading Assets, not just Code

Packaging up and creating assets is the other part of “compile times” that should be handled with care. For example it is quite tempting to place texture atlas compilation into the code compilation step. This can pretty quickly degenerate compile times in an unnecessary way. Instead some kind of alternative should be devised.

In my personal code I have a button to hotload assets, and upon game open (in debug mode) asset packaging is invoked. Since run-time C++ hotloading is used I actually try, for fun, to not close the game all day as I develop! So initial opening of the game does not happen very often, and it is OK if asset compilation takes a little time.

A nice commenter over in my Texture Atlas post pointed out another good solution: there is a program running that scans over asset directories looking for work to do (like atlas compiling). Whenever it sees work to be done, it kicks off and does it. The game itself can be notified of new assets, either by scanning file timestamps itself or through some other means, whatever is preferred. TCP or UDP are some examples for inter-process communication. The asset scanner can also just exist within the game itself as a separate thread, sleeping whenever necessary.

Always On-Line Philosophy

Suddenly it starts making sense to just always leave the game running. Code can be hotloaded, and assets can be hotloaded. There should only be a need to close and re-open the game if data layouts change (see the alternatives linked above if you want to support data layout changes).

The nice thing about the code-hotloading is the way it encourages the programmer to use code “as an editor”. Suddenly all the work of making something data driven is all about writing code as if it were the raw data! Incredible.

Here’s a very typical example. Take some more canonical or traditional styled C++ for a simple player movement class and function:
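A minimal sketch of what such a traditional class might look like. The names (Vec2, Player, Move, Pos, m_speed) are hypothetical and only serve the comparison that follows.

```cpp
#include <cassert>

struct Vec2 { float x, y; };

// Conventional style: the tuning value is a member, set once in the
// constructor. Changing it means closing the game, recompiling, reopening.
class Player {
public:
    Player() : m_pos{0.0f, 0.0f}, m_direction{1.0f, 0.0f}, m_speed(5.0f) {}

    void Move(float dt)
    {
        assert(dt >= 0.0f);
        m_pos.x += m_direction.x * m_speed * dt;
        m_pos.y += m_direction.y * m_speed * dt;
    }

    const Vec2& Pos() const { return m_pos; }

private:
    Vec2  m_pos;
    Vec2  m_direction;
    float m_speed;
};
```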

Okay, this is fine and dandy, it does work. After some playtesting it is realized speed multipliers would be fun. Let’s code that:

Great! So in this style of development the game would need to be closed, recompiled, reopened many times to achieve this effect. Often times these games are not written with compilation speed in mind, so maybe compilation takes a good 3 minutes. Loading the game takes a good 15 seconds. Going through some menus takes another 30 seconds. We are now looking at a solid 5-7 minutes of completely useless bullshit that gets in the middle of important problem solving and iteration. Unacceptable! If readers wonder where the extra 2 minutes came from… Experience shows that once a long compilation is finished the engineer will already be watching youtube or browsing reddit, unaware for a solid minute or two (at minimum) before doing the next steps.

Now the designer wants to do some heavy tuning of the multipliers, their stats, duration, when they are triggered, etc.

The “old” way to solve this comes from the conventional wisdom: DATA DRIVE THE THINGS.

So now we spend some time spitting out these multiplier floats into yet another shitty xml file, and the designer can modify this xml file and see changes in-game while the game is live. Woopty doo. So how long did it take to implement this xml stuff? Was a new library integrated into the project to read and save xml files? How long does it take to compile? Or does it add yet another DLL dependency?

Instead of all this garbage, how about we treat the C++ code as data itself? The player thingy can then look like:
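Something along these lines, as a minimal sketch with hypothetical names (Vec2, Player, PlayerMove, speed). The only real idea is that the tuning value is a constant sitting in hotloadable code rather than a member or a value read from a file.

```cpp
#include <cassert>

struct Vec2 { float x, y; };

// Only run-time mutable state lives in the struct.
struct Player {
    Vec2 m_pos;
    Vec2 m_direction;
};

void PlayerMove(Player* p, float dt)
{
    // Tuning value lives in code. Edit it, recompile the game library,
    // hotload, and the running game picks up the new number.
    const float speed = 5.0f;

    assert(dt >= 0.0f);
    p->m_pos.x += p->m_direction.x * speed * dt;
    p->m_pos.y += p->m_direction.y * speed * dt;
}
```

Because the struct layout did not change, a Player instance created before the reload keeps working after it.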

The speed can be tuned by anyone at any time that has a little bit of C knowledge, especially while the game is running! Brilliant.

But what about multipliers? Assuming a data layout change, and some kind of queue is created to hold multipliers, the game will still need to close and re-open. But! Just once, and we’ve been optimizing our compile times so it’s really not a big deal.

Multiplier support:
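A minimal sketch, assuming the multipliers are represented as bits in m_flags and that some queue elsewhere (not shown) sets and clears those bits. The flag names and the multiplier values are made up.

```cpp
#include <cassert>
#include <cstdint>

struct Vec2 { float x, y; };

enum SpeedFlags : std::uint32_t {
    SPEED_BOOST = 1u << 0,
    SPEED_SLOW  = 1u << 1,
};

struct Player {
    Vec2          m_pos;
    Vec2          m_direction;
    std::uint32_t m_flags;   // set and cleared by the multiplier queue
};

void PlayerMove(Player* p, float dt)
{
    // "Disk data": constants that can be retuned and hotloaded at any time.
    const float baseSpeed = 5.0f;
    const float boostMul  = 2.0f;
    const float slowMul   = 0.5f;

    assert(dt >= 0.0f);
    float speed = baseSpeed;
    if (p->m_flags & SPEED_BOOST) speed *= boostMul;
    if (p->m_flags & SPEED_SLOW)  speed *= slowMul;

    // "RAM data": state that changes while the game runs.
    p->m_pos.x += p->m_direction.x * speed * dt;
    p->m_pos.y += p->m_direction.y * speed * dt;
}
```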

Alright, great! The internal queue system applies bits to the m_flags as necessary, and unsets them. This style of development really hammers home the differences between run-time RAM data, and on-disk data (code, assets). The above snippet places disk data into constants in the code, and the run-time mutable RAM is in the m_pos, m_direction and m_flags pieces.

Any code constant can be modified live and instantaneously iterated. Amazing.

But Randy (the idealized naive viewer will say) isn’t that just hacky hard-coded code??? Yes, it is hard-coded. Hacky? Sure, whatever. Label it “hacky”. But the facts still remain: this style of development has crazy good iteration time. Obviously it is requiring designers to have solid C understanding. This makes this style out of reach for anyone that is not good at C. This can be viewed as a downside. This can also be argued as a very good plus-side. To each their own!

The Code is the Editor

The point is the style of code can be shifted. Code becomes an editor, a live editor. Instead of spending time creating UI based dev-tools, code itself is the dev-tool. Animation layout and definitions can be placed into structs forming data tables. If initialized at file scope, these can be hotloaded and tweaked at run-time.
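To make the data table idea concrete, here is a minimal sketch of an animation defined as a file-scope table. The names (AnimFrame, g_walk, FrameAt) and the sprite strings are hypothetical. Editing a duration or reordering rows, then hotloading, changes the animation in the running game.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

struct AnimFrame {
    const char* sprite;    // sprite name, resolved by the renderer elsewhere
    float       seconds;   // how long this frame stays on screen
};

// The "editor": tweak these rows, recompile the game library, hotload.
static const AnimFrame g_walk[] = {
    { "walk_0", 0.25f },
    { "walk_1", 0.25f },
    { "walk_2", 0.50f },
};
static const std::size_t g_walkCount = sizeof(g_walk) / sizeof(g_walk[0]);

// Returns the frame to display at time t (seconds), looping the animation.
const AnimFrame* FrameAt(const AnimFrame* frames, std::size_t count, float t)
{
    assert(count > 0 && t >= 0.0f);

    float total = 0.0f;
    for (std::size_t i = 0; i < count; ++i) total += frames[i].seconds;

    t = std::fmod(t, total);
    for (std::size_t i = 0; i < count; ++i) {
        if (t < frames[i].seconds) return &frames[i];
        t -= frames[i].seconds;
    }
    return &frames[count - 1];
}
```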

Imagine using a very cool tool like Spine (video link). The video shows the user doing some fundamental operations: attaching bones together, defining animation curves, keyframes, and time deltas. All of this can be done in C. If a programmer is comfortable in C all of these pieces can just be placed directly into code!

What are the fundamental movements of the mouse and keyboard while using a tool like Spine?

  1. The mouse moves
  2. Some things are dragged and dropped
  3. Some keys are typed to define numbers or names

How does this relate to C code?

  1.  The mouse moves to another line of code
  2.  Some things are copy pasted from one spot to another
  3.  Some keys are typed to define numbers or names


As long as the C code is well-written and the system is designed well, it can behave very closely to a very good editor like Spine. But! The iteration time is *instantaneous* as far as getting these things in game and on game-screen goes.

It seems like a zero-cost benefit, but there is a real tradeoff here. The designer must be good at C. The animation editor must be good at C. Everyone must be very good at C. That’s a huge baseline! Being good at C is really hard, and so this option is not for everybody.

ECS is Garbage

Components are garbage. Entity component systems are garbage. OOP is garbage. All acronyms are garbage. Just write the god damn game. Solve specific problems with specific code solutions. Refactor the code as necessary. Everyone that bothers writing “an ECS system” is either still learning core fundamentals, or just wasting their time.

This post describes my personal plan of action towards creating a custom game engine, and the focus has been entirely on simplicity and iteration time. Please, readers, do not focus on “run-time object models”, “entity component systems”, or any other garbage on the internet. Truth is all of these people have never shipped a good game, or made a good product. Just go look at Love2D source, or GameMaker, or whatever have you. These products are successful, and they don’t bother around with acronyms. They just solve problems for their customers.

Your game, and your game engine should be solving *real* problems. Actual problems with clear definitions. Problems that you have seen. The game engine should be solving problems regarding iteration times, innovation, or specialization. The game itself should also be solving some kind of problem. What does the customer need? What do they starve for? How does the product relate to these needs and desires? Often times talk of ECS is pure hype.

The way I see it ECS is an attempt to construct a methodology for software engineering. Software engineering being more about API design, organization, code longevity etc. compared to just raw coding. Software engineering is a difficult and unnatural skill that more or less requires experience. Writing a good game engine will take experience, not an ECS/OOP/DOD/InsertDumbThingsHere acronym. Trial and error, sweat and blood, these are how good software is written, game engines included.

It seems some others are starting to speak up against ECS, here’s a thread talking about this.

Addendum: The above section is just trying to attack acronyms and naive advice. As seen below (in the comments) there are some games that were released which tried to implement some kind of ECS or component system, and successfully shipped. Sure! But the exception proves the rule, right? Of course a team of solid engineers can create and ship a game with some kind of ECS, but still, it took a team of solid engineers to do so. Just take the above section with a dose of skepticism.

Know what you Want

Since the primary advantages of custom tech involve innovation, and making something truly special, it becomes paramount to know what you want. What do you want out of your product? Are you making a game, or a game engine? If you are not making a game, then why make a game engine? That’s ridiculous. What kind of product is one without customers?

Know what kind of game you want to make. If this is unknown then why are you reading a post about making a game engine? Please just go play with Unity until you know what you want. Then we can play ball when you get back.

The Long Grind

Making a custom piece of technology is a long grind. Making a good game engine in order to produce a quality product is a long grind. There will be little to no external source of motivation. Either you have the guts to just do the hard work, or you don’t. Making something great requires perspiration and knowledge, and a little opportunity. If you’re able to read this blog post you have enough opportunity.

Figure out what your personal strengths are and how you are as a person. Play to your strengths. Don’t try to build up weak spots and make them your strengths, unless those weak spots are who you truly are. For many people this precludes writing a custom engine, but that’s fine! Making a custom engine is just one option, and should serve the creator (not vice versa).


This may be the last blog post I ever write on this website, and at the very least will be the last one for some years. Hopefully it is useful to someone. Please leave comments or suggestions and I’ll edit this post and beef it up over time.


Extra Arguments

A lot of readers dislike the above section that attacks ECS as an acronym and concept. For the sake of posterity, and for anyone interested, I’ll respond to some common pro-ECS arguments seen on the internet. Feel free to skip this section otherwise.

I will reference this tidbit from an old reddit thread and respond to each point. The points raised in the thread represent very common arguments on the internet in favor of ECS. The goal here is to provide curious readers a voice that opposes popular opinions, that way readers can hear more than one side, and subsequently form their own opinion.

ECS enables composition which is one way to solve the problem of sharing behavior. Inheritance solves the same problem but results in inflexible hierarchies. See the bugs in League of Legends caused by “everything extends Minion.” ECS lets you pick and choose any set of qualities for your entities without running the risk of bringing in qualities you don’t want and without duplicating code.

The idea here is “use ECS because composition is good”. I’m sure most readers are familiar with the old is-A vs has-A stuff. Most readers have heard about features floating up inheritance hierarchies. However, ECS or any other fancy acronyms are not needed to implement some kind of composition. Simply defining some POD structs and including them inside of other structs can implement the concept of composition.
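A minimal sketch of what that looks like, with made-up names (Transform, Sprite, Health, Crate, Enemy, Translate, Damage). Behavior is shared by writing functions against the small structs, and each game object just includes whichever pieces it needs.

```cpp
#include <cassert>

// Small POD pieces.
struct Transform { float x, y; };
struct Sprite    { int atlasIndex; };
struct Health    { int current, max; };

// Game objects are composed by inclusion. A crate has no health, an enemy does.
struct Crate { Transform transform; Sprite sprite; };
struct Enemy { Transform transform; Sprite sprite; Health health; };

// Shared behavior operates on the pieces, not on the owning object.
void Translate(Transform* t, float dx, float dy)
{
    t->x += dx;
    t->y += dy;
}

void Damage(Health* h, int amount)
{
    assert(amount >= 0);
    h->current -= amount;
    if (h->current < 0) h->current = 0;
}
```

Translate works on a crate or an enemy alike, and nothing forces a crate to carry health it does not need.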

The gripe I have is when we say “do this ECS thingy to get composition benefits”. This is a methodology, a cookie cutter solution, i.e. no critical thinking involved. It sounds like that dumb bag of tricks metaphor everyone spouts about angrily on forums. Bad engineers rely on their bag of tricks to solve problems without thinking. Bad engineers implement acronyms simply to get “benefits”. Bad engineers waste time making up idealized problems only to spend energy solving them, usually for the sole purpose of measuring e-peen.

ECS provides a path to lock-free parallelized processing of your model. You know ahead of time what components each system reads from and writes to. With that knowledge alone, you can automate the parallelization of your systems. This helps solve the problem of finishing processing in under 16ms.

Depends on the definition of ECS, which constantly changes and nobody really seems to be an authority on the subject. Lock free algorithms, or good multi-threaded algorithms in general can be implemented without ECS. Multi-threading requires separation of data and problems, neither of which are exclusive to ECS. Anyone can implement a game without knowledge of ECS and include threads in an efficient manner.

Additionally the part about “automate the parallelization of systems” is very naive. I assume this is trying to talk about setting up some kind of dependency graph, and some sort of task system can be used to do work on a threadpool. Okay, that’s fine and dandy, but it isn’t necessarily a good thing. Dependency graphs and parallelization come with tradeoffs. For example code flow becomes completely lost in some data structure that represents the dependency graph. Suddenly a programmer cannot easily trace a callstack, or understand the flow of how state mutates over time without referring to the dependency graph. This is a really big abstraction that comes with a hefty abstraction cost.

I have never seen a single discussion of ECS, from anyone who has actually shipped a game, that talks about the tradeoffs involved here and how to gauge those tradeoffs against a certain game project.

But of course, assuming random people on the internet have actually shipped a product with an ECS is a ridiculous assumption; pretty much all articles I have personally seen on the internet regarding ECS’s were definitely not written by anyone with real experience.

Use an ECS because:

  1. Cache misses incurred by the more common architectures
  2. Ensuring composability, simultaneously erasing bogus dependencies between code and data

For point one: why not just solve the cache miss problem directly? If a particular game is actually having cache miss related performance problems, why not address that problem instead of relying on some goofy ECS-method to solve it for you?

Saying “solve cache misses” is just ridiculous. What cache misses? Where is this cache miss code and how is it specifically a problem? Oh wait, these cache misses must be yet another idealized problem born of e-peen measuring instead of actual experience or actual problems in an actual shipped game.

Any half-decent programmer will know how to avoid a cache miss. ECS is not needed to learn about caches.

Point two sounds interesting, but I just don’t know what it is saying. It sounds something like the point Erin from the comments made (should code belong in A or B), which I responded to elsewhere in this post.

Rigid classes rarely match game objects. Making them flexible can lead down paths of optional pointers (dynamic), or multiple inheritance (static), or just building massive “everything” objects. I prefer flexible and sparse representation of game objects (entities) defined by their properties.

This sounds nice I suppose, but I just don’t know exactly what half of these terms are. What is “rigid class”, what is “optional pointer”? I just cannot respond to this without delving into definitions of all these terms.

Updating by “object” easily has issues like entity B (during its update) accessing state of entity A (at frame n+1) and entity C (at frame n).

So? What is the problem here? For the sake of argument I will assume this point is trying to describe a point made by Erin in the comments, something along the lines of “where does this code belong, in A or B?”. This can be a pretty annoying problem and lead to a lot of jumping back and forth across pointer indirection. The jumping back and forth is a problem since it destroys code flow (the ability for an engineer to quickly grok the program’s path of execution).

This can be solved by separating the marriage of code and data, in effect splitting up the concept of “object” into data + code. Code operates on some memory. Code that operates on data A, and code that operates on data B can easily be placed into two categories. Data A and data B are similarly two different conceptual categories.

That’s fine. This can be done without the ECS acronym. These concepts are not ECS specific. A good engineer will not rely on following steps 1-10 of How to Create Your Very Own ECS in order to solve mangled code flow problems.

Lacking uniform representation means it’s hard to add pan-object features. One way is having everything inherit down to a base object, where you can add such things, but that is horrible. Components make this trivial: entities are just IDs to associate components to. So you can decide a “faction” is something with a name and relations which other things can be associated to. Done. Or if you want a debug-tag, define the component and attach it to things: in data or at runtime! No need to touch any other code or change any structs. Modular composition bliss.

This is describing the simple concept of composition. This is not “ECS” or “component based design”, or any other dumb acronym.

These concepts have existed for many decades.

Entity specification (data), and serialization… often a pain. Components make this a 1:1 correspondence: just load properties on an ID. Templating is easy: an entity instance can inherit from a template, and add overrides. Serialization just stores the overridden (local) components.

Honestly I don’t know what most of these terms really mean. I’m guessing the overall point is 1:1 correspondence of serialization to components is nice and straight-forward. OK sure, that’s a good point. But like all other points, why should anyone care about ECS? Defining some POD-structs and making good use of them is as old as my grandfather. Yes, clearly POD-structs are trivial to serialize, so if data within a program is well organized and sorted, then sure serialization should become much easier.

But this does not really require the use of some silly acronym. Any good engineer will already know this.

What all these points show is that online voices that argue in favor of ECS are simply very excited about following a methodology. The idea is if an engineer follows this acronym, or this set of standards, or these “rules” they will get some good benefits. The problem is this is a fantasy. Good code will never come from a cookbook of steps. It requires critical thinking, adaptation, and experience.


Binary Heaps in C

After conducting a short search for a decent implementation of a binary heap in C I ended up having to write one. Some requirements were: no dependencies, no dynamically allocated memory, no need to modify the source to extend to new types.

The first result found from a google search yielded this OK implementation. Unfortunately it makes use of allocation routines and has a couple annoying header dependencies. It also defines the macro CMP, a fairly commonly used macro. The idea is a good idea — if we define a modifiable macro the heap can be converted to a min or max heap as needed. However in practice this is not that useful; if we use min heap first, and later require max heap, we then still need to copy + paste out a new set of heap source files. Why not just make the two different sources up-front once and for all? It also typedefs the internal type, so use of the header would require alteration of the header. This triggers recompiles, and is a pain to maintain as new types of data are needed, perhaps copy and pasting new implementations around, modifying the build system as necessary to accommodate the new files, etc.

This implementation suffers almost identical downsides, and has no way to store auxiliary data at all. Heaps of only single integers are not very useful!

One nice strategy would be to use a void* and stride size so the heap could hold arbitrary pieces of POD memory. A small downside to this strategy is the requirement to constantly multiply stride with indices, which sort of clutters the source. Not really a big deal, just a nit-pick. I actually could not find a single implementation that takes this strategy.

In the end I decided to use an indirection trick: sort an array of struct { int key; int val; }, where the val member is used as an index into an array of auxiliary data. The key integer is used for sorting, and the val integer is used to lookup data associated with a key. This implementation style can be re-used for any kind of data without C++ templates, and without requiring the user to mess with the source, ever.
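A minimal sketch of the indirection trick as a min heap, written C-style. This is not the original listing; the names (HeapItem, Heap, HeapInit, HeapPush, HeapPopMin) are hypothetical. The caller supplies the storage array, so nothing is allocated dynamically.

```cpp
#include <cassert>

struct HeapItem { int key; int val; };   // val indexes the caller's own data

struct Heap {
    HeapItem* items;      // caller-provided storage
    int       count;
    int       capacity;
};

void HeapInit(Heap* h, HeapItem* storage, int capacity)
{
    h->items = storage;
    h->count = 0;
    h->capacity = capacity;
}

static void HeapSwap(HeapItem* a, HeapItem* b)
{
    HeapItem tmp = *a;
    *a = *b;
    *b = tmp;
}

// Returns 0 if the heap is full, 1 otherwise.
int HeapPush(Heap* h, int key, int val)
{
    if (h->count == h->capacity) return 0;
    int i = h->count++;
    h->items[i].key = key;
    h->items[i].val = val;

    // Sift up until the parent is no larger than the new item.
    while (i > 0) {
        int parent = (i - 1) / 2;
        if (h->items[parent].key <= h->items[i].key) break;
        HeapSwap(&h->items[parent], &h->items[i]);
        i = parent;
    }
    return 1;
}

HeapItem HeapPopMin(Heap* h)
{
    assert(h->count > 0);
    HeapItem top = h->items[0];
    h->items[0] = h->items[--h->count];

    // Sift down, always swapping with the smaller child.
    int i = 0;
    for (;;) {
        int left = 2 * i + 1, right = left + 1, smallest = i;
        if (left  < h->count && h->items[left].key  < h->items[smallest].key) smallest = left;
        if (right < h->count && h->items[right].key < h->items[smallest].key) smallest = right;
        if (smallest == i) break;
        HeapSwap(&h->items[i], &h->items[smallest]);
        i = smallest;
    }
    return top;
}
```

To use it with some other type, keep an array of that type on the side, push the sort key along with the array index as val, and look the element up by val after popping. A max heap is the same code with the comparisons flipped.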

And here’s the output:

Custom Physics Engine – Part 2: Manifold Generation


During the previous article in this Custom Physics Engine series we talked about impulse resolution. Though understanding the mathematics and physics presented there is important, not much could be put into practice without both collision detection and manifold generation.

Collision detection is a pretty widely documented area, and so I won’t go into too much detail on how to achieve collision detection in this article series. Instead I’ll focus on manifold generation, which is in my opinion much more difficult and less-documented compared to collision detection.

In general collision detection is useful for retrieving a boolean result of “are these two things colliding”. The usefulness of such a result ends when this collision needs to be resolved. This is where manifold generation comes in.

Manifold Generation – Summary

A manifold, in context of physics engines, is a small structure that contains data about the details of a collision between two objects. The two bodies are commonly referred to as A and B. Whenever referring to a “collision” as a system, A is usually the reference object, as in the problem is viewed from A’s orthonormal basis.

I am not actually sure why this structure is called “the manifold”, and I do not know anyone that actually knows. So don’t ask! Either way this structure should be passed around by reference or pointer to avoid unnecessary copies. I also pool all my manifolds, and intrusively link them in order to keep a list of active manifolds during a scene’s step (the term scene is defined in the previous article).

Manifold generation involves gathering three pieces of information:

  • Points of contact
  • Penetration depth
  • Vector of resolution, or collision normal

The points of contact are 2D points (or 3D for a 3D engine) that mark where one shape overlaps another. Usually these contact points are placed onto the vertex of one shape that resides within another.

Two points of contact for two boxes found intersecting.

The penetration depth is the depth to which the two shapes are intersecting. This is found using the Separating Axis Test (SAT). There are lots of resources around that talk about SAT, so I suggest googling for them. The penetration depth is measured along the axis of least penetration. In this case (assuming blue’s frame of reference and vertical as y axis) the y axis is the axis of least penetration.

The collision normal is used to describe in which direction to press both objects away from one another. The collision normal will be a face normal, and in this case it would be a normal pointing towards the brown box, where the normal corresponds to the blue box’s top face.

Generating these three pieces of information can be quite a challenge. Now let’s look at what a possible setup of the manifold structure might look like:
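The original listing did not survive in this copy, so here is a minimal sketch of what such a structure could look like. The `Vec2` and `Body` types, the `kMaxContacts` constant, and all field names are assumptions made for illustration, not the author's actual definitions.

```cpp
// Hypothetical minimal types, for illustration only.
struct Vec2 { float x, y; };
struct Body;  // opaque here: whatever your engine uses for a rigid body

// Assumed upper bound on contact points for a 2D simulation.
constexpr int kMaxContacts = 2;

struct Manifold
{
    Body* A = nullptr;                  // reference object
    Body* B = nullptr;                  // the other object
    float penetration = 0.0f;           // depth along the collision normal
    Vec2  normal = { 0.0f, 0.0f };      // unit vector pointing from A to B
    Vec2  contacts[kMaxContacts] = {};  // fixed-size array, no dynamic allocation
    int   contactCount = 0;             // zero signifies "no collision"
};
```

Keeping the contacts inline like this means one manifold is one contiguous block of memory.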

Note that there can be a variable amount of contact points. I suggest having a contactCount of zero signify “no collision”. I also suggest having only two possible points of contact for a 2D simulation, and to start just use a single possible point of contact. More than one point of contact isn’t necessary until advanced constraint resolution is used (a future article).

It is important to keep a plain array of contacts within this data structure so as to keep strong cache coherency. There’s no reason to dynamically allocate the array of contacts.

Circle to Circle

I’ll be covering how to gather manifold information for specialized cases of a couple of different types of shapes. Let’s go over circle to circle first. Here is what the definition of a circle would look like:
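The listing is missing from this copy. A plausible sketch, assuming a hypothetical `Vec2` type and field names of my own choosing:

```cpp
// Hypothetical 2D vector type, for illustration only.
struct Vec2 { float x, y; };

struct Circle
{
    Vec2  position;  // center of the circle in world space
    float radius;
};
```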

The first thing to do is to see if they are colliding or not. Again, throughout this article I’m going to mostly blaze through the collision detection and focus just on gathering important manifold information. Feel free to google or ask specific questions in the comments about collision detection.
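Since the original code is not present in this copy, here is a sketch of circle to circle manifold generation under stated assumptions: the `Vec2`, `Circle` and `Manifold` types and the function name `CircleToCircle` are hypothetical, and the fallback direction for coincident circles is an arbitrary but fixed choice.

```cpp
#include <cmath>

// Hypothetical types, for illustration only.
struct Vec2 { float x, y; };
struct Circle { Vec2 position; float radius; };

struct Manifold
{
    Vec2  normal;        // unit vector from A to B
    float penetration;
    Vec2  contacts[2];
    int   contactCount;  // zero signifies "no collision"
};

bool CircleToCircle( const Circle& a, const Circle& b, Manifold& m )
{
    m.contactCount = 0;

    // Vector from A to B: end point minus start point.
    const Vec2  n = { b.position.x - a.position.x, b.position.y - a.position.y };
    const float r = a.radius + b.radius;
    const float distSq = n.x * n.x + n.y * n.y;

    // Early out with squared values, so no square root is taken yet.
    if ( distSq > r * r )
        return false;

    const float dist = std::sqrt( distSq );
    m.contactCount = 1;

    if ( dist != 0.0f )
    {
        m.penetration = r - dist;
        m.normal = { n.x / dist, n.y / dist };
        // One contact point: on A's edge, along the normal.
        m.contacts[0] = { a.position.x + m.normal.x * a.radius,
                          a.position.y + m.normal.y * a.radius };
    }
    else
    {
        // Circles sit exactly on top of each other: pick a fixed direction
        // so the result is at least consistent from frame to frame.
        m.penetration = a.radius;
        m.normal = { 1.0f, 0.0f };
        m.contacts[0] = a.position;
    }
    return true;
}
```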

The above code is really quite simple. The important thing to note is that our contact normal will always be a vector from A to B. In order to create a vector from one point to another you take your endpoint minus your starting point. In this case, A’s position subtracted from B’s position. This results in a vector from A to B. This vector normalized will be the collision normal, or in other words the direction in which to resolve the collision.

It is important to note that no square root functions are called before the early out condition is checked. Most of the time your shapes are probably not colliding, and so there’s no reason to use a square rooted value.

The last tricky thing is to check if the two shapes are right on top of each other. Though this is unlikely in a dynamic environment, sometimes shapes can be placed directly upon one another through an editor. It is important, in all collision detection functions, to handle all special cases, even if the handling is bad. Whatever you do just make sure you are consistent. I just chose a random vector to resolve in the direction of.

The nice thing about circle to circle collision is that only one collision point is really needed. Just be sure to be consistent in how you choose your collision point in order to reduce simulation jitter.


AABB to AABB

Collision detection between two AABBs is a bit more complicated than two circles but still quite simple. The idea is to make use of min and max. Let’s assume we’re storing our AABBs in a structure like so:
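The structure itself is missing from this copy. A sketch of a min/max representation, with a hypothetical `Vec2` type and field names assumed for illustration:

```cpp
// Hypothetical 2D vector type, for illustration only.
struct Vec2 { float x, y; };

struct AABB
{
    Vec2 min;  // corner with the smallest x and y
    Vec2 max;  // corner with the largest x and y
};
```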

This allows a very simple algorithm to find points of contact. For convex polygons that are not axis aligned often times Sutherland-Hodgman clipping will need to be performed. In our case we can implicitly deduce our contact area due to the nature of AABBs.

First determine if the two AABBs are overlapping at all. When an axis of least penetration is found the collision area can then be deduced.


AABB intersection with intersection points and area of intersection. Min the maxes and max the mins.

The idea is to perform the SAT while storing each overlap value. The axis with the least overlap is your axis of least penetration. To get the contact area and two points of intersection you can min your maxes and max your mins (I’m talking about the extents of each AABB).

This sounds silly, but that’s how you do it. I suggest drawing it out. Here’s how to find your collision area given by two points (intersection points of the AABBs):
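The listing is missing here, so this is a sketch of "min the maxes and max the mins". The `Vec2` and `AABB` types and the name `IntersectionRegion` are assumptions for illustration.

```cpp
#include <algorithm>

// Hypothetical types, for illustration only.
struct Vec2 { float x, y; };
struct AABB { Vec2 min; Vec2 max; };

// Writes the two corner points of the overlapping region into lo and hi.
// Returns false when the boxes do not overlap (the region is then meaningless).
bool IntersectionRegion( const AABB& a, const AABB& b, Vec2& lo, Vec2& hi )
{
    // Max the mins...
    lo = { std::max( a.min.x, b.min.x ), std::max( a.min.y, b.min.y ) };
    // ...and min the maxes.
    hi = { std::min( a.max.x, b.max.x ), std::min( a.max.y, b.max.y ) };
    return lo.x <= hi.x && lo.y <= hi.y;
}
```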

The last bit of info required would be to record the penetration and contact normal. Penetration is the overlap along your axis of least overlap, so after you’ve found your axis of least overlap you can just assign a vector value as your contact normal. If you have found the axis of least penetration to be on the x axis, you want to point towards object B along the x axis. If the y axis is the axis of least penetration, you want to point towards object B along the y axis.

That’s all there is to the AABB to AABB intersection. Be sure to properly record the number of contacts found (if any), and if either the x or y axis is not actually overlapping, then that means there is no intersection.

AABB to Circle

I will leave AABB to Circle collision as an exercise for the reader, though I will quickly provide an explanation paragraph behind the idea. What needs to be done is to first determine if the shapes are overlapping at all. I have a previous post on my blog that explains the Circle to AABB intersection, and more information about such an intersection check can be found around the internet.

Let’s assume A is the AABB and the Circle is B and we have a collision. The collision normal will again be the vector from A to B, except slightly modified. The early out condition involves finding the closest point on the AABB to the Circle. The collision normal is the translation vector from A to B minus the vector from A to the closest point on the AABB. This represents a vector from the closest point to the circle’s center.

The contact point will reside on the circle’s edge, one radius away from the center along the line of the contact normal (on the side facing the AABB). This should be easy to perform if you understood the Circle vs Circle collision detailed above. The penetration depth will be the circle’s radius minus the length of the collision normal before it is normalized.

There is one special case that must be properly handled: if the center of the circle is within the AABB. This is quite simple to handle; clamp the center of the circle to the edge of the AABB along the edge closest to the circle’s center. Then flip the collision normal (so it points away from the AABB instead of to the center) and normalize it.
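For readers who want to check their answer to the exercise, here is one possible sketch. It is not the author's implementation: the types, the name `AABBtoCircle`, and the way the inside case picks the nearest edge are all assumptions. It works with the closest point in world space directly, which is equivalent to the translation-vector formulation above.

```cpp
#include <algorithm>
#include <cmath>

// Hypothetical types, for illustration only.
struct Vec2 { float x, y; };
struct AABB { Vec2 min; Vec2 max; };
struct Circle { Vec2 position; float radius; };

struct Manifold
{
    Vec2  normal;        // unit vector from A (the AABB) towards B (the circle)
    float penetration;
    Vec2  contacts[2];
    int   contactCount;  // zero signifies "no collision"
};

bool AABBtoCircle( const AABB& a, const Circle& b, Manifold& m )
{
    m.contactCount = 0;
    const Vec2 c = b.position;

    // Closest point on the AABB to the circle's center (a clamp per axis).
    const Vec2 closest = { std::max( a.min.x, std::min( c.x, a.max.x ) ),
                           std::max( a.min.y, std::min( c.y, a.max.y ) ) };

    if ( closest.x == c.x && closest.y == c.y )
    {
        // Special case: the center is inside (or exactly on) the AABB.
        // Push out through the nearest edge.
        const float left  = c.x - a.min.x, right = a.max.x - c.x;
        const float low   = c.y - a.min.y, high  = a.max.y - c.y;
        const float nearest = std::min( { left, right, low, high } );

        if      ( nearest == left )  m.normal = { -1.0f,  0.0f };
        else if ( nearest == right ) m.normal = {  1.0f,  0.0f };
        else if ( nearest == low )   m.normal = {  0.0f, -1.0f };
        else                         m.normal = {  0.0f,  1.0f };

        m.penetration = b.radius + nearest;
    }
    else
    {
        // Vector from the closest point to the circle's center.
        const Vec2  d = { c.x - closest.x, c.y - closest.y };
        const float distSq = d.x * d.x + d.y * d.y;
        if ( distSq > b.radius * b.radius )
            return false;  // early out, no square root needed

        const float dist = std::sqrt( distSq );
        m.normal = { d.x / dist, d.y / dist };
        m.penetration = b.radius - dist;
    }

    // Contact on the circle's edge, on the side facing the AABB.
    m.contacts[0] = { c.x - m.normal.x * b.radius, c.y - m.normal.y * b.radius };
    m.contactCount = 1;
    return true;
}
```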

OBBs and Orientation

Now let’s start talking about adding in some manifold generation for some more complex oriented shapes! The first thing that must be learned is how to properly change from one orthonormal basis to another (that is, shift from one frame of reference to another). This will vastly simplify collision detection involving OBB shapes.

Changing a basis involves taking the orientation and translation of an OBB and applying the inverse of these two to another shape. In this way you can then treat the OBB as an AABB as long as you are still referring to your transformed object. Let’s go over this in some more detail with some pictures.

Here is what an OBB is like in the OBB’s frame of reference (left), and the OBB in model space (right).


Note: origin is (0, 0) in model and reference space.

The important thing to realize is that in order to place an object into an OBB’s frame of reference it must have the inverse of the OBB’s translation and rotation applied to it. This takes the OBB’s position to the origin, and the OBB can then be treated as an AABB.

If the inverse rotation of the OBB, in this case -45 degrees, is applied to both the OBB and an object near it, this is what happens:

Change of basis going from left to right.

As you can visually see, once the circle has been inversely transformed into the OBB’s frame of reference the OBB can be viewed as a simple AABB centered at the origin. The extents of the OBB can be used to mimic an AABB, and the OBB to Circle intersection and manifold generation can be treated identically to the AABB to Circle intersection, if a proper inverse transformation is performed. Again, this inverse transformation is called a “change of basis”. It means you transform the Circle into the OBB’s frame of reference.

Mat2 Rotation Matrices

Let’s go over rotations in 2D extremely quickly. I won’t go over derivation here for brevity’s sake (as you will see, brevity is a close friend of mine in these Physics Engine articles haha). Instead I will just show you how to create your own 2 by 2 matrix and use it as a part of whatever math library you currently have (which you should hand-code yourself!). Really the only useful thing about having a 2 by 2 matrix is to do rotation operations.

For those using C++ you’re in luck for I know how to use unions.
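The listing is missing from this copy. A sketch of the unnamed union layout being described might look like this. The member names are assumptions. Note that anonymous structs inside a union, and reading a union member other than the one last written, are compiler extensions rather than standard C++; GCC, Clang and MSVC all accept this pattern, but it is worth knowing you are relying on that.

```cpp
struct Mat2
{
    union
    {
        // Separate floating point values, in row-major order.
        struct
        {
            float m00, m01;
            float m10, m11;
        };

        float m[2][2];  // two dimensional access: m[row][column]
        float v[4];     // single dimensional access
    };
};
```

With this layout `mat.m01`, `mat.m[0][1]` and `mat.v[1]` all name the same float.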

The above is a proper usage of the unnamed union trick. The elements of the 2 by 2 array can be accessed as if they are a two dimensional array, single dimensional array, or separate floating point values. Additionally you can stick two vectors into your union for column or row access, if you so wish.

I want to briefly hit all the important methods without writing an entire book, so get ready for code snippets to be thrown at you.

The first thing you should realize is that the default constructor does nothing. This is important. Often times you will create a matrix only to briefly thereafter assign some value to it. As an optimization, do not have the default constructor fill the matrix with zero values. Force the user to use the Set function like so: mat.Set( 0, 0, 0, 0 ).

The interesting functions here are the rotation constructor and SetRotation functions. Each one computes cosine and sine from a given radian value and caches the result. Caching the results prevents unneeded additional calls to cosine and sine. Note the format in which sine and cosine are stored. It is also important to realize that m00 and m10 represent a transformation of the x axis, and m01 and m11 represent a transformation of the y axis. Each of these two pairs is a column, and both columns can be viewed as unit vectors.

Multiplying a Mat2 with a vector will rotate the vector’s x and y components around the origin. It is important to realize where your origin is before you apply a rotation. If you want to jump into an OBB’s frame of reference you must do an inverse translation to set the OBB as the origin. This allows you to then apply the OBB’s inverse rotation (perhaps with the inverse operator of your Mat2, see Box2D if you don’t know how to invert a Mat2) and rotate about the origin (which is about the OBB).
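The snippets this section refers to are missing from this copy, so here is a compact sketch of the methods discussed. It uses plain floats instead of the union to stay within standard C++. `Transpose` and the helper `ToLocal` are names I have assumed; for a pure rotation matrix the transpose is the inverse, which is what makes the change of basis cheap.

```cpp
#include <cmath>

// Hypothetical 2D vector type, for illustration only.
struct Vec2 { float x, y; };

struct Mat2
{
    float m00, m01;
    float m10, m11;

    Mat2( ) { }  // deliberately does nothing: no zero fill
    explicit Mat2( float radians ) { SetRotation( radians ); }

    void Set( float a, float b, float c, float d )
    {
        m00 = a; m01 = b;
        m10 = c; m11 = d;
    }

    // Calls cosine and sine once and caches the results in the matrix.
    void SetRotation( float radians )
    {
        const float c = std::cos( radians );
        const float s = std::sin( radians );
        m00 = c; m01 = -s;  // first column (m00, m10) is the rotated x axis
        m10 = s; m11 =  c;  // second column (m01, m11) is the rotated y axis
    }

    // For a rotation matrix, the transpose is the inverse rotation.
    Mat2 Transpose( ) const
    {
        Mat2 t;
        t.Set( m00, m10, m01, m11 );
        return t;
    }

    // Rotates v about the origin.
    Vec2 operator*( const Vec2& v ) const
    {
        return { m00 * v.x + m01 * v.y, m10 * v.x + m11 * v.y };
    }
};

// Change of basis: inverse translation first, then inverse rotation.
Vec2 ToLocal( const Vec2& point, const Vec2& obbPosition, const Mat2& obbRotation )
{
    const Vec2 translated = { point.x - obbPosition.x, point.y - obbPosition.y };
    return obbRotation.Transpose( ) * translated;
}
```

After `ToLocal`, a circle's center can be tested against the OBB's extents exactly as if the OBB were an AABB centered at the origin.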

OBB Representation

Every oriented shape will need to store its orientation somehow. I suggest the following:
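The suggested structure is missing from this copy. A sketch, assuming hypothetical `Vec2` and `Mat2` types, a half-extent representation, and a `Rotate` helper of my own naming:

```cpp
#include <cmath>

// Hypothetical types, for illustration only.
struct Vec2 { float x, y; };

struct Mat2
{
    float m00, m01;
    float m10, m11;

    void SetRotation( float radians )
    {
        const float c = std::cos( radians );
        const float s = std::sin( radians );
        m00 = c; m01 = -s;
        m10 = s; m11 =  c;
    }
};

struct OBB
{
    Vec2  position;     // center in world space
    Vec2  halfExtents;  // half width and half height in the OBB's own frame
    float radians;      // orientation as a single angle
    Mat2  u;            // the same orientation, cached as a rotation matrix

    // Nudge the angle, then refresh the cached matrix once.
    void Rotate( float deltaRadians )
    {
        radians += deltaRadians;
        u.SetRotation( radians );
    }
};
```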

The OBB should store its current orientation in both a single floating point value along with a matrix to represent that radian value as a rotation matrix. When you need to rotate the OBB during integration, you can just add or subtract a little bit to the radians value, and then call u.SetRotation( radians ) to update the rotation matrix. This makes use of a simple and organized way to cache results from sine and cosine calls, and minimizes the number of calls to these functions that you require.


OBB to OBB

Now let’s talk about the big one. How in the world can you see if two OBBs intersect? Both boxes are oriented, so the problem would involve a lot of complex calculations involving trigonometric computations.

Let’s make things easier: transform one OBB into the other OBB’s frame of reference, and treat the transformed object as an AABB. Now the problem becomes much simpler.

First perform a separating axis check and find the axis of least penetration. In order to perform the SAT you must find a projected radius onto the axis you are currently testing.



If the sum of the projected radii from both OBBs is larger than the distance between the centers of the OBBs (along your respective axis), then they are intersecting on that axis. This method works for all convex polygons in 2D.

The way I perform this check is by taking the translation vector from A to B, let’s call it T. Then I rotate T into A’s frame of reference, take its absolute value, and subtract A’s extent vector, and then subtract from that entire result B’s extent vector rotated into A’s frame of reference. This results in a vector holding the overlap along the x and y axis for object A. Due to symmetry only two axes need to be tested. The same operation can be done for object B to find B’s separating axis. If no separating axis is found the shapes are intersecting.
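Here is a sketch of that check written out with explicit sines and cosines. The `OBB` layout and the names `RotateInverse` and `OBBsOverlap` are assumptions for illustration. The per-axis results are negative when the boxes overlap on that axis and positive when that axis separates them.

```cpp
#include <cmath>

// Hypothetical types, for illustration only.
struct Vec2 { float x, y; };
struct OBB { Vec2 position; Vec2 halfExtents; float radians; };

// Expresses a world-space vector in a frame rotated by `radians`
// (that is, applies the inverse rotation).
static Vec2 RotateInverse( const Vec2& v, float radians )
{
    const float c = std::cos( radians );
    const float s = std::sin( radians );
    return { c * v.x + s * v.y, -s * v.x + c * v.y };
}

// sepA holds the result for A's x and y axes, sepB for B's axes.
bool OBBsOverlap( const OBB& a, const OBB& b, Vec2& sepA, Vec2& sepB )
{
    // T: translation vector from A to B, viewed from each box's frame.
    const Vec2 t  = { b.position.x - a.position.x, b.position.y - a.position.y };
    const Vec2 tA = RotateInverse( t, a.radians );
    const Vec2 tB = RotateInverse( t, b.radians );

    // Absolute rotation between the two frames, used to project one box's
    // half-extents onto the other box's axes.
    const float ac = std::fabs( std::cos( b.radians - a.radians ) );
    const float as = std::fabs( std::sin( b.radians - a.radians ) );
    const Vec2 hA = a.halfExtents;
    const Vec2 hB = b.halfExtents;

    sepA = { std::fabs( tA.x ) - hA.x - ( ac * hB.x + as * hB.y ),
             std::fabs( tA.y ) - hA.y - ( as * hB.x + ac * hB.y ) };
    sepB = { std::fabs( tB.x ) - hB.x - ( ac * hA.x + as * hA.y ),
             std::fabs( tB.y ) - hB.y - ( as * hA.x + ac * hA.y ) };

    // Any positive value is a separating axis.
    return sepA.x <= 0.0f && sepA.y <= 0.0f && sepB.x <= 0.0f && sepB.y <= 0.0f;
}
```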

This is where things start to get difficult. Since we just performed an early out test, now we need to find the axis of least separation. However you cannot just blindly perform floating point comparisons due to floating point error during rotation of the translation vector from A to B. You must bias your comparisons to favor one axis over another in a consistent manner. This is important for stability!



In the above picture, which set of contact points/normal is the “correct” one? Each axis of separation is very close to the same distance, so floating point error could account for which axis is chosen. If your simulation flops between the two you’ll end up with strange jitter and your simulation will be less believable. The solution is to just favor one axis over another using an error EPSILON value.

Here’s a function (you’re lucky I just gave it to you!) that will check which value is greater than the other. Each value is modified slightly, and a small bias is fed into the comparison based off of how large each value passed in is. This can be used to favor one axis of separation over another, until a threshold larger than floating point error is breached.
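The function itself is missing from this copy. A sketch of a biased comparison in the spirit described follows; the name `BiasGreaterThan` and both constants are assumptions, so tune them for your own simulation.

```cpp
// Returns true when a is greater than b, or close enough to b that the
// difference could be floating point noise. Both constants are assumed
// values, not measured ones.
inline bool BiasGreaterThan( float a, float b )
{
    const float k_biasRelative = 0.95f;
    const float k_biasAbsolute = 0.01f;
    return a >= b * k_biasRelative + a * k_biasAbsolute;
}
```

With these constants and positive inputs, values within a few percent of each other compare as "greater" in both directions, so the axis you test first keeps winning until the other is clearly better.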

Carefully record which direction your normal goes (from A to B) depending on what axis is separating. This is a similar operation to the one found in AABB to AABB as seen above.

Once an axis is found two line segments must be identified: the reference face and incident face. The reference face corresponds to your normal you recorded. So the reference face is the face that corresponds to your axis of least penetration. If your axis of least penetration is on A, then your reference face is on A. The incident face is the one with the information we need to generate our manifold.

The incident face is the face on the other object that the reference face has hit. We must compute it. All that needs to be done is find out which face is most facing the normal (has the most negative dot product). Looping through all faces performing dot products is the simplest way to achieve this. A more optimized algorithm is to follow the sign of the flipped normal. Your normal will have an x and y component (and z in 3D), and each component will be positive or negative. This gives you a total of four combinations of positive or negative.

First check to see if the normal is pointing more towards the x axis or y axis (after you transform it into the incident face’s frame of reference). Then check the sign of the component along that axis. You now know to which face your normal is most pointing.

Think of it this way: if the normal is pointing more towards the x axis (absolute value of n.x is greater than absolute value of n.y), then you are going to be pointing more towards either the left or right face on your OBB (in the OBB’s frame of reference). All that you need to know from there, is if you’re pointing left or right on the x axis, which is denoted by the sign of the normal.

Your incident face segment endpoints come from the half-extents of the OBB, which are aligned with the axes in the OBB’s frame of reference. If the flipped normal points most towards the positive x axis, the incident face is the right face and its two endpoints are (x, -y) and (x, y), where x and y are the half-extents of the OBB on its x and y axes; for the left face they are (-x, -y) and (-x, y). Rotate these points with the OBB’s rotation matrix, and then translate them into world space with the OBB’s position vector, and you now have your endpoints for your incident face in world space.
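A sketch of the sign-following approach is below. The `OBB` layout and the name `ComputeIncidentFace` are assumptions, and the structure is similar in spirit to what Box2D Lite does rather than a copy of anyone's code. The reference normal is passed in world space, pointing from the reference box towards this box.

```cpp
#include <cmath>

// Hypothetical types, for illustration only.
struct Vec2 { float x, y; };
struct OBB { Vec2 position; Vec2 halfExtents; float radians; };

// Writes the world-space endpoints of the incident face into out[0], out[1].
void ComputeIncidentFace( const OBB& obb, const Vec2& referenceNormal, Vec2 out[2] )
{
    const float c = std::cos( obb.radians );
    const float s = std::sin( obb.radians );

    // Flip the normal, then rotate it into the OBB's frame (inverse rotation).
    const Vec2 f = { -referenceNormal.x, -referenceNormal.y };
    const Vec2 n = { c * f.x + s * f.y, -s * f.x + c * f.y };
    const Vec2 h = obb.halfExtents;

    if ( std::fabs( n.x ) > std::fabs( n.y ) )
    {
        // Pointing mostly along x: right or left face, decided by the sign.
        if ( n.x > 0.0f ) { out[0] = {  h.x, -h.y }; out[1] = {  h.x,  h.y }; }
        else              { out[0] = { -h.x,  h.y }; out[1] = { -h.x, -h.y }; }
    }
    else
    {
        // Pointing mostly along y: top or bottom face.
        if ( n.y > 0.0f ) { out[0] = {  h.x,  h.y }; out[1] = { -h.x,  h.y }; }
        else              { out[0] = { -h.x, -h.y }; out[1] = {  h.x, -h.y }; }
    }

    // Rotate into world orientation, then translate by the OBB's position.
    for ( int i = 0; i < 2; ++i )
    {
        const Vec2 p = out[i];
        out[i] = { obb.position.x + c * p.x - s * p.y,
                   obb.position.y + s * p.x + c * p.y };
    }
}
```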

All that is left is to clip the incident face to the reference face side planes using Sutherland-Hodgman clipping. Here’s a diagram showing this:


This is a fairly difficult thing to do unless you know your math fairly well. Each side plane can be simply computed once you know your reference normal. Here’s the process for getting two side planes:
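The author's derivation is missing from this copy, so the following is only one way to get the two side planes, starting from the two world-space endpoints of the reference face. The `Line` format (a unit normal `n` and an offset `c`, with points on the line satisfying n · p = c) follows the description in this article, but the field names and the function `ComputeSidePlanes` are assumptions.

```cpp
#include <cmath>

// Hypothetical types, for illustration only.
struct Vec2 { float x, y; };

// A 2D line in normal/offset form: points p on the line satisfy Dot(n, p) == c.
struct Line
{
    Vec2  n;  // unit normal, pointing towards the side that gets clipped away
    float c;  // offset of the line along n
};

static float Dot( const Vec2& a, const Vec2& b ) { return a.x * b.x + a.y * b.y; }

// v1 and v2 are the endpoints of the reference face (assumed distinct).
// Each side plane passes through one endpoint and faces away from the other.
void ComputeSidePlanes( const Vec2& v1, const Vec2& v2, Line& side1, Line& side2 )
{
    // Unit tangent along the reference face, from v1 to v2.
    Vec2 t = { v2.x - v1.x, v2.y - v1.y };
    const float len = std::sqrt( Dot( t, t ) );
    t = { t.x / len, t.y / len };

    side1.n = { -t.x, -t.y };
    side1.c = Dot( side1.n, v1 );

    side2.n = t;
    side2.c = Dot( side2.n, v2 );
}
```

Any point between the two endpoints then has Dot(n, p) - c less than or equal to zero for both side planes.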

The above code was hand-derived by myself, but you’ll find something very similar within Box2D Lite (where I originally learned the math from). If this is confusing to you I suggest reading up on the various types of representations of lines in 2D.

Here’s another diagram I just found in my own source files:

You might have noticed I’m storing the c value. This is important as the c value stored within the Line structure can be used to find the distance of a point to the line like so:
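The snippet is missing from this copy. With a line stored as a unit normal `n` and offset `c`, the signed distance is a single dot product and a subtraction. The type and field names here are assumptions.

```cpp
// Hypothetical types, for illustration only.
struct Vec2 { float x, y; };
struct Line { Vec2 n; float c; };  // points p on the line satisfy Dot(n, p) == c

// Signed distance from p to the line: positive on the side the normal
// points to, negative behind it. Assumes line.n is unit length.
float DistanceToLine( const Line& line, const Vec2& p )
{
    return line.n.x * p.x + line.n.y * p.y - line.c;
}
```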

I’ll be nice and provide you my clipping notes I created for Sutherland-Hodgman clipping :)

However since we are clipping a line to a single plane the algorithm will need to be slightly modified. In some cases you need to push extra points, since Sutherland-Hodgman assumes to be clipping two polygons in a loop. See Box2D Lite for a good implementation of the incident to line clipping. I however use my own hand-derived algorithm that works in a very similar way. I’ll share some pseudo code for clipping a segment to a Line, assuming the Line is in the format of offset c and a normal n:
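The pseudo code did not survive in this copy. Below is a C++ sketch of clipping a two-point segment against one line; it follows the same general shape as the clipping routine in Box2D Lite but is written from scratch, and the names are assumptions. Points with a distance of zero or less are kept.

```cpp
// Hypothetical types, for illustration only.
struct Vec2 { float x, y; };
struct Line { Vec2 n; float c; };  // points p on the line satisfy Dot(n, p) == c

static float DistanceToLine( const Line& line, const Vec2& p )
{
    return line.n.x * p.x + line.n.y * p.y - line.c;
}

// Clips the segment in[0]..in[1] against the line, keeping the part behind it.
// Returns the number of points written to out (0, 1 or 2).
int ClipSegmentToLine( const Line& line, const Vec2 in[2], Vec2 out[2] )
{
    int count = 0;
    const float d0 = DistanceToLine( line, in[0] );
    const float d1 = DistanceToLine( line, in[1] );

    // Keep endpoints that are behind (or on) the line.
    if ( d0 <= 0.0f ) out[count++] = in[0];
    if ( d1 <= 0.0f ) out[count++] = in[1];

    // Endpoints on opposite sides: push the intersection point as well.
    if ( d0 * d1 < 0.0f )
    {
        const float alpha = d0 / ( d0 - d1 );
        out[count++] = { in[0].x + alpha * ( in[1].x - in[0].x ),
                         in[0].y + alpha * ( in[1].y - in[0].y ) };
    }
    return count;
}
```

Calling this once per side plane, feeding the output of the first call into the second, clips the incident face to the reference face's extent.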

After clipping to the side planes floating point error must be accounted for. If our clipping process went as expected we must have two resulting points. If we end up with fewer than two output points that means floating point error has screwed us over, and we must treat the entire process as if the two OBBs are non-intersecting.

Assuming we have two points output from our clipping we then need to only consider points that are behind the reference face. Use the DistanceToLine function I provided above to check this. Record each point behind the reference face as a contact point!


AABB to OBB

If you’ve been reading along diligently and deriving your own understanding, you should be able to figure out how to perform an AABB to OBB check. This test is the exact same as OBB to OBB, except with less rotating from one basis to another. You can recall that the OBB to OBB test rotated one OBB into the frame of the other completely, turning the test into an AABB to OBB test. The same thing can be done here without the preliminary change of basis, and perhaps some other small optimizations. I will leave the section out as a challenge for the reader.


I hope you have a solid understanding of various types of hand-crafted intersection tests! Feel free to email me or comment here with questions or comments. I refer you to Box2D Lite’s source code as a reference to the material in this article, along with all of Erin Catto’s GDC slides, especially the one from 2007. The next article in this series is likely to talk about creating a simple O(N^2) broadphase, and how to cull out duplicate contact pairs. Until then this article along with the previous one is more than enough to create a simple physics engine. Be aware that with impulse resolution only one point of contact should be necessary. This article goes over grabbing multiple contact points, and this is because more advanced collision resolution techniques can make use of more contact points.

Batch Files: Tips and Tricks

I spent a lot of time one weekend creating a utility I call AutoGCC. GCC is a compiler that comes with Cygwin for Windows. When running this compiler and finishing assignments there are a whole lot of very annoying commands and actions to complete, that are highly repetitive. Perfect situation for constructing a utility! Here is a link to the program, which I update as I find bugs or think of additional features:

There are a few tips and tricks I learned about batch programming that I’d like to share, and to document for my own future reference.

You can use parentheses to link multiple lines so they are treated as a single line. This allows you to make nicely formatted if statements, like so: 

if not exist %CD%\error_files\ (
  mkdir %CD%\error_files\
)

The next trick, whose syntax took me forever to understand, is string manipulation. Take the variable %PATH% for example. It is a string. However, what if I want to modify a portion of this string to my own liking? Look at the following:

%PATH:;c:\AutoGCC=%

That code searches %PATH% for the character sequence “;c:\AutoGCC” and replaces that portion of the string with nothing. This is basically deleting an element from the %PATH% variable. You could also have done:

%PATH:;c:\AutoGCC=;c:\LOL%

Which would change the element ;c:\AutoGCC to ;c:\LOL.

There is also a nice way to permanently edit the path of a user’s machine: edit the registry value that stores the system path! The same command can set any other registry value. This will however require a restart to take effect. Here is the syntax I used in AutoGCC to modify my system’s path (type reg add /? for information on the reg add command):

reg add "HKLM\SYSTEM\CurrentControlSet\Control\Session Manager\Environment" /v Path /t REG_EXPAND_SZ /d "%PATH:;c:\AutoGCC=%" /f

You can also use the CALL command to launch a specific portion of your code as if it were its own batch file. This is really useful for making functions within your batch file. Take a look at the following code:

if exist %file% (
  if %var% NEQ 2 (
    call :FUNCTION_1
  )
)
goto :EOF

:FUNCTION_1
  rem do stuff...
  goto :EOF

The above code will jump to the line labeled :FUNCTION_1 if %file% exists and %var% is not 2, and then return to just below the call :FUNCTION_1 statement once the line goto :EOF is reached. goto :EOF sends control to the end of the file, though since FUNCTION_1 was invoked by a call command, it is as if it were an emulated batch script of its own, and the emulation will close, not the original environment that called it. The goto :EOF just before the label keeps the main script from falling through into the function body.

These few things are extremely useful for me when working on my batch scripts, and hopefully will be for you as well!

TL Secret

First person to quote this sentence (back on TL at the Lessons thread), and the one directly after it, will win a free one hour lesson! Additionally whoever is the 100th poster in this thread (cmon, we can get that many posts!) will win a free one hour lesson as well!

Edit: This contest is long over :P

Linear Algebra: Rotation Matrices

A day or so ago I looked up the topic of using rotation matrices to modify the angle of a vector by an arbitrary amount. I visited none other than the Wikipedia page for rotation matrices.

The idea behind rotation matrices is to solve for the x and y coordinates after the rotation by using two separate equations. Here is a rotation matrix for 2 dimensions:
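The matrix image is missing from this copy. Assuming the standard convention of column vectors multiplied on the right, the usual 2D counterclockwise rotation matrix is:

```latex
R(\theta) =
\begin{bmatrix}
\cos\theta & -\sin\theta \\
\sin\theta & \cos\theta
\end{bmatrix}
```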

That matrix will rotate the 2D plane counterclockwise by angle θ. Using this knowledge, the new x and y coordinates for a point on a rotated vector would be solved for using these two equations:
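The two equations are missing from this copy. Multiplying the standard counterclockwise rotation matrix by a column vector (x, y) gives the following, where x' and y' are my notation for the rotated coordinates:

```latex
\begin{aligned}
x' &= x\cos\theta - y\sin\theta \\
y' &= x\sin\theta + y\cos\theta
\end{aligned}
```

As a quick check, rotating the point (2, 3) by θ = 90° gives x' = 2·0 − 3·1 = −3 and y' = 2·1 + 3·0 = 2.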

Very simple! So here is an example problem: Say you have vector AB = (2, 3).

To solve for a rotated counterclockwise version of this vector you input your desired angle of rotation in for θ, 2 for x and 3 for y. This rotation matrix rotates a 2D plane clockwise:
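The clockwise matrix is also missing from this copy. Under the same column-vector convention, it is the counterclockwise matrix with the signs of the sine terms swapped (equivalently, the rotation by −θ):

```latex
R(-\theta) =
\begin{bmatrix}
\cos\theta & \sin\theta \\
-\sin\theta & \cos\theta
\end{bmatrix}
```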

Rotation in 3 dimensions is basically the same process. All that is different is the z coordinate and modified rotation matrices. You simply solve for each individual coordinate with the rotation matrices for the z, x, and y axis. These three rotation matrices and more are found at the Wikipedia page for rotation matrices.

Linear Algebra: Basics

Today with my math teacher I went over some of the basics of linear algebra. The two topics I went over included solving for the intersection of two lines in 3D space, and finding the angle between two vectors. The rest of the mathematical topics I’ll try to cover in the near future include: finding a vector orthogonal to two other vectors in 3D space; finding the time T at which a ray intersects a circle or sphere; using a matrix to rotate a vector by an arbitrary angle. I found these topics, as well as an entire survival guide for DigiPen students, at this site here. Hopefully by explaining the math I went over today I can solidify my own understanding.

First off: solving for the intersection of two lines in 3D space. I know how to do this in matrix form, as it is easiest and simplest in matrix form. Here is a matrix of the x, y and z coordinates for a point in 3D space: [x, y, z]. An equation for a line in 3D space consists of a point on the line, plus the direction vector of the line multiplied by a scalar. This scalar, the timestep or t, scales the direction vector, so each value of t picks out one point on the line.

Here is an equation for a line in 3D space: L1 = [2, 4, -1] + t[-1, 2, 3]. I just chose these numbers at random because they really don’t matter. The first matrix is a point on the line. The second matrix is the rise, run, and the z equivalent to rise or run (just like two dimensional lines). You need a point on the line, otherwise the direction vector (rise/run/zrun) could sit anywhere in space. Without the direction vector for your line, your line could be facing in any direction as long as it sits on your point. The t scales the direction vector, and it could be used to define the distance something traveled over a time of t. For any one point on the line, t is just a number.

In order to solve for the intersection of two lines, I’ll quickly show the process with some variables. Here are the two lines: L1 = [a, b, c] + t[d, e, f]; L2 = [g, h, i] + r[j, k, l]. Now since these two equations each represent a line, the point of intersection is going to be a point that can be used to satisfy either of the equations. You just set both of the equations equal to each other, one coordinate at a time (x, y and z). For the x coordinate of the intersection you would use a + td = g + rj, and you do the same for y (b + te = h + rk) and z (c + tf = i + rl). Then using substitution, if need be, you can solve for t and r from two of these equations, check that the third one also holds, and plug t back into L1, thus getting a final matrix of [x, y, z].
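As a sketch of that process in code: the function below solves two of the three coordinate equations for t and r with Cramer's rule, then checks the remaining equation. The `Vec3` type, the name `IntersectLines` and the tolerance are assumptions of mine; lines that are parallel or skew (they pass each other without touching) report no intersection.

```cpp
#include <cmath>

// Hypothetical 3D vector type, for illustration only.
struct Vec3 { double x, y, z; };

// Solves p + t*d = q + r*e. On success writes the intersection point to out.
bool IntersectLines( const Vec3& p, const Vec3& d,
                     const Vec3& q, const Vec3& e, Vec3& out )
{
    const double P[3] = { p.x, p.y, p.z }, D[3] = { d.x, d.y, d.z };
    const double Q[3] = { q.x, q.y, q.z }, E[3] = { e.x, e.y, e.z };
    const double eps = 1e-9;  // assumed tolerance

    // Try each pair of coordinate equations until one pair is solvable.
    for ( int i = 0; i < 3; ++i )
    {
        const int j = ( i + 1 ) % 3;
        const int k = ( i + 2 ) % 3;

        // t*D[i] - r*E[i] = Q[i] - P[i]
        // t*D[j] - r*E[j] = Q[j] - P[j]
        const double det = D[j] * E[i] - D[i] * E[j];
        if ( std::fabs( det ) < eps )
            continue;

        const double wi = Q[i] - P[i];
        const double wj = Q[j] - P[j];
        const double t = ( wj * E[i] - wi * E[j] ) / det;
        const double r = ( D[i] * wj - D[j] * wi ) / det;

        // The third equation must agree, otherwise the lines are skew.
        if ( std::fabs( ( P[k] + t * D[k] ) - ( Q[k] + r * E[k] ) ) > 1e-6 )
            return false;

        out = { P[0] + t * D[0], P[1] + t * D[1], P[2] + t * D[2] };
        return true;
    }
    return false;  // direction vectors are parallel
}
```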

The second thing I learned today was finding the angle between two vectors. Luckily, you only need the direction vectors of the line equations. This makes sense because no matter where the lines are in space, they will have the same angle of intersection as long as the directions the lines face in stay constant. To do this, you use the equation of:
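The equation image is missing from this copy. From the description that follows (theta on the left, arccos, a dot b on top, the magnitudes underneath), it is the standard angle formula:

```latex
\theta = \arccos\left( \frac{a \cdot b}{|a|\,|b|} \right)
```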

Theta, the zero thingy on the left, is the angle you are solving for. a and b both represent matrices that represent direction vectors, like the direction vectors in the line equations earlier in this post. Arccos is cos^-1. The a and b on the top half of the right side of the equation is pronounced as a dot b. The dot is the operator for the dot product. The dot product is used to find the scalar projection of a onto b, and vice versa, once you divide by the magnitude of the vector being projected onto. I honestly don’t fully understand what exactly the dot product does yet (read last paragraph, I understand it now), but for now I just need it for equations like this one, and it returns a scalar value. To use the dot product on two 1×3 matrices, you would do this: [a, b, c] dot [d, e, f] = (a × d) + (b × e) + (c × f). The |a| represents the magnitude of a, which is the length of a. If the direction vector of a line represents a velocity, then the magnitude of a would be the speed. To find the magnitude of a 1×3 matrix you do this: |M| = |[a, b, c]| = sqrt(a^2 + b^2 + c^2). Does that look familiar? It should; it’s basically the Pythagorean Theorem in 3D. It takes the rise, run, and zrun and converts the three into a length value, just like the Pythagorean Theorem does with two lengths, except this is with three.

Now once you find a dot b, and magnitude of a times magnitude of b, you then divide a dot b by magnitude of a times magnitude of b, then arccos that value which results in your angle!
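The same steps as a small sketch in code. The `Vec3` type and function names are assumptions; the clamp is there only because rounding can push the ratio a hair outside the range arccos accepts. Both vectors are assumed to be non-zero.

```cpp
#include <algorithm>
#include <cmath>

// Hypothetical 3D vector type, for illustration only.
struct Vec3 { double x, y, z; };

double Dot( const Vec3& a, const Vec3& b )
{
    return a.x * b.x + a.y * b.y + a.z * b.z;
}

// Pythagorean Theorem in 3D.
double Magnitude( const Vec3& a )
{
    return std::sqrt( a.x * a.x + a.y * a.y + a.z * a.z );
}

// Angle between a and b in radians. Both must have non-zero length.
double AngleBetween( const Vec3& a, const Vec3& b )
{
    const double ratio = Dot( a, b ) / ( Magnitude( a ) * Magnitude( b ) );
    // Guard against rounding nudging the ratio just outside [-1, 1].
    return std::acos( std::max( -1.0, std::min( 1.0, ratio ) ) );
}
```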

Dot product explained: Okay! So I did a bit of research and asked a couple people some questions and now I understand what the value returned by the dot product does. When B is a unit vector, A dot B projects A onto B and returns the length of that projection; in general the result is that length multiplied by the magnitude of B. A dot B also equals B dot A, which makes sense because multiplication itself is commutative, and the formula for the dot product is just a sum of three products of matching components. Here is a picture to help visualize this:

The blue line would be the dot product of A and B. This is very useful for collision in programming, and transforming vectors from one grid space to another. Here is a good example of using the dot product for 2D collision detection of convex polygons:

The red and blue are both one dimensional projections of the objects onto a line. The dotted line is perpendicular to one side of one of the objects. In the diagram it looks like it is perpendicular to both, which is fine, but it is important to understand that the dotted line is normal to one of the sides of one of the polygons. Once you find a dotted line that is perpendicular to a side of one of the shapes, you use the dot product on a two dimensional matrix and project both of the shapes onto the normal to the dotted line. You can then compare the two projections to see if they overlap. If the two projections overlap, you then try the entire process over for a different side of one of the polygons. If, while trying this algorithm over each of the sides of each object, no collision vector was detected in at least one of the iterations (a collision vector would be the length of overlap formed by overlapping projections), then the two objects are not colliding with one another.