# Inertia Tensor – Capsule

A capsule is defined by two hemispheres and a cylinder. Capsules are pretty useful geometry for collision detection since they are easily defined mathematically (implicitly defined). This means that minimal information can be stored in memory to represent a capsule in 3D space. Given a radius value and position information only a few floating point values are necessary. Algorithms like GJK also work well with implicitly defined shapes since support mappings can be computed in constant time.

Given the inertia tensor of a cylinder and sphere the inertia tensor of a capsule can be calculated with the help of the parallel axis theorem. The parallel axis theorem can shift the origin that an inertia tensor is defined relative to, given just a translation vector. We care about the tensor form of the parallel axis theorem since it is probably easiest to understand from a computer science perspective:

J is the final transformed inertia. I is the initial inertia tensor. m is the mass of the object in question. R is the translation vector to shift with. E3 is the standard basis, or the identity matrix. The cross symbol is the outer product (see next paragraph).

Assuming readers are familiar with the dot product and outer product, computing the change in an inertia tensor isn’t too difficult.

The center of mass of a hemisphere is 3/8 * radius above the center of the spherical base. Knowing this and understanding the parallel axis theorem, the inertia tensor of one of the hemispheres can be easily calculated. My own derivation brought me the conclusion that the inertia tensor of capsule is:

Please note the final inertia tensor assumes the capsule is oriented such that the cylinder is aligned along the y axis. A rotation matrix R can be used to transform the final inertia tensor I into world space like so: R * I * R^T. To learn why this form is used read this post.

Special thanks to Dirk Gregorius for emailing me with an error in the original draft of this post! He kindly provided the public with a nice document showing his derivation of the inertia tensor of a capsule.

Following suit to identify the source of my error, I ended up writing my own derivation down in PDF format.

# Deriving OBB to OBB Intersection and Manifold Generation

The Oriented Bounding Box (OBB) is commonly used all over the place in terms of simulation and game logic. Although often times OBBs aren’t actually bounding anything, and really just represent an implicitly defined box.

OBBs commonly come in 2D and 3D form, though there aren’t all that many resources on the internet describing how to derive the 3D case. The ability to write robust collision detection, and more importantly manifold generation (contact data needed to resolve the collision), relies on a good understanding of linear algebra and geometry.

Readers will need to know about affine and linear transformations, and have intuition about the dot and cross products. To learn these preliminaries I highly recommend the book “Essential Mathematics” by Jim Van Verth. I will also be assuming readers are familiar with the Separating Axis Theorem (SAT). If you aren’t familiar with SAT please do a short preliminary research on this topic.

The fastest test I’m aware of for detection collision between two OBBs, in either 2 or 3 dimensions, makes use of the Separating Axis Theorem. In both cases it is beneficial to transform one OBB into the space of the other in order to simplify particular calculations.

Usually an OBB can be defined something like this in C++:

## 6 Face Axes

As with collision detection against two AABBs, OBBs have x y and z axes that correspond to three different potential axes of separation. Since an OBB has local oriented axes they will most of the time not line up with the standard Euclidean basis. To simplify the problem of projecting OBB $$A$$ onto the the axes of OBB $$B$$, all computations ought to be done within the space of $$A$$. This is just a convention, and the entire derivation could have been done in the space of $$B$$, or even in world space (world space would have more complex, and slow, computations).

The translation vector $$t$$ is defined as $$t = R_{a}^{T} * (C_b – C_a)$$, where $$C_i$$ denotes the center of an OBB, and $$R$$ denotes the rotation of an OBB. $$t$$ points from $$A$$’s center to $$B$$’s center, within the space of $$A$$. The relative rotation matrix to transform from B’s frame to A’s frame is defined as $$C = R_{a}^{T} * R_b$$. $$C$$ can be used to transform $$B$$’s local axes into the frame of $$A$$. In $$A$$’s own frame it’s local axes line up with the standard Euclidean basis, and can be directly computed with the half width extents $$e_a$$.

To calculate the separation $$s$$ along the local axis $$l$$ from an OBB examine the following diagram and matching equation:

s = |t \cdot l| – (|a \cdot l| + |(C * b) \cdot l|)
\label{eq1}

Due to symmetry the absolute value of a projection can be used in order to gather a value that can be used to represent a projection along a given direction. In our case we aren’t really interested in the sign of the projection and always want the projection towards the other OBB. I understand that newer programmers may have some trouble translating math into code, so here’s an example of eq.\eqref{eq1} as C++:

## 9 Edge Axes

In 2D eq.\eqref{eq1} is all that is needed to detect overlap upon any given possible axis of separation. In 3D a new topological feature arises on the surface of geometry: edges. Because edges form entirely separate Voronoi regions from face surfaces they can also provide possible axes of separation. With OBBs it is fast to test all possible unique edge axes of separation.

Given two edges upon two OBBs in 3D space the vector perpendicular to both would be the possible axis of separation, and can be computed with a cross product. A possible 9 axes of separation arise from crossing each local axis of $$A$$ with the each local axis of $$B$$.

Since $$C$$ is composed of a concatenation between $$R_{a}^{T}$$ and $$R_b$$, the rows of this matrix can be thought of as the local axes of $$B$$ in the space of $$A$$. When a cross product between two axes is required to test separation, the axis can be pulled directly out of $$C$$. This is much faster than computing a cross product in world space between two arbitrary axis vectors.

Lets derive a cross product between $$A$$’s x axis and $$B$$’s z axis, all within the space of $$A$$. In $$A$$’s space $$A$$’s x axis is $$\begin{Bmatrix}1 & 0 & 0\end{Bmatrix}$$. In $$A$$’s space $$B$$’s z axis is $$\begin{Bmatrix}C_{20} & C_{21} & C_{22}\end{Bmatrix}$$. The axis $$l$$ is just the cross product of these two axes. Since $$A$$’s x axis is composed largely of the number $$0$$ elements of $$C$$ can be used to directly create the result of the cross product, without actually running a cross product function. We arrive to the conclusion that for our current case $$l = \begin{Bmatrix}0 & -C_{22} & C_{21}\end{Bmatrix}$$. This particular $$l$$ axis can be used in eq.\eqref{eq1} to solve for overlap.

Since the matrix $$C$$ is used to derive cross product results the sign of the elements will be dependent on the orientation of the two OBBs. Additionally, the cross product will negate on of the terms in the final axis. Since eq.\eqref{eq1} requires only absolute values it may be a good idea to precompute abs($$C$$), which takes the absolute value element by element. This computation can be interleaved between the different axis checks for the first 6 checks, only computed as needed. This may let an early out prevent needless computation.

The same idea can be used to derive this axis in the space of $$B$$ by crossing $$B$$’s local z axis with $$A$$’s local x axis transformed to the space of $$B$$. To do this it may be easy to use $$C^T$$, which represents $$R_{b}^{T} * A$$. More information about this can be found in my previous article about the properties of transposing a matrix concatenation.

Eventually all axes can be derived on paper and translated into code.

## Manifold Generation

A collision manifold would be a small bit of information about the nature of a collision. This information is usually used to resolve the collision during physical simulation. Once an axis of least penetration is found the manifold must be generated.

For SAT manifold generation is quite straight forward. The entirety of the details are expressed very well by Dirk Gregorius’ great online tutorial from GDC 2013. The slides are openly available for free, and currently hosted here generously by Erin Catto. Readers are encouraged to familiarize themselves with the idea of reference and incident faces, as well as the idea of reference face side planes. For a 2D analogue I have provided my own slides on this topic.

There are only two different cases of collision that should be handled: edge to edge and something to face. Vertex to edge, vertex to face, and vertex to vertex resolutions are ill-defined. On top of this numerical imprecision makes detecting such cases difficult. It makes more sense in every way to skip these three cases and stick to the original two.

## Something to Face

Something has hit the face of one of the OBBs. It can be anywhere from 1 to 4 vertices from the face of the opposing OBB. The entire intersecting volume between two OBBs does not need to be considered due to the nature of SAT and the axis of minimum penetration. Only two faces from each OBB need be considered for this case, and are by convention named the reference and incident faces, one from each OBB.

Finding the incident face given an axis of separation can be found in constant time with implicit geometry like an OBB; the problem reduces to that of finding a vector that most faces a given vector, or in other words the most negative dot product.

Once the incident face is identified it is clipped to the side planes of the reference face. There may be some tricks involving basis transformations here to simplify the clipping routine. In order to perform clipping the incident face vertices must be calculated. Given a face normal from an OBB all four face vertices can be computed with the normal and the half width extents of the OBB. In similar fashion the side planes of the reference face can also be directly computed.

## Edge to Edge

The edge to edge case requires a clever constant-time computation of a supporting edge. A supporting edge is the edge most pointed to by a given axis of separation. The computation of this edge may be tricky to derive yourself, though there do exist online references. I suggest checking out Open Dynamics Engine (ODE) by Russell Smith, or the Cyclone Engine by Ian Millington for information on how to compute a supporting edge.

Once both supporting edges are calculated the closest two points between each edge can be calculated and used as a single contact point of which to resolve upon.

## Numeric Stability

In all uses of OBB to OBB intersection cross products with nearly parallel vectors pose a problem. The best solution is to detect when any two axes from each OBB are parallel and skip all nine cross product axes. It can be visually shown that no false positives will come from skipping the cross product axes in this case.

A simpler solution, as proposed by Gottschalk ’00 in his paper “Collision Queries using Oriented Bounding Boxes”, which made its way into Ericson’s book “Real-Time Collision Detection”, is to bias all elements of $$C$$ with a small epsilon to drive near-zero axes away from a degenerate case. For other axes the epsilon value should be tuned such that it doesn’t have much an impact upon results.

If using this collision routine to resolve collision during physical simulation certain tuning my be appropriate. If an iterative resolver is used it may be preferred to have slightly more consistent manifold generation, as opposed to exact manifold generation. For details I suggest researching works by Erin Catto and comments made by others around the Bullet Forums.

## AABB to OBB

Hopefully by now readers realize that this whole time we’ve been simplifying the problem of OBB to OBB to the problem of AABB to OBB. When an actual AABB needs to be collided against an OBB simplifications can occur. Many simplifications revolve around the non-necessity to transform things into the basis of the AABB, since world coordinates should suffice.

## One Axis Aligned (or Two?)

Some games have OBBs that only orient on some of their axes instead of all of them. One can immediately realize that no cross product checks need to be performed in any case, if two OBBs have orientation locked on the same axis. Other optimizations will occur as well, making some axis tests reduced to that of AABB overlap testing.

## Conclusion

Collision detection is difficult. Writing robust collision detection code requires a good mathematical foundation as well as the ability to write efficient code. This slight mixture of fields of study may be difficult to learn all at once, but it all gets easier with time and practice and the rewards can be very fulfilling.

I’d like to thank my friend Nathan Carlson for teaching much of this information to me, and for his insightful discussions.

# Buoyancy, Rigid Bodies and Water Surface

I’ve spent the last couple weeks researching eigenvalue decomposition and solving cubic polynomials in order to simulate a liquid surface and polyhedral buoyancy! I gave a lecture about this topic at my university and hope the slides can be of interest or help to someone. I’ve attached a couple demo GIFs to the bottom of this post.

# Dynamically Slicing Shapes

Just this morning I finished up a small open source project, along with a huge article at gamedev.tutsplus. The open source project is an implementation of Sutherland-Hodgman clipping which can be used in many advanced shape and mesh manipulation techniques.

Gif of the open source project in action. Slicing a quad interactively.

# Orientation Constraint Derivation

I decided today that I wanted to create an orientation constraint. This will let me build more complex joints that make use of constraining relative angles between rigid bodies. I learned the details of implementing such a constraint from Bart Barenbrug’s thesis.

The position constraint is quite simple. The idea is to take a basis of vectors (u, v, w) into local space of two rigid bodies A and B. This yields xa, xb, ya, yb, za and zb. Each frame the local basis vectors are transformed into world space. These two world space sets should be aligned at all times in order to satisfy the constraint.

The dot product of xa and yb provides information about the error along u. The dot product of xa and zb provides information about the error along v. The dot product of ya and zb provides information about the error along w. Knowing this four vectors can be used instead of six, as the only ones needed are: xa, ya, yb, zb.

C = \theta_a – \theta_b = 0
\label{eq1}

Deriving the equation yields:

\dot{C} = \omega_a – \omega_b = 0
\label{eq2}

Resulting in the following Jacobian:

J = \begin{bmatrix} 0 & -t & 0 & t \end{bmatrix}
\label{eq3}

Where t is an axis of rotation within a basis matrix. Three of these constraints can be used to restrict each axis of rotation.

The interesting part of this constraint, for myself at least, was writing down on paper the mass matrix used to solve all three constraints simultaneously. All three constraints meaning restricting the relative rotation between two rigid bodies, aka the orientation joint (or angle joint). As a quick refresher the mass matrix is within the following equation:

\lambda = \frac{-(JV + B)}{J * M^{-1} * J^T}
\label{eq4}

The bottom portion J * M-1 * JT would be the constraint mass. This is actually a tensor and must be transformed from constraint space by using the tensor transformation formula. This is the same sort of transformation used on inertia tensors to transform them from model to world space.

Once this mass matrix is calculated it can be inverted and used to solve for lambda, where lambda will satisfy all constraint equations used in the mass matrix.

M^{-1} = \begin{bmatrix}
M^{-1}_a & 0 & 0 & 0 \\
0 & I^{-1}_a & 0 & 0 \\
0 & 0 & M^{-1}_b & 0 \\
0 & 0 & 0 & I^{-1}_b
\end{bmatrix}
\label{eq5}

Knowing this we can setup our constraint mass like so:

\begin{bmatrix}
0 & -t_1 & 0 & t_1 \\
0 & -t_2 & 0 & t_2 \\
0 & -t_3 & 0 & t_3 \\
\end{bmatrix}
\begin{bmatrix}
M^{-1}_a & 0 & 0 & 0 \\
0 & I^{-1}_a & 0 & 0 \\
0 & 0 & M^{-1}_b & 0 \\
0 & 0 & 0 & I^{-1}_b
\end{bmatrix}
\begin{bmatrix}
0 & 0 & 0 \\
-t_1 & -t_2 & -t_3 \\
0 & 0 & 0 \\
t_1 & t_2 & t_3 \\
\end{bmatrix}
\label{eq6}

Multiplying the two equations on the left gives:

\begin{bmatrix}
0 & I^{-1}_a * -t_1 & 0 & I^{-1}_b * t_1 \\
0 & I^{-1}_a * -t_2 & 0 & I^{-1}_b * t_2 \\
0 & I^{-1}_a * -t_3 & 0 & I^{-1}_b * t_3 \\
\end{bmatrix}
\begin{bmatrix}
0 & 0 & 0 \\
-t_1 & -t_2 & -t_3 \\
0 & 0 & 0 \\
t_1 & t_2 & t_3 \\
\end{bmatrix}
\label{eq7}

Multiplying the last two equations leaves the final constraint mass matrix:

\begin{bmatrix}
\scriptsize -t_1 \cdot I^{-1}_a -t_1 + t_1 \cdot I^{-1}_b t_1 &
\scriptsize -t_2 \cdot I^{-1}_a -t_1 + t_2 \cdot I^{-1}_b t_1 &
\scriptsize -t_3 \cdot I^{-1}_a -t_1 + t_3 \cdot I^{-1}_b t_1 \\
\scriptsize -t_1 \cdot I^{-1}_a -t_2 + t_1 \cdot I^{-1}_b t_2 &
\scriptsize -t_2 \cdot I^{-1}_a -t_2 + t_2 \cdot I^{-1}_b t_2 &
\scriptsize -t_3 \cdot I^{-1}_a -t_2 + t_3 \cdot I^{-1}_b t_2 \\
\scriptsize -t_1 \cdot I^{-1}_a -t_3 + t_1 \cdot I^{-1}_b t_3 &
\scriptsize -t_2 \cdot I^{-1}_a -t_3 + t_2 \cdot I^{-1}_b t_3 &
\scriptsize -t_3 \cdot I^{-1}_a -t_3 + t_3 \cdot I^{-1}_b t_3 \\
\end{bmatrix}
\label{eq8}

I apologize for the tiny text! Using this mass matrix like the following can be done to implement an orientation constraint to restrict the relative rotation of two rigid bodies:

# Protip: How to be a Physics Dev – Debug Rendering

Today I figured out the single most important tip I could ever give anyone in regards to being a competent physics engine developer. I will share this tip right here on my humble blog:

Your skills as a physics developer will grow at a rate proportional to how good your debug draw information is. I’d actually argue that your skills as a physics developer are proportional to how good your debug draw is.

I feel I’m at a decent place in my skills, as I feel my debug draw has become decent. Lets get some screenshots going!

Initial simplex during Quick Hull.

Stack of polyhedron (OBB meshes). The blue and red contacts are warm started (red is warm started consistently over many frames).

Spent a lot of time creating Maya-like camera controls with orbiting.

I digress. I didn’t figure out this simple fact on my own; I gleaned this wisdom off of my friend Nathan Carlson. He pointed out that the reason a lot of students at DigiPen have trouble with computational geometry and general physics engine development due to a lack of debug rendering.

Just take a look at Dirk Gregorius’s sample demo from his GDC 2013 lecture. He fully renders Gauss maps, a minkoswki difference, and has glorius camera controls. I actually copied his camera design and implemented it myself. You can see that he has Ant Tweakbar integrated as well, for various tweaking tasks. Erin Catto has lots and lots of different tests, asserts, and most important of all tons of debug drawing for his Box2D library. The Zero team on the third floor of DigiPen definitely has the craziest debug draw I’ve ever heard of!

Hopefully I can achieve something at least in the likeness of all these other developers, and I recommend readers do the same; intuition is often best built visually when it comes to geometry.

# Dynamic AABB Tree

After studying Box2D, Bullet and some slides an alumnus (thanks to Nathan Carlson!) from DigiPen created, I put together a dynamic AABB (axis aligned bounding box) tree broadphase. A Dynamic AABB Tree is a binary search tree for spatial partitioning. Special thanks to Nathanael Presson for the original authoring of the tree within the Bullet engine source code (see comments of this post).

A broadphase in a physics simulation has the job of reporting when bodies are potentially colliding. Usually this is done through cheap intersection tests between simple bounding volumes (AABBs in this case).

There are some interesting properties of a dynamic AABB tree that make the data structure rather simple in terms of implementation.

There are a couple main functions to implement outlined here:

• Insert
• Remove
• Update

The underlying data structure ought to be a huge array of nodes. This is much more optimal in terms of cache performance than many loose heap-allocated nodes. This is very important as the entire array of nodes is going to be fetched from memory every single time the broadphase is updated.

The node of the AABB tree can be carefully constructed to take up minimal space, as nodes are always in one of two states: branches and leaves. Since nodes are stored in an array this allows for the nodes to be referenced by integral index instead of pointer. This allows the internal array to grow or shrink as necessary without fear of leaving dangling pointers anywhere.

The idea of the tree is to allow user data to be stored only within leaf nodes. All branch nodes contain just a single AABB enveloping both of its children. This leads to a short description of invariants for the data structure:

• All branches must have two valid children
• Only leaves contain user data

The first rule allows for some simplification of operations like insert/remove. No additional code branching is required to check for NULL children, increases code performance.

## Insert

Insertion involves creating a fat AABB to bound the userdata associated with the new leaf node. A fat AABB is just an AABB that is slightly larger than a tight fitting AABB. Once a new leaf node is created with its fat AABB a tree traversal is required in order to find a sibling to pair up with, and a new parent branch is constructed.

Traversing the tree should be done by following some sort of cost heuristic. The best seems to be a cost involving the surface area of involved nodes. See Box2D for a resource in cost heuristics.

After a new parent is created and a sibling found, the bounding volume hierarchy of all parent nodes are invalid. A traversal up the tree, correcting all the bounding AABBs and node heights is required. This hierarchy correction can be abstracted into a simple function:

## Remove

Removing nodes from binary search trees can involve a stack to trace back a traversal path, however due to the few invariants of the dynamic AABB tree removal quite simple. Since all branches must contain two valid children, and only leaves contain userdata, the only nodes that require deletion are leaves.

Remove the leaf’s parent and replace it with the leaf’s sibling. After this operation the parent hierarchy needs to be recalculated (just as in insert).

## Update

Since this tree is dynamic it needs a way to handle moving objects, and not just static level geometry. Since fat AABBs are created upon insertion, objects can move around a little bit before the AABB within the tree is invalid. The fatness factor, or how much to fatten each AABB, can be fine-tuned for performance. I myself use an arbitrary distance (about 5% of the average scale of objects). It is also possible to fatten the AABB based upon the object’s previous frame’s displacement (see Box2D for details on displacement predictions).

The update function of the tree checks to make sure that the current tight-fitting AABB of a shape is still contained by the AABB within the tree. In order for this operation to be constant time the node index in the tree needs to be intrusively stored within the shape itself. This will allow a shape to provide its node index along with an up to date tight fitting AABB to the tree, in order to see if its AABB is still within the fat AABB’s bounds.

If the tight AABB is not contained by the fat AABB the shape needs to be removed and reinserted into the tree.

## Balancing the Tree

Since this sort of spatial partitioning involves a binary search tree, the tree will perform optimally (in terms of collision queries) so long as the tree is balanced.

However, how you balance the tree matters. I’ve heard (through rumors) that dynamic AABB trees ought to be balanced in terms of surface area and not tree height. Though this makes sense conceptually, I was not sure how to balance the tree based of surface area. Knowing this, I just went with something I’m more comfortable with, which is balancing based upon tree height.

Balancing a single node involves taking a look at its two children. The height of each child should be compared, and if one is higher than the other by a height of two or more a rotation should be performed. In other words, the child with a larger height ought to be promoted, wherein promotion is a way of rotating a node up the tree hierarchy.

This sort of balancing (AVL balancing) can be performed at the beginning of the hierarchy synchronization loop. This allows the tree to self-balance itself upon insertion and removal of leaves.

## Free List

The tree needs to maintain an internal free list of unused nodes. Constructing a free list involves looping over all the nodes of the tree, upon initial tree construction, linking each node to the next with indices. This process is really simple, though readers may be unfamiliar with lists of indices as opposed to lists of pointers.

Here is a helper function I’ve constructed for myself to setup free lists (useful upon construction of the tree, and growing of the internal array):

Taking nodes from the free list involves setting the current m_freeList index to the next node in the list. Inserting nodes is just as trivial. You can think of this sort of list as a singly linked list, except the links are through indices rather than pointers.

## Queries

Querying volumes against the tree is highly efficient, so long as collision checks against an AABB are fast. Sphere, AABB and raycasts are all very fast.

Querying the tree involves detecting collisions with each node in the tree, starting with the root, traversing to all children until the tree is exhausted. Traversal to a child should be performed if there is a collision with the parent.

This sort of query can easily be done with recursion (note: the callback returns bool, signifying continue or termination of the query):

However it may be more favorable to do so with an iterative algorithm. Iterative algorithms are generally a little harder to implement, but take much less memory (and thus are more efficient):

## Culling Queries Further

It might be a good idea to only query rigid bodies that moved last frame. This is much better than querying all dynamic bodies unconditionally. If you have a contact graph or constraint graph, or any sort of islanding implementation then you can easily cull objects from queries. The idea is that if a rigid body is static or not placed within an island, then it didn’t move at all the previous frame. This is very simple and elegant.

Usually only active (awake) dynamic bodies are placed into an island (and any objects they have contact constraints with). This is because objects that are sleeping don’t need to be considered at all when forming new islands, as they are asleep. Knowing this a small flag can be set within a shape when it is placed into an island. This flag can later be checked to know whether or not it needs to be queried at all within the broadphase.

Though this step is not specific to the dynamic AABB tree, it is a nice trick to know. One caveat is that when contact constraints are updated, they must first have each involved shape’s bounding volume checked against each other. This will allow old contact constraints to be removed once there is no potential for the narrow phase to return true.

## Conclusion

The dynamic AABB tree is an extremely fast spatial partitioning data structure, and is ideal for both large unbounded worlds, and lots of dynamic rigid bodies. Querying the tree has the time complexity of a binary search. I can’t imagine much better of a broadphase for rigid body simulation. There’s an old saying of “no broadphase is perfect, and neither is this one.” when speaking of broadphases. However the dynamic AABB tree has completely impressed me.

In terms of performance, my simulation would grind to a halt and bottleneck with an N^2 broadphase at just over 100 dynamic bodies. Now I can easily simulate around 5000 rigid bodies with the broadphase taking about 5% of the simulation time. My current bottleneck is cache misses during constraint solving.