Author Archives: Randy Gaul

Small C++ Reflection Demo

I created a small demonstration program that explains the core ideas behind implementing a custom reflection system for C++. More might be written in this post in the future — for now I’m just storing the demo right here on this webpage:

 

Robust Parallel Vector Test

In an effort to try to find some robust tests for parallel vectors I’ve decided to create some slides talking about the solutions I’ve come across. Hopefully one day some reader can comment or email about some alternatives or better solutions! I decided to write this blog post in slide format since Microsoft’s Powerpoint is pretty snazzy for making quick diagrams and whatnot :)

Here’s the link to Ericson’s blog post on tolerances: Tolerances Revisited. Please note there’s a little bit of ambiguity about whether the test should care if the vectors point in the same direction or not. In general it doesn’t really matter since the point of the slides is numeric robustness.

The gist of the slides is that the scale of the vectors in question matters for certain applications. Computing a good relative epsilon seems difficult, and maybe different epsilon calculations would be good for different applications. I’m not sure!

Here’s a demo you can try out yourself to test two of the solutions from the slides (tests returning true should be considered false positives):

Output of the program is:

Download (PDF, Unknown)

Nearly Geodesic Sphere – Mesh Creation

Capture

This post will let me store some useful code and possibly share it with others. I had originally found this wonderful explanation on stack overflow on how to generate triangles in a super simple manner for a nice sphere mesh.

The idea is to take an octohedron, or icosahedron, and subdivide the triangles on the surface of the mesh, such that each triangle creates 4 new triangles (each triangle creates a Triforce symbol).

One thing the stack overflow page didn’t describe is intermittent normalization between each subdivision. If you imagine subdividing an octohedron over and over without any normalization, the final resulting sphere will have triangles that vary in size quite a bit. However, if after each subdivision every single vertex is normalized then the vertices will snap to the unit sphere more often. This results in a final mesh that has triangles of closer to uniform geodesic area.

The final mesh isn’t purely geodesic and there will be variation in the size of the triangles, but it will be hardly noticeable. Sphere meshes will look super nice and also behave well when simulating soft bodies with Matyka’s pressure volume.

Here’s an example program you can use to perform some subdivisions upon an octohedron (click here to view the program’s output):

Inertia Tensor – Capsule

Capsule

A capsule is defined by two hemispheres and a cylinder. Capsules are pretty useful geometry for collision detection since they are easily defined mathematically (implicitly defined). This means that minimal information can be stored in memory to represent a capsule in 3D space. Given a radius value and position information only a few floating point values are necessary. Algorithms like GJK also work well with implicitly defined shapes since support mappings can be computed in constant time.

Given the inertia tensor of a cylinder and sphere the inertia tensor of a capsule can be calculated with the help of the parallel axis theorem. The parallel axis theorem can shift the origin that an inertia tensor is defined relative to, given just a translation vector. We care about the tensor form of the parallel axis theorem since it is probably easiest to understand from a computer science perspective:pat

J is the final transformed inertia. I is the initial inertia tensor. m is the mass of the object in question. R is the translation vector to shift with. E3 is the standard basis, or the identity matrix. The cross symbol is the outer product (see next paragraph).

Assuming readers are familiar with the dot product and outer product, computing the change in an inertia tensor isn’t too difficult.

The center of mass of a hemisphere is 3/8 * radius above the center of the spherical base. Knowing this and understanding the parallel axis theorem, the inertia tensor of one of the hemispheres can be easily calculated. My own derivation brought me the conclusion that the inertia tensor of capsule is:

Please note the final inertia tensor assumes the capsule is oriented such that the cylinder is aligned along the y axis. A rotation matrix R can be used to transform the final inertia tensor I into world space like so: R * I * R^T. To learn why this form is used read this post.

Special thanks to Dirk Gregorius for emailing me with an error in the original draft of this post! He kindly provided the public with a nice document showing his derivation of the inertia tensor of a capsule.

Following suit to identify the source of my error, I ended up writing my own derivation down in PDF format.

LIT (Lua Interpreted Triggers) – Starcraft: Brood War

Download LIT here

In the past I frequented a website called StarEdit.net in order to learn how to make scenario files for the game Starcraft: Brood War (SCBW). The scenario files could be generated with an editor, and had a simple trigger system. A lot of fun can be had making small games within these scenario files, and the games can be played over Blizzard’s Battle.net servers.

Blizzard’s stock editor was very limiting so some fan-made solutions popped up over the years. Creating triggers in these editors involved manually navigating a point-and-click GUI. The GUI was super cumbersome to use, and ended up being unruly for large numbers of triggers.

In an effort to revisit some fond memories of creating SCBW scenarios I decided to create a tool for generating triggers en masse (windows only). This post explains the details of how to use LIT.

Lua Interpreted Triggers (LIT)

Lua is a pretty good choice for creating some kind of tool to generate SCBW triggers. Since all that is needed from my tool is to generate some text, creating a super small and simple Lua API doesn’t take much development time, and can be very useful to use when making scenario files.

LIT is comprised of a small executable that embeds the Lua language, and a few Lua files to represents the LIT api.

Setting Up LIT

I’ve written a short post here on StarEdit.net about how to setup and run LIT from the command line.

LIT Examples

Since a SCBW trigger involves conditions and actions, using LIT involves laying out conditions and actions. There are two main resources for learning how to use LIT and they both come in the downloaded ZIP file. Inside of ActionsExample.lua and ConditionsExample.lua I’ve written out a short demonstration for how to use each unique condition and action in SCBW.

In order to learn how to use any particular condition or action, just consult these two example files.

However, since LIT is written with Lua the entire Lua language can be used to generate SCBW triggers! Since anyone that is interested in using LIT will probably not know how to write Lua code, I recommend reading some simple tutorial on using Lua before getting started. Here’s a decent looking one.

Lets take a look at writing a Binary Countoff using LIT:

The triggers generated by running this Lua file with LIT are: click to view.

As you can see there’s a little bit of state stored in a couple Deaths objects. That state is a number, a player and a unit. Using this state text is output in a straightforward manner.

Include Function

My favorite part about LIT is the include function! When writing out a LIT file, another entire LIT file can be included into it. Look at this example:

As the comment says, the include function will copy paste an entire Lua file straight into the spot where the include function is. A file included into another file can also include files. Files can be included through different folders, and the include function supports relative paths.

Lets assume that AnotherFile.lua holds the rest of the code from the first Binary Countoff example. If this is the case, then the output triggers will be exactly the same!

Say we have a file A and a file B. If A includes B and B includes A, then LIT will crash. Similarly, if any chain of files all form a circular inclusion, LIT will crash.

This lets users of LIT organize their Lua files in any manner they wish. One of the biggest drawbacks of using a GUI editor to create triggers, is once a couple thousand triggers exist in a single scenario file it becomes nearly impossible to efficiently navigate them. Using your operating system’s folders and file structure, organizing LIT is simple and powerful.

Demonstration Map

Here’s a link to download a demonstration map. The map contains a concept of screen-sized rooms that the player can move around with. It’s implemented with a few burrowed units on each room. The download link comes with the map file and the LIT files used to generate the map’s triggers. The most interesting file is RoomLogic.lua.

Memory Management

Any competent software engineer will have spent significant time working with low level memory management. Even though the operating system code is written for will often provide some kind of allocation and deallocation mechanism, application specific assumptions can be made to increase memory related performance.

For example certain hardware doesn’t have virtual memory support, or the virtual memory support can be quite lacking. A lack of virtual memory means raw allocations from the OS return real addresses to the hardware RAM. Usually virtual memory can alleviate some effects of memory fragmentation through a level of indirection, though when dealing with physical memory yourself no such alleviation exists.

This is just one example of how a software memory manager can be written and used to control memory fragmentation in a way that makes sense for the application.

Types of Allocators

There are a few main types of allocators that I myself have found pretty useful: paging, stack and heap based allocations. Each one makes specific assumptions about the types of allocations and how the memory ought be used. Due to these assumptions significant performance boosts can be reaped in ways that may not have been realistic with raw operating system allocations.

Stack Based Allocation

My favorite type of allocation involves the use of a simple stack. The idea is to make one large call to malloc or new and hold this piece of memory. The Stack itself just holds a pointer to this large chunk of memory, and an integer representing an index into the stack with an element size in bytes.

Here is what a Stack implementation might look like (in pseudo code):

Allocation can work by moving the m_memory pointer forward in the stack. Deallocation can work by moving the m_memory pointer backwards in the stack. Notice that the Free function requires the user to pass back in the size of the allocation! This can be avoided by storing this size parameter from Allocate inside of the m_memory array itself, just before the location of the returned address. Upon deallocation this value can be retrieved by moving the data parameter of Free back in memory by 4 bytes.

The advantage of the stack allocator is that it’s extremely fast and dubiously simple to implement. The limitation is that deallocations must be performed in the reverse order of allocations, since the stack itself is in LIFO order. This makes the use cases for the stack allocator pretty limited. Usually resources, like images, level files, sounds, models, etc. can be loaded into memory with a stack based allocator. Anything that has a very clear and non-variable lifespan should be able to be allocated on a stack.

One last trick is that the last allocation can be trivially resized! Often times an algorithm will require a lot of temporary scratch memory to perform some calculations, or store some state. An initial guess as to how much memory is needed can often be calculated as the worst-case scenario. Once an algorithm finishes this scratch memory can be reduced to the size actually used, if it is the last allocation on the stack. Resizing the last stack allocation involves moving the index backwards in memory.

Heap Allocation

Implementing your own heaps is pretty similar to the stack based allocator. A heap allocator will use the operating system to allocate a large chunk of memory. Subsequent calls to the heap’s Allocate and Free methods will just dip into this chunk and fetch a piece.

The heap is more versatile and general purpose than a stack allocator. The heap can be implemented with a linked list of nodes. Each node represents a piece of memory. A node can either be allocated or free. To keep track of these linked list pointers, allocation state, and size of the memory block some memory itself is required! This stuff can be stored in a separate array, or right inside the large raw chunk of memory (just like with the stack allocator).

Usually it is preferential to add a small header to each allocation to store this information. A heap node might look something like this:

When the heap is first constructed it will contain a linked list of HeapHeader structs, but only a single header will be present, and it holds the entire piece of raw memory originally allocated by the OS upon the Heap allocator’s construction.

Allocating from the heap involves splitting a free HeapHeader into an allocated piece, and a new HeapHeader for the leftover space. The details of this lay mostly in the linked list implementation, and is not the focus of this article.

In order to reduce memory fragmentation it is a good idea to merge adjacent free HeapHeader links into a single link. This ought to be handled in the Heap::Free function. The details of merging free links lay mostly in the linked list implementation, and is not the focus of this article.

Here’s an example of what the Heap may look like in implementation:

When Heap::Allocate is called a free link of appropriate size must be searched for. This has the time complexity of O( N ), and a lot of memory must be fetched into the cache upon allocation as the list itself is traversed. There are tricks to improve allocation performance of heaps, and a simple one would be the cache a single pointer to a free block in the heap itself. This pointer can be cached in Heap::FreeHeap::Allocate, or both. Once a new call to Heap::Allocate is made this cached pointer can be tested first to see it is an appropriate size.

There are two common ways to search through the links for an allocation: first fit and best fit. First fit will return the user with the first piece of memory large enough to hold the allocation. Best fit will return a chunk of memory that came from a HeapHeader with the smallest size that is still large enough to hold the requested allocation size.

First fit can be preferential for cache coherency, as it may prefer to allocate from the beginning of the heap and try to keep things closer together in memory. Best fit may be preferential for keeping the heap as un-fragmented as possible.

Paged Allocation

The heap based allocator intends to fight memory fragmentation through fitting links to allocation sizes, and by merging adjacent free memory blocks. This type of fragmentation is called external fragmentation. Another type of memory fragmentation is called internal fragmentation.

An internal memory fragmentation is when an allocated piece of memory is given to the user that actually holds more memory than the user requested. The user is assumed to not know about this extra piece of memory. This can provide an advantage to the allocator: all allocations can be of a fixed size, and any allocation larger than this fixed size is denied.

This lets the allocator act like an array. When an allocation is requested an empty element can be returned to the user. Upon freeing a piece of memory, the element is simply marked as free and placed into a free list.

The free list is a linked list of array elements. The memory in the free elements themselves should be used to store the pointer of each subsequent free element.

Allocation and deallocation become constant in time complexity and there is zero external memory fragmentation. In this way internal memory fragmentation is traded for external memory fragmentation.

Pages!

The term “pages” comes into play when the array is filled up. Once an array is full of allocated elements another array can be allocated. Once this array is filled up, another one is allocated. Each array (aka page) can be stored in a singly linked list of pages.

The free list itself can pointer across multiple pages without any problems.

A page containing only free elements can be deleted entirely, though this feature might not need to be supported.

A paged allocator can also hold an array of singly linked lists of pages. Each element of this array can hold a list of pages that corresponds to a different element size. This can allow the paged allocator to fit different allocation requests into the most appropriate page list. A common tactic is to have pages that represent arrays with an element size of 2^N bytes, where N is usually at least 2, and smaller than some value K.

The biggest advantage of a paged allocator is zero external fragmentation. The internal fragmentation does make memory more non-homogeneous. This type of allocator will probably lower your cache line utilization. Cache line utilization would be how much memory in each cache line fetched from main memory to the CPU cache is actually used. Since internal fragmentation is a feature of a paged allocator, cache line utilization will probably suffer.

The unused memory in the pages can be reduced drastically on a per-application basis; if the users of the allocator are able to specify the element sizes of different page lists, then zero internal fragmentation can be achieved.

Handle-Based Array

Instead of thinking of a paged allocator in terms of separate arrays, one might think of a simpler allocator that holds just a single array. If the elements within this array of of POD nature the array elements can be referenced by index. This lets the array grow or shrink in size as necessary, as new sized arrays can still be accessed by an old index.

Whenever the user wants a pointer to an element they first give the array an index, and a pointer is returned. This pointer is never stored anywhere! Continuous translation from index to pointer occurs -this allows the internal array itself to moved around in memory as necessary.

Users might need a little more power to refer to elements than a simple integer. Some type of handle might be needed to translate from index to pointer. Read more about handles here.

Conclusion

Given these three types of allocators an application should have all the variety of memory allocation necessary to run with pretty good performance. More advance allocation techniques definitely exist, and some are just combinations of the three basic allocators presented in this article.

Each allocator can be quite simple in isolation! I myself implemented a stack in about 100 lines, a paged allocator in 150, and a heap in about 250 lines of C++ code.

Further reading might include topics such as: cache coherency, memory alignment, garbage collection, virtual memory, page files (operating system pages).

Distance Point to Line Segment

2014-07-23 00.16.37

Understanding the dot product will enable one to come up with an algorithm to compute the distance between a point and line in two or three dimensions without much fuss. This is a math problem but discussion and focus is on writing a function to compute distance.

Dot Product Intuition

The dot product comes from the law of cosines. Here’s the formula:

\begin{equation}
c^2 = a^2 + b^2 – 2ab * cos\;\gamma
\label{eq1}
\end{equation}

This is just an equation that relates the cosine of an angle within a triangle to its various side lengths a, b and c. The Wikipedia page (link above) does a nice job of explaining this. Equation \eqref{eq1} can be rewritten as:

\begin{equation}
c^2 – a^2 – b^2 = -2ab * cos\;\gamma
\label{eq2}
\end{equation}

The right hand side equation \eqref{eq2} is interesting! Lets say that instead of writing the equation with side lengths a, b and c, it is written with two vectors: u and v. The third side can be represented as u - v. Re-writing equation \eqref{eq2} in vector notation yields:

\begin{equation}
|u\;-\;v|^2 – |u|^2 – |v|^2 = -2|u||v| * cos\;\gamma
\label{eq3}
\end{equation}

Which can be expressed in scalar form as:

\begin{equation}
(u_x\;-\;v_x)^2 + (u_y\;-\;v_y)^2 + (u_z\;-\;v_z)^2\;- \\
(u_{x}^2\;+\;u_{y}^2\;+\;u_{z}^2) – (v_{x}^2\;+\;v_{y}^2\;+\;v_{z}^2)\;= \\
-2|u||v| * cos\;\gamma
\label{eq4}
\end{equation}

By crossing out some redundant terms, and getting rid of the -2 on each side of the equation, this ugly equation can be turned into a much more approachable version:

\begin{equation}
u_x v_x + u_y v_y + u_w v_w = |u||v| * cos\;\gamma
\label{eq5}
\end{equation}

Equation \eqref{eq5} is the equation for the dot product. If both u and v are unit vectors then the equation will simplify to:

\begin{equation}
dot(\;\hat{u},\;\hat{v}\;) = cos\;\gamma
\label{eq6}
\end{equation}

If u and v are not unit vectors equation \eqref{eq5} says that the dot product between both vectors is equal to cos( γ ) that has been scaled by the lengths of u and v. This is a nice thing to know! For example: the squared length of a vector is just itself dotted with itself.

If u is a unit vector and v is not, then dot( u, v ) will return the distance in which v travels in the u direction. This is useful for understanding the plane equation in three dimensions (or any other dimension):

\begin{equation}
ax\;+\;by\;+\;cz\;-\;d\;=\;0
\end{equation}

The normal of a plane would be the vector: { a, b, c }. If this normal is a unit vector, then d represents the distance to the plane from the origin. If the normal is not a unit vector then d is scaled by the length of the normal.

To compute the distance of a point to this plane any point can be substituted into the plane equation, assuming the normal of the plane equation is of unit length. This operation is computing the distance along the normal a given point travels. The subtraction by d can be viewed as “translating the plane to the origin” in order to convert the distance along the normal, to a distance to the plane.

Writing the Function: Distance Point to Line

The simplest function for computing the distance to a plane (or line in 2D) would be to place a point into the plane equation. This means that we’ll have to either compute the plane equation in 2D if all we have are two points to represent the plane, and in 3D find a new tactic altogether since planes in 3D are not lines.

In my own experience I’ve found it most common to have a line in the form of two points in order to represent the parametric equation of a line. Two points can come from a triangle, a mesh edge, or two pieces of world geometry.

To setup the problem lets outline the function to be created as so:

The two parameters a and b are used to define the line segment itself. The direction of the line would be the vector b – a.

After a brief visit to the Wikipedia page for this exact problem I quickly wrote down my own derivation of the formula they have on their page. Take a look at this image I drew:

2014-07-23 00.16.37

The problem of finding the distance of a point to a line makes use of finding the vector that points from p to the closest point on the line ab. From the above picture: a simple way to calculate this vector would be to subtract away the portion of a – p that travels along the vector ab.

The part of a – p that travels along ab can be calculated by projecting a – p onto ab. This projection is described in the previous section about the dot product intuition.

Given the vector d the distance from p to ab is just sqrt( dot( d, d ) ). The sqrt operation can be omit entirely to compute a distance squared. Our function may now look like:

This function is quite nice because it will never return a negative number. There is a popular version of this function that performs a division operation. Given a very small line segment as input for ab it is entirely possible to have the following function return a negative number:

It’s very misleading to have a function called “square distance” or “distance” to return a negative number. Passing in the result of this function to a sqrt function call can result in NaNs and be really nasty to deal with.

Barycentric Coordinates – Segments

A full discussion of barycentric coordinates is way out of scope here. However, they can be used to compute distance from a point to line segment. The segment portion of the code just clamps a point projected into the line within the bounds of a and b.

Assuming readers are a little more comfortable with the dot product than I was when I first started programming, the following function should make sense:

This function can be adapted pretty easily to compute the closest point on the line segment to p instead of returning a scalar. The idea is to use the vector from p to the closest position on ab to project p onto the segment ab.

The above function works by computing barycentric coordinates of p relative to ab. The coordinates are scaled by the length of ab so the second if statement must be adapted slightly. If the direction ab were normalized then the second if statement would be a comparison with the value 1, which should make sense for barycentric coordinates.

Sample Program

Here’s a sample program you can try out yourself:

The output is: “Distance squared: 2.117647″.