The trick is simple: process the first element outside of the loop to set up your initial conditions, then start the loop at the second element. This assumes there’s at least one element in the array to process. Here’s an example for computing an AABB:

    AABB aabb;
    aabb.min = verts[ 0 ];
    aabb.max = verts[ 0 ];

    for ( int i = 1; i < vertCount; ++i )
    {
        aabb.min = Min( aabb.min, verts[ i ] );
        aabb.max = Max( aabb.max, verts[ i ] );
    }

Usually I myself would have written this kind of code like so and not given any more thought to it:

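    AABB aabb;
    aabb.min = Vec3( FLT_MAX, FLT_MAX, FLT_MAX );
    // Note: -FLT_MAX, not -FLT_MIN. FLT_MIN is the smallest positive
    // normalized float, so -FLT_MIN sits just below zero.
    aabb.max = Vec3( -FLT_MAX, -FLT_MAX, -FLT_MAX );

    for ( int i = 0; i < vertCount; ++i )
    {
        aabb.min = Min( aabb.min, verts[ i ] );
        aabb.max = Max( aabb.max, verts[ i ] );
    }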

This second code chunk is arguably just slightly more esoteric and is definitely a little less efficient for no good reason.

One could also skip the first element when finding the min/max of any sort of array, for example an array of dot product results. Though simple, it’s pretty nice to find small ways to write slightly better code.
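As a rough sketch of the dot product case, the same pattern can find the extent of a set of vertices along an axis. The names `Extent` and `ProjectOntoAxis` are made up for this illustration, and the function assumes `vertCount` is at least one:

```cpp
struct Vec3 { float x, y, z; };

inline float Dot( const Vec3& a, const Vec3& b )
{
    return a.x * b.x + a.y * b.y + a.z * b.z;
}

// Hypothetical helper type: smallest and largest projection found
struct Extent
{
    float min;
    float max;
};

// Projects every vertex onto axis and returns the smallest and largest
// projection. Assumes vertCount >= 1.
inline Extent ProjectOntoAxis( const Vec3* verts, int vertCount, const Vec3& axis )
{
    // The first element sets up the initial conditions
    Extent e;
    e.min = e.max = Dot( verts[ 0 ], axis );

    // The loop skips the first element
    for ( int i = 1; i < vertCount; ++i )
    {
        float d = Dot( verts[ i ], axis );
        if ( d < e.min ) e.min = d;
        if ( d > e.max ) e.max = d;
    }

    return e;
}
```

No sentinel constants are needed, so there is nothing like FLT_MAX to get wrong.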

    // Reflection demonstration by Randy Gaul
    // Reflection lets C++ users store information about the different types
    // of data their program uses. This means non-templated structs are used
    // to store relevant information about types that would usually be lost
    // during compilation. These TypeInfo structs can be used during run-time
    // to write more type-aware code. This means that reflection can be used
    // to automate certain processes that deal with interpreting memory.
    // Examples of areas that can be automated: script binding, serialization,
    // visual editor support, etc.

    #include <cstdio>
    #include <cassert>
    #include <cstring>

    // Stores information about a type. Things like the name of a type,
    // size in bytes, and other things are important to keep track of.
    struct TypeInfo
    {
        const char* name;
        int size;

        // You can add an array of TypeInfo pointers here to represent
        // relationships between different types (like inheritance parent
        // child relationships, or struct/class data members).
    };

    // Used to store a pointer to a TypeInfo struct as a static parameter
    // of a templated static function. This lets the user lookup TypeInfo
    // pointers given a template type parameter.
    template < typename T >
    struct TypeLookupByTemplate
    {
        static TypeInfo* GetType( TypeInfo* typeInfo )
        {
            static TypeInfo* s_typeInfo = NULL;

            if ( !s_typeInfo )
            {
                s_typeInfo = typeInfo;
            }

            return s_typeInfo;
        }
    };

    // You can make this larger if you need to
    #define MAX_TYPE_INFOS 128
    TypeInfo typeInfos[ MAX_TYPE_INFOS ];
    int typeInfosCount;

    // Add a type to the type infos array. For now this just stores
    // name and size of a type.
    #define ADD_TYPE( T ) \
        do { \
            assert( typeInfosCount < MAX_TYPE_INFOS ); \
            TypeInfo* typeInfo = typeInfos + typeInfosCount++; \
            typeInfo->size = sizeof( T ); \
            typeInfo->name = #T; \
            TypeLookupByTemplate< T >::GetType( typeInfo ); \
        } while ( 0 );

    // Allows code to get a pointer to a TypeInfo through a template type lookup ID.
    #define GET_TYPE_BY_TEMPLATE( T ) TypeLookupByTemplate< T >::GetType( NULL )

    TypeInfo* GetTypeByString( const char* typeNameCharPointer )
    {
        for ( int i = 0; i < typeInfosCount; ++i )
        {
            if ( !strcmp( typeNameCharPointer, typeInfos[ i ].name ) )
            {
                return typeInfos + i;
            }
        }

        // Unable to find a specific type by string
        assert( false );
        return NULL;
    }

    // Lookup a type info struct by string ID.
    // Looking up a type with an O( N ) loop will probably be fast enough for most
    // use cases. If this ever becomes a problem a hash table can be used here.
    #define GET_TYPE_BY_STRING( typeNameCharPointer ) GetTypeByString( typeNameCharPointer )

    // A Variable struct contains a pointer to some memory. The typeInfo pointer
    // stores information about the type of data that the void pointer points to.
    // This is useful to pass many types of data to a single function, thus
    // reducing code duplication in many areas.
    struct Variable
    {
        // Functions to initialize the data and typeInfo pointers
        void Set( void* dataPtr, TypeInfo* typeInfoPtr )
        {
            data = dataPtr;
            typeInfo = typeInfoPtr;
        }

        template < typename T >
        void Set( T& typedData )
        {
            data = &typedData;
            typeInfo = GET_TYPE_BY_TEMPLATE( T );
        }

        // If any code would like to retrieve the explicit data the code
        // must provide a templated type to cast to.
        template < typename T >
        T& GetValue( )
        {
            // An assert here can force type safety
            assert( GET_TYPE_BY_TEMPLATE( T ) == typeInfo );
            return *(T*)data;
        }

        void* data;
        TypeInfo* typeInfo;
    };

    int main( )
    {
        // Add some type information to the reflection system
        ADD_TYPE( int );
        ADD_TYPE( float );

        // Print out stored type information from the reflection system
        TypeInfo* intTypeInfo = GET_TYPE_BY_STRING( "int" );
        printf( "%s\n", intTypeInfo->name );
        printf( "%d\n", intTypeInfo->size );

        TypeInfo* floatTypeInfo = GET_TYPE_BY_TEMPLATE( float );
        printf( "%s\n", floatTypeInfo->name );
        printf( "%d\n", floatTypeInfo->size );

        // Create an integer and then create a Variable that describes the integer
        // and holds a pointer to the integer in memory.
        int x = 10;
        Variable var;
        var.Set( x );

        // Variables let you query name and size information
        printf( "%s\n", var.typeInfo->name );
        printf( "%d\n", var.typeInfo->size );

        // Variables also let you cast to the contained type
        printf( "%d\n", var.GetValue< int >( ) );
    }


Here’s the link to Ericson’s blog post on tolerances: Tolerances Revisited. Please note there’s a little bit of ambiguity about whether the test should care if the vectors point in the same direction or not. In general it doesn’t really matter since the point of the slides is numeric robustness.

The gist of the slides is that the scale of the vectors in question matters for certain applications. Computing a good relative epsilon seems difficult, and maybe different epsilon calculations would be good for different applications. I’m not sure!

Here’s a demo you can try out yourself to test two of the solutions from the slides (tests returning true should be considered false positives):

    #include <cstdio>
    #include <cmath>

    struct Vec3
    {
        float x;
        float y;
        float z;

        Vec3 operator*( float a )
        {
            Vec3 v;
            v.x = x * a;
            v.y = y * a;
            v.z = z * a;
            return v;
        }
    };

    float Dot( Vec3 a, Vec3 b )
    {
        return a.x * b.x + a.y * b.y + a.z * b.z;
    }

    Vec3 Normalize( Vec3 a )
    {
        return a * (1.0f / std::sqrt( Dot( a, a ) ));
    }

    bool Parallel0( Vec3 a, Vec3 b, float tol )
    {
        a = Normalize( a );
        b = Normalize( b );
        float dot = 1.0f - std::abs( Dot( a, b ) );
        return dot < tol;
    }

    bool Parallel1( Vec3 a, Vec3 b, float tol )
    {
        if ( Dot( a, b ) < 0 )
        {
            b.x = -b.x;
            b.y = -b.y;
            b.z = -b.z;
        }

        float k = a.y / b.y;
        b = b * k;

        float x = std::abs( a.x - b.x );
        float y = std::abs( a.y - b.y );
        float z = std::abs( a.z - b.z );

        if ( x < tol && y < tol && z < tol )
            return true;
        return false;
    }

    void Tolerance( float tol )
    {
        Vec3 u;
        u.x = 0.0f;
        u.y = 0.9812398512f;
        u.z = 0.001f;
        u = Normalize( u );

        Vec3 v;
        v.x = 0.0f;
        v.y = 10.0f;
        v.z = 1.0f;

        printf( "Tolerance: %f\n", tol );
        bool result = Parallel0( u, v, tol );
        printf( "Parallel0: %s\n", result ? "true" : "false" );
        result = Parallel1( u, v, tol );
        printf( "Parallel1: %s\n", result ? "true" : "false" );
        printf( "\n" );
    }

    int main( )
    {
        Tolerance( 1.0e-1f );
        Tolerance( 1.0e-2f );
        Tolerance( 1.0e-3f );
    }

Output of the program is:

    Tolerance: 0.100000
    Parallel0: true
    Parallel1: true

    Tolerance: 0.010000
    Parallel0: true
    Parallel1: false

    Tolerance: 0.001000
    Parallel0: false
    Parallel1: false


The idea is to take an octahedron, or icosahedron, and subdivide the triangles on the surface of the mesh, such that each triangle creates 4 new triangles (each triangle creates a Triforce symbol).

One thing the Stack Overflow page didn’t describe is intermittent normalization between each subdivision. If you imagine subdividing an octahedron over and over without any normalization, the triangles of the final sphere will vary in size quite a bit. However, if every single vertex is normalized after each subdivision, the vertices snap to the unit sphere at every step rather than only at the end. This results in a final mesh that has triangles of closer to uniform geodesic area.

The final mesh isn’t purely geodesic and there will be variation in the size of the triangles, but it will be hardly noticeable. Sphere meshes will look super nice and also behave well when simulating soft bodies with Matyka’s pressure volume.

Here’s an example program you can use to perform some subdivisions upon an octahedron (click here to view the program’s output):

    #include <cstdio>
    #include <cmath>
    #include <cassert>

    struct Vec3
    {
        Vec3( ) { }
        Vec3( float x0, float y0, float z0 )
        {
            x = x0;
            y = y0;
            z = z0;
        }

        float x;
        float y;
        float z;

        const Vec3 operator+( const Vec3& a )
        {
            Vec3 v;
            v.x = x + a.x;
            v.y = y + a.y;
            v.z = z + a.z;
            return v;
        }

        const Vec3 operator-( const Vec3& a )
        {
            Vec3 v;
            v.x = x - a.x;
            v.y = y - a.y;
            v.z = z - a.z;
            return v;
        }

        const Vec3 operator*( float a ) const
        {
            Vec3 v;
            v.x = x * a;
            v.y = y * a;
            v.z = z * a;
            return v;
        }
    };

    float Dot( Vec3 a, Vec3 b )
    {
        return a.x * b.x + a.y * b.y + a.z * b.z;
    }

    inline const Vec3 Cross( const Vec3& a, const Vec3& b )
    {
        return Vec3(
            (a.y * b.z) - (b.y * a.z),
            (b.x * a.z) - (a.x * b.z),
            (a.x * b.y) - (b.x * a.y)
            );
    }

    const Vec3 Normalize( Vec3 v )
    {
        return v * (1.0f / std::sqrt( Dot( v, v ) ));
    }

    const int stackSize = 1024 * 8;
    Vec3 in[ stackSize ];
    int ip = 0;
    Vec3 out[ stackSize ];
    int op = 0;
    Vec3 octohedron[ 6 ];

    void Subdivide( void )
    {
        op = 0;

        for ( int i = 0; i < ip; i += 3 )
        {
            Vec3 a = in[ i ];
            Vec3 b = in[ i + 1 ];
            Vec3 c = in[ i + 2 ];

            Vec3 ab = (a + b) * 0.5f;
            Vec3 bc = (b + c) * 0.5f;
            Vec3 ca = (c + a) * 0.5f;

            out[ op++ ] = b;
            out[ op++ ] = bc;
            out[ op++ ] = ab;

            out[ op++ ] = c;
            out[ op++ ] = ca;
            out[ op++ ] = bc;

            out[ op++ ] = a;
            out[ op++ ] = ab;
            out[ op++ ] = ca;

            out[ op++ ] = ab;
            out[ op++ ] = bc;
            out[ op++ ] = ca;

            assert( op <= stackSize );
        }

        for ( int i = 0; i < op; ++i )
            in[ i ] = Normalize( out[ i ] );

        ip = op;
    }

    int main( )
    {
        octohedron[ 0 ] = Vec3( 1.0f, 0.0f, 0.0f );
        octohedron[ 1 ] = Vec3( 0.0f,-1.0f, 0.0f );
        octohedron[ 2 ] = Vec3(-1.0f, 0.0f, 0.0f );
        octohedron[ 3 ] = Vec3( 0.0f, 1.0f, 0.0f );
        octohedron[ 4 ] = Vec3( 0.0f, 0.0f, 1.0f );
        octohedron[ 5 ] = Vec3( 0.0f, 0.0f,-1.0f );

        in[ ip++ ] = octohedron[ 2 - 1 ];
        in[ ip++ ] = octohedron[ 1 - 1 ];
        in[ ip++ ] = octohedron[ 5 - 1 ];

        in[ ip++ ] = octohedron[ 3 - 1 ];
        in[ ip++ ] = octohedron[ 2 - 1 ];
        in[ ip++ ] = octohedron[ 5 - 1 ];

        in[ ip++ ] = octohedron[ 4 - 1 ];
        in[ ip++ ] = octohedron[ 3 - 1 ];
        in[ ip++ ] = octohedron[ 5 - 1 ];

        in[ ip++ ] = octohedron[ 1 - 1 ];
        in[ ip++ ] = octohedron[ 4 - 1 ];
        in[ ip++ ] = octohedron[ 5 - 1 ];

        in[ ip++ ] = octohedron[ 1 - 1 ];
        in[ ip++ ] = octohedron[ 2 - 1 ];
        in[ ip++ ] = octohedron[ 6 - 1 ];

        in[ ip++ ] = octohedron[ 2 - 1 ];
        in[ ip++ ] = octohedron[ 3 - 1 ];
        in[ ip++ ] = octohedron[ 6 - 1 ];

        in[ ip++ ] = octohedron[ 3 - 1 ];
        in[ ip++ ] = octohedron[ 4 - 1 ];
        in[ ip++ ] = octohedron[ 6 - 1 ];

        in[ ip++ ] = octohedron[ 4 - 1 ];
        in[ ip++ ] = octohedron[ 1 - 1 ];
        in[ ip++ ] = octohedron[ 6 - 1 ];

        Subdivide( );
        Subdivide( );

        FILE* fp = fopen( "out.txt", "w" );

        for ( int i = 0; i < ip; i += 3 )
        {
            Vec3 a = in[ i ];
            Vec3 b = in[ i + 1 ];
            Vec3 c = in[ i + 2 ];

            fprintf( fp, "%7.4f, %7.4f, %7.4f,\n%7.4f, %7.4f, %7.4f,\n%7.4f, %7.4f, %7.4f,\n\n",
                a.x, a.y, a.z,
                b.x, b.y, b.z,
                c.x, c.y, c.z );
        }

        fprintf( fp, "%d\n", op / 3 );

        for ( int i = 0; i < ip; i += 3 )
        {
            Vec3 a = in[ i ];
            Vec3 b = in[ i + 1 ];
            Vec3 c = in[ i + 2 ];

            Vec3 n = Normalize( Cross( b - a, c - a ) );
            fprintf( fp, "%7.4f, %7.4f, %7.4f,\n", n.x, n.y, n.z );
        }

        fclose( fp );
    }

Given the inertia tensor of a cylinder and sphere the inertia tensor of a capsule can be calculated with the help of the parallel axis theorem. The parallel axis theorem can shift the origin that an inertia tensor is defined relative to, given just a translation vector. We care about the tensor form of the parallel axis theorem since it is probably easiest to understand from a computer science perspective:

J is the final transformed inertia. I is the initial inertia tensor. m is the mass of the object in question. R is the translation vector to shift with. E3 is the standard basis, or the identity matrix. The cross symbol is the outer product (see next paragraph).
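Written out with those symbols, the standard tensor form of the parallel axis theorem is:

\begin{equation}

J = I + m \left[ (R \cdot R)\,E_3 - R \otimes R \right]

\end{equation}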

Assuming readers are familiar with the dot product and outer product, computing the change in an inertia tensor isn’t too difficult.
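As a minimal sketch of that computation, here is the shift written out element by element. The `Vec3` and `Mat3` types and the name `ShiftInertia` are assumptions made for this example rather than anything from a particular math library:

```cpp
struct Vec3 { float x, y, z; };
struct Mat3 { float m[ 3 ][ 3 ]; };

// Returns J = I + mass * ( dot( R, R ) * E3 - outer( R, R ) )
inline Mat3 ShiftInertia( const Mat3& I, float mass, const Vec3& R )
{
    const float r[ 3 ] = { R.x, R.y, R.z };
    const float rr = R.x * R.x + R.y * R.y + R.z * R.z; // dot( R, R )

    Mat3 J;
    for ( int i = 0; i < 3; ++i )
    {
        for ( int j = 0; j < 3; ++j )
        {
            // E3 is the identity matrix: one on the diagonal, zero elsewhere
            float identity = (i == j) ? 1.0f : 0.0f;

            // r[ i ] * r[ j ] is element ( i, j ) of the outer product of R with itself
            J.m[ i ][ j ] = I.m[ i ][ j ] + mass * (rr * identity - r[ i ] * r[ j ]);
        }
    }

    return J;
}
```

Shifting along the y axis, for instance, leaves the y term untouched and adds mass times distance squared to the x and z terms, which matches the familiar scalar form of the theorem.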

The center of mass of a hemisphere is 3/8 * radius above the center of the spherical base. Knowing this and understanding the parallel axis theorem, the inertia tensor of one of the hemispheres can be easily calculated. My own derivation brought me to the conclusion that the inertia tensor of a capsule is:

    float r = radius;
    float r2 = r * r;
    float l = length; // length is the height of the cylinder

    // Mass of a single hemisphere
    float mass = 0.5f * volume_sphere * density;

    // Inertia of the hemisphere about the center of its flat base
    // (the same value for all three axes)
    float base = (2.0f / 5.0f) * mass * r2;

    // Parallel axis: shift from the base center to the hemisphere's own COM
    // (3/8 r away), then translate out to the end of the capsule
    // (3/8 r + l/2 away from the capsule's center):
    //   shift = mass * ( 3/8 r + l/2 )^2 - mass * ( 3/8 r )^2
    // The above can be simplified to
    float shift = mass * ((3.0f * r + 2.0f * l) / 8.0f) * l;

    float x = base + shift;
    float z = x;
    float y = base;

    // Final inertia tensor I for a single shifted hemisphere
    // [ x, 0, 0 ]
    // [ 0, y, 0 ]
    // [ 0, 0, z ]

    // We can add I with the tensor J, where J is the inertia of a cylinder
    // final = J + 2 * I

Please note the final inertia tensor assumes the capsule is oriented such that the cylinder is aligned along the y axis. A rotation matrix R can be used to transform the final inertia tensor I into world space like so: R * I * R^T. To learn why this form is used read this post.

Special thanks to Dirk Gregorius for emailing me with an error in the original draft of this post! He kindly provided the public with a nice document showing his derivation of the inertia tensor of a capsule.

Following suit to identify the source of my error, I ended up writing my own derivation down in PDF format.

In the past I frequented a website called StarEdit.net in order to learn how to make scenario files for the game Starcraft: Brood War (SCBW). The scenario files could be generated with an editor, and had a simple trigger system. A lot of fun can be had making small games within these scenario files, and the games can be played over Blizzard’s Battle.net servers.

Blizzard’s stock editor was very limiting so some fan-made solutions popped up over the years. Creating triggers in these editors involved manually navigating a point-and-click GUI. The GUI was super cumbersome to use, and ended up being unruly for large numbers of triggers.

In an effort to revisit some fond memories of creating SCBW scenarios I decided to create a tool, called LIT, for generating triggers en masse (Windows only). This post explains the details of how to use LIT.

Lua is a pretty good choice for creating some kind of tool to generate SCBW triggers. Since all that is needed from my tool is to generate some text, creating a super small and simple Lua API doesn’t take much development time, and can be very useful to use when making scenario files.

LIT consists of a small executable that embeds the Lua language, and a few Lua files that make up the LIT API.

I’ve written a short post here on StarEdit.net about how to set up and run LIT from the command line.

Since a SCBW trigger involves conditions and actions, using LIT involves laying out conditions and actions. There are two main resources for learning how to use LIT and they both come in the downloaded ZIP file. Inside of *ActionsExample.lua* and *ConditionsExample.lua* I’ve written out a short demonstration for how to use each unique condition and action in SCBW.

In order to learn how to use any particular condition or action, just consult these two example files.

However, since LIT is written with Lua the entire Lua language can be used to generate SCBW triggers! Since anyone that is interested in using LIT will probably not know how to write Lua code, I recommend reading some simple tutorial on using Lua before getting started. Here’s a decent looking one.

Let’s take a look at writing a Binary Countoff using LIT:

    -- Create a player group for creating triggers
    p = PlayerGroup( 1, 2, 3 )

    -- Create a death counter
    d = Deaths( )
    d:SetUnit( "Terran Marine" )
    d:SetPlayer( 8 )

    -- Create another death counter
    d2 = Deaths( )
    d2:SetUnit( "Terran Marine" )
    d2:SetPlayer( 7 )

    -- Generate the binary countoff triggers with a loop
    i = 1
    exponent = 1
    while i < 12 do
        d:SetCount( exponent )
        d2:SetCount( exponent )
        p:Conditions( )
            d:AtLeast( )
        p:Actions( )
            d:Subtract( )
            d2:Add( )
        i = i + 1
        exponent = exponent * 2
    end

The triggers generated by running this Lua file with LIT are: click to view.

As you can see there’s a little bit of state stored in a couple of *Deaths* objects. That state is a number, a player and a unit. Using this state, text is output in a straightforward manner.

My favorite part about LIT is the *include* function! When writing out a LIT file, another entire LIT file can be included into it. Look at this example:

    p = PlayerGroup( 1, 2, 3 )

    d = Deaths( )
    d:SetUnit( "Terran Marine" )
    d:SetPlayer( 8 )

    d2 = Deaths( )
    d2:SetUnit( "Terran Marine" )
    d2:SetPlayer( 7 )

    -- Copy paste ALL of the AnotherFile.lua right here
    include( "AnotherFile.lua" )

As the comment says, the *include* function will copy paste an entire Lua file straight into the spot where the *include* function is. A file included into another file can also include files. Files can be included through different folders, and the *include* function supports relative paths.

Let’s assume that *AnotherFile.lua* holds the rest of the code from the first Binary Countoff example. If this is the case, then the output triggers will be exactly the same!

Say we have a file *A* and a file *B*. If *A* includes *B* and *B* includes *A*, then LIT will crash. Similarly, if any chain of files all form a circular inclusion, LIT will crash.

This lets users of LIT organize their Lua files in any manner they wish. One of the biggest drawbacks of using a GUI editor to create triggers is that once a couple thousand triggers exist in a single scenario file, it becomes nearly impossible to navigate them efficiently. Using your operating system’s folders and file structure, organizing LIT files is simple and powerful.

Here’s a link to download a demonstration map. The map contains a concept of screen-sized rooms that the player can move around with. It’s implemented with a few burrowed units on each room. The download link comes with the map file and the LIT files used to generate the map’s triggers. The most interesting file is RoomLogic.lua.

For example certain hardware doesn’t have virtual memory support, or the virtual memory support can be quite lacking. A lack of virtual memory means raw allocations from the OS return real addresses to the hardware RAM. Usually virtual memory can alleviate some effects of memory fragmentation through a level of indirection, though when dealing with physical memory yourself no such alleviation exists.

This is just one example of how a software memory manager can be written and used to control memory fragmentation in a way that makes sense for the application.

There are a few main types of allocators that I myself have found pretty useful: paging, stack and heap based allocations. Each one makes specific assumptions about the types of allocations and how the memory ought to be used. Due to these assumptions significant performance boosts can be reaped in ways that may not have been realistic with raw operating system allocations.

My favorite type of allocation involves the use of a simple stack. The idea is to make one large call to *malloc* or *new* and hold this piece of memory. The *Stack* itself just holds a pointer to this large chunk of memory, and an integer index, measured in bytes, that marks the current top of the stack.

Here is what a *Stack* implementation might look like (in pseudo code):

    class Stack
    {
    public:
        Stack( )
        {
            m_memory = (byte*)malloc( STACK_SIZE );
            m_index = 0;
        }

        ~Stack( )
        {
            assert( m_index == 0 );
            free( m_memory );
        }

        void* Allocate( int size );
        void Free( void* data, int size );

    private:
        byte* m_memory;
        int m_index;
    };

Allocation can work by moving *m_index* forward in the stack. Deallocation can work by moving *m_index* backwards in the stack. Notice that the *Free* function requires the user to pass back in the size of the allocation! This can be avoided by storing the *size* parameter from *Allocate* inside of the *m_memory* array itself, just before the location of the returned address. Upon deallocation this value can be retrieved by moving the *data* parameter of *Free* back in memory by the size of the stored integer (4 bytes for a typical *int*).

The advantage of the stack allocator is that it’s extremely fast and dubiously simple to implement. The limitation is that deallocations must be performed in the reverse order of allocations, since the stack itself is in LIFO order. This makes the use cases for the stack allocator pretty limited. Usually resources, like images, level files, sounds, models, etc. can be loaded into memory with a stack based allocator. Anything that has a very clear and non-variable lifespan should be able to be allocated on a stack.

One last trick is that the last allocation can be trivially resized! Often times an algorithm will require a lot of temporary scratch memory to perform some calculations, or store some state. An initial guess as to how much memory is needed can often be calculated as the worst-case scenario. Once an algorithm finishes this scratch memory can be reduced to the size actually used, if it is the last allocation on the stack. Resizing the last stack allocation involves moving the index backwards in memory.
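To make the last few paragraphs concrete, here is a minimal sketch of a stack allocator that stores each allocation’s size in a small header and can shrink the most recent allocation. It is an illustration under simplifying assumptions, not a production allocator: alignment is ignored, the capacity is passed to the constructor instead of using a *STACK_SIZE* constant, and the names *ShrinkLast* and *BytesUsed* are made up for this example.

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>

class Stack
{
public:
    explicit Stack( int capacity )
        : m_memory( static_cast< char* >( std::malloc( capacity ) ) )
        , m_capacity( capacity )
        , m_index( 0 )
    {
        assert( m_memory );
    }

    ~Stack( )
    {
        assert( m_index == 0 ); // everything must have been freed
        std::free( m_memory );
    }

    // Writes the allocation size into a small header, then returns the
    // memory just after that header. Returns NULL if the stack is full.
    void* Allocate( int size )
    {
        if ( m_index + kHeader + size > m_capacity )
            return NULL;

        std::memcpy( m_memory + m_index, &size, sizeof( int ) );
        void* data = m_memory + m_index + kHeader;
        m_index += kHeader + size;
        return data;
    }

    // Must be called in the reverse order of Allocate (LIFO)
    void Free( void* data )
    {
        int size = SizeOf( data );
        assert( static_cast< char* >( data ) + size == m_memory + m_index );
        m_index -= kHeader + size;
    }

    // Shrinks the most recent allocation in place by moving the index back
    void ShrinkLast( void* data, int newSize )
    {
        int size = SizeOf( data );
        assert( static_cast< char* >( data ) + size == m_memory + m_index );
        assert( newSize <= size );
        std::memcpy( static_cast< char* >( data ) - kHeader, &newSize, sizeof( int ) );
        m_index -= size - newSize;
    }

    int BytesUsed( ) const { return m_index; }

private:
    enum { kHeader = sizeof( int ) };

    // Reads the size stored just before the returned address
    static int SizeOf( void* data )
    {
        int size;
        std::memcpy( &size, static_cast< char* >( data ) - kHeader, sizeof( int ) );
        return size;
    }

    char* m_memory;
    int m_capacity;
    int m_index;
};
```

A typical use would be to allocate a worst-case scratch buffer, run the algorithm, call *ShrinkLast* with the number of bytes actually used, and later free everything in reverse order.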

Implementing your own heaps is pretty similar to the stack based allocator. A heap allocator will use the operating system to allocate a large chunk of memory. Subsequent calls to the heap’s *Allocate* and *Free* methods will just dip into this chunk and fetch a piece.

The heap is more versatile and general purpose than a stack allocator. The heap can be implemented with a linked list of nodes. Each node represents a piece of memory. A node can either be allocated or free. To keep track of these linked list pointers, allocation state, and size of the memory block some memory itself is required! This stuff can be stored in a separate array, or right inside the large raw chunk of memory (just like with the stack allocator).

Usually it is preferable to add a small header to each allocation to store this information. A heap node might look something like this:

    struct HeapHeader
    {
        HeapHeader* next;
        HeapHeader* prev;
        int size;
        bool allocated;
    };

When the heap is first constructed it will contain a linked list of *HeapHeader* structs, but only a single header will be present, and it holds the entire piece of raw memory originally allocated by the OS upon the *Heap* allocator’s construction.

Allocating from the heap involves splitting a free *HeapHeader* into an allocated piece, and a new *HeapHeader* for the leftover space. The details of this lie mostly in the linked list implementation, and are not the focus of this article.

In order to reduce memory fragmentation it is a good idea to merge adjacent free *HeapHeader* links into a single link. This ought to be handled in the *Heap::Free* function. The details of merging free links lie mostly in the linked list implementation, and are not the focus of this article.

Here’s an example of what the *Heap* may look like in implementation:

    class Heap
    {
    public:
        void* Allocate( int size )
        {
            // Search linked list for a free link that can fit size
            // (FindFreeLink is a hypothetical helper standing in for that search)
            HeapHeader* header = FindFreeLink( size );
            header->allocated = true;

            // split header into two headers
            // mark the new header as not allocated

            // The user's memory starts just after the header
            return (char*)header + sizeof( HeapHeader );
        }

        void Free( void* data )
        {
            // Step back from the user's memory to the header in front of it
            HeapHeader* header = (HeapHeader*)((char*)data - sizeof( HeapHeader ));
            header->allocated = false;

            if ( header->next is free )
                // Merge header into header->next
                merged = true;

            if ( header->prev is free )
                // Merge header into header->prev
                merged = true;
        }

    private:
        HeapHeader* m_memory;
    };

When *Heap::Allocate* is called a free link of appropriate size must be searched for. This has a time complexity of O( N ), and a lot of memory must be fetched into the cache upon allocation as the list itself is traversed. There are tricks to improve the allocation performance of heaps, and a simple one would be to cache a single pointer to a free block in the heap itself. This pointer can be cached in *Heap::Free*, *Heap::Allocate*, or both. Once a new call to *Heap::Allocate* is made this cached pointer can be tested first to see if it is an appropriate size.

There are two common ways to search through the links for an allocation: first fit and best fit. First fit will return the user with the first piece of memory large enough to hold the allocation. Best fit will return a chunk of memory that came from a *HeapHeader* with the smallest size that is still large enough to hold the requested allocation size.

First fit can be preferable for cache coherency, as it tends to allocate from the beginning of the heap and so keeps things closer together in memory. Best fit may be preferable for keeping the heap as un-fragmented as possible.
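The two searches differ only in when they stop. Here is a small sketch of both over a list of headers; the function names *FindFirstFit* and *FindBestFit* are made up for this example, and the header layout is the one shown earlier:

```cpp
#include <cstddef>

struct HeapHeader
{
    HeapHeader* next;
    HeapHeader* prev;
    int size;
    bool allocated;
};

// First fit: stop at the first free link that is large enough
inline HeapHeader* FindFirstFit( HeapHeader* head, int size )
{
    for ( HeapHeader* h = head; h; h = h->next )
    {
        if ( !h->allocated && h->size >= size )
            return h;
    }

    return NULL;
}

// Best fit: visit every link and keep the smallest free one that is
// still large enough
inline HeapHeader* FindBestFit( HeapHeader* head, int size )
{
    HeapHeader* best = NULL;

    for ( HeapHeader* h = head; h; h = h->next )
    {
        if ( h->allocated || h->size < size )
            continue;

        if ( !best || h->size < best->size )
            best = h;
    }

    return best;
}
```

Best fit always walks the whole list, so it pays for its tighter packing with a slower search.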

The heap based allocator intends to fight memory fragmentation through fitting links to allocation sizes, and by merging adjacent free memory blocks. This type of fragmentation is called *external fragmentation*. Another type of memory fragmentation is called *internal fragmentation*.

Internal memory fragmentation occurs when an allocated piece of memory given to the user actually holds more memory than the user requested. The user is assumed to not know about this extra piece of memory. This can provide an advantage to the allocator: all allocations can be of a fixed size, and any allocation larger than this fixed size is denied.

This lets the allocator act like an array. When an allocation is requested an empty element can be returned to the user. Upon freeing a piece of memory, the element is simply marked as free and placed into a free list.

The free list is a linked list of array elements. The memory in the free elements themselves should be used to store the pointer of each subsequent free element.

Allocation and deallocation become constant in time complexity and there is zero external memory fragmentation. In this way external memory fragmentation is eliminated at the cost of accepting some internal memory fragmentation.

The term “pages” comes into play when the array is filled up. Once an array is full of allocated elements another array can be allocated. Once this array is filled up, another one is allocated. Each array (aka page) can be stored in a singly linked list of pages.

The free list itself can point across multiple pages without any problems.

A page containing only free elements can be deleted entirely, though this feature might not need to be supported.
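Here is a minimal sketch of a single page with its free list stored inside the free elements themselves. The class name *Page* and its template parameters are assumptions for this example, and linking several pages together is left out:

```cpp
#include <cstddef>

// One page: a fixed array of equally sized elements plus a free list
template < int ElementSize, int ElementCount >
class Page
{
private:
    union Element
    {
        Element* next;               // meaningful only while the element is free
        char storage[ ElementSize ]; // meaningful only while the element is allocated
    };

public:
    Page( ) : m_freeList( NULL )
    {
        // Push every element onto the free list, last to first, so that
        // the first element ends up at the head of the list
        for ( int i = ElementCount - 1; i >= 0; --i )
            Free( m_elements + i );
    }

    // Pops the head of the free list. Returns NULL when the page is full.
    void* Allocate( )
    {
        if ( !m_freeList )
            return NULL;

        Element* e = m_freeList;
        m_freeList = e->next;
        return e;
    }

    // Pushes the element back onto the head of the free list
    void Free( void* data )
    {
        Element* e = static_cast< Element* >( data );
        e->next = m_freeList;
        m_freeList = e;
    }

private:
    Element m_elements[ ElementCount ];
    Element* m_freeList;
};
```

Both operations touch only the head of the list, which is where the constant time complexity comes from.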

A paged allocator can also hold an array of singly linked lists of pages. Each element of this array can hold a list of pages that corresponds to a different element size. This can allow the paged allocator to fit different allocation requests into the most appropriate page list. A common tactic is to have pages that represent arrays with an element size of 2^N bytes, where N is usually at least 2, and smaller than some value K.
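Picking the page list for a request then comes down to rounding the size up to the next power of two. A small sketch, where the function name *PageListIndex* and its parameters are made up for this example (*minExponent* is the smallest N and *maxExponent* plays the role of K):

```cpp
// Returns the index of the page list whose element size (2^N bytes) is the
// smallest one that can hold size, or -1 if size is too large for any list.
// Assumes 0 <= minExponent and maxExponent <= 31.
inline int PageListIndex( int size, int minExponent, int maxExponent )
{
    for ( int n = minExponent; n < maxExponent; ++n )
    {
        if ( size <= (1 << n) )
            return n - minExponent;
    }

    return -1;
}
```

With N ranging from 2 up to (but not including) 8, a 5 byte request lands in the 8 byte list, and anything larger than 128 bytes would have to be handled by some other allocator.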

The biggest advantage of a paged allocator is zero external fragmentation. The internal fragmentation does make memory more non-homogeneous. This type of allocator will probably lower your cache line utilization. Cache line utilization would be how much memory in each cache line fetched from main memory to the CPU cache is actually used. Since internal fragmentation is a feature of a paged allocator, cache line utilization will probably suffer.

The unused memory in the pages can be reduced drastically on a per-application basis; if the users of the allocator are able to specify the element sizes of different page lists, then zero internal fragmentation can be achieved.

Instead of thinking of a paged allocator in terms of separate arrays, one might think of a simpler allocator that holds just a single array. If the elements within this array are of POD nature the array elements can be referenced by index. This lets the array grow or shrink in size as necessary, as newly sized arrays can still be accessed by an old index.

Whenever the user wants a pointer to an element they first give the array an index, and a pointer is returned. This pointer is never stored anywhere! Continuous translation from index to pointer occurs, which allows the internal array itself to be moved around in memory as necessary.
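A tiny sketch of the idea, using *std::vector* as the internal array that is free to move. The class name *IndexedArray* is made up for this example:

```cpp
#include <vector>

template < typename T >
class IndexedArray
{
public:
    // Stores a copy of value and returns its index. The internal array may
    // be reallocated (and therefore moved) by this call.
    int Add( const T& value )
    {
        m_elements.push_back( value );
        return static_cast< int >( m_elements.size( ) ) - 1;
    }

    // Translates an index into a pointer. The pointer should be used right
    // away and not stored, since the next Add may move the array.
    T* Get( int index )
    {
        return &m_elements[ index ];
    }

private:
    std::vector< T > m_elements;
};
```

The index stays valid no matter how many times the array grows, while a stored pointer would not.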

Users might need a little more power to refer to elements than a simple integer. Some type of handle might be needed to translate from index to pointer. Read more about handles here.

Given these three types of allocators an application should have all the variety of memory allocation necessary to run with pretty good performance. More advanced allocation techniques definitely exist, and some are just combinations of the three basic allocators presented in this article.

Each allocator can be quite simple in isolation! I myself implemented a stack in about 100 lines, a paged allocator in 150, and a heap in about 250 lines of C++ code.

Further reading might include topics such as: cache coherency, memory alignment, garbage collection, virtual memory, page files (operating system pages).

The dot product comes from the law of cosines. Here’s the formula:

\begin{equation}

c^2 = a^2 + b^2 - 2ab * cos\;\gamma

\label{eq1}

\end{equation}

This is just an equation that relates the cosine of an angle within a triangle to its various side lengths *a*, *b* and *c*. The Wikipedia page (link above) does a nice job of explaining this. Equation \eqref{eq1} can be rewritten as:

\begin{equation}

c^2 - a^2 - b^2 = -2ab * cos\;\gamma

\label{eq2}

\end{equation}

The right hand side of equation \eqref{eq2} is interesting! Let’s say that instead of writing the equation with side lengths *a*, *b* and *c*, it is written with two vectors: *u* and *v*. The third side can be represented as *u - v*. Re-writing equation \eqref{eq2} in vector notation yields:

\begin{equation}

|u\;-\;v|^2 - |u|^2 - |v|^2 = -2|u||v| * cos\;\gamma

\label{eq3}

\end{equation}

Which can be expressed in scalar form as:

\begin{equation}

(u_x\;-\;v_x)^2 + (u_y\;-\;v_y)^2 + (u_z\;-\;v_z)^2\;- \\

(u_{x}^2\;+\;u_{y}^2\;+\;u_{z}^2) - (v_{x}^2\;+\;v_{y}^2\;+\;v_{z}^2)\;= \\

-2|u||v| * cos\;\gamma

\label{eq4}

\end{equation}

By expanding each squared difference, for example (u_x - v_x)^2 = u_x^2 - 2 u_x v_x + v_x^2, crossing out the squared terms that cancel, and dividing each side of the equation by -2, this ugly equation can be turned into a much more approachable version:

\begin{equation}

u_x v_x + u_y v_y + u_z v_z = |u||v| * cos\;\gamma

\label{eq5}

\end{equation}

Equation \eqref{eq5} is the equation for the dot product. If both *u* and *v* are unit vectors then the equation will simplify to:

\begin{equation}

dot(\;\hat{u},\;\hat{v}\;) = cos\;\gamma

\label{eq6}

\end{equation}

If *u* and *v* are **not** unit vectors equation \eqref{eq5} says that the dot product between both vectors is equal to *cos( γ )* that has been scaled by the lengths of *u* and *v*. This is a nice thing to know! For example: the squared length of a vector is just itself dotted with itself.
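As a quick sketch of both observations in code: the squared length falls straight out of the dot product, and equation \eqref{eq5} can be rearranged to recover the cosine of the angle between two vectors of any length. The function names here are made up for this example, and *CosAngle* assumes neither vector has zero length:

```cpp
#include <cmath>

struct Vec3 { float x, y, z; };

inline float Dot( const Vec3& a, const Vec3& b )
{
    return a.x * b.x + a.y * b.y + a.z * b.z;
}

// The squared length of a vector is just itself dotted with itself
inline float LengthSquared( const Vec3& v )
{
    return Dot( v, v );
}

// cos( gamma ) = dot( u, v ) / ( |u| |v| ). Assumes u and v are non-zero.
inline float CosAngle( const Vec3& u, const Vec3& v )
{
    return Dot( u, v ) / std::sqrt( LengthSquared( u ) * LengthSquared( v ) );
}
```

Perpendicular vectors give a cosine of zero and vectors pointing the same way give one, regardless of how long either vector is.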

If *u* is a unit vector and *v* is not, then *dot( u, v )* will return the distance that *v* travels in the *u* direction. This is useful for understanding the plane equation in three dimensions (or any other dimension):

\begin{equation}

ax\;+\;by\;+\;cz\;-\;d\;=\;0

\end{equation}

The normal of a plane would be the vector: { *a, b, c* }. If this normal is a unit vector, then *d* represents the distance to the plane from the origin. If the normal is **not** a unit vector then *d* is scaled by the length of the normal.

To compute the distance of a point to this plane any point can be substituted into the plane equation, assuming the normal of the plane equation is of unit length. This operation is computing the distance along the normal a given point travels. The subtraction by *d* can be viewed as “translating the plane to the origin” in order to convert the distance along the normal, to a distance to the plane.
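In code this amounts to a single dot product and a subtraction. A minimal sketch, assuming a hypothetical *Plane* struct that stores a unit length normal and the *d* term from the plane equation above:

```cpp
struct Vec3 { float x, y, z; };

inline float Dot( const Vec3& a, const Vec3& b )
{
    return a.x * b.x + a.y * b.y + a.z * b.z;
}

// Plane in the form dot( normal, point ) - d = 0.
// The normal is assumed to be of unit length.
struct Plane
{
    Vec3 normal;
    float d;
};

// Signed distance: positive on the side the normal points towards,
// negative on the other side, zero on the plane itself
inline float DistancePtPlane( const Plane& plane, const Vec3& p )
{
    return Dot( plane.normal, p ) - plane.d;
}
```

The result is signed, so taking its absolute value gives the plain distance when the side of the plane doesn’t matter.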

The simplest way to compute the distance to a plane (or line in 2D) is to place a point into the plane equation. In 2D this means first computing the plane equation, if all we have are two points to represent the line. In 3D a new tactic is needed altogether, since planes in 3D are not lines.

In my own experience I’ve found it most common to have a line in the form of two points in order to represent the parametric equation of a line. Two points can come from a triangle, a mesh edge, or two pieces of world geometry.

To set up the problem, let’s outline the function to be created like so:

    float DistancePtLine( Vec2 a, Vec2 b, Vec2 p )
    {
    }

The two parameters *a* and *b* are used to define the line itself. The direction of the line would be the vector *b – a*.

After a brief visit to the Wikipedia page for this exact problem I quickly wrote down my own derivation of the formula they have on their page. Take a look at this image I drew:

The problem of finding the distance of a point to a line makes use of finding the vector *d* that points from *p* to the closest point on the line *ab*. From the above picture: a simple way to calculate this vector would be to subtract away the portion of *a – p* that travels along the vector *ab*.

The part of *a – p* that travels along *ab* can be calculated by projecting *a – p* onto *ab*. This projection is described in the previous section about the dot product intuition.
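
Written out, with *n = b - a* as the direction of the line and *c* as the projected portion (both names chosen here for illustration), the projection and the resulting vector *d* are:

\begin{equation}
c = \frac{dot(\;a - p,\;n\;)}{dot(\;n,\;n\;)}\;n
\end{equation}

\begin{equation}
d = (a - p) - c
\end{equation}

Dividing by *dot( n, n )* accounts for *n* not being a unit vector, so no square root is needed to perform the projection.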

Given the vector *d*, the distance from *p* to *ab* is just *sqrt( dot( d, d ) )*. The *sqrt* operation can be omitted entirely to compute a distance squared. Our function may now look like:

    float DistancePtLine( Vec2 a, Vec2 b, Vec2 p )
    {
        Vec2 n = b - a;
        Vec2 pa = a - p;
        Vec2 c = n * (Dot( pa, n ) / Dot( n, n ));
        Vec2 d = pa - c;
        return sqrt( Dot( d, d ) );
    }

This function is quite nice because it will never return a negative number: the value passed to *sqrt* is a vector dotted with itself. There is a popular version of this function that instead subtracts the result of a division from a squared length. Given a very small line segment as input for *ab*, floating point error makes it entirely possible for the following function to return a negative number:

    float SqDistancePtLine( Vec2 a, Vec2 b, Vec2 p )
    {
        Vec2 ab = b - a, ap = p - a;
        float e = Dot( ap, ab );
        return Dot( ap, ap ) - e * e / Dot( ab, ab );
    }

It’s very misleading for a function called “square distance” or “distance” to return a negative number. Passing the result of this function to a *sqrt* function call can result in *NaN*s and be really nasty to deal with.
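
If the subtraction-based version has to be used anyway, one possible guard is to clamp the result to zero before taking the square root. This is only a sketch: the wrapper name `DistancePtLineClamped` is made up for this example, and the free-function `operator-` is an assumption about how `Vec2` is defined.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

struct Vec2 { float x, y; };

Vec2 operator-( const Vec2& a, const Vec2& b )
{
    Vec2 v = { a.x - b.x, a.y - b.y };
    return v;
}

float Dot( const Vec2& a, const Vec2& b )
{
    return a.x * b.x + a.y * b.y;
}

// The subtraction-based formula: rounding error can push the result
// slightly below zero when the two terms are nearly equal.
float SqDistancePtLine( Vec2 a, Vec2 b, Vec2 p )
{
    Vec2 ab = b - a, ap = p - a;
    float e = Dot( ap, ab );
    return Dot( ap, ap ) - e * e / Dot( ab, ab );
}

// Hypothetical wrapper: clamp to zero so sqrt never sees a negative input.
float DistancePtLineClamped( Vec2 a, Vec2 b, Vec2 p )
{
    assert( Dot( b - a, b - a ) > 0.0f ); // a and b must be distinct points
    float sq = SqDistancePtLine( a, b, p );
    return std::sqrt( std::max( sq, 0.0f ) );
}
```

The clamp hides the symptom rather than removing the rounding error, so the version that dots a vector with itself is still the safer default.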

A full discussion of barycentric coordinates is way out of scope here. However, they can be used to compute the distance from a point to a line *segment*. The segment portion of the code just clamps a point projected onto the line within the bounds of *a* and *b*.

Assuming readers are a little more comfortable with the dot product than I was when I first started programming, the following function should make sense:

    float SqDistancePtSegment( Vec2 a, Vec2 b, Vec2 p )
    {
        Vec2 n = b - a;
        Vec2 pa = a - p;

        float c = Dot( n, pa );

        // Closest point is a
        if ( c > 0.0f )
            return Dot( pa, pa );

        Vec2 bp = p - b;

        // Closest point is b
        if ( Dot( n, bp ) > 0.0f )
            return Dot( bp, bp );

        // Closest point is between a and b
        Vec2 e = pa - n * (c / Dot( n, n ));

        return Dot( e, e );
    }

This function can be adapted pretty easily to compute the closest point on the line segment to *p* instead of returning a scalar. The idea is to use the vector from *p* to the closest position on *ab* to project *p* onto the segment *ab*.
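
One way the adaptation might look is sketched below. The name `ClosestPtSegment`, the free-function operators and the assert are assumptions made for this example. The branch structure mirrors the distance function.

```cpp
#include <cassert>

struct Vec2 { float x, y; };

Vec2 operator+( const Vec2& a, const Vec2& b ) { Vec2 v = { a.x + b.x, a.y + b.y }; return v; }
Vec2 operator-( const Vec2& a, const Vec2& b ) { Vec2 v = { a.x - b.x, a.y - b.y }; return v; }
Vec2 operator*( const Vec2& a, float s ) { Vec2 v = { a.x * s, a.y * s }; return v; }

float Dot( const Vec2& a, const Vec2& b )
{
    return a.x * b.x + a.y * b.y;
}

// Returns the closest point on segment ab to p instead of a distance.
Vec2 ClosestPtSegment( Vec2 a, Vec2 b, Vec2 p )
{
    Vec2 n = b - a;
    Vec2 pa = a - p;

    float c = Dot( n, pa );

    // p projects before a, so the closest point is a
    if ( c > 0.0f )
        return a;

    Vec2 bp = p - b;

    // p projects past b, so the closest point is b
    if ( Dot( n, bp ) > 0.0f )
        return b;

    // Hypothetical guard: a == b would divide by zero below
    assert( Dot( n, n ) > 0.0f );

    // e points from p to the closest point on the line through a and b
    Vec2 e = pa - n * (c / Dot( n, n ));

    return p + e;
}
```

For *a* = { 1, 1 }, *b* = { 5, 2 } and *p* = { 3, 3 } the interior branch is taken, and the returned point lies on the segment between *a* and *b*.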

The above function works by computing barycentric coordinates of *p* relative to *ab*. Because the dot products are never divided by *Dot( n, n )*, the coordinates are scaled by the squared length of *ab*, so the second if statement must be adapted slightly. If the projection were divided by *Dot( n, n )*, then the second if statement would be a comparison with the value 1, which should make sense for barycentric coordinates.
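
To make the comparison with 0 and 1 explicit, here is a sketch of the same computation using a normalized parameter *t*, where *t* = 0 at *a* and *t* = 1 at *b*. The function name `SqDistancePtSegmentT` and the operators are assumptions for this example. Unlike the function above, it always pays for the division.

```cpp
#include <cassert>

struct Vec2 { float x, y; };

Vec2 operator+( const Vec2& a, const Vec2& b ) { Vec2 v = { a.x + b.x, a.y + b.y }; return v; }
Vec2 operator-( const Vec2& a, const Vec2& b ) { Vec2 v = { a.x - b.x, a.y - b.y }; return v; }
Vec2 operator*( const Vec2& a, float s ) { Vec2 v = { a.x * s, a.y * s }; return v; }

float Dot( const Vec2& a, const Vec2& b )
{
    return a.x * b.x + a.y * b.y;
}

// Squared distance from p to segment ab. The pair ( 1 - t, t ) is the
// barycentric coordinate of the projected point relative to a and b.
float SqDistancePtSegmentT( Vec2 a, Vec2 b, Vec2 p )
{
    Vec2 n = b - a;
    Vec2 ap = p - a;

    float nn = Dot( n, n );
    assert( nn > 0.0f ); // hypothetical guard: a and b must be distinct

    // Dividing by dot( n, n ) removes the scaling by the length of ab
    float t = Dot( ap, n ) / nn;

    // Clamp to the segment: below 0 is before a, above 1 is past b
    if ( t < 0.0f ) t = 0.0f;
    if ( t > 1.0f ) t = 1.0f;

    Vec2 closest = a + n * t;
    Vec2 e = closest - p;

    return Dot( e, e );
}
```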

Here’s a sample program you can try out yourself:

    #include <cstdio>

    struct Vec2
    {
        float x;
        float y;

        // Marked const so it also works on const vectors
        const Vec2 operator-( const Vec2& a ) const
        {
            Vec2 v;
            v.x = x - a.x;
            v.y = y - a.y;
            return v;
        }

        const Vec2 operator*( float a ) const
        {
            Vec2 v;
            v.x = x * a;
            v.y = y * a;
            return v;
        }
    };

    float Dot( const Vec2& a, const Vec2& b )
    {
        return a.x * b.x + a.y * b.y;
    }

    int main( )
    {
        Vec2 a, b, p;
        a.x = 1.0f; a.y = 1.0f;
        b.x = 5.0f; b.y = 2.0f;
        p.x = 3.0f; p.y = 3.0f;

        // Project pa onto n, then subtract the projection away
        Vec2 n = b - a;
        Vec2 pa = a - p;
        Vec2 c = n * (Dot( n, pa ) / Dot( n, n ));
        Vec2 d = pa - c;
        float d2 = Dot( d, d );

        printf( "Distance squared: %f\n", d2 );
    }

The output is: “Distance squared: 2.117647”.
