Compute Basis with SIMD

A while back Erin Catto made a cool post about his short function to compute a basis given a vector. The cool thing was that the basis is axis aligned if the input vector is axis aligned.

Catto noted that the function can be converted to branchless SIMD by using a select operation. So, a while ago I sat down and wrote the function, and only used it once. It didn’t really seem all too helpful to have a SIMD version, but it was fun to write anyway (I made use of Microsoft’s __Vectorcall):



Single File Libraries – |

Sean T. Barrett makes a lot of very cool single-file libraries in C. Recently he’s also been making another big list of other single-file (or two files with src/header) that he likes.

The great thing about Sean’s libraries is that they contain functions that do exactly what they intend to accomplish, without doing anything more or less. They don’t contain any extra complexity other than specifically what is needed to get the job done. This helps when preventing his libraries from creating external dependencies, meaning his libs can be deployed as a single file inclusion into a pre-existing project, without linking to external libraries or requiring additional headers.

These qualities make libraries like Sean’s extremely easy to hookup and get going. If you want to learn how to make quality libraries just look at any of the STB libraries.

Writing Libraries

Sean writes libraries in an old version of Visual Studio (I think VC6?), and codes in C. He also keeps all of his code inside of a single file while writing it — I’ve seen this on For the rest of us that aren’t as crazy or hardcore as Sean we can just use is a Perl script written by an unknown author (probably written by Richard Mitton). I found the file inside of Mitton’s single-file library “tigr”, which stands for Tiny Graphics Library. Mitton uses to recursively include a bunch of files into one larger c file. Check out the script yourself:

The idea is to make a dummy C file that includes other source files, like this:

Then can be run from the command line (assuming you have Perl installed) like so:

The script outputs some nice and quick text to stdout displaying which files were visited and packages the entire source tree into a single source file. The output file contains nice comments indicated the beginning and ending of files, like this excerpt from one of my own single-file libraries:


Mitton writes about the old joys of using the incbin command in assembly, where the assembler would embed the binary contents of a file straight into your program. It sounds like this gets dubious when dealing with linkers nowadays (I’ve had some really long link times by including large files into source code…), though it still happens from time to time in small single-file libraries.

An example is in Mitton’s tigr library where he uses a makeshift perl script “” to embed shaders and a png file (containing raster font glyphs) straight into C source code. This concept can also be seen in Omar Ocornut’s imgui library where some font files are embedded into the source. Omar seems to have used a small C program to generate the binary data as C source.

Again, I’m not sure who originally wrote this script but it was probably Mitton himself.

Finding Heap Corruption Tip on Windows

If one is fortunate enough to have access to <crtdbg.h> from Microsoft’s Visual C++ header, when running in debug mode there are a few _Crt*** functions that can  be pretty useful. My favorite ones are _CrtCheckMemory and _CrtDbgBreak (which is just a macro for __debugbreak in my case).

This function checks padding bytes around allocations made by the debug heap for buffer overruns. Usually this is not too necessary if you just do bounds checking yourself in debug mode, though sometimes old code crops up that you didn’t even write yourself, and needs debugging.

Lucky enough for me I had a good guess as to where the corruption was coming from as I had some call stacks in-tact from my crash. Finding the heap corruption was easy in my particular case since the code is deterministic and can be re-run quickly. Sprinkling this macro around will let me back-trace through the code to find exactly where the first instance of corruption happened:

A few minutes later I ended up here:

Since my breakpoint activated just after this function call, but not before, I knew the corruption came from this pesky function. Turns out it was a weird typo that was really sneaky, just barely causing an off-by-one error.

See if you can spot the error below. Here’s the answer (text is white, so highlight it to read): on line 8 chromSize needs to be changed to i.

Good If Statements – Self Documenting Code

Here’s a quick trick for documenting those pesky if-statements, especially when littered with complicated mathematics.

Here’s an example from Timothy Master’s “Pragmatic Neural Network Recipes in C++” where the author needed to test some interval bounds to see if the conditions were appropriate to perform a parabolic estimate of a step-size. Sounds, and looks, complicated:

In my opinion this author writes extremely good code. In the above example the details of the code are very well documented, however that’s just due to wonderful (albeit dubiously placed) comments. When modified slightly the code can still maintain the same readability and perhaps gain a little clarity:

The trick is to just name those little integer values that collapse down to a 1 or 0 (the comparison operators in C, like && and ||, take their arguments and return a 1 or 0). One of the most common applications of this kind of naming is when there’s a complex end condition in an algorithm. Often times this kind of termination criterion can just be called “done”:



Circular Orbits | Planetary Motion | NBody Simulation

I took a few hours yesterday to start on a simulation for a job application. Part of the simulation involves trying to figure out how to create stable nbody orbits, at least stable for some amount of time to look interesting.

Turns out that the mathematics beyond constructing a circular orbit of just two bodies is a bit far-fetch’d. We can see in this link that solving for the velocity of each planet is very straightforward. Let me take a screenshot of the final equations here:


To me this makes a lot of sense and matches my own derivation, albeit with a slightly different final form (just due to some algebra differences). I plug in the equation into my simulation and hit run, however there’s just not enough velocity to keep an orbit. See for yourself (the simulation is slowed down a lot compared to later gifs):


Each white circle is a body (like a planet) and the red circle is the system’s barycenter.

So I tinkered a bit and found out the velocity is too dim by a factor of sqrt( 2 ). My best guess is that the equations I’m dealing with have some kind of approximation involved where one of the bodies is supposed to be infinitely large. I don’t quite have the time to look into this detail (since I have much more exciting things to finish in the simulation), so I just plugged in my sqrt( 2 ) factor and got this result:


Here’s the function I wrote down to solve for orbit velocity in the above gif:

I’m able to run the simulation for thousands of years without any destabilization.

If any readers have any clue as to why the velocity is off by a factor of rad 2 please let me know, I’d be very interested in learning what the hiccup was.

Next I’ll try to write some code for iteratively finding systems of more than 2 bodies that have a stable configuration for some duration of time. I plan to use a genetic algorithm, or maybe something based off of perturbation theory, though I think a genetic algorithm will yield better results. Currently the difficulty lies in measuring the system stability. For now the best metrics I’ve come up with is measuring movement of the barycenter and bounding the entire simulation inside of a large sphere. If the barycenter moves, or if bodies are too far from the barycenter I can deem such systems as unstable and try other configurations.

One particularly interesting aspect of the barycenter is if I create a group of stationary bodies and let them just attract each other through gravity, the barycenter seems to always be dead stationary! The moment I manually initialize a planet with some velocity the barycenter moves, but if they all start at rest the barycenter is 100% stable.


A Star

Recently I’ve found a nice piece of pseudo code for implementing A Star after searching through a few lesser or incorrect pseudo code passages:

And here is the Pop function pseudo code (written by me, so it probably has a small error here or there):

My favorite thing about the pseudo code is that the closed list can be implemented with just a flag. The open list becomes the most complicated part of the algorithm. Should the open list be a sorted array, an unsorted array, a binary heap? The answer largely depends on much more memory you need to traverse.

If a small portion of memory needs to be searched all dynamic memory can be allocated up-front on one shot. Otherwise bits of memory should probably be allocated up-front, and more as necessary during the algorithms run.

Just yesterday I implemented AStar in C where my full header file looked like:

In my internal C file I only have < 150 lines of code, including some file-scope variables and math functions. Implementation was nearly a one to one transcription of the above pseudo code (so those who are recursion impaired like myself shouldn’t have any problems). This style may not be thread-safe, but hey, most AI related coded can only be done in serial anyway. My grid size was maxed out at 20×15, so pre-allocating memory for your entire search area may not be so practical as it was for me.

Still, I hope this post can provide some bits of intuition that are useful to someone.

The Concept of Polymorphism

For a while I thought of many computer science terms as useless hype. For example, as said by Jon Blow in one of his recent videos, some terms can be vague or hard to recollect. Buzzwords.

Polymorphism, semantics, encapsulation, instantiate, interface, API, functional, abstraction, indirection, blah blah blah. Reading too many of these words make my eyes glaze over.

In Compilers, Principles, Techniques and Tools by Aho, Sethi and Ullman has a keen description of polymorphism — at least it really stuck with me. Here with me I have the 1988 reprint so it actually contains C code (unlike the new “shiny” version), and in it states:

“An ordinary procedure allows the statements in its body to be executed with arguments of fixed types; each time a polymorphic procedure is called, the statements in its body can be executed with arguments of different types. The term “polymorphic” can also be applied to any piece of code that can be executed with arguments of different types. so we can talk of polymorphic functions and operators as well.

Built-in operators for indexing arrays, applying functions, and manipulating pointers are usually polymorphic because they are not restricted to a particular kind of array, function, or pointer. For example, the C reference manual states about the pointer operator &: “If the type of the operand is ‘…’, the type of the result is ‘pointer to …’.” Since any type can be substituted for “…” the operator & in C is polymorphic.”

So, some of these buzzwords are just misrepresented.

It seems the idea of polymorphism is useful to express, in code, algorithms that manipulate data structures. If the data being transformed is a data structure, then the data within the structure can sometimes be irrelevant.

For example, perhaps we have a math library that runs matrix operations on 32-bit floats. The library is used extensively in a large project. One day some strange numeric bug creeps into a complex algorithm. It would be helpful to be able to swap all 32-bit floats (the data within the “structure”, where the structure is the project) to 64-bit floats to see if more precision affects the numerically-related bug. If polymorphism is supported, perhaps through function overloads or compiler-supported binary operators, this task might be trivial.

Another example is the need to compute the height of a tree.

Another example is to find a circular dependency in a dependency graph.

Freelist Concept

A freelist is a way of retrieving some kind of resource in an efficient manner. Usually a freelist is used when a memory allocation is needed, but searching for a free block should be fast. Freelists can be used inside of general purpose allocators, or embedded directly into an optimized algorithm.

Lets say we have an array of elements, where each element is 16 bytes of memory. Our array has 32 elements. The program that arrays resides in needs to request 16 byte elements, use them, and later give them back; we have allocation of elements and deallocation of elements.

The order the elements are allocated is not related in any way to order of deallocation. In order for deallocation to be a fast operation the 16 byte element needs to be in a state such that a future allocation be handed this element to be reused.

A singly linked list can be used to hold onto all unused and free elements. Since the elements are 16 bytes each this is more than enough memory to store a pointer, or integer index, which points to the next block in the free list. We can use the null pointer, or a -1 index to signify the end of the freelist.

Allocating and deallocating can now look like:

Setting up the memory* will take some work. Each element needs to be linked together somehow, like through a pointer or integer index. If no more elements are available then more arrays of size 32 can be allocated — this means our memory is being managed with the style of a “paged allocator”, where each array can be thought of as a page.

The freelist is an important concept that can be embedded ad-hoc into more complex algorithms. Often times it is important for little pieces of software to expose a very tiny C-like interface, usually just a function or two. Having these softwares self-contain their own internal freelists is one way to achieve a simple interface.

Example of Hiding the Freelist

For example say we are computing the convex hull of a point-set through the Quick Hull algorithm. The hypothetical algorithm exposes an interface like this:

This QHull function does no explicit memory allocation and forces the user to allocate an appropriate amount of memory to work with. The bounds of this memory (how big it needs to be for the algorithm’s worst case scenario) is calculated by the ComputeMemoryBound function.

Inside of QHull often times the hull is expanded and many new faces are allocated. These faces are held on a free list. Once new faces are made, old ones are deleted. These deleted faces are pushed onto the free list. This continues until the algorithm concludes, and the user does not need to know about the details of the embedded memory management of the freelist.

Convex hull about to expand to point P. The white faces will be deleted. The see-through faces will be allocated.

A convex hull fully expanded to point P. All old faces were deleted.

The above images were found at this address:


Path of Exile – Righteous Fire Calculator in Lua

Lately I’ve been playing Path of Exile and created a Righteous Fire character. Since it’s important to make sure you have enough life regeneration and fire resistance to play with Righteous Fire I wrote a stats calculator in Lua. Input a few values at the top paragraph (change values as necessary for your character), and then run the code here:

The trickier part is that Energy Shield contributes to the life degeneration of the player when running Righteous Fire.


Printing Pretty Ascii Trees

I’m terrible at writing recursive algorithms. Absolutely horrible. So for practice I decided to try printing a binary tree where all branches have two children, all in the console. It turns out this is a pretty useful quick-n-dirty debugging tool when creating trees.

I really like formatting of Tree for linux so I decided to steal it.

After 3 straight hours of trying out stupid stuff I finally come to a working solution (my earlier attempts were around 150 lines of code, the solution was like 70):

Which will print cool trees like this one:

I then realized that I could use singly linked lists to represent trees with any number of children, which is totally necessary for the parser I am currently writing. So after 30 minutes I was able to form version 2 for arbitrary trees; this version is surprisingly right around the same length and complexity as the previous version:

And will print trees like this monster:

I also ended up finding a pretty cool implementation by a random stack-overflow user that prints trees in horizontal format (which turns out to be much more limited for the purposes of a console, since it can’t print larger trees, but might look a lot prettier for small trees):

It may be preferable to upgrade to the limited edition Box Drawing characters for best viewing pleasure:



In which case you’ll probably want to use some specific character codes (I used 195, 196 and 192 in the above image).