Category Archives: Algorithm

Game Localization and UTF-8

After some research I have decided a pretty good way to localize your game is to store all text in text files, in UTF-8 format. UTF-8 is super widely used and 100% backwards compatible with ASCII. Also the clever design of the UTF-8 encoding lets programmers treat UTF-8 buffers as naive byte arrays. UTF-8 buffers can be naively sorted and still retain valid results, as an example!


https://en.wikipedia.org/wiki/UTF-8

However sometimes UTF-16 is needed to deal with Windows APIs… So some conversions from UTF-8 to UTF-16 can be pretty useful. To make things worse there doesn’t exist good information on encoding/decoding UTF-8 and UTF-16, other than the original RFC documents. So in swift fashion I hooked up a tiny header with the help of Mitton’s UTF-8 encoder/decoder from tigr.

The result is tinyutf.h, yet another single file header library. Perfect for doing any string processing stuff, and not overly complicated in any way. Just do a quick google search for utf8 string libraries and every single one of them is absolutely nuts. They are all over-engineered and heavyweight. Probably because they all attempt to be general-purpose string libraries, which are debatably dumb to begin with.

The hard part of localization is probably the rendering of all kinds of glyphs. Rendering aside, localization can be as simple and storing original English translations in text files. A localizer (translator) can read the English and swap out all English phrases for utf8 encoded text symbols by typing the phrases in their native language. As long as localization is planned from project start, it can be very easy to support!

TwitterRedditFacebookShare

2D Collision Detection Library – tinyc2

Collision detection is pretty hard. Making 2D games is pretty fun. Wouldn’t it be nice to make a cool 2D game without fiddly around forever with collisions? With the single-file C header library tinyc2 now it’s possible to focus on the fun parts of development rather than twiddling with endless collision bugs, or ripping collision detection out of pre-made physics engines.

c2

Texture Atlases

Texture atlases for 2D games is a great optimization for batching together tons of different sprites (especially quads) with a very few number of draw calls. Making them is a pain. For pixel art or other 2D art, DXT compression is often not a great choice due to the lossyness. One good way to make atlases for 2D games is with the PNG format. Single-file loaders and savers can be used to load and save PNGs with a single function call each.

Here’s an example with the popular stb_image, along with stb_image_write:

This requires two different headers, one for loading and one for saving. Additionally the author of these headers also has a bin packing header, which could be used to make a texture atlas compiler.

However I have created a single-file header called tinydeflate. It does PNG loading, saving, and can create a texture atlas given an array of images. Internally it contains its own bin packing algorithm for sorting textures and creating atlases. Here’s an example:

The above example (if given a few more images) could output an atlas like so:

atlas

Followed by a very easy to parse text formatted file containing uv information for each atlas:

The nice thing about tinydeflate is the function to create atlases has a little bit of anti UV bleeding code inside of it. Most places on the internet will tell you to solve texture atlas bleeding by adding padding pixels around your textures. This might be important for 3D applications or when mip-mapping is important, however, I don’t have this kind of 3D experience. For 2D pixel art games it seems squeezing UVs ever so gently inside the borders of a texture works quite well, and so tinydeflate takes this approach. No complicated padding necessary.

And finally here’s an older video I did with some more information on writing the code yourself to do some of the above tasks.

Essentials of Software Engineering – With a Game Programming Focus

I’ve been commissioned to write a big document about what I think it means, and what is necessary, to become a good software engineer! If anyone finds the document interesting please do email me (email on my resume) with any questions or comments, even if you hate it and deeply disagree with me! This is the first release so some sections may be a bit skimpy and I might have some false info in there here and there.

I’m a fairly young engineer so please do take all my opinions with a healthy dose of skepticism. Nonetheless I hope the document is useful for someone! Many of my opinions come from my interpretation of much more experienced engineers I’ve come in direct contact with, and this PDF is sort of a creation of what others taught me.

As time goes on I may make edits or add new chapters, when I do I’ll modify the version number (just above the table of contents). Here is the PDF:

Download (PDF, Unknown)

Preprocessed Strings for Asset IDs

Mick West posted up on his site a really good overview of some different methods for hashing string ids and gave good motivation for optimizing this area early on in a project. Please do review his article as it’s a prerequisite to this post, and his article is just really good.

I’ve been primarily concerned with memory management of strings as they are extremely hairy to work with. For me specifically I’ve ruled out the option of a string class — they hide important details and it’s too easy to write poor (but functional) code with them. This is just my opinion. Based on that opinion I’d like to achieve these points:

  • Avoid all dynamic memory allocation, or de-allocation
  • Avoid complicated string data structures
  • Little to no run-time string traversals (like strcmp)
  • No annoying APIs that clutter the thought-space while writing/reading code
  • Allow assets to refer to one another while on-disk or in-memory without complicated pointer translations

There’s a lot I wanted to avoid here, and for good reason. If code makes use of dynamic memory, complicated data strctures, etc. that code is likely to suck in terms of both performance and maintenance. I’d like less features, less code, and some specific features to solve my specific problem: strings suck. The solution is to not use strings whenever possible, and when forced to use strings hide them under the rug. Following Jason Gregory’s example he outlined about Naughty Dog’s code base from his book “Game Engine Architecture 2nd Ed” I implemented the following solution, best shown via gif:

anim

The SID (string id) is a macro that looks like (along with a typedef):

I’ve implemented a preprocessor in C that takes an input file, finds the SID macro instances, reads the string, hashes it and then inserts the hash along with the string stored as comment. Pretty much exactly what Mick talked about in his article.

Preprocessing files is fairly easy, though supporting this function might be pretty difficult, especially if there are a lot of source files that need to be pre-processed. Modifying build steps can be risky and sink a ton of time. This is another reason to hammer in this sort of optimization early on in a project in order to reap the benefits for a longer period of time, and not have to adjust heavy laid-in-stone systems after the fact.

Bundle Away the Woes

I’ve “stolen” Mitton’s bundle.pl program for use in my current project to recursively grab source files and create a single unified cpp file. This large cpp can then be fed to the compiler and compiled as a “unity build”. Since the bundle script looks for instances of the “#include” directive code can be written in almost the exact same manner as normal C++ development. Just make CPP files, include headers, and don’t worry about it.

The only real gotcha is if someone tries to do fancy inclusions by defining macros outside of files that affect the file inclusion. Since the bundle script is only looking for the #include directive (by the way it also comments out unnecessary code inclusions in the output bundled code) and isn’t running a full-blown C preprocessor, this can sometimes cause confusion.

It seems like a large relief on the linker and leaves me to thinking that C/C++ really ought to be used as single-translation unit languages, while leaving the linker mostly for hooking together separate libraries/code bases…

Compiling code can now look more or less like this:

First collect all source into a single CPP, then preprocess the hash macros, and finally send the rest off to the compiler. Compile times should shrink, and I’ve even caught wind that modern compilers have an easier time with certain optimizations when fed only a single file (rumors! I can’t confirm this myself, at least not for a while).

Sweep it Under the Rug

Once some sort of string id is implemented in-game strings themselves don’t really need to be used all too often from a programmer’s perspective. However for visualization, tools, and editors strings are essential.

One good option I’ve adopted is to place strings for these purposes into global table in designated debug memory. This table can then be turned off or compiled away whenever the product is released. The idea I’ve adopted is to allow tools and debug visualization to use strings fairly liberally, albeit they are stored inside the debug table. The game code itself, along with the assets, refer to identifiers in hash-form. This allows product code to perform tiny translations from fully hashed values to asset indices, which is much faster and easier to manage compared to strings.

This can even be taken a step further; if all tools and debug visualizations are turned off and all that remains is a bunch of integer hash IDs, assets can then be “locked” for release. All hashed values can be translated directly into asset IDs such that no run-time translation is ever needed. For me specifically I haven’t quite thought how to implement such a system, and decided this level of optimization does not really give me a significant benefit.

Parting Thoughts

There are a couple of downsides to doing this style of compile-time preprocessing:

  • Additional complexity in the build-step
  • Layer of code opacity via SID macro

Some benefits:

  • Huge optimization in terms of memory usage and CPU efficiency
  • Can run switch statement on SID strings
  • Uniquely identify assets in-code and on-disk without costly or complicated translation

If the costs can be mitigated through implementing some kind of code pre-processor/bundler early on then it’s possible to be left with just a bunch of benefits :)

Finally, I thought it was super cool how hashes like djb2 and FNV-1a use an initializer value to start the hashing, typically a carefully chosen prime. This allows to hash a prefix string, and then feed the result off to hash the suffix. Mick explains this in his article this idea of combining hashed values as a useful feature for supporting tools and assets. This can be implemented both at compile or run-time (though I haven’t quite thought of a need to do this at compile-time yet):

 

Good If Statements – Self Documenting Code

Here’s a quick trick for documenting those pesky if-statements, especially when littered with complicated mathematics.

Here’s an example from Timothy Master’s “Pragmatic Neural Network Recipes in C++” where the author needed to test some interval bounds to see if the conditions were appropriate to perform a parabolic estimate of a step-size. Sounds, and looks, complicated:

In my opinion this author writes extremely good code. In the above example the details of the code are very well documented, however that’s just due to wonderful (albeit dubiously placed) comments. When modified slightly the code can still maintain the same readability and perhaps gain a little clarity:

The trick is to just name those little integer values that collapse down to a 1 or 0 (the comparison operators in C, like && and ||, take their arguments and return a 1 or 0). One of the most common applications of this kind of naming is when there’s a complex end condition in an algorithm. Often times this kind of termination criterion can just be called “done”:

 

Circular Orbits | Planetary Motion | NBody Simulation

I took a few hours yesterday to start on a simulation for a job application. Part of the simulation involves trying to figure out how to create stable nbody orbits, at least stable for some amount of time to look interesting.

Turns out that the mathematics beyond constructing a circular orbit of just two bodies is a bit far-fetch’d. We can see in this link that solving for the velocity of each planet is very straightforward. Let me take a screenshot of the final equations here:

orbit_math

To me this makes a lot of sense and matches my own derivation, albeit with a slightly different final form (just due to some algebra differences). I plug in the equation into my simulation and hit run, however there’s just not enough velocity to keep an orbit. See for yourself (the simulation is slowed down a lot compared to later gifs):

satellites_slow

Each white circle is a body (like a planet) and the red circle is the system’s barycenter.

So I tinkered a bit and found out the velocity is too dim by a factor of sqrt( 2 ). My best guess is that the equations I’m dealing with have some kind of approximation involved where one of the bodies is supposed to be infinitely large. I don’t quite have the time to look into this detail (since I have much more exciting things to finish in the simulation), so I just plugged in my sqrt( 2 ) factor and got this result:

satellites

Here’s the function I wrote down to solve for orbit velocity in the above gif:

I’m able to run the simulation for thousands of years without any destabilization.

If any readers have any clue as to why the velocity is off by a factor of rad 2 please let me know, I’d be very interested in learning what the hiccup was.

Next I’ll try to write some code for iteratively finding systems of more than 2 bodies that have a stable configuration for some duration of time. I plan to use a genetic algorithm, or maybe something based off of perturbation theory, though I think a genetic algorithm will yield better results. Currently the difficulty lies in measuring the system stability. For now the best metrics I’ve come up with is measuring movement of the barycenter and bounding the entire simulation inside of a large sphere. If the barycenter moves, or if bodies are too far from the barycenter I can deem such systems as unstable and try other configurations.

One particularly interesting aspect of the barycenter is if I create a group of stationary bodies and let them just attract each other through gravity, the barycenter seems to always be dead stationary! The moment I manually initialize a planet with some velocity the barycenter moves, but if they all start at rest the barycenter is 100% stable.

satellites_all

A Star

Recently I’ve found a nice piece of pseudo code for implementing A Star after searching through a few lesser or incorrect pseudo code passages:

And here is the Pop function pseudo code (written by me, so it probably has a small error here or there):

My favorite thing about the pseudo code is that the closed list can be implemented with just a flag. The open list becomes the most complicated part of the algorithm. Should the open list be a sorted array, an unsorted array, a binary heap? The answer largely depends on much more memory you need to traverse.

If a small portion of memory needs to be searched all dynamic memory can be allocated up-front on one shot. Otherwise bits of memory should probably be allocated up-front, and more as necessary during the algorithms run.

Just yesterday I implemented AStar in C where my full header file looked like:

In my internal C file I only have < 150 lines of code, including some file-scope variables and math functions. Implementation was nearly a one to one transcription of the above pseudo code (so those who are recursion impaired like myself shouldn’t have any problems). This style may not be thread-safe, but hey, most AI related coded can only be done in serial anyway. My grid size was maxed out at 20×15, so pre-allocating memory for your entire search area may not be so practical as it was for me.

Still, I hope this post can provide some bits of intuition that are useful to someone.

The Concept of Polymorphism

For a while I thought of many computer science terms as useless hype. For example, as said by Jon Blow in one of his recent videos, some terms can be vague or hard to recollect. Buzzwords.

Polymorphism, semantics, encapsulation, instantiate, interface, API, functional, abstraction, indirection, blah blah blah. Reading too many of these words make my eyes glaze over.

In Compilers, Principles, Techniques and Tools by Aho, Sethi and Ullman has a keen description of polymorphism — at least it really stuck with me. Here with me I have the 1988 reprint so it actually contains C code (unlike the new “shiny” version), and in it states:

“An ordinary procedure allows the statements in its body to be executed with arguments of fixed types; each time a polymorphic procedure is called, the statements in its body can be executed with arguments of different types. The term “polymorphic” can also be applied to any piece of code that can be executed with arguments of different types. so we can talk of polymorphic functions and operators as well.

Built-in operators for indexing arrays, applying functions, and manipulating pointers are usually polymorphic because they are not restricted to a particular kind of array, function, or pointer. For example, the C reference manual states about the pointer operator &: “If the type of the operand is ‘…’, the type of the result is ‘pointer to …’.” Since any type can be substituted for “…” the operator & in C is polymorphic.”

So, some of these buzzwords are just misrepresented.

It seems the idea of polymorphism is useful to express, in code, algorithms that manipulate data structures. If the data being transformed is a data structure, then the data within the structure can sometimes be irrelevant.

For example, perhaps we have a math library that runs matrix operations on 32-bit floats. The library is used extensively in a large project. One day some strange numeric bug creeps into a complex algorithm. It would be helpful to be able to swap all 32-bit floats (the data within the “structure”, where the structure is the project) to 64-bit floats to see if more precision affects the numerically-related bug. If polymorphism is supported, perhaps through function overloads or compiler-supported binary operators, this task might be trivial.

Another example is the need to compute the height of a tree.

Another example is to find a circular dependency in a dependency graph.

Freelist Concept

A freelist is a way of retrieving some kind of resource in an efficient manner. Usually a freelist is used when a memory allocation is needed, but searching for a free block should be fast. Freelists can be used inside of general purpose allocators, or embedded directly into an optimized algorithm.

Lets say we have an array of elements, where each element is 16 bytes of memory. Our array has 32 elements. The program that arrays resides in needs to request 16 byte elements, use them, and later give them back; we have allocation of elements and deallocation of elements.

The order the elements are allocated is not related in any way to order of deallocation. In order for deallocation to be a fast operation the 16 byte element needs to be in a state such that a future allocation be handed this element to be reused.

A singly linked list can be used to hold onto all unused and free elements. Since the elements are 16 bytes each this is more than enough memory to store a pointer, or integer index, which points to the next block in the free list. We can use the null pointer, or a -1 index to signify the end of the freelist.

Allocating and deallocating can now look like:

Setting up the memory* will take some work. Each element needs to be linked together somehow, like through a pointer or integer index. If no more elements are available then more arrays of size 32 can be allocated — this means our memory is being managed with the style of a “paged allocator”, where each array can be thought of as a page.

The freelist is an important concept that can be embedded ad-hoc into more complex algorithms. Often times it is important for little pieces of software to expose a very tiny C-like interface, usually just a function or two. Having these softwares self-contain their own internal freelists is one way to achieve a simple interface.

Example of Hiding the Freelist

For example say we are computing the convex hull of a point-set through the Quick Hull algorithm. The hypothetical algorithm exposes an interface like this:

This QHull function does no explicit memory allocation and forces the user to allocate an appropriate amount of memory to work with. The bounds of this memory (how big it needs to be for the algorithm’s worst case scenario) is calculated by the ComputeMemoryBound function.

Inside of QHull often times the hull is expanded and many new faces are allocated. These faces are held on a free list. Once new faces are made, old ones are deleted. These deleted faces are pushed onto the free list. This continues until the algorithm concludes, and the user does not need to know about the details of the embedded memory management of the freelist.

Convex hull about to expand to point P. The white faces will be deleted. The see-through faces will be allocated.

A convex hull fully expanded to point P. All old faces were deleted.

The above images were found at this address: http://www.eecs.tufts.edu/~mhorn01/comp163/algorithm.html