Author Archives: Randy Gaul

A Star

Recently I’ve found a nice piece of pseudo code for implementing A Star, after searching through a few lesser or incorrect pseudo code passages.

My favorite thing about the pseudo code is that the closed list can be implemented with just a flag. The open list becomes the most complicated part of the algorithm. Should the open list be a sorted array, an unsorted array, or a binary heap? The answer largely depends on how much memory you need to traverse.

If only a small portion of memory needs to be searched, all dynamic memory can be allocated up-front in one shot. Otherwise a bit of memory should probably be allocated up-front, with more allocated as necessary while the algorithm runs.

Just yesterday I implemented A Star in C, where my full header file was only a handful of lines.

In my internal C file I have fewer than 150 lines of code, including some file-scope variables and math functions. The implementation was nearly a one-to-one transcription of the above pseudo code (so those who are recursion impaired like myself shouldn’t have any problems). This style may not be thread-safe, but hey, most AI-related code can only be run serially anyway. My grid size was maxed out at 20×15, so pre-allocating memory for your entire search area may not be as practical for you as it was for me.
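My actual header and source aren't reproduced here, but a minimal self-contained sketch of the same idea might look like the following. This is my reconstruction, not the original code: an unsorted array for the open list, a per-cell closed flag, Manhattan distance as the heuristic, and all memory pre-allocated for the maximum grid size (the static arrays are also why this style is not thread-safe).

```c
#include <assert.h>
#include <string.h>

#define W 20
#define H 15

typedef struct { int x, y; } Point;

// Returns the number of cells in the path (including start and goal),
// or 0 if no path exists.
int AStar(int walkable[H][W], Point start, Point goal, Point *out)
{
    static int open[W * H], g[W * H], f[W * H], parent[W * H];
    static unsigned char closed[W * H], in_open[W * H];
    memset(closed, 0, sizeof(closed));
    memset(in_open, 0, sizeof(in_open));
    int open_count = 0;
    int s = start.y * W + start.x;
    int e = goal.y * W + goal.x;
    g[s] = 0;
    f[s] = 0;
    parent[s] = -1;
    open[open_count++] = s;
    in_open[s] = 1;

    while (open_count)
    {
        // Unsorted open list: a linear scan for the lowest f is fine at 20x15
        int best = 0;
        for (int i = 1; i < open_count; ++i)
            if (f[open[i]] < f[open[best]]) best = i;
        int cur = open[best];
        open[best] = open[--open_count];
        in_open[cur] = 0;
        closed[cur] = 1; // the closed list is just a flag
        if (cur == e) break;

        int cx = cur % W, cy = cur / W;
        int dx[] = { 1, -1, 0, 0 }, dy[] = { 0, 0, 1, -1 };
        for (int i = 0; i < 4; ++i)
        {
            int nx = cx + dx[i], ny = cy + dy[i];
            if (nx < 0 || ny < 0 || nx >= W || ny >= H) continue;
            int n = ny * W + nx;
            if (!walkable[ny][nx] || closed[n]) continue;
            int ng = g[cur] + 1;
            if (in_open[n] && ng >= g[n]) continue;
            int hx = nx > goal.x ? nx - goal.x : goal.x - nx;
            int hy = ny > goal.y ? ny - goal.y : goal.y - ny;
            g[n] = ng;
            f[n] = ng + hx + hy; // manhattan distance heuristic
            parent[n] = cur;
            if (!in_open[n]) { open[open_count++] = n; in_open[n] = 1; }
        }
    }

    if (!closed[e]) return 0;
    int len = 0;
    for (int n = e; n != -1; n = parent[n]) ++len;
    for (int n = e, i = len - 1; n != -1; n = parent[n], --i)
    {
        out[i].x = n % W;
        out[i].y = n / W;
    }
    return len;
}
```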

Still, I hope this post can provide some bits of intuition that are useful to someone.


The Concept of Polymorphism

For a while I thought of many computer science terms as useless hype. For example, as said by Jon Blow in one of his recent videos, some terms can be vague or hard to recollect. Buzzwords.

Polymorphism, semantics, encapsulation, instantiate, interface, API, functional, abstraction, indirection, blah blah blah. Reading too many of these words make my eyes glaze over.

Compilers: Principles, Techniques, and Tools by Aho, Sethi and Ullman has a keen description of polymorphism; at least it really stuck with me. I have the 1988 reprint here with me, so it actually contains C code (unlike the new “shiny” version), and it states:

“An ordinary procedure allows the statements in its body to be executed with arguments of fixed types; each time a polymorphic procedure is called, the statements in its body can be executed with arguments of different types. The term “polymorphic” can also be applied to any piece of code that can be executed with arguments of different types, so we can talk of polymorphic functions and operators as well.

Built-in operators for indexing arrays, applying functions, and manipulating pointers are usually polymorphic because they are not restricted to a particular kind of array, function, or pointer. For example, the C reference manual states about the pointer operator &: “If the type of the operand is ‘…’, the type of the result is ‘pointer to …’.” Since any type can be substituted for “…” the operator & in C is polymorphic.”

So, some of these buzzwords are just misrepresented.

It seems the idea of polymorphism is useful for expressing, in code, algorithms that manipulate data structures. If the data being transformed is a data structure, then the data within the structure can sometimes be irrelevant to the algorithm.

For example, perhaps we have a math library that runs matrix operations on 32-bit floats. The library is used extensively in a large project. One day some strange numeric bug creeps into a complex algorithm. It would be helpful to be able to swap all 32-bit floats (the data within the “structure”, where the structure is the project) to 64-bit floats to see if more precision affects the numerically-related bug. If polymorphism is supported, perhaps through function overloads or compiler-supported binary operators, this task might be trivial.
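As a concrete sketch of that kind of swap, C11's generic selection can dispatch one operation by argument type, so changing the project's scalar type exercises a different code path without touching call sites. The names here are mine, purely illustrative:

```c
#include <assert.h>

// Two scalar widths of the same structure; the "operation" Dot is
// polymorphic over them via C11 _Generic.
typedef struct { float x, y; } v2f;
typedef struct { double x, y; } v2d;

static float DotF(v2f a, v2f b) { return a.x * b.x + a.y * b.y; }
static double DotD(v2d a, v2d b) { return a.x * b.x + a.y * b.y; }

// Call sites just say Dot(a, b); the compiler picks the implementation
// based on the argument's type.
#define Dot(a, b) _Generic((a), v2f: DotF, v2d: DotD)(a, b)
```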

Another example is the need to compute the height of a tree.

Another example is to find a circular dependency in a dependency graph.
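Both of these examples are polymorphic in exactly the sense above: the algorithm only touches the structure's links, never the payload. A sketch of tree height in C (my own illustrative names), where the payload type is completely irrelevant to the algorithm:

```c
#include <assert.h>
#include <stddef.h>

// The height computation only walks the child pointers; `data` could be
// any type at all and the algorithm would not change.
typedef struct Node
{
    void *data;
    struct Node *left, *right;
} Node;

static int TreeHeight(const Node *n)
{
    if (!n) return 0;
    int l = TreeHeight(n->left);
    int r = TreeHeight(n->right);
    return 1 + (l > r ? l : r);
}
```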

Freelist Concept

A freelist is a way of retrieving some kind of resource in an efficient manner. Usually a freelist is used when a memory allocation is needed, but searching for a free block should be fast. Freelists can be used inside of general purpose allocators, or embedded directly into an optimized algorithm.

Let’s say we have an array of elements, where each element is 16 bytes of memory. Our array has 32 elements. The program the array resides in needs to request 16 byte elements, use them, and later give them back; we have allocation of elements and deallocation of elements.

The order the elements are allocated in is not related in any way to the order of deallocation. In order for deallocation to be a fast operation, the 16 byte element needs to be left in a state such that a future allocation can hand this element back out for reuse.

A singly linked list can be used to hold all unused and free elements. Since the elements are 16 bytes each, this is more than enough memory to store a pointer (or integer index) which points to the next block in the freelist. We can use the null pointer, or a -1 index, to signify the end of the freelist.

Allocating and deallocating can now look like:
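Something like the following sketch (names are mine; a union lets the same 16 bytes act as payload when allocated and as the freelist link when free):

```c
#include <assert.h>

#define ELEMENT_COUNT 32

// When free, an element stores the index of the next free element;
// -1 signifies the end of the freelist.
typedef union Element
{
    char user_data[16]; // 16 bytes of payload while allocated
    int next;           // next free index while on the freelist
} Element;

static Element elements[ELEMENT_COUNT];
static int freelist = -1;

static void Init(void)
{
    // Link every element onto the freelist up-front
    for (int i = 0; i < ELEMENT_COUNT - 1; ++i)
        elements[i].next = i + 1;
    elements[ELEMENT_COUNT - 1].next = -1;
    freelist = 0;
}

static Element *Allocate(void)
{
    if (freelist == -1) return 0; // out of elements (or grab a new page here)
    Element *e = elements + freelist;
    freelist = e->next;
    return e;
}

static void Deallocate(Element *e)
{
    // Push the element back onto the front of the freelist; O(1)
    e->next = freelist;
    freelist = (int)(e - elements);
}
```

Both operations are O(1): allocation pops the head of the list, deallocation pushes onto it.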

Setting up the memory will take some work. Each element needs to be linked together somehow, like through a pointer or integer index. If no more elements are available then more arrays of size 32 can be allocated; this means our memory is being managed in the style of a “paged allocator”, where each array can be thought of as a page.

The freelist is an important concept that can be embedded ad-hoc into more complex algorithms. Oftentimes it is important for little pieces of software to expose a very tiny C-like interface, usually just a function or two. Having these pieces of software self-contain their own internal freelists is one way to achieve a simple interface.

Example of Hiding the Freelist

For example, say we are computing the convex hull of a point-set with the Quick Hull algorithm. The hypothetical algorithm exposes an interface like this:
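A hypothetical version of that interface might look like the following. All names here are mine; the worst-case face count comes from Euler's formula for triangulated convex polyhedra (at most 2n - 4 faces for n vertices):

```c
#include <assert.h>
#include <stddef.h>

typedef struct { float x, y, z; } v3;
typedef struct { int vertex_index[3]; } HullFace; // hypothetical output face

typedef struct
{
    v3 *vertices; int vertex_count;
    HullFace *faces; int face_count;
} Hull;

// Worst case for a convex hull of n points: at most 2n - 4 triangular
// faces (Euler's formula), plus room for the point set itself.
size_t ComputeMemoryBound(int point_count)
{
    size_t face_bytes = sizeof(HullFace) * (size_t)(2 * point_count - 4);
    size_t vert_bytes = sizeof(v3) * (size_t)point_count;
    return face_bytes + vert_bytes;
}

// The algorithm itself does no allocation; the caller provides `memory`
// of at least ComputeMemoryBound(point_count) bytes.
Hull QHull(void *memory, size_t memory_bytes, const v3 *points, int point_count);
```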

This QHull function does no explicit memory allocation and forces the user to allocate an appropriate amount of memory to work with. The bounds of this memory (how big it needs to be for the algorithm’s worst case scenario) is calculated by the ComputeMemoryBound function.

Inside of QHull, oftentimes the hull is expanded and many new faces are allocated. These faces are held on a freelist. Once new faces are made, old ones are deleted. These deleted faces are pushed onto the freelist. This continues until the algorithm concludes, and the user never needs to know about the freelist’s embedded memory management.

Convex hull about to expand to point P. The white faces will be deleted. The see-through faces will be allocated.

A convex hull fully expanded to point P. All old faces were deleted.

The above images were found at this address: http://www.eecs.tufts.edu/~mhorn01/comp163/algorithm.html

Path of Exile – Righteous Fire Calculator in Lua


Lately I’ve been playing Path of Exile and created a Righteous Fire character. Since it’s important to make sure you have enough life regeneration and fire resistance to play with Righteous Fire, I wrote a stats calculator in Lua. Input a few values in the top section of the script (change values as necessary for your character), and then run the code.

The trickier part is that Energy Shield contributes to the life degeneration of the player when running Righteous Fire.

Printing Pretty Ascii Trees


I’m terrible at writing recursive algorithms. Absolutely horrible. So for practice I decided to try printing a binary tree, where all branches have two children, in the console. It turns out this is a pretty useful quick-n-dirty debugging tool when creating trees.

I really like the formatting of the tree utility for Linux, so I decided to steal it.

After 3 straight hours of trying out stupid stuff I finally came to a working solution (my earlier attempts were around 150 lines of code; the solution was more like 70):
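My version isn't reproduced here, but a sketch of the same idea in C follows. The trick is to accumulate each ancestor's glyphs into a prefix string that gets passed down the recursion; printing into a buffer here just makes the result easy to inspect:

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

typedef struct Node
{
    const char *name;
    struct Node *left, *right;
} Node;

static char output[1024];

// Print one node in the style of the linux `tree` utility. `prefix` holds
// the glyphs contributed by every ancestor; `is_last` picks the elbow.
static void PrintNode(const Node *n, const char *prefix, int is_last)
{
    sprintf(output + strlen(output), "%s%s%s\n",
            prefix, is_last ? "`-- " : "|-- ", n->name);
    char child_prefix[256];
    sprintf(child_prefix, "%s%s", prefix, is_last ? "    " : "|   ");
    if (n->left)  PrintNode(n->left, child_prefix, n->right == NULL);
    if (n->right) PrintNode(n->right, child_prefix, 1);
}

static void PrintTree(const Node *root)
{
    sprintf(output, "%s\n", root->name);
    if (root->left)  PrintNode(root->left, "", root->right == NULL);
    if (root->right) PrintNode(root->right, "", 1);
}

// A tree with root -> (a -> (c, d), b) prints as:
//
// root
// |-- a
// |   |-- c
// |   `-- d
// `-- b
```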

Which will print cool trees like this one:

I then realized that I could use singly linked lists to represent trees with any number of children, which is totally necessary for the parser I am currently writing. So after 30 minutes I was able to form version 2 for arbitrary trees; this version is surprisingly right around the same length and complexity as the previous version:

And will print trees like this monster:

I also ended up finding a pretty cool implementation by a random stack-overflow user that prints trees in horizontal format (which turns out to be much more limited for the purposes of a console, since it can’t print larger trees, but might look a lot prettier for small trees):

It may be preferable to upgrade to the limited edition Box Drawing characters for best viewing pleasure:


 

In which case you’ll probably want to use some specific character codes (I used 195, 196 and 192 in the above image).

Parsing C Style Expressions


Turns out that constructing a hand-written C-style parser has a few parts that were very difficult for me.

The first thing was realizing that Backus Naur Form (BNF) largely sucks if you want to hand-write your own parser. BNF is really verbose, and expressing simple things like optional terminals or lists is difficult. BNF is also poor for expressing operator precedence, as many intermediate and redundant non-terminals must be evaluated during parse-tree derivation. As an alternative, Extended Backus Naur Form is perfect for languages that plan to use hand-written parsers instead of parsers created by parser generators. Left-factoring a BNF for LL parsing is also not very useful, since avoiding infinite recursion on left-recursive rules is trivial in hand-written code.

The second thing is that parsing expressions with various types of operators can be really difficult, especially if there’s a lack of confidence in recursion (like myself). Creating a parse tree given a string representing an expression is a very recursive problem.

In C, expressions consist of atoms and operators. An atom would be a literal, constant value, identifier, or an expression wrapped in parentheses. Operators are the usual + or - kind of tokens.

If each operator has an associated precedence value there are a few different algorithms out there, with references for learning. I ended up going in face-first and in the end derived what is known as “precedence climbing”. According to Eli Bendersky, precedence climbing is what is currently used by Clang to parse C++ expressions! That ought to instill some perceived merit. From what I can tell, Lua 5.3 uses (well, very close to) the same method.

The idea of precedence climbing is to think of two major recursive operations:

  • Compute righthand-side node
  • Make a binary operator node and connect lefthand-side and righthand-side children nodes

The first point is the conceptually complex one. The algorithm starts with a given lefthand-side node; however, righthand-side nodes do not come in through the input stream in tree format. The next token can represent a node that should be much deeper in the tree, which means that computing the righthand-side node ought to be the main recursive path.

Realizing that the righthand-side node computation is the recursive path led me to notice a key observation that tipped me off to a working algorithm.

Say we have the following input string as an expression: A 2 B 1 C 4 D 3 E 7 F

Numbers are operators, and the number itself is precedence (higher number is higher precedence), letters are atoms (like a const int variable). Here’s the valid parse tree:


The lowest leaves are evaluated first. It’s easy to see that the tree itself encodes the operator precedence.

If we begin parsing our input string the first atom is A, which will always be a lefthand-side node for most any parsing algorithm used, and will likely be the left-most node in the tree. The next token is the 2 operator followed by B. It’s easy enough to construct the subtree of node 2 pointing to A and B.

The next input token is the operator 1 and atom C. C is bound by operator precedence to the operator 4, though the current state of the algorithm has yet to even read in the token 4. Studying this scenario is what tipped me off to a working solution; C must be treated as a lefthand-side node, though at the current state it is considered a potential righthand-side node.

Wikipedia, and the link from earlier, both show great pseudo code for the precedence climbing algorithm. The main difference between the two links is that wikipedia includes a nested for-loop in favor of fewer overall recursive calls. My own code, after I cleaned it up from the influences of the previous links, ended up looking something like:
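The cleaned-up code isn't listed here, so as a stand-in here is a compact C sketch of precedence climbing. To keep it easy to demonstrate it evaluates integer expressions directly instead of building tree nodes, but the control flow (atom, then a loop that recurses for the righthand side at a higher minimum precedence) is the same; all function names are mine, and whitespace is not handled:

```c
#include <assert.h>
#include <ctype.h>

static const char *in; // single-pass cursor over the expression string

static int ParseExpression(int min_prec);

static int ParseAtom(void)
{
    if (*in == '(') // parentheses are an atom: just recurse and return
    {
        ++in;
        int v = ParseExpression(0);
        ++in; // skip ')'
        return v;
    }
    if (*in == '-') // prefix unary minus: binds tighter than any binary op
    {
        ++in;
        return -ParseAtom();
    }
    int v = 0;
    while (isdigit((unsigned char)*in))
        v = v * 10 + (*in++ - '0');
    return v;
}

static int Precedence(char op)
{
    switch (op) { case '+': case '-': return 1; case '*': case '/': return 2; }
    return -1; // not an operator: ends the current expression
}

static int Apply(char op, int a, int b)
{
    switch (op) { case '+': return a + b; case '-': return a - b;
                  case '*': return a * b; default: return a / b; }
}

static int ParseExpression(int min_prec)
{
    int lhs = ParseAtom();
    while (Precedence(*in) >= min_prec)
    {
        char op = *in++;
        // Left-associative: the righthand side may only absorb operators
        // of strictly higher precedence, hence the + 1.
        int rhs = ParseExpression(Precedence(op) + 1);
        lhs = Apply(op, lhs, rhs);
    }
    return lhs;
}

int Eval(const char *expr) { in = expr; return ParseExpression(0); }
```

Building an actual parse tree is the same shape: replace `Apply` with a node constructor taking the operator and the two children.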

In the end I’m quite happy with the result, and even hooked up a nice ascii-tree printer courtesy of a random stack-overflow user. Here are dot product and initialization trees in ascii:

My favorite part about the operator precedence climbing algorithm is how it handles parentheses and prefix unary operators: a parenthesized expression can be considered an atom, and when the atom function finds a parenthesis it just calls the expression parsing function directly and returns the result! The same can be done for prefix unary operators (if they have really high precedence). The algorithm also trivially handles right-associativity. I haven’t yet thought about unary postfix operators, so if any reader has thoughts on this topic please do comment!

Here’s pseudo-y atom code:
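Something along these lines, where Accept/Expect are hypothetical token helpers and the node constructors are made-up names:

```
// pseudo-y atom
ParseAtom()
{
    if (Accept('(')) {
        node = ParseExpression(0) // parentheses: recurse into the general case
        Expect(')')
        return node
    }

    if (Accept('-'))
        return UnaryNode('-', ParseAtom()) // high-precedence prefix operator

    return AtomNode(NextToken()) // literal, constant value, or identifier
}
```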

 

Interactive Cubic Bezier Splines


Cubic Bezier splines look rather beautiful. Splines in general look great and are super useful for all sorts of cool stuff.

Creating splines usually involves defining some control points and moving those around until the desired effect is achieved. However, clicking on or interacting with the spline itself turns out to be a lot harder than just messing with control points.

In particular I want to talk about cubic Bezier splines. This kind of spline is defined by 3n + 1 control points, where n is the number of Bezier curves in the spline. The first four points represent the first curve, then the last point of that curve and the next three points represent the next curve, and so on.

The resulting spline can look like the curved arrows in this graph:

http://www.graphviz.org/Gallery/directed/datastruct.png

Actually interacting with these splines is pretty difficult, since Cartesian coordinates say nothing about where a point lies relative to the spline’s own parameterization. What is needed is the ability to express a relation between these two spaces.

Since I haven’t studied curves in great detail there’s much math that goes beyond me (for the time being) in solving this problem. However, in Graphics Gems I there’s a section called “Solving The Nearest-Point-On-Curve Problem” where some public domain C code is given that computes the closest point on a cubic Bezier curve to a given input point. The C code is really old and actually calls malloc in a recursive function, presumably to avoid stack overflows on old machines with minimal memory. This can be converted to use a stack data structure in the process’s stack space, all without much work for the user.

After a few minutes I was able to set up a cool demo:


This tool would be great for some kind of interactive spline editor. It would also be perfect for colliding circles (or spheres in 3D), making the curve itself a potential candidate for a physics engine collider type. Since the curve is defined by control points, the curve can actually deform at run-time without any big complications.

Approximating the Curve

In reality a spline like this would probably just be “rasterized” to some kind of approximate shape that is easier for a physics engine to collide, rather than dealing with the exact spline itself. This is likely due to there not being a straightforward way to compute the closest point on the spline to a plane (at least, I sure don’t know how to write that code).

One tactic would be to interpolate a few points out of the curve (and by few I mean as many as needed), and then use these points as an approximate representation of the curve. This could be like converting the curve to a line strip, which might make interactivity much simpler in many cases.

Generating these points can be a piece of cake. Personally I’d just write out some lerps and call it good. If we look at the simplest Bezier curve with 2 control points it’s easy to see it’s just a single lerp (these animations were taken from the Wikipedia page for Bezier curves):

Bézier_1_big

 

If we add a third point we can perform two lerps, and then lerp the results together. Intuitively this gives us a quadratic Bezier curve, and looks like:

Bézier_2_big

 

Following suit we can use 4 control points and create a cubic Bezier curve:

Bézier_3_big

 

An easy way to convert a cubic Bezier curve to a line strip involves sampling points on curve at time t, where t is on the interval of 0 to 1. Here’s some example code:
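For example, sticking with the lerp formulation (the v2 type and function names are mine):

```c
#include <assert.h>

typedef struct { float x, y; } v2;

static v2 Lerp(v2 a, v2 b, float t)
{
    v2 r = { a.x + (b.x - a.x) * t, a.y + (b.y - a.y) * t };
    return r;
}

// De Casteljau evaluation of a cubic Bezier at time t: just repeated lerps
static v2 CubicBezier(v2 a, v2 b, v2 c, v2 d, float t)
{
    v2 ab = Lerp(a, b, t), bc = Lerp(b, c, t), cd = Lerp(c, d, t);
    v2 abbc = Lerp(ab, bc, t), bccd = Lerp(bc, cd, t);
    return Lerp(abbc, bccd, t);
}

// Sample the curve into a line strip of n + 1 points, t on [0, 1]
static void CurveToStrip(v2 out[], int n, v2 a, v2 b, v2 c, v2 d)
{
    for (int i = 0; i <= n; ++i)
        out[i] = CubicBezier(a, b, c, d, (float)i / (float)n);
}
```

Uniform steps in t are the lazy option; an adaptive scheme could place more samples where the curve bends sharply.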

For example, Box2D defines a “chain shape” as a line strip of segments. Generally this strip is used to approximate curved, non-moving surfaces. Splines can very easily be converted to strips and sent to Box2D to define a chain shape.

Some Spliney Links

  • http://phildogames.com/blog/spline.html
  • http://www.gamasutra.com/view/feature/131755/curved_surfaces_using_bzier_.php
  • http://www.jasondavies.com/animated-bezier/
  • http://pomax.github.io/bezierinfo/
  • http://ciechanowski.me/blog/2014/02/18/drawing-bezier-curves/

Capsule to Convex – SAT Edge Prune via Gauss Map

In 2013 Dirk Gregorius of Valve presented at GDC on the topic of the Separating Axis Theorem. In his talk he studied a useful property of the Minkowski Difference between two 3D convex shapes: edge pairs from each shape may or may not contribute to the convex hull of the Minkowski Difference.

It is possible to derive simple predicate functions to reduce the number of edge pair queries, and to simplify implementation, during collision detection via the Separating Axis Theorem. This article shows a derivation of this predicate for the Capsule to Convex case.

Here is the PDF containing the derivation:

Download (PDF)

SIMD – Matrix3x3 Transpose

Just recently I finished implementing my own personal SIMD math library using SSE intrinsics. There are two major resources for learning how to write effective SIMD code:

While inspecting the DirectXMath source I came across the implementation of transposing a 4×4 matrix:

Lately I have been working only with 3×3 matrices and vectors. This is nice since oftentimes 4×4 matrices store mostly useless data in the bottom row. In effect, some kind of 3×4 matrix can be stored in memory to represent an affine transformation:

Depending on what the code is used for, the rotation matrix can have scaling built in, or not. Oftentimes only uniform scaling is desired so that chains of transformations can easily be reversed and decomposed freely.

Since I’m only dealing with 3×3 matrices I decided to cut down on the number of shuffles as best I could, and ended up with this implementation:
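My actual code isn't listed here; the following is one possible 5-shuffle arrangement, reconstructed from the description rather than copied from my source. Two shuffles build xy and zw pairs from the first two rows, and three more pull each column out against the third row (here the w lane simply ends up holding whatever the third row's w was):

```c
#include <assert.h>
#include <xmmintrin.h>

// 3x3 transpose in 5 shuffles. Only the x/y/z lanes are meaningful; the
// w lane of each result is ignored.
static void Transpose3x3(__m128 *r0, __m128 *r1, __m128 *r2)
{
    __m128 t0 = _mm_shuffle_ps(*r0, *r1, _MM_SHUFFLE(1, 0, 1, 0)); // x0 y0 x1 y1
    __m128 t1 = _mm_shuffle_ps(*r0, *r1, _MM_SHUFFLE(3, 2, 3, 2)); // z0 w0 z1 w1
    *r0 = _mm_shuffle_ps(t0, *r2, _MM_SHUFFLE(3, 0, 2, 0));        // x0 x1 x2 _
    *r1 = _mm_shuffle_ps(t0, *r2, _MM_SHUFFLE(3, 1, 3, 1));        // y0 y1 y2 _
    *r2 = _mm_shuffle_ps(t1, *r2, _MM_SHUFFLE(3, 2, 2, 0));        // z0 z1 z2 _
}
```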

Only 5 shuffles are used here, instead of the 8 from DirectXMath for the 4×4 transpose. I did not really take care of handling the w component of any of the __m128s during the whole process. In general I just left the shuffles for w as 0.

I really don’t think another shuffle can be removed in the 3×3 case, so any further optimizations would probably be outside my realm of knowledge. If anyone knows of anything else interesting as far as transposition goes feel free to comment below.

Note: On Windows if anyone is wondering why my function does not incur a compiler error complaining about parameter alignment, be sure to lookup __vectorcall for Visual Studio 2013.

Minimum Bounding Sphere

Today I was reading this article by Ericson. The article is on the derivation involved in writing a function to calculate the minimum bounding sphere of a set of three or four points. The topic itself isn’t very interesting (to me) when compared to the algebra. Ericson skips the algebra I had trouble with, so my article here will serve to record the algebra he skipped (in case other readers had trouble, or I ever want to reference the math again). This article directly follows Ericson’s, so I will assume readers have read and understood his article.

Say we have a set of three points S = {A, B, C}, which can be thought of as a triangle. The problem of calculating the minimum bounding sphere of S involves checking to see if the circumcenter of S lies within or outside the triangle. Let’s name the circumcenter P. One immediate implementation involves computing P, followed by the computation of the barycentric coordinates of P with respect to S. These coordinates can be used to check if P lies within S.

However, computing P followed by the barycentric coordinates of P with respect to S involves a lot of redundant computation. Furthermore, if P does not lie within S then P itself does not need to be computed at all, since only the barycentric coordinates were needed.

It would be nice to compute barycentric coordinates of P with respect to S directly, and only if necessary construct P.

P in relation to S can be defined as:

\begin{equation}
\label{eq1}
P = A + s*(B - A) + t*(C - A)
\end{equation}

Where \(s\) and \(t\) are barycentric coordinates of P with respect to S; P is within S when \(s \geq 0\), \(t \geq 0\) and \(s + t \leq 1\). Since P is equidistant from A, B and C, we can express P in relation to S with the following:

\begin{equation}
\label{eq2}
dot(P - B, P - B) = dot(P - A, P - A)
\end{equation}

\begin{equation}
\label{eq3}
dot(P - C, P - C) = dot(P - A, P - A)
\end{equation}

The interesting part involves plugging \eqref{eq1} into \eqref{eq2} and \eqref{eq3} followed by a collection of terms. Since I myself am a noob when it comes to algebra I had to go look up algebraic properties of the dot product to make sure I did not screw anything up. In particular these few rules are useful:

\begin{equation}
\label{eq4}
dot(u, v + w) = dot(u, v) + dot(u, w) \\
dot(s * u, v) = s*dot(u, v) \\
-dot(u - v, u - w) = dot(v - u, u - w)
\end{equation}

Let’s go over substituting \eqref{eq1} into \eqref{eq2} directly; however, let’s only consider the first part of \eqref{eq2}, \(dot(P - B, P - B)\), to keep it simple:

\begin{equation}
\label{eq5}
dot(A + s*(B - A) + t*(C - A) - B, A + s*(B - A) + t*(C - A) - B)
\end{equation}

Since things immediately get really long some substitutions help to prevent errors:

\begin{equation}
\label{eq6}
u = A - B\\
v = s*(B - A) + t*(C - A)\\
w = u + v
\end{equation}

\begin{equation}
\label{eq7}
Dot(w, u + v)
\end{equation}

\begin{equation}
\label{eq8}
Dot(w, u) + Dot(w, v)
\end{equation}

\begin{equation}
\label{eq9}
Dot(u + v, u) + Dot(u + v, v)
\end{equation}

\begin{equation}
\label{eq10}
Dot(u, u) + Dot(v, u) + Dot(u, v) + Dot(v, v)
\end{equation}

If I were to expand \eqref{eq10} on this webpage it would not fit. Instead we will make use of a few more substitutions and then arrive in the final form:

\begin{equation}
\label{eq11}
x = s*(B - A) \\
y = t*(C - A) \\
x + y = v
\end{equation}

\begin{equation}
\label{eq12}
Dot(A - B, A - B) + 2*Dot(A - B, x + y) + Dot(x + y, x + y)
\end{equation}

By following the same process we can finish the substitution and notice that:

\begin{equation}
\label{eq13}
Dot(P - A, P - A) = Dot(x + y, x + y)
\end{equation}

The final form of the substitution would be:

\begin{equation}
\label{eq14}
Dot(A - B, A - B) + \\ 2*Dot(A - B, x + y) + \\ Dot(x + y, x + y) = Dot(x + y, x + y)
\end{equation}

\begin{equation}
\label{eq15}
Dot(A - B, A - B) + 2*Dot(A - B, x + y) = 0
\end{equation}

\begin{equation}
\label{eq16}
Dot(A - B, A - B) + \\ 2*Dot(A - B, s*(B - A)) + \\ 2*Dot(A - B, t*(C - A)) = 0
\end{equation}

\begin{equation}
\label{eq17}
s*Dot(B - A, B - A) + t*Dot(B - A, C - A) = \\ (1/2)*Dot(B - A, B - A)
\end{equation}

This final equation \eqref{eq17} matches exactly what Ericson came up with on his own blog. Through a similar process \eqref{eq1} can be substituted into \eqref{eq3}, which would result in:

\begin{equation}
\label{eq18}
s*Dot(C - A, B - A) + t*Dot(C - A, C - A) = \\ (1/2)*Dot(C - A, C - A)
\end{equation}