- DirectXMath (reference pages, you want to find the source code and read that too!)
- Designing Fast Cross-Platform SIMD Vector Libraries

While inspecting the DirectXMath source I came across the implementation of transposing a 4×4 matrix:

XMFINLINE XMMATRIX XMMatrixTranspose( CXMMATRIX M )
{
    // x.x,x.y,y.x,y.y
    XMVECTOR vTemp1 = _mm_shuffle_ps( M.r[0], M.r[1], _MM_SHUFFLE(1,0,1,0) );
    // x.z,x.w,y.z,y.w
    XMVECTOR vTemp3 = _mm_shuffle_ps( M.r[0], M.r[1], _MM_SHUFFLE(3,2,3,2) );
    // z.x,z.y,w.x,w.y
    XMVECTOR vTemp2 = _mm_shuffle_ps( M.r[2], M.r[3], _MM_SHUFFLE(1,0,1,0) );
    // z.z,z.w,w.z,w.w
    XMVECTOR vTemp4 = _mm_shuffle_ps( M.r[2], M.r[3], _MM_SHUFFLE(3,2,3,2) );

    XMMATRIX mResult;
    // x.x,y.x,z.x,w.x
    mResult.r[0] = _mm_shuffle_ps( vTemp1, vTemp2, _MM_SHUFFLE(2,0,2,0) );
    // x.y,y.y,z.y,w.y
    mResult.r[1] = _mm_shuffle_ps( vTemp1, vTemp2, _MM_SHUFFLE(3,1,3,1) );
    // x.z,y.z,z.z,w.z
    mResult.r[2] = _mm_shuffle_ps( vTemp3, vTemp4, _MM_SHUFFLE(2,0,2,0) );
    // x.w,y.w,z.w,w.w
    mResult.r[3] = _mm_shuffle_ps( vTemp3, vTemp4, _MM_SHUFFLE(3,1,3,1) );
    return mResult;
}

Lately I have been working only with 3×3 matrices and vectors. This is nice, since 4×4 matrices often store mostly useless data in the bottom row. In effect, some kind of 3×4 matrix can be stored in memory to represent an affine transformation:

struct Transform
{
    Matrix3x3 rotation;
    Vector3 position;
};

Depending on what the code is used for, the rotation matrix can have scaling built in, or not. Often only uniform scaling is desired, so that chains of transformations can easily be reversed and decomposed freely.

Since I’m only dealing with 3×3 matrices I decided to cut down on the number of shuffles as best I could, and ended up with this implementation:

struct m3
{
    __m128 ex, ey, ez;
};

inline m3 Transpose( m3 a )
{
    __m128 t0 = Shuffle( a.ex, a.ey, 1, 0, 1, 0 );
    __m128 t1 = Shuffle( a.ex, a.ey, 2, 2, 2, 2 );
    __m128 x = Shuffle( t0, a.ez, 0, 0, 2, 0 );
    __m128 y = Shuffle( t0, a.ez, 0, 1, 3, 1 );
    __m128 z = Shuffle( t1, a.ez, 0, 2, 2, 0 );

    m3 b;
    b.ex = x;
    b.ey = y;
    b.ez = z;
    return b;
}
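
The Shuffle helper isn't defined in the post; here's a plausible minimal sketch, assuming the four indices feed straight into _MM_SHUFFLE in the same high-to-low order DirectXMath uses (this wrapper is my guess, not the author's actual code):

// Hypothetical wrapper over _mm_shuffle_ps. The shuffle immediate must be
// a compile-time constant, hence a macro. Result slots 0 and 1 come from
// a, slots 2 and 3 come from b, with indices listed high lane to low lane
// exactly as _MM_SHUFFLE expects.
#define Shuffle( a, b, x, y, z, w ) \
    _mm_shuffle_ps( (a), (b), _MM_SHUFFLE( x, y, z, w ) )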

Only 5 shuffles are used here instead of the 8 from the DirectXMath 4×4 transpose. I did not really take care of handling the w component of any of the __m128s during the whole process. In general I just left the shuffles for w as 0.

I really don’t think another shuffle can be removed in the 3×3 case, so any further optimizations would probably be outside my realm of knowledge. If anyone knows of anything else interesting as far as transposition goes feel free to comment below.

Note: On Windows, if anyone is wondering why my function does not incur a compiler error complaining about parameter alignment, be sure to look up __vectorcall for Visual Studio 2013.
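
Roughly, __vectorcall lets aggregates made purely of __m128 members be passed in XMM registers instead of on the stack, which is why passing m3 by value compiles cleanly. A minimal sketch of the annotated declaration (my guess at the usage, not the post's actual code):

// __vectorcall passes the three __m128 members of m3 in XMM registers,
// avoiding the alignment complaint the default convention would raise.
inline m3 __vectorcall Transpose( m3 a );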

Say we have a set of three points S = {A, B, C}, which can be thought of as a triangle. The problem of calculating the minimum bounding sphere of S involves checking to see if the *circumcenter* of S lies within or outside the triangle. Let's name the circumcenter P. One immediate implementation involves computing P, followed by the computation of the barycentric coordinates of P with respect to S. These coordinates can be used to check if P lies within S.

However, computing P of S followed by the barycentric coordinates of P with respect to S involves a lot of redundant computation. Furthermore, if P does not lie within S then P itself does not need to be computed at all, since only the barycentric coordinates were needed.

It would be nice to compute the barycentric coordinates of P with respect to S directly, and only construct P if necessary.

P in relation to S can be defined as:

\begin{equation}
\label{eq1}
P = A + s*(B - A) + t*(C - A)
\end{equation}

Where \(s\) and \(t\) are barycentric coordinates of P with respect to S, the third coordinate being \(1 - s - t\); P is within S when \(s \geq 0\), \(t \geq 0\), and \(s + t \leq 1\). Since P is equidistant from A, B and C, we can express P in relation to S with the following:

\begin{equation}
\label{eq2}
dot(P - B, P - B) = dot(P - A, P - A)
\end{equation}

\begin{equation}
\label{eq3}
dot(P - C, P - C) = dot(P - A, P - A)
\end{equation}

The interesting part involves plugging \eqref{eq1} into \eqref{eq2} and \eqref{eq3}, followed by a collection of terms. Since I myself am a noob when it comes to algebra, I had to go look up algebraic properties of the dot product to make sure I did not screw anything up. In particular these few rules are useful:

\begin{equation}
\label{eq4}
dot(u, v + w) = dot(u, v) + dot(u, w) \\
dot(s*u, v) = s*dot(u, v) \\
-dot(u - v, u - w) = dot(v - u, u - w)
\end{equation}

Let's go over substituting \eqref{eq1} into \eqref{eq2} directly; however, let's only consider the first part of \eqref{eq2}, \(dot(P - B, P - B)\), to keep it simple:

\begin{equation}
\label{eq5}
dot(A + s*(B - A) + t*(C - A) - B, A + s*(B - A) + t*(C - A) - B)
\end{equation}

Since things immediately get really long, some substitutions help to prevent errors:

\begin{equation}
\label{eq6}
u = A - B \\
v = s*(B - A) + t*(C - A) \\
w = u + v
\end{equation}

\begin{equation}
\label{eq7}
dot(w, u + v)
\end{equation}

\begin{equation}
\label{eq8}
dot(w, u) + dot(w, v)
\end{equation}

\begin{equation}
\label{eq9}
dot(u + v, u) + dot(u + v, v)
\end{equation}

\begin{equation}
\label{eq10}
dot(u, u) + dot(v, u) + dot(u, v) + dot(v, v)
\end{equation}

If I were to expand \eqref{eq10} on this webpage it would not fit. Instead we will make use of a few more substitutions and then arrive at the final form:

\begin{equation}
\label{eq11}
x = s*(B - A) \\
y = t*(C - A) \\
x + y = v
\end{equation}

\begin{equation}
\label{eq12}
dot(A - B, A - B) + 2*dot(A - B, x + y) + dot(x + y, x + y)
\end{equation}

By following the same process we can finish the substitution and notice that:

\begin{equation}
\label{eq13}
dot(P - A, P - A) = dot(x + y, x + y)
\end{equation}

The final form of the substitution would be:

\begin{equation}
\label{eq14}
dot(A - B, A - B) + \\
2*dot(A - B, x + y) + \\
dot(x + y, x + y) = dot(x + y, x + y)
\end{equation}

\begin{equation}
\label{eq15}
dot(A - B, A - B) + 2*dot(A - B, x + y) = 0
\end{equation}

\begin{equation}
\label{eq16}
dot(A - B, A - B) + \\
2*dot(A - B, s*(B - A)) + \\
2*dot(A - B, t*(C - A)) = 0
\end{equation}

\begin{equation}
\label{eq17}
s*dot(B - A, B - A) + t*dot(B - A, C - A) = \\
(1/2)*dot(B - A, B - A)
\end{equation}

This final equation \eqref{eq17} matches exactly what Ericson came up with on his own blog. Through a similar process \eqref{eq1} can be substituted into \eqref{eq3}, which would result in:

\begin{equation}
\label{eq18}
s*dot(C - A, B - A) + t*dot(C - A, C - A) = \\
(1/2)*dot(C - A, C - A)
\end{equation}
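
Together \eqref{eq17} and \eqref{eq18} form a 2×2 linear system in \(s\) and \(t\). Here's a minimal sketch of solving that system with Cramer's rule (my own illustration, assuming a Vec3 type and Dot function; this is not code from Ericson's post):

// Solve for the barycentric coordinates (s, t) of the circumcenter of
// triangle ABC, straight from equations (17) and (18). Hypothetical
// helper for illustration only.
bool CircumcenterBarycentric( Vec3 A, Vec3 B, Vec3 C, float* s, float* t )
{
    Vec3 ab = B - A;
    Vec3 ac = C - A;

    // 2x2 system matrix and right hand side
    float m00 = Dot( ab, ab ), m01 = Dot( ab, ac );
    float m10 = Dot( ac, ab ), m11 = Dot( ac, ac );
    float b0 = 0.5f * m00;
    float b1 = 0.5f * m11;

    // Cramer's rule; det is zero for degenerate (collinear) triangles
    float det = m00 * m11 - m01 * m10;
    if ( det == 0.0f )
        return false;
    float inv = 1.0f / det;
    *s = (b0 * m11 - m01 * b1) * inv;
    *t = (m00 * b1 - b0 * m10) * inv;
    return true;
}

If \(s \geq 0\), \(t \geq 0\), and \(s + t \leq 1\) the circumcenter lies within the triangle, and only then does \eqref{eq1} need to be evaluated to actually construct P.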

Here’s a quick mock header of some linked list nodes for reference:

// Singly linked list
struct Node
{
    Node* next;
    Data data;
};

// Doubly linked list
struct Node
{
    Node* next;
    Node* prev;
    Data data;
};

In general singly linked lists are more complicated to manage once removal of nodes is required. Since no explicit prev pointer is stored in memory, a temporary prev variable is often kept on the stack while traversing a singly linked list. This means more complicated code that clogs the user's focus.

Even though a doubly linked list requires twice the pointer memory, it is usually still preferred over a singly linked list, even when a singly linked list could get the job done without any additional time complexity. Linked lists are often useful within complex algorithms, and if there's a chance to simplify the implementation of a complex algorithm by using a doubly linked list, then that chance is probably worth taking.

When I first implemented a doubly linked list and tested its performance out against std::list I couldn’t quite get it to perform well.

Naive insertion and removal of list nodes has to check for NULL pointers, which represent the front and back of the linked list. Here's an example of what removal might look like, to give you an idea of how many if-statements could be necessary (code not tested, I just typed it out here on the spot):

void List::Remove( Node* node )
{
    if ( node->next )
    {
        if ( node->prev )
        {
            node->next->prev = node->prev;
            node->prev->next = node->next;
        }
        else
        {
            node->next->prev = node->prev;
            head = node->next;
        }
    }
    else
    {
        if ( node->prev )
        {
            node->prev->next = NULL;
        }
        else
        {
            head = NULL;
        }
    }
}

There are two if statements hit every single time this function is called. When the CPU comes across a branch it loads instructions based on which path of execution it deems most likely. This is called *branch prediction*. If this prediction is incorrect the loaded work must be thrown away, and then the appropriate code must be loaded instead.

This branch miss is probably going to be a very fast CPU operation, since the executing code is almost definitely in the L1 instruction cache. Despite it being fast, modern CPUs still operate through a pipeline, and branch misses can still garble up whatever pipelining is happening. In the end a branch miss is a performance hit, and should be avoided when appropriate.

A common linked list optimization is to use a dummy head and tail node. These nodes sit in memory along with the list data structure. Upon list initialization they point their next and prev pointers to one another, and NULL out the outer pointers to represent the front and back of the list.

With this optimization the only case user nodes will ever encounter is the first one, where both if statements are true. The removal code can now look something like this (again, not tested):

void List::Remove( Node* node )
{
    node->next->prev = node->prev;
    node->prev->next = node->next;
}
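
For completeness, here's a sketch of what the initialization and an insertion might look like under the same dummy-node scheme (my own illustration, not taken from any particular std implementation):

// The dummy head and tail live inside the List itself, so every real
// node always has non-NULL neighbors and insertion/removal never branch.
struct List
{
    Node dummyHead;
    Node dummyTail;

    void Init( )
    {
        dummyHead.prev = NULL;
        dummyHead.next = &dummyTail;
        dummyTail.prev = &dummyHead;
        dummyTail.next = NULL;
    }

    void PushFront( Node* node )
    {
        node->prev = &dummyHead;
        node->next = dummyHead.next;
        dummyHead.next->prev = node;
        dummyHead.next = node;
    }

    void Remove( Node* node );
};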

This is one kind of optimization the standard library implements. After doing this myself my list performed on par with the std's implementation.

Intrusively linked lists invert the definition of what a node is. Traditionally a linked list node contains some data. An intrusive list has the data contain the node:

struct Data
{
    Data* next;
    Data* prev;

    // actual data goes somewhere in here...
};

This scheme is nice since now nodes do not need to be allocated separately from the data. If the number of data elements is known, then the exact number of nodes needed can also be known.

C++ templates can be used to create a generic intrusively linked list implementation, able to define nodes inside of any data type. C macros can also be used to the same effect. In this way an intrusively linked list can be used in pretty much the same way a normal linked list is.
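
As a sketch of the template approach (the type and member names here are hypothetical, just to show the shape of the idea):

// An intrusive link that can be embedded in any type T. List code
// operates on the embedded link member rather than wrapping T in a node.
template < typename T >
struct IntrusiveLink
{
    T* next;
    T* prev;
};

struct Enemy
{
    IntrusiveLink< Enemy > link; // the node lives inside the data
    int health;
    // More members here...
};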

One major downside to intrusive linked lists is that they embed extra pointers within your data. This can be a big deal if some code is very performance sensitive. If cache line utilization is important, then the percentage of each cache line holding data that is actually used becomes important. Sometimes these pointers get in the way and clutter the lines. This cluttering is something to be aware of.

On the flip side many algorithms can run on arrays of data. Instead of storing explicit pointers to represent prev and next connections, indices into an array can be used. This can make entire data structures memcpy-able, or serializable just by dumping bits to a stream. Additionally, the links stored directly within the data will often be accessed at the exact same time as the data itself (depending on the algorithm), which results in very high cache line utilization.

It all depends on the scenario.

When dealing with intrusive linked lists it can often be really weird to define where in memory dummy nodes would reside. Are we to create dummy pieces of *Data*? What if the algorithm needs lists to be constantly created and destroyed? What if the algorithm can have as many lists as there are nodes? Suddenly the algorithm might need twice as many dummy nodes as actual nodes!

For example, imagine a hash table implemented with collision chaining. If we wish to use doubly linked lists, dummy nodes are probably out of the question if you care about all the wasted memory. However, it does suck to take a performance hit constantly testing for NULL node connections.

It is possible to remove the dummy nodes in many cases. *Data* elements can be initialized to point to themselves. In this way each element is itself a circular doubly linked list with one node. Inserting a second node is a matter of making both nodes point to each other. Inserting a third node uses the exact same code as inserting the second node (and requires no branching, since NULL indices/pointers do not exist), and so on. A sketch of this scheme follows below.
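
To make the idea concrete, here's a minimal sketch of the core splice operations using the Data struct from above (the function names are hypothetical; a full implementation is left as the exercise mentioned below):

// Initialize an element as a one-node circular list: it points to itself,
// so no NULL checks or dummy nodes are ever needed.
void InitCircular( Data* node )
{
    node->next = node;
    node->prev = node;
}

// Splice node into the list right after pos. The same code handles the
// second, third, or hundredth insertion with zero branches.
void InsertAfter( Data* pos, Data* node )
{
    node->prev = pos;
    node->next = pos->next;
    pos->next->prev = node;
    pos->next = node;
}

// Unlink node; it becomes a valid one-node circular list again.
void RemoveCircular( Data* node )
{
    node->next->prev = node->prev;
    node->prev->next = node->next;
    InitCircular( node );
}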

In many cases an intrusive circular doubly linked list (boy, isn’t that a mouthful) can be the perfect solution to a hard problem! I will leave it as an exercise to research or implement this circular style of linked list.

Another name for this type of list would be a “sentinel intrusive list”, where a sentinel node can be used to bound a list traversal. Since our linked lists are circular we can start at any node, traverse the list, and once we reach the node we started upon our traversal is complete.

Wouldn’t std::girlfriend be great? We can plug any type of girlfriend we want into the template parameters and the compiler will just generate one for us! Why in the world would std::girlfriend be omitted from the STL?

Oh of course, std::girlfriend was never implemented because everyone is just going to put in way too many specific template types (super hot, not crazy) and it’ll just end in a bunch of “failed to specialize template” error messages. And then the moment too many of the template parameters are removed we’ll just get a bunch of “multiple symbols defined” linker errors! Maybe it was a good idea to never implement std::girlfriend in the first place. After all, a girlfriend prefixed with std might make one think of something other than C++…

Jokes aside, I brought up the fact that inline is totally useless for inlining. The only real reason to use the inline keyword (in my opinion) is to be able to define functions within a header. Well, I brought it up as a joke, but not really a joke, and that’s the joke.

The inline keyword and .inl files can actually be a pretty nice organizational tool for code, and I’ve found it helps users that didn’t write the implementation understand the code.

Say we are implementing some kind of algorithm that stores elements in an array. Elements need to refer to one another (perhaps to build intrusive linked lists), yet these arrays ought to be relocatable in memory without requiring any complex copy routines; a single memcpy should yield a new and valid copy.

One way to do so is to make use of array indices instead of pointers. Usually a myriad of small helper functions arises to clean up all of the array indexing that ensues shortly after this kind of code crops up. It’s a huge pain to look into a .cpp and have to continually navigate past a lot of tiny and trivial helper functions just to understand the algorithm.

These small helpers can be swept to the side into a .inl file. The .inl extension immediately tells the user what kind of code resides within (either templates or inlined functions), and usually this kind of code isn’t very necessary for understanding the more heavy duty code within the .cpp file.

Here’s a mock example:

// ComplexAlgorithm.h
struct Element
{
    int next; // index of the next element, for intrusive index-based links
    // Element data here...
};

struct Data
{
    int elementCapacity;
    int elementCount;
    Element* elements;

    Element* GetElement( int i );
    int NextIndex( int i );

    // More things here...
};

void ComplexAlgorithm( Data* memory );

#include "ComplexAlgorithm.inl"

// ComplexAlgorithm.cpp
#include "ComplexAlgorithm.h"

void SmallerAlgorithmUsedInComplexAlgorithm( Data* memory )
{
    // ...
}

void ComplexAlgorithmHelper( Data* memory )
{
    // ...
}

void ComplexAlgorithm( Data* memory )
{
    // ...
}

// ComplexAlgorithm.inl
inline Element* Data::GetElement( int i )
{
    return elements + i;
}

inline int Data::NextIndex( int i )
{
    return elements[ i ].next;
}

inline void AnotherHelper( )
{
}

inline void YetAnotherHelper( )
{
}

Aren’t these example files pretty easy to read? I’m sure you at least scanned the .inl file briefly, and will probably never really need to look at it again. Time will be well spent in the .cpp file with less code to clog your brain. And who knows, maybe the compiler (or perhaps the linker) actually cares a little bit when we type the inline keyword.

The trick is super simple: just process the first element outside of the loop to set up your initial conditions, then form your loop to skip the first element. This assumes there’s at least one element in the array to process. Here’s an example for computing an AABB:

AABB aabb;
aabb.min = verts[ 0 ];
aabb.max = verts[ 0 ];

for ( int i = 1; i < vertCount; ++i )
{
    aabb.min = Min( aabb.min, verts[ i ] );
    aabb.max = Max( aabb.max, verts[ i ] );
}

Usually I myself would have written this kind of code like so and not given any more thought to it:

AABB aabb;
aabb.min = Vec3( FLT_MAX, FLT_MAX, FLT_MAX );
aabb.max = Vec3( -FLT_MAX, -FLT_MAX, -FLT_MAX );

for ( int i = 0; i < vertCount; ++i )
{
    aabb.min = Min( aabb.min, verts[ i ] );
    aabb.max = Max( aabb.max, verts[ i ] );
}

This second code chunk is arguably just slightly more esoteric and is definitely a little less efficient for no good reason.

One could also skip the first element when finding the min/max of any sort of array, for example an array of dot product results (see the sketch below). Though simple, it’s pretty nice to find small ways to write slightly better code.
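
For instance, a support-point style query that seeds the search with element 0 (my own example, assuming a Vec3 type and Dot function):

// Find the vertex farthest along direction d, seeding the search with
// element 0 so no sentinel value like -FLT_MAX is needed.
int Support( const Vec3* verts, int vertCount, Vec3 d )
{
    int best = 0;
    float bestDot = Dot( verts[ 0 ], d );

    for ( int i = 1; i < vertCount; ++i )
    {
        float dot = Dot( verts[ i ], d );
        if ( dot > bestDot )
        {
            bestDot = dot;
            best = i;
        }
    }
    return best;
}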

// Reflection demonstration by Randy Gaul
// Reflection lets C++ users store information about the different types
// of data their program uses. This means non-templated structs are used
// to store relevant information about types that would usually be lost
// during compilation. These TypeInfo structs can be used during run-time
// to write more type-aware code. This means that reflection can be used
// to automate certain processes that deal with interpreting memory.
// Examples of areas that can be automated: script binding, serialization,
// visual editor support, etc.

#include <cstdio>
#include <cassert>
#include <cstring>

// Stores information about a type. Things like the name of a type,
// size in bytes, and other things are important to keep track of.
struct TypeInfo
{
    const char* name;
    int size;

    // You can add an array of TypeInfo pointers here to represent
    // relationships between different types (like inheritance parent
    // child relationships, or struct/class data members).
};

// Used to store a pointer to a TypeInfo struct as a static parameter
// of a templated static function. This lets the user lookup TypeInfo
// pointers given a template type parameter.
template < typename T >
struct TypeLookupByTemplate
{
    static TypeInfo* GetType( TypeInfo* typeInfo )
    {
        static TypeInfo* s_typeInfo = NULL;
        if ( !s_typeInfo )
        {
            s_typeInfo = typeInfo;
        }
        return s_typeInfo;
    }
};

// You can make this larger if you need to
#define MAX_TYPE_INFOS 128
TypeInfo typeInfos[ MAX_TYPE_INFOS ];
int typeInfosCount;

// Add a type to the type infos array. For now this just stores
// name and size of a type.
#define ADD_TYPE( T ) \
    do { \
        assert( typeInfosCount < MAX_TYPE_INFOS ); \
        TypeInfo* typeInfo = typeInfos + typeInfosCount++; \
        typeInfo->size = sizeof( T ); \
        typeInfo->name = #T; \
        TypeLookupByTemplate< T >::GetType( typeInfo ); \
    } while ( 0 );

// Allows code to get a pointer to a TypeInfo through a template type lookup ID.
#define GET_TYPE_BY_TEMPLATE( T ) TypeLookupByTemplate< T >::GetType( NULL )

TypeInfo* GetTypeByString( const char* typeNameCharPointer )
{
    for ( int i = 0; i < typeInfosCount; ++i )
    {
        if ( !strcmp( typeNameCharPointer, typeInfos[ i ].name ) )
        {
            return typeInfos + i;
        }
    }

    // Unable to find a specific type by string
    assert( false );
    return NULL;
}

// Lookup a type info struct by string ID.
// Looking up a type with an O( N ) loop will probably be fast enough for most
// use cases. If this ever becomes a problem a hash table can be used here.
#define GET_TYPE_BY_STRING( typeNameCharPointer ) GetTypeByString( typeNameCharPointer )

// A Variable struct contains a pointer to some memory. The typeInfo pointer
// stores information about the type of data that the void pointer points to.
// This is useful to pass many types of data to a single function, thus
// reducing code duplication in many areas.
struct Variable
{
    // Functions to initialize the data and typeInfo pointers
    void Set( void* dataPtr, TypeInfo* typeInfoPtr )
    {
        data = dataPtr;
        typeInfo = typeInfoPtr;
    }

    template < typename T >
    void Set( T& typedData )
    {
        data = &typedData;
        typeInfo = GET_TYPE_BY_TEMPLATE( T );
    }

    // If any code would like to retrieve the explicit data the code
    // must provide a templated type to cast to.
    template < typename T >
    T& GetValue( )
    {
        // An assert here can force type safety
        assert( GET_TYPE_BY_TEMPLATE( T ) == typeInfo );
        return *(T*)data;
    }

    void* data;
    TypeInfo* typeInfo;
};

int main( )
{
    // Add some type information to the reflection system
    ADD_TYPE( int );
    ADD_TYPE( float );

    // Print out stored type information from the reflection system
    TypeInfo* intTypeInfo = GET_TYPE_BY_STRING( "int" );
    printf( "%s\n", intTypeInfo->name );
    printf( "%d\n", intTypeInfo->size );

    TypeInfo* floatTypeInfo = GET_TYPE_BY_TEMPLATE( float );
    printf( "%s\n", floatTypeInfo->name );
    printf( "%d\n", floatTypeInfo->size );

    // Create an integer and then create a Variable that describes the integer
    // and holds a pointer to the integer in memory.
    int x = 10;
    Variable var;
    var.Set( x );

    // Variables let you query name and size information
    printf( "%s\n", var.typeInfo->name );
    printf( "%d\n", var.typeInfo->size );

    // Variables also let you cast to the contained type
    printf( "%d\n", var.GetValue< int >( ) );
}


Here’s the link to Ericson’s blog post on tolerances: Tolerances Revisited. Please note there’s a little bit of ambiguity about whether the test should care if the vectors point in the same direction or not. In general it doesn’t really matter since the point of the slides is numeric robustness.

The gist of the slides is that the scale of the vectors in question matters for certain applications. Computing a good relative epsilon seems difficult, and maybe different epsilon calculations would be good for different applications. I’m not sure!
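
One common formulation of a relative comparison, shown here as a generic sketch rather than anything prescribed by the slides, scales the tolerance by the magnitudes involved:

#include <algorithm> // std::max
#include <cmath>     // std::abs

// Compare two floats with a tolerance that scales with their magnitudes.
// Near zero the max( 1.0f, ... ) clamp degrades to an absolute test.
bool RelativeEq( float a, float b, float tol )
{
    float scale = std::max( 1.0f, std::max( std::abs( a ), std::abs( b ) ) );
    return std::abs( a - b ) <= tol * scale;
}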

Here’s a demo you can try out yourself to test two of the solutions from the slides (tests returning true should be considered false positives):

#include <cstdio>
#include <cmath>

struct Vec3
{
    float x;
    float y;
    float z;

    Vec3 operator*( float a )
    {
        Vec3 v;
        v.x = x * a;
        v.y = y * a;
        v.z = z * a;
        return v;
    }
};

float Dot( Vec3 a, Vec3 b )
{
    return a.x * b.x + a.y * b.y + a.z * b.z;
}

Vec3 Normalize( Vec3 a )
{
    return a * (1.0f / std::sqrt( Dot( a, a ) ));
}

bool Parallel0( Vec3 a, Vec3 b, float tol )
{
    a = Normalize( a );
    b = Normalize( b );
    float dot = 1.0f - std::abs( Dot( a, b ) );
    return dot < tol;
}

bool Parallel1( Vec3 a, Vec3 b, float tol )
{
    if ( Dot( a, b ) < 0 )
    {
        b.x = -b.x;
        b.y = -b.y;
        b.z = -b.z;
    }

    float k = a.y / b.y;
    b = b * k;
    float x = std::abs( a.x - b.x );
    float y = std::abs( a.y - b.y );
    float z = std::abs( a.z - b.z );

    if ( x < tol && y < tol && z < tol )
        return true;
    return false;
}

void Tolerance( float tol )
{
    Vec3 u;
    u.x = 0.0f;
    u.y = 0.9812398512f;
    u.z = 0.001f;
    u = Normalize( u );

    Vec3 v;
    v.x = 0.0f;
    v.y = 10.0f;
    v.z = 1.0f;

    printf( "Tolerance: %f\n", tol );
    bool result = Parallel0( u, v, tol );
    printf( "Parallel0: %s\n", result ? "true" : "false" );
    result = Parallel1( u, v, tol );
    printf( "Parallel1: %s\n", result ? "true" : "false" );
    printf( "\n" );
}

int main( )
{
    Tolerance( 1.0e-1f );
    Tolerance( 1.0e-2f );
    Tolerance( 1.0e-3f );
}

Output of the program is:

Tolerance: 0.100000
Parallel0: true
Parallel1: true

Tolerance: 0.010000
Parallel0: true
Parallel1: false

Tolerance: 0.001000
Parallel0: false
Parallel1: false


The idea is to take an octahedron, or icosahedron, and subdivide the triangles on the surface of the mesh such that each triangle creates 4 new triangles (each triangle creates a Triforce symbol).

One thing the Stack Overflow page didn’t describe is intermittent normalization between each subdivision. If you imagine subdividing an octahedron over and over without any normalization, the final resulting sphere will have triangles that vary in size quite a bit. However, if every single vertex is normalized after each subdivision, then the vertices snap to the unit sphere more often. This results in a final mesh whose triangles are closer to uniform geodesic area.

The final mesh isn’t purely geodesic and there will be variation in the size of the triangles, but it will be hardly noticeable. Sphere meshes will look super nice and also behave well when simulating soft bodies with Matyka’s pressure volume.

Here’s an example program you can use to perform some subdivisions upon an octahedron (click here to view the program’s output):

#include <cstdio>
#include <cmath>
#include <cassert>

struct Vec3
{
    Vec3( ) { }
    Vec3( float x0, float y0, float z0 )
    {
        x = x0;
        y = y0;
        z = z0;
    }

    float x;
    float y;
    float z;

    const Vec3 operator+( const Vec3& a )
    {
        Vec3 v;
        v.x = x + a.x;
        v.y = y + a.y;
        v.z = z + a.z;
        return v;
    }

    const Vec3 operator-( const Vec3& a )
    {
        Vec3 v;
        v.x = x - a.x;
        v.y = y - a.y;
        v.z = z - a.z;
        return v;
    }

    const Vec3 operator*( float a ) const
    {
        Vec3 v;
        v.x = x * a;
        v.y = y * a;
        v.z = z * a;
        return v;
    }
};

float Dot( Vec3 a, Vec3 b )
{
    return a.x * b.x + a.y * b.y + a.z * b.z;
}

inline const Vec3 Cross( const Vec3& a, const Vec3& b )
{
    return Vec3(
        (a.y * b.z) - (b.y * a.z),
        (b.x * a.z) - (a.x * b.z),
        (a.x * b.y) - (b.x * a.y)
    );
}

const Vec3 Normalize( Vec3 v )
{
    return v * (1.0f / std::sqrt( Dot( v, v ) ));
}

const int stackSize = 1024 * 8;
Vec3 in[ stackSize ];
int ip = 0;
Vec3 out[ stackSize ];
int op = 0;
Vec3 octohedron[ 6 ];

void Subdivide( void )
{
    op = 0;

    for ( int i = 0; i < ip; i += 3 )
    {
        Vec3 a = in[ i ];
        Vec3 b = in[ i + 1 ];
        Vec3 c = in[ i + 2 ];

        Vec3 ab = (a + b) * 0.5f;
        Vec3 bc = (b + c) * 0.5f;
        Vec3 ca = (c + a) * 0.5f;

        out[ op++ ] = b;
        out[ op++ ] = bc;
        out[ op++ ] = ab;

        out[ op++ ] = c;
        out[ op++ ] = ca;
        out[ op++ ] = bc;

        out[ op++ ] = a;
        out[ op++ ] = ab;
        out[ op++ ] = ca;

        out[ op++ ] = ab;
        out[ op++ ] = bc;
        out[ op++ ] = ca;

        assert( op <= stackSize );
    }

    for ( int i = 0; i < op; ++i )
        in[ i ] = Normalize( out[ i ] );
    ip = op;
}

int main( )
{
    octohedron[ 0 ] = Vec3( 1.0f, 0.0f, 0.0f );
    octohedron[ 1 ] = Vec3( 0.0f,-1.0f, 0.0f );
    octohedron[ 2 ] = Vec3(-1.0f, 0.0f, 0.0f );
    octohedron[ 3 ] = Vec3( 0.0f, 1.0f, 0.0f );
    octohedron[ 4 ] = Vec3( 0.0f, 0.0f, 1.0f );
    octohedron[ 5 ] = Vec3( 0.0f, 0.0f,-1.0f );

    in[ ip++ ] = octohedron[ 2 - 1 ];
    in[ ip++ ] = octohedron[ 1 - 1 ];
    in[ ip++ ] = octohedron[ 5 - 1 ];

    in[ ip++ ] = octohedron[ 3 - 1 ];
    in[ ip++ ] = octohedron[ 2 - 1 ];
    in[ ip++ ] = octohedron[ 5 - 1 ];

    in[ ip++ ] = octohedron[ 4 - 1 ];
    in[ ip++ ] = octohedron[ 3 - 1 ];
    in[ ip++ ] = octohedron[ 5 - 1 ];

    in[ ip++ ] = octohedron[ 1 - 1 ];
    in[ ip++ ] = octohedron[ 4 - 1 ];
    in[ ip++ ] = octohedron[ 5 - 1 ];

    in[ ip++ ] = octohedron[ 1 - 1 ];
    in[ ip++ ] = octohedron[ 2 - 1 ];
    in[ ip++ ] = octohedron[ 6 - 1 ];

    in[ ip++ ] = octohedron[ 2 - 1 ];
    in[ ip++ ] = octohedron[ 3 - 1 ];
    in[ ip++ ] = octohedron[ 6 - 1 ];

    in[ ip++ ] = octohedron[ 3 - 1 ];
    in[ ip++ ] = octohedron[ 4 - 1 ];
    in[ ip++ ] = octohedron[ 6 - 1 ];

    in[ ip++ ] = octohedron[ 4 - 1 ];
    in[ ip++ ] = octohedron[ 1 - 1 ];
    in[ ip++ ] = octohedron[ 6 - 1 ];

    Subdivide( );
    Subdivide( );

    FILE* fp = fopen( "out.txt", "w" );

    for ( int i = 0; i < ip; i += 3 )
    {
        Vec3 a = in[ i ];
        Vec3 b = in[ i + 1 ];
        Vec3 c = in[ i + 2 ];

        fprintf( fp, "%7.4f, %7.4f, %7.4f,\n%7.4f, %7.4f, %7.4f,\n%7.4f, %7.4f, %7.4f,\n\n",
            a.x, a.y, a.z, b.x, b.y, b.z, c.x, c.y, c.z );
    }

    fprintf( fp, "%d\n", op / 3 );

    for ( int i = 0; i < ip; i += 3 )
    {
        Vec3 a = in[ i ];
        Vec3 b = in[ i + 1 ];
        Vec3 c = in[ i + 2 ];
        Vec3 n = Normalize( Cross( b - a, c - a ) );
        fprintf( fp, "%7.4f, %7.4f, %7.4f,\n", n.x, n.y, n.z );
    }

    fclose( fp );
}