It has been some time since I have tended to my personal blog, so coming back feels a bit like a child visiting a neglected pet. That sounds strange, but I do feel a little shame for procrastinating so long! I have so many things to do: tech demos, updating my resume, refactoring the Portfolio section of the website, writing more posts here, writing posts for GameDev.TutsPlus. So many things. Anyway, it feels good to have the time this summer semester to do all of them.
Data Oriented Design
Data oriented design is a newer way of thinking that seems to have come forth primarily from the gulf between the processing power of modern hardware and abysmally slow memory access speeds. Both the average CPU and GPU are ridiculously powerful, and in my experience memory access patterns have been the cause of nearly every performance bottleneck I have encountered in my programming endeavors thus far.
A game engine has a natural tendency to become over-complicated during development, and over-complicating a solution to a problem is a tell-tale sign of a bad designer. So in architecting a game engine, simplicity is something to strive for. However, simplicity alone is not enough; it needs to be coupled with performance and brevity (nobody likes thousands and thousands of lines of simple code, either). The ideal game engine is one that is highly efficient and simple. Out of these needs Data Oriented Design has arisen.
Benefits of Simplicity
Sure, a simple program can easily be read. As a result, it can also easily be modified and extended. This is a no-brainer for most of us, but what may not be immediately obvious is where that readability leads: if one manages to solve a complex problem in a simple way, it becomes acceptable for that solution to be highly specific or specialized.
A generic solution to any problem would of course be best, but coming up with a generic solution often takes too much time to be realistic. Even so, large amounts of time are often justified for creating generic solutions within complex systems: if you are going to add code to a complex system it will inherently take time, so reusability becomes a necessity.
In such cases a more specific, less reusable solution would suffice. If that specific solution is incredibly simple, efficient, and brief, then it becomes much less of a big deal that it is not so generic. On top of this, if the solution is also self-contained and has no side effects, it can be parallelized trivially.
Benefits of DOD
As my various links describe, DOD provides simple, optimized, and trivially parallelized code. If this is true, then DOD is something that every programmer should consider employing whenever optimization is a necessity.
Back to Data Oriented Design
So what is Data Oriented Design (DOD)? There are not very many solid articles on actually implementing DOD, but I can do a quick link roll of what I have collected so far:
I myself am currently writing large portions of a 3D game engine, and in doing so I want to make use of DOD in some key areas, namely physics, graphics, and game objects/components. So how would one actually use DOD in these areas?
I like to think of DOD like this: keep all of your data in large contiguous arrays in memory. Then, whenever an operation needs to be performed on some of that data, think in terms of performing the operation on each piece of data sequentially, all in one traversal. Hardware accesses memory fastest when it is already in the CPU cache, so traversing memory linearly allows for the best cache performance, especially when prefetches are used (more on prefetches later).
Contrast such memory access with a traditional Object Oriented approach: an object updates each of its components, and each component's memory is stored in an entirely different area. Therefore each game object, upon update, touches memory in physics-, graphics-, and AI-related areas. The AI-related code likely also touches back into the physics memory, many times. All of this memory jumping and cache missing is incurred many times per object, and every object does this. It would be much more efficient to update each segment of memory one at a time, rather than updating the abstract idea of a “game object” one at a time.
This last paragraph highlights the entire contrast between DOD and OOP. OOP focuses on identifying data with an abstract name and associating functionality with that name. DOD focuses on identifying what memory can be laid out contiguously and what operations can be performed on the entire set of data in a linear fashion. So in reality, implementing DOD is merely a matter of changing how you think when you write code.
I decided to store all my objects and components within a simple std::vector-like container I call a SlotAllocator (I have no idea what the actual name for such a data structure is). Growing the internal array requires copying memory from one location to another, so copying your objects between memory locations must not break your program. Using array indices instead of pointers is a great way to satisfy that requirement.
Since DOD is all about transforming data sets from one state to another in as linear a fashion as possible, some specialized data structures are going to be needed.
DOD Data Structures
I have already mentioned my SlotAllocator, which is pretty much a std::vector with some slight modifications (in other words, it is an array). The biggest difference is that whenever an object is deleted, instead of copying every subsequent object back one index, I take the rightmost object and swap it into the freed space. This simple operation keeps all the active objects in the container contiguous in memory.
I will be trying this container out to store game objects and components. I do not currently have enough of a working engine put together to do meaningful performance tests, but that time will eventually come.
Another structure I ran into a need for is a hash table. I have two types of hash tables: intrusive and non-intrusive. Let’s start with the intrusive table.
This here is the best intrusive hash table I have ever seen. It was written by Patrick Wyatt, and is awesome. The reason an intrusive hash table is so great is that the node used to link each piece of data into the table is stored within the data itself. The data can live somewhere else in the program (in an array), and the node is right there with it. A more traditional hash table that uses chaining resolution has to allocate a node upon every collision; in an intrusive scheme the node already exists within the object. I created my own intrusive hash table in the image of Patrick Wyatt’s.
However, there is sometimes a need to hash something that cannot contain a node, for example a pointer. If the need to hash pointers arises, either a non-intrusive table must be used, or an object wrapping both a node and the pointer must be created and inserted into an intrusive table. As such, a standard non-intrusive table is also required.
Making use of the aforementioned SlotAllocator, I fashioned a simple table with chaining resolution. The nodes themselves are stored in a contiguous array (due to the nature of the SlotAllocator), and instead of pointers I use array indices; traversing the internal chains is a matter of jumping around with array accesses. This allows the table to grow and shrink to accommodate different load factors as desired.
Does such a table actually have performance benefits over slightly different designs? I’m sure the contiguous memory is a great plus, but actual profiling can’t be done quite yet.
The last thing I would like to discuss is the handle manager. It is perfect for managing anything in pointer form that needs to be created and destroyed regularly, which is just about everything active in a game. The handle manager translates a 32-bit integer (a handle) into a pointer. Handles can be passed around the engine freely, and when a pointer is required the handle is translated into the desired pointer. The translated pointer is used immediately and then thrown away, never stored. This continual translation from handle to pointer allows the actual pointer address within the manager to change, and all subsequent translations will retrieve the updated address.
struct Handle
{
    Handle( uint32 index, uint32 counter, uint32 type )
        : m_index( index ), m_counter( counter ), m_type( type ) {}

    inline operator uint32( void ) const;

    uint32 m_index   : 14;
    uint32 m_counter : 14;
    uint32 m_type    : 4;
};

STATIC_ASSERT( sizeof( Handle ) == sizeof( uint32 ) );

Handle::operator uint32( void ) const
{
    return m_type << 28 | m_counter << 14 | m_index;
}
Implementation is a matter of packing a bit field into those 32 bits: 14 bits of index, 14 bits of counter, and 4 bits of type. A good explanation and example implementation can be found on Noel Llopis’s blog here. I myself created my own version of his handle manager with some tweaks and the like.
Now that the preliminary data structures have been created, it is time for me to really start using them within my 3D game engine. Future posts will likely contain all the juicy details of such ventures! Wish me luck.