Data Oriented Design : Starting Up

Personal Update

It has been some time since I have tended to my personal blog, so I feel a bit like a young child coming back to visit a neglected pet. That sounds sort of strange, but I do feel a little bit of shame in procrastinating for so long! I have so many things to do: tech demos, updating my resume, refactoring the Portfolio section of the website, writing some more posts, writing posts for GameDev.TutsPlus. So many things. Anyway, it feels good to have the time in the Summer semester to do all these things.

Data Oriented Design

Data oriented design is a newer way of thinking that seems to have grown primarily out of the contrast between the processing power of modern hardware and abysmally slow memory access. Both the average CPU and GPU are ridiculously powerful, and memory access patterns have been the cause of nearly all the performance bottlenecks I have encountered in my programming endeavors thus far.

A natural tendency during game engine development is for over-complication to arise. Over-complicating a solution to a problem is a tell-tale sign of a bad designer, so in architecting a game engine simplicity is something to strive for. However, simplicity alone is not enough; it needs to be coupled with performance and brevity (nobody likes thousands and thousands of lines of simple code, either). The ideal game engine is one that is highly efficient and simple. Out of these needs Data Oriented Design has arisen.

Benefits of Simplicity

Sure, a simple program can easily be read. It can also easily be modified and extended as a result of being read. This is a no-brainer for most of us; however, what may not be immediately realized is that the ability to read and modify simple code leads to something interesting: if one manages to solve a complex problem in a simple way, it can be acceptable for that solution to be highly specific or specialized.

A generic solution to any problem would of course be best, but oftentimes coming up with a generic solution takes too much time to be realistic. Even so, large amounts of time are often justified toward creating generic solutions within complex systems: if you’re going to add code to a complex system it will inherently take time, and reusability becomes a necessity.

In this case a more specific, less reusable solution would suffice. If such a specific solution is incredibly simple, efficient, and brief, then it suddenly becomes much less of a big deal that the solution is not so generic. On top of this, if the solution is also self-contained and has no side effects, it can be parallelized trivially.

Benefits of DOD

As my various links describe, DOD provides simple, optimized, and trivially parallelized code. If this is true, then DOD is something that every programmer should consider employing whenever optimization is a necessity.

Back to Data Oriented Design

So what is Data Oriented Design (DOD)? There are not very many solid articles on actually implementing DOD, but I can do a quick link roll of what I have collected so far:

I myself am currently writing large portions of a 3D game engine, and in doing so I want to make use of DOD in some key areas, namely physics, graphics, and game objects/components. So how would one actually use DOD in these areas?

I like to think of DOD like this: Keep all of your data in large contiguous arrays within memory. Then, whenever an operation needs to be performed on some of the data, think in terms of performing this operation on each piece of data sequentially, all in one traversal. Hardware accesses memory fastest when it is already in the CPU cache, so traversing memory linearly allows for the best cache performance, especially when prefetches are used (more on prefetches later).

Contrast such memory access with a traditional Object Oriented approach: an object updates each of its components, and each component has memory stored in entirely different areas. Therefore each game object, upon update, touches memory in physics, graphics, and AI related areas. The AI related area also likely touches back into the physics memory (many times). All of this memory jumping and cache missing is incurred many times per object, and each object does this. It would be much more efficient if each segment of memory were updated and operated upon one at a time, rather than updating the abstract idea of a “game object” one at a time.

This last paragraph highlights the entire contrast between DOD and OOP. OOP focuses on identifying data with an abstract name, and associates functionality with that name. DOD focuses on identifying what memory can be laid out contiguously, and what operations can be done on the entire set of data in a linear fashion. So in reality implementing DOD is merely a matter of changing how you think when you write code.

I decided to store all my objects and components within a simple std::vector-like container called a SlotAllocator. I have no idea what the actual name for such a data structure is though. Growing the internal array requires memory copies from one location to another, so the ability to copy objects from one memory location to another should not break your program. Using array indices instead of pointers is a great way to implement such an idea.

Since DOD is all about transforming data sets from one state to another in as linear as possible a fashion, some specialized data structures are going to be needed.

DOD Data Structures

SlotAllocator

I have already mentioned my SlotAllocator, which is pretty much a std::vector with some slight modifications (or in other words it is an array). The biggest difference is that whenever an object is deleted, instead of copying every subsequent object one index backwards, I take the rightmost object and swap it into the freed space. This is a simple operation that keeps all the active objects in the container contiguous in memory.

I will be trying this container out to store game objects and components. I do not currently have enough of a working engine all together in order to do meaningful performance tests, but that time will eventually come.

HashTable

Another structure I ran into a need for is a hash table. I have two types of hash tables: intrusive and non-intrusive. Let’s start with the intrusive table.

This here is the best intrusive hash table I have ever seen. It was written by Patrick Wyatt, and is awesome. The reason an intrusive hash table is so great is that the node used to link the data into the table is stored within the data itself. This allows the data to live somewhere else in the program (in an array), with the node right there alongside it. A more traditional hash table that uses chaining resolution has to allocate nodes upon collision; in an intrusive scheme the node already exists within the object. I created my own intrusive hash table in the image of Patrick Wyatt’s.

However, there is sometimes a need to hash something that cannot contain a node, for example pointers. If a need for hashing pointers arises, either a non-intrusive table must be used, or a wrapper object containing both a node and the pointer must be created and inserted into an intrusive table. As such, a standard non-intrusive table is required.

Making use of the aforementioned SlotAllocator I fashioned a simple table with chaining resolution. However, the nodes themselves are stored in a contiguous array (due to the nature of the SlotAllocator), and instead of pointers I used array indices. Traversing the internal chains is a matter of jumping around with array accesses. This allows the table to grow and shrink to accommodate different load factors as desired.

Does such a table actually have performance benefits over slightly different styles? I’m sure the contiguous memory is a great plus, but actual profiling can’t be done quite yet.

HandleManager

The last thing I would like to discuss is the handle manager. It is perfect for managing anything in pointer form that needs to be created and destroyed regularly, which is just about everything active in a game. The handle manager translates a 32 bit integer (handle) into a pointer. Handles can be passed around the engine freely, and when a pointer is required the handle is translated into the desired pointer. The translated pointer is then used and immediately thrown away, not stored. This continual translation from handle to pointer allows the actual pointer address within the manager to be changed, and all subsequent translations will retrieve the updated address.

Implementation is a matter of creating a bit field within the 32 bits of the handle. A good explanation and example implementation can be found on Noel Llopis’s blog here. I myself created my own version of his handle manager with some tweaks and the like.

Moving Forward

Now that the preliminary data structures have been created it is time for me to really start using them within our 3D game engine. Future posts will likely contain all the juicy details of such ventures! Wish me luck.


6 thoughts on “Data Oriented Design : Starting Up”

  1. Kevin Yudi Utama

    Hi Randy, I am new to data oriented design and I am currently building a physics engine from scratch. I want to ask how you apply data oriented design in a physics engine. Most physics engine operations need a pair of objects, like a pair of colliders and a pair of rigid bodies during the collision resolution phase. These pairs of objects are most likely not contiguous in memory. When islands are created, the set of bodies is most likely not contiguous in memory either. Accessing these colliders and rigid bodies using handles instead of pointers (if we use the swap-with-the-last trick when destroying objects, we need some way to translate a handle into an index in the array) adds another level of indirection, which causes more cache misses. I have asked a similar question on http://gamedev.stackexchange.com/questions/111971/data-oriented-design-in-physics-engine

    1. Randy Gaul Post author

      Bodies, collision pairs, etc. can all be allocated in a packed array if you want to try to keep them close together in memory. Since islands can form in arbitrary configurations keeping bodies and colliders packed together in memory will not really help you much in terms of cache coherency. This is because any two bodies can collide with the colliders of any other two bodies at any time, so sorting/packing these in memory is moot. As for the islands themselves, islands are usually created via DFS (or union find), and so accessing each contact arbiter will also be fairly random and not make good use of a packed array.

      Finally we have the collision solver. The collision solver can be formed to easily run along an array and transform data. It might be smart to pull in necessary data from rigid bodies into a temporary packed array for the solver. The solver can then iterate over the array without unnecessary cache misses, and SIMD can be utilized here. Once finished, the cached array can be written back to memory (rigid bodies).

      As for collision detection, at least in 3D, usually only a couple shapes can fit in the cache at any given time. In my own work I’ve found that making collision information (in model space) instanced whenever necessary, and multi-threaded is a good way to speed this portion up. Also it’s a good idea to try to run collision detection routines as little as possible with clever frame-by-frame data caching, and good broadphase.

  2. KimS

    Hi there, great article! I got curious to know more about your SlotAllocator, more specifically how you handle resizing when the internal array gets full and you simply can’t do the swap-with-the-last trick. That is one of the small things I think people don’t talk enough about in DOD: how to keep your data linearly allocated when you do not know in advance the maximum number of possible entries in the collection (i.e. when you can’t use a plain array because you do not know its size in advance).

    Thanks for the paper and for your time!

    1. Randy Gaul Post author

      Good question. There are two styles to retrieve an element from an array: 1) a pointer directly pointing at the element; 2) an index used to define an offset from a base pointer. Oftentimes the best method is up for discussion and depends on project constraints. In general method 2 is more popular. A “handle” can be an index into an array (i.e. an offset from a base pointer). Whenever an element from the array must be retrieved, an offset is used to locate the element’s pointer. In the event the array grows the base pointer will change; however, as long as client code does not store pointers directly, a new translation will occur with the new base pointer once growing is completed. It all works out.

      If method 1) is chosen all pointers must be updated upon array growth.

      Does this make sense? Hope that helps!

      1. KimS

        Many thanks for the fast feedback! And also for the detailed explanation. Indeed, what you said makes quite a bit of sense. The only thing that still haunts me a bit is the following: considering that such a thing can only be done with a dynamic array, how do you handle the deallocation of memory? Do you create a wrapper ‘delete’ function for your custom container’s struct/class for it to release the used memory? Thanks again for your time!
