Welcome to the fourth post in a series of blog posts about how to implement a custom game engine in C++. As reference I’ll be using my own open source game engine SEL. Please refer to its source code for implementation details not covered in this article. Files of interest are
Batching and Batches
Did I say “simple” sprite batching? I meant dead simple!
Modern graphics card drivers (except Mantle???) do a lot of stuff, and it makes for passing information over to the GPU (or retrieving it) really slow on the PC platform. Apparently this is more of a non-issue on consoles, but meh we’re not all working with consoles now are we?
The solution: only send data one time. It’s latency that kills performance and not so much the amount of data. This means that we’ll do better at utilizing a GPU if we send a lot of data all at once as opposed to sending many smaller chunks.
This is where “batching” comes in. A batch can be thought of as a function call. In OpenGL you’ll see something like
DrawArrays, and in DirectX something else. These types of functions send chunks of data to the GPU in a “batch”
Sprites and 2D
Luckily it’s really easy to draw sprites in 2D: you can use a static quad buffer and instance by sending transforms to the GPU, or you can precompute transformed quads on the CPU and pass along a large vertex buffer, or anything in-between.
However computing the batches is slightly trickier. For now lets assume we have sprites with different textures and different zOrders.
In order to send a batch to the GPU we must only draw sprites with the same texture. This is because we can only render instances (or a large vertex array) with a given texture in order to lower draw calls. So we must gather up all sprites with the same texture and render in the correct order according to their zOrders.
If you are able to store your sprites in pod-structures then you’ll be in luck: you can in-place sort a large array of pods really easily using
std::sort. If not, then you can at least make an array of pointers or handles and sort those. You’ll have extra indirection, but so be it.
std::sort requires STL compatible iterators, and you’ll want one with random access (array index-style access). Here’s an example with a std::vector:
bool SpriteSort( const Sprite& A, const Sprite& B )
if(A.zOrder == B.zOrder)
return A.Texture < B.Texture;
return A.zOrder < B.zOrder;
std::sort( sprites.begin( ), sprites.end( ), SpriteSort );
The sort implementation within your package of the STL is likely going to be quicksort.
This sort will sort by zOrder first, and if zOrders are matching then sort by texture. This packs a lot of the sprites with similar textures next to each other in the draw list.
From here it’s just a matter of looping over the sprite array and finding the beginning/end of each segment with the same texture.
A few simple operations can be done here to ensure that computing batches goes as fast as possible. The first is to get all of your sprites together in a single array and sort the array in-place. This is easily done if your sprites are mere pods. This ensures very high locality of reference when transforming the sprite array.
The second optimization is to only sort the sprite array when it has been modified. If a new sprite is added to the list you can sort the whole thing. However there is no need to sort the draw list (sprite array) every single frame if no changes are detected.
I was able to render well over 8k dynamic sprites on-screen in my own preliminary tests. I believe I actually ended up being fill-rate bound instead of anything else. This is much more than necessary for any game I plan on creating.