Any competent software engineer will have spent significant time working with low level memory management. Even though the operating system code is written for will often provide some kind of allocation and deallocation mechanism, application specific assumptions can be made to increase memory related performance.
For example certain hardware doesn’t have virtual memory support, or the virtual memory support can be quite lacking. A lack of virtual memory means raw allocations from the OS return real addresses to the hardware RAM. Usually virtual memory can alleviate some effects of memory fragmentation through a level of indirection, though when dealing with physical memory yourself no such alleviation exists.
This is just one example of how a software memory manager can be written and used to control memory fragmentation in a way that makes sense for the application.
Types of Allocators
There are a few main types of allocators that I myself have found pretty useful: paging, stack and heap based allocations. Each one makes specific assumptions about the types of allocations and how the memory ought be used. Due to these assumptions significant performance boosts can be reaped in ways that may not have been realistic with raw operating system allocations.
Stack Based Allocation
My favorite type of allocation involves the use of a simple stack. The idea is to make one large call to malloc or new and hold this piece of memory. The Stack itself just holds a pointer to this large chunk of memory, and an integer representing an index into the stack with an element size in bytes.
Here is what a Stack implementation might look like (in pseudo code):
m_memory = malloc( STACK_SIZE );
m_index = 0;
assert( m_index == 0 );
free( m_memory );
void* Allocate( int size );
Free( void* data, int size );
Allocation can work by moving the m_memory pointer forward in the stack. Deallocation can work by moving the m_memory pointer backwards in the stack. Notice that the Free function requires the user to pass back in the size of the allocation! This can be avoided by storing this size parameter from Allocate inside of the m_memory array itself, just before the location of the returned address. Upon deallocation this value can be retrieved by moving the data parameter of Free back in memory by 4 bytes.
The advantage of the stack allocator is that it’s extremely fast and dubiously simple to implement. The limitation is that deallocations must be performed in the reverse order of allocations, since the stack itself is in LIFO order. This makes the use cases for the stack allocator pretty limited. Usually resources, like images, level files, sounds, models, etc. can be loaded into memory with a stack based allocator. Anything that has a very clear and non-variable lifespan should be able to be allocated on a stack.
One last trick is that the last allocation can be trivially resized! Often times an algorithm will require a lot of temporary scratch memory to perform some calculations, or store some state. An initial guess as to how much memory is needed can often be calculated as the worst-case scenario. Once an algorithm finishes this scratch memory can be reduced to the size actually used, if it is the last allocation on the stack. Resizing the last stack allocation involves moving the index backwards in memory.
Implementing your own heaps is pretty similar to the stack based allocator. A heap allocator will use the operating system to allocate a large chunk of memory. Subsequent calls to the heap’s Allocate and Free methods will just dip into this chunk and fetch a piece.
The heap is more versatile and general purpose than a stack allocator. The heap can be implemented with a linked list of nodes. Each node represents a piece of memory. A node can either be allocated or free. To keep track of these linked list pointers, allocation state, and size of the memory block some memory itself is required! This stuff can be stored in a separate array, or right inside the large raw chunk of memory (just like with the stack allocator).
Usually it is preferential to add a small header to each allocation to store this information. A heap node might look something like this:
When the heap is first constructed it will contain a linked list of HeapHeader structs, but only a single header will be present, and it holds the entire piece of raw memory originally allocated by the OS upon the Heap allocator’s construction.
Allocating from the heap involves splitting a free HeapHeader into an allocated piece, and a new HeapHeader for the leftover space. The details of this lay mostly in the linked list implementation, and is not the focus of this article.
In order to reduce memory fragmentation it is a good idea to merge adjacent free HeapHeader links into a single link. This ought to be handled in the Heap::Free function. The details of merging free links lay mostly in the linked list implementation, and is not the focus of this article.
Here’s an example of what the Heap may look like in implementation:
void* Allocate( int size )
// Search linked list for a free link that can fit size
header->allocated = true;
// split header into two headers
// mark the new header as not allocated
return data + sizeof( Heapheader );
Free( void* data )
HeapHeader* header = data - sizeof( Heapheader );
header->allocated = false
if ( header->next is free )
// Merge header into header->next
merged = true;
if (header->prev is free )
// Merge header into header->prev
merged = true
When Heap::Allocate is called a free link of appropriate size must be searched for. This has the time complexity of O( N ), and a lot of memory must be fetched into the cache upon allocation as the list itself is traversed. There are tricks to improve allocation performance of heaps, and a simple one would be the cache a single pointer to a free block in the heap itself. This pointer can be cached in Heap::Free, Heap::Allocate, or both. Once a new call to Heap::Allocate is made this cached pointer can be tested first to see it is an appropriate size.
There are two common ways to search through the links for an allocation: first fit and best fit. First fit will return the user with the first piece of memory large enough to hold the allocation. Best fit will return a chunk of memory that came from a HeapHeader with the smallest size that is still large enough to hold the requested allocation size.
First fit can be preferential for cache coherency, as it may prefer to allocate from the beginning of the heap and try to keep things closer together in memory. Best fit may be preferential for keeping the heap as un-fragmented as possible.
The heap based allocator intends to fight memory fragmentation through fitting links to allocation sizes, and by merging adjacent free memory blocks. This type of fragmentation is called external fragmentation. Another type of memory fragmentation is called internal fragmentation.
An internal memory fragmentation is when an allocated piece of memory is given to the user that actually holds more memory than the user requested. The user is assumed to not know about this extra piece of memory. This can provide an advantage to the allocator: all allocations can be of a fixed size, and any allocation larger than this fixed size is denied.
This lets the allocator act like an array. When an allocation is requested an empty element can be returned to the user. Upon freeing a piece of memory, the element is simply marked as free and placed into a free list.
The free list is a linked list of array elements. The memory in the free elements themselves should be used to store the pointer of each subsequent free element.
Allocation and deallocation become constant in time complexity and there is zero external memory fragmentation. In this way internal memory fragmentation is traded for external memory fragmentation.
The term “pages” comes into play when the array is filled up. Once an array is full of allocated elements another array can be allocated. Once this array is filled up, another one is allocated. Each array (aka page) can be stored in a singly linked list of pages.
The free list itself can pointer across multiple pages without any problems.
A page containing only free elements can be deleted entirely, though this feature might not need to be supported.
A paged allocator can also hold an array of singly linked lists of pages. Each element of this array can hold a list of pages that corresponds to a different element size. This can allow the paged allocator to fit different allocation requests into the most appropriate page list. A common tactic is to have pages that represent arrays with an element size of 2^N bytes, where N is usually at least 2, and smaller than some value K.
The biggest advantage of a paged allocator is zero external fragmentation. The internal fragmentation does make memory more non-homogeneous. This type of allocator will probably lower your cache line utilization. Cache line utilization would be how much memory in each cache line fetched from main memory to the CPU cache is actually used. Since internal fragmentation is a feature of a paged allocator, cache line utilization will probably suffer.
The unused memory in the pages can be reduced drastically on a per-application basis; if the users of the allocator are able to specify the element sizes of different page lists, then zero internal fragmentation can be achieved.
Instead of thinking of a paged allocator in terms of separate arrays, one might think of a simpler allocator that holds just a single array. If the elements within this array of of POD nature the array elements can be referenced by index. This lets the array grow or shrink in size as necessary, as new sized arrays can still be accessed by an old index.
Whenever the user wants a pointer to an element they first give the array an index, and a pointer is returned. This pointer is never stored anywhere! Continuous translation from index to pointer occurs -this allows the internal array itself to moved around in memory as necessary.
Users might need a little more power to refer to elements than a simple integer. Some type of handle might be needed to translate from index to pointer. Read more about handles here.
Given these three types of allocators an application should have all the variety of memory allocation necessary to run with pretty good performance. More advance allocation techniques definitely exist, and some are just combinations of the three basic allocators presented in this article.
Each allocator can be quite simple in isolation! I myself implemented a stack in about 100 lines, a paged allocator in 150, and a heap in about 250 lines of C++ code.
Further reading might include topics such as: cache coherency, memory alignment, garbage collection, virtual memory, page files (operating system pages).