Circular Linked Lists and Branching

Since linked lists are such an essential topic I’ve taken some extra care to learn efficient ways of using them. The simplest kind of linked list to conceptualize is the singly linked list. There are tons of online resources for learning the basics about linked lists, so I’ll assume readers are familiar with the concept.

Here’s a quick mock header of some linked list nodes for reference:

In general singly linked lists are more complicated to manage once removal of nodes is required. Since no explicit prev pointer is stored in memory a temporary variable is often kept on the stack while traversing a singly linked list. This means more complicated code that clogs the user’s focus.

Even though a doubly linked list requires twice the memory they are usually still preferred over singly linked lists, even when a singly linked list could get the job done without any additional time complexity. Often times linked lists are useful in complex algorithms, and if there’s a chance to simplify the implementation of a complex algorithm by using a doubly linked list, then that chance is probably worth the taking.

When I first implemented a doubly linked list and tested its performance out against std::list I couldn’t quite get it to perform well.

Naive insertion and removal of list nodes often has to check for NULL pointers, which represent the front and back of the linked list. Here’s an example of what removal might look like to give you the idea of how many if-statements could be necessary (code not tested, I just typed it out here on the spot):

There are two if statements hit every single time this function is called. When the CPU comes across a branch is loads instructions based on which path of execution it deems most likely. This is called branch prediction. If this prediction is incorrect the loaded code must be unloaded, and then the appropriate code must be re-loaded.

This branch missing probably going to be a very fast CPU operation since executing code is almost definitely in the L-1 code cache. Despite it being fast modern CPU still operate through a pipeline, and branch misses can still garble up whatever pipelining is happening. In the end a branch miss is a performance hit, and should be avoided when appropriate.

A common linked list optimization is to use a dummy head and tail node. These nodes sit in memory along with the list data structure. Upon list initialization they point their next and previous pointers to one another, and NULL out the pointers to represent the front and back of the list.

With this optimization the only case that user nodes will ever encounter is the case in the first two if statements (assuming both were true). The removal code can now look something like (again, not tested):

This is one kind of optimization the std implements. After doing this myself my list performed evenly with the std’s implementation.

Intrusive Lists

Intrusively linked lists invert the definition of what a node is. Traditionally a linked list node contains some data. An intrusive list has the data contain the node:

This scheme is nice since now nodes do not need to be allocated separately from the data. If the number of data elements is known, then the exact number of nodes needed can also be known.

C++ templates can be used to create a generic intrusively linked list implementation, able to define nodes inside of any data type. C macros can also be used to the same effect. In this way an intrusively linked list can be used in pretty much the same way a normal linked list is.

One major downside to intrusively linked lists is that they add in extra memory to your data. This can be a big deal if some code is very performance sensitive. If cache line utilization is important, then the percentage of data actually used in each line becomes important. Sometimes these pointers get in the way and clutter the lines. This cluttering is something to be aware of.

On the flip side many algorithms can run on arrays of data. Instead of storing explicit pointers to represent prev and next connections, indices into an array can be used. This can make entire data structures memcpy-able, or serializable just by dumping bits to a stream. Additionally, the pointers stored directly within data will often be accessed as the exact same time (depending on the algorithm), which results in very high cache line utilization.

It all depends on the scenario.

Circular Lists (Sentinel)

When dealing with intrusive linked lists it can often be really weird to define where in memory dummy nodes would reside. Are we to create dummy pieces of Data? What if the algorithm needs lists to be constantly created and destroyed? What if the algorithm can have as many lists as there are nodes? Suddenly an algorithm might need twice as many dummy nodes as actual nodes!

It is possible to remove the dummy nodes in some cases. Data elements can be initialized to point to themselves. In this way each element is itself a doubly linked list with one node. To insert a second node is a matter of making both nodes point to each other. Inserting a third node should use the exact same code as inserting the second node (and not require any branching since NULL indices/pointers do not exist), and so on.

In many cases an intrusive circular doubly linked list (boy, isn’t that a mouthful) can be the perfect solution to a hard problem! I will leave it as an exercise to research or implement this circular style of linked list.

Another name for this type of list would be a “sentinel intrusive list”, where a sentinel node can be used to bound a list traversal. Since our linked lists are circular we can start at any node, traverse the list, and once we reach the node we started upon our traversal is complete.


Leave a Reply

Your email address will not be published. Required fields are marked *