# What is there to Hate about References?

I find most usage of references annoying cruft. Often the arguments I see or hear that are “pro-reference” make the same lame points that most of the internet makes:

• Pointers are dangerous
• Pointers are ambiguous and confusing
• NULL pointers lead to undefined behavior and crashes

Just google “pointers and references” and you’ll see bad advice everywhere. A new programmer seeing these bullet points is likely to get hyped about using references everywhere. Seeing advice like this just sort of upsets some part of me. Perhaps it’s because when the above statements are plastered onto websites they state them as fact.

In an effort to not make the same annoying mistake as every other article on the internet I’ll present my opinion as an opinion. When something is stated as an opinion the reader will immediately begin to read with a certain amount of skepticism. This might coax newer readers into thinking for themselves, which ought to be the goal of writing an educational article in the first place. Writing step-by-step instructions on how not to use “dangerous pointers” is the worst way to write on the topic of pointers and references.

I know I sound pretty bitter. I recall a time when I browsed the internet and looked for advice on this exact topic. It takes time to unlearn bad things, and so this post was born.

## Memory Matters

Where things are in memory is a big deal. Memory access is commonly a bottleneck in real-time applications, and code that has ambiguous memory access patterns upsets me. Imagine peering into a large function that is operating on some kind of data. Scanning the middle of the function, a few lines of code are encountered:

Just by reading this code it isn’t immediately clear what the variable d is. What is happening here? In order to know what the scope of d is some scrolling or manual code navigation will ensue. How long will this take? How much does it cut down on focus while the reader is just trying to understand the code?

Oftentimes member variables of an internal class or struct will be given the m_ prefix. This is nice as readers immediately know the scope and implications of all uses of a member variable. There’s an implicit this pointer being accessed somehow, and the variable’s scope is relative to this class’s definition.

In this case there’s no such nice prefix. d can either be a reference or value type. There’s no way to know without some kind of intellisense. Mousing over a variable to see what the type is, given a nice IDE, is not really that big of a deal. The big deal here is that you have to mouse over something just to get an idea of what sort of memory it represents. Just take a look at this code:

What sort of questions might the user have about this code? Clearly d is a pointer to some memory. This immediately informs the reader about the nature of the code. It is likely to be some kind of output, or perhaps a specific element in an array. It is definitely not just a lone variable on the stack. No intellisense is needed to get this information, and no mousing over or code navigation is needed just to understand the idea of assigning a value to some non-local stack memory. The programmer reading this code might be able to focus slightly better on understanding the code due to the use of a pointer.
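As a concrete stand-in for the two cases discussed above, here is a minimal sketch. The names `SumByRef` and `SumByPtr` are made up for illustration and are not taken from any real codebase. In the first function nothing at the assignment says whether `d` is a local or somebody else's memory; in the second the `*d` makes it plain.

```cpp
// Hypothetical example: the same accumulation written against a reference
// parameter and against a pointer parameter.

// Reference version: inside the body, "d = ..." reads exactly like an
// assignment to a local stack variable.
void SumByRef(const float *values, int count, float &d)
{
    d = 0.0f;
    for (int i = 0; i < count; ++i)
        d = d + values[i]; // local or caller memory? Not visible here.
}

// Pointer version: every write goes through "*d", so it is visible at the
// point of use that the memory lives outside this function.
void SumByPtr(const float *values, int count, float *d)
{
    *d = 0.0f;
    for (int i = 0; i < count; ++i)
        *d = *d + values[i];
}
```

Both functions do the same work; the only difference is how much the middle of the function tells a reader who has not scrolled up to the signature.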

Sure d could technically be a *NULL* pointer (gasp), but in reality this is a non-issue. The only time checking for NULL pointers is important is when user input is being handled. A lot of code doesn’t deal with user input! For internal code I’d make the argument that memory not on the local stack scope (local to at least the function currently executing) should almost always be referred to by pointer. Internal code can make assumptions on how pointers are used and not care about the NULL case. Internal code often solves difficult problems, and needs to be efficient (in the scope of real-time applications). Anything that fragments reader focus is bad, even taking a moment to mouse over a variable to see if it’s a reference or not.

## Another Example

In the above snippet imagine that the joined AABB is being written to, by finding the AABB that bounds both a and b. Perhaps in this specific case it is fairly obvious that joined is memory that is being written to by the MergeAABBs function. This is probably because joined was quite well named, despite being passed by reference to MergeAABBs. However this function might have been written in a way that returns a new AABB entirely by value, and only operates on a local stack copy of joined. In this case the code would compile and run perfectly fine, but joined would hold uninitialized memory. This might lead to a crash or assert, costing iteration time and programmer focus.

Now let’s look at the use of a “dangerous” pointer:

In this code snippet, no matter what the third parameter is named, it is as obvious as possible that the function MergeAABBs is operating on some memory passed to it, and does not return anything useful in this particular use-case. The contents of the function MergeAABBs are probably obvious as well; I know I can imagine how it’s implemented without even looking. There’s just no ambiguity.
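To make the comparison concrete, here is a rough sketch of both flavors. The `AABB` layout and the name `MergeAABBsRef` are assumptions made for this example; only `MergeAABBs`, `a`, `b` and `joined` come from the discussion above.

```cpp
#include <algorithm>

// Hypothetical 2D axis-aligned bounding box; the layout is an assumption.
struct AABB
{
    float minX, minY;
    float maxX, maxY;
};

// Reference version: a call reads MergeAABBsRef(a, b, joined), and nothing
// at the call site marks "joined" as the output.
void MergeAABBsRef(const AABB &a, const AABB &b, AABB &joined)
{
    joined.minX = std::min(a.minX, b.minX);
    joined.minY = std::min(a.minY, b.minY);
    joined.maxX = std::max(a.maxX, b.maxX);
    joined.maxY = std::max(a.maxY, b.maxY);
}

// Pointer version: a call reads MergeAABBs(a, b, &joined), so the "&"
// flags the third argument as memory the function writes to.
void MergeAABBs(const AABB &a, const AABB &b, AABB *joined)
{
    joined->minX = std::min(a.minX, b.minX);
    joined->minY = std::min(a.minY, b.minY);
    joined->maxX = std::max(a.maxX, b.maxX);
    joined->maxY = std::max(a.maxY, b.maxY);
}
```

The read-only inputs stay as const references in both versions, which matches the one use of references endorsed later in this post.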

The name of variables should be meaningful to the problem the code is solving. Requiring a naming convention for code clarity simply because of an arbitrary reference function parameter is an unnecessary constraint! Naming things is hard enough without random convention limitations.

Sure if some idiot passed in a NULL pointer to MergeAABBs it will crash, but how often does this happen in practice? What kind of programmers are you hiring that actually get caught up in this kind of problem? Real-life competent engineers aren’t going to pass in a NULL pointer and will appreciate good code written with “dangerous and ambiguous” pointers.

When a function takes only floats and writes to a float it’s pretty much worst-case for reference ambiguity. Which float is being written to in the next code snippet?

Is the triangle actually {a, b, c}, or some other combination of parameters? Which of the arguments are float arrays (vectors) or just floats? Which ones are written to, and which are read only? Some kind of code navigation is needed to know for sure. By convention uvw might represent the name of barycentric coordinates for a triangle, but perhaps this specific piece of code was solving a particular problem where the derivation named them something else? It’s just ambiguous without a pre-defined naming convention, which then has to be imposed in the middle of unrelated algorithms.

Here’s the pointer version; note how non-ambiguous this code is:
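A sketch of what such a pointer-based signature could look like is below. The function name `Barycentric`, the 2D point layout (`float[2]`) and the parameter order are assumptions for illustration. At the call site the points are passed as arrays and the outputs carry an `&`, so the roles are readable without navigating anywhere.

```cpp
// Hypothetical 2D barycentric coordinate routine.
// a, b, c: triangle vertices, each a read-only float[2].
// p:       query point, a read-only float[2].
// u, v, w: outputs, written so that p = u*a + v*b + w*c.
// Assumes the triangle is not degenerate (non-zero area).
void Barycentric(const float *a, const float *b, const float *c,
                 const float *p, float *u, float *v, float *w)
{
    float v0x = b[0] - a[0], v0y = b[1] - a[1]; // edge a -> b
    float v1x = c[0] - a[0], v1y = c[1] - a[1]; // edge a -> c
    float v2x = p[0] - a[0], v2y = p[1] - a[1]; // a -> p

    // Solve v2 = v * v0 + w * v1 with Cramer's rule.
    float den = v0x * v1y - v1x * v0y;
    *v = (v2x * v1y - v1x * v2y) / den;
    *w = (v0x * v2y - v2x * v0y) / den;
    *u = 1.0f - *v - *w;
}
```

A call then looks like `Barycentric(a, b, c, p, &u, &v, &w);`, and whatever the three outputs happen to be named, it is clear which arguments get written.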

## Useful References

I currently know of a single use of references that I really like, and that’s a const reference passed to a function as an argument, and sometimes the returning of a const reference.

Passing a const reference to a function means that this is a read-only value, and is definitely not pointing to an array. It is also a pretty common convention for operator overloading. The only downside is that the dot access operator may be mistaken as a value-type access, instead of a pointer access.

Returning a const reference might make sense sometimes, but usually I’m of the opinion that a pointer is better. Returning a pointer just abides by all my previous points about memory access. If the user retrieves a const pointer from a function, the explicit -> access makes it very clear that this memory came from somewhere else!

References are also able to capture temporary rvalues. This can make the life-time of such temporary values more explicit.

Sometimes a million dereferences is just too many. In some cases the lack of the dereference operator is nice and adds to code readability. However, in this case references are just an aid and live only on the stack. The pointer is what is actually kept and stored, in order to keep the code clean and up-front. Here’s an example:

An equivalent, but more verbose and annoying version can be constructed with pointers:
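As an illustration of both styles side by side, here is a small sketch. The `Body` struct and both function names are invented for this example. The pointer is what gets passed around and stored; the reference is only a stack-local alias that trims the dereferences.

```cpp
// Hypothetical rigid body state; the fields are assumptions.
struct Body
{
    float px, py; // position
    float vx, vy; // velocity
};

// The pointer is what callers hold; the reference exists only on this
// function's stack to keep the math readable.
void IntegrateWithAlias(Body *body, float dt)
{
    Body &b = *body;
    b.px += b.vx * dt;
    b.py += b.vy * dt;
}

// Equivalent version without the alias: every access goes through "->".
void IntegrateWithPointer(Body *body, float dt)
{
    body->px += body->vx * dt;
    body->py += body->vy * dt;
}
```

With two fields the difference is small; it grows with the amount of math done on the same object.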

## “Advanced C++” and Generic Programming

“Advanced C++” features (in quotations for sarcasm, like much of the rest of the article) are useful sometimes, there’s no doubt about it. Templates, classes, and all the weirdness therein are sometimes necessary. The amount of code duplication and boilerplate that these features can remove makes them important.

However, a lot of code is type-static and very specific. Code often solves very particular, specific problems. A lot of newer students (me) and colleagues get all caught up in the features and just end up wasting their time. When I say waste time, I mean they were actually trying to finish a project, instead of just learn about C++ and the uses thereof.

One might view C++ from the perspective that all the “advanced features” can just egg on a programmer into over-engineering their code into a weird mess of indirection and verbose templated code. Crazy inlined callbacks, type agnostic code, verbose namespaces and whatnot. Many times it’s just useless cruft, and a specific implementation for a single problem will be simpler, easier to read, and smaller in code size.

All of this ranting comes down to one point: references let code operate in a slightly more agnostic manner, which is great for templates. The dot operator does it all! This makes sense for code that needs templating, but often doesn’t make sense for a lot of code (which was what the previous portions of the article pointed out!).

However, good generic code is so incredibly difficult to design and come up with that hardly anybody should be doing it. Good code that is used by multiple people is at least an order of magnitude harder to write than good code that has a single specific use case. Templates, references, classes, these things might make it tempting to try out all the features to make a “generic program” that “can be re-used in the future”. I’ll tell it how it is: simple code that is type static, specialized for the specific problem it is solving, and doesn’t use advanced features is probably an order of magnitude more reusable (and performant) than “generic” code, simply because it’s easier to write.

## Form an Opinion

As a reader, think for yourself and make your own judgment calls based on your own experience. This means that a good way to take advantage of the knowledge of an experienced programmer is to try out their advice with an almost skeptical attitude. Just don’t look for step-by-step instructions on how to be a computer scientist. Nobody wants to work with a mindless programmer that writes bad code, because then the good programmers will be busy cleaning it up.

# C++ Function Binding

Welcome to the first post in a series of blog posts about how to implement a custom game engine in C++. As reference I’ll be using my own open source game engine SEL. Please refer to its source code for implementation details not covered in this article.

I would like to thank John Edwards for his contribution to my education in the areas of reflection and function binding. You can thank him too by checking out their games at thatgamecompany!

Function binding in C++ is the act of being able to trigger a function given some form of input. Usually this applies to C or C++ by means of calling any function in a generic way. This can be achieved easily during compile-time in C++ by using some templates along with decltype. This is useful for:

• Script binding
• Many others

The idea is to capture the pointer to a function (or method) and pass its type around in code as a template parameter. An instance of the template type is created as a template constant. This is possible with the usual compilers because the address of a function is a compile-time constant that can be used as a non-type template argument.

The rest of the work involves creating a nice wrapper to pack arguments together and get them to the template constant pointer in a generic fashion.
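Here is a minimal sketch of the idea, under several assumptions: the names `GenericCall`, `Binding` and `BIND` are invented for this example and are not SEL's API, only free functions with exactly two by-value parameters and a non-void return are handled, and arguments are packed as an array of `void *`. A fuller version would add more specializations (methods, other arities, void returns).

```cpp
// Uniform signature through which every bound function can be called.
typedef void (*GenericCall)(void *ret, void **args);

// Primary template: the function's type and the function pointer itself
// (a compile-time constant) are both template parameters.
template <typename FuncType, FuncType Fn>
struct Binding;

// Specialization for free functions taking two by-value arguments and
// returning a value.
template <typename R, typename A0, typename A1, R (*Fn)(A0, A1)>
struct Binding<R (*)(A0, A1), Fn>
{
    // Unpacks the type-erased arguments and forwards them to Fn.
    static void Call(void *ret, void **args)
    {
        *static_cast<R *>(ret) = Fn(*static_cast<A0 *>(args[0]),
                                    *static_cast<A1 *>(args[1]));
    }
};

// decltype captures the function's type so the user only names it once.
#define BIND(FN) (&Binding<decltype(&FN), &FN>::Call)

// Example function to bind.
int Add(int a, int b) { return a + b; }
```

Usage would look something like `GenericCall call = BIND(Add);` followed by `call(&result, args);` where `args` holds the addresses of two ints. Everything a script binder needs to store is the single `GenericCall` pointer.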

I’ve created some nice slides on the topic and some demo source code. There is one slide with a video that will not play from the pdf, so I attached the video to the bottom of this post. Hope this helps someone out there.

Source code demo is here. If you wish to view a fully featured example, please see my project SEL.

# Powerful C++ Messaging

A prerequisite to this information is most of the previous C++ type introspection stuff I have been writing about for a while now. Assuming the previous information has been covered, let’s move on:

There exists a design of messaging, specifically for C++, which has minimal downsides and many advantages. Ideally messaging should not involve any polling or implicitly required searching (as in searching through game space to see who to message, which requires expensive collision queries). It should also have a very intuitive usage, and not be very complex to work with.

If such a messaging system can be achieved then inter-object communication can be set up, to create game logic, within a scripting language.

Here are some slides I wrote on this topic for my university, which are also available for public viewing:

# C++ Reflection Part 6: Lua Binding

Binding C/C++ functions to Lua is a tedious, error prone and time consuming task when done by hand. A custom C++ introspection system can aid in automating this, making the binding of any callable C or C++ function or method a breeze. Once such a functor-like object exists the act of binding a function to Lua can look like this, as seen in a CPP file:

The advantage of such a scheme is that only a single CPP file would need to be modified in order to expose new functionality to Lua, allowing for efficient pipe-lining of development cycles.

Another advantage of this powerful functor is that communication and game logic can quickly be created in a script, loaded from a text file, or even set up through a visual editor. Here is a quick example of what might be possible with a good scripting language:

In the above example a simple enemy is supposed to follow some target object. If the target is close enough then the enemy damages it. If the target dies, the enemy flashes a bright color and then acquires a new target.

The key here is the message subscription within the initialization routine. During run-time objects can subscribe to know about messages emitted by any other object!

So by now hopefully one would have seen enough explanation of function binding to understand how powerful it is. I’ve written some slides on the topic available in PDF format here (do note that these slides were originally made for a lecture at my university):

# Templates and Metaprogramming

Recently I’ve begun to take over a club called Game Engine Architecture Club at DigiPen. The founder of the club graduated and asked me to try running it. I’ve found it to be a lot of fun! The first lecture I did was on template metaprogramming in C++ (TMP).

The lecture was recorded live and is currently up on DigiPen’s youtube channel for public viewing. Though the slides are visible during the video, I don’t think I can share the slides themselves publicly.

Hope you enjoy the video!

# C++ Reflection: Type MetaData: Part 3 – Improvements

In our last article we learned how to store information about a class’s members, however there are a couple key improvements that need to be brought to the MetaData system before moving on.

The first issue is with our RemQual struct. In the previous article we had support for stripping off qualifiers such as *, const or &. We even had support for stripping off an R-value reference. However, the RemQual struct had no support for a pointer to a pointer. It is weird that RemQual<int **> would behave differently than RemQual<int *>, and so on. To solve this issue we can cycle down, at compile time, the type through the RemQual struct recursively, until a type arrives at the base RemQual definition. Here’s an example:

As you can see, this differs a bit from our previous implementation. The way it works is by passing in a single type to the RemQual struct via typename T. Then, the templating matches the type provided with one of the overloads and feeds the type back into the RemQual struct with fewer qualifiers. This acts as some sort of compile-time “recursive” qualifier stripping mechanism; I’m afraid I don’t know what to properly call this technique. This is useful for finding out what the “base type” of any given type is.
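The recursive stripping described above can be sketched roughly as follows. This is a reconstruction under assumptions, not the article's exact listing: it handles `const`, `&` and `&&` only, ignores `volatile`, and deliberately leaves pointers alone. A `T *` specialization written the same way would strip pointers too, including pointers to pointers.

```cpp
#include <type_traits>

// Base case: no qualifiers left, so this is the "base type".
template <typename T>
struct RemQual
{
    typedef T type;
};

// Each specialization peels one qualifier and feeds the remainder back
// into RemQual, which recurses until the base case matches.
template <typename T>
struct RemQual<const T>
{
    typedef typename RemQual<T>::type type;
};

template <typename T>
struct RemQual<T &>
{
    typedef typename RemQual<T>::type type;
};

template <typename T>
struct RemQual<T &&>
{
    typedef typename RemQual<T>::type type;
};

// const int & -> const int -> int, two steps of recursion.
static_assert(std::is_same<RemQual<const int &>::type, int>::value,
              "const and & are both stripped");
// Pointers are kept in this sketch; only the outer const and & go away.
static_assert(std::is_same<RemQual<int *const &>::type, int *>::value,
              "pointer qualifier is preserved");
```

Because `const T &` first matches `T &` and then `const T`, no dedicated `const T &` specialization is needed once the struct recurses.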

It should be noted that the example code above does not strip pointer qualifiers off of a type. This is to allow the MetaData system to properly provide MetaData instances of pointer types; which is necessary to reflect pointer meta.

It should be noted that in order to support pointer meta, the RemQual struct will need to be modified so it does not strip off the * qualifier. This actually applies to any qualifier you do not wish to have stripped.

There’s one last “improvement” one could make to the RemQual struct that I’m aware of. I don’t actually consider this an improvement, but more of a feature or decision. There comes a time when the user of a MetaData system may want to write a tidbit of code like the following:

Say the user wants to send a message object from one place to another. Imagine this message object can take three parameters of any type, and the reflection system can help the constructor of the message figure out the types of the data at run-time (how to actually implement features like this will be covered once Variants and RefVariants are introduced). This means that the message can take three parameters of any type and then take them as payload to deliver elsewhere.

However, there’s a subtle problem with the “Message ID” in particular. Param1 and Param2 are assumed to be POD types like float or int, however “Message ID” is a const char * string literal. My understanding of string literals in C++ is that they are of the type: const char[ x ], x being the number of characters in the literal. This poses a problem for our templated MetaCreator, in that every value of x will create a new MetaData instance, as the templating treats each individual value of x as an entire new type. Now how can RemQual handle this? It gets increasingly difficult to actually manage Variants and RefVariant constructors for string literals for reasons explained here, though this will be tackled in a later article.

There are two methods of handling string literals that I am aware of; the first is to make use of some form of a light-weight wrapper. A small wrapper object can contain a const char * data member, and perhaps an unsigned integer to represent the length, and any number of utility functions for common string operations (concat, copy, compare, etc). The use of such a wrapper would look like:

The S would be the class type of the wrapper itself, and the constructor would take a const char *. This would require every place in code that handles a string literal to make use of the S wrapper. This can be quite annoying, but has great performance benefits compared to std::string, especially when some reference counting is used to handle the heap allocated const char * data member holding the string data in order to avoid unnecessary copying. Here’s an example skeleton class for such an S wrapper:
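A bare-bones skeleton of such a wrapper might look like the sketch below. It is an assumption-heavy simplification: the member names and the `CStr`/`Length` functions are invented, and it does not own or reference count the string data, so it is only safe for string literals or other storage that outlives the wrapper.

```cpp
#include <cstring>

// Hypothetical light-weight string wrapper. It stores a pointer and a
// length and does NOT copy or free the characters.
class S
{
public:
    // Intentionally not explicit, so a string literal converts on its own.
    S(const char *src)
        : m_data(src)
        , m_len(static_cast<unsigned>(std::strlen(src)))
    {
    }

    const char *CStr() const { return m_data; }
    unsigned Length() const { return m_len; }

    // Cheap length check first, then a character comparison.
    bool operator==(const S &rhs) const
    {
        return m_len == rhs.m_len && std::strcmp(m_data, rhs.m_data) == 0;
    }

private:
    const char *m_data; // not owned by the wrapper
    unsigned m_len;     // cached so Length() is O(1)
};
```

Since every literal now arrives as the single type `S`, the MetaData system only ever sees one type for strings regardless of their length.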

As I mentioned before, I found this to be rather annoying; I want my dev team and myself to be able to freely pass along a string literal anywhere and have MetaData handle the type properly. In order to do this, a very ugly and crazy solution was devised. There’s a need to create a RemQual struct for every [ ] type for all values of x. Writing each of these out by hand isn’t possible. However, it is possible to overload RemQual for a few values of x, at least enough to cover any realistic use of a string literal within C++ code. Observe:

The macro ARRAY_OVERLOAD creates a RemQual overload with a value of x. The __COUNTER__ macro (though not standard) increments by one each time the macro is used. This allows for copy/pasting of the ARRAY_OVERLOAD macro, which will generate a lot of RemQual overloads. I created a file with enough overloads to cover any realistically sized string literal. As an alternative to the __COUNTER__ macro, __LINE__ can be used instead, however I imagine it might be difficult to ensure you have one definition per line without any gaps. As far as I know, __COUNTER__ is supported on both GNU and MSVC++.

Not only will the ARRAY_OVERLOAD cover types of string literals, but it will also cover types with array brackets [ ] of any type passed to RemQual.
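Worth noting as an alternative that is not the macro approach described above: a partial specialization can take the array bound itself as a non-type template parameter, which should cover every length with a single definition. The sketch below uses a stripped-down `RemQual` (base case only) and a hypothetical helper `IsCharPointerMeta` to keep it self-contained.

```cpp
#include <cstddef>
#include <type_traits>

// Simplified stand-in for RemQual with only the base case.
template <typename T>
struct RemQual
{
    typedef T type;
};

// One partial specialization for arrays of any length N: the array type
// is reduced to the matching pointer type.
template <typename T, std::size_t N>
struct RemQual<T[N]>
{
    typedef typename RemQual<T *>::type type;
};

// Hypothetical helper: reports whether the argument reduces to char *.
// For a string literal, T is deduced as char[N].
template <typename T>
bool IsCharPointerMeta(const T &)
{
    return std::is_same<typename RemQual<T>::type, char *>::value;
}
```

With this in place, `IsCharPointerMeta("Message ID")` and `IsCharPointerMeta("x")` both land on the same reduced type even though the literals have different lengths. One caveat: combining this with a `const T` specialization makes `const T[N]` ambiguous unless a dedicated specialization for it is added as well.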

The second issue is the ability to properly reflect private data members. There are three solutions to reflecting private data that I am aware of. The first is to try to grant access to the MetaData system by specifying that the MetaCreator of the type in question is a friend class. I never really liked the idea of this solution and haven’t actually tried it for myself, and so I can’t really comment on the idea any further than this.

The next possible solution is to make use of properties. A property is a set of three things: a gettor; a settor; a member. The gettor and settor provide access to the private member stored within the class. The user can then specify gettors and/or settors from the ADD_MEMBER macro. I haven’t implemented this method myself, but would definitely like to if I find the time to create such a system. This solution is by far the most elegant of the three choices that I’m presenting. Here’s a link to some information on creating some gettor and settor support for a MetaData system like the one in this article series. This can potentially allow a MetaData system to reflect class definitions that the user does not have source code access to, so long as the external class has gettor and settor definitions that are compatible with the property reflection.

The last solution is arguably more messy, but it’s easier to implement and works perfectly fine. I chose to implement this method in my own project because of how little time it took to set up a working system. Like I said earlier, if I have time I’d like to add property support, though right now I simply have more important things to finish.

The idea of the last solution is to paste a small macro inside of your class definitions. This small macro then pastes some code within the class itself, and this code grants access to any private data member by using the NullCast pointer trick. This means that in order to reflect private data, you must have source code access to the class in question in order to place your macro. Here’s what the new macros might look like, but be warned it gets pretty hectic:

The META_DATA macro is to be placed within a class; it places a couple declarations for NullCast, AddMember and RegisterMetaData. The DEFINE_META macro is modified to provide definitions for the method declarations created by the META_DATA macro. This allows the NullCast to retrieve the type to cast to from the DEFINE_META’s TYPE parameter. Since the AddMember method is within the class itself, it can now have proper access to private data within the class. The AddMember definition within the class then forwards the information it gathers to the AddMember function within the MetaCreator.

In order for the DEFINE_META api to remain the same as before, the META_DATA macro creates a RegisterMetaData declaration within the class itself. This allows the ADD_MEMBER macro to not need the user to supply the type of class to operate upon. This might be a little confusing, but imagine trying to refactor the macros above. Is the RegisterMetaData declaration even required to be placed into the class itself? Can’t the RegisterMetaData function within the MetaCreator call AddMember on the class type itself? The problem with this is that the ADD_MEMBER macro would require the user to supply the type to the macro like this:

This would be yet another thing the user of the MetaData system would be required to perform, thus cluttering the API. I find that keeping the system as simple as possible is more beneficial than factoring out the definition of RegisterMetaData from the META_DATA macro.

Here’s an example usage of the new META_DATA and DEFINE_META macros:

The only additional step required here is for the user to remember to place the META_DATA macro within the class definition. The rest of the API remains as intuitive as before.

Here’s a link to a compileable (in VS2010) example showing everything I’ve talked about in the MetaData series thus far. The next article in this series will likely be on creating the Variant type for PODs.

# C++ Reflection: Type MetaData: Part 2 – Type Reduction and Members

In the last post we learned the very basics of setting up a reflection system. The whole idea is that the user manually adds types into the system using a single simple macro placed within a cpp file, DEFINE_META.

In this article I’ll talk about type deduction and member reflection, both of which are critical building blocks for everything else.

First up is type deduction. When using the templated MetaCreator class:

Whenever you pass in a const, reference, or pointer qualifier an entire new templated MetaCreator will be constructed by the compiler. This just won’t do, as we don’t want the MetaData of a const int to be different at all from an int, or any other registered type. There’s a simple, yet very quirky, trick that can solve all of our problems. Take a look at this:

I’m actually not familiar with the exact terminology to describe what’s going on here, but I’ll try my best. There are many template overloads of the first RemQual struct, which acts as the “standard”. The standard is just a single plain type T, without any qualifiers and without pointer or reference type. The rest of the templated overloaded versions all contain a single typedef which lets the entire struct be used to reference a single un-qualified type by supplying any of the various overloaded types to the struct’s typename param.

Overloads for the R-value reference must be added as well in order to strip down to the bare type T.

Now that we have our RemQual (remove qualifiers) struct, we can use it within our META macros to refer to MetaData types. Take a look at some example re-writes of the three META macros:

The idea is you feed in the typedef’d type from RemQual into the MetaCreator typename param. This is an example of using macros well; there’s no way to screw up the usage of them, and they are still very clean and easy to debug as there isn’t really any abuse going on. Feel free to ignore specific META macros you wouldn’t actually use. I use all three: META_TYPE, META and META_STR. It’s a matter of personal preference on what you actually implement in this respect. It will likely be pretty smart to place whatever API is created into a namespace of its own, however.

And that covers type deduction. There are some other ways of achieving the same effect, like partial template specialization as covered here, though I find this much simpler.

Next up is registering members of structures or classes with the MetaData system. Before anything continues, let’s take a look at an example Member struct. A Member struct is a container of the various bits of information we’d like to store about any member:

The Member struct above is almost exactly the implementation I have in my own reflection system as it stands while I write this; there’s not a lot needed. You will want a MetaData instance to describe the type of data contained, a name identifier, and an unsigned offset representing the member’s location within the containing object. The offset is exceptionally important for automated serialization, which I’ll likely be covering after this article.

The idea is that a MetaData instance can contain various member objects. These member objects are contained within some sort of container (perhaps std::vector).

In order to add a member we’ll want another very simple macro. There are two big reasons a macro is efficient in this situation: we can use stringize; there’s absolutely no way for the user to screw it up.

Before showing the macro I’d like to talk about how to retrieve the offset. It’s very simple. Take the number zero, and turn this into a pointer to a type of object (class or struct). After the typecasting, use the -> operator to access one of the members. Lastly, use the & operator to retrieve the address of the member’s location (which will be offset from zero by the -> operator) and typecast this to an unsigned integer. Here’s what this looks like:

This is quite the obtrusive line of code we have here! This is also a good example of a macro used well; it takes a single parameter and applies it to multiple locations. There’s hardly any way for the user of this macro to screw up.

NullCast is a function I’ll show just after this paragraph. All it does is return a pointer to NULL (memory address zero) of some type. Having this type pointer to address zero, we then use the ADD_MEMBER macro to provide the name of a member to access. The member is then accessed, and the & operator provides an address to this member with an offset from zero. This value is then typecasted to an unsigned integer and passed along to the AddMember function within the macro. The stringize operator is also used to pass a string representation of the member to the AddMember function, as well as a MetaData instance of whatever the type of data the member is.
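As a small illustration of recording a member's name and offset, here is a sketch that swaps in the standard `offsetof` macro from `<cstddef>` in place of the null pointer cast, since accessing a member through a null pointer is formally undefined behavior even though it tends to work in practice. The trimmed-down `Member` struct, the `GameObject` fields and the `MAKE_MEMBER` macro are assumptions made for this example.

```cpp
#include <cstddef>

// Reduced Member record: just a name and a byte offset. The article's
// version also carries a MetaData pointer for the member's type.
struct Member
{
    const char *name;
    unsigned offset;
};

// Hypothetical type to reflect, kept to plain data members.
struct GameObject
{
    int ID;
    bool active;
    float health;
};

// The single MEMBER parameter is used twice: stringized for the name and
// handed to offsetof for the location within TYPE.
#define MAKE_MEMBER(TYPE, MEMBER) \
    Member{ #MEMBER, static_cast<unsigned>(offsetof(TYPE, MEMBER)) }
```

For example, `Member m = MAKE_MEMBER(GameObject, active);` yields a record named "active" whose offset points past the `ID` field. Note that `offsetof` is only guaranteed for standard-layout types.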

Now where does this AddMember function actually go? Where is it from? It’s actually placed into a function definition. The function AddMember itself resides within the MetaCreator. This allows the MetaCreator to call the AddMember function of the MetaData instance it holds, which then adds the Member object into the container of Members within the MetaData instance.

Now, the only place that this AddMember function can be called from, building from the previous article, is within the MetaCreator’s constructor. The idea is to use the DEFINE_META macro to also create a definition of either the MetaCreator’s constructor, or a MetaCreator method that is called from the MetaCreator’s constructor. Here’s an example:

As you can see this formation is actually very intuitive; it has C++-like syntax, and it’s very clear what is going on here. A GameObject is being registered in the Meta system, and it has members of ID, active, and components being added to the Meta system. For clarity, here’s what the GameObject’s actual class definition might look like (assuming component based architecture):

// This boolean should always be true when the object is alive. If this is
// set to false, then the ObjectFactory will clean it up and delete this object
// during its inactive sweep when the ObjectFactory’s update is called.
bool active;
std::vector components;

Now let’s check out what the new DEFINE_META macro could look like:

The RegisterMetaData declaration is quite peculiar, as the macro just ends there. What this is doing is setting up the definition of the RegisterMetaData function, so that the ADD_MEMBER macro calls are actually lines of code placed within the definition. The RegisterMetaData function should be called from the MetaCreator's constructor. This allows the user to specify what members to reflect within a MetaData instance of a particular type in a very simple and intuitive way.
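The trick of ending a macro on a function header can be sketched as below. This is a simplified, assumption-laden version rather than the article's listing: `MetaData` and `Member` are reduced to the bare minimum, `offsetof` stands in for NullCast, the instance name `g_metaCreator_` is invented, and an extra forward declaration of the explicit specialization is added so that standard-conforming compilers accept it.

```cpp
#include <cstddef>
#include <string>
#include <vector>

struct Member
{
    std::string name;
    unsigned offset;
};

struct MetaData
{
    std::string name;
    std::vector<Member> members;
};

template <typename MetaType>
class MetaCreator
{
public:
    typedef MetaType Type; // lets ADD_MEMBER name the reflected type

    explicit MetaCreator(const char *name)
    {
        Get().name = name;
        RegisterMetaData(); // runs the user's ADD_MEMBER lines
    }

    // One MetaData instance per reflected type.
    static MetaData &Get()
    {
        static MetaData instance;
        return instance;
    }

    static void AddMember(const char *name, unsigned offset)
    {
        Get().members.push_back(Member{name, offset});
    }

    // Defined once per type; DEFINE_META supplies the header and the
    // user supplies the body.
    static void RegisterMetaData();
};

// The macro ends on a function header with no body, so the braces the
// user writes after DEFINE_META(T) become that function's definition.
#define DEFINE_META(T) \
    template <> void MetaCreator<T>::RegisterMetaData(); \
    static MetaCreator<T> g_metaCreator_##T(#T); \
    template <> void MetaCreator<T>::RegisterMetaData()

#define ADD_MEMBER(MEMBER) \
    AddMember(#MEMBER, static_cast<unsigned>(offsetof(Type, MEMBER)))

// Hypothetical type, kept to plain data so offsetof is well defined.
struct GameObject
{
    int ID;
    bool active;
};

DEFINE_META(GameObject)
{
    ADD_MEMBER(ID);
    ADD_MEMBER(active);
}
```

After static initialization, `MetaCreator<GameObject>::Get()` holds the name "GameObject" and two members, without the user ever writing the type name inside the braces.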

Last but not least, let's talk about the NullCast function real quick. It resides within the MetaCreator, as NullCast requires the template's typename MetaType in order to return a pointer to a specific type of data.

And that's that! We can now store information about the members of a class and deduce types from objects in an easily customizeable way.

Here's a link to a demonstration program, compileable in Visual Studio 2010. I'm sure this could compile in GCC with a little bit of house-keeping as well, but I don't really feel like doing this as I need to get to bed! Here's the output of the program. The format is . For the object, members and their offsets are printed:

Now, you might notice at some point that you cannot reflect private data members! This detail will be covered in a later article. The idea is that you need source code access to the type you want to reflect, and you place a tiny bit of code inside it to gain access to private data. Either that, or make the MetaCreator a friend class (which sounds like a messy solution to me).

And here we have all the basics necessary for automated serialization! We can reflect the names of members, their types, and offsets within an object. This lets the reflection register any type of C++ data within itself.

# Generic Programming in C

Generic programming is a style of programming that is typeless, in that the type of data to be used is not determined at the time the code is written. In C++ this is achieved with templates. I know of two ways to perform generic programming in C. One way is to use pointer typecasting. It’s important to understand that when you typecast a pointer no data is modified; only the way the data the pointer points to is interpreted changes. This can be used to create generic algorithms and functionality.

For example, a linked list can be built with pointer typecasting: the data held by a node is cast to a single pointer type no matter what the actual type of the data is. Usually a void * is used for this, since a void * cannot be dereferenced without first casting it back to a concrete type. However, this method only works well when the list code itself never needs to access the data held by the node (the data pointed to by the void *).
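A minimal sketch of the void * approach, assuming hypothetical names (VoidNode, VoidList_PushFront, and so on) that are not from any particular library. Note that the list code stores and counts nodes without ever looking at the data; only the caller, who knows the real type, casts back:

```c
#include <assert.h>
#include <stdlib.h>

/* A node that holds any kind of data behind an untyped pointer. */
typedef struct VoidNode
{
  void *data;            /* caller-owned data of a type the list never sees */
  struct VoidNode *next;
} VoidNode;

/* Pushes a node onto the front of the list and returns the new head.
   Returns the old head unchanged if the allocation fails. */
VoidNode *VoidList_PushFront(VoidNode *head, void *data)
{
  VoidNode *node = malloc(sizeof *node);
  if (!node)
    return head;
  node->data = data;
  node->next = head;
  return node;
}

/* Counts the nodes; data is never dereferenced here. */
size_t VoidList_Count(const VoidNode *head)
{
  size_t count = 0;
  for (; head; head = head->next)
    ++count;
  return count;
}

/* Frees the nodes only; the data they point to belongs to the caller. */
void VoidList_Free(VoidNode *head)
{
  while (head)
  {
    VoidNode *next = head->next;
    free(head);
    head = next;
  }
}

/* Usage: the caller is responsible for casting back to the right type. */
void VoidList_Example(void)
{
  int whole = 1;
  float fraction = 2.5f;
  VoidNode *list = NULL;

  list = VoidList_PushFront(list, &whole);
  list = VoidList_PushFront(list, &fraction);

  assert(VoidList_Count(list) == 2);
  assert(*(float *)list->data == 2.5f);
  assert(*(int *)list->next->data == 1);

  VoidList_Free(list);
}
```

Nothing stops the caller from casting to the wrong type, which is the price of this approach.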

A more well-rounded (in my opinion) approach to generic programming in C is to make use of the preprocessor operator ##.

The ## operator pastes two tokens together to create a new token. Usage of the ## operator is known as token pasting, or token concatenation. There’s a whole lot of documentation on the ## operator. If you’re not familiar with it, you need to do some familiarizing before reading on.

All it does is take two tokens within a macro definition and stick them together. Example:
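A minimal sketch of token pasting, assuming a hypothetical macro named PASTE (the name is only for illustration). Pasting var and val produces the single identifier varval:

```c
#include <assert.h>

/* PASTE is a hypothetical name; it joins its two arguments into one token. */
#define PASTE(a, b) a##b

int PasteExample(void)
{
  int PASTE(var, val) = 10; /* expands to: int varval = 10; */
  PASTE(var, val) += 5;     /* expands to: varval += 5;     */

  /* The pasted token and the hand-written name are the same variable. */
  assert(varval == 15);
  return varval;
}
```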

In the above example varval will need to be defined in order to avoid compiler errors, perhaps by using it as the name of a variable to define.

Token pasting is used to solve an age-old problem in C: writing many near-identical functions to solve the same problem for different types. For example, I’ve written many linked lists in C, and one day I finally decided “I will never write another linked list in C”. I’m sick and tired of having to write a new set of functions for each type of data that I need a new linked list for.

By using the ## operator one can automate the process of duplicating code to create new functions and new definitions based on data type. This is much like templates in C++, except not nearly as user-friendly, yet still fun and satisfying to try.

Since there’s a great need in C to duplicate code for various data types, and the ## operator allows us to create new tokens with two other tokens, we can use the type of the data to create a new name for a new set of functionality for each type of data desired.

Examine the following:
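A minimal sketch of the idea, assuming the hand-written function names MaxInt and MaxFloat (hypothetical) and a GENERIC_MAX macro that pastes the type name in front of _MAX:

```c
#include <assert.h>

/* Hand-written versions: identical logic, repeated once per type. */
int MaxInt(int a, int b) { return a > b ? a : b; }
float MaxFloat(float a, float b) { return a > b ? a : b; }

/* Generates a function named TYPE_MAX for the given type.
   For example, GENERIC_MAX(int) defines int_MAX. */
#define GENERIC_MAX(TYPE)         \
  TYPE TYPE##_MAX(TYPE a, TYPE b) \
  {                               \
    return a > b ? a : b;         \
  }

/* One line per type instead of one hand-written function per type. */
GENERIC_MAX(int)
GENERIC_MAX(float)
GENERIC_MAX(double)

void GenericMaxExample(void)
{
  assert(int_MAX(3, 7) == MaxInt(3, 7));
  assert(float_MAX(2.5f, 1.0f) == MaxFloat(2.5f, 1.0f));
  assert(double_MAX(-1.0, -2.0) == -1.0);
}
```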

As you can see, it can be really annoying to write different code for different types of data when the code itself is highly redundant or even identical. However, by providing the GENERIC_MAX macro a data type, a new function can be automatically generated by the C preprocessor! The TYPE parameter is concatenated with the _MAX token to form the name of a new function definition. Providing the GENERIC_MAX macro with type int will result in a new function being defined named int_MAX. Creating a whole new set of functionality for a type can be as simple as declaring or defining it with a couple of macro invocations.

It might seem a bit self-defeating however to have to figure out what the new name of your generated function is going to be in order to call it. A very simple utility macro should be used to wrap around your generated generic functions. More information on how to do this is shown later in the post.

This can be taken further and be applied to even an abstract data type, such as a linked list. I’ve constructed myself a generic linked list.
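A minimal sketch of what a generic node declaration could look like, assuming a hypothetical macro name DECLARE_NODE and a TYPE_Node naming scheme:

```c
#include <assert.h>
#include <stddef.h>

/* Declares a node type named TYPE_Node whose payload is a TYPE.
   For example, DECLARE_NODE(int) declares int_Node. */
#define DECLARE_NODE(TYPE)    \
  typedef struct TYPE##_Node  \
  {                           \
    TYPE data;                \
    struct TYPE##_Node *next; \
  } TYPE##_Node;

DECLARE_NODE(int)
DECLARE_NODE(double)

void NodeTypeExample(void)
{
  int_Node tail = { 2, NULL };
  int_Node head = { 1, &tail };
  double_Node single = { 0.5, NULL };

  assert(head.next->data == 2);
  assert(single.data == 0.5);
}
```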

The above code is an example definition of a macro that defines a generic node type. The type of the node’s data is pasted into a new name wherever the node’s own type is required in the structure definition.
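A minimal sketch of a matching node-creation macro, assuming hypothetical names DEFINE_CREATE_NODE and TYPE_CreateNode. The node declaration macro is repeated so the sketch compiles on its own:

```c
#include <assert.h>
#include <stdlib.h>

/* Node type declaration, repeated here so this sketch stands alone. */
#define DECLARE_NODE(TYPE)    \
  typedef struct TYPE##_Node  \
  {                           \
    TYPE data;                \
    struct TYPE##_Node *next; \
  } TYPE##_Node;

/* Defines TYPE_CreateNode, which heap-allocates a node holding data.
   The generated function returns NULL if the allocation fails. */
#define DEFINE_CREATE_NODE(TYPE)              \
  TYPE##_Node *TYPE##_CreateNode(TYPE data)   \
  {                                           \
    TYPE##_Node *node = malloc(sizeof *node); \
    if (node)                                 \
    {                                         \
      node->data = data;                      \
      node->next = NULL;                      \
    }                                         \
    return node;                              \
  }

DECLARE_NODE(int)
DEFINE_CREATE_NODE(int)

void CreateNodeExample(void)
{
  int_Node *node = int_CreateNode(42);
  assert(node != NULL);
  assert(node->data == 42);
  assert(node->next == NULL);
  free(node);
}
```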

Above is an example of a macro that defines a function for creating a new node of the specified type, based on the previous node definition.

You might be thinking that it would be a bit annoying to call so many different define and declare macros for each structure and function. You can actually just declare all structures and function prototypes within a single macro. I called mine DECLARE_LIST; it declares all structure types and function prototypes, as well as some utility macros required to use the generic linked list. I also have a DEFINE_LIST macro that defines all functions used in the generic linked list.

For abstract data types such as lists or stacks or queues, you’ll probably want some additional utility functions to actually call the various generic functions you’ve defined. Here is an example set of utility functions I’ve made for my generic linked list:
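A minimal sketch of such wrappers, written here as macros. The DECLARE_LIST and DEFINE_LIST names come from the text, but their contents below are assumptions, and NODE, LIST_PUSH_FRONT, and LIST_FREE are hypothetical names chosen for illustration:

```c
#include <assert.h>
#include <stdlib.h>

/* Declares the node type and the function prototypes for one TYPE. */
#define DECLARE_LIST(TYPE)                                     \
  typedef struct TYPE##_Node                                   \
  {                                                            \
    TYPE data;                                                 \
    struct TYPE##_Node *next;                                  \
  } TYPE##_Node;                                               \
  TYPE##_Node *TYPE##_PushFront(TYPE##_Node *head, TYPE data); \
  void TYPE##_FreeList(TYPE##_Node *head);

/* Defines the functions declared by DECLARE_LIST for one TYPE. */
#define DEFINE_LIST(TYPE)                                     \
  TYPE##_Node *TYPE##_PushFront(TYPE##_Node *head, TYPE data) \
  {                                                           \
    TYPE##_Node *node = malloc(sizeof *node);                 \
    if (!node)                                                \
      return head;                                            \
    node->data = data;                                        \
    node->next = head;                                        \
    return node;                                              \
  }                                                           \
  void TYPE##_FreeList(TYPE##_Node *head)                     \
  {                                                           \
    while (head)                                              \
    {                                                         \
      TYPE##_Node *next = head->next;                         \
      free(head);                                             \
      head = next;                                            \
    }                                                         \
  }

/* Wrappers: the caller passes the type and never spells a generated name. */
#define NODE(TYPE) TYPE##_Node
#define LIST_PUSH_FRONT(TYPE, head, data) TYPE##_PushFront((head), (data))
#define LIST_FREE(TYPE, head) TYPE##_FreeList((head))

DECLARE_LIST(int)
DEFINE_LIST(int)

/* Builds the list 2 -> 1 and returns the sum of its elements. */
int ListWrapperExample(void)
{
  NODE(int) *list = NULL;
  NODE(int) *it;
  int sum = 0;

  list = LIST_PUSH_FRONT(int, list, 1);
  list = LIST_PUSH_FRONT(int, list, 2);
  assert(list != NULL && list->data == 2);

  for (it = list; it; it = it->next)
    sum += it->data;

  LIST_FREE(int, list);
  return sum;
}
```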

As you can see, wrappers for the various generic functions are needed to properly call them (unless you want to manually figure out the name of the function generated for whatever type you’re currently using). There are also a couple of macros for getting the generated type associated with a provided type.

Drawbacks:
There is one major drawback to the ## operator strategy: you cannot use * or spaces in the type name, because the result of pasting must be a single valid token. If you need to use a pointer as your type, you’ll have to typedef your pointer into a single valid token and pass the typedef’d type to the generic macros.
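A minimal sketch of the typedef workaround, assuming the hypothetical DECLARE_NODE macro and typedef names IntPtr and UnsignedLong:

```c
#include <assert.h>
#include <stddef.h>

#define DECLARE_NODE(TYPE)    \
  typedef struct TYPE##_Node  \
  {                           \
    TYPE data;                \
    struct TYPE##_Node *next; \
  } TYPE##_Node;

/* DECLARE_NODE(int *) and DECLARE_NODE(unsigned long) do not work:
   only the last token of the argument is pasted, so the first would try
   to paste "*" with "_Node" and the second would produce the two tokens
   "unsigned long_Node" instead of one name. */

/* Collapse each type into a single identifier first. */
typedef int *IntPtr;
typedef unsigned long UnsignedLong;

DECLARE_NODE(IntPtr)       /* declares IntPtr_Node       */
DECLARE_NODE(UnsignedLong) /* declares UnsignedLong_Node */

void TypedefExample(void)
{
  int value = 7;
  IntPtr_Node pointer_node = { &value, NULL };
  UnsignedLong_Node number_node = { 123UL, NULL };

  assert(*pointer_node.data == 7);
  assert(number_node.data == 123UL);
}
```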

There is another thing that can be considered a drawback: you cannot have a pointer to a macro, and you cannot step into one in a debugger. As such, debugging macros can be very tedious. I myself had to generate a preprocessed file (for example with gcc -E) and compile the expanded macros in order to pinpoint where I originally had some compile errors when creating my generic linked list.