Stupid Enumeration Tricks

From EDM2

Written by Dean Roddey

Introduction

An enumeration, for those C/C++ programmers who have not used them, is a data type which can be given a finite (usually small) set of named values. Enumerations are therefore useful for representing real world values that naturally have a finite set of values, each of which has a meaningful name. The days of the week are an obvious example of an enumerated value since they form a small set of legal values, each of which has a meaningful name. Enumerations have traditionally been weak in the C/C++ language world, as compared to Modula-2 and Ada for instance. But don't despair, because the power of C++ enumerations can be easily expanded with a couple of smart template functions and/or macros.

Some C++ aficionados will tell you that use of any enumerated values in C++ is a sign of bad object architecture, but you should take this with a grain of salt. Replacing all enumerations with classes can result in significant code bloat and performance hits. It is true that enumerations, used as they normally are, are often weak spots in the code because they are not easily extensible, so you must make a judgment call as to what is appropriate for you given the particular constraints you are under.

The classic example of bad enumeration use is the program that deals with a set of vehicles represented by a structure. It creates an enumerated value that enumerates the types of vehicles and uses this to know, at runtime, what type of vehicle each structure represents. This type of code will tend to have lots of switch statements to do "this if it's a truck, that if it's a car, and the other if it's a boat." When a new vehicle type is required, finding and fixing all of these instances can be a nightmare. The compiler will not help, so it is a totally manual operation. OO was invented partly to avoid this type of non-extensible architecture. This article will give you some hints to help you end run some of these arguments against enumerations and make them work for you instead of against you, in those situations where handling the situation via polymorphism is just too high a price to pay.

The sample code in this article assumes the existence of a string class, which is not implemented here for reasons of space. Since any class system will provide a string class with similar capabilities, you should be able to quickly adapt the code to your particular C++ system.

The Fight Against Ambiguity

One of the most obvious uses of enumerations is to avoid ambiguity in C++ overloading. Because of the sometimes arcane built-in type conversions that C++ inherited from C, overloading functions can be very difficult because of the ambiguity created by convertible types. One way you can often avoid the ambiguity is to use enumerations to create unique types. Although the enumerated value is still just a set of numbers, the compiler can tell the difference between it and an integral type because it considers any enumeration a unique type.

// One way to provide a justified output function
const long Left = 0;
const long Center = 1;
const long Right = 2;

//
// This won't work because our formatting value has the
// same type as a data type we want to format. The
// compiler can't tell the difference.
//
void String::operator<<(long lJustification) {}
void String::operator<<(long lToFormat) {}

//
//  Do it like this instead. Now the justification value,
//  though still just a number really, has a unique type
//  that is distinguishable by the compiler.
//
enum eJustify
{
    eJustify_Left
    , eJustify_Center
    , eJustify_Right
};

void String::operator<<(eJustify eJustification) {}
void String::operator<<(long lToFormat) {}

// So now we can safely do this
long   lTmp = 10;
String strTmp;
strTmp << eJustify_Left
       << lTmp
       << "\n";

Figure 1: Using enumerations to avoid confusion with overloaded methods that have integer arguments.

Look at Figure 1. It is a trivial example of this kind of enumeration usage. If you want to be able to format values into your string class using different justifications, you need to be able to provide a mechanism to set the justification. If you just define some constant long values as left, center, and right, that would work OK. But, when you later wanted to format in an integral value, the compiler could not tell the difference between the desire to set the justification and to format an integral value of that same long type.

The obvious way out of this deadlock is to make the justification an enumerated type. This lets the compiler understand the difference between the justification indicator and the value to be formatted. In the second part of the figure an eJustify enumeration is created to take the place of the defined justification constants and, with minimal changes, the system works much better.

Use of enumerations in this way also makes it much more difficult to accidentally call a function with transposed arguments. An example is a function that takes a day of the week and a value to add to an array of counters, with one slot for each day of the week. If the day of the week were just a defined integral value, the compiler would have no way to tell if you called the function with the values backwards. This kind of error might not show up clearly, even during testing.

So you can make your programs much more type safe at compile time by using enumerations when it makes sense to do so. This works because the compiler will complain if you try to set an enumerated value using an integral value, which is what happens if you call the function with transposed parameters. Moving as much error checking to compile time as possible is what C++ is all about, so do it when you can.
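
The transposed-argument case described above can be sketched as follows. This is only a minimal illustration: the names eDay, AddToDayCount, lDayCount, and alDayCounts are made up for the example and are not from the author's class system.

```cpp
// Hypothetical names throughout; this only illustrates the idea
enum eDay
{
    eDay_Sunday
    , eDay_Monday
    , eDay_Tuesday
    , eDay_Wednesday
    , eDay_Thursday
    , eDay_Friday
    , eDay_Saturday
};

// One counter slot for each day of the week
static long alDayCounts[7] = {0, 0, 0, 0, 0, 0, 0};

// The day has its own type, so it cannot be confused with the amount
inline void AddToDayCount(eDay eWhich, long lToAdd)
{
    alDayCounts[eWhich] += lToAdd;
}

// Reads back the counter for one day
inline long lDayCount(eDay eWhich)
{
    return alDayCounts[eWhich];
}

// AddToDayCount(eDay_Tuesday, 5);   // accepted
// AddToDayCount(5, eDay_Tuesday);   // rejected: 5 does not convert to eDay
```

Had the first parameter been a plain long, both of the commented-out calls would compile, and the second would quietly update the wrong slot or run off the end of the array.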

Cheating To Beat Ambiguity

You can actually even cheat and use an enumeration in a technically illegal way to beat ambiguity. If we wanted to extend Figure 1 to set the width of the field within which the justified formatting is to take place, we would need a way to pass some arbitrary field width value into the string object via the insertion operator. We obviously don't want to make an enumerated type with a value for every possible field width we would want, but we don't really have to.

We can just define an enumeration called eFieldWidth, and give it just one dummy value. We can then write an insertion operator for the eFieldWidth type, just as we did for the justification type. Then we can cast any value to eFieldWidth in order to pass it into the string object, like this:

strTmp << eFieldWidth(10)
       << eJustify_Left
       << lValue;

Sure, 10 is not a defined value of eFieldWidth, but it does not matter in this case. The recipient of the bogus value knows that it is bogus, so it is safe to do. The payoff is that we now have an unambiguous way to pass in any field width value we want (within the limits of an enumerated value in your particular compiler, but that's generally not a limit for a field width).
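
Here is a self-contained sketch of the field width trick. It makes some assumptions that are not in the article: TinyFormatter is a hypothetical stand-in for the string class, built on std::string, and the enumeration is given a fixed underlying type (a C++11 feature). The fixed underlying type is there because, under current C++ rules, casting a value outside an ordinary enumeration's range is not guaranteed to be well defined, whereas every value of the underlying type is legal once that type is fixed.

```cpp
#include <string>

// An enumeration with a single dummy value. It exists only so that the
// compiler has a distinct type to overload on. The fixed underlying
// type is an assumption of this sketch, not part of the article.
enum eFieldWidth : unsigned { eFieldWidth_Dummy };

// Hypothetical stand-in for the article's string class
class TinyFormatter
{
public:
    TinyFormatter() : m_uWidth(0) {}

    // Sets the field width used for subsequently formatted values
    TinyFormatter& operator<<(eFieldWidth eWidth)
    {
        m_uWidth = static_cast<unsigned>(eWidth);
        return *this;
    }

    // Formats a value, right justified within the current field width
    TinyFormatter& operator<<(long lToFormat)
    {
        std::string strVal = std::to_string(lToFormat);
        if (strVal.size() < m_uWidth)
            m_strText.append(m_uWidth - strVal.size(), ' ');
        m_strText += strVal;
        return *this;
    }

    // Read access to the text formatted so far
    const std::string& strText() const { return m_strText; }

private:
    unsigned     m_uWidth;
    std::string  m_strText;
};
```

With this in place, something like fmt << eFieldWidth(6) << 42L selects the width overload for the first insertion and the long overload for the second, with no ambiguity between them.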

Boolean Enums

If your compiler does not support the new official C++ Boolean type (as mine does not) then you can provide your own Boolean enumeration in the meantime. Later, you can replace it when the C++ Boolean type becomes widely used. If you do this, you will quickly discover one of the downsides to using your own Boolean enumeration, namely that the return from the negation operator, called operator!() in C++, is going to be an integral value. A good C++ compiler will give you a hard time if you attempt to assign an integral value to an enumeration so negating the enumerated value requires a cast in order to get the returned value back into your own Boolean type.

// Define our boolean enum
enum eBoolean{eFalse, eTrue};

// Declare a test boolean value
eBoolean  bTest = eTrue;

// Without a negation operator you need this
bTest = eBoolean(!bTest);

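// Declare a negation operator. It compares against eFalse rather
// than applying ! to bState, which would call this same operator.
inline eBoolean operator!(eBoolean bState)
{
    return (bState == eFalse) ? eTrue : eFalse;
}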

// Now you can do this
bTest = !bTest;

Figure 2: Boolean enumerations.

You can get around this type of problem by providing a very simple inline negation operator. Look at the code in Figure 2. A simple negation operator allows you to generate much cleaner code. Just put the inline operator definition in the same header that provides the Boolean type so that any code that can see the Boolean type can see the negation operator.

Formatting Enumerations

An obvious problem with C++ enumerations is that the language does not automatically support formatting them to a textual representation. And, even if it did, the literal name of an enumerated value is often not what you want to display for human consumption. An obvious way around this problem is to provide an insertion method for your string class. Once an enumeration can be formatted to a string, the string can then be formatted to other places, leveraging the universality of strings.

#include <strobject.Hpp>

// Define our test enumeration
enum eWeekDays
{
    eWeekDay_Sunday
    , eWeekDay_Monday
    , eWeekDay_Tuesday
    , eWeekDay_Wednesday
    , eWeekDay_Thursday
    , eWeekDay_Friday
    , eWeekDay_Saturday

    , eWeekDays_Min = eWeekDay_Sunday
    , eWeekDays_Max = eWeekDay_Saturday
};

String& operator<<(String& strTarget, eWeekDays eDay)
{
    if (eDay == eWeekDay_Sunday)
        strTarget = "Sunday";
    else if (eDay == eWeekDay_Monday)
        strTarget = "Monday";

    // [do the rest here: Tuesday through Friday follow the same pattern]

    else if (eDay == eWeekDay_Saturday)
        strTarget = "Saturday";
    else
        strTarget = "????";

    return strTarget;
}

int main()
{
    // A string to format to
    String  strFormat;

    // An enum to format
    eWeekDays eDay = eWeekDay_Monday;

    // Format the enum to the string
    strFormat << eDay;

    // Dump it out for test
    cout << "Day: " << strFormat;
}

Figure 3: Formatting an enumeration via a string insertion operator.

Look at Figure 3. In this sample, a simple enumeration for the days of the week is created to use in the sample code. The reason for the particular naming style and the extra min/max values will be discussed later. Following the enumeration is an insert operator method that takes a string object and a weekday enumeration. It then formats the appropriate value into the string for the value of the enumeration. If the value is not a valid weekday, it formats "????" into the string. You might want to generate an exception here or take some other action. At the end it returns a reference to the target string object, as a good insertion operator should.

The benefit of this type of enum formatting function is that it centralizes the formatting of the enum value in one place. Yes, it does use an if statement to format the values, and it is affected by values added to or removed from the enumeration; however, it is in one place where you know to look. The outside world is still shielded from these changes. If you consistently use this type of architecture, updating the formatting function after changes to the enumeration will become second nature.

Enumerating Enumerations

The most glaring problem with C++ enumerations is that... well you cannot enumerate them. In other words, you cannot (without prior knowledge of them) loop through all of the values of an enumeration and do something for each value. In an object-oriented system, anywhere that you have a hard coded case statement to process each possible value of an enumeration, you may be setting yourself up for problems if the enumeration changes. But, if you could enumerate all of the values in your enumeration, in such a way that the code would not be affected by the addition or removal of values, you could avoid using classes and virtual methods in some trivial cases where an enumeration would do just as well without the overhead.

In order to achieve this kind of flexibility with enumerations, you just need to define your enums consistently and provide two specially named values. A consistent enumeration style is a laudable goal regardless of your desire to use the tricks discussed here so, given that you don't mind doing the right thing to begin with, you probably will not consider the needed infrastructure much of a burden.

// A macro to generate the standard enum functions
#define StdEnumTricks(eEnumType) \
inline void operator++(eEnumType& eVal) \
{ \
    eVal = eEnumType(eVal+1); \
} \
\
inline void operator++(eEnumType& eVal, int)  \
{ \
    eVal = eEnumType(eVal+1); \
} \
\
inline void operator--(eEnumType& eVal) \
{ \
    eVal = eEnumType(eVal-1); \
} \
\
inline void operator--(eEnumType& eVal, int)  \
{ \
    eVal = eEnumType(eVal-1); \
} \
\
inline eEnumType eEnumMax(eEnumType) \
{ \
    return eEnumType##_Max; \
} \
\
inline eEnumType eEnumMin(eEnumType) \
{ \
    return eEnumType##_Min; \
}

Figure 4: StdEnumTricks() macro.

Look at the code in Figure 4. In this case, the eWeekDays enum from Figure 3 is assumed. Notice the style used. Each value starts with eWeekDay_ and there are two values named eWeekDays_Min and eWeekDays_Max, which are assigned the minimum and maximum values of the enumeration. Note that they have the prefix eWeekDays_, which is the same as the overall enumeration name.

Following the enumeration is a macro, named StdEnumTricks(), that will generate a set of inline functions. The first set of functions are increment and decrement operators that can be used to increment and decrement the enumeration, both prefix and postfix. The last functions are to get the minimum and maximum values of the enum just by having its type.

Bringing It Together

By having these inline functions available, you can now write code that does something for each value of an enumeration, and which will continue to do so without modification despite changes to the enumerated type. Look at Figure 5, which uses the code from Figures 3 and 4 to output all of the values in the weekdays enumeration.

#include <strobject.hpp>

// Generate the standard tricks
StdEnumTricks(eWeekDays)

// Add another insertion for streams
ostream& operator<<(ostream& osTarget, eWeekDays eDay)
{
    String  strTmp;

    strTmp << eDay;
    osTarget << strTmp;
    return osTarget;
}

int main()
{
    // Format all the days of the week to stdout
    for (eWeekDays eDay=eEnumMin(eWeekDays());
            eDay <= eEnumMax(eWeekDays()); eDay++)
    {
        cout << "Day: " << eDay << "\n";
    }
}

Figure 5: Using the StdEnumTricks() macro.

Figure 5 adds an additional insert operator that formats the enumerated value directly to an output stream. Actually it internally formats first to a string object and then outputs the string object to the stream. Since a stream insertion operator will already exist for strings, this scheme allows you to avoid duplicating the formatting code. It's not really that inefficient since a temporary string would still be used to format the value no matter where it was actually done.

The Big Picture

In my class system, each DLL has a dedicated module for the implementation of enum insertion methods. For a DLL named Foo.DLL, I would have a Foo_Type.Hpp and Foo_Type.Cpp module for this type of code. For each enum that I want to be formattable I provide a string and an output stream insertion operator. The consistency of this type of architecture means that any user of my DLLs always knows instantly where to look for such stuff, and that I know where to go for any code that might be affected by a change to an enum. This is just a stylistic nicety of course, not a requirement.

Not all enumerated values make sense for this type of treatment. As presented, the code only works for enums that have contiguous values, not for enums used as bit masks or non-contiguous enums. You can still implement similar tricks for these enums, but they cannot be generated by the standard enum tricks macro. You would have to implement another macro that does things slightly differently and do some of the code manually perhaps.
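
As a rough idea of what such a separate macro for bit mask enums might look like, here is a sketch. Everything in it is hypothetical: the BitEnumTricks() macro, the bAllBitsOn() helper, and the eOpenFlags enumeration are invented for illustration, and the sketch assumes a compiler with the built-in bool type.

```cpp
// Hypothetical bit mask enumeration; the names are illustrative only
enum eOpenFlags
{
    eOpenFlag_Read     = 0x1
    , eOpenFlag_Write  = 0x2
    , eOpenFlag_Append = 0x4
};

// A hypothetical sibling of StdEnumTricks() for bit mask enums
#define BitEnumTricks(eEnumType) \
inline eEnumType operator|(eEnumType eLHS, eEnumType eRHS) \
{ \
    return eEnumType(unsigned(eLHS) | unsigned(eRHS)); \
} \
\
inline eEnumType operator&(eEnumType eLHS, eEnumType eRHS) \
{ \
    return eEnumType(unsigned(eLHS) & unsigned(eRHS)); \
} \
\
inline bool bAllBitsOn(eEnumType eToTest, eEnumType eBits) \
{ \
    return (unsigned(eToTest) & unsigned(eBits)) == unsigned(eBits); \
}

// Generate the bit mask tricks for our flags
BitEnumTricks(eOpenFlags)
```

The generated operators keep combined flags in the enumerated type, so a value such as eOpenFlag_Read | eOpenFlag_Append can be passed to a function taking eOpenFlags without a cast at the call site.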

If not for the need to obtain the minimum and maximum values of an enum by type, you could use template functions instead of a macro, which almost anyone would prefer. However, I do not know how the template system could be used to generate the same kind of code as the eEnumMin() and eEnumMax() functions from Figure 4, since they depend on textual replacement. If you do know how, then please pass this information on to me and I will pass it on to others.
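
One possible answer, offered as a sketch rather than a definitive solution, is a traits template that is specialized once for each enumeration. The names EnumLimits and iEnumCount are hypothetical. The enumeration still has to supply its own _Min and _Max values, and each enum still needs its own specialization, so some per-enum boilerplate remains.

```cpp
// The weekday enumeration from Figure 3, repeated so the sketch stands alone
enum eWeekDays
{
    eWeekDay_Sunday
    , eWeekDay_Monday
    , eWeekDay_Tuesday
    , eWeekDay_Wednesday
    , eWeekDay_Thursday
    , eWeekDay_Friday
    , eWeekDay_Saturday

    , eWeekDays_Min = eWeekDay_Sunday
    , eWeekDays_Max = eWeekDay_Saturday
};

// Hypothetical traits template. It is declared but not defined, so using
// it with an enum that has no specialization fails at compile time.
template <class TEnum> struct EnumLimits;

// The one piece that must be written (or generated) for each enum
template <> struct EnumLimits<eWeekDays>
{
    static eWeekDays eMin() { return eWeekDays_Min; }
    static eWeekDays eMax() { return eWeekDays_Max; }
};

// Generic code can now get the limits from the type alone
template <class TEnum> inline int iEnumCount()
{
    return int(EnumLimits<TEnum>::eMax())
           - int(EnumLimits<TEnum>::eMin()) + 1;
}
```

The specialization itself could still come from a small macro, which would leave only the part that truly needs textual replacement in the preprocessor and let everything else be ordinary templates.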

Another very obvious addition to the StdEnumTricks macro is a function called something like TestEnum(eVal) that would test the passed value against the min/max values of its type and throw an exception if it is out of range. The macro could easily generate very specific error information on the enum type, the legal values, the tested value, and the file/line where the test was done. This would simplify debugging greatly. By adding a little conditional code, in order to define slightly different versions during debug and production builds, you could no-op out the tests for production code to improve performance.
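
A minimal sketch of such a range test might look like the following. The StdEnumTest() macro name, the ENUM_TESTS_OFF switch, and the choice of std::out_of_range as the exception are all assumptions made for this example. It reports only the enum type name; reporting the file and line would need a further macro at the call site.

```cpp
#include <stdexcept>

// The weekday enumeration from Figure 3, repeated so the sketch stands alone
enum eWeekDays
{
    eWeekDay_Sunday
    , eWeekDay_Monday
    , eWeekDay_Tuesday
    , eWeekDay_Wednesday
    , eWeekDay_Thursday
    , eWeekDay_Friday
    , eWeekDay_Saturday

    , eWeekDays_Min = eWeekDay_Sunday
    , eWeekDays_Max = eWeekDay_Saturday
};

// ENUM_TESTS_OFF is a hypothetical switch for production builds
#if defined(ENUM_TESTS_OFF)

// Production version: the test compiles away to nothing
#define StdEnumTest(eEnumType) \
inline void TestEnum(eEnumType) {}

#else

// Debug version: throw if the value is outside the _Min.._Max range
#define StdEnumTest(eEnumType) \
inline void TestEnum(eEnumType eVal) \
{ \
    if ((eVal < eEnumType##_Min) || (eVal > eEnumType##_Max)) \
        throw std::out_of_range(#eEnumType " value out of range"); \
}

#endif

// Generate the range test for the weekdays enum
StdEnumTest(eWeekDays)
```

A function that receives an eWeekDays from outside code could then call TestEnum(eDay) on entry and rely on the value being legal from that point on.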

Summary

As you can see, enums have a lot of untapped potential that just requires a little effort to tease out. Luckily, the flexibility of C++ and its ability to define operators allow you to access this potential pretty simply. I'm sure that there are many other examples of slick enumeration usage that you can come up with, so keep an eye out for ways to put them to use. Don't just forget the C++'ism that enumerations are bad object code, because oftentimes they are. But also don't let that warning stop you from using enumerations to your advantage where it makes sense.

References

See The Annotated C++ Reference Manual for the details of enumerations under C++. ISBN 0-201-51459-1