Features To Avoid Like the Plague

Despite the length of this note, there are numerous C++ features that I haven't explained. I'm sure each has its advocates, but in over 15 years of programming in C and C++, I haven't found a compelling reason to use them in any code I've written (outside of a programming language class!).

Indeed, there is a compelling reason to avoid using these features -- they are easy to misuse, resulting in programs that are harder, not easier, to read and understand. In most cases, the features are also redundant -- there are other ways of accomplishing the same end. Why have two ways of doing the same thing? Why not stick with the simpler one?

I do not use any of the following features in Nachos. If you use them, caveat hacker.

  1. Multiple inheritance. It is possible in C++ to define a class as inheriting behavior from multiple classes (for instance, a dog is both an animal and a furry thing). But if programs using single inheritance can be difficult to untangle, programs with multiple inheritance can get really confusing.

  2. References. Reference variables are rather hard to understand in general; they play the same role as pointers, with slightly different syntax (unfortunately, I'm not joking!). Their most common use is to declare some parameters to a function as reference parameters, as in Pascal. A call-by-reference parameter can be modified by the called function, without the caller having to pass a pointer. The effect is that parameters look (to the caller) like they are passed by value (and therefore can't change), but in fact can be transparently modified by the called function. Obviously, this can be a source of obscure bugs, not to mention that the semantics of references in C++ are in general not obvious.

  3. Operator overloading. C++ lets you redefine the meanings of the operators (such as + and >>) for class objects. This is dangerous at best ("exactly which implementation of '+' does this refer to?"), and when used in non-intuitive ways, a source of great confusion, made worse by the fact that C++ does implicit type conversion, which can affect which operator is invoked. Unfortunately, C++'s I/O facilities make heavy use of operator overloading and references, so you can't completely escape them, but think twice before you redefine '+' to mean ``concatenate these two strings''.

  4. Function overloading. You can also define different functions in a class with the same name but different argument types. This is also dangerous (since it's easy to slip up and get the unintended version), and we never use it. We will also avoid using default arguments (for the same reason). Note that it can be a good idea to use the same name for functions in different classes, provided they use the same arguments and behave the same way -- a good example of this is that most Nachos objects have a Print() method.

  5. Standard template library. An ANSI standard has emerged for a library of routines implementing such things as lists, hash tables, etc., called the standard template library. Using such a library should make programming much simpler if the data structure you need is already provided in the library. Alas, the standard template library pushes the envelope of legal C++, and so virtually no compilers (including g++) can support it today. Not to mention that it uses (big surprise!) references, operator overloading, and function overloading.

  6. Exceptions. There are two ways to return an error from a procedure. One is simple -- just define the procedure to return an error code if it isn't able to do its job. For example, the standard library routine malloc returns NULL if there is no available memory. However, lots of programmers are lazy and don't check error codes. So what's the solution? You might think it would be to get programmers who aren't lazy, but no, the C++ solution is to add a programming language construct! A procedure can return an error by ``raising an exception'', which effectively causes a goto back up the execution stack to the nearest enclosing exception handler. You would think this is too bizarre to be true, but unfortunately, I'm not making this up.
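A minimal sketch of how multiple inheritance (item 1) confuses even a tiny program, using hypothetical Animal, FurryThing, and Dog classes modeled on the dog example. When both base classes define a member with the same name, the derived class inherits both, and every use has to spell out which one it means.

```cpp
#include <cassert>
#include <string>

// Hypothetical classes following the "a dog is both an animal and a
// furry thing" example.
class Animal {
  public:
    std::string Describe() const { return "animal"; }
};

class FurryThing {
  public:
    std::string Describe() const { return "furry thing"; }
};

// Dog inherits one Describe() from each base class.
class Dog : public Animal, public FurryThing {};

// Writing dog->Describe() alone does not compile: the name is ambiguous,
// so the caller must qualify it with the base class it means.
std::string DescribeAsAnimal(const Dog* dog) { return dog->Animal::Describe(); }
std::string DescribeAsFurry(const Dog* dog) { return dog->FurryThing::Describe(); }
```

With a single base class, the reader only has to look in one place to find Describe(); here the answer depends on which base the caller names.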
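The reference pitfall in item 2 is easiest to see side by side. This sketch uses two hypothetical functions, IncrementViaPointer and IncrementViaReference. Both modify the caller's variable, but only the pointer version makes that visible at the call site.

```cpp
#include <cassert>

// Pointer version: the caller must write &count, so the call site
// itself warns that count may change.
void IncrementViaPointer(int* count) {
    (*count)++;
}

// Reference version: the caller writes plain count, which looks exactly
// like call-by-value, yet the function modifies the caller's variable.
void IncrementViaReference(int& count) {
    count++;
}

// Both calls change count, even though only the first one shows it.
int CountAfterBothCalls() {
    int count = 0;
    IncrementViaPointer(&count);   // visibly passes an address
    IncrementViaReference(count);  // looks like a copy, but is not
    return count;                  // 2
}
```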
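Item 3's warning about implicit conversions applies even to the standard string class. In this sketch (the function names are hypothetical), an expression that looks like string concatenation means concatenation when a std::string is on the left, but plain pointer arithmetic when a string literal is on the left, because no overloaded operator applies when both operands are built-in types.

```cpp
#include <cassert>
#include <string>

// With a std::string on the left, '+' is std::string's overloaded
// concatenation operator, and "1" is appended.
std::string ConcatWithString() {
    return std::string("abc") + "1";   // "abc1"
}

// With a string literal on the left and an int on the right, no
// overloaded operator applies: the literal decays to const char*, and
// '+' is ordinary pointer arithmetic that skips the first character.
// The resulting pointer is then implicitly converted to std::string.
std::string ConcatWithLiteral() {
    return "abc" + 1;                  // "bc", not "abc1"
}
```

Some compilers (clang, for instance) warn about the second form, but it is legal C++ and compiles.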
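One well-known way to "get the unintended version" under item 4 involves overloads on bool and std::string. In this sketch (Describe and DescribeLiteral are hypothetical names), a string literal converts to bool through a built-in conversion, which C++ overload resolution prefers over the user-defined conversion to std::string.

```cpp
#include <cassert>
#include <string>

// Two overloads that look unambiguous to a human reader.
std::string Describe(bool flag) {
    return flag ? "bool: true" : "bool: false";
}

std::string Describe(const std::string& text) {
    return "string: " + text;
}

// "hello" has type const char[6]. Converting it to bool (via a non-null
// pointer) is a standard conversion, which beats the user-defined
// conversion to std::string, so this calls Describe(bool).
std::string DescribeLiteral() {
    return Describe("hello");          // "bool: true"
}
```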
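The two error-reporting styles in item 6 can be compared directly. This sketch uses hypothetical functions: ParseDigit returns an error code, ParseDigitOrThrow throws std::invalid_argument, and ParseDigitOrDefault shows the try/catch handler that the exception version requires somewhere up the call stack.

```cpp
#include <cassert>
#include <stdexcept>

// Error-code style: the return value reports success or failure, and the
// result comes back through an output pointer. A caller who ignores the
// return value silently uses an unset result.
bool ParseDigit(char c, int* result) {
    if (c < '0' || c > '9') {
        return false;                  // error: not a decimal digit
    }
    *result = c - '0';
    return true;
}

// Exception style: failure transfers control up the call stack to the
// nearest matching handler, skipping any code in between.
int ParseDigitOrThrow(char c) {
    if (c < '0' || c > '9') {
        throw std::invalid_argument("not a decimal digit");
    }
    return c - '0';
}

// The handler for the exception style lives here, possibly far from the
// point where the error was detected.
int ParseDigitOrDefault(char c, int fallback) {
    try {
        return ParseDigitOrThrow(c);
    } catch (const std::invalid_argument&) {
        return fallback;
    }
}
```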

While I'm at it, there are a number of features of C that you should also avoid, because they lead to bugs and make your code harder to understand. See Maguire's "Writing Solid Code" for a more complete discussion of this issue. All of these features are legal C; what's legal isn't necessarily good.

  1. Pointer arithmetic. Runaway pointers are a principal source of hard-to-find bugs in C programs, because the symptom can be mangled data structures in a completely different part of the program. Depending on exactly which objects are allocated on the heap in which order, pointer bugs can appear and disappear, seemingly at random. For example, printf sometimes allocates memory on the heap, which can change the addresses returned by all future calls to new. Thus, adding a printf can change things so that a pointer that used to (by happenstance) mangle a critical data structure, such as the middle of a thread's execution stack, now overwrites memory that may not even be used.

    The best way to avoid runaway pointers is (no surprise) to be very careful when using pointers. Instead of iterating through an array with pointer arithmetic, use a separate index variable, and assert that the index is never negative and always less than the size of the array. Optimizing compilers have gotten very good, so the generated machine code is likely to be the same in either case.

    Even if you don't use pointer arithmetic, it's still easy (easy is bad in this context!) to have an off-by-one error that causes your program to step beyond the end of an array. How do you fix this? Define a class to contain the array and its length; before allowing any access to the array, you can then check whether the access is legal or in error.

  2. Casts from integers to pointers and back. Another source of runaway pointers is that C and C++ allow you to convert integers to pointers, and back again. Needless to say, using a random integer value as a pointer is likely to result in unpredictable symptoms that will be very hard to track down.

    In addition, on some 64-bit machines, such as the Alpha, an integer is no longer the same size as a pointer. If you cast between pointers and integers, you are also writing highly non-portable code.

  3. Using bit shift in place of a multiply or divide. This is a clarity issue. If you are doing arithmetic, use arithmetic operators; if you are doing bit manipulation, use bitwise operators. If I am trying to multiply by 8, which is easier to understand, x << 3 or x * 8? In the 1970s, when C was being developed, the former would yield more efficient machine code, but today's compilers generate the same code in both cases, so readability should be your primary concern.

  4. Assignment inside conditional. Many programmers have the attitude that simplicity equals saving as many keystrokes as possible. The result can be to hide bugs that would otherwise be obvious. For example:

        if (x = y) {   // assigns y to x, then tests whether x is now nonzero
          ...
        }

    Was the intent really x == y? After all, it's pretty easy to mistakenly leave off the extra equals sign. By never using assignment within a conditional, you can tell by code inspection whether you've made a mistake.

  5. Using #define when you could use enum. When a variable can hold one of a small number of values, the original C practice was to use #define to set up symbolic names for each of the values. enum does this in a more type-safe way -- in C++, the compiler rejects assigning a plain integer to an enum variable, so (barring an explicit cast) the variable can only be assigned one of the enumerated values. Again, the advantage is to eliminate a class of errors from your program, making it quicker to debug.
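Item 1's suggestion of a class that holds an array together with its length might look like this minimal sketch. BoundedArray, its fixed capacity of 16, and the Sum helper are all hypothetical, and Sum takes a pointer rather than a reference in keeping with the advice above. The checks use assert, so they disappear if the program is compiled with NDEBUG defined.

```cpp
#include <cassert>

// Hypothetical bounds-checked array of ints: the storage and its length
// are kept together, and every access is checked before it happens.
class BoundedArray {
  public:
    static const int kCapacity = 16;

    explicit BoundedArray(int length) : length_(length) {
        assert(length >= 0 && length <= kCapacity);
        for (int i = 0; i < length_; i++) {
            data_[i] = 0;
        }
    }

    int Length() const { return length_; }

    int Get(int index) const {
        assert(index >= 0 && index < length_);  // catches off-by-one reads
        return data_[index];
    }

    void Set(int index, int value) {
        assert(index >= 0 && index < length_);  // catches off-by-one writes
        data_[index] = value;
    }

  private:
    int length_;
    int data_[kCapacity];
};

// Iterates with a separate index variable rather than a moving pointer;
// every access goes through the checked Get().
int Sum(const BoundedArray* array) {
    int total = 0;
    for (int i = 0; i < array->Length(); i++) {
        total += array->Get(i);
    }
    return total;
}
```

With this class, calling Get(3) on a three-element array fails an assertion at the point of the bad access, instead of silently reading whatever happens to follow the array in memory.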
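If code truly must store a pointer in an integer (item 2), a safer sketch for C++11 or later uses std::uintptr_t from <cstdint>. Where the implementation provides it, this unsigned type is wide enough that a pointer converted to it and back compares equal to the original, which a plain int does not guarantee. The function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Round-trips a pointer through std::uintptr_t. Converting a pointer to an
// integer of sufficient size and back to the same pointer type yields the
// original value; casting through int instead can truncate the address on
// machines where int is narrower than a pointer.
bool RoundTripsThroughUintptr(int* p) {
    std::uintptr_t bits = reinterpret_cast<std::uintptr_t>(p);
    int* back = reinterpret_cast<int*>(bits);
    return back == p;
}

// Reports whether int is narrower than a pointer on this platform. The
// answer varies by machine, which is exactly the portability problem.
bool IntNarrowerThanPointer() {
    return sizeof(int) < sizeof(void*);
}
```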
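Item 3's readability argument is even stronger for division. In this sketch (the function names are hypothetical), x * 8 and x << 3 agree for non-negative x, but for negative x, integer division truncates toward zero while right-shifting rounds toward negative infinity on compilers that use an arithmetic shift, as g++ and clang do (C++20 requires it).

```cpp
#include <cassert>

// Multiplication: for non-negative values these agree. Left-shifting a
// negative int is undefined behavior before C++20.
int TimesEightArithmetic(int x) { return x * 8; }
int TimesEightShift(int x) { return x << 3; }

// Division: x / 8 truncates toward zero, while x >> 3 on a negative int
// (implementation-defined before C++20, arithmetic since) rounds toward
// negative infinity, so the two can disagree.
int DivideByEight(int x) { return x / 8; }     // -9 / 8 == -1
int ShiftRightThree(int x) { return x >> 3; }  // -9 >> 3 == -2 with an arithmetic shift
```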
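The bug in item 4 can be reproduced in a few lines. The two hypothetical functions below differ by a single character. With g++, compiling with -Wall produces a "suggest parentheses around assignment used as truth value" warning for the second one, which is one more reason to keep assignments out of conditions.

```cpp
#include <cassert>

// Intended: take the branch only when x equals y.
bool EqualityTest(int x, int y) {
    if (x == y) {
        return true;
    }
    return false;
}

// Typo: '=' assigns y to x, so the branch is taken whenever y is nonzero,
// regardless of x's original value.
bool AccidentalAssignment(int x, int y) {
    if (x = y) {
        return true;
    }
    return false;
}
```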
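The #define versus enum contrast in item 5 looks like this in a minimal sketch. The ThreadState names and the StateName helper are hypothetical. The type check shown in the final comment is C++ behavior: a C compiler accepts assigning a plain int to an enum variable, while a C++ compiler rejects it without a cast.

```cpp
#include <cassert>

// Original C style: the names are just integers, so nothing stops a
// variable meant to hold a status from holding 7 or any other value.
#define STATUS_READY   0
#define STATUS_RUNNING 1
#define STATUS_BLOCKED 2

// Enum style: a ThreadState variable can be assigned only one of these
// names unless the programmer writes an explicit cast.
enum ThreadState { READY, RUNNING, BLOCKED };

// Handles every state; g++ -Wall (via -Wswitch) warns if an enumerator
// is later added to ThreadState but not handled here.
const char* StateName(ThreadState state) {
    switch (state) {
      case READY:   return "READY";
      case RUNNING: return "RUNNING";
      case BLOCKED: return "BLOCKED";
    }
    return "unknown";
}

// ThreadState s = 7;   // rejected by a C++ compiler: no conversion from int
```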