Despite the length of this note, there are numerous
features in C++ that I haven't explained. I'm sure each feature
has its advocates, but despite programming in C and C++ for over 15
years, I haven't found a compelling reason to use them in any code
that I've written (outside of a programming language class!).
Indeed, there is a compelling reason to avoid using these features -- they are
easy to misuse, resulting in programs that are harder to read and understand
rather than easier. In most cases, the features are also
redundant -- there are other ways of accomplishing the same end. Why have
two ways of doing the same thing? Why not stick with the simpler one?
I do not use any of the following features in Nachos.
If you use them, caveat hacker.
- Multiple inheritance. It is possible in C++ to define
a class as inheriting behavior from multiple classes (for instance,
a dog is both an animal and a furry thing). But if programs
using single inheritance can be difficult to untangle, programs
with multiple inheritance can get really confusing.
- References. Reference variables are rather hard to
understand in general; they play the same role as pointers, with
slightly different syntax (unfortunately, I'm not joking!).
Their most common use is to declare some parameters to a function
as reference parameters, as in Pascal. A call-by-reference
parameter can be modified by the called function, without the caller
having to pass a pointer. The effect is that parameters look
(to the caller) like they are passed by value (and therefore can't change),
but in fact can be transparently modified by the called function.
Obviously, this can be a source of obscure bugs, not to mention
that the semantics of references in C++ are in general not obvious.
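
To make the point about call sites concrete, here is a minimal sketch. The three helper functions are hypothetical, invented only for this illustration. The by-value and by-reference calls look identical to the caller, while the pointer version at least shows an & where the variable may change.

```cpp
#include <cassert>

// Hypothetical helpers, named here only for illustration.
void ResetByValue(int count) { count = 0; }       // changes only a local copy
void ResetByReference(int &count) { count = 0; }  // changes the caller's variable
void ResetByPointer(int *count) { *count = 0; }   // caller must pass an address

int DemoReferences() {
    int value = 5;

    ResetByValue(value);      // value is untouched
    assert(value == 5);

    ResetByReference(value);  // same call syntax, but value is now 0
    assert(value == 0);

    value = 5;
    ResetByPointer(&value);   // the & at the call site warns the reader
    return value;             // 0
}
```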
- Operator overloading. C++ lets you redefine the meanings
of the operators (such as + and >>) for class objects.
This is dangerous at best ("exactly which implementation of '+' does
this refer to?"), and when used in non-intuitive ways, a
source of great confusion, made worse by the fact that C++ does
implicit type conversion, which can affect which operator
is invoked. Unfortunately, C++'s I/O facilities
make heavy use of operator overloading and references, so you
can't completely escape them, but think twice before you redefine
'+' to mean ``concatenate these two strings''.
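
As a rough sketch of how implicit conversion can decide which '+' runs, consider the hypothetical Percent class below (the class and its capping behavior are assumptions made up for this example). Two expressions that read the same produce different answers.

```cpp
#include <cassert>

// Hypothetical class, invented for this sketch.
class Percent {
  public:
    Percent(int v) : value(v) {}  // not marked explicit, so an int converts silently
    int value;
};

// Overloaded '+': adds two percentages but caps the result at 100.
Percent operator+(Percent a, Percent b) {
    int sum = a.value + b.value;
    return Percent(sum > 100 ? 100 : sum);
}

int OverloadedSum() {
    Percent p(80);
    Percent q = p + 50;  // 50 becomes Percent(50); the overloaded '+' runs
    assert(q.value == 100);
    return q.value;
}

int BuiltinSum() {
    int p = 80;
    int q = p + 50;      // same-looking expression, but the built-in '+' runs
    assert(q == 130);
    return q;
}
```

Nothing at the point of use says which meaning of '+' applies; the reader has to know the types involved.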
- Function overloading. You can also define different functions
in a class with the same name but different argument types. This is also
dangerous (since it's easy to slip up and get the unintended version),
and we never use it. We will also avoid using default arguments (for the
same reason). Note that it can be a good idea to use the same name for
functions in different classes, provided they use the same
arguments and behave the same way -- a good example of this is that
most Nachos objects have a Print() method.
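
Here is a small sketch of how easy it is to get the unintended version. Describe and Scale are hypothetical names, and free functions are used instead of class members only to keep the example short; each Describe overload returns a tag saying which version ran.

```cpp
#include <cassert>

// Hypothetical overloads; each returns a tag identifying itself.
int Describe(int)    { return 1; }  // integer version
int Describe(double) { return 2; }  // floating-point version

// Hypothetical function with a default argument.
int Scale(int value, int factor = 10) { return value * factor; }

void DemoOverloads() {
    assert(Describe(3) == 1);      // int literal: integer version
    assert(Describe(3.0) == 2);    // one decimal point later: the other version
    assert(Describe('3') == 1);    // a char is promoted to int, perhaps unexpectedly
    assert(Describe(3 / 2) == 1);  // integer division, so still the integer version

    assert(Scale(4) == 40);        // the factor of 10 is invisible at the call site
    assert(Scale(4, 2) == 8);
}
```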
- Standard template library. An ANSI standard has emerged for a
library of routines implementing such things as lists, hash tables,
etc., called the standard template library. Using such a library
should make programming much simpler if the data structure you need
is already provided in the library. Alas, the standard template
library pushes the envelope of legal C++, and so virtually no
compilers (including g++) can support it today. Not to mention that
it uses (big surprise!) references, operator overloading, and
function overloading.
- Exceptions. There are two ways to return an error from
a procedure. One is simple -- just define the procedure to return
an error code if it isn't able to do its job. For example,
the standard library routine malloc returns NULL if there
is no available memory. However, lots of programmers are lazy and
don't check error codes. So what's the solution? You might think
it would be to get programmers who aren't lazy, but no, the C++ solution
is to add a programming language construct! A procedure can
return an error by ``raising an exception'' which effectively
causes a goto back up the execution stack to the last
place the programmer put an exception handler. You would think
this is too bizarre to be true, but unfortunately,
I'm not making this up.
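
For comparison, the two styles might look like the sketch below. The function names are hypothetical, and std::invalid_argument is simply one standard exception type picked for the illustration.

```cpp
#include <cassert>
#include <stdexcept>

// Error-code style: returns false on failure; the caller has to check.
bool DivideChecked(int num, int den, int *result) {
    if (den == 0) {
        return false;
    }
    *result = num / den;
    return true;
}

// Exception style: failure is reported by throwing instead of returning.
int DivideOrThrow(int num, int den) {
    if (den == 0) {
        throw std::invalid_argument("division by zero");
    }
    return num / den;
}

// Returns the quotient, or -1 if the exception handler ran.
int DivideWithHandler(int num, int den) {
    try {
        return DivideOrThrow(num, den);
    } catch (const std::invalid_argument &) {
        return -1;  // control arrives here straight from the throw
    }
}

void DemoErrorCode() {
    int quotient = 0;
    assert(DivideChecked(6, 3, &quotient) && quotient == 2);
    assert(!DivideChecked(1, 0, &quotient));  // the failure is visible in the return value
}
```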
While I'm at it, there are a number of features of C that you also
should avoid, because they lead to bugs and make your code less easy
to understand. See Maguire's "Writing Solid Code" for a more complete
discussion of this issue. All of these features are legal C;
what's legal isn't necessarily good.
- Pointer arithmetic. Runaway pointers are a principal source
of hard-to-find bugs in C programs, because the symptom of this happening
can be mangled data structures in a completely different part of the program.
Depending on exactly which objects are allocated on the heap in which
order, pointer bugs can appear and disappear, seemingly at random.
For example, printf sometimes allocates memory on the heap,
which can change the addresses returned by all future calls to new.
Thus, adding a printf can change things so that a pointer
which used to (by happenstance) mangle a critical data structure
(such as the middle of a thread's execution stack), now overwrites memory
that may not even be used.
The best way to avoid runaway pointers is (no surprise) to be
very careful when using pointers. Instead of iterating
through an array with pointer arithmetic, use a separate index
variable, and assert that the index is never negative and always smaller than the size
of the array. Optimizing compilers have gotten very good, so that the
generated machine code is likely to be the same in either case.
Even if you don't use pointer arithmetic, it's still easy
(easy is bad in this context!) to have an off-by-one error
that causes your program to step beyond the end of an array.
How do you fix this? Define a class to contain the array
and its length; before allowing any access to the array,
you can then check whether the access is legal or in error.
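
One way such a class could look is sketched below. IntArray and its method names are hypothetical (this is not a Nachos class), and the sketch assumes a fixed length chosen at construction time. Every access goes through an assert, and the loop in Sum uses an index variable rather than pointer arithmetic.

```cpp
#include <cassert>

// Hypothetical bounds-checked array of ints; not part of Nachos.
class IntArray {
  public:
    IntArray(int n) {
        assert(n > 0);
        length = n;
        data = new int[n];
        for (int i = 0; i < length; i++) {
            data[i] = 0;
        }
    }
    ~IntArray() { delete [] data; }

    // Copying is disabled so two objects never share (and double-free) data.
    IntArray(const IntArray &) = delete;
    IntArray &operator=(const IntArray &) = delete;

    int Length() { return length; }

    int Get(int index) {
        CheckIndex(index);
        return data[index];
    }

    void Set(int index, int value) {
        CheckIndex(index);
        data[index] = value;
    }

  private:
    // Every access is checked before the underlying array is touched.
    void CheckIndex(int index) {
        assert(index >= 0 && index < length);
    }

    int length;
    int *data;
};

// Sum the elements using an index variable instead of pointer arithmetic.
int Sum(IntArray *array) {
    int total = 0;
    for (int i = 0; i < array->Length(); i++) {
        total += array->Get(i);
    }
    return total;
}

int DemoSum() {
    IntArray array(3);
    array.Set(0, 1);
    array.Set(1, 2);
    array.Set(2, 3);
    return Sum(&array);  // 6
}
```

With this in place, a call such as array.Get(3) on a three-element array stops at the assert instead of quietly reading past the end.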
- Casts from integers to pointers and back. Another source
of runaway pointers is that C and C++ allow you to convert
integers to pointers, and back again. Needless to say, using a
random integer value as a pointer is likely to result in unpredictable
symptoms that will be very hard to track down.
In addition, on some 64 bit machines, such as the Alpha, it is
no longer the case that the size of an integer is the same as
the size of a pointer. If you cast between pointers and integers,
you are also writing highly non-portable code.
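
The size mismatch is easy to check for yourself. The sketch below uses hypothetical function names and assumes a compiler that provides std::uintptr_t from <cstdint>, an integer type meant to be wide enough for a pointer. It shows what a conversion would have to look like if it truly cannot be avoided; it is not a recommendation to cast.

```cpp
#include <cassert>
#include <cstdint>

// True when a plain int is wide enough to hold a pointer on this machine.
// On many 64-bit machines this is false.
bool IntCanHoldPointer() {
    return sizeof(int) >= sizeof(void *);
}

// Round-trips a pointer through std::uintptr_t rather than through int.
bool RoundTripWorks() {
    int x = 42;
    std::uintptr_t bits = reinterpret_cast<std::uintptr_t>(&x);
    assert(sizeof(bits) >= sizeof(&x));  // wide enough, unlike int in general
    int *p = reinterpret_cast<int *>(bits);
    return p == &x && *p == 42;
}
```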
- Using bit shift in place of a multiply or divide.
This is a clarity issue. If you are doing arithmetic, use
arithmetic operators; if you are doing bit manipulation,
use bitwise operators. If I am trying to multiply by 8, which is
easier to understand, x << 3 or x * 8? In the 1970s,
when C was being developed, the former would yield more efficient
machine code, but today's compilers generate the same code in both
cases, so readability should be your primary concern.
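
A quick sketch of the two spellings side by side (the function names are hypothetical). For unsigned values the results are always identical, so the choice is purely about which one says what you mean.

```cpp
#include <cassert>

unsigned TimesEightShift(unsigned x)    { return x << 3; }  // reads as bit manipulation
unsigned TimesEightMultiply(unsigned x) { return x * 8; }   // reads as arithmetic

void CheckSameResult() {
    for (unsigned x = 0; x < 1000; x++) {
        assert(TimesEightShift(x) == TimesEightMultiply(x));
    }
}
```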
- Assignment inside conditional. Many programmers have the attitude
that simplicity equals saving as many keystrokes as possible.
The result can be to hide bugs that would otherwise be obvious.
For example:
    if (x = y) {
        ...
Was the intent really x == y? After all, it's pretty easy
to mistakenly leave off the extra equals sign. By never using
assignment within a conditional, you can tell by code inspection
whether you've made a mistake.
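
To see what the mistaken version actually does, here is a small sketch with two hypothetical functions. The buggy one assigns y to x and then tests whether x is nonzero, so its answer depends only on y. Many compilers will warn about it, but it is legal code.

```cpp
#include <cassert>

// Intended to test x == y, but '=' assigns y to x and then tests x.
bool SameBuggy(int x, int y) {
    if (x = y) {
        return true;
    }
    return false;
}

// What was meant.
bool SameCorrect(int x, int y) {
    if (x == y) {
        return true;
    }
    return false;
}

void DemoAssignmentBug() {
    assert(SameBuggy(1, 2));    // reports "equal" although 1 != 2, because y is nonzero
    assert(!SameBuggy(0, 0));   // reports "not equal" although 0 == 0, because y is zero
    assert(!SameCorrect(1, 2));
    assert(SameCorrect(0, 0));
}
```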
- Using #define when you could use enum.
When a variable can hold one of a small number of values,
the original C practice was to use #define to set up
symbolic names for each of the values. enum does this
in a type-safe way -- it allows the compiler to verify
that the variable is only assigned one of the enumerated values,
and none other. Again, the advantage is to eliminate a class of
errors from your program, making it quicker to debug.
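
A minimal sketch of the difference, using a made-up traffic light example (the names and the timing values are assumptions for illustration only):

```cpp
#include <cassert>

// Original C practice: symbolic names that are really just ints.
#define LIGHT_RED    0
#define LIGHT_YELLOW 1
#define LIGHT_GREEN  2

// enum version: the compiler knows the complete set of legal values.
enum Light { Red, Yellow, Green };

int SecondsFor(Light light) {
    switch (light) {
      case Red:    return 30;
      case Yellow: return 5;
      case Green:  return 25;
    }
    return 0;  // not reached for a valid Light
}

void DemoEnum() {
    int oldStyle = LIGHT_GREEN;
    oldStyle = 7;                // compiles: any int is accepted
    assert(oldStyle == 7);

    Light light = Green;
    // light = 7;                // rejected by a C++ compiler: 7 is not a Light
    assert(SecondsFor(light) == 25);
}
```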