Kerberos: Where Did It Come From, and Why?

Brian Tung <brian@isi.edu>

Authentication comes out of a desire for authorization; it's rare that you want to know who someone is, just for the sake of knowing. In fact, the two are linked so closely in people's minds that for a long time, it wasn't recognized that they were distinct. In the story of Ali Baba and the Forty Thieves, the cave where the thieves stored their stolen goods remained shut until someone uttered the phrase, "Open, Sesame!" With the benefit of hindsight, we can see this as two steps. First, the cave identifies the speaker as one who knows the magic phrase; and second, the cave door is actually opened based on that identification. The main reason that breaking things down this way was not so obvious, of course, is that the authentication was used for only that one purpose.

It's also worth noting, by the way, that voice recognition is not part of the authentication mechanism here. All that matters is whether you know the magic phrase or not. How you say it doesn't matter at all; the cave door will open equally easily in any event.

That admittedly fictional instance illustrates one particular mode of authentication, which we can call something you know. Another mode of authentication is something you have. If you have ever had to write a check at the supermarket, say, you know that you usually have to display your driver's license (or some other form of ID) in order to prove that it's you who's writing a check out of your account, rather than someone else. The assumption is that only you ever have your license; in addition, a photograph provides additional binding.

Still another mode of authentication is something you are. Here's a classical example. In the Bible, in the Book of Judges, there's the story of the conflict between the Gileadites and the Ephraimites on the banks of the River Jordan. The leader of the Gileadites set up a guard at a ford on the river, and this guard's duty was to ask everyone who wished to pass whether they were an Ephraimite or a Gileadite.

Well, since the guard made it abundantly clear that he was a Gileadite, to answer "Ephraimite" was to ask for certain death. But the guard didn't rely only on the person's say-so. He also asked each person to say the word "shibboleth," which is a Hebrew word meaning either "ear of corn" or "flooding river." The word itself was recognized by Gileadite and Ephraimite alike, but the Ephraimites spoke a dialect of Hebrew that lacked the "sh" sound. They could only pronounce the word as "sibboleth," whereupon the guard discovered their true hailing, and slaughtered them.

More recently, we have biometric authentication, where some mechanism performs a retinal scan, or a fingerprint scan, or anything else that is supposed to identify your actual person. Although, strictly speaking, retinas and fingerprints are things that you have, rather than are, it's generally understood that people usually won't steal eyeballs and fingers just to masquerade as someone else. (There are exceptions, naturally, and these have to be properly accounted for.)

These three modes of authentication--something you know, something you have, and something you are--were first identified as such by Steve Carlton, John Taylor, and John Wyszynski in a 1988 paper, "Alternate Authentication Mechanisms." Practically all authentication mechanisms in use today can be reduced to one or more of these modes. Combinations of these modes are common: in order to construct transactions at an ATM, for example, you have to both have the ATM card, and know the PIN.

The most common mode in the network world is something you know, and the canonical example of this is password authentication. In its purest form, when a user tries to access a server, the server requires the user to provide a password. If the password given matches what the server expects, the user is allowed access; if not, access is denied.

One problem with this is that in order to check the password, the server has to have a database of passwords (usually, one for each user), and that's a point of vulnerability, especially to insiders. Moreover, the passwords in many early systems were kept in a file that many programs consulted for more routine purposes, such as linking a user name to a real name, or to group membership. The /etc/passwd file is an obvious example.

To get around this, most systems today use a mechanism called hashed passwords. Instead of storing the actual password p for each user, the server runs p through a one-way hash function H, and stores the result H(p). A hash function is something that takes an arbitrary piece of data and efficiently reduces it to a short key, usually of fixed length. For example, a function that breaks up a stream of bytes into 32-bit chunks and XORs all the chunks together is a hash function with a 32-bit output.

A one-way hash is one in which it is computationally infeasible, given a key, to find a piece of data that reduces to that key. The XOR hash just described is clearly not one-way, since every key hashes to itself. But they do exist, and the algorithm used on many older Unix machines is based on DES with a known key and initialization vector. On newer machines, those passwords are hashed with MD5 (Message Digest 5). It is also possible, though uncommon, to hash passwords with the SHA-1 hash (Secure Hash Algorithm 1).

Incidentally, hash functions are sort of the witch's brew of cryptography. Most encryption functions are based at least in part on some fancy math; for example, the various public key algorithms are based on large number factorization, or discrete logarithms, or similarly hard math, and even DES is based on some interesting combinatorics, although the S-boxes are essentially alchemy. But one-way hash functions are really eye-of-newt concoctions, because the very idea is to make the operations as confusing and non-invertible as possible, which is not the idea with encryption functions, obviously. They gain their security through their large range: the output of MD5 is 128 bits long, and all outputs are possible, so there are 2128 different possible outputs. The output of SHA-1 is even longer--160 bits long, by default--so it has 2160 distinct outputs. Empirically, the most efficient attack is brute force, so that it will take 2128 different inputs before you find a matching output for MD5, and 2160 different inputs before you find a match for SHA-1.

In any event, once the hashed password is stored, each time the user wants to authenticate to the server, the user (or the user's client program) sends the password over to the server. (Can you see why the password isn't hashed before being sent to the server?) The server then hashes the password given by the user, and then tries to match it to the hashed password on record in the database. If the two match, the user is permitted access.

There's still a problem with this, though, even with hashed passwords, and that is that it's possible to eavesdrop, to overhear the password being given. In the case of the cave of the Forty Thieves, Ali Baba was able to discover the password by hiding nearby when the cave was opened legitimately by one of the Thieves. In the network world, if a password has to be sent over the network, it will be possible for anyone on the network to hear it. It may not help if it is only exposed on a private network, for that only reduces the domain of potential attackers to insiders, and there is plenty of empirical evidence to suggest that many, maybe even most, attacks are mounted by insiders [1].

The obvious approach is just to protect the password in transit, but how? Any protection mechanisms that prevents eavesdroppers from gaining useful information about the password relies on some prior agreement between the endpoints. Today, we have public key algorithms for that (and even they rely on an out-of-band prior agreement--an accepted certification authority), but before the late 1970s, that wasn't really an option.

The idea that people eventually hit upon was, in fact, not to protect the password at all, but to use the password to protect something else. Any password can be transformed into an encryption key by running it through a hash function, extracting the needed bits, and forming the key out of those bits. For example, a DES key consists of 64 bits, of which 8 are parity bits; that is, they are wholly determined by the other 56 bits and ensure that the key has not been accidentally corrupted. (The bits are not sufficient to ensure that the key hasn't been tampered with, though.) You could take a password, run it through MD5, take the first 56 bits, compute the 8 parity bits, and there's your key.

Once you generate the key, you can use it to encrypt some plaintext--say, a timestamp. The resulting ciphertext is then sent to the server, which attempts to decrypt it with the same key, stored in a local database. If the message decrypts to a recent timestamp, the server accepts the user's authentication, and regular business then proceeds.

This approach has a number of flaws. One is that it doesn't scale very well. Each transaction requires a separate authentication exchange, and if a server receives a large volume of requests, it could spend most of its time doing authentication and very little of it at all doing actual work. It also requires each server to have a separate copy of the keys for all its users. We can manage this scalability problem by instituting a key distribution center, or KDC. This server maintains a database containing all the user keys. It also maintains keys for all the servers as well, for a reason we'll see in a moment.

When a user wants to access an application server, he first sends an authentication request to the KDC. This request contains the name of the user and the name of the desired server, so it can be represented as {user, server}. In response, the KDC generates a random value, which we'll call Ks, and encrypts this value twice: once with the user's key, Kuser, to yield {Ks}Kuser, and once with the server's key, Kserver, to yield {Ks}Kserver. Because the value is random and encrypted, {Ks}Kuser and {Ks}Kserver are safe to send over an open network.

Upon receiving the two ciphertexts, the user sends his request to the application server, along with {Ks}Kserver. Now, both the user and the server have a shared value that only they (and the KDC) can know, and they can use that shared secret to perform authentication. Ostensibly, the shared secret could be used as the actual password in a traditional password-based authentication. More likely, though, it will again be used as a key, called a session key, to encrypt something like a timestamp, or a challenge in a challenge-response scheme with the server.

In Kerberos parlance, {Ks}Kserver is called a ticket, since it is submitted by the user to gain access to the server. The other message, {Ks}Kuser, is sometimes also called a ticket, but more often it's vaguely referred to as the user's credentials (with respect to that particular application server).

The introduction of the KDC removes the need for a separate copy of the user key database for each application server, but there's another security issue. The problem arises because the session key is sometimes used not only to authenticate a user, by encrypting a timestamp, for example, but also to provide confidentiality, by encrypting bulk data. A user might encrypt bank account data, for instance, with the session key.

If an attacker can intercept outgoing messages from the user, he can substitute his own name for that of the application server (provided that he is registered with the KDC). The KDC will then send back two encrypted messages, but instead of sending back {Ks}Kuser and {Ks}Kserver, it sends back {Ks}Kuser and {Ks}Kattacker. Now the attacker can obtain the session key. When the user tries to authenticate to the server, the attacker can intercept that message, too, and masquerade as the server, and read any of the bank account data (or whatever sensitive data) the user might send.

In response to this vulnerability, Roger Needham and Michael Schroeder developed what is today called the Needham-Schroeder protocol. (Needham, by the way, also had a hand in developing the use of hashed passwords.) In Needham-Schroeder, the ticket contains not only the session key, but the user's name at all--that is, {Ks, user}Kserver. Similarly, the user's credentials contain the server's name in addition to the session key--that is, {Ks, server}Kuser. In fact, in the original Needham-Schroeder protocol, the ticket, the session key, and the server's name were all together encrypted with the user's key. But there's no need to encrypt the already-encrypted ticket, so in present-day Needham-Schroeder, the ticket is not encrypted along with the rest of the reply.

This enhancement eliminates the forged request problem, since the user can verify from his credentials that the KDC's reply really is constructed for the intended server. However, there is yet another problem. If a user accesses a server multiple times, the KDC generates a new session key each time. This makes sense because the session key is intended to be temporary--you're not supposed to be able to use it for long periods of time. As its name suggests, it is usually associated with a single application session.

However, in our protocol as it currently stands, there's no way to identify a previously used key. If an attacker substitutes an old KDC reply for a new one, the user and server could be tricked into using an old key. Each re-use means that more data is encrypted using the same key--more data for the attacker to try to crack the re-used session key. If the attacker ever succeeds in cracking the session key, it can then masquerade as the user to the server, without requiring any KDC intervention at all. For that reason, Needham-Schroeder also adds a nonce, a one-time value, into the user's request, which now looks like {user, server, nonce}. That nonce is returned back to the user by the KDC in both encrypted parts, so the user can verify that the KDC reply really is in response to the most recent request.

That satisfies the principal security issues with respect to the user, but it leaves an open one with the server: How does the server know that the user's application request to the server is a fresh one, rather than an old one being replayed by an attacker? After all, the server has no way to know what nonce the legitimate user used most recently. It's true that the attacker shouldn't have the old session key, but he may not need it. If the user's transaction with the server paid electronic money to the attacker, the attacker could simply replay the request to get paid the same amount multiple times.

To avoid this problem, the server can initiate a challenge-response session, in which the server creates its own random nonce, encrypts it with the session key, and sends it to the user. In response, the user must decrypt this second nonce with its own copy of the session key, decrement it, re-encrypt it with the session key, and return it to the server. This adds an additional message exchange, which is generally to be avoided, but it does serve to establish that someone with the session key--generally the legitimate user--is the one sending the application request.

That so far is the basic Needham-Schroeder protocol. Not long after it was originally published, Dorothy Denning and Giovanni Sacco pointed out that if the attacker did manage to obtain the session key, he can reuse it as often as he likes, provided he has the associated ticket. Such an attacker could answer any challenge from the server, so he could mimic the legitimate user in every way. Denning and Sacco proposed that the protocol use timestamps instead of nonces. The timestamp is added to the ticket contents, so the server can identify an out-of-date session key. The user can also encrypt the timestamp with the session key, saving a message exchange. The user sends this encrypted timestamp (and other stuff, which we won't get into right now), which is called the authenticator, to the server to establish that this is a fresh request and not a replay.

This modified version of Needham-Schroeder is the basis of today's Kerberos. Kerberos began life as the authentication mechanism for MIT's Project Athena, then the "next-generation distributed computing" environment. Three versions were developed internally by Steve Miller and Cliff Neuman. Version 4 was the first to be released publicly, in 1989. Version 5 was released in 1993, and is the current Internet standard, although it is being revised and clarified as we speak, and a new RFC or two should be issued for it in the near future. Both are used "in the wild," to at least some extent, although we all recommend Version 5 over Version 4. For the purposes of this essay, however, they both use the same basic principles.

Modern Kerberos adds still a few more things to the protocol. There is a workstation identifier, to make it even harder for an attacker to use stolen tickets from a different machine. There is also a validity period encoded into the credentials and tickets, so that each ticket has a specified lifetime--typically 8 or 10 hours--although this can be modified on request, subject to local policy.

Perhaps the most significant addition is the introduction of the ticket-granting service, or TGS. The reason for the TGS is simple: in the protocol described thus far, each time you want to access a new server, you must submit a new request, and you get a new reply from the KDC. Each reply is encrypted with your user key. That means that you must type in your password each time, or else have it (or the key) cached on your workstation. Either way, you're adding a new vulnerability. What's more, each reply constitutes another sample of ciphertext for an on-the-wire attacker, making it easier for him to deduce your key. Finally, having to type in a password over and over again encourages people to choose shorter and simpler passwords, which is bad.

Consequently, Kerberos adds a new level of indirection. The KDC is split up into two closely-related services, the authentication service, or AS, and the TGS. The AS is the analogue of the original KDC, and maintains the user key database. Instead of requesting tickets for application servers directly, you first request a ticket for the TGS. This ticket has a special name; it's called the ticket-granting ticket, or TGT. (It's also often called the initial ticket, for obvious reasons.) This TGT, like any ordinary ticket, has an associated session key. Only at this point do you submit the request to the TGS for a ticket to the application server. The TGS has the application server keys, and it sends you the ticket you really wanted originally, plus the session key for use with the application server. This reply is encrypted not with your user key, but rather the session key associated with the TGT. That way, you don't have to type in your password or have it cached, and the key being used repeatedly is random and not password-derived. Finally, even the TGT has a limited lifetime of 8 or 10 hours, rather than the weeks or months that your user key is supposed to be good for.

It's sort of like when you visit some workplaces. You show your regular ID to get a guest ID for the workplace. Now, when you want to visit various rooms, instead of showing your regular ID repeatedly, which might make it vulnerable to being lost or stolen, you show your guest ID, which is only valid for a short time anyway. If it's stolen, you can get it invalidated and be issued a new one quickly and easily--something you couldn't do with your regular ID.

The AS and TGS together constitute the new KDC. They are logically distinct, but in practice, they are usually implemented as one program. The requests are differentiated by message type. The messages to and from the AS are tagged as KRB_AS_REQ and KRB_AS_REP; those to and from the TGS are tagged as KRB_TGS_REQ and KRB_TGS_REP; and those to and from the application server (naturally) as KRB_AP_REQ and KRB_AP_REP. That way, when the KDC program receives a request, it knows whether to act as the AS or the TGS.

Kerberos is able to withstand a certain amount of intrusion. If a user's workstation is compromised, about the worst that can happen is that the user's key will be exposed. More often, only session keys and tickets are exposed, and these are only good for a limited time. Application servers are a bit more vulnerable. Since there's no person hiding inside the server to type in the key each time a new request comes in, the key is stored on the server host. But if the KDC--with its database of user and server keys--is compromised, the security of Kerberos is totally obliterated, and a complete re-keying of all users and servers is generally required.

There are other attacks as well. The user keys, as mentioned above, are password-derived. Now, we all know that we're supposed to pick passwords that are hard to guess, but in practice, many of us choose poor passwords. (Of course, I'm not talking about you, since you pick good passwords, don't you?) An attacker can easily apply a dictionary attack on the password, in which he processes a given ciphertext with keys derived from words selected from a short list, and examines the resulting plaintexts to see if they came out right. In order to do that, of course, the attacker needs to have a ciphertext.

But the AS automatically hands one out for the asking. The initial message, the KRB_AS_REQ, is the only completely unsecured message in the protocol. It only contains the user's name, the server's name, and the nonce (plus some other things that don't prevent this attack). Anyone at all can generate that, and get a ciphertext whose content is largely known. (The attacker also gets the server ticket, but the key used to encrypt that is random and will only succumb to a brute-force attack on the keyspace.) He can then run a dictionary attack at his leisure. Such an attacker needs no special network placement.

Today's machines are so fast that a dictionary attack on even a few hundred thousand words is easy. There's enough memory to cache a hundred thousand keys, too, so that you don't even need to create new keys on the fly; just take the list of password-derived keys, and go to town on the KRB_AS_REP.

In response to that, Version 5 introduces preauthentication. As its name suggests, a preauthentication message is an attachment to the KRB_AS_REQ that provides the KDC some assurance that the person making the request is actually the user mentioned in the request. The request proper still looks the same; the preauthentication--or preauth, as it's often called for short--is simply appended to it.

Or, rather, not so simply. Version 4 used a bit-goes-here method of specifying the message format, which is typical of RFCs. However, various things like the big-endian/little-endian conflict created problems with this approach. To address this, Version 4 adopted a receiver-makes-right policy. Whoever is sending a message encodes integers, for example, just as they're represented in the local architecture. In order to identify the endianness of the encoding, the sender also sets a bit in the message, called the B bit. The receiver has the responsibility of either taking the integers as they are, or flipping them around, based on the value of the B bit and the receiver's own endianness.

Around the time that Version 5 was being developed, ASN.1, which stands for Abstract Syntax Notation One, was gaining acceptance in the public-key arena. ASN.1 message format specifications look a bit like BNF grammars, so what you specify is a grammar for your messages. This grammar can then be compiled to produce an encoder and decoder for your specific messages, and it's architecture-independent, so that ASN.1 takes care of problems such as endianness automatically.

It seemed like a good idea at the time. So Kerberos Version 5 used ASN.1 to specify its message formats. In this case, the Kerberos messages are defined as a SEQUENCE of message components (some of which are SEQUENCES in their own right). It's possible in ASN.1 to define optional components, and the preauths are specified as OPTIONAL SEQUENCEs.

Unfortunately, ASN.1 can lead to a lot of bloat, unless you are very familiar and experienced with it, and the Version 5 designers were not. As a result, Kerberos messages ballooned substantially, to the point that fragmentation is a growing concern and Kerberos, which was generally run on UDP in the Version 4 days, is shifting over to TCP. ASN.1 can also make it harder for protocol implementors to get a good idea of the overall structure of the message, because of the hierarchical way in which the messages are defined. A byword in Kerberos circles these days is "Friends don't let friends use ASN.1," but it's pretty much too late for Kerberos. It's stuck with ASN.1 for the foreseeable future.

Thus far, Kerberos can scale all right to a moderate number of users and servers. However, as your network grows, it becomes increasingly ungainly to have a single KDC manage all the keys. It also provides a single point of failure, which is ugly. Therefore, Version 4 added the notion of realms, which are Kerberos administrative domains. Each realm has its own KDC, which maintains keys only for those users and servers in its realm. The KDC is now broken up into three logically distinct entities: the AS and TGS, as before, and the remote TGS, which services requests from other realms. (The remote TGS still uses the TGS message formats, though.)

So long as a user only accesses servers in its own realm, nothing is changed. In order to access servers in other realms, however, an extra step is added. First the user contacts the AS in order to authenticate to the TGS. Then he contacts the TGS to authenticate to the remote TGS. Then (finally!), he contacts the remote TGS to authenticate to the end server. In order to make this all work, the two realms--the user's realm and the server's realm--must share a cross-realm key, which is used for just this one purpose. The ticket given to the user by the TGS is encrypted with this key, and the remote TGS uses it to decrypt the ticket and authenticate the user.

This helps Kerberos scale, but only a little. Suppose you have 100 realms, each with 100 users and servers. Instead of having a single KDC with 10,000 keys, you now need 100 KDCs, each with only 100 ordinary keys, but each one also needs 99 cross-realm keys, if you want complete connectivity. That's almost 5,000 extra keys to manage, throughout the entire system. So Kerberos Version 5 added the possibility of transiting through a sequence of realms, so you only need a connected graph of realms, rather than a complete one. For example, the realms could be connected by a realm or a star network, and in that way drastically reduce the number of cross-realm keys needed.

You may be wondering how Kerberos knows which sequence of realms to go through in order to get to the destination. You can specify the sequence explicitly, but by default, Version 5 uses the realm name hierarchy, and realm names are generally based on domain names. In principle, this allows anyone in any realm to authenticate to a server in any other, but in practice, this would require the establishment of top-level realms and KDCs, which would each have to maintain an unreasonably large number of keys. (Hence the development of a public-key tie-in for Kerberos, but that's a matter for another essay.) So this hierarchy is generally used only within large companies, at best.

One last story about Kerberos. It seems that when Kerberos Version 4 was first released, there was a great desire to export it, because there was a certain amount of Kerberized software (software written to use Kerberos for authentication) that people wanted to use abroad--in particular, for various reasons, in Australia. Such software expects the Kerberos libraries to be there, and fails if it isn't. One problem: Kerberos made heavy use of DES, which was classified as munitions at the time. (This has since been relaxed.) The State department told MIT that they couldn't export the software implementations unless they took out not only all the DES calls, but all the comments about DES calls, too, so that--superficially, at least--you couldn't tell where the DES calls were supposed to go.

John Kohl wrote a program called Piranha to take out all the DES calls and annotations, and Doug Rickard commented, "And we are left with nothing but the bones." (And has been regretting that comment ever since.) As a result, exportable Kerberos without the DES calls was named Bones.

You can't stop a security weenie, though. The Kerberos protocol was publishable, even if the code wasn't, so that any reasonably knowledgeable programmer could figure out where all the DES calls were supposed to go. As you might expect, once Bones landed safely on Australian shores, a fellow named Errol Young added all the encryption back to Bones, creating Encrypted Bones, or E-Bones for short. It's functionally very close (although not quite identical, I hear) to ordinary Kerberos.


[1] It's not really possible to be more precise than this. I once attended a workshop on insider attacks at RAND in Santa Monica, and someone--I forget who--threw out the statistic that 80 percent of all attacks are insider jobs. Even putting aside the objection that no one really knew how to define "insider" precisely, there is also the problem that intrusion detection is (even today) a tricky proposition, and our false negative rate is nowhere near low enough for even one digit of precision. The best we have to rely on is anecdotal evidence, and that only shows us that it's a big problem, but not how big.

Copyright (c) 2002 Brian Tung