Directories and Security

Directories

Directories are collections of files. Their primary function is to impose a naming system on files and organize them relative to each other. Directories provide a mechanism for collecting the names of files together so that a user can find the file(s) they're interested in. They are an important piece of metadata.

Flat Directories

The simplest kind of directory is a flat list of filenames and the information needed to find the respective file contents. These are rarely used today, although they still do exist. The Palm Pilot has a flat file name space. IBM MVS, which will never completely die, has a flat file name space.

Flat file systems are not often used because they provide no inherent organization to the files and are difficult to make efficient for large collections of files.

Name space collisions have to be avoided in a flat space, so generally some naming convention is followed to prevent them. A common one is to prepend filenames with the user's name. For example, a collection of my files might have names like:


     
     DISK1.FABER.HOME.PROFILE
     DISK1.FABER.PROJECT1.GRADES
     
     
Problems with this include enforcing the conventions (what if other users choose $ to separate parts of the file name?) and the inefficiency of holding all the system's files in one big list. Consider scanning the system directory to print all the files that I own. Either the whole table will have to be searched, or it will have to be kept sorted. Keeping a large list sorted adds overhead, and scanning a large table linearly is not efficient.
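The trade-off between scanning the whole table and keeping it sorted can be sketched as follows. This is a minimal illustration, not how any particular system stores its directory: the table layout, the extra user names, the location numbers, and both function names are assumptions made up for the example.

```python
import bisect

# Hypothetical flat directory: one table of (name, location) pairs for the
# whole system. Only the FABER names come from the text; the rest is invented.
flat_directory = [
    ("DISK1.FABER.HOME.PROFILE", 1041),
    ("DISK1.JONES.HOME.PROFILE", 2200),
    ("DISK1.FABER.PROJECT1.GRADES", 1388),
    ("DISK1.SMITH.NOTES", 3017),
]

def files_with_prefix_linear(directory, prefix):
    """Examine every entry; the cost grows with the total number of files."""
    return [name for name, _ in directory if name.startswith(prefix)]

def files_with_prefix_sorted(sorted_names, prefix):
    """Binary-search to the first candidate, then read until the prefix stops matching."""
    index = bisect.bisect_left(sorted_names, prefix)
    matches = []
    while index < len(sorted_names) and sorted_names[index].startswith(prefix):
        matches.append(sorted_names[index])
        index += 1
    return matches

# Keeping this list sorted is the overhead paid on every create and delete.
sorted_names = sorted(name for name, _ in flat_directory)

print(files_with_prefix_linear(flat_directory, "DISK1.FABER."))
print(files_with_prefix_sorted(sorted_names, "DISK1.FABER."))
```

Both calls find the same two files. The sorted version touches only the matching entries plus one, but every insertion into the table now has to preserve the ordering.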

Also, updating such a centralized structure requires synchronization: all file creations and deletions have to enforce mutual exclusion on the table, or we have to introduce a very fine-grained locking mechanism.

Basically, a single directory doesn't scale easily to many users, both in terms of technical operation and user behavior.

Hierarchical Directories

A natural organization of files is a hierarchy. That is, files can be seen as belonging to related classes, and each class has a directory.

figure
This is implemented as a tree of directories where each entry in a directory describes either a file or another directory. If the hierarchy is chosen carefully, the result is many small directories. Scanning each one is a reasonable amount of work, and synchronization is maintained per directory. Because separate tasks can be confined to separate directories, contention can be made rare.
figure

Besides the efficiency issues, hierarchies are a natural way to organize many systems of data. The desktop metaphor of files in folders and cabinets underscores this (although the hierarchy afforded by hierarchical file systems is richer because there can be nearly arbitrary levels of nesting).

Given that we want to impose such a structure on our file names, we have to describe a syntax to find a file in the tree. This is done by giving a path through the directory tree. The strings that represent such paths are called pathnames. A character is chosen as the path separator. A path is then the list of directories traversed in order to reach the file. Using / as a separator, a pathname for the file labelled file in the above diagram is /etc/ast/fn.
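Pathname lookup can be sketched as a walk down the tree, one component at a time. The nested-dictionary representation, the `resolve` function, and the `usr` directory are assumptions for illustration; a real file system stores directories on disk and checks permissions at every step.

```python
# Hypothetical in-memory tree: a dict is a directory, a string stands in for
# a file's contents. The names etc, ast, jim and fn follow the example path.
root = {
    "etc": {
        "ast": {"fn": "contents of fn"},
        "jim": {},
    },
    "usr": {},
}

def resolve(tree, pathname, separator="/"):
    """Follow an absolute pathname from the root, one directory per component."""
    if not pathname.startswith(separator):
        raise ValueError("expected an absolute pathname")
    node = tree
    for component in pathname.split(separator):
        if component == "":
            continue  # produced by the leading separator or a doubled one
        if not isinstance(node, dict):
            raise NotADirectoryError(pathname)  # tried to descend into a file
        if component not in node:
            raise FileNotFoundError(pathname)
        node = node[component]
    return node

print(resolve(root, "/etc/ast/fn"))
```

Each loop iteration is a search of one small directory, which is the efficiency argument for hierarchies made above.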

It's inconvenient to name files by their entire pathname all the time. After all, we put related files in the same directory so they'd be close together, and specifying the long pathname hides that. One solution is to add the concept of a current directory to the system and allow paths to be specified relative to it. Paths that begin with the path separator are absolute pathnames; paths that do not are relative and have the current directory prepended.

To facilitate relative naming, many filesystems have special names that refer to the current directory and the parent directory. These are often . and .. , respectively. An example relative pathname is ../test/halt, which means the file halt in the directory test, which is a subdirectory of the parent of the current directory.
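Turning a relative pathname into an absolute one can be sketched with Python's standard `posixpath` module. The function name `to_absolute` and the current directory `/home/faber/src` are assumptions for the example. Note that `normpath` collapses `.` and `..` purely as text, so the result can differ from what the system would compute if a component were a symbolic link.

```python
import posixpath

def to_absolute(current_directory, pathname):
    """Prepend the current directory to relative paths, then collapse . and .. ."""
    if not pathname.startswith("/"):
        pathname = posixpath.join(current_directory, pathname)
    return posixpath.normpath(pathname)

# Hypothetical current directory, chosen only for illustration.
print(to_absolute("/home/faber/src", "../test/halt"))
print(to_absolute("/home/faber/src", "./notes"))
print(to_absolute("/home/faber/src", "/etc/ast/fn"))
```

With that current directory, `../test/halt` names `/home/faber/test/halt`: the `..` steps up to `/home/faber` before descending into `test`.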

Links

Directories impose a naming structure on files, and in some systems offer the opportunity to give a file multiple names. If the information about a file (its attributes and OS data) is not stored directly in a directory entry, a file may be pointed to by entries in several directories. For example, the file in the directory tree above can be named as /etc/ast/fn or /etc/jim/f2. Such multiple naming is called linking.

There are two major forms of links: hard and soft. A hard link is a direct link from the directory entry to the internal file data. All hard links to a file are equivalent, and in file systems that support them, a file is not deleted until all of its hard links have been removed. In general hard links are restricted to parts of the file system that share internal information. To preserve the tree structure of hierarchical file systems, they generally can't link to directories.
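One common way to make all hard links equivalent is to keep the file data in a separate record with a link count, and have directory entries hold only a reference to that record. The sketch below models this in memory. The class names, the single name table keyed by full pathname, and the method names are simplifying assumptions, not a description of any specific file system.

```python
class FileRecord:
    """Internal file data, stored apart from any directory entry."""
    def __init__(self, data):
        self.data = data
        self.link_count = 0

class TinyFileSystem:
    """Hypothetical model: names map to record numbers, records count their links."""
    def __init__(self):
        self.records = {}   # record number -> FileRecord
        self.names = {}     # pathname -> record number
        self.next_number = 1

    def _add_entry(self, name, number):
        self.names[name] = number
        self.records[number].link_count += 1

    def create(self, name, data):
        number = self.next_number
        self.next_number += 1
        self.records[number] = FileRecord(data)
        self._add_entry(name, number)

    def link(self, existing_name, new_name):
        # The new name refers to the same record; neither name is "the original".
        self._add_entry(new_name, self.names[existing_name])

    def unlink(self, name):
        number = self.names.pop(name)
        record = self.records[number]
        record.link_count -= 1
        if record.link_count == 0:
            del self.records[number]  # last link gone, so the data goes too

    def read(self, name):
        return self.records[self.names[name]].data

fs = TinyFileSystem()
fs.create("/etc/ast/fn", "shared contents")
fs.link("/etc/ast/fn", "/etc/jim/f2")
fs.unlink("/etc/ast/fn")
survivor = fs.read("/etc/jim/f2")  # the data outlives the first name
```

Removing the first name leaves the file reachable through the second, and the record is reclaimed only when the count reaches zero.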

Soft links (often also called symbolic links) are a path translation. A soft link holds a pathname that points to the file (or directory) on which to operate. These paths can be absolute or relative. Because they are a visible pathname translation, soft links are often allowed to point to directories: programs that rely on the hierarchical nature of the file system (like system utilities) can detect and ignore them. Because they are a translation, soft links can reach any part of the file system they can address, but because they are not closely tied to the internal structure of the file system, they may not be updated when a file is deleted or moved. A symbolic link can therefore point to a file that no longer exists. This is called the dangling pointer problem, by analogy with the same problem involving freed memory in a program.
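A dangling link is easy to produce on a system with POSIX-style symbolic links. The sketch below uses Python's standard `os` and `tempfile` modules. The file names and the function name are made up for the example, and it assumes the platform lets an ordinary user create symbolic links.

```python
import os
import tempfile

def dangling_link_demo():
    """Create a file and a soft link to it, delete the file, and inspect the link."""
    with tempfile.TemporaryDirectory() as workdir:
        target = os.path.join(workdir, "target.txt")  # hypothetical names
        link = os.path.join(workdir, "alias")
        with open(target, "w") as handle:
            handle.write("hello")
        os.symlink(target, link)

        resolved_before = os.path.exists(link)  # exists() follows the link
        os.remove(target)                       # the link itself is not updated

        return {
            "resolved_before_delete": resolved_before,
            "link_entry_still_present": os.path.lexists(link),  # does not follow
            "resolves_after_delete": os.path.exists(link),
            "stored_path_unchanged": os.readlink(link) == target,
        }

result = dangling_link_demo()
print(result)
```

After the target is removed, the link's directory entry and stored pathname are intact, but following it fails. Utilities that walk the tree can tell the two cases apart with `lexists`-style calls that do not follow the link.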

Directory Operations

Like files, directories have well defined operations:

Create: Allocate space for a new directory and create the special directories in it.

Delete: Remove a directory. Most OSes require the directory to be empty.

Open: Analogous to file open.

Read: Get the information about one or more files.

Write: Change the information about one or more files.

Metadata manipulation: Change the permissions or some other field associated with this directory.

Link: Add a link to an existing file.

Unlink: Remove a link to an existing file (if this is the last link and the file is closed, this usually implies removal of the file).

Rename: Really covered by write, as is file renaming.
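As a rough illustration, most of the operations above have direct counterparts in Python's standard `os` module, which wraps the POSIX calls. The sketch below exercises them in a temporary directory. The file names and the function name are hypothetical, and it assumes a file system that supports hard links.

```python
import os
import tempfile

def directory_operations_demo():
    """Run each directory operation once and record a few observations."""
    observations = []
    with tempfile.TemporaryDirectory() as workdir:
        project = os.path.join(workdir, "project")
        os.mkdir(project)                                # Create

        notes = os.path.join(project, "notes")
        with open(notes, "w") as handle:                 # creating a file writes a new entry
            handle.write("x")

        alias = os.path.join(project, "alias")
        os.link(notes, alias)                            # Link

        with os.scandir(project) as entries:             # Open and Read
            observations.append(sorted(entry.name for entry in entries))

        renamed = os.path.join(project, "notes.old")
        os.rename(notes, renamed)                        # Rename (a directory write)
        os.chmod(project, 0o700)                         # Metadata manipulation

        observations.append(os.stat(alias).st_nlink)     # both names share one file

        os.unlink(alias)                                 # Unlink
        os.unlink(renamed)                               # last link, file is removed
        os.rmdir(project)                                # Delete (directory must be empty)
        observations.append(os.path.exists(project))
    return observations

observations = directory_operations_demo()
print(observations)
```

Calling `os.rmdir` before the two `os.unlink` calls would raise an error on most systems, which is the "directory must be empty" rule from the list.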

Other Directory Systems

Although hierarchical systems are by far the most common, there are some other interesting ways to think about file naming:

Naming Systems

The mapping from pathname to file is just one example of a naming system. A naming system maps a string to a resource (or maybe just to another string). Being able to name a resource is the first step in being able to manipulate it. Some other interesting naming systems are:

Most successful systems have a significant naming component. The decision of what elements in a system to name, and how to name them is significant.


Converted from groff by Ted Faber