The File Name Space: Directories and Path Names


The file name space seen by a client of OS is a conglomeration of the local file name spaces of a set of machines. Roughly speaking, mapping a name to a file proceeds by using part of the name to determine a machine and then using the rest of the name to determine an individual file on that machine. The actual semantics of names is fairly complicated, but was designed to be able to handle several important cases:

1. Naming a specific file located on a specific machine.
2. Naming a replicated (read-only) file located on any of a set of machines.
3. Naming a generic file relative to the real user name of the process.

The name of a file or related entity in the name space is called a path name, as given by the nonterminal <pathname> in the following BNF grammar:

<pathname> ::= <machine> <abspath> | <abspath> | <relpath>
<machine>  ::= # <element> | <machine> : <element>
<element>  ::= $u | <mname>
<mname>    ::= <one or more characters, excluding # and :>
<abspath>  ::= <slash> <relpath>
<relpath>  ::= <empty> | <relpath1> | <relpath1> <slash>
<relpath1> ::= <pname> | <relpath1> <slash> <pname>
<slash>    ::= / | <slash> /
<pname>    ::= <one or more characters, excluding / and null>
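
To make the grammar concrete, here is a minimal sketch in Python (not part of OS) that splits a path name into its <machine> elements and its <pname>s. The names ParsedPath and parse_pathname are hypothetical. The sketch assumes that the <machine> part ends at the first slash, which the grammar implies but does not state, and it follows the grammar strictly, so an empty <element> is rejected.

```python
from typing import List, NamedTuple, Optional


class ParsedPath(NamedTuple):
    machine: Optional[List[str]]  # <element>s after '#', or None if no <machine>
    absolute: bool                # True if the path part starts with a <slash>
    pnames: List[str]             # the <pname>s, in order


def parse_pathname(name: str) -> ParsedPath:
    """Split a path name following the grammar above (a sketch, not the OS parser)."""
    machine = None
    rest = name
    if name.startswith("#"):
        # Assumption: the <machine> part ends at the first '/', because the
        # grammar requires a <machine> to be followed by an <abspath>.
        slash = name.find("/")
        if slash < 0:
            raise ValueError("a <machine> must be followed by an <abspath>")
        machine = name[1:slash].split(":")
        if any(e == "" or "#" in e for e in machine):
            raise ValueError("each <element> must be $u or a non-empty <mname>")
        rest = name[slash:]
    if "\0" in rest:
        raise ValueError("a <pname> may not contain a null character")
    absolute = rest.startswith("/")
    # A <slash> is one or more '/' characters, so empty pieces are dropped.
    pnames = [p for p in rest.split("/") if p]
    return ParsedPath(machine, absolute, pnames)
```

For instance, parse_pathname("#jumbo/etc/passwd") yields the machine list ["jumbo"], marks the path as absolute, and returns the <pname>s ["etc", "passwd"]; repeated and trailing slashes, as in "a//b/", do not produce extra <pname>s.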

First we'll look at how a path name (<abspath> or <relpath>) is looked up in the local name space, then we'll look at the extension to remote files (<machine>).

The file name space local to each machine is defined by a hierarchy of directories. A directory is a set of entries each consisting of a <pname> and a reference to another directory, a file (regular or device), or a symbolic link. There is a distinguished root directory for each machine; the path name of the root directory itself is a single slash character: /. An <abspath>, or absolute path name, is translated by looking up each <pname> in sequence from left to right, starting at the root directory.

Since the name space is hierarchical, it makes sense to consider any directory as the root of a smaller hierarchy and to look up a path name relative to that directory. A <relpath>, or relative path name, is translated by looking up each <pname> in turn, starting at a specified base directory. Every procedure in OS that accepts a path name parameter also accepts an optional directory handle parameter; if the path name is a <relpath>, it is looked up relative to the directory specified by that handle. (If the path name is a <relpath> but the directory handle parameter is defaulted, then the working directory of the process is used; see Section 2.8.)

Every directory contains an entry .. (dot-dot) referring to its parent directory and an entry . (dot) referring to itself. For references to local files, the root directory is its own parent, so the two path names /../ and / mean the same thing. (The entry .. in the root of a remote file system works differently, as described later in this section.)
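
The local lookup rules can be sketched with a toy in-memory model. This is an illustration only: the Directory class and the lookup function are hypothetical names, files are represented by arbitrary non-directory values, symbolic links are ignored, and the base parameter stands in for the optional directory handle.

```python
class Directory:
    """A directory: a mapping from <pname> to a directory or a file."""

    def __init__(self, parent=None):
        # Every directory has '.' and '..'; the root is its own parent.
        self.entries = {".": self, "..": parent if parent is not None else self}

    def mkdir(self, pname):
        child = Directory(parent=self)
        self.entries[pname] = child
        return child


def lookup(root, working_dir, pathname, base=None):
    """Translate a local path name to a directory or file object."""
    if pathname.startswith("/"):
        current = root              # <abspath>: start at the root directory
    elif base is not None:
        current = base              # <relpath> with an explicit directory handle
    else:
        current = working_dir       # <relpath> with the handle defaulted
    # Look up each <pname> from left to right.
    for pname in (p for p in pathname.split("/") if p):
        if not isinstance(current, Directory):
            raise NotADirectoryError(pname)
        current = current.entries[pname]  # KeyError if there is no such entry
    return current
```

With a root containing a directory etc that holds a file passwd, lookup(root, root, "/etc/passwd") and lookup(root, root, "passwd", base=etc) return the same object, and lookup(root, root, "/../") returns the root itself.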

The directory-manipulation procedures maintain the invariant that there is a unique <abspath> (not containing . or ..) leading to each directory or symbolic link. This invariant does not apply to files: the HardLink procedure creates an additional path name referring to a given file.

Hard links can't be used for directories and, as explained in Section 2.7, they can't span logical volumes. To alleviate these limitations, an additional form of aliasing exists: the symbolic link. A symbolic link is an entity that appears in the name space and whose value is the character string representing another path name. During the translation of a path name, when a symbolic link is encountered as the value of a <pname>, the value of the symbolic link is normally substituted for the path name up to that point and translation continues. The SymLink procedure creates a symbolic link.
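
The substitution step can be sketched as a rewrite of the path name string. The function expand_symlinks and its symlinks table are hypothetical. The sketch makes three assumptions that the text above does not state: a relative link value is interpreted relative to the directory containing the link, a link whose value begins with # ends the local translation, and the number of substitutions is bounded (by max_expansions) to stop on cyclic links. It does not interpret . or .. entries.

```python
def expand_symlinks(pathname, symlinks, max_expansions=8):
    """Rewrite an absolute path name by substituting symbolic-link values.

    `symlinks` maps the absolute path name of each link to its value.
    """
    pending = [p for p in pathname.split("/") if p]  # <pname>s still to look up
    done = []                                        # <pname>s already translated
    expansions = 0
    while pending:
        pname = pending.pop(0)
        prefix = "/" + "/".join(done + [pname])
        if prefix not in symlinks:
            done.append(pname)
            continue
        expansions += 1
        if expansions > max_expansions:  # assumed limit, not from the text
            raise RuntimeError("too many symbolic links")
        value = symlinks[prefix]
        if value.startswith("#"):
            # Remote value: hand back a remote path name with the rest appended.
            return "/".join([value.rstrip("/")] + pending) if pending else value
        if value.startswith("/"):
            done = []  # an absolute value replaces everything translated so far
        # A relative value keeps `done`, the directory containing the link.
        pending = [p for p in value.split("/") if p] + pending
    return "/" + "/".join(done)
```

For example, with a link /usr/spool whose value is /var/spool, the path name /usr/spool/mail is rewritten to /var/spool/mail before translation continues.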

Now it is time to look at how remote path names are translated. A path name beginning with <machine> (always flagged by the # character) means that the following <abspath> is to be interpreted relative to the root directory of the specified machine. Such a path name could be passed directly to an OS procedure, or could arise as the value of a symbolic link encountered in the translation of a local path name.

The translation of a <machine> in a path name to an actual machine (technically, an instance of a remote file service) proceeds as follows:

1. An <element> of the form `$u' is replaced by the real user name of the process requesting the translation.
2. If the <machine> consists of more than one <mname>, it is replaced with the value found by looking up the sequence of <mname>s as a path in the Interim Name Service tree (see ns(8) and the NS interface); the result should be a single NS.Label conforming to the syntax of <mname>.
3. The remaining <mname> is looked up in the Interim Name Service; it is expected to have an attribute `type' whose value is `instance', `host', or `nameset'. In either of the first two cases, the <mname> is expected to have an attribute `address' that determines an instance of a remote file service. If the `type' is `nameset', then the <mname> is expected to have an attribute `members' whose value is a list of <mname>s and an attribute `next' whose value is a single <mname>. This step is repeated on each element in the `members' list in random order; if none lead to an available server, this step is repeated using the value of the `next' attribute.
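
The three steps can be sketched against a mock name service. Nothing here is the real NS interface: ns_paths (a table from a sequence of <mname>s to a single label), ns_entries (a table from an <mname> to its attributes), and the is_available predicate are hypothetical stand-ins. The sketch also assumes that a nameset without a `next' attribute ends the search, that an unavailable `host' or `instance' yields no result, and that namesets do not refer to each other cyclically.

```python
import random


def resolve_machine(elements, user, ns_paths, ns_entries, is_available, rng=random):
    """Map the <element>s of a <machine> to a file service address, or None."""
    # Step 1: replace $u by the real user name of the requesting process.
    mnames = [user if e == "$u" else e for e in elements]
    # Step 2: several <mname>s are looked up as a path, giving a single label.
    mname = ns_paths[tuple(mnames)] if len(mnames) > 1 else mnames[0]
    # Step 3: look up the remaining <mname>.
    return _resolve_mname(mname, ns_entries, is_available, rng)


def _resolve_mname(mname, ns_entries, is_available, rng):
    while mname is not None:
        entry = ns_entries[mname]
        kind = entry["type"]
        if kind in ("instance", "host"):
            address = entry["address"]
            return address if is_available(address) else None
        if kind != "nameset":
            raise ValueError("unexpected type: %r" % (kind,))
        members = list(entry["members"])
        rng.shuffle(members)  # try the members in random order
        for member in members:
            address = _resolve_mname(member, ns_entries, is_available, rng)
            if address is not None:
                return address
        # No member led to an available server: continue with `next'.
        mname = entry.get("next")  # assumption: a missing `next' ends the search
    return None
```

Trying the members in random order spreads clients across the replicas, while the `next' attribute gives a fallback set that is consulted only when every member of the first set is unavailable.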

If the last <mname> encountered when translating a path name is a `nameset', then the path name is normally considered to refer to a read-only server set, causing operations that would involve modifying the name space or the data or attributes of a file to be disallowed (and to raise ProtectionViolationEC). This is because the main purpose of server sets is to increase the availability of immutable files. There are several qualifications to this rule:

- Whether a given server set is considered read-only is actually determined by the value of the additional attribute `readOnly' of an <mname> whose `type' is `nameset'. This attribute should have a value of `true' or `false'. Currently all server sets at SRC are read-only.
- Once OS.OpenDir has been used to open a directory through a path name involving a read-only server set, it is permissible to perform file system modifications using that directory handle. This is because the directory handle determines a specific machine.
- Currently, modifications are disallowed if any server set (not just the last) encountered during a path name translation is read-only. This is a bug.

To facilitate the illusion of a single tree spanning several machines, the parent of the root of the remote file system on a machine, say m, appears to be the local directory /remote/m if it exists, or the root itself otherwise. By convention, each subdirectory of /remote, such as m, contains a single entry named r that is a symbolic link containing #m/. The prefix /remote/ is actually a configuration parameter (see Appendix A.6).
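
As a small illustration of this convention, the following sketch computes what the entry .. in the root of a remote file system appears to name, and the conventional link for a machine. Both function names and the local_dir_exists predicate are hypothetical; the prefix parameter stands in for the configuration parameter, and "the root itself" is taken to mean the remote root #m/.

```python
def parent_of_remote_root(machine, local_dir_exists, prefix="/remote/"):
    """Return the path name that '..' in the root of #machine/ appears to name."""
    candidate = prefix + machine
    if local_dir_exists(candidate):
        return candidate          # the local directory /remote/m exists
    return "#" + machine + "/"    # otherwise the remote root is its own parent


def conventional_link(machine, prefix="/remote/"):
    """Return (path name of the link r, value of the link) for a machine."""
    return prefix + machine + "/r", "#" + machine + "/"
```

With this arrangement, following the link /remote/m/r leads to the remote root, and the entry .. there leads back to /remote/m, so the remote tree appears grafted into the local one.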

Here are some examples corresponding to the concepts enumerated at the beginning of this section:

1. A specific file on a specific machine: #jumbo/etc/passwd
2. A replicated (read-only) file: #server/etc/passwd
3. A generic file: #$u:homeServer:/user/spool/mail

Only regular files can be accessed remotely; devices cannot.

Ultrix note: Access via the OS interface to remote files is not implemented. (This functionality is available, however, through the RFS/RFSClient interface.)

Paul McJones
8/28/1997