File Specification

From GNUpdf

Overview

File specifications were introduced in PDF 1.1 and allow a PDF file to refer to the contents of other files.

Two types of file specifications are supported:

Simple File Specifications
This type of file specification contains the location of a given file in some filesystem. It can take the form of a string whose contents follow a filesystem-independent format, or the form of a dictionary.
Full File Specifications
This type of file specification extends the Simple File Specifications with extra information about the specific type of filesystem that manages the file. It always takes the form of a dictionary.
Figure: Types of file specification and their basic objects

Note that a file specification can reference either external files or files embedded in the same PDF file that contains the file specification.

File Specification Strings

System-Independent Filenames

Figure: Structure of a file specification string

A file specification string contains a system-independent name for a file. Each name is composed of a list of components separated by slash characters (called SOLIDUS characters in the ISO 32000 PDF standard). All components except the last one form the path to the file. The last component is the name of the file.

Note that a component may be the empty string, so two or more consecutive separators have the same semantics as a single separator:

(/a/path/to//a/file)

Note also that any occurrence of a slash inside a component must be escaped with a backslash (REVERSE SOLIDUS) so that it is not interpreted as a separator. Since the backslash is itself the escape character in PDF literal strings, the escape sequence \\ is needed to get a literal backslash. So we would write:

(/a/path/with/a/sl\\/ash)
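The splitting rules above can be sketched in Python. This is only an illustration under some assumptions: the input is the string value after PDF literal-string decoding (so an escaped slash appears as a backslash followed by a slash), and the names split_components and path_and_name are hypothetical, not part of any existing library.

```python
def split_components(spec: str) -> list:
    """Split a decoded file specification string into its components.

    An unescaped slash is a separator; a backslash followed by a slash
    stands for a literal slash inside a component.
    """
    components = []
    current = []
    i = 0
    while i < len(spec):
        ch = spec[i]
        if ch == "\\" and i + 1 < len(spec) and spec[i + 1] == "/":
            # Escaped slash: belongs to the component, not a separator.
            current.append("/")
            i += 2
        elif ch == "/":
            # Separator: close the current component (it may be empty).
            components.append("".join(current))
            current = []
            i += 1
        else:
            current.append(ch)
            i += 1
    components.append("".join(current))
    return components


def path_and_name(spec: str) -> tuple:
    """Return (path components, file name).

    Empty components are dropped from the path, since repeated
    separators mean the same as a single one.
    """
    parts = split_components(spec)
    directories = [p for p in parts[:-1] if p != ""]
    return directories, parts[-1]


print(path_and_name("/a/path/to//a/file"))
print(split_components("/a/path/with/a/sl\\/ash"))
```

Note that the leading empty component is what marks a name starting with a separator; path_and_name discards it, so a caller that needs to know whether the name is absolute has to check the first character itself.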

Hexadecimal strings may be used in file specification strings. A hexadecimal string in this context is delimited by the < and > characters, and each pair of hexadecimal digits represents an octet value. So, for example, the following string:

(/a/path/to/f<6f 6f>)

corresponds to the filename

/a/path/to/foo

using the ISO 646 (ASCII) coded character set.
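As a rough illustration of the example above, the following Python sketch expands hexadecimal runs found inside a file specification string. It assumes that runs appear inline exactly as in the example and that the octets are ASCII; the function name expand_hex_runs is hypothetical.

```python
import re

# A run of hexadecimal digits (whitespace allowed) between < and >.
_HEX_RUN = re.compile(r"<([0-9A-Fa-f\s]*)>")


def expand_hex_runs(spec: str) -> str:
    """Replace every <...> hexadecimal run with the characters it encodes."""

    def decode(match):
        # Drop whitespace between digit pairs, then turn pairs into octets.
        digits = "".join(match.group(1).split())
        # bytes.fromhex raises ValueError on an odd number of digits;
        # this sketch does not try to repair such input.
        return bytes.fromhex(digits).decode("ascii")

    return _HEX_RUN.sub(decode, spec)


print(expand_hex_runs("/a/path/to/f<6f 6f>"))
```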

Absolute and Relative Filenames

File specification strings can be absolute or relative names:

  • If the first component of a file specification string is the empty string (i.e. the first character of the string is a slash separator) then the filename denoted by the string is absolute.
  • If the first component of a file specification string is not the empty string (i.e. the first character of the string is not a slash separator) then the filename denoted by the string is relative.

A PDF consumer application should translate system-independent filenames to system-dependent filenames before accessing the files. The details of the translation depend on the specific system on which the application runs and are explained in the following sections.

Conversion to Unix filenames

The generation of Unix filenames (including filenames in the GNU Operating System) is straightforward:

  • Separators are translated as-is (components in Unix filenames are separated by slashes).
  • Components are translated as-is.
  • Absolute paths are translated to start from the root directory (/) of the root filesystem.
  • Relative paths are translated to absolute paths assuming that the . directory is the working directory of the PDF consumer application.

So, for example, the file specification

(/etc/passwd)

is translated to

/etc/passwd

On the other hand, the relative file specification

(bar/baz.lst)

may be translated to

/home/jemarch/docs/bar/baz.lst

assuming that the working directory of the PDF consumer application is /home/jemarch/docs/.
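The translation to Unix filenames can be sketched as follows. This is a minimal illustration, not a definitive implementation: the function name to_unix_filename is hypothetical, the working directory is passed explicitly so that the result does not depend on where the example is run, and escaped slashes inside components are not handled.

```python
import posixpath


def to_unix_filename(spec: str, working_dir: str) -> str:
    """Translate a system-independent filename to a Unix filename.

    Separators and components are kept as-is. A name starting with a
    slash is absolute; anything else is resolved against working_dir.
    """
    if spec.startswith("/"):
        # Absolute: already rooted at / on the root filesystem.
        return spec
    # Relative: make it absolute using the given working directory.
    return posixpath.join(working_dir, spec)


print(to_unix_filename("/etc/passwd", "/home/jemarch/docs/"))
print(to_unix_filename("bar/baz.lst", "/home/jemarch/docs/"))
```

A real consumer application would probably obtain the working directory from os.getcwd() instead of taking it as a parameter.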

Conversion to MS-Windows and DOS filenames

Conversion to MACOS filenames

The following rules apply when converting system-independent file specification strings to MACOS filenames:

  • Separators are translated to : characters.
  • Components are translated as-is.
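The two rules above can be sketched in Python as shown below. The sketch applies only those two rules: it says nothing about how a leading separator (an absolute name) should be mapped, and it does not handle a separator that directly follows an escaped backslash. The function name to_macos_filename is hypothetical.

```python
import re

# A slash that is not preceded by a backslash acts as a separator.
_SEPARATOR = re.compile(r"(?<!\\)/")


def to_macos_filename(spec: str) -> str:
    """Translate separators to ':' and leave the components untouched."""
    return _SEPARATOR.sub(":", spec)


print(to_macos_filename("a/path/to/file"))
```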

File Specification Dictionaries