CPGuide - File Names

Reprint Courtesy of International Business Machines Corporation, © International Business Machines Corporation

File names are the identifiers used by the file system to uniquely identify files on a disk. All file systems have specific rules for constructing names of file objects. Different file systems can have different rules for naming file objects.

The OS/2 FAT file system supports the DOS naming conventions. The OS/2 High Performance File System (HPFS) supports a superset of the DOS naming conventions, allowing for long file names and characters illegal under DOS. Although different file systems can have different rules for naming file objects, all OS/2 file systems require that full path names consist of directory and file names separated by backslashes (\).

The OS/2 operating system views path names as ASCII strings and does not restrict file systems to the DOS file name format. Compatibility with existing DOS applications requires that all installable file systems support a superset of the 8.3 file name format used in the FAT file system.

The following topics are related to the information in this chapter:

  • File Systems
  • File Management
  • Extended Attributes
  • Device I/O

File-Naming Conventions

File name conventions are the rules used to form file names in a given file system. Although each installable file system (IFS) can have specific rules about how individual components in a directory or file name are formed, all file systems follow the same general conventions for combining components. For example, although the FAT file system requires that file and directory names have the 8.3 file name format, and HPFS supports names of up to 255 characters long, both file systems use the backslash (\) character to separate directory names and the file name when forming a path.

When creating names for directories and files, or when processing names supplied by the user, an application must follow these general rules:

  • Process a path as a NULL-terminated string. An application can determine maximum length for a path by using DosQuerySysInfo.
  • Use any character in the current code page for a name, but do not use a path separator, a character in the range 0 through 31, or any character explicitly prohibited by the file system.
The following characters are reserved by the operating system. Do not use them in directory or file names.
   <   >   :   "   /   \   |
Although a name can contain characters in the extended character set (128 - 255), an application must be able to switch code pages if necessary to access the corresponding file.
  • Compare names without regard to case. Names such as "ABC", "Abc", and "abc" are considered to be the same.
  • Use the backslash (\) or the forward slash (/) to separate components in a path. No other character is accepted as a path separator.
  • Use the dot (.) as a directory component in a path to represent the current directory.
  • Use two dots (..) as a directory component in a path to represent the parent of the current directory.
  • Use a period (.) to separate components in a directory name or file name. Unless explicitly defined by a file system, no restrictions are placed on the number of components in a name.
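
The general rules above can be sketched in portable C. The function names is_legal_component and same_name are hypothetical helpers, not OS/2 functions. The sketch checks only the characters named above (0 through 31 and the reserved set) and folds case for ASCII letters only; a real file system may prohibit further characters and folds case according to the current code page.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Returns 1 if name is usable as a single directory or file name
 * component under the general rules, 0 otherwise.
 * Assumption: an empty name is treated as not usable. */
int is_legal_component(const char *name)
{
    static const char reserved[] = "<>:\"/\\|";
    const unsigned char *p;

    if (name == NULL || *name == '\0')
        return 0;
    for (p = (const unsigned char *)name; *p != '\0'; p++) {
        if (*p < 32)                        /* characters 0 through 31 */
            return 0;
        if (strchr(reserved, *p) != NULL)   /* reserved by the operating system */
            return 0;
    }
    return 1;
}

/* Compares two names without regard to case (ASCII letters only).
 * Returns 1 if they are considered the same name, 0 otherwise. */
int same_name(const char *a, const char *b)
{
    while (*a != '\0' && *b != '\0') {
        if (toupper((unsigned char)*a) != toupper((unsigned char)*b))
            return 0;
        a++;
        b++;
    }
    return *a == *b;   /* both strings must end together */
}
```

With these helpers, same_name("ABC", "abc") reports a match, and is_legal_component rejects a name such as "A:B" because the colon is reserved.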

File Names in the FAT File System

Valid file names in the OS/2 FAT file system have the following form:

[drive:][directory\]filename[.extension]

The drive parameter must name an existing drive and can be any letter from A through Z. The drive letter must be followed by a colon (:).

The directory parameter specifies the directory that contains the file's directory entry. The directory name must be followed by a backslash (\) to separate it from the file name. If the specified directory is not the current directory, directory must include the names of all the directories in the path, separated by backslashes. The root directory is specified by using a backslash at the beginning of the name.

For example, if the directory ABC is in the directory SAMPLE, and SAMPLE is in the root directory, the directory specification is:

\SAMPLE\ABC

A directory name can also have an extension, which is any combination of up to three letters, digits, or special characters, preceded by a period (.).

The filename and extension parameters specify the file.

FAT File-Naming Rules

For file objects managed by the FAT file system, the following rules apply:

  • File names are limited to 8 characters before and three characters after a single dot. This is referred to as the 8.3 file name format.
The 8 characters before the dot are blank-filled. Embedded blanks are significant; trailing blanks and blanks immediately preceding the dot are not. Trailing blanks are truncated.
For example, "FILE.A" is stored as "FILE    .A  ". The operating system treats "FILE.A" and "FILE    .A  " as the same file. Likewise, "FILE.TXT " and "FILE.TXT" refer to the same file.
Blanks elsewhere in the name are significant; "F I L E.TXT" is not the same as "FILE.TXT".
  • Names are not case sensitive. This means that "FILE.TXT" and "file.txt" refer to the same file. Lowercase and uppercase characters are folded together for name comparison purposes.
  • Names returned by file system functions are in uppercase. This means that if "file.txt" is created, DosFindFirst returns "FILE.TXT".
  • Directory and file names can be any combination of up to eight letters, digits, or the following special characters:
   $   %   '   -   _   @   {   }   ~   `   !   #   (   )
File extensions can be any combination of up to three letters, digits, or special characters, preceded by a period.
  • Invalid characters for directory names, file names, and volume labels are:
    • the range 0 - 1Fh
    • and the characters:
   <   >   |   +   =   :   ;   ,   .   "   /   \   [   ]
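
As an illustration of the FAT rules above, the following portable C sketch checks whether a single name (without drive or directory) fits the 8.3 file name format. The function is_valid_fat_name is a hypothetical helper, not an OS/2 function. It assumes that any character outside the invalid set is acceptable, including blanks and extended characters, and it does not truncate trailing blanks before checking.

```c
#include <stddef.h>
#include <string.h>

/* Returns 1 if c may appear in a FAT name or extension:
 * it must be above 1Fh and must not be in the invalid set. */
static int fat_char_ok(unsigned char c)
{
    static const char invalid[] = "<>|+=:;,.\"/\\[]";
    return c > 0x1F && strchr(invalid, c) == NULL;
}

/* Returns 1 if name has the 8.3 form: 1 to 8 valid characters,
 * optionally followed by a dot and 0 to 3 valid characters. */
int is_valid_fat_name(const char *name)
{
    const char *dot;
    size_t base_len, ext_len, i;

    if (name == NULL)
        return 0;
    dot = strchr(name, '.');                 /* first dot splits name and extension */
    base_len = dot ? (size_t)(dot - name) : strlen(name);
    ext_len = dot ? strlen(dot + 1) : 0;

    if (base_len < 1 || base_len > 8 || ext_len > 3)
        return 0;
    for (i = 0; i < base_len; i++)
        if (!fat_char_ok((unsigned char)name[i]))
            return 0;
    for (i = 0; i < ext_len; i++)            /* a second dot is rejected here */
        if (!fat_char_ok((unsigned char)dot[1 + i]))
            return 0;
    return 1;
}
```

Under these assumptions "FILE.TXT" and "README" are accepted, while "LONGFILENAME.TXT", "FILE.TEXT", and "A+B.TXT" are rejected.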

File Names in the High Performance File System

In HPFS, file names can be up to 255 characters long (one must be a terminating NULL, "\0"). Directory names can also be 255 characters long, but the length of the complete path, including drive, directories, and file name, cannot exceed 260 characters.
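
The length limits above can be checked with a short portable C sketch. The function path_lengths_ok and the two constants are hypothetical names chosen for this example. The text states that the 255-character component limit includes the terminating NULL; it does not say whether the 260-character path limit does, so this sketch conservatively assumes that it does.

```c
#include <stddef.h>
#include <string.h>

#define MAX_COMPONENT 255   /* per component, includes the terminating NULL */
#define MAX_PATH_LEN  260   /* whole path; assumed here to include the NULL too */

/* Returns 1 if the path and each of its components fit the limits. */
int path_lengths_ok(const char *path)
{
    size_t run = 0;         /* length of the component being scanned */
    const char *p;

    if (path == NULL)
        return 0;
    if (strlen(path) + 1 > MAX_PATH_LEN)
        return 0;
    for (p = path; *p != '\0'; p++) {
        if (*p == '\\' || *p == '/')
            run = 0;                         /* a separator starts a new component */
        else if (++run + 1 > MAX_COMPONENT)
            return 0;                        /* component plus NULL exceeds 255 */
    }
    return 1;
}
```

In this sketch the drive specification (for example "C:") is simply counted as part of the first component, which does not affect the result for realistic paths.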

Certain characters that are illegal in the FAT file system are legal in HPFS file names:

   +   =   ;   ,   [   ]

Also, blank spaces can be used anywhere in an HPFS file name or directory name, but blank spaces and periods at the end of a file name are ignored. Additionally, the period (.) is a valid file name character and can be used as many times as desired. There is no requirement that HPFS file names have extensions; however, many applications still create and use them.

An HPFS file name can be all uppercase, all lowercase, or mixed case. The case is preserved for directory listings but is ignored in file searches and all other system operations. Therefore, in a given directory, there cannot be more than one file with the same name when the only difference is case.

File-Naming Rules for Installable File Systems

For file objects managed by OS/2 installable file systems, the following rules apply:

  • Each element of a full path name residing on a disk managed by an installable file system can consist of up to 255 characters. File names can be up to 255 characters long (one of the characters must be a terminating NULL, "\0"). Directory names can also be 255 characters long, but the length of the complete path, including drive, directories, and file name, cannot exceed 260 characters. For example, in the path name "c:\XXX...XXX\YYY", "XXX...XXX" can include up to 255 characters. Names of this kind are referred to as long file names.
  • Names are not case sensitive.
  • File name case as specified at create time is preserved. This means that if the file "file.TXT" is created, DosFindFirst returns "file.TXT". File name case may be modified using DosMove.
  • Blanks immediately preceding a dot are significant. This means that "FILE.TXT" and "FILE .TXT" refer to different files.
  • Trailing blanks are truncated. This means that "FILE.TXT " is the same as "FILE.TXT".
  • Blanks elsewhere in the name are significant. This means that "F I L E.TXT" is not the same as "FILE.TXT".
  • For compatibility reasons, trailing dots on component names are discarded. For example, "\FILE.TXT...TEXT...\A..B...\C." becomes "\FILE.TXT...TEXT\A..B\C". This processing also applies to semaphore, queue, pipe, module, shared memory, and device names.
  • The set of legal characters is expanded to include
   +   =   ;   ,   [   ]
as well as all characters legal for the FAT file system.
  • If an installable file system uses a component separator within a file name, it must be a dot (.). There are no restrictions on the number of components which can be allowed within a file name, for example "My.Programming.Reference.Part.One".
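
The rule about trailing dots can be sketched in portable C as follows. The function strip_trailing_dots is a hypothetical helper, not an OS/2 function. As an assumption of this sketch, a component that consists only of dots (such as "." or "..") is left unchanged so that the current-directory and parent-directory components keep their meaning. Trailing blanks are not handled here.

```c
#include <stddef.h>
#include <string.h>

/* Copies path to out, dropping trailing dots from each component.
 * Returns 0 on success, -1 if out is too small. */
int strip_trailing_dots(const char *path, char *out, size_t outsize)
{
    size_t o = 0;
    const char *p = path;

    if (outsize == 0)
        return -1;
    while (*p != '\0') {
        const char *start, *end;
        size_t len;

        /* Copy path separators through unchanged. */
        if (*p == '\\' || *p == '/') {
            if (o + 1 >= outsize)
                return -1;
            out[o++] = *p++;
            continue;
        }
        /* Find the end of this component. */
        start = p;
        while (*p != '\0' && *p != '\\' && *p != '/')
            p++;
        end = p;
        /* Back up over trailing dots. */
        while (end > start && end[-1] == '.')
            end--;
        if (end == start)        /* component was all dots: keep it whole */
            end = p;
        len = (size_t)(end - start);
        if (o + len >= outsize)
            return -1;
        memcpy(out + o, start, len);
        o += len;
    }
    out[o] = '\0';
    return 0;
}
```

Applied to the example in the list above, "\FILE.TXT...TEXT...\A..B...\C." is copied as "\FILE.TXT...TEXT\A..B\C".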

Long File Names

Programs that recognize long file names must indicate this by including the NEWFILES statement in their module definition file. This statement directs the linker to set a bit in the executable file header. It indicates that the module supports long file names. This bit is meaningless in a DOS Session and on versions of the OS/2 operating system prior to Version 1.2. Programs written for OS/2 Version 1.2 (and all later versions) installable file systems should set this bit. Bound programs that have this bit set can see files with long file names in OS/2 mode, but only files with 8.3 file name format in DOS Sessions.

This bit has meaning when attached to program modules, not when attached to DLLs. Whether the program recognizes the long file name format depends entirely on the value of its NEWFILES bit, and the effect of the bit extends into any calls to DLLs. In order to be compatible with all OS/2 file systems, dynamic link libraries must not create internal temporary files or directories that do not comply with 8.3 file naming conventions. In addition, dynamic link libraries cannot return long file names to an application. (The caller might be running on a file system that only supports 8.3 file names and use the returned name to create a file.)

OS/2 applications which do not recognize long file names can run with some restrictions. For these programs, long names (including device names) are filtered according to the following rules:

  • Any name not representable in the 8.3 file name format is not returned from DosFindFirst or DosFindNext. This is because the application's buffers are unlikely to be large enough to handle longer names.
  • Any long file name passed to the file system functions listed below is rejected in exactly the same way as under previous versions of the OS/2 operating system. It is not acceptable to create and manipulate a name that you cannot find.
    • DosOpen
    • DosDelete
    • DosMove
    • DosQueryPathInfo
    • DosSetPathInfo
    • DosCreateDir
    • DosDeleteDir
    • DosFindFirst
    • DosFindNext
    • DosQueryFSAttach
    • DosFSAttach
    • DosCopy
    • DosSearchPath
  • Long file names can be passed to DosSetCurrentDir and DosQueryCurrentDir so that all programs can use all directories.
  • Long names used with non-file system functions (for example, DosCreateSem) are not filtered.

For files located on file devices managed by the OS/2 FAT file system, long file names are handled differently in OS/2 mode than in DOS mode. In OS/2 mode, the long file name is considered an error. In DOS mode, the name is truncated and is not an error. The DOS mode treatment of file name formats provides compatibility with the PC-DOS environment for applications originally written for PC-DOS. However, if you are writing a family application to run under both the OS/2 operating system and the PC-DOS environment, your application must allow for this difference in operating environments.

Because long file names can be input to applications through program command lines, dialog boxes, or function calls, applications must provide their users with rules for how to enter file names. File Names in User Input provides some general guidelines in this matter that are applicable to both long file names and 8.3 file names.

Moving Files with Long Names

The Workplace Shell supports copying files with long file names to media that is not managed by an installable file system (IFS), and returning these files to IFS media with the long name intact.

When a file with a long name is copied to media that does not support long file names, the Workplace Shell stores the file's long name in the .LONGNAME extended attribute. When the file is copied back to a disk that does support long file names, the Workplace Shell restores the long name from the extended attribute.

If the new media does not support extended attributes, files that have long names cannot be moved to the media without having their names modified or truncated.

Note
The behavior described above applies only to the Workplace Shell. The command processors, CMD.EXE and COMMAND.COM, do not automatically save the long file name; they require the user to enter a new file name that is legal on the new media. The DosCopy function also does not save the long file name automatically; the programmer must provide the target file name to DosCopy, and the target file name must be a legal file name for the target media.
If you choose to store and restore the file's long name, you must do it yourself in the manner described above.

Metacharacters in File Names

Metacharacters are characters that can be used to represent placeholders in a file name. The asterisk (*) and the question mark (?) are the two metacharacters.

The asterisk matches zero or more characters, including blanks.

The question mark matches exactly one character, unless that character is a period. To match a period, the original name must contain a period. Metacharacters are illegal in all but the last component of a path.

Metacharacters are also referred to as global file name characters, or as wildcard characters.

An application that allows more than one file name on its command line can accept metacharacters to provide users with a shortcut for entering a long list of names. For instance, metacharacters can be used to reference a set of files with a common base name; to reference all files with an extension of EXE, the user would enter:

*.exe

Although a name that contains metacharacters is not a complete file name, an application can use functions, such as DosFindFirst and DosEditName, to expand the name (replace the metacharacters) and create one or more valid file names.

Metacharacters have two sets of semantics:

  • As search metacharacters, which are used to select the files that are returned to the user when the user searches the disk for a file.
  • As edit metacharacters, which are used to construct a new file name, given a source name and a target name specification.

Both asterisks and question marks, therefore, have two sets of rules, one for searching for file names and one for editing file names.

Search metacharacters are used in commands that search for files or groups of files, like DIR:

dir *.exe

An application can expand a name with metacharacters to a list of file names by using DosFindFirst and DosFindNext. These functions take a file name template (a name with metacharacters) and return the names of files on the disk that match the pattern in the template.

Edit metacharacters are used in commands that can change the names of files; for example, in a global copy command:

copy *.txt *.old

An application can create a new file name from an existing name by using the DosEditName function. This function takes a template (a name with metacharacters) and expands it, using characters from an existing name. An asterisk in the template directs the function to copy all characters in the existing name until it locates a character that matches the character following the asterisk. A question mark directs the function to copy one character, unless that character is a period. The period in the template directs the function to look for and move to the next period in the existing name, skipping any characters between the current position and the period.

Searching for Files Using Metacharacters

An asterisk (*) matches zero or more characters of any kind, including blanks. It does not cross NULL or \, which means it matches only within a file name, not across an entire path.

A question mark (?) matches 1 character, unless what it would match is a period (.) or the terminating NULL, in which case it matches 0 characters. It also does not cross the backslash character (\).

Any character, other than asterisks and question marks, matches itself, including a period.

Searching is case-insensitive. For example, "FILE.TXT" refers to the same file as "file.txt".

For compatibility reasons, any file name that does not have a dot in it gets an implicit one automatically appended to the end during searching operations. This means that searching for "FILE." would return "FILE".

Some file system functions accept file object name specifications using metacharacters.
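
The search rules above can be illustrated with a small matcher in portable C. This is a sketch of the rules as stated in this section, not the operating system's implementation, and the names search_match and match_here are hypothetical. It matches a template against a single file name (no path). As an interpretation of the implicit-dot rule, a name without a dot is tried both as given and with a dot appended. Case is folded for ASCII letters only.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Recursive matcher for one file name component. */
static int match_here(const char *p, const char *n)
{
    if (*p == '\0')
        return *n == '\0';
    if (*p == '*') {
        /* Try every possible length for the asterisk, starting at zero. */
        for (;;) {
            if (match_here(p + 1, n))
                return 1;
            if (*n == '\0')
                return 0;
            n++;
        }
    }
    if (*p == '?') {
        /* Matches zero characters at a period or at the end of the name. */
        if (*n == '\0' || *n == '.')
            return match_here(p + 1, n);
        return match_here(p + 1, n + 1);
    }
    /* Any other character, including a period, matches itself. */
    if (*n != '\0' && toupper((unsigned char)*p) == toupper((unsigned char)*n))
        return match_here(p + 1, n + 1);
    return 0;
}

/* Returns 1 if name matches the template, 0 otherwise. */
int search_match(const char *pattern, const char *name)
{
    char dotted[262];
    size_t len = strlen(name);

    if (match_here(pattern, name))
        return 1;
    /* Implicit dot: only for names that contain no dot at all. */
    if (strchr(name, '.') != NULL || len + 2 > sizeof dotted)
        return 0;
    memcpy(dotted, name, len);
    dotted[len] = '.';
    dotted[len + 1] = '\0';
    return match_here(pattern, dotted);
}
```

With this sketch, the template "*.exe" matches "PROG.EXE", and "FILE." matches the name "FILE" through the implicit dot. The recursive approach is simple rather than fast; templates with many asterisks can take a long time to evaluate.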

Editing File Names Using Metacharacters

Metacharacters in a source name simply match files and behave just like any other search metacharacter.

Metacharacters in a target name are copy-edit commands and work as follows:

  • A question mark (?) copies one character unless the character it would copy is a period (.), in which case it copies 0 characters. It also copies 0 characters if it is at the end of the source string.
  • An asterisk (*) copies characters from the source name to the target name until it finds a source character that matches the character following it in the target.
  • A period (.) in the target name causes the source pointer to advance to the corresponding period in the source name, skipping any characters in between. Periods are matched by counting from the left.

Editing is case-insensitive. If a case conflict between the source and editing string arises, the case in the editing string is used, thus:

copy file.txt *E.tmp

results in file.txt being copied as filE.tmp.

DosEditName provides applications with the ability to transform a file object name into another name, using an editing string that contains global characters.
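
The copy-edit rules above can be sketched in portable C. The function edit_name below is a hypothetical stand-in written for illustration; it is not DosEditName and may differ from it in edge cases. One assumption goes beyond the rules stated above: an ordinary character in the editing string is written to the target and also consumes one source character (unless the source is at a period or at its end). This keeps the source position aligned with the target and is consistent with the "filE.tmp" example.

```c
#include <ctype.h>
#include <stddef.h>

/* Builds a target name in out from source and an editing string.
 * Returns 0 on success, -1 if out is too small. */
int edit_name(const char *source, const char *edit, char *out, size_t outsize)
{
    const char *s = source;   /* current position in the source name */
    size_t o = 0;             /* number of characters written to out */

    for (; *edit != '\0'; edit++) {
        if (*edit == '*') {
            /* Copy source characters until one matches the character
             * that follows the asterisk, or the source ends. */
            int stop = toupper((unsigned char)edit[1]);
            while (*s != '\0' && toupper((unsigned char)*s) != stop) {
                if (o + 1 >= outsize)
                    return -1;
                out[o++] = *s++;
            }
        } else if (*edit == '?') {
            /* Copy one character, but nothing at a period or at the end. */
            if (*s != '\0' && *s != '.') {
                if (o + 1 >= outsize)
                    return -1;
                out[o++] = *s++;
            }
        } else if (*edit == '.') {
            /* Skip ahead to (and past) the next period in the source. */
            while (*s != '\0' && *s != '.')
                s++;
            if (*s == '.')
                s++;
            if (o + 1 >= outsize)
                return -1;
            out[o++] = '.';
        } else {
            /* Ordinary character: assumed to consume one source character;
             * the case of the editing string is used. */
            if (*s != '\0' && *s != '.')
                s++;
            if (o + 1 >= outsize)
                return -1;
            out[o++] = *edit;
        }
    }
    if (o >= outsize)
        return -1;
    out[o] = '\0';
    return 0;
}
```

Under these assumptions, editing "file.txt" with "*E.tmp" yields "filE.tmp", and editing "PROG.SRC" with "*.SAM" yields "PROG.SAM".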

Transforming File Names Using Metacharacters

File system functions that an application uses to copy, rename or move file objects do not support the use of global characters. For example, a user can perform a global copy of all files with the extension .EXE by entering the following on the command line:

copy *.exe

An application, however, cannot perform a similar global copy operation by making a single call to DosCopy or DosMove. These functions operate on a single, specific file object.

DosEditName, however, provides applications with the ability to transform an element of a full path name into another name, using an editing string that contains global characters. For example, for an application to copy all files with an extension of .SRC to files with an extension of .SAM, the application would:

  1. Search for all files with the .SRC extension by using DosFindFirst and DosFindNext,
  2. Transform the file names by using DosEditName with an editing string of "*.SAM",
  3. Copy the files with the new extension with DosCopy.

File Names in User Input

Users often supply file names as part of an application's command line or in response to a prompt from the application. Traditionally, users have been able to supply more than one file name by separating the names with certain characters, such as a blank space. In some file systems, however, traditional separators are valid file name characters. This means additional conventions are required to ensure that an application processes all characters in a name.

When an application processes arguments (including file names) from its command line, the operating system treats the double quotation mark (") and the caret (^) as quotation characters. All characters between the opening and closing double quotation marks are processed as a single argument. The caret is used to quote characters that would otherwise have some special property. The character immediately following the caret is treated as a normal character; any special characteristics that the character has are to be ignored. For example, the greater-than symbol (>) normally causes a program's output to be redirected to a file or device. Typing "^>" causes the ">" to be included in the command line passed to the application. In both cases, the operating system discards the quotation characters and does not treat them as part of the final argument.
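
The quoting conventions described above can be illustrated with a small argument splitter in portable C. The function split_args is a hypothetical helper and does not reproduce the operating system's own command-line processing. The text above does not say how a caret behaves between double quotation marks; this sketch assumes it is an ordinary character there and a quotation character only outside them.

```c
#include <stddef.h>

/* Splits line into arguments in place, discarding quotation characters.
 * Stores pointers to the arguments in argv and returns how many were
 * stored (at most max_args). */
int split_args(char *line, char **argv, int max_args)
{
    char *r = line;   /* read position */
    char *w = line;   /* write position, never ahead of r */
    int argc = 0;

    for (;;) {
        int in_quotes = 0;

        while (*r == ' ' || *r == '\t')      /* skip separators */
            r++;
        if (*r == '\0' || argc == max_args)
            break;
        argv[argc++] = w;
        while (*r != '\0') {
            if (*r == '"') {
                in_quotes = !in_quotes;      /* quotation mark is discarded */
                r++;
            } else if (*r == '^' && !in_quotes && r[1] != '\0') {
                r++;                         /* caret is discarded */
                *w++ = *r++;                 /* next character is taken literally */
            } else if ((*r == ' ' || *r == '\t') && !in_quotes) {
                break;                       /* end of this argument */
            } else {
                *w++ = *r++;
            }
        }
        if (*r != '\0')
            r++;                             /* step over the separator */
        *w++ = '\0';
    }
    return argc;
}
```

For the line `copy "my long file.txt" a^>b`, this sketch produces three arguments: copy, my long file.txt, and a>b.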

When a Presentation Manager (PM) application processes two or more file names from a dialog box or other prompt, it expects the user to enter each file name on a new line. Therefore, a PM application would use a multiple-line entry field to prompt for multiple file names. This often makes the use of quotation characters unnecessary.

When an application is started, the operating system constructs a command line for the application. If the command line includes file names, the operating system places a space character between names and marks the end of the list with two NULL characters. Applications that start other applications by using DosExecPgm can also pass arguments by using this convention or by using quotation characters. In practice, most applications receive a command line as a single, NULL-terminated string. Therefore, applications that use DosExecPgm should prepare command lines as a single string, and enclose any file names in quotation marks.
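
As a sketch of the recommendation above, the following portable C function joins a list of file names into a single string, enclosing each name in quotation marks, separating names with a space, and ending the list with two NULL characters. The function build_file_args is a hypothetical helper. It covers only the list of file names; any other layout that DosExecPgm expects in its argument string (such as a leading program name) is outside this sketch.

```c
#include <stddef.h>
#include <string.h>

/* Writes "name1" "name2" ... into out, followed by two NULL characters.
 * Returns the number of bytes used, including both terminators,
 * or 0 if out is too small. */
size_t build_file_args(const char *const *names, size_t count,
                       char *out, size_t outsize)
{
    size_t o = 0, i;

    if (outsize < 2)
        return 0;
    for (i = 0; i < count; i++) {
        size_t len = strlen(names[i]);
        /* Optional separator plus two quotation marks plus the name;
         * two bytes are always kept free for the terminators. */
        size_t need = (i > 0 ? 1 : 0) + len + 2;

        if (need > outsize - 2 - o)
            return 0;
        if (i > 0)
            out[o++] = ' ';
        out[o++] = '"';
        memcpy(out + o, names[i], len);
        o += len;
        out[o++] = '"';
    }
    out[o++] = '\0';
    out[o++] = '\0';
    return o;
}
```

Because the double quotation mark is a reserved character, a valid file name cannot contain one, so the sketch does not attempt to escape quotation marks inside names.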

Device Names

Naming conventions for character devices are similar to those for naming files. The OS/2 operating system has reserved certain names for character devices supported by the base device drivers. These device names are listed below:

CLOCK$      Clock
COM1-COM4   First through fourth serial ports
CON         Console keyboard and screen
KBD$        Keyboard
LPT1        First parallel printer
LPT2        Second parallel printer
LPT3        Third parallel printer
MOUSE$      Mouse
NUL         Nonexistent (dummy) device
POINTER$    Pointer draw device (mouse screen support)
PRN         The default printer, usually LPT1
SCREEN$     Screen

These names can be used with DosOpen to open the corresponding devices. Reserved device names take precedence over file names; DosOpen checks for a device name before checking for a file name. Do not use a file name which is the same as a reserved device name; the file will never be opened, because the call will open the device instead.

COM1 through COM4 are reserved device names only when the ASYNC (RS-232C) device driver is loaded. The same is true for POINTER$ and MOUSE$, which are reserved only when a mouse device driver is loaded.
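
An application that wants to avoid creating files with reserved names could check candidate names against the list above. The following portable C sketch shows one way to do this; is_reserved_device_name is a hypothetical helper, not an OS/2 function. It assumes that device names, like file names, are compared without regard to case, and it treats COM1 through COM4, POINTER$, and MOUSE$ as always reserved, regardless of which device drivers are loaded.

```c
#include <ctype.h>
#include <stddef.h>

/* Case-insensitive comparison (ASCII letters only). */
static int equal_ignore_case(const char *a, const char *b)
{
    while (*a != '\0' && *b != '\0') {
        if (toupper((unsigned char)*a) != toupper((unsigned char)*b))
            return 0;
        a++;
        b++;
    }
    return *a == *b;
}

/* Returns 1 if name is one of the reserved character device names. */
int is_reserved_device_name(const char *name)
{
    static const char *const devices[] = {
        "CLOCK$", "COM1", "COM2", "COM3", "COM4", "CON", "KBD$",
        "LPT1", "LPT2", "LPT3", "MOUSE$", "NUL", "POINTER$", "PRN",
        "SCREEN$"
    };
    size_t i;

    if (name == NULL)
        return 0;
    for (i = 0; i < sizeof devices / sizeof devices[0]; i++)
        if (equal_ignore_case(name, devices[i]))
            return 1;
    return 0;
}
```

With this sketch, "con" and "LPT2" are reported as reserved, while "LPT4" and "CONSOLE" are not.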

An application can call DosQueryFHState to verify that a file or device has been opened. See Determining and Setting the State of a File or Device Handle for more information on getting the state of a file handle.