CPGuide - National Language Support

From EDM2

Revision as of 02:50, 27 March 2020

Reprint Courtesy of International Business Machines Corporation, © International Business Machines Corporation

Control Program Programming Guide and Reference

Many applications need to be independent of a particular language. Rather than being hard-coded in English, an application might need to ship in, for example, English, French, and German versions, preferably without changes to the program code for each version. This requirement is easier to meet when the application uses resources such as string tables, menu templates, dialog templates, and accelerator tables, together with code pages.

This chapter describes the functions an application uses to be NLS-enabled, or language independent.

The following topic is related to the information in this chapter:

  • Message management

About National Language Support

The support of national languages by applications requires the following considerations:

  • Displayed text must be translated into the appropriate language.
  • Symbols or icons might not convey the same meaning in all countries. Alternative designs for different countries might be necessary.
  • Translation changes the length of text strings.
  • Different languages often have different text characters.

The use of national language resource files can help with the first three items, and the ability of the application to receive input and display output in any ASCII code page can help with the last item.

Using ASCII code page 850 avoids many of these problems, because it contains most of the characters required by the Latin-1 languages, which cover much of Western Europe and North and South America. However, older programs use code page 437 for U.S. English, and code pages 860, 863, and 865 for other languages (for example, 863 for Canadian French and 865 for Norwegian). The code page applies to both input and output data.

Code page 850 was used for translating Presentation Manager text. Use code page 850 whenever possible for all Presentation Manager applications that might require translation.

National Language Resource Files

When creating an application, define national language dependencies in resources that are held in resource files separate from the program code. That is:

  • Keep pure text strings in string tables.
  • Keep menus in menu templates.
  • Keep dialog boxes in dialog templates.
  • Keep accelerators in accelerator tables.

The language displayed by the application can then be changed by translating the resources, in most cases without changing the application.

However, when text is translated from one language to another, the length of a text string can change substantially. For example, a text string can double in length when translated from English to German.

The following table furnishes a general idea of the amount of expansion that can be expected during translation.

Translation Expansion
│For English Phrases           │Translation Expansion Factors │
│Up to 10 characters           │101 - 200%                    │
│11 - 20 characters            │81 - 100%                     │
│21 - 30 characters            │61 - 80%                      │
│31 - 50 characters            │41 - 60%                      │
│51 - 70 characters            │31 - 40%                      │
│Over 70 characters            │30%                           │

When designing your dialog boxes and text string messages, add white space to allow for the expansion that will occur when the text is translated. You might have to adapt the application program to allow for the change in the length of text strings after they are translated. For example, a change in the length of a text string can cause it to become misaligned with other displayed objects.
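The table above can be turned into a small sizing helper. The following is a minimal sketch, not part of the original guide: the function names are hypothetical, and it assumes the percentages in the table describe additional space on top of the English length, using the upper bound of each range as a worst case.

```c
#include <assert.h>
#include <stddef.h>

/* Worst-case expansion, in percent of the English length, taken from
   the upper bound of each row of the Translation Expansion table.
   Hypothetical helper; the name is not an OS/2 function. */
unsigned max_expansion_percent(size_t english_len)
{
    if (english_len <= 10) return 200;
    if (english_len <= 20) return 100;
    if (english_len <= 30) return 80;
    if (english_len <= 50) return 60;
    if (english_len <= 70) return 40;
    return 30;
}

/* Number of character cells to reserve for a string whose English
   text is english_len characters long. The extra space is rounded up. */
size_t reserved_width(size_t english_len)
{
    size_t extra;

    assert(english_len < 1000000);   /* keep the multiplication small */
    extra = (english_len * max_expansion_percent(english_len) + 99) / 100;
    return english_len + extra;
}
```

For example, a 4-character English label such as "Save" would be given 12 cells, and an 80-character message would be given 104.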

You can also use the Dialog Box Editor to adjust for misalignments, or to change the size of the dialog box. This would enable you to leave your application program unchanged.

Text strings explicitly displayed by the application program are more of a problem. You will have to include program code that can handle text strings of varying length and format them at runtime according to their size.
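One way to format such strings at run time is to measure the translated labels before laying them out. The following is a minimal sketch in portable C, not taken from the original guide: widest_label and format_row are hypothetical names, and the code assumes a single-byte code page in which one byte occupies one character cell.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Length of the longest label in the set, measured at run time so the
   layout follows whatever language was loaded. */
size_t widest_label(const char *const labels[], size_t count)
{
    size_t i, widest = 0;

    assert(labels != NULL);
    for (i = 0; i < count; i++) {
        size_t len = strlen(labels[i]);
        if (len > widest) widest = len;
    }
    return widest;
}

/* Write "label value" with the label left-justified and padded to
   width cells, so the values line up in one column. Returns the
   snprintf result. */
int format_row(char *out, size_t size, const char *label,
               size_t width, const char *value)
{
    assert(out != NULL && label != NULL && value != NULL);
    return snprintf(out, size, "%-*s %s", (int)width, label, value);
}
```

With the English labels "Name" and "Size" the column width is 4; with hypothetical German labels "Dateiname" and "Groesse" it becomes 9, and the values still align without any change to the program code.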

Language-Specific Versions of NLS-Enabled Applications

There are two methods of creating a specific national language version of a program designed to handle more than one national language. The choice of the method depends on the amount of available disk space and whether the user wants to change between different languages once the program is installed. The two methods are:

  • Statically link the resources to the application's .EXE files. The executable files are then language-specific and cannot be changed to another national language. The specific .EXE files are then sent to the user.
  • Place the resources into a language-specific, dynamic link library. Designate one library file for each national language. Selecting a particular library file for use with the application gives the desired version of the program. Using this method, all national languages can be shipped with the product; selection of the national language occurs during installation (for example, by naming a specific .DLL file). It is possible to change the national language setting while the program is operating.

About Code Page Management

A code page is a table that defines how the characters in a language or group of languages are encoded. A specific value is given to each character in the code page. For example, in code page 850 the letter "ñ" (lowercase) is encoded as hex A4 (decimal 164), and the letter "Ñ" (uppercase) is encoded as hex A5 (decimal 165).

Code page management enables a user to select a code page for keyboard input and for screen and printer output before starting an application, a system command, or a utility program in the OS/2 multitasking environment.

This means that a user in a particular country, such as England (code page 850), Norway (code page 865), or a language region such as Canadian French (code page 863) can use a code page that defines an ASCII-based character set containing characters used by that particular country or language.

Installable code page files include keyboard translate tables, display character sets, printer character sets, and country/language information for each code page supported.

Of particular interest are two code pages:

  • Code Page 850
  • Code Page 437

Code Page 850 (CP850)

Code Page 850 is also called the Latin-1, multilingual code page. This code page supports the alphabetic characters of the Latin-1-based languages. It contains characters required by 13 languages used in approximately 40 countries.

CP850 also provides the flexibility to develop new applications based on non-Latin-based or special industry-based code pages.

Code Page 850 supports countries using the following languages:

│Belgian French                │Canadian French               │
│Danish                        │Dutch                         │
│Finnish                       │Flemish                       │
│French                        │German                        │
│Italian                       │Norwegian                     │
│Portuguese                    │Spanish                       │
│LAD Spanish                   │Swedish                       │
│Swiss French                  │Swiss German                  │
│U.K. English                  │U.S. English                  │

Code Page 437 (CP437)

Code Page 437 is the standard personal computer code page.

The lower 128 characters are based on the 7-bit ASCII code. The upper 128 characters contain characters from several European languages (including part of the Greek alphabet) and various graphic characters. However, some of the accented characters, such as those used in the Nordic countries, are not represented. The missing characters are available in other code pages (code page 850 will usually contain the desired characters).

Some of the 256 symbols that can be displayed are printer control characters, and are not printed.

ASCII and EBCDIC Code Page Support

The two leading character-coding systems are ASCII and EBCDIC. Presentation Manager applications can use an EBCDIC code page instead of an ASCII code page. Code pages based on both systems are supported by OS/2.

Any code page that either is defined in the CONFIG.SYS file, or is one of the EBCDIC code pages supported, can be selected.

Code Page Preparation

During system initialization, the code pages specified in the CODEPAGE statement are prepared to enable run-time code page switching of the display, the keyboard, the printer, and the country information. The display, keyboard, and printer must be defined in a DEVINFO statement in order to be prepared. Country information is prepared for the system country code specified in the COUNTRY statement.

If a resource cannot be prepared for the selected code page during system initialization, it is prepared for a default code page. The following are the defaults:

  • A keyboard layout defaults to the code page of the translate table designated as the default layout in the KEYBOARD.DCP file. The default layout is based on the national code page of its associated country. You must explicitly specify KEYBOARD.DCP in the DEVINFO statement for the keyboard in CONFIG.SYS.
  • The display defaults to the code page of ROM_0 for the device. (ROM_0 means a device default code page that is the device native code page or the lowest addressed ROM code page.)
  • The printer defaults to the code page of ROM_0 for the device.
  • The country information defaults to the code page of the first entry found in the COUNTRY.SYS file for the country code. Each entry is the same information for a given country code, but is encoded in a different code page. The first entry is based on the preferred country code page.

If country information for a code page selected with the CODEPAGE statement cannot be prepared at system initialization because it is not found in the COUNTRY.SYS file, it is prepared in the default code page instead (that is, held in memory for run-time code page switching). Similarly, a keyboard layout that cannot be prepared in the selected code page, because it is not found in the KEYBOARD.DCP file, is prepared in its default code page.

COUNTRY.SYS contains one default entry per country code, and KEYBOARD.DCP contains one default entry per keyboard layout based on these assignments.

Code Page Functions

At the system level, OS/2 switches the code pages of supported displays and printers to agree with the code page of the process sending the output. At the application level, OS/2 functions enable a process to control code page assignments.

Using Code Pages

OS/2 provides applications with several functions to obtain information about and manipulate code pages. These functions enable applications to determine and set the current code page.

OS/2 code page management functions enable applications to read keyboard input and write display and printer output for multiple processes using ASCII-based data encoded in different code pages.

The system switches to the required code page, for a code-page-supported device, before input or output.

In the example code fragments that follow, error checking was left out to conserve space. Applications should always check the return code that the functions return. Control Program functions return an APIRET value. A return code of 0 indicates success. If a non-zero value is returned, an error occurred.

Querying Code Page Support and the Current Code Page

DosQueryCp is used to determine the code page of the current process and the prepared system code pages. The following code fragment shows how to get the current code page, followed by the prepared code pages. The eight-entry list has room for the current code page and up to seven prepared code pages:

    #define INCL_DOSNLS   /* National Language Support values */
    #include <os2.h>

    ULONG  ulCpList[8];
    ULONG  ulCpSize;
    APIRET ulrc;    /* Return code */

    ulrc = DosQueryCp(sizeof(ulCpList),    /* Length of list          */
                      ulCpList,            /* List                    */
                      &ulCpSize);          /* Length of returned list */
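The returned list can then be examined without any further system calls. The following is a minimal sketch in portable C, assuming the layout described above (the current code page first, the prepared code pages after it) and a returned length counted in bytes. CodePage is a stand-in for the OS/2 ULONG type, and both function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned long CodePage;   /* stand-in for the OS/2 ULONG type */

/* Number of list entries, given the returned length in bytes. */
size_t cp_entry_count(size_t returned_bytes)
{
    return returned_bytes / sizeof(CodePage);
}

/* Returns 1 if cp is one of the prepared code pages (every entry
   after the first, which holds the current code page), 0 otherwise. */
int cp_is_prepared(const CodePage *list, size_t returned_bytes, CodePage cp)
{
    size_t n = cp_entry_count(returned_bytes);
    size_t i;

    assert(list != NULL);
    for (i = 1; i < n; i++) {
        if (list[i] == cp) return 1;
    }
    return 0;
}
```

An application could use a check like cp_is_prepared before asking the system to switch to a code page.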

The required code page is the current code page of the process at the time it opens a device, or a specific code page selected by the process with a set-code-page function. A character set can also be specified for some devices, for example, for some printers.

The country functions retrieve country- and language-dependent information in the current code page of the calling process, or in a code page selected by the process.

Setting the Code Page for Text Characters

Each process has a code page tag maintained by OS/2. A code page tag is the identifier of the current code page for the process.

A child process inherits the code page tag of its parent. The default code page for the first process in a session is the same as the session code page. The default code page for a new session is the primary code page specified in the CODEPAGE configuration statement.

To change the code page tag of a process, call DosSetProcessCp. This will not change the process code page tag of its parent or any child process.

Obtaining the Case Map String

DosMapCase performs case mapping on a string of binary values that represent ASCII characters.

The case map that is used is the one in the country file that corresponds to the system country code or selected country code, and to the process code page or selected code page. The default name of the country file is COUNTRY.SYS.
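The reason case mapping depends on the code page can be shown with a table-driven sketch in portable C. This is not DosMapCase itself: the function names are hypothetical, and the table covers only the ASCII letters plus the one code page 850 pair quoted earlier in this chapter ("ñ" at hex A4, "Ñ" at hex A5). A real case map comes from the country file.

```c
#include <assert.h>
#include <stddef.h>

/* Fill a 256-entry uppercase map. Every byte maps to itself except
   the ASCII lowercase letters and the code page 850 "ñ" (hex A4),
   which maps to "Ñ" (hex A5). Illustrative subset only. */
void build_case_map_cp850_subset(unsigned char map[256])
{
    int c;

    for (c = 0; c < 256; c++) map[c] = (unsigned char)c;
    for (c = 'a'; c <= 'z'; c++) map[c] = (unsigned char)(c - 'a' + 'A');
    map[0xA4] = 0xA5;
}

/* Map len bytes of s in place. The string is length-counted, so it
   does not need a terminating zero. */
void map_case(unsigned char *s, size_t len, const unsigned char map[256])
{
    size_t i;

    assert(s != NULL && map != NULL);
    for (i = 0; i < len; i++) s[i] = map[s[i]];
}
```

A mapping that only knows the 7-bit ASCII range would leave the byte at hex A4 unchanged, which is why the case map must match both the country code and the code page.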

Obtaining the DBCS Environment Vector

DosQueryDBCSEnv obtains a double-byte character set (DBCS) environment vector that resides in the country file. The default name of the country file is COUNTRY.SYS.

The vector corresponds to the system country code or selected country code, and to the process code page or selected code page.

The following code fragment shows how to use DosQueryDBCSEnv:

    #define INCL_DOSNLS   /* National Language Support values */
    #include <os2.h>
    #include <stdio.h>

    ULONG         ulLength;             /* Length of data area provided         */
    COUNTRYCODE   ccStructure;          /* Input data structure                 */
    UCHAR         ucMemoryBuffer[12];   /* DBCS environmental vector (returned) */
    APIRET        ulrc;                 /* Return code                          */

    ulLength = 12;                      /* A length of 12 bytes is sufficient   */
                                        /* to contain the DBCS data returned    */

    ccStructure.country = 0;            /* Use the default system country code  */

    ccStructure.codepage = 0;           /* Return DBCS information for the      */
                                        /* caller's current process code page   */

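    ulrc = DosQueryDBCSEnv(ulLength,        /* Length of data area          */
                           &ccStructure,    /* Country code and code page   */
                           ucMemoryBuffer); /* Returned DBCS vector         */

    if (ulrc != 0) {
        printf("DosQueryDBCSEnv error: return code = %lu\n",
               ulrc);
    }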

On successful return, the buffer ucMemoryBuffer contains the country-dependent information for the DBCS environmental vector.

Instead of the single-byte character set (SBCS) representation used for Latin text, some Asian countries use code pages that consist of double-byte character set characters, in which each character is represented by a two-byte code. The DBCS code pages enable single-byte data, double-byte data, or mixed (single-byte and double-byte) data.
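An application typically uses the environment vector to decide whether a byte starts a two-byte character. The following is a minimal sketch in portable C, assuming the vector is a series of (first, last) lead-byte range pairs terminated by a pair of zero bytes. The function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Returns 1 if byte c lies inside one of the lead-byte ranges in vec.
   vec is assumed to hold (first, last) pairs ending with 0, 0. */
int is_dbcs_lead_byte(const unsigned char *vec, unsigned char c)
{
    assert(vec != NULL);
    for (; vec[0] != 0 || vec[1] != 0; vec += 2) {
        if (c >= vec[0] && c <= vec[1]) return 1;
    }
    return 0;
}

/* Count characters, not bytes, in a buffer of mixed single-byte and
   double-byte data. A lead byte at the very end of the buffer is
   counted as one character. */
size_t dbcs_char_count(const unsigned char *vec,
                       const unsigned char *s, size_t len)
{
    size_t i = 0, count = 0;

    assert(s != NULL || len == 0);
    while (i < len) {
        i += (is_dbcs_lead_byte(vec, s[i]) && i + 1 < len) ? 2 : 1;
        count++;
    }
    return count;
}
```

With the illustrative ranges hex 81 through 9F and hex E0 through FC (the lead bytes of Japanese code page 932), the four bytes 'A', hex 82, hex A0, 'B' count as three characters. With an empty vector, as expected for a single-byte code page, the same bytes count as four.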

Obtaining Formatting Information

DosQueryCtryInfo obtains country-dependent formatting information that resides in the country file. The default name of the country file is COUNTRY.SYS.

The information corresponds to the system country code or selected country code, and to the process code page or selected code page.
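Once the formatting information is available, the application applies it when it builds output. The following is a minimal sketch in portable C of formatting a date from two such values. The DateStyle structure is a hypothetical, trimmed-down stand-in and not the structure DosQueryCtryInfo fills in, and the ordering codes (0 for month-day-year, 1 for day-month-year, 2 for year-month-day) are an assumption that should be checked against the toolkit headers.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical subset of country-dependent date settings. */
struct DateStyle {
    unsigned order;        /* assumed: 0 = m/d/y, 1 = d/m/y, 2 = y/m/d */
    char     separator[2]; /* one separator character plus a zero byte */
};

/* Format a date with a two-digit year according to st. Returns the
   snprintf result. */
int format_date(char *out, size_t size, const struct DateStyle *st,
                int year, int month, int day)
{
    const char *sep;
    int yy = year % 100;
    int a, b, c;

    assert(out != NULL && st != NULL);
    assert(strlen(st->separator) == 1);
    sep = st->separator;

    switch (st->order) {
    case 1:  a = day;   b = month; c = yy;  break;
    case 2:  a = yy;    b = month; c = day; break;
    default: a = month; b = day;   c = yy;  break;
    }
    return snprintf(out, size, "%02d%s%02d%s%02d", a, sep, b, sep, c);
}
```

The same call then produces 03/27/20, 27.03.20, or 20-03-27 for 27 March 2020, depending only on the settings passed in.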

Obtaining Collating Information for SORT

DosQueryCollate obtains a collating sequence table (for characters 00H through FFH) from the country file. The default name of the country file is COUNTRY.SYS. The SORT utility program uses this table to sort text according to the collating sequence.

The collating table returned corresponds to the system country code or selected country code, and to the process code page or selected code page.
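A collating table of this kind is used by looking up a weight for each byte and comparing the weights instead of the raw byte values. The following is a minimal sketch in portable C. The function names are hypothetical, and the sample table (identity weights, with ASCII lowercase letters given the weight of their uppercase form) is only an illustration, not the contents of any real country file.

```c
#include <assert.h>
#include <stddef.h>

/* Build an illustrative 256-entry collating table: each byte weighs
   its own value, except that ASCII lowercase letters take the weight
   of the matching uppercase letter. */
void build_sample_collate_table(unsigned char table[256])
{
    int c;

    for (c = 0; c < 256; c++) table[c] = (unsigned char)c;
    for (c = 'a'; c <= 'z'; c++) table[c] = (unsigned char)(c - 'a' + 'A');
}

/* Compare two length-counted strings by collating weight. Returns -1,
   0, or 1. A string that is a prefix of the other sorts first. */
int collate_compare(const unsigned char table[256],
                    const unsigned char *a, size_t alen,
                    const unsigned char *b, size_t blen)
{
    size_t i, n = alen < blen ? alen : blen;

    assert(table != NULL && a != NULL && b != NULL);
    for (i = 0; i < n; i++) {
        unsigned char wa = table[a[i]];
        unsigned char wb = table[b[i]];
        if (wa != wb) return wa < wb ? -1 : 1;
    }
    if (alen == blen) return 0;
    return alen < blen ? -1 : 1;
}
```

A plain byte comparison places "Banana" before "apple" because 'B' has a lower byte value than 'a'; with the sample table, "apple" sorts first.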