Unicode Character Support
Unicode represents each character code as a 16-bit unsigned integer in the range 0 - 65535. Within that encoding space, Unicode separates character representation into four consecutive zones:
Representing each character in two bytes guarantees a uniform representation. Regardless of the character or language, whether an English A or a Japanese ideograph, text-handling facilities can rely on exactly one character in every two bytes of data. Such uniformity eliminates the need for the numerous methods and techniques that previous character sets required to determine how many bytes a character's encoding occupied.
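The fixed-width property can be illustrated outside of G2 with a minimal sketch. It assumes Python's standard utf-16-be codec as a stand-in for the 16-bit encoding described here, and it is limited to characters whose codes fit in the range 0 - 65535. The helper name split_code_units is hypothetical and is not part of G2.

```python
import struct


def split_code_units(data: bytes) -> list[int]:
    """Split big-endian 16-bit data into a list of character codes.

    Hypothetical helper for illustration only; it is not a G2 function.
    """
    if len(data) % 2 != 0:
        raise ValueError("16-bit character data must have an even byte count")
    count = len(data) // 2
    # ">" selects big-endian byte order; "H" is an unsigned 16-bit integer.
    return list(struct.unpack(f">{count}H", data))


# An English "A" (code 0x0041) followed by a Japanese ideograph (code 0x65E5).
sample = "A\u65e5"
encoded = sample.encode("utf-16-be")
units = split_code_units(encoded)

# Each character occupies exactly two bytes, so the character count
# is simply the byte count divided by two.
print(len(encoded), [hex(u) for u in units])
```

Because every character has the same width, the character at position n always begins at byte offset 2n, with no need to scan the preceding data.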
Detailed information about the Unicode character set is not presented in this document. Information about the Unicode standard and the numerical representation of its supported characters is available online at the time of this publication at:
Non-Unicode Character Support
While the Unicode character set continues to be adopted on a global basis, developers may still need to support numerous other character sets for importing and exporting KB data. G2 provides several file I/O system procedures, described in the G2 System Procedures Reference Manual. To facilitate conversion between Unicode and other character sets, G2 provides the functionality to convert:
Functions for character conversion are presented in Character Set Conversion Functions.
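For an application outside of G2 that prepares or consumes such data, the general idea of converting between Unicode and another character set can be sketched as follows. This sketch uses Python's standard codecs, not the G2 conversion functions, and the helper name convert_bytes is hypothetical. The choice of ISO 8859-1 (latin-1) as the other character set is an assumption made only for illustration.

```python
def convert_bytes(data: bytes, source: str, target: str) -> bytes:
    """Re-encode data from one character set to another by way of Unicode.

    Hypothetical helper for illustration only; it is not a G2 function.
    """
    text = data.decode(source)   # source character set -> Unicode text
    return text.encode(target)   # Unicode text -> target character set


# "café" encoded in ISO 8859-1: one byte per character.
latin1_bytes = "caf\u00e9".encode("latin-1")

# Convert to 16-bit big-endian Unicode: two bytes per character.
utf16 = convert_bytes(latin1_bytes, "latin-1", "utf-16-be")

# Convert back; the round trip preserves the original bytes.
round_trip = convert_bytes(utf16, "utf-16-be", "latin-1")

print(len(latin1_bytes), len(utf16), round_trip == latin1_bytes)
```

Converting in the other direction can fail when the text contains a character that the target character set cannot represent. In Python, that case raises UnicodeEncodeError unless an error-handling policy is passed to encode.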
One of the character sets for which G2 provides conversion functionality is the Gensym character set, which was the default character set in previous G2 releases.
This chapter identifies the characters in the Gensym character set and shows how to encode each character in files and in data streams that are composed and manipulated outside of G2.
You can use the Gensym character set to:
- Compose attribute files, especially those that load attributes that must contain symbol and text values
Attribute files are a superseded capability. For more information, see Appendix F, Superseded Practices.
- Compose GFI input files, especially those that load symbol and text values into symbolic variables, text variables, symbolic parameters, and text parameters
GFI is a superseded capability. For more information, see Appendix F, Superseded Practices.
- Write GSI bridge applications, especially those that send and receive symbol and text values to and from G2
- Write applications outside of G2 that communicate with G2 through a remote procedure call (RPC) interface, especially those that pass and return symbol and text values to and from G2
- Write applications outside of G2 that work with files written by G2's Inspect facility
Tip: With few exceptions, you can use G2's Text Editor to input any character in the Unicode character set. The relevant input features are described in Chapter 36, The Text Editor, and in Chapter 39, Natural Language Facilities.
Copyright © 1997 Gensym Corporation, Inc.