Type

unichar

Code point.

Since v1.0
typedef uint32_t unichar

Discussion

This integer represents a Unicode code point or Unicode scalar value, depending on how it's used. Unicode code points are integers in the inclusive range [0, 10FFFF₁₆], that is, U+0000 through U+10FFFF. Unicode scalar values are code points excluding the surrogate code points, U+D800 through U+DFFF.
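The distinction between the two terms can be sketched as a pair of C predicates; the helper names `is_code_point` and `is_scalar_value` below are illustrative only and not part of this API:

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t unichar;

/* A code point is any integer in [0, 0x10FFFF]. */
static bool is_code_point(unichar c) {
    return c <= 0x10FFFF;
}

/* A scalar value is a code point that is not a surrogate;
   surrogates occupy the range [0xD800, 0xDFFF]. */
static bool is_scalar_value(unichar c) {
    return is_code_point(c) && !(c >= 0xD800 && c <= 0xDFFF);
}
```

For example, U+0041 (LATIN CAPITAL LETTER A) is both a code point and a scalar value, while U+D800 is a code point but not a scalar value.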

The integer type used as the underlying storage for this type is configurable. This simplifies integration with existing applications that may already define their own Unicode character type. The chosen type can be signed or unsigned, but it must be large enough to accommodate the entire Unicode character repertoire. As of Unicode 16.0, that means an integer with at least 21 bits of storage.

The snippet below demonstrates how to redefine unichar as a signed 64-bit integer. A 64-bit integer is wasteful, since only 21 bits are needed; it was chosen purely for illustration.

{
    "characterStorage": "int64_t"
}
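With a redefinition like this in effect, the 21-bit requirement can be checked at compile time. The sketch below assumes a C11 compiler; the `static_assert` is illustrative and not something the library performs for you:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical redefinition matching the configuration above. */
typedef int64_t unichar;

/* Verify at compile time that the chosen type can represent the
   highest code point, U+10FFFF, without truncation. */
static_assert((unichar)0x10FFFF == 0x10FFFF,
              "unichar must provide at least 21 bits of storage");
```

A type that is too narrow, such as `int16_t`, would fail this check, because the cast truncates the value.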