Code Examples

This section contains code examples for Charisma.

Charisma has a single header file, charisma.h, which you must include to access its definitions. The examples omit this include and the main function for brevity.

Decoding Characters

This example decodes the Unicode scalar values of a null-terminated UTF-8 string. The string is declared with the u8 string literal syntax introduced in C11. Because the string is null-terminated, -1 is passed as the second parameter; otherwise, that parameter must be the number of code units in the string.

If the decoder detects a malformed character, it reports U+FFFD (the replacement character) in its place along with a negative return value, and decoding can continue with the next call.

const uint8_t string[] = u8"I 🕵️."; // I spy
int32_t index = 0;
for (;;)
{
    uchar cp = 0x0;
    int32_t r = utf8_decode(string, -1, &index, &cp);
    if (r == 0)
    {
        break; // end of string
    }
    else if (r < 0)
    {
        // malformed character; returned as U+FFFD
    }
    printf("U+%04X\n", (unsigned int)cp); // cast so the argument matches %X
}

UTF-16 and UTF-32 are processed identically, except you’d use the utf16_decode and utf32_decode functions instead.
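
To make the return-value contract above concrete, here is a minimal standalone sketch of a UTF-8 decoder with the same calling shape. It is not Charisma's implementation. The name sketch_utf8_decode, the use of uint32_t in place of uchar, the positive return value (assumed here to be the number of code units consumed), and the choice to skip exactly one code unit after an error are all assumptions made for illustration.

```c
#include <stdint.h>

/* Hypothetical helper for illustration only; not part of Charisma.
 * Returns 0 at the end of the string, -1 for a malformed sequence
 * (storing U+FFFD in *cp), otherwise the number of code units consumed. */
static int32_t sketch_utf8_decode(const uint8_t *s, int32_t len,
                                  int32_t *index, uint32_t *cp)
{
    int32_t i = *index;

    /* A negative length means "stop at the null terminator". */
    if (len < 0 ? s[i] == 0 : i >= len)
        return 0;

    uint8_t lead = s[i];
    int32_t extra = 0;     /* continuation code units expected */
    uint32_t value = lead; /* scalar value being assembled */
    uint32_t min = 0;      /* smallest value legal for this length */
    int ok = 1;

    if (lead < 0x80) { /* ASCII: the value is the code unit itself */ }
    else if ((lead & 0xE0) == 0xC0) { extra = 1; value = lead & 0x1F; min = 0x80; }
    else if ((lead & 0xF0) == 0xE0) { extra = 2; value = lead & 0x0F; min = 0x800; }
    else if ((lead & 0xF8) == 0xF0) { extra = 3; value = lead & 0x07; min = 0x10000; }
    else ok = 0; /* stray continuation unit or invalid lead unit */

    for (int32_t k = 1; ok && k <= extra; k++)
    {
        if (len >= 0 && i + k >= len) { ok = 0; break; } /* truncated */
        uint8_t b = s[i + k];
        /* A null terminator fails this check, so the loop never reads past it. */
        if ((b & 0xC0) != 0x80) ok = 0;
        else value = (value << 6) | (b & 0x3F);
    }

    /* Reject overlong forms, surrogates, and values above U+10FFFF. */
    if (ok && (value < min || value > 0x10FFFF ||
               (value >= 0xD800 && value <= 0xDFFF)))
        ok = 0;

    if (!ok)
    {
        *cp = 0xFFFD;   /* replacement character */
        *index = i + 1; /* simplification: resume at the next code unit */
        return -1;
    }
    *cp = value;
    *index = i + 1 + extra;
    return 1 + extra;
}
```

Called in a loop like the example above, the sketch yields U+0049, U+0020, U+1F575, U+FE0F and U+002E for the string "I 🕵️." before returning 0.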

Encoding Scalar Values

Unicode scalar values can be encoded into any character encoding form. This example encodes HAMBURGER 🍔 (U+1F354) as UTF-8.

The minimum size of the destination buffer depends on which encoding form is being targeted. Here, the output buffer holds 4 code units because that’s the longest sequence UTF-8 needs for a single Unicode scalar value. For UTF-16, the buffer must hold at least 2 code units, and for UTF-32, at least 1.

uint8_t dest[4] = {0};
int32_t dest_len = utf8_encode(U'🍔', dest);

UTF-16 and UTF-32 characters are encoded identically, except you’d use the utf16_encode and utf32_encode functions instead.
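
The buffer sizes above follow from how each encoding form lays out a scalar value. The sketch below is a standalone illustration, not Charisma's implementation. The names sketch_utf8_encode and sketch_utf16_encode, the uint32_t parameter type, and the convention of returning the number of code units written (0 for a value that is not a Unicode scalar value) are assumptions.

```c
#include <stdint.h>

/* Hypothetical helpers for illustration only; not part of Charisma. */

/* A scalar value is any code point up to U+10FFFF except the surrogates. */
static int sketch_is_scalar(uint32_t cp)
{
    return cp <= 0x10FFFF && !(cp >= 0xD800 && cp <= 0xDFFF);
}

/* Writes 1 to 4 code units into dest; returns how many were written. */
static int32_t sketch_utf8_encode(uint32_t cp, uint8_t dest[4])
{
    if (!sketch_is_scalar(cp))
        return 0;
    if (cp < 0x80)
    {
        dest[0] = (uint8_t)cp;
        return 1;
    }
    if (cp < 0x800)
    {
        dest[0] = (uint8_t)(0xC0 | (cp >> 6));
        dest[1] = (uint8_t)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000)
    {
        dest[0] = (uint8_t)(0xE0 | (cp >> 12));
        dest[1] = (uint8_t)(0x80 | ((cp >> 6) & 0x3F));
        dest[2] = (uint8_t)(0x80 | (cp & 0x3F));
        return 3;
    }
    dest[0] = (uint8_t)(0xF0 | (cp >> 18));
    dest[1] = (uint8_t)(0x80 | ((cp >> 12) & 0x3F));
    dest[2] = (uint8_t)(0x80 | ((cp >> 6) & 0x3F));
    dest[3] = (uint8_t)(0x80 | (cp & 0x3F));
    return 4;
}

/* Writes 1 or 2 code units into dest; returns how many were written. */
static int32_t sketch_utf16_encode(uint32_t cp, uint16_t dest[2])
{
    if (!sketch_is_scalar(cp))
        return 0;
    if (cp < 0x10000)
    {
        dest[0] = (uint16_t)cp;
        return 1;
    }
    cp -= 0x10000; /* 20 bits remain, split across a surrogate pair */
    dest[0] = (uint16_t)(0xD800 | (cp >> 10));
    dest[1] = (uint16_t)(0xDC00 | (cp & 0x3FF));
    return 2;
}
```

For U+1F354 the UTF-8 sketch writes the four code units F0 9F 8D 94, and the UTF-16 sketch writes the surrogate pair D83C DF54.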