Function

utf8_decode

Decode UTF-8.

Since v1.0
int32_t utf8_decode(
const uint8_t *text, int32_t length, int32_t *index, uchar *c)

Parameters

text in

UTF-8 encoded string.

length in

Length of text in code units, or -1 if text is null-terminated.

index inout

Code unit index of the character to decode on input; code unit index of the next character on output.

c out

Decoded Unicode scalar value.

Return Value

Returns a positive integer when a well-formed Unicode character is decoded, writing the decoded scalar value to c. Returns a negative integer when a malformed character is encountered, writing the Unicode Replacement Character (U+FFFD) to c. Returns zero when the end of text is reached, writing NUL (U+0000) to c.

Discussion

Decodes the Unicode scalar value from the UTF-8 encoded text at code unit index and writes it to c. The number of code units in text is specified by length, which, if negative, indicates that text is null-terminated. The implementation updates index to the code unit index of the next Unicode scalar value.

If the character at code unit index is malformed, the Unicode Replacement Character (U+FFFD) is written to c, and the implementation safely advances beyond it.
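The calling pattern described above can be sketched as a loop that runs until the function returns zero. This is a minimal illustration, not the library's code: the uchar typedef is assumed to be a 32-bit unsigned integer, the body of utf8_decode is a stand-in written only so the sketch compiles on its own (the real implementation may return different positive or negative values and may advance differently past malformed input), and count_scalars is a hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t uchar; /* assumption: the library's scalar type is 32-bit */

/* Stand-in for the documented function (NOT the library's implementation).
   It follows the documented contract: >0 well-formed, <0 malformed, 0 at end. */
int32_t utf8_decode(const uint8_t *text, int32_t length, int32_t *index, uchar *c)
{
    int32_t i = *index;
    if ((length >= 0 && i >= length) || (length < 0 && text[i] == 0)) {
        *c = 0;
        return 0; /* end of text: NUL is written to c */
    }
    uint8_t b0 = text[i++];
    int32_t need;
    uchar cp, min;
    if (b0 < 0x80) { *index = i; *c = b0; return 1; }
    else if ((b0 & 0xE0) == 0xC0) { need = 1; cp = b0 & 0x1F; min = 0x80; }
    else if ((b0 & 0xF0) == 0xE0) { need = 2; cp = b0 & 0x0F; min = 0x800; }
    else if ((b0 & 0xF8) == 0xF0) { need = 3; cp = b0 & 0x07; min = 0x10000; }
    else { need = 0; cp = 0; min = 0; } /* stray continuation or invalid lead byte */
    int ok = need > 0;
    for (int32_t k = 0; ok && k < need; k++) {
        /* Stop at end of text or at a byte that is not a continuation byte. */
        if ((length >= 0 && i >= length) || (text[i] & 0xC0) != 0x80) { ok = 0; break; }
        cp = (cp << 6) | (uchar)(text[i++] & 0x3F);
    }
    /* Reject overlong forms, surrogates, and values above U+10FFFF. */
    if (ok && (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))) ok = 0;
    *index = i; /* always moves forward by at least one code unit */
    *c = ok ? cp : 0xFFFD;
    return ok ? 1 : -1;
}

/* Hypothetical helper: counts decoded characters (malformed ones included)
   and reports how many of them were malformed. */
int32_t count_scalars(const uint8_t *text, int32_t length, int32_t *malformed)
{
    assert(text != NULL && malformed != NULL);
    int32_t index = 0, count = 0, r;
    uchar c;
    *malformed = 0;
    while ((r = utf8_decode(text, length, &index, &c)) != 0) {
        if (r < 0) (*malformed)++; /* c holds U+FFFD here */
        count++;
    }
    return count;
}
```

Passing -1 as length to count_scalars walks a null-terminated buffer; passing an explicit length stops at that code unit instead. Because index is advanced on every nonzero return, the loop terminates even on malformed input.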

See Also