Function

uni_next

Decode a scalar value.

Since v1.0
unistat uni_next(const void *text, unisize text_len, uniattr text_attr, unisize *index, unichar *cp)

Parameters

text in

Character sequence to decode.

text_len in

Number of code units in text, or a negative integer if text is null-terminated.

text_attr in

Attributes of text.

index inout

Code unit offset in text at which to begin decoding. On success, it is updated to the offset of the next scalar value.

cp out

Decoded scalar value.

Return Value

UNI_OK

If a scalar value was successfully decoded and written to cp.

UNI_DONE

If the end of text was reached.

UNI_BAD_ENCODING

If text is malformed. This is never returned if text_attr includes UNI_TRUST.

UNI_BAD_OPERATION

If text, index, or cp is NULL.

Discussion

Decodes one Unicode scalar value from text, starting at the code unit offset stored in index, and writes it to cp. On success, the implementation updates index to the offset of the code unit that begins the next scalar value.
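For UTF-8 text, how far index advances after a successful decode is determined by the lead byte of the sequence: 1 to 4 code units. The helper below is a hypothetical illustration of that UTF-8 rule only; utf8_sequence_length is not part of Unicorn and is not how uni_next is implemented.

```c
#include <stddef.h>

// Hypothetical helper (not part of Unicorn): returns how many UTF-8 code
// units (bytes) a well-formed sequence starting with `lead` occupies,
// or 0 if `lead` cannot begin a well-formed sequence.
static size_t utf8_sequence_length(unsigned char lead)
{
    if (lead < 0x80)
        return 1; // 0xxxxxxx: ASCII
    if (lead >= 0xC2 && lead <= 0xDF)
        return 2; // 110xxxxx (0xC0 and 0xC1 would only encode overlongs)
    if (lead >= 0xE0 && lead <= 0xEF)
        return 3; // 1110xxxx
    if (lead >= 0xF0 && lead <= 0xF4)
        return 4; // 11110xxx, limited so the result stays <= U+10FFFF
    return 0;     // continuation byte (10xxxxxx) or invalid lead byte
}
```

For example, U+1F575 is encoded as the bytes F0 9F 95 B5, so decoding it would advance index by utf8_sequence_length(0xF0), which is 4.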

The number of code units in text is specified by text_len and its encoding is specified by text_attr. If text_len is negative, then text is assumed to be null-terminated.
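The two length conventions lead to different end-of-text checks. The sketch below shows them for UTF-8 (char) text. The name at_end_of_text is hypothetical, and it assumes that with an explicit length an embedded null code unit is ordinary content rather than a terminator; the documentation above does not state this.

```c
#include <stdbool.h>

// Hypothetical helper (not part of Unicorn) mirroring the text_len rules:
// a negative length means the text ends at the first null code unit;
// otherwise it ends after exactly `len` code units.
static bool at_end_of_text(const char *text, long len, long index)
{
    if (len < 0)
        return text[index] == '\0'; // null-terminated: stop at the terminator
    return index >= len;            // explicit length (assumption: embedded nulls are content)
}
```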

This function returns UNI_DONE when iteration has reached the end of text; otherwise, it returns UNI_OK, indicating that a scalar value was decoded into cp. If the implementation detects that text is malformed, it returns UNI_BAD_ENCODING.
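To make the three outcomes concrete, here is a minimal sketch of a uni_next-style decoder for null-terminated UTF-8. Everything in it is hypothetical: utf8_next and the SKETCH_* status codes only model the documented contract and are not Unicorn's definitions or implementation. It treats overlong forms, surrogates (U+D800 to U+DFFF), values above U+10FFFF, stray continuation bytes, and truncated sequences as malformed.

```c
#include <stddef.h>
#include <stdint.h>

// Hypothetical status codes modelled on UNI_OK, UNI_DONE, and UNI_BAD_ENCODING.
typedef enum { SKETCH_OK, SKETCH_DONE, SKETCH_BAD_ENCODING } sketch_status;

// Decodes the scalar at *index into *cp and advances *index past it.
// In this sketch, *index is left unchanged when the text is malformed.
static sketch_status utf8_next(const unsigned char *s, size_t *index, uint32_t *cp)
{
    size_t i = *index;
    unsigned char b0 = s[i];
    if (b0 == 0)
        return SKETCH_DONE; // reached the null terminator

    if (b0 < 0x80) // single-byte (ASCII) sequence
    {
        *cp = b0;
        *index = i + 1;
        return SKETCH_OK;
    }

    uint32_t c, min; // min: smallest scalar that needs n bytes (rejects overlongs)
    size_t n;        // total bytes in the sequence
    if ((b0 & 0xE0) == 0xC0)      { c = b0 & 0x1F; n = 2; min = 0x80; }
    else if ((b0 & 0xF0) == 0xE0) { c = b0 & 0x0F; n = 3; min = 0x800; }
    else if ((b0 & 0xF8) == 0xF0) { c = b0 & 0x07; n = 4; min = 0x10000; }
    else
        return SKETCH_BAD_ENCODING; // stray continuation byte or invalid lead

    for (size_t k = 1; k < n; k++)
    {
        unsigned char b = s[i + k];
        if ((b & 0xC0) != 0x80)
            return SKETCH_BAD_ENCODING; // truncated sequence (also stops at the terminator)
        c = (c << 6) | (b & 0x3F);
    }
    if (c < min || c > 0x10FFFF || (c >= 0xD800 && c <= 0xDFFF))
        return SKETCH_BAD_ENCODING; // overlong, out of range, or surrogate
    *cp = c;
    *index = i + n;
    return SKETCH_OK;
}
```

Leaving the offset unchanged on failure is a choice made for this sketch. The documentation above does not say where uni_next leaves index after UNI_BAD_ENCODING, so a caller that keeps looping after that status should not assume the offset has moved.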

The index parameter must refer to a code point boundary; otherwise, the behavior is undefined.
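For UTF-8, an offset can be checked, or moved back, to a code point boundary by looking for continuation bytes (bit pattern 10xxxxxx), which never begin a code point. The helpers below are a hypothetical sketch for UTF-8 only; utf8_is_boundary and utf8_align_back are not part of Unicorn.

```c
#include <stdbool.h>
#include <stddef.h>

// Hypothetical helper (not part of Unicorn): an offset is on a code point
// boundary exactly when the byte there is not a continuation byte.
static bool utf8_is_boundary(const unsigned char *text, size_t index)
{
    return (text[index] & 0xC0) != 0x80;
}

// Hypothetical helper: moves an arbitrary offset back to the start of the
// code point that contains it, so it meets the boundary requirement.
static size_t utf8_align_back(const unsigned char *text, size_t index)
{
    while (index > 0 && !utf8_is_boundary(text, index))
        index--;
    return index;
}
```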

Examples

This example prints each Unicode scalar value from a text string encoded as UTF-8.

#include <unicorn.h>
#include <stdio.h>

int main(void)
{
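    const char str[] = u8"I 🕵️."; // I spy
    unisize i = 0;
    for (;;)
    {
        unichar cp;
        unistat r = uni_next(str, -1, UNI_UTF8, &i, &cp);
        if (r == UNI_DONE)
        {
            break; // reached the null terminator
        }
        else if (r == UNI_OK)
        {
            printf("U+%04lX\n", (unsigned long)cp); // print scalar
        }
        else
        {
            // UNI_BAD_ENCODING or UNI_BAD_OPERATION: only UNI_OK
            // guarantees a decoded scalar in cp, so stop here
            fprintf(stderr, "failed to decode text\n");
            return 1;
        }
    }
    return 0;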
}