Function

uni_next

Decode a scalar value.

Since v1.0

unistat uni_next(const void *text, unisize text_len, uniattr text_attr, unisize *index, unichar *cp)

Parameters

text	in	Character sequence to decode.
text_len	in	Number of code units in `text`, or a negative integer if `text` is null-terminated.
text_attr	in	Attributes of `text`.
index	inout	Code unit offset in `text`.
cp	out	Decoded scalar value.

Return Value

UNI_OK	If the scalar was successfully decoded.
UNI_DONE	If the end of `text` was reached.
UNI_BAD_ENCODING	If `text` is malformed; this is never returned if `text_attr` has UNI_TRUST.
UNI_BAD_OPERATION	If `text`, `index`, or `cp` is NULL.

Discussion

Decodes one Unicode scalar value from text at code unit index and writes the result to cp. The index parameter is updated by the implementation to refer to the code unit beginning the next scalar.

The number of code units in text is specified by text_len and its encoding is specified by text_attr. If text_len is negative, text is assumed to be null-terminated.

This function returns UNI_DONE when iteration has reached the end of text; otherwise, it returns UNI_OK, indicating that a scalar was decoded and iteration can continue. If the implementation detects that text is malformed, it returns UNI_BAD_ENCODING.
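The contract described above (advance index past each decoded scalar, signal the end of text, report malformed input) can be illustrated with a small self-contained sketch. This is not Unicorn's implementation: sk_next, sk_count, and the SK_* status codes are hypothetical names invented for this illustration, the sketch handles UTF-8 only, and it uses long where the real function uses unisize.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical status codes for this sketch. They mirror the documented
   return values by name only and are NOT the library's definitions. */
typedef enum { SK_OK, SK_DONE, SK_BAD_ENCODING, SK_BAD_OPERATION } sk_stat;

/* Hypothetical UTF-8-only stand-in that follows the documented contract. */
static sk_stat sk_next(const char *text, long text_len, long *index, uint32_t *cp)
{
    /* Rejecting a negative index is a choice made for this sketch only. */
    if (text == NULL || index == NULL || cp == NULL || *index < 0)
        return SK_BAD_OPERATION;

    /* A negative length means the text is null-terminated. */
    const long len = (text_len < 0) ? (long)strlen(text) : text_len;
    const unsigned char *s = (const unsigned char *)text;
    const long i = *index;

    if (i >= len)
        return SK_DONE; /* nothing left to decode */

    const unsigned char lead = s[i];
    int extra;      /* number of continuation bytes expected */
    uint32_t value; /* scalar value accumulated so far */
    uint32_t min;   /* smallest value allowed for this sequence length */

    if (lead < 0x80)                { extra = 0; value = lead;        min = 0x0;     }
    else if ((lead & 0xE0) == 0xC0) { extra = 1; value = lead & 0x1F; min = 0x80;    }
    else if ((lead & 0xF0) == 0xE0) { extra = 2; value = lead & 0x0F; min = 0x800;   }
    else if ((lead & 0xF8) == 0xF0) { extra = 3; value = lead & 0x07; min = 0x10000; }
    else return SK_BAD_ENCODING;    /* stray continuation or invalid lead byte */

    if (i + extra >= len)
        return SK_BAD_ENCODING;     /* sequence is cut off by the end of text */

    for (int k = 1; k <= extra; k++) {
        if ((s[i + k] & 0xC0) != 0x80)
            return SK_BAD_ENCODING; /* expected a 10xxxxxx continuation byte */
        value = (value << 6) | (uint32_t)(s[i + k] & 0x3F);
    }

    /* Reject overlong forms, surrogates, and values beyond U+10FFFF. */
    if (value < min || value > 0x10FFFF || (value >= 0xD800 && value <= 0xDFFF))
        return SK_BAD_ENCODING;

    *cp = value;
    *index = i + extra + 1; /* now refers to the start of the next scalar */
    return SK_OK;
}

/* Counts scalars by calling sk_next until it stops returning SK_OK.
   Returns -1 if decoding stopped for any reason other than end of text. */
static long sk_count(const char *text, long text_len)
{
    assert(text != NULL); /* a NULL text is a caller error, not a data error */
    long i = 0, n = 0;
    uint32_t cp;
    sk_stat r;
    while ((r = sk_next(text, text_len, &i, &cp)) == SK_OK)
        n++;
    return (r == SK_DONE) ? n : -1;
}
```

With this sketch, decoding "A\xC3\xA9" from index 0 yields U+0041 and leaves index at 1; the next call yields U+00E9 and leaves index at 3; a third call returns SK_DONE and leaves index unchanged.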

The index parameter must refer to a code point boundary; otherwise, the behavior is undefined.
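For UTF-8 text, one way to respect this requirement when an offset comes from an untrusted source is to check that the code unit at index is not a continuation byte (bit pattern 10xxxxxx) before decoding. The helpers below are a hypothetical sketch and not part of Unicorn; the names sk_is_utf8_boundary and sk_align_to_boundary are invented here, the code assumes well-formed UTF-8, and other encodings need a different test.

```c
#include <assert.h>
#include <stddef.h>

/* Returns 1 if index is the end of the text or refers to a byte that starts
   a UTF-8 sequence, and 0 otherwise. Assumes text is well-formed UTF-8. */
static int sk_is_utf8_boundary(const char *text, size_t len, size_t index)
{
    assert(text != NULL);
    if (index >= len)
        return index == len; /* the end of text counts as a boundary */
    /* Continuation bytes always look like 10xxxxxx. */
    return ((unsigned char)text[index] & 0xC0) != 0x80;
}

/* Moves index backwards to the nearest boundary at or before it. */
static size_t sk_align_to_boundary(const char *text, size_t len, size_t index)
{
    if (index > len)
        index = len; /* clamp offsets that point past the end */
    while (index > 0 && !sk_is_utf8_boundary(text, len, index))
        index--;
    return index;
}
```

For the three-byte string "A\xC3\xA9", offsets 0, 1, and 3 are boundaries, while offset 2 falls inside the two-byte sequence for U+00E9 and is aligned back to offset 1.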

Examples

This example prints each Unicode scalar value from a text string encoded as UTF-8.

#include <unicorn.h>
#include <stdio.h>

int main(void)
{
    const char str[] = u8"I 🕵️."; // I spy
    unisize i = 0;
    for (;;)
    {
        unichar cp;
        unistat r = uni_next(str, -1, UNI_UTF8, &i, &cp);
        if (r == UNI_DONE)
        {
            break;
        }
        else if (r == UNI_BAD_ENCODING)
        {
            fputs("malformed character\n", stderr);
            break; // stop; this example does not attempt recovery
        }
        else
        {
            printf("U+%04lX\n", (unsigned long)cp); // print scalar
        }
    }
    return 0;
}
