Function

uni_nextbrk

Compute next boundary.

Since v1.0
unistat uni_nextbrk(
    unibreak boundary,
    const void *text,
    unisize text_len,
    uniattr text_attr,
    unisize *index)

Parameters

boundary in

Boundary to detect.

text in

The text to segment.

text_len in

Number of code units in text, or -1 if text is null-terminated.

text_attr in

Attributes of text.

index inout

On input, a code point boundary expressed as a code unit offset in text. On output, the code unit offset of the next boundary.

Return Value

UNI_OK

If the next boundary was found and index was updated.

UNI_DONE

If index is beyond the last character of text.

UNI_BAD_OPERATION

If text or index is NULL.

UNI_BAD_ENCODING

If text is malformed; this is never returned if text_attr has UNI_TRUST.

Discussion

Compute the next boundary in text, starting from a known code point boundary given by the code unit offset index. On success, the implementation sets index to the code unit offset of the next boundary.

Examples

This example iterates over the extended grapheme clusters of a UTF-8 encoded string and prints the code unit offset of the boundary that follows each cluster.

#include <unicorn.h>
#include <stdio.h>

int main(void)
{
    const char *string = u8"Hi, 世界";
    unisize index = 0;
    while (uni_nextbrk(UNI_GRAPHEME, string, -1, UNI_UTF8, &index) == UNI_OK)
    {
        printf("%d\n", (int)index); // prints '1', '2', '3', '4', '7', '10'
    }
    return 0;
}