Function

uni_normcmp

Compare strings for canonical equivalence.

Since v1.0
unistat uni_normcmp(
    const void *s1, unisize s1_len, uniattr s1_attr,
    const void *s2, unisize s2_len, uniattr s2_attr,
    bool *result);

Parameters 🔗

s1 in

First source text.

s1_len in

Number of code units in s1 or -1 if s1 is null-terminated.

s1_attr in

Attributes of s1.

s2 in

Second source text.

s2_len in

Number of code units in s2 or -1 if s2 is null-terminated.

s2_attr in

Attributes of s2.

result out

Set to true if s1 and s2 are canonically equivalent; else false.

Return Value 🔗

UNI_OK

If the strings were compared successfully.

UNI_BAD_OPERATION

If s1 or s2 is null, if s1_len or s2_len is negative (other than -1, which denotes a null-terminated string), or if result is null.

UNI_BAD_ENCODING

If s1 or s2 is not well-formed (checks are omitted if the corresponding uniattr has UNI_TRUST).

UNI_NO_MEMORY

If dynamic memory allocation failed.

Discussion 🔗

This function checks whether s1 and s2 are canonically equivalent. That is, it checks whether both strings encode the same sequence of user-perceived characters (graphemes), regardless of how those characters are composed. The behavior of this function is identical to calling uni_norm with UNI_NFD on both strings and then comparing the resulting code points.

The implementation is optimized to normalize the strings incrementally while simultaneously comparing them. This approach is more efficient when it's unknown whether the input strings are normalized. If it's known in advance that both strings are in the same normalization form, then they can be compared directly with memcmp or strcmp.

The implementation is designed to be fast and to avoid dynamic memory allocation whenever possible. Typically, allocation is only triggered by unnaturally long combining character sequences, such as Zalgo text; real-world text rarely triggers it.

Examples 🔗

This example compares two strings for canonical equivalence. Conceptually, the implementation normalizes both strings, performs the comparison, and reports the result. This approach is recommended when strings are compared for one-off equality. If strings are compared repeatedly, then it’s recommended to normalize them with uni_norm and cache the result for the comparisons.

#include <unicorn.h>
#include <stdio.h>
#include <stdbool.h>

int main(void)
{
    const char *s1 = u8"ma\u0301scara"; // 'a' + U+0301 = á (decomposed)
    const char *s2 = u8"m\u00E1scara";  //       U+00E1 = á (precomposed)
    bool is_equal;

    if (uni_normcmp(s1, -1, UNI_UTF8,
                    s2, -1, UNI_UTF8, &is_equal) != UNI_OK)
    {
        puts("failed to normalize and compare strings");
        return 1;
    }

    printf("%s\n", is_equal ? "equal" : "not equal");
    return 0;
}