Function

uni_normcmp

Canonical equivalence.

Since v1.0
unistat uni_normcmp(
    const void *s1, unisize s1_len, uniattr s1_attr,
    const void *s2, unisize s2_len, uniattr s2_attr,
    bool *result)

Parameters

s1 in

First source text.

s1_len in

Number of code units in s1 or -1 if s1 is null terminated.

s1_attr in

Attributes of s1.

s2 in

Second source text.

s2_len in

Number of code units in s2 or -1 if s2 is null terminated.

s2_attr in

Attributes of s2.

result out

Set to true if s1 and s2 are canonically equivalent; else false.

Return Value

UNI_OK

If the strings were compared successfully.

UNI_BAD_OPERATION

If s1 or s2 is NULL, if s1_len or s2_len is negative (other than -1, which indicates a null-terminated string), or if result is NULL.

UNI_BAD_ENCODING

If s1 or s2 is not well-formed (checks are omitted if the corresponding uniattr has UNI_TRUST).

UNI_NO_MEMORY

If dynamic memory allocation failed.

Discussion

Checks whether s1 and s2 are canonically equivalent, that is, whether the graphemes of both strings are the same. The result is the same as normalizing both strings with uni_norm using UNI_NFD and then comparing them code point by code point.

The implementation normalizes the strings incrementally while simultaneously comparing them. This is more efficient when it's unknown whether the text is already normalized. If it's known in advance that both strings are normalized to the same form, then it's faster to compare them directly with memcmp or strcmp.

The implementation strives to be highly performant and to avoid dynamic memory allocation when possible. Typically, memory is allocated only for unnaturally long combining character sequences, such as Zalgo text. It's rare for real-world text to trigger memory allocation.