Function
uni_normcmp
Compare strings for canonical equivalence.
Parameters
| s1 | in | First source text. |
| s1_len | in | Number of code units in s1. |
| s1_attr | in | Attributes of s1. |
| s2 | in | Second source text. |
| s2_len | in | Number of code units in s2. |
| s2_attr | in | Attributes of s2. |
| result | out | Set to |
Return Value
| UNI_OK | If the strings were compared successfully. |
| UNI_BAD_OPERATION | If |
| UNI_BAD_ENCODING | If |
| UNI_NO_MEMORY | If dynamic memory allocation failed. |
Discussion
This function checks if s1 and s2 are canonically equivalent. That is, it checks if the graphemes of both strings are the same. The behavior of this function is identical to calling uni_norm with UNI_NFD on both strings followed by a code point comparison.
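The "decompose, then compare code points" idea can be illustrated with a toy sketch. This is not how the library is implemented: the three-entry table, the TOY_MAX limit, and the names toy_decompose and toy_equivalent are all hypothetical, and the sketch omits parts of real NFD such as canonical reordering of combining marks and Hangul decomposition.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical toy table: a few precomposed code points and their
   canonical decompositions. Real NFD data comes from the Unicode
   Character Database and is far larger. */
struct toy_decomp { uint32_t composed, base, mark; };

static const struct toy_decomp toy_table[] = {
    { 0x00E1, 0x0061, 0x0301 }, /* U+00E1 -> a + combining acute */
    { 0x00E9, 0x0065, 0x0301 }, /* U+00E9 -> e + combining acute */
    { 0x00F1, 0x006E, 0x0303 }, /* U+00F1 -> n + combining tilde */
};

#define TOY_MAX 32 /* longest input this sketch accepts */

/* Decompose src into dst using the toy table. dst must have room for
   2 * n code points. Returns the number of code points written. */
static size_t toy_decompose(const uint32_t *src, size_t n, uint32_t *dst)
{
    size_t out = 0;
    for (size_t i = 0; i < n; i++) {
        bool found = false;
        for (size_t j = 0; j < sizeof toy_table / sizeof toy_table[0]; j++) {
            if (src[i] == toy_table[j].composed) {
                dst[out++] = toy_table[j].base;
                dst[out++] = toy_table[j].mark;
                found = true;
                break;
            }
        }
        if (!found)
            dst[out++] = src[i]; /* no decomposition known: copy as-is */
    }
    return out;
}

/* Decompose both inputs, then compare the results code point by code point. */
static bool toy_equivalent(const uint32_t *a, size_t na,
                           const uint32_t *b, size_t nb)
{
    uint32_t da[2 * TOY_MAX], db[2 * TOY_MAX];
    if (na > TOY_MAX || nb > TOY_MAX)
        return false; /* limitation of the sketch, not a real answer */
    size_t la = toy_decompose(a, na, da);
    size_t lb = toy_decompose(b, nb, db);
    return la == lb && memcmp(da, db, la * sizeof da[0]) == 0;
}
```

With this sketch, the sequences { 'm', 'a', 0x0301, 's' } and { 'm', 0x00E1, 's' } compare as equivalent, while { 'm', 'a', 's' } does not match either of them.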
The implementation normalizes the strings incrementally while simultaneously comparing them. This is the more efficient approach when it’s unknown whether the input strings are normalized. If it’s known in advance that both strings are already normalized to the same form, they can be compared directly with memcmp or strcmp.
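To see why a direct byte comparison is only safe for strings that are already normalized, consider the following sketch. It uses only the C standard library; the helper name bytes_equal and the two sample strings are illustrative assumptions. Both strings spell the same word, but one stores the accented letter as 'a' followed by U+0301 and the other as the single code point U+00E1.

```c
#include <stdbool.h>
#include <string.h>

/* The same word in UTF-8, written with explicit byte escapes. */
static const char decomposed[]  = "ma\xCC\x81scara"; /* 'a' + U+0301 */
static const char precomposed[] = "m\xC3\xA1scara";  /* U+00E1 */

/* Byte-for-byte equality. This answers "are the encodings identical?",
   which matches canonical equivalence only when both inputs are already
   in the same normalization form. */
static bool bytes_equal(const char *x, const char *y)
{
    return strcmp(x, y) == 0;
}
```

Here bytes_equal(decomposed, precomposed) is false because the byte sequences differ, even though the two strings are canonically equivalent. A byte comparison becomes a valid equivalence test only once both strings have been brought into the same normalization form.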
The implementation strives to be highly performant and avoid dynamic memory allocation when possible. Typically, memory allocation will only be performed for unnaturally long combining character sequences, like Zalgo text. It’s rare for real-world text to trigger memory allocation.
Examples
This example compares two strings for canonical equivalence. Conceptually, the implementation normalizes both strings, performs the comparison, and reports the result. This approach is recommended when strings are compared for one-off equality. If strings are compared repeatedly, then it’s recommended to normalize them with uni_norm and cache the result for the comparisons.
#include <unicorn.h>
#include <stdio.h>
#include <stdbool.h>
int main(void)
{
    const char *s1 = u8"ma\u0301scara"; // 'a' + U+0301 = á (decomposed)
    const char *s2 = u8"m\u00E1scara";  // U+00E1 = á (precomposed)
    bool is_equal;
    if (uni_normcmp(s1, -1, UNI_UTF8,
                    s2, -1, UNI_UTF8, &is_equal) != UNI_OK)
    {
        puts("failed to normalize and compare strings");
        return 1;
    }
    printf("%s", is_equal ? "equal" : "not equal");
    return 0;
}