Function
uni_norm
Normalize text.
Parameters
| form | in | Normalization form. |
| src | in | Input text. |
| src_len | in | Number of code units in src, or -1 if src is null-terminated. |
| src_attr | in | Attributes of src. |
| dst | out | Output string; can be null |
| dst_len | inout | Code unit capacity of dst on input; number of code units written to dst on output. |
| dst_attr | in | Attributes of dst. |
Return Value
| UNI_OK | On success. |
| UNI_BAD_OPERATION | If |
| UNI_BAD_ENCODING | If |
| UNI_NO_SPACE | If the capacity of dst is insufficient. |
| UNI_NO_MEMORY | If dynamic memory allocation failed. |
| UNI_FEATURE_DISABLED | If Unicorn was built without support for normalizing to form. |
Discussion
This function normalizes src and writes the result to the dst buffer.
The implementation writes to dst_len the total number of code units written to dst. If the capacity of dst is insufficient, then UNI_NO_SPACE is returned.
If dst is null and dst_len is zero, then the function writes to dst_len the number of code units in the fully normalized text and returns UNI_OK. The function can be called this way to first compute the total size of the destination buffer, then called again with a sufficiently sized buffer.
If src_len is -1, then src is assumed to be null-terminated.
Examples
This example normalizes the input string to Normalization Form D. This form is ideal for in-memory string comparison because it is quick to compute. For persistent storage or transmission, Normalization Form C is preferred.
#include <unicorn.h>
#include <stdio.h>

int main(void)
{
    const char *in = u8"Åström";
    char out[32];
    unisize outlen = sizeof(out); // capacity in; code units written out

    if (uni_norm(UNI_NFD, in, -1, UNI_UTF8, out, &outlen, UNI_UTF8) != UNI_OK)
    {
        // something went wrong
        return 1;
    }

    // "%.*s" expects an int precision, so cast outlen explicitly
    printf("%.*s", (int)outlen, out);
    return 0;
}