Function
uni_norm
Normalize text.
Parameters
| form | in | Normalization form. |
| src | in | Input text. |
| src_len | in | Number of code units in src. |
| src_attr | in | Attributes of src. |
| dst | out | Output string; can be NULL. |
| dst_len | inout | Code unit capacity of dst on input; number of code units in the normalized text on output. |
| dst_attr | in | Attributes of dst. |
Return Value
| UNI_OK | On success. |
| UNI_BAD_OPERATION | If |
| UNI_BAD_ENCODING | If |
| UNI_NO_SPACE | If the capacity of dst is insufficient. |
| UNI_NO_MEMORY | If dynamic memory allocation failed. |
| UNI_FEATURE_DISABLED | If Unicorn was built without support for normalizing to |
Discussion
Normalizes src into the normalization form specified by form and writes the result to dst. If dst is not NULL, the function writes to dst_len the total number of code units written to dst. If the capacity of dst is insufficient, UNI_NO_SPACE is returned; otherwise, UNI_OK is returned.
If dst is NULL, then dst_len must be zero. In this case, the function writes to dst_len the number of code units in the fully normalized text and returns UNI_OK. Call the function this way to compute the size needed for the destination buffer, then call it again with a sufficiently sized buffer.
Examples
This example normalizes the input string to Normalization Form D. This form is ideal for in-memory string comparison because it is quick to compute. For persistent storage or transmission, Normalization Form C is preferred.
#include <unicorn.h>
#include <stdio.h>
int main(void)
{
    const char *in = u8"Åström";
    char out[32];
    unisize outlen = sizeof(out); // capacity on input, code units written on output

    if (uni_norm(UNI_NFD, in, -1, UNI_UTF8, out, &outlen, UNI_UTF8) != UNI_OK)
    {
        // something went wrong
        return 1;
    }

    // The precision argument of "%.*s" must be an int.
    printf("%.*s", (int)outlen, out);
    return 0;
}