Function

uni_norm

Normalize text.

Since v1.0
unistat uni_norm(uninormform form, const void *src, unisize src_len, uniattr src_attr,
                 void *dst, unisize *dst_len, uniattr dst_attr)

Parameters

form in

Normalization form.

src in

Input text.

src_len in

Number of code units in src, or -1 if src is null-terminated.

src_attr in

Attributes of src.

dst out

Output buffer; can be null.

dst_len inout

Code unit capacity of dst on input; number of code units written to dst on output.

dst_attr in

Attributes of dst.

Return Value

UNI_OK

On success.

UNI_BAD_OPERATION

If src is null, if *dst_len is negative, or if dst is null and *dst_len is greater than zero.

UNI_BAD_ENCODING

If src is malformed; this is never returned if src_attr has UNI_TRUST.

UNI_NO_SPACE

If dst lacks the capacity to store the normalization of src.

UNI_NO_MEMORY

If dynamic memory allocation failed.

UNI_FEATURE_DISABLED

If Unicorn was built without support for the normalization form given by form.

Discussion

Normalizes the text pointed to by src, which is in the encoding form specified by src_attr, to the normalization form given by form, encodes the result in the encoding form specified by dst_attr, and writes it to dst. If src_len is -1, then src is assumed to be null-terminated.

If dst is not null, then the function writes to *dst_len the total number of code units written to dst. If the capacity of dst is insufficient, it returns UNI_NO_SPACE.

If dst is null, then *dst_len must be zero on input; the function writes to *dst_len the number of code units in the fully normalized text and returns UNI_OK. Call the function this way first to compute the size needed for the destination buffer, then call it again with a buffer of at least that capacity.

Examples

This example normalizes the input string to Normalization Form D (NFD). NFD suits in-memory string comparison because it is quick to compute; for persistent storage or transmission, Normalization Form C (NFC) is preferred.

#include <unicorn.h>
#include <stdio.h>

int main(void)
{
    const char *in = u8"Åström";
    char out[32];
    unisize outlen = sizeof(out);

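    if (uni_norm(UNI_NFD, in, -1, UNI_UTF8, out, &outlen, UNI_UTF8) != UNI_OK)
    {
        // normalization failed, e.g. UNI_NO_SPACE if out is too small
        return 1;
    }

    // outlen now holds the number of UTF-8 code units written to out;
    // %.* expects an int, so cast it
    printf("%.*s\n", (int)outlen, out);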
    return 0;
}