Function

uni_norm

Normalize text.

Since v1.0
unistat uni_norm(
    uninormform form,
    const void *src,
    unisize src_len,
    uniattr src_attr,
    void *dst,
    unisize *dst_len,
    uniattr dst_attr)

Parameters

form in

Normalization form.

src in

Input text.

src_len in

Number of code units in src or -1 if src is null terminated.

src_attr in

Attributes of src.

dst out

Output string; can be NULL.

dst_len inout

Code unit capacity of dst on input; number of code units written to dst on output.

dst_attr in

Attributes of dst.

Return Value

UNI_OK

On success.

UNI_BAD_OPERATION

If src is NULL, if dst_len is negative, or if dst is NULL and dst_len is greater than zero.

UNI_BAD_ENCODING

If src is malformed; this is never returned if src_attr has UNI_TRUST.

UNI_NO_SPACE

If dst lacks the capacity to store the normalization of src.

UNI_NO_MEMORY

If dynamic memory allocation failed.

UNI_FEATURE_DISABLED

If Unicorn was built without support for normalizing to form.

Discussion

Normalizes src into the normalization form specified by form and writes the result to dst. If dst is not NULL, then the implementation writes to dst_len the total number of code units written to dst. If the capacity of dst is insufficient, then UNI_NO_SPACE is returned; otherwise, UNI_OK is returned.

If dst is NULL, then dst_len must be zero on input. In that case, the function writes to dst_len the number of code units in the fully normalized text and returns UNI_OK. Call the function this way to first compute the total size needed for the destination buffer, then call it again with a sufficiently sized buffer.

Examples

This example normalizes the input string to Normalization Form D. This form is ideal for in-memory string comparison because it is quick to compute. For persistent storage or transmission, Normalization Form C is preferred.

#include <unicorn.h>
#include <stdio.h>

int main(void)
{
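    const char *in = u8"Åström";
    char out[32];
    unisize outlen = sizeof(out); // capacity of out, in code units

    // src_len is -1 because the input is null terminated.
    if (uni_norm(UNI_NFD, in, -1, UNI_UTF8, out, &outlen, UNI_UTF8) != UNI_OK)
    {
        // something went wrong
        return 1;
    }

    // outlen now holds the number of code units written to out.
    // The precision argument of %.*s must be an int, hence the cast.
    printf("%.*s", (int)outlen, out);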
    return 0;
}