Function

uni_norm

Normalize text.

Since v1.0
unistat uni_norm(
    uninormform form,
    const void *src,
    unisize src_len,
    uniattr src_attr,
    void *dst,
    unisize *dst_len,
    uniattr dst_attr)

Parameters

form in

Normalization form.

src in

Input text.

src_len in

Number of code units in src, or -1 if src is null-terminated.

src_attr in

Attributes of src.

dst out

Output buffer; can be NULL to query the required size.

dst_len inout

Capacity of dst in code units on input; on output, the number of code units written to dst, or the number of code units required if dst is NULL.

dst_attr in

Attributes of dst.

Return Value

UNI_OK

On success.

UNI_BAD_OPERATION

If src is NULL, if the capacity passed in dst_len is negative, or if dst is NULL and that capacity is greater than zero.

UNI_BAD_ENCODING

If src is malformed; this is never returned if src_attr has UNI_TRUST.

UNI_NO_SPACE

If dst lacks the capacity to store the normalization of src.

UNI_NO_MEMORY

If dynamic memory allocation failed.

UNI_FEATURE_DISABLED

If Unicorn was built without support for normalizing to form.

Discussion

Normalizes src into the normalization form specified by form and writes the result to dst. If dst is not NULL, the function writes to dst_len the total number of code units written to dst. If the capacity of dst is insufficient, the function returns UNI_NO_SPACE; otherwise it returns UNI_OK.

If dst is NULL, then dst_len must be zero. In that case the function writes to dst_len the number of code units in the fully normalized text and returns UNI_OK. Call the function this way to compute the size needed for the destination buffer, then call it again with a sufficiently sized buffer.
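The two-call pattern described above can be sketched as follows. This is a minimal illustration, not part of the library: normalize_dup is a hypothetical helper name, and the block guarded by HAVE_UNICORN is a stand-in (placeholder types, placeholder constant values, and a byte-copying uni_norm) that exists only so the sketch compiles where unicorn.h is not installed. It assumes a compiler that provides __has_include and that both strings are UTF-8.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#if defined(__has_include)
#  if __has_include(<unicorn.h>)
#    include <unicorn.h>
#    define HAVE_UNICORN 1
#  endif
#endif

#ifndef HAVE_UNICORN
/* ASSUMPTION: stand-ins so the sketch builds without Unicorn. The types,
   the constant values, and this byte-copying uni_norm() are placeholders,
   not the library's definitions. Delete this block in real code. */
typedef int unistat;
typedef int32_t unisize;
typedef int uniattr;
typedef int uninormform;
enum { UNI_OK, UNI_NO_SPACE, UNI_BAD_OPERATION };
enum { UNI_NFD };
enum { UNI_UTF8 = 1 };

static unistat uni_norm(uninormform form, const void *src, unisize src_len,
                        uniattr src_attr, void *dst, unisize *dst_len,
                        uniattr dst_attr)
{
    (void)form; (void)src_attr; (void)dst_attr;
    if (src == NULL || *dst_len < 0 || (dst == NULL && *dst_len > 0))
        return UNI_BAD_OPERATION;
    unisize n = (src_len < 0) ? (unisize)strlen(src) : src_len;
    if (dst == NULL) { *dst_len = n; return UNI_OK; }  /* size query only */
    if (*dst_len < n) return UNI_NO_SPACE;
    memcpy(dst, src, (size_t)n);                       /* no real normalization */
    *dst_len = n;
    return UNI_OK;
}
#endif

/* Hypothetical helper: returns a malloc'd, null-terminated normalized copy
   of the UTF-8 text in src, or NULL on any failure. The caller frees it. */
char *normalize_dup(uninormform form, const char *src, unisize src_len,
                    unisize *out_len)
{
    assert(src != NULL);

    /* First call: dst is NULL and the capacity is zero, so the function
       only reports how many code units the normalized text needs. */
    unisize needed = 0;
    if (uni_norm(form, src, src_len, UNI_UTF8, NULL, &needed, UNI_UTF8) != UNI_OK)
        return NULL;

    /* One extra byte so the helper can terminate the string itself. */
    char *buf = malloc((size_t)needed + 1);
    if (buf == NULL)
        return NULL;

    /* Second call: capacity goes in, code units written come out. */
    unisize written = needed;
    if (uni_norm(form, src, src_len, UNI_UTF8, buf, &written, UNI_UTF8) != UNI_OK) {
        free(buf);
        return NULL;
    }

    buf[written] = '\0';
    if (out_len != NULL)
        *out_len = written;
    return buf;
}
```

A caller would write something like normalize_dup(UNI_NFD, text, text_len, &n) and release the result with free(). Sizing the buffer from the first call avoids guessing a fixed capacity, at the cost of normalizing the input twice.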

Examples

This example normalizes the input string to Normalization Form D. This form is ideal for in-memory string comparison because it is quick to compute. For persistent storage or transmission, Normalization Form C is preferred.

#include <unicorn.h>
#include <stdio.h>

int main(void)
{
    const char *in = u8"Åström";
    char out[32];
    unisize outlen = sizeof(out);

    if (uni_norm(UNI_NFD, in, -1, UNI_UTF8,  out, &outlen, UNI_UTF8) != UNI_OK)
    {
        // something went wrong
        return 1;
    }

    // %.*s expects an int precision, so cast unisize explicitly
    printf("%.*s", (int)outlen, out);
    return 0;
}