Case Mapping

Case conversion, matching, and detection.

Case Conversion Functions

Transform text into a cased form.

unistat uni_caseconv(unicaseconv casing, const void *src, unisize src_len, uniattr src_attr, void *dst, unisize *dst_len, uniattr dst_attr): Perform case conversion.

unistat uni_caseconvchk(unicaseconv casing, const void *text, unisize text_len, uniattr text_attr, bool *result): Check case status.

Case Folding Functions

Transform text for case-insensitive string comparisons.

unistat uni_casefold(unicasefold casing, const void *src, unisize src_len, uniattr src_attr, void *dst, unisize *dst_len, uniattr dst_attr): Perform case folding.

unistat uni_casefoldcmp(unicasefold casing, const void *s1, unisize s1_len, uniattr s1_attr, const void *s2, unisize s2_len, uniattr s2_attr, bool *result): Case-insensitive string comparison.

unistat uni_casefoldchk(unicasefold casing, const void *text, unisize text_len, uniattr text_attr, bool *result): Check case fold status.

Enumerations

enum unicaseconv: Case conversion operations.

enum unicasefold: Case folding operations.

Discussion

Case mapping is the process of transforming characters from one case to another for comparison or presentation.

Case Conversion

Case conversion transforms text into a particular cased form. The resulting text is presentable to an end user.

There are three case conversion operations: lowercase (UNI_LOWER), titlecase (UNI_TITLE), and uppercase (UNI_UPPER).

Case conversion is context-sensitive, which means characters are mapped based on their surrounding characters. If only a single character is being case converted in isolation, the simple case mapping functions may be used instead.
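One well-known instance of context sensitivity is the Greek capital sigma, which lowercases to a final form (U+03C2) at the end of a word and to the ordinary form (U+03C3) elsewhere. The following is a minimal sketch of that rule only, not Unicorn's implementation: the helper names is_greek_letter and greek_lower are hypothetical, the input is assumed to be an array of code points, and the "is this a cased letter" test is simplified to the basic Greek alphabet.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: true for the basic Greek letters (upper or lower).
 * A real implementation would consult the Unicode Cased property instead. */
static int is_greek_letter(uint32_t cp)
{
    return (cp >= 0x0391 && cp <= 0x03A9 && cp != 0x03A2) ||
           (cp >= 0x03B1 && cp <= 0x03C9);
}

/* Hypothetical helper: lowercase basic Greek capitals in place.
 * Capital sigma (U+03A3) becomes final sigma (U+03C2) only when it
 * follows a letter and is not itself followed by a letter. */
static void greek_lower(uint32_t *text, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        uint32_t cp = text[i];
        if (cp == 0x03A3) {
            int after_letter = i > 0 && is_greek_letter(text[i - 1]);
            int before_letter = i + 1 < len && is_greek_letter(text[i + 1]);
            text[i] = (after_letter && !before_letter) ? 0x03C2 : 0x03C3;
        } else if (cp >= 0x0391 && cp <= 0x03A9 && cp != 0x03A2) {
            text[i] = cp + 0x20; /* capital and small letters are 0x20 apart */
        }
    }
}
```

With this sketch, the sigma at the end of a four-letter word becomes U+03C2, while a sigma converted on its own becomes U+03C3. That difference is why converting characters one at a time cannot reproduce the result of converting whole text.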

Support for case conversion must be enabled in the JSON configuration file, where the string values “lower”, “title”, and “upper” correspond to the UNI_LOWER, UNI_TITLE, and UNI_UPPER constants, respectively.

{
    "algorithms": {
        "caseConversion": [
            "lower",
            "title",
            "upper"
        ]
    }
}

Case Folding

Case folding transforms text into a form intended for case-insensitive string comparisons. It is primarily based on the simple lowercase mappings, but some characters, such as Cherokee letters, fold to their uppercase forms instead. A case-folded string must never be displayed to an end user; it is intended only for internal case-insensitive comparisons.
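To illustrate why folding is not the same as lowercasing, here is a minimal sketch covering a tiny subset of the Unicode case folding data: ASCII letters fold to lowercase, while Cherokee small letters fold to their uppercase counterparts. The helpers fold_cp and caseless_equal are hypothetical names invented for this example and are not part of Unicorn; real code would call the library's case folding functions.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: case fold one code point (tiny subset only). */
static uint32_t fold_cp(uint32_t cp)
{
    if (cp >= 'A' && cp <= 'Z')
        return cp + 0x20;            /* ASCII folds to lowercase */
    if (cp >= 0xAB70 && cp <= 0xABBF)
        return cp - 0xAB70 + 0x13A0; /* Cherokee small folds to uppercase */
    if (cp >= 0x13F8 && cp <= 0x13FD)
        return cp - 8;               /* remaining Cherokee small letters */
    return cp;
}

/* Hypothetical helper: compare two code point arrays after folding.
 * The length check is valid only because every mapping in this subset
 * is one-to-one; full case folding can change the length of a string. */
static int caseless_equal(const uint32_t *a, size_t a_len,
                          const uint32_t *b, size_t b_len)
{
    if (a_len != b_len)
        return 0;
    for (size_t i = 0; i < a_len; i++) {
        if (fold_cp(a[i]) != fold_cp(b[i]))
            return 0;
    }
    return 1;
}
```

Because the folded Cherokee text is uppercase and the folded Latin text is lowercase, the folded output has no consistent cased appearance, which is one reason it should stay internal.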

Unicorn supports two variations of case folding: default (UNI_DEFAULT) and canonical (UNI_CANONICAL).

Default case folding is intended for comparing strings for binary equivalence, whereas canonical case folding is intended for testing canonical caseless equivalence.

Default case folding does not preserve normalization forms. This means a string in a particular Unicode normalization form is not guaranteed to be in that same normalization form after being case folded.
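A concrete case, taken from the Unicode case folding data rather than from Unicorn itself: U+01F0 (LATIN SMALL LETTER J WITH CARON) is a single precomposed character, so it is valid NFC, yet its full case folding is the two-code-point sequence U+006A U+030C, which NFC would recompose. The sketch below hard-codes just that one mapping; fold_full and compose_pair are hypothetical helpers, not library functions.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: full case folding of one code point.
 * Only the single mapping needed for this illustration is included.
 * Writes the result to out and returns the number of code points. */
static size_t fold_full(uint32_t cp, uint32_t out[2])
{
    if (cp == 0x01F0) {
        out[0] = 0x006A; /* LATIN SMALL LETTER J */
        out[1] = 0x030C; /* COMBINING CARON */
        return 2;
    }
    out[0] = cp;
    return 1;
}

/* Hypothetical helper: canonical composition of a base and a mark.
 * Returns the precomposed code point, or 0 if this table has none. */
static uint32_t compose_pair(uint32_t base, uint32_t mark)
{
    if (base == 0x006A && mark == 0x030C)
        return 0x01F0;
    return 0;
}
```

Folding U+01F0 yields a pair for which compose_pair returns a non-zero value, meaning the folded string is no longer in NFC even though the input was. Text that must stay in a given normalization form would need to be renormalized after folding.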

Support for case folding must be enabled in the JSON configuration file, where the string values “default” and “canonical” correspond to the UNI_DEFAULT and UNI_CANONICAL constants, respectively.

{
    "algorithms": {
        "caseFolding": [
            "default",
            "canonical"
        ]
    }
}
