Unicorn: The Mythical Bridge Between Unicode® and Embedded Devices
Posted on 2024-12-02T16:44:00Z #unicorn
Today, Railgun Labs proudly announces Unicorn – an embeddable implementation of common Unicode® algorithms.
Key Features
Unicorn implements essential Unicode algorithms, namely:
- Normalization (NFC, NFD)
- Case folding
- Case conversion
- Collation (via the DUCET)
- Grapheme, word, and sentence segmentation
- Short string compression (via BOCU-1)
Fully Customizable
Unicorn is highly customizable, allowing you to include only the Unicode algorithms, character properties, and encoding forms your application needs. While the full build of Unicorn is several hundred kilobytes, removing unused features significantly reduces its size to meet the constraints of your system.
Portability Across Platforms
Unicorn is written in C99 and requires only a minimal set of features from libc, which are listed in the following table. Unicorn is entirely portable and does not require an FPU or 64-bit integer support, making it ideal for resource-constrained targets like microcontrollers and IoT devices.
Header | Types | Macros | Functions |
---|---|---|---|
stdint.h | int8_t, int16_t, int32_t, uint8_t, uint16_t, uint32_t | | |
string.h | | | memcpy, memset, memcmp |
stddef.h | size_t | NULL | |
stdbool.h | | bool, true, false | |
assert.h | | assert | |
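To give a feel for what code restricted to this libc subset looks like, here is a minimal sketch of a UTF-8 decoder that uses only the headers above, 32-bit integers, and no floating point. It is not taken from Unicorn; the name `demo_utf8_decode` and its signature are assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not part of Unicorn's API.
 * Decodes one code point from the UTF-8 bytes in s[0..len).
 * On success, stores the scalar value in *out and returns the number of
 * bytes consumed (1 to 4). Returns 0 if the input is empty or malformed. */
size_t demo_utf8_decode(const uint8_t *s, size_t len, uint32_t *out)
{
    assert(s != NULL);
    assert(out != NULL);
    if (len == 0) {
        return 0;
    }

    uint8_t b0 = s[0];
    uint32_t cp = 0;   /* code point being assembled */
    uint32_t min = 0;  /* smallest value allowed for this sequence length */
    size_t need = 0;   /* number of continuation bytes expected */

    if (b0 < 0x80u) {
        *out = b0;     /* single-byte (ASCII) sequence */
        return 1;
    } else if ((b0 & 0xE0u) == 0xC0u) {
        cp = b0 & 0x1Fu; need = 1; min = 0x80u;
    } else if ((b0 & 0xF0u) == 0xE0u) {
        cp = b0 & 0x0Fu; need = 2; min = 0x800u;
    } else if ((b0 & 0xF8u) == 0xF0u) {
        cp = b0 & 0x07u; need = 3; min = 0x10000u;
    } else {
        return 0;      /* stray continuation byte or invalid lead byte */
    }

    if (len < need + 1) {
        return 0;      /* truncated sequence */
    }
    for (size_t i = 1; i <= need; i++) {
        if ((s[i] & 0xC0u) != 0x80u) {
            return 0;  /* not a continuation byte */
        }
        cp = (cp << 6) | (uint32_t)(s[i] & 0x3Fu);
    }

    bool overlong = cp < min;
    bool surrogate = (cp >= 0xD800u) && (cp <= 0xDFFFu);
    bool too_large = cp > 0x10FFFFu;
    if (overlong || surrogate || too_large) {
        return 0;
    }

    *out = cp;
    return need + 1;
}
```

Passing the three bytes E2 82 AC (the euro sign) stores U+20AC in `*out` and returns 3. Because the largest code point, U+10FFFF, fits in 21 bits, `uint32_t` is wide enough and no 64-bit arithmetic is needed.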
Fault Tolerant
All operations in Unicorn are atomic: an operation either succeeds completely or has no effect at all. This guarantees that errors, such as running out of memory, never corrupt internal state. It also means that when an error occurs you can recover (for example, by freeing memory) and retry the same operation.
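The all-or-nothing behavior described above can be sketched with a small growable buffer. This is an illustration of the general pattern, not Unicorn's implementation: `demo_buf`, `demo_buf_append`, `demo_limited_alloc`, and `demo_allocs_left` are hypothetical names, and the allocator hook is assumed to be malloc-compatible so that its results can be released with `free`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical allocator hook; must return memory that free() accepts. */
typedef void *(*demo_alloc_fn)(size_t size);

typedef struct {
    char *data;
    size_t len;
    size_t cap;
    demo_alloc_fn alloc;
} demo_buf;

/* Number of allocations the demo allocator will still grant. */
size_t demo_allocs_left = 0;

/* Allocator that simulates an out-of-memory condition once the
 * allocation budget is used up. */
void *demo_limited_alloc(size_t size)
{
    if (demo_allocs_left == 0) {
        return NULL;
    }
    demo_allocs_left--;
    return malloc(size);
}

/* Appends n bytes from src. Returns true on success. On failure it
 * returns false and leaves *buf exactly as it was before the call. */
bool demo_buf_append(demo_buf *buf, const char *src, size_t n)
{
    assert(buf != NULL);
    assert(src != NULL);
    if (n == 0) {
        return true;
    }
    if (n > SIZE_MAX - buf->len) {
        return false;  /* length would overflow; nothing modified */
    }

    size_t needed = buf->len + n;
    if (needed > buf->cap) {
        size_t new_cap = (buf->cap <= SIZE_MAX / 2) ? buf->cap * 2 : SIZE_MAX;
        if (new_cap < needed) {
            new_cap = needed;
        }
        /* Acquire the new storage first. If this fails, the buffer has
         * not been touched yet, so there is nothing to roll back. */
        char *grown = buf->alloc(new_cap);
        if (grown == NULL) {
            return false;
        }
        if (buf->len > 0) {
            memcpy(grown, buf->data, buf->len);
        }
        free(buf->data);
        buf->data = grown;
        buf->cap = new_cap;
    }

    /* Commit: from here on nothing can fail. */
    memcpy(buf->data + buf->len, src, n);
    buf->len = needed;
    return true;
}
```

A caller that sees `false` can free memory elsewhere (modeled here by raising `demo_allocs_left`) and call `demo_buf_append` again with the same arguments. The key design choice is ordering: every step that can fail happens before the first write to the visible state.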
Comprehensive Testing
Unicorn has undergone extensive testing to ensure Unicode conformance, robustness, and security:
- Official Unicode conformance tests
- Manually written tests
- Out-of-memory tests
- Fuzz tests
- Static analysis
- Valgrind analysis
- Code sanitizers (UBSAN, ASAN, and MSAN)
- Extensive use of assert() and run-time checks
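As a rough illustration of how fuzz tests, sanitizers, and assert() work together, the sketch below shows a fuzz harness written against libFuzzer's `LLVMFuzzerTestOneInput` entry point. The post does not say which fuzzer Unicorn uses, so libFuzzer is an assumption here, and `demo_ascii_lower` is a hypothetical stand-in target; real Unicode case conversion is far more involved.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DEMO_MAX_INPUT 256u

/* Hypothetical function under test, not part of Unicorn's API.
 * Copies src[0..n) to dst, lowercasing ASCII letters and leaving
 * every other byte unchanged. */
void demo_ascii_lower(const uint8_t *src, uint8_t *dst, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        uint8_t c = src[i];
        dst[i] = (c >= 'A' && c <= 'Z') ? (uint8_t)(c + ('a' - 'A')) : c;
    }
}

/* libFuzzer calls this function repeatedly with generated inputs.
 * The asserts state properties that must hold for every input; a
 * violated assert or a sanitizer report is recorded as a crash. */
int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size)
{
    uint8_t once[DEMO_MAX_INPUT];
    uint8_t twice[DEMO_MAX_INPUT];

    /* Clamp the input so it fits the fixed-size scratch buffers. */
    if (size > DEMO_MAX_INPUT) {
        size = DEMO_MAX_INPUT;
    }

    demo_ascii_lower(data, once, size);
    demo_ascii_lower(once, twice, size);

    /* Property 1: the conversion is idempotent. */
    assert(memcmp(once, twice, size) == 0);

    /* Property 2: no uppercase ASCII letter survives. */
    for (size_t i = 0; i < size; i++) {
        assert(!(once[i] >= 'A' && once[i] <= 'Z'));
    }
    return 0;
}
```

With Clang, a harness like this is typically built with `-fsanitize=fuzzer,address,undefined`, so that memory errors and undefined behavior surface alongside failed assertions.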
MISRA C:2012 Compliance
Unicorn adheres to the MISRA C:2012 guidelines, meeting all Required and Mandatory rules, and most Advisory rules.
Any deviations from the standard are documented here.
Closing Thoughts
Unicorn brings efficient and reliable Unicode algorithm support to embedded systems. The public source distribution is available here.
Unicode® is a registered trademark of Unicode, Inc. in the United States and other countries. Railgun Labs is not affiliated with, endorsed, or sponsored by Unicode, Inc. (the Unicode Consortium).