Scanner

Find semantic elements.

Structures

struct judo_stream

Scanner state.

struct judo_span

Code unit range.

Macros

Maximum nesting depth.

#define JUDO_ERRMAX

Maximum error description length.

Types

typedef long double judo_number

Floating-point storage type.

Enumerations

enum judo_result

Function status code.

enum judo_element

Semantic element.

Functions

struct judo_stream *stream, const char *source, int32_t length)

Incrementally scan JSON.

const char *lexeme, int32_t length, judo_number *number)

Lexeme to float.

const char *lexeme, int32_t length, char *buf, int32_t *buflen)

Lexeme to decoded string.

Discussion

The Judo scanner processes JSON source text into semantic elements. Semantic elements are notable marker points within the JSON stream that may or may not correspond to a JSON token. You find semantic elements with the judo_scan function, which returns them one at a time. This function is intended to be called repeatedly until all semantic elements have been processed or an error halts the scanner.

The JSON source text must be encoded as UTF-8. It may or may not be null-terminated. The following code snippet defines a null-terminated string:

const char *json = "{\"abc\":123}";

To scan this string, you need the judo_stream structure. This structure represents the current state of the JSON scanner. It must be zero-initialized before its first use.

struct judo_stream stream = {0};

Next, it’s time to find semantic elements by repeatedly calling the judo_scan function. In the subsequent code snippet, judo_scan is called repeatedly until all semantic elements have been processed or an error aborts scanning.

enum judo_result result;
for (;;) {
    result = judo_scan(&stream, json, -1);
    if (result != JUDO_SUCCESS) {
        // an error occurred; stream.error describes it
        break;
    }
    if (stream.element == JUDO_EOF) {
        // successfully completed scanning
        break;
    }
    // process stream.element here
}

After each call to judo_scan, you can inspect certain fields of the judo_stream structure. These fields are element, where, and error.

The value of the element field is one of the judo_element enumeration constants. It describes the semantic element that was just processed.

The where field describes the span of UTF-8 code units where the semantic element was found in the source text. For literal values, like numbers and strings, this span defines the value's lexeme.
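Because a span is only an offset and a count, a lexeme is not null-terminated inside the source text. A minimal sketch, assuming a hypothetical stand-in `demo_span` struct that has just the `offset` and `length` fields used by the snippets on this page (the real struct judo_span may contain more), copies a lexeme into a null-terminated buffer, for example for logging:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for struct judo_span: only the offset and length
   fields that this page's snippets use. */
struct demo_span {
    int32_t offset; /* index of the first code unit */
    int32_t length; /* number of code units */
};

/* Copies the lexeme identified by span into out as a null-terminated string.
   Returns 1 on success, or 0 if out cannot hold the lexeme plus a terminator. */
int copy_lexeme(const char *source, struct demo_span span,
                char *out, size_t outcap)
{
    if (span.offset < 0 || span.length < 0 || (size_t)span.length >= outcap) {
        return 0;
    }
    memcpy(out, &source[span.offset], (size_t)span.length);
    out[span.length] = '\0';
    return 1;
}
```

In the text {"abc":123}, the number lexeme 123 starts at offset 7 and is 3 code units long. To print a lexeme without copying it, printf("%.*s", (int)stream.where.length, &source[stream.where.offset]) also works.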

The error field is a null-terminated UTF-8 encoded string that describes the details of any error in US English. You’ll know an error occurred because judo_scan will return a value other than JUDO_SUCCESS.

Processing Semantic Elements

Arrays and objects consist of push/pop element pairs. Conceptually, these pairs behave like a stack data structure: every “push” element is matched by a “pop” element.

Elements of an array are returned as sequential values. Members of an object are returned as element pairs: first the name, then its value.
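The push/pop pairing can be pictured with a small stack. The sketch below uses hypothetical element names (`DEMO_*`, not Judo's enum judo_element constants, which this page does not list apart from JUDO_EOF) and an assumed depth limit, and checks that a recorded element sequence nests correctly. A document such as {"abc":[1,2]} can be modeled as: object push, name, array push, value, value, array pop, object pop, EOF.

```c
#include <stdbool.h>

/* Hypothetical element kinds for illustration only. */
enum demo_element {
    DEMO_OBJECT_PUSH, DEMO_OBJECT_POP,
    DEMO_ARRAY_PUSH, DEMO_ARRAY_POP,
    DEMO_NAME, DEMO_VALUE, DEMO_EOF
};

#define DEMO_MAXDEPTH 16 /* assumed limit, analogous to a maximum nesting depth */

/* Returns true if every push in elems is matched by a pop of the same kind,
   in last-in, first-out order, with nothing left open at the end. */
bool demo_balanced(const enum demo_element *elems, int count)
{
    enum demo_element stack[DEMO_MAXDEPTH]; /* pushes still awaiting their pop */
    int depth = 0;

    for (int i = 0; i < count; i++) {
        switch (elems[i]) {
        case DEMO_OBJECT_PUSH:
        case DEMO_ARRAY_PUSH:
            if (depth == DEMO_MAXDEPTH) return false; /* too deeply nested */
            stack[depth++] = elems[i];
            break;
        case DEMO_OBJECT_POP:
            if (depth == 0 || stack[--depth] != DEMO_OBJECT_PUSH) return false;
            break;
        case DEMO_ARRAY_POP:
            if (depth == 0 || stack[--depth] != DEMO_ARRAY_PUSH) return false;
            break;
        case DEMO_EOF:
            return depth == 0;
        default: /* names and values do not change the depth */
            break;
        }
    }
    return depth == 0;
}
```

In an application, the same stack is a natural place to remember whether the scanner is currently inside an array or an object, which tells you whether the next element is an array value or an object name.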

Extracting Numeric Values

When a semantic element is a number, its floating-point value can be obtained with the judo_numberify function. This function writes the value through a pointer to a judo_number, the floating-point storage type chosen when configuring the Judo library.

judo_number number;
judo_numberify(&source[stream.where.offset], stream.where.length, &number);
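With judo_number defined as long double, as in the typedef listed under Types, printing the value needs the L length modifier; passing a long double to a plain "%g" conversion is undefined behavior. A minimal sketch, assuming the long double configuration (use "%g" if the library is configured with double or float instead); `format_number` is a hypothetical helper, not part of Judo:

```c
#include <stdio.h>

/* Formats a long double value, such as a judo_number configured as
   long double, using the "L" length modifier that this type requires. */
int format_number(char *out, size_t outcap, long double value)
{
    return snprintf(out, outcap, "%Lg", value);
}
```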

Extracting String Values

When a semantic element is a string or an object name, the decoded string value can be obtained with the judo_stringify function. This function accepts a buffer to write the decoded string data to, as well as a “length” parameter for that buffer. On input, the length parameter holds the buffer’s capacity; on output, the implementation updates it with the number of code units written to the buffer.

char buf[32];
int32_t buflen = sizeof(buf);
judo_stringify(&source[stream.where.offset], stream.where.length, buf, &buflen);

The resulting buffer is not null-terminated. This is because JSON strings can contain encoded zero bytes, which are decoded and placed as literal zero bytes in the resulting buffer. Therefore, use the value written to the buffer length parameter, not a function such as strlen, to determine the length of the decoded string.
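For example, a JSON string written as "a\u0000b" decodes to three code units with a zero byte in the middle, so strlen would report a length of 1. When a null-terminated copy is needed anyway, one option is to refuse decoded values that contain a zero byte. The sketch below is a hypothetical helper (`decoded_to_cstring` is not part of Judo) that takes the buffer and the length reported by judo_stringify:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Converts a decoded buffer (buf, buflen), which is not null-terminated,
   into a C string. Returns 1 on success, or 0 if the data contains an
   embedded zero byte (a C string would silently truncate it) or if out
   is too small to hold the data plus a terminator. */
int decoded_to_cstring(const char *buf, int32_t buflen, char *out, size_t outcap)
{
    if (buflen < 0 || (size_t)buflen >= outcap) {
        return 0;
    }
    if (buflen > 0) {
        if (memchr(buf, '\0', (size_t)buflen) != NULL) {
            return 0; /* embedded zero byte */
        }
        memcpy(out, buf, (size_t)buflen);
    }
    out[buflen] = '\0';
    return 1;
}
```

Where embedded zero bytes are acceptable, keep passing the pair (buf, buflen) to length-aware functions such as fwrite. Note that printf's "%.*s" conversion also stops at the first zero byte.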

To compute the required size of the destination buffer, first call judo_stringify with a null buffer and a zero buffer length. The implementation will update the buffer length parameter with the number of code units needed to represent the fully decoded string. You can then call the function again with a buffer of the appropriate size.

buflen = 0;
judo_stringify(&source[stream.where.offset], stream.where.length, NULL, &buflen);
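Putting the two calls together gives a query-then-allocate helper. This is a sketch under stated assumptions: `demo_stringify` is a hypothetical stand-in that copies the lexeme unchanged (it decodes nothing) so the code compiles without Judo, and its behavior is modeled only on the description above. With the real library you would call judo_stringify with the same arguments. This page does not say which status judo_stringify returns for the sizing call, so the sketch relies only on the updated length.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for judo_stringify: copies the lexeme unchanged.
   On input *buflen is the capacity; on output it is the number of code
   units written, or the number required if buf is NULL or too small. */
static int demo_stringify(const char *lexeme, int32_t length,
                          char *buf, int32_t *buflen)
{
    if (buf == NULL || *buflen < length) {
        *buflen = length; /* report the required capacity */
        return 1;         /* nothing written */
    }
    memcpy(buf, lexeme, (size_t)length);
    *buflen = length;     /* report the code units written */
    return 0;
}

/* Decodes a lexeme into a newly allocated buffer (not null-terminated)
   and stores its length in *outlen. Returns NULL on failure; the caller
   must free the result. */
char *stringify_alloc(const char *lexeme, int32_t length, int32_t *outlen)
{
    int32_t needed = 0;
    demo_stringify(lexeme, length, NULL, &needed); /* sizing call */

    char *buf = malloc(needed > 0 ? (size_t)needed : 1); /* avoid malloc(0) */
    if (buf == NULL) {
        return NULL;
    }
    int32_t buflen = needed; /* capacity on input */
    if (demo_stringify(lexeme, length, buf, &buflen) != 0) {
        free(buf);
        return NULL;
    }
    *outlen = buflen; /* code units written on output */
    return buf;
}
```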

Handling Errors

If an error occurs, the judo_scan function returns a judo_result other than JUDO_SUCCESS. The result code indicates the general classification of the error, e.g., a syntax error or bad encoding. Regardless of the error code, the error field of the judo_stream structure will be populated with a UTF-8 encoded error message written in US English.

Additionally, the where field will be populated with the code unit index and count which together communicate the span of code units where the error was detected in the JSON source text. The span can be used to derive line and column numbers for more detailed error reporting.
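A minimal sketch of that derivation, assuming 1-based line and column numbers, that only \n ends a line (so \r\n also works), and that columns count Unicode code points: UTF-8 continuation bytes, which have the bit pattern 10xxxxxx, are skipped. The offset would come from stream.where.offset and the message from stream.error; the `format_error` helper and its line:column: message layout are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Converts a code unit offset into 1-based line and column numbers.
   Columns count Unicode code points, so a multi-byte UTF-8 character
   advances the column by one. */
void offset_to_line_column(const char *source, int32_t offset,
                           int32_t *line, int32_t *column)
{
    *line = 1;
    *column = 1;
    for (int32_t i = 0; i < offset; i++) {
        unsigned char c = (unsigned char)source[i];
        if (c == '\n') {
            (*line)++;
            *column = 1;
        } else if ((c & 0xC0) != 0x80) { /* not a continuation byte */
            (*column)++;
        }
    }
}

/* Hypothetical helper: formats "line:column: message" into out. */
void format_error(char *out, size_t outcap, const char *source,
                  int32_t offset, const char *message)
{
    int32_t line, column;
    offset_to_line_column(source, offset, &line, &column);
    snprintf(out, outcap, "%ld:%ld: %s", (long)line, (long)column, message);
}
```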

Saving State

The Judo scanner does not use global state, static storage, or dynamic memory allocation. Its implementation is thread-safe and all functions are idempotent.

The entire scanner state is maintained by the judo_stream structure. Instances of this structure can be copied with memcpy to preserve an earlier state.
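Because the state is plain data, one use is lookahead or backtracking: snapshot the stream, scan ahead, then restore the snapshot. The sketch below uses a hypothetical `demo_stream` and `demo_step` in place of struct judo_stream and judo_scan so that it compiles on its own; the members of the real structure are not documented here. With Judo, you would copy the judo_stream the same way.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for struct judo_stream. */
struct demo_stream {
    int32_t offset;
    int32_t depth;
    char error[32];
};

/* Hypothetical stand-in for one judo_scan call: advances the state. */
static void demo_step(struct demo_stream *s)
{
    s->offset += 1;
}

/* Looks one step ahead without losing the current position:
   snapshot with memcpy, step, then restore the snapshot. */
int32_t peek_offset(struct demo_stream *s)
{
    struct demo_stream saved;
    memcpy(&saved, s, sizeof saved); /* preserve the earlier state */
    demo_step(s);
    int32_t ahead = s->offset;
    memcpy(s, &saved, sizeof saved); /* roll back */
    return ahead;
}
```

For a plain C structure, assignment (saved = *s;) copies every member as well and is an equivalent way to take the snapshot.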