Scanner

Find tokens.

Structures

struct judo_stream: Scanner state.

struct judo_span: Code unit range.

Macros

#define JUDO_MAXDEPTH: Maximum nesting depth.

#define JUDO_ERRMAX: Maximum error description length.

Types

typedef long double judo_number: Floating-point storage type.

Enumerations

enum judo_result: Function status code.

enum judo_token: Semantic element.

Functions

enum judo_result judo_scan(struct judo_stream *stream, const char *source, int32_t length): Incrementally scan JSON.

enum judo_result judo_numberify(const char *lexeme, int32_t length, judo_number *number): Lexeme to float.

enum judo_result judo_stringify(const char *lexeme, int32_t length, char *buf, int32_t *buflen): Lexeme to decoded string.

Discussion

The Judo scanner processes JSON source text into semantic tokens. You discover tokens with the judo_scan function, which returns them incrementally. This function is intended to be called repeatedly until all tokens have been scanned.

The JSON source text must be encoded as UTF-8. It may or may not be null-terminated. The following code snippet defines a null-terminated string:

const char *json = "{\"abc\":123}";

To scan this string, you need the judo_stream structure. This structure represents the current state of the JSON scanner. It must be zero initialized before its first use.

struct judo_stream stream = {0};

Next, call the judo_scan function repeatedly to discover tokens. The following code snippet calls the function until all tokens have been processed or an error is detected.

enum judo_result result;
for (;;) {
    result = judo_scan(&stream, json, -1);
    if (result != JUDO_RESULT_SUCCESS) {
        // an error occurred; stop scanning
        break;
    }
    if (stream.token == JUDO_TOKEN_EOF) {
        // successfully completed scanning
        break;
    }
    // otherwise, process stream.token and keep scanning
}

After each call to judo_scan, you can inspect three fields of the judo_stream structure: token, where, and error.

The value of the token field is one of the judo_token enumeration constants. It describes the token that was just processed.

The where field describes the span of UTF-8 code units where the token was found in the source text. For literal values, such as numbers and strings, the span defines the token’s lexeme.

The error field is a null-terminated UTF-8 encoded string that describes the details of any error in US English. You’ll know an error occurred because judo_scan will return a value other than JUDO_RESULT_SUCCESS.
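
The span in the where field can be turned into a printable lexeme by copying the code units out of the source text. The following is a minimal sketch, not part of the Judo API: copy_lexeme is a hypothetical helper that takes the span as two plain integers, such as stream.where.offset and stream.where.length, and assumes the span lies inside the source text.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper, not part of Judo: copy the code units in
 * [offset, offset + length) of `source` into `out` and null-terminate
 * the copy. The copy is truncated if `out` is too small.
 * Returns the number of code units copied (excluding the terminator).
 */
int32_t copy_lexeme(const char *source, int32_t offset, int32_t length,
                    char *out, int32_t capacity)
{
    assert(source != NULL && out != NULL);
    assert(offset >= 0 && length >= 0 && capacity > 0);

    /* Reserve one code unit of the output for the null terminator. */
    int32_t n = (length < capacity - 1) ? length : capacity - 1;
    memcpy(out, source + offset, (size_t)n);
    out[n] = '\0';
    return n;
}
```

For the example string defined earlier, the span with offset 7 and length 3 covers the code units 123.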

Processing Tokens

Arrays and objects consist of start/end token pairs. In well-formed JSON, each “start” token will always be matched by an “end” token. In malformed JSON, the scanner will report an error if there is a mismatch.

Elements of an array are returned as sequential values. Members of an object are returned as token pairs: the name is returned first, followed by its value.

Extracting Numeric Values

When a token is a number, its floating-point value can be obtained with the judo_numberify function. This function accepts a pointer to a judo_number, which is a floating-point storage type chosen when configuring the Judo library.

judo_number number;
judo_numberify(&json[stream.where.offset], stream.where.length, &number);

Extracting String Values

When a token is a string or object name, the decoded string value can be obtained with the judo_stringify function. This function accepts a buffer to receive the decoded string data and a “length” parameter for the buffer. On input, the length parameter holds the buffer’s capacity. On output, the implementation updates it with the number of code units written to the buffer.

char buf[32];
int32_t buflen = sizeof(buf);
judo_stringify(&json[stream.where.offset], stream.where.length, buf, &buflen);

The resulting buffer is not null-terminated. This is because JSON strings can contain encoded zero bytes, which are decoded and placed as literal zero bytes in the resulting buffer. Therefore, the length of the decoded string must be taken from the value written to the buffer length parameter, not from a terminator.
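
Because the decoded buffer may contain zero bytes, functions such as strcmp or strlen cannot be applied to it safely. The following is a minimal sketch of a length-aware comparison. decoded_equals is a hypothetical helper, not part of the Judo API, and it assumes the expected value is an ordinary null-terminated C string.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper, not part of Judo: compare a decoded buffer of
 * `buflen` code units against a null-terminated C string. The buffer is
 * never read past `buflen`, and an embedded zero byte in the buffer does
 * not cut the comparison short.
 */
bool decoded_equals(const char *buf, int32_t buflen, const char *expected)
{
    assert(buf != NULL && buflen >= 0 && expected != NULL);

    size_t expected_len = strlen(expected);
    if ((size_t)buflen != expected_len) {
        return false; /* different lengths can never match */
    }
    return memcmp(buf, expected, expected_len) == 0;
}
```

With the buf and buflen values from the snippet above, decoded_equals(buf, buflen, "abc") would report whether the decoded object name is abc.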

To compute the required size of the destination buffer, first call judo_stringify with a null buffer and a zero buffer length. The implementation will update the buffer length parameter with the number of code units needed to represent the fully decoded string. You can then call the function again with a buffer of the appropriate size.

buflen = 0;
judo_stringify(&json[stream.where.offset], stream.where.length, NULL, &buflen);

Handling Errors

If an error occurs, the judo_scan function returns a judo_result other than JUDO_RESULT_SUCCESS. The result code indicates the general classification of the error, such as a syntax error or bad encoding. Regardless of the error code, the error field of the judo_stream structure will be populated with a UTF-8 encoded error message written in US English.

Additionally, the where field will be populated with the code unit index and count, which together describe the span of code units where the error was detected in the JSON source text. The span can be used to derive line and column numbers for more detailed error reporting.
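
One way to derive a line and column number from the span is to walk the source text up to the span’s starting index and count line feeds. The following is a minimal sketch under stated assumptions: struct position and locate are hypothetical names, not part of the Judo API, lines are assumed to end with a line feed, and columns are counted in UTF-8 code units rather than user-perceived characters.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper type, not part of Judo: a 1-based source position. */
struct position {
    int32_t line;
    int32_t column;
};

/*
 * Compute the line and column of the code unit at `offset`, for example
 * stream.where.offset. The caller must ensure that `offset` does not
 * exceed the length of the source text.
 */
struct position locate(const char *source, int32_t offset)
{
    struct position pos = {1, 1};
    assert(source != NULL && offset >= 0);

    for (int32_t i = 0; i < offset; i++) {
        if (source[i] == '\n') {
            pos.line += 1;   /* a line feed starts a new line */
            pos.column = 1;
        } else {
            pos.column += 1;
        }
    }
    return pos;
}
```

The resulting position can be combined with the text in the error field to build a diagnostic message that points at the offending line.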

Saving State

The Judo scanner does not use global state, static storage, or dynamic memory allocation. Its implementation is thread-safe and all functions are idempotent.

The entire scanner state is maintained by the judo_stream structure. Instances of this structure can be copied with memcpy to preserve an earlier state.
