Scanner
Find semantic elements.
Structures
- struct judo_stream
Scanner state.
- struct judo_span
Code unit range.
Macros
- #define JUDO_MAXDEPTH
Maximum nesting depth.
- #define JUDO_ERRMAX
Maximum error description length.
Types
- typedef long double judo_number
Floating-point storage type.
Enumerations
- enum judo_result
Function status code.
- enum judo_element
Semantic element.
Functions
- judo_scan
Incrementally scan JSON.
- judo_numberify
Lexeme to float.
- judo_stringify
Lexeme to decoded string.
Discussion
The Judo scanner processes JSON source text into semantic elements. Semantic elements are notable marker points within the JSON stream that may or may not correspond to a JSON token. You can find semantic elements with the judo_scan function, which returns them one at a time. This function is intended to be called repeatedly until all semantic elements have been processed or an error halts the scanner.
The JSON source text must be encoded as UTF-8. It may or may not be null-terminated. In the following code snippet, a null-terminated string is defined:
const char *json = "{\"abc\":123}";
To scan this string, you need the judo_stream structure. This structure represents the current state of the JSON scanner. It must be zero initialized before its first use.
struct judo_stream stream = {0};
Next, it's time to find semantic elements by repeatedly calling the judo_scan function. In the subsequent code snippet, judo_scan is called repeatedly until all semantic elements have been processed or an error aborts scanning.
enum judo_result result;
for (;;) {
    result = judo_scan(&stream, json, -1);
    if (result != JUDO_SUCCESS) {
        // an error occurred; stream.error describes it
        break;
    }
    if (stream.element == JUDO_EOF) {
        // successfully completed scanning
        break;
    }
    // otherwise, process stream.element here
}
After each call to judo_scan, you can inspect certain fields of the judo_stream structure. These fields are element, where, and error.
The value of the element field is one of the judo_element enumeration constants. It describes the semantic element that was just processed.
The where field describes the span of UTF-8 code units where the semantic element was found in the source text. In the case of literal values, like numbers and strings, the span of code units defines the lexeme.
The error field is a null-terminated UTF-8 encoded string that describes the details of any error in US English. You'll know an error occurred because judo_scan will return a value other than JUDO_SUCCESS.
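As an illustration of how a span identifies a lexeme, the following minimal sketch copies a span's code units out of the source text into a null-terminated buffer. The demo_span structure and the demo_copy_lexeme helper are hypothetical stand-ins written for this example, not part of Judo; only the offset and length field names are taken from the snippets in this document.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a code unit range. The field names mirror the
   stream.where.offset and stream.where.length accesses used in this document. */
struct demo_span {
    int32_t offset;
    int32_t length;
};

/* Copy the lexeme identified by `span` out of `source` into `out` and
   null-terminate it. Returns 0 on success, or -1 if `out` is too small. */
int demo_copy_lexeme(const char *source, struct demo_span span,
                     char *out, size_t outcap)
{
    assert(source != NULL && out != NULL);
    assert(span.offset >= 0 && span.length >= 0);
    if ((size_t)span.length + 1 > outcap) {
        return -1; /* no room for the lexeme plus the terminator */
    }
    memcpy(out, source + span.offset, (size_t)span.length);
    out[span.length] = '\0';
    return 0;
}
```

For the source text {"abc":123}, a span with offset 7 and length 3 covers the number lexeme 123. Copying is optional: a lexeme can also be used in place as a pointer plus a length.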
Processing Semantic Elements
Arrays and objects consist of push/pop element pairs. Conceptually, these pairs are like a stack data structure where every "push" is matched by a "pop" element.
Elements of an array are returned as sequential values. Members of an object are returned as element pairs: a name is returned first, followed by its value.
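The stack analogy can be sketched in isolation. The example below walks a hard-coded sequence of elements and checks that every "push" is closed by a matching "pop". The demo_element enumeration, the DEMO_MAXDEPTH limit, and the demo_max_depth function are assumptions made for this sketch; they are not the judo_element constants or the library's own bookkeeping.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for semantic element kinds (not the library's names). */
enum demo_element {
    DEMO_VALUE,       /* any scalar value */
    DEMO_NAME,        /* object member name */
    DEMO_ARRAY_PUSH,
    DEMO_ARRAY_POP,
    DEMO_OBJECT_PUSH,
    DEMO_OBJECT_POP
};

#define DEMO_MAXDEPTH 16 /* assumed nesting limit for this sketch */

/* Push on every "push" element and pop on every "pop" element. Returns the
   deepest nesting level reached, or -1 if a pop does not match the most
   recent push, the depth limit is exceeded, or a container is left open. */
int demo_max_depth(const enum demo_element *elements, size_t count)
{
    enum demo_element stack[DEMO_MAXDEPTH];
    int depth = 0;
    int deepest = 0;
    assert(elements != NULL || count == 0);
    for (size_t i = 0; i < count; i++) {
        switch (elements[i]) {
        case DEMO_ARRAY_PUSH:
        case DEMO_OBJECT_PUSH:
            if (depth == DEMO_MAXDEPTH) return -1;
            stack[depth++] = elements[i];
            if (depth > deepest) deepest = depth;
            break;
        case DEMO_ARRAY_POP:
            if (depth == 0 || stack[--depth] != DEMO_ARRAY_PUSH) return -1;
            break;
        case DEMO_OBJECT_POP:
            if (depth == 0 || stack[--depth] != DEMO_OBJECT_PUSH) return -1;
            break;
        default:
            break; /* names and scalar values do not change the depth */
        }
    }
    return depth == 0 ? deepest : -1;
}
```

Under these assumptions, a document shaped like {"abc":[1,2]} would yield the sequence object push, name, array push, value, value, array pop, object pop, which nests two levels deep.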
Extracting Numeric Values
When a semantic element is a number, its floating-point value can be obtained with the judo_numberify function. This function accepts a pointer to a judo_number, which is a floating-point storage type chosen when configuring the Judo library.
judo_number number;
judo_numberify(&source[stream.where.offset], stream.where.length, &number);
Extracting String Values
When a semantic element is a string or object name, the decoded string value can be obtained with the judo_stringify function. This function accepts a buffer to write the decoded string data to, as well as a "length" parameter for the buffer. The length parameter represents the buffer's capacity on input. The implementation will update this parameter with the number of code units written to the buffer on output.
char buf[32];
int32_t buflen = sizeof(buf);
judo_stringify(&source[stream.where.offset], stream.where.length, buf, &buflen);
The resulting buffer is not null-terminated. This is because JSON strings can contain encoded zero bytes, which are decoded and placed as literal zero bytes in the resulting buffer. Therefore, you must rely on the number of code units reported in the buffer length parameter to determine the length of the decoded string.
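Because the decoded data is a buffer plus a length rather than a C string, functions such as strlen or printf("%s") would stop at the first zero byte. The sketch below shows one way to handle such a buffer using only the standard library. Both helpers are hypothetical names introduced for this example and are not part of Judo.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Returns 1 if the first `len` code units of `buf` contain a zero byte,
   meaning the data cannot be treated as an ordinary C string. */
int demo_has_embedded_zero(const char *buf, int32_t len)
{
    assert(buf != NULL && len >= 0);
    return memchr(buf, '\0', (size_t)len) != NULL;
}

/* Writes exactly `len` code units to `out`, zero bytes included.
   Returns the number of code units actually written. */
size_t demo_write_decoded(FILE *out, const char *buf, int32_t len)
{
    assert(out != NULL && buf != NULL && len >= 0);
    return fwrite(buf, 1, (size_t)len, out);
}
```

A caller would pass the buffer and the updated length value, for example demo_write_decoded(stdout, buf, buflen), instead of relying on a terminator.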
To compute the required size of the destination buffer, first call judo_stringify with a null buffer and a zero buffer length. The implementation will update the buffer length parameter with the number of code units needed to represent the fully decoded string. You can then call the function again with a buffer of the appropriate size.
buflen = 0;
judo_stringify(&source[stream.where.offset], stream.where.length, NULL, &buflen);
Handling Errors
If an error occurs, then the judo_scan function will return a judo_result other than JUDO_SUCCESS. The result code indicates the general classification of the error, e.g. syntax error, bad encoding, etc. Regardless of the error code, the error field of the judo_stream structure will be populated with a UTF-8 encoded error message written in US English.
Additionally, the where field will be populated with the code unit index and count which together communicate the span of code units where the error was detected in the JSON source text. The span can be used to derive line and column numbers for more detailed error reporting.
Saving State
The Judo scanner does not use global state, static storage, or dynamic memory allocation. Its implementation is thread-safe and all functions are idempotent.
The entire scanner state is maintained by the judo_stream structure. Instances of this structure can be copied with memcpy to preserve an earlier state.