01 Current JSON Schema Normalization Rules - 1

Corrine

Introduction

The existing canonicalization implementation, _canonicalise.py, applies a single ruleset, so every JSON Schema goes through the same transformation strategy. This limits flexibility.

As Julian noted in the project description, "There are multiple different normalizations and rulesets we can and will define." Different use cases may require different normalization strategies: some prioritize validation performance, while others need better human readability.

Before we start designing the transformation rules, let’s first analyze the current implementation: the Hypothesis JSON Schema normalizer.

1. Basic Schema Normalization

1.1 Boolean Schema Normalization

Rule: Convert boolean schemas to standard form

  • true → {} (empty object)

  • false → {"not": {}}

Transformation Logic: In JSON Schema, true represents accepting all inputs, while false represents rejecting all inputs. This normalization provides consistent object representations for these fundamental concepts.

Code Reference:

if schema is True:
    return {}
elif schema is False:
    return {"not": {}}

1.2 Constant and Enumeration Normalization

1.2.1 Constant Validation

Rule: Validate "const" values and handle invalid constants

  • {"const": invalid_value} → {"not": {}}

Transformation Logic: If a constant value doesn't satisfy the schema's own validation requirements, the entire schema is considered unsatisfiable.

Code Reference:

if "const" in schema:
    if not make_validator(schema).is_valid(schema["const"]):
        return FALSEY
    return {"const": schema["const"]}

1.2.2 Enumeration Normalization

Rule: Normalize enumeration lists

  • {"enum": ["foo"]} → {"const": "foo"}

  • {"enum": []} → {"not": {}}

  • {"enum": [2, invalid_value, 1]} → {"enum": [1, 2]} (after removing invalid values and sorting)

Transformation Logic:

  • Filter out invalid enum values

  • Convert empty enums to the "false" schema ({"not": {}})

  • Convert single-value enums to "const" properties

  • Sort enum values for consistency

Code Reference:

if "enum" in schema:
    validator = make_validator(schema)
    enum_ = sorted(
        (v for v in schema["enum"] if validator.is_valid(v)), key=sort_key
    )
    if not enum_:
        return FALSEY
    elif len(enum_) == 1:
        return {"const": enum_[0]}
    return {"enum": enum_}
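
The enum steps above can be tried in isolation. The following is a minimal sketch, not the library's code: normalize_enum, canonical_key and is_int are hypothetical names, is_valid stands in for the validator built by make_validator, and sorting by canonical JSON text is an assumption that only approximates the real sort_key.

```python
import json


def canonical_key(value):
    # Assumption: order values by their canonical JSON text.
    return json.dumps(value, sort_keys=True)


def normalize_enum(values, is_valid):
    """Filter, sort, and collapse an enum list (hypothetical helper)."""
    kept = sorted((v for v in values if is_valid(v)), key=canonical_key)
    if not kept:
        return {"not": {}}          # nothing can match
    if len(kept) == 1:
        return {"const": kept[0]}   # a one-value enum is a const
    return {"enum": kept}


def is_int(value):
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, int) and not isinstance(value, bool)


print(normalize_enum([2, "oops", 1], is_int))
```

Here "oops" plays the role of invalid_value: it fails the stand-in validator and is dropped before sorting.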

1.3 Conditional Statement Transformation

Rule: Transform if/then/else structures into logical combinations

  • {"if": A, "then": B, "else": C} → {"anyOf": [{"allOf": [A, B, schema]}, {"allOf": [{"not": A}, C, schema]}]}, where schema is what remains of the original schema once if/then/else have been removed

Transformation Logic: Converts conditional logic into an equivalent combination of logical operators (anyOf/allOf).

Code Reference:

if_ = schema.pop("if", None)
then = schema.pop("then", schema)
else_ = schema.pop("else", schema)
if (if_ is not None and (then is not schema or else_ is not schema)
    and (then not in (if_, TRUTHY) or else_ != TRUTHY)):
    alternatives = [
        {"allOf": [if_, then, schema]},
        {"allOf": [{"not": if_}, else_, schema]},
    ]
    schema = canonicalish({"anyOf": alternatives})
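
One way to convince yourself that this rewrite preserves meaning is to treat A, B and C as plain booleans ("does the instance satisfy this subschema?") and compare both forms over every combination. This is only a propositional sanity check with hypothetical function names, not part of the library.

```python
from itertools import product


def if_then_else(a, b, c):
    # JSON Schema semantics: apply "then" when "if" passes, otherwise "else".
    return b if a else c


def any_of_all_of(a, b, c):
    # The rewritten form: (A and B) or (not A and C).
    return (a and b) or ((not a) and c)


# Collect every truth assignment on which the two forms disagree.
mismatches = [
    combo
    for combo in product([False, True], repeat=3)
    if if_then_else(*combo) != any_of_all_of(*combo)
]
print(mismatches)  # prints []
```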

2. Type Handling Normalization

2.1 Type Array Processing

Rule: Normalize type representations

  • {"type": []} → {"not": {}}

  • {"type": ["null"]} → {"const": null}

  • {"type": ["boolean"]} → {"enum": [false, true]}

  • {"type": ["null", "boolean"]} → {"enum": [null, false, true]}

  • {"type": ["number", "integer"]} → {"type": "number"} (integers are a subset of numbers)

Transformation Logic:

  • Empty type arrays become unsatisfiable schemas

  • Special type handling for null/boolean types converts them to explicit enumerations

  • Removal of redundant types (e.g., "integer" when "number" is present)

Code Reference:

type_ = get_type(schema)
# Empty type array check
if not type_:
    assert type_ == []
    return FALSEY
# Special type conversions
if type_ == ["null"]:
    return {"const": None}
if type_ == ["boolean"]:
    return {"enum": [False, True]}
if type_ == ["null", "boolean"]:
    return {"enum": [None, False, True]}
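
To see all of the type rules working together, here is a small standalone sketch. It assumes types are kept in a fixed order so that ["boolean", "null"] and ["null", "boolean"] normalize identically; simplify_types and TYPE_ORDER are hypothetical names, and the real get_type may differ in detail.

```python
TYPE_ORDER = ("null", "boolean", "integer", "number", "string", "array", "object")


def simplify_types(type_value):
    """Normalize a "type" value into a schema fragment (hypothetical helper)."""
    types = [type_value] if isinstance(type_value, str) else list(type_value)
    if "number" in types:
        # Every integer is also a number, so "integer" adds nothing.
        types = [t for t in types if t != "integer"]
    ordered = [t for t in TYPE_ORDER if t in types]
    if not ordered:
        return {"not": {}}
    if ordered == ["null"]:
        return {"const": None}
    if ordered == ["boolean"]:
        return {"enum": [False, True]}
    if ordered == ["null", "boolean"]:
        return {"enum": [None, False, True]}
    return {"type": ordered[0] if len(ordered) == 1 else ordered}


print(simplify_types(["number", "integer"]))
print(simplify_types(["boolean", "null"]))
```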

2.2 Type-Specific Keyword Cleanup

Rule: Remove keywords irrelevant to the current type

  • When string type isn't included, remove pattern, maxLength, etc.

  • When array type isn't included, remove items, maxItems, etc.

  • When object type isn't included, remove properties, required, etc.

Transformation Logic: This removes unnecessary keywords that don't apply to the current schema's types, simplifying the schema.

Code Reference:

for t, kw in TYPE_SPECIFIC_KEYS:
    numeric = {"number", "integer"}
    if t in type_ or (t in numeric and numeric.intersection(type_)):
        continue
    for k in kw.split():
        schema.pop(k, None)
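
As a runnable illustration of the cleanup, the sketch below uses a hand-written keyword table. TYPE_KEYWORDS is an assumption that lists only a subset of keywords, and drop_irrelevant_keywords is a hypothetical helper; the library's TYPE_SPECIFIC_KEYS table is the authoritative source.

```python
# Assumption: a partial keyword table, written as space-separated strings.
TYPE_KEYWORDS = {
    "number": "multipleOf minimum maximum exclusiveMinimum exclusiveMaximum",
    "string": "minLength maxLength pattern",
    "array": "items additionalItems minItems maxItems uniqueItems contains",
    "object": "properties patternProperties additionalProperties required",
}


def drop_irrelevant_keywords(schema, types):
    """Return a copy of schema without keywords for types it cannot match."""
    out = dict(schema)
    # Treat "integer" as numeric so numeric keywords survive for integers.
    active = {"number" if t == "integer" else t for t in types}
    for type_name, keywords in TYPE_KEYWORDS.items():
        if type_name in active:
            continue
        for keyword in keywords.split():
            out.pop(keyword, None)
    return out


schema = {"type": "integer", "minimum": 1, "pattern": "^a", "maxItems": 3}
print(drop_irrelevant_keywords(schema, ["integer"]))
```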

3. Numeric Type Normalization

3.1 Numeric Range Processing

3.1.1 Remove Redundant Exclusivity Flags

Rule: Remove default exclusivity flags

  • {"minimum": 10, "exclusiveMinimum": false} → {"minimum": 10}

  • {"maximum": 100, "exclusiveMaximum": false} → {"maximum": 100}

Transformation Logic: When exclusivity flags are false (the default), these redundant keywords are removed.

Code Reference:

if schema.get("exclusiveMinimum") is False:
    del schema["exclusiveMinimum"]
if schema.get("exclusiveMaximum") is False:
    del schema["exclusiveMaximum"]

3.1.2 Detect Unsatisfiable Ranges

Rule: Detect and handle contradictory range constraints

  • {"minimum": 10, "maximum": 5} → Remove "number" from type list

Transformation Logic: If the minimum value is greater than the maximum value, the "number" type is removed from the schema's type list, making those constraints inapplicable.

Code Reference:

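lo, hi, exmin, exmax = get_number_bounds(schema)
# Either bound may be absent (None); only compare when both are set
if lo is not None and hi is not None:
    lobound = next_up(lo) if exmin else lo
    hibound = next_down(hi) if exmax else hi
    if lobound > hibound:
        type_.remove("number")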
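
The exclusive-bound handling is easier to see with concrete numbers. This sketch assumes that next_up and next_down step to the adjacent floating-point value, and uses math.nextafter (Python 3.9+) for that; number_range_is_empty is a hypothetical helper name.

```python
import math


def number_range_is_empty(lo, hi, exclusive_min=False, exclusive_max=False):
    """True when no float can satisfy both bounds (hypothetical helper)."""
    if lo is None or hi is None:
        return False  # an open-ended range is never empty
    # An exclusive bound is tightened to the adjacent representable float.
    lower = math.nextafter(lo, math.inf) if exclusive_min else lo
    upper = math.nextafter(hi, -math.inf) if exclusive_max else hi
    return lower > upper


print(number_range_is_empty(10, 5))                      # True
print(number_range_is_empty(5, 5))                       # False
print(number_range_is_empty(5, 5, exclusive_min=True))   # True
```

The last call shows why the flags matter: minimum and maximum are both 5, but once the lower bound is exclusive no value is left.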

3.2 Integer Range Processing

Rule: Adjust range boundaries for integer types

  • {"type": "integer", "minimum": 1.5} → {"type": "integer", "minimum": 2}

  • {"type": "integer", "maximum": 5.7} → {"type": "integer", "maximum": 5}

Transformation Logic:

  • For integer types, non-integer boundaries are adjusted to valid integers (ceiling/floor)

  • Removes unnecessary exclusiveMinimum/exclusiveMaximum flags

Code Reference:

if "integer" in type_:
    lo, hi = get_integer_bounds(schema)
    if lo is not None:
        schema["minimum"] = lo
        schema.pop("exclusiveMinimum", None)
    if hi is not None:
        schema["maximum"] = hi
        schema.pop("exclusiveMaximum", None)
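
The body of get_integer_bounds is not shown above, so the following is only a guess at the idea: round inclusive bounds inward and turn exclusive bounds into inclusive ones. It assumes numeric exclusiveMinimum/exclusiveMaximum values (the draft-06+ form) rather than the boolean flags, and integer_bounds is a hypothetical name.

```python
import math


def integer_bounds(schema):
    """Return (lo, hi) as inclusive integer bounds, or None where unbounded."""
    lo = hi = None
    if "minimum" in schema:
        lo = math.ceil(schema["minimum"])
    if "exclusiveMinimum" in schema:
        # Smallest integer strictly greater than the exclusive bound.
        strict = math.floor(schema["exclusiveMinimum"]) + 1
        lo = strict if lo is None else max(lo, strict)
    if "maximum" in schema:
        hi = math.floor(schema["maximum"])
    if "exclusiveMaximum" in schema:
        # Largest integer strictly less than the exclusive bound.
        strict = math.ceil(schema["exclusiveMaximum"]) - 1
        hi = strict if hi is None else min(hi, strict)
    return lo, hi


print(integer_bounds({"minimum": 1.5, "maximum": 5.7}))               # (2, 5)
print(integer_bounds({"exclusiveMinimum": 2, "exclusiveMaximum": 5}))  # (3, 4)
```

Once the bounds are inclusive integers, the exclusive keywords carry no extra information, which is why the excerpt above can drop them.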

3.3 multipleOf Constraint Processing

Rule: Normalize multipleOf constraints

  • {"multipleOf": -5} → {"multipleOf": 5}

  • {"type": "integer", "multipleOf": 1/n} → {"type": "integer"} (for any natural number n, since every integer is a multiple of 1/n)

Transformation Logic:

  • Ensures multipleOf is positive (takes absolute value)

  • Removes redundant fractional multiplier constraints for integer types

Code Reference:

if "multipleOf" in schema:
    schema["multipleOf"] = abs(schema["multipleOf"])

# mul is the (now non-negative) multipleOf value, or None if absent
mul = schema.get("multipleOf")
if mul is not None and "number" not in type_ and Fraction(mul).numerator == 1:
    # Every integer is a multiple of 1/n for all natural numbers n
    schema.pop("multipleOf")
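
The Fraction(mul).numerator == 1 test has a subtlety worth trying out: Fraction converts a float using its exact binary value, so only values such as 0.5 or 0.25 are recognized as 1/n, while 0.1 is not. The helper name is_unit_fraction below is hypothetical.

```python
from fractions import Fraction


def is_unit_fraction(multiple_of):
    """True when |multiple_of| is exactly 1/n for some natural number n."""
    return Fraction(abs(multiple_of)).numerator == 1


print(is_unit_fraction(1))      # True: 1/1
print(is_unit_fraction(0.5))    # True: exactly 1/2 in binary floating point
print(is_unit_fraction(-0.25))  # True after taking the absolute value
print(is_unit_fraction(0.1))    # False: the float 0.1 is not exactly 1/10
print(is_unit_fraction(3))      # False
```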

4. Array Normalization

4.1 Items Processing

4.1.1 Handle Items Lists

Rule: Simplify items lists

  • {"items": [{}, {"not": {}}, {}]} → {"items": [{}], "maxItems": 1} (the second position can never be filled, so the list is cut there)

Transformation Logic:

  • Trims items beyond maxItems

  • If a FALSEY item is encountered, truncates the array and sets maxItems

Code Reference:

if "array" in type_ and isinstance(schema.get("items"), list):
    schema["items"] = schema["items"][: schema.get("maxItems")]
    for idx, s in enumerate(schema["items"]):
        if s == FALSEY:
            schema["items"] = schema["items"][:idx]
            schema["maxItems"] = idx
            schema.pop("additionalItems", None)
            break
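
The truncation can be reproduced outside the library with a few lines. This sketch mirrors the excerpt above but works on a copy instead of mutating in place; truncate_items is a hypothetical name.

```python
FALSEY = {"not": {}}


def truncate_items(schema):
    """Cut a positional items list at maxItems and at the first FALSEY entry."""
    out = dict(schema)
    items = out.get("items")
    if not isinstance(items, list):
        return out
    items = items[: out.get("maxItems")]  # slicing with None keeps everything
    for idx, sub in enumerate(items):
        if sub == FALSEY:
            # Position idx can never be valid, so arrays must be shorter.
            items = items[:idx]
            out["maxItems"] = idx
            out.pop("additionalItems", None)
            break
    out["items"] = items
    return out


print(truncate_items({"items": [{}, {"not": {}}, {}]}))
```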

4.1.2 Simplify Unrestricted Items

Rule: Remove redundant items definitions

  • {"items": {}} → Remove items key

Transformation Logic: When items is an empty object (unrestricted), it's removed to simplify the schema.

Code Reference:

if "array" in type_ and schema.get("items", TRUTHY) == TRUTHY:
    schema.pop("items", None)

4.2 Contains Constraint Processing

Rule: Process contains keyword interactions

  • {"contains": {"not": {}}} → Remove "array" from type list

  • {"contains": {}} → {"minItems": 1}

Transformation Logic:

  • Attempts to merge contains and items constraints

  • If contains is FALSEY, make array type unavailable

  • If contains is TRUTHY, convert it to minItems >= 1

Code Reference:

if "array" in type_ and "contains" in schema:
    if isinstance(schema.get("items"), dict):
        contains_items = merged([schema["contains"], schema["items"]])
        if contains_items is not None:
            schema["contains"] = contains_items

    if schema["contains"] == FALSEY:
        type_.remove("array")
    else:
        schema["minItems"] = max(schema.get("minItems", 0), 1)
    if schema["contains"] == TRUTHY:
        schema.pop("contains")
        schema["minItems"] = max(schema.get("minItems", 1), 1)
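
Leaving the merge with items aside, the two special cases can be sketched on their own. This is a simplified stand-in with a hypothetical name, normalize_contains, which returns the updated schema together with the updated type list.

```python
FALSEY = {"not": {}}
TRUTHY = {}


def normalize_contains(schema, types):
    """Handle FALSEY and TRUTHY "contains" values (simplified sketch)."""
    out, types = dict(schema), list(types)
    if "array" not in types or "contains" not in out:
        return out, types
    if out["contains"] == FALSEY:
        # No element can ever match, so no array can be valid.
        types.remove("array")
        return out, types
    # Any satisfiable "contains" needs at least one element.
    out["minItems"] = max(out.get("minItems", 0), 1)
    if out["contains"] == TRUTHY:
        out.pop("contains")  # "at least one item of any kind" is just minItems
    return out, types


print(normalize_contains({"contains": {}}, ["array"]))
print(normalize_contains({"contains": {"not": {}}}, ["array", "string"]))
```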

4.3 Length Constraint Processing

Rule: Handle array length constraints

  • {"minItems": 5, "maxItems": 3} → Remove "array" from type list

Transformation Logic: If the minimum length exceeds the maximum length, remove "array" from the type list.

Code Reference:

if "array" in type_ and schema.get("minItems", 0) > schema.get("maxItems", math.inf):
    type_.remove("array")

5. Object Normalization

5.1 Properties Processing

Rule: Normalize property definitions

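  • When additionalProperties is false and there are no patternProperties, remove properties with FALSEY values

  • In that same case, cap maxProperties at the number of remaining properties

Transformation Logic:

  • A property whose schema is FALSEY can never be present in a valid object, so it is removed; the code below does this only when additionalProperties is FALSEY and patternProperties is absent or empty

  • With no additional properties allowed, an object can have at most as many properties as are declared, so maxProperties is lowered to that count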

Code Reference:

if ("properties" in schema and not schema.get("patternProperties") 
    and schema.get("additionalProperties") == FALSEY):
    max_props = schema.get("maxProperties", math.inf)
    for k, v in list(schema["properties"].items()):
        if v == FALSEY:
            schema["properties"].pop(k)
    schema["maxProperties"] = min(max_props, len(schema["properties"]))
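
A standalone version of this rule makes the effect visible on a small schema. The sketch assumes subschemas have already been canonicalized, so a closed object shows up as additionalProperties == {"not": {}} rather than the boolean false; prune_closed_properties is a hypothetical name.

```python
import math

FALSEY = {"not": {}}


def prune_closed_properties(schema):
    """Drop impossible properties from a closed object and cap maxProperties."""
    out = dict(schema)
    closed = (
        "properties" in out
        and not out.get("patternProperties")
        and out.get("additionalProperties") == FALSEY
    )
    if not closed:
        return out
    props = {k: v for k, v in out["properties"].items() if v != FALSEY}
    out["properties"] = props
    out["maxProperties"] = min(out.get("maxProperties", math.inf), len(props))
    return out


example = {
    "properties": {"a": {}, "b": {"not": {}}},
    "additionalProperties": {"not": {}},
}
print(prune_closed_properties(example))
```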

5.2 Required Processing

Rule: Normalize required property lists

  • Merge required and dependencies

  • Check for conflicts with property constraints

  • Sort required list alphabetically

Transformation Logic:

  • Merges required and dependencies property lists

  • Detects conflicts between required properties and property constraints

  • Alphabetically sorts the required list

Code Reference:

if "object" in type_ and "required" in schema:
    reqs = set(schema["required"])
    # Process dependencies
    if schema.get("dependencies"):
        dep_names = {
            k: sorted(set(v))
            for k, v in schema["dependencies"].items()
            if isinstance(v, list)
        }
        schema["dependencies"].update(dep_names)
        while reqs.intersection(dep_names):
            for r in reqs.intersection(dep_names):
                reqs.update(dep_names.pop(r))
                schema["dependencies"].pop(r)
    schema["required"] = sorted(reqs)
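
The loop above is effectively a transitive closure: if a required property has list-style dependencies, those become required too, and so on. Here is a self-contained sketch of that idea using a worklist; expand_required is a hypothetical name, and schema-valued dependencies are deliberately left alone.

```python
def expand_required(required, dependencies):
    """Fold list-style dependencies of required properties into "required"."""
    reqs = set(required)
    deps = dict(dependencies)
    pending = list(reqs)
    while pending:
        name = pending.pop()
        if not isinstance(deps.get(name), list):
            continue  # no dependency, already handled, or a schema dependency
        for dep in deps.pop(name):
            if dep not in reqs:
                reqs.add(dep)
                pending.append(dep)
    return sorted(reqs), deps


required, remaining = expand_required(
    ["a"],
    {"a": ["b"], "b": ["c"], "x": ["y"]},
)
print(required)   # ['a', 'b', 'c']
print(remaining)  # {'x': ['y']}
```

The dependency on x stays in place because x itself is not required.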

5.3 Dependencies Processing

Rule: Remove empty dependencies

  • {"dependencies": {"prop": []}} → Remove this dependency

Transformation Logic: Removes dependency entries that don't actually restrict anything.

Code Reference:

for k, v in schema.get("dependencies", {}).copy().items():
    if v in ([], TRUTHY):
        schema["dependencies"].pop(k)

6. Logical Combination Normalization

6.1 AnyOf Normalization

Rule: Normalize anyOf structures

  • {"anyOf": [{"anyOf": [A, B]}, C]} → {"anyOf": [A, B, C]}

  • {"anyOf": [{}, {"type": "string"}]} → {}

  • Deduplicate, sort, and remove FALSEY options

Transformation Logic:

  • Flattens nested anyOf structures

  • Removes FALSEY options and deduplicates

  • Simplifies single-item anyOf

  • Special handling for type-only subschemas

Code Reference:

if "anyOf" in schema:
    i = 0
    while i < len(schema["anyOf"]):
        s = schema["anyOf"][i]
        if set(s) == {"anyOf"}:
            schema["anyOf"][i : i + 1] = s["anyOf"]
            continue
        i += 1
    schema["anyOf"] = [
        json.loads(s)
        for s in sorted(
            {encode_canonical_json(a) for a in schema["anyOf"] if a != FALSEY}
        )
    ]
    if not schema["anyOf"]:
        return FALSEY
    if len(schema) == len(schema["anyOf"]) == 1:
        return schema["anyOf"][0]
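
Flattening, deduplication and sorting can be demonstrated on a standalone anyOf list. In this sketch json.dumps(..., sort_keys=True) is an assumed stand-in for encode_canonical_json, the function names are hypothetical, and sibling keywords next to anyOf are not handled.

```python
import json

FALSEY = {"not": {}}


def flatten_any_of(options):
    """Inline options that consist of nothing but a nested anyOf."""
    flat = []
    for option in options:
        if set(option) == {"anyOf"}:
            flat.extend(flatten_any_of(option["anyOf"]))
        else:
            flat.append(option)
    return flat


def normalize_any_of(options):
    # Encoding to text makes the options hashable, so a set can deduplicate.
    encoded = {
        json.dumps(o, sort_keys=True)
        for o in flatten_any_of(options)
        if o != FALSEY
    }
    unique = [json.loads(text) for text in sorted(encoded)]
    if not unique:
        return {"not": {}}
    if len(unique) == 1:
        return unique[0]
    return {"anyOf": unique}


A, B, C = {"type": "string"}, {"type": "null"}, {"minimum": 0}
print(normalize_any_of([{"anyOf": [A, B]}, C, A, {"not": {}}]))
```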

6.2 AllOf Normalization

Rule: Normalize allOf structures

  • Deduplicate and sort

  • {"allOf": [{}, {"type": "string"}, {}]} → {"allOf": [{}, {"type": "string"}]}

  • {"allOf": [{"not": {}}, ...]} → {"not": {}}

  • If all items are TRUTHY, remove allOf

Transformation Logic:

  • Sorts and deduplicates

  • If any item is FALSEY, the entire schema becomes FALSEY

  • If all items are TRUTHY, removes allOf

  • Attempts to merge all conditions

Code Reference:

if "allOf" in schema:
    schema["allOf"] = [
        json.loads(enc)
        for enc in sorted(set(map(encode_canonical_json, schema["allOf"])))
    ]
    if any(s == FALSEY for s in schema["allOf"]):
        return FALSEY
    if all(s == TRUTHY for s in schema["allOf"]):
        schema.pop("allOf")
    # Attempt to merge
    elif len(schema) == len(schema["allOf"]) == 1:
        return schema["allOf"][0]
    else:
        tmp = schema.copy()
        ao = tmp.pop("allOf")
        out = merged([tmp, *ao])
        if out is not None:
            schema = out

6.3 OneOf Normalization

Rule: Normalize oneOf structures

  • Simplify single-item oneOf

  • {"oneOf": [A]} → merge A with the parent schema

  • If empty or containing multiple TRUTHY items, convert to FALSEY

Transformation Logic:

  • Sorts and removes FALSEY options

  • Simplifies single-item oneOf

  • Detects invalid oneOf combinations (empty or multiple TRUTHY)

Code Reference:

if "oneOf" in schema:
    one_of = schema.pop("oneOf")
    one_of = sorted(one_of, key=encode_canonical_json)
    one_of = [s for s in one_of if s != FALSEY]
    if len(one_of) == 1:
        m = merged([schema, one_of[0]])
        if m is not None:
            return m
    if (not one_of) or one_of.count(TRUTHY) > 1:
        return FALSEY
    schema["oneOf"] = one_of
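
The "multiple TRUTHY" case follows from what oneOf means: exactly one branch must match, and two branches that accept everything always match together. A minimal sketch for a standalone oneOf list, with the hypothetical name normalize_one_of and json.dumps as an assumed canonical encoding:

```python
import json

FALSEY = {"not": {}}
TRUTHY = {}


def normalize_one_of(options):
    """Simplify a standalone oneOf list (no sibling keywords handled)."""
    kept = sorted(
        (o for o in options if o != FALSEY),
        key=lambda o: json.dumps(o, sort_keys=True),
    )
    if not kept or kept.count(TRUTHY) > 1:
        return {"not": {}}  # zero branches, or two that always match together
    if len(kept) == 1:
        return kept[0]      # "exactly one of [A]" is just A
    return {"oneOf": kept}


print(normalize_one_of([{}, {"type": "string"}, {}]))         # {'not': {}}
print(normalize_one_of([{"not": {}}, {"type": "string"}]))    # {'type': 'string'}
```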

6.4 Not Normalization

Rule: Normalize not keyword

  • {"not": {"not": A}} → A

  • {"not": {"anyOf": [A, B]}} → {"not": {"anyOf": [A, B]}} (the shape is kept, but A and B are each processed separately before being recombined)

  • Type exclusion optimization

Transformation Logic:

  • Flattens nested not structures

  • Optimizes based on type constraints

  • Attempts to merge not with existing schema

Code Reference:

if "not" in schema:
    not_ = schema.pop("not")
    negated = []
    to_negate = not_["anyOf"] if set(not_) == {"anyOf"} else [not_]
    for not_ in to_negate:
        # Type constraint handling...
        if set(not_).issubset(type_constraints):
            not_["type"] = get_type(not_)
            for t in set(type_).intersection(not_["type"]):
                if not type_keys.get(t, set()).intersection(not_):
                    type_.remove(t)
                    if t not in ("integer", "number"):
                        not_["type"].remove(t)
            not_ = canonicalish(not_)
        # Merge handling...
    if len(negated) > 1:
        schema["not"] = {"anyOf": negated}
    elif negated:
        schema["not"] = negated[0]

7. Redundancy Removal

Rule: Remove key-value pairs that don't affect validation

  • {"minItems": 0} → Remove key

  • {"items": {}} → Remove key

  • {"required": []} → Remove key

Transformation Logic: Deletes unnecessary keywords with default values, such as:

  • "minItems": 0

  • "items": {}

  • "required": []

Code Reference:

for kw, identity in {
    "minItems": 0,
    "items": {},
    "additionalItems": {},
    "dependencies": {},
    "minProperties": 0,
    "properties": {},
    "propertyNames": {},
    "patternProperties": {},
    "additionalProperties": {},
    "required": [],
}.items():
    if kw in schema and schema[kw] == identity:
        schema.pop(kw)

8. Schema Merging Logic

The merged() function implements schema merging with these key rules:

8.1 Basic Constraint Merging

Rule: Merge boundary constraints

  • For minimum/minLength/etc., take the maximum value

  • For maximum/maxLength/etc., take the minimum value

Transformation Logic: A merged schema must satisfy both inputs at once, so the tighter bound wins in each direction.

  • For maximum-type constraints, takes the minimum value

  • For minimum-type constraints, takes the maximum value

Code Reference:

for key in {"maximum", "exclusiveMaximum", "maxLength", "maxItems", "maxProperties"} & set(s) & set(out):
    out[key] = min([out[key], s.pop(key)])
for key in {"minimum", "exclusiveMinimum", "minLength", "minItems", "minProperties"} & set(s) & set(out):
    out[key] = max([out[key], s.pop(key)])
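
A cut-down merge that handles only these bound keywords shows the rule in action. merge_bounds is a hypothetical helper, far smaller than merged(); it raises on any other conflicting keyword instead of trying to resolve it.

```python
UPPER = {"maximum", "exclusiveMaximum", "maxLength", "maxItems", "maxProperties"}
LOWER = {"minimum", "exclusiveMinimum", "minLength", "minItems", "minProperties"}


def merge_bounds(first, second):
    """Intersect two schemas that differ only in bound keywords."""
    out = dict(first)
    for key, value in second.items():
        if key not in out:
            out[key] = value
        elif key in UPPER:
            out[key] = min(out[key], value)  # the smaller ceiling is stricter
        elif key in LOWER:
            out[key] = max(out[key], value)  # the larger floor is stricter
        elif out[key] != value:
            raise ValueError(f"this sketch cannot merge keyword {key!r}")
    return out


print(merge_bounds(
    {"minimum": 1, "maximum": 10},
    {"minimum": 5, "maximum": 8, "maxLength": 3},
))
```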

8.2 Complex Structure Merging

Rule: Merge property structures

  • Merge properties considering exact matches, pattern matches, and defaults

  • Merge dependency relationships

  • Attempt to merge items structures

Transformation Logic:

  • Complex merging of property definitions, considering patterns and additional properties

  • Merging dependency relationships

  • Attempting to merge items structures with special handling for arrays

Code Reference:

# Properties merging (abbreviated)
out_props = out.setdefault("properties", {})
s_props = s.pop("properties", {})
for prop_name in set(out_props) | set(s_props):
    # Complex merging logic...