Introduction

The existing canonicalization implementation, _canonicalise.py, applies a single ruleset, so every JSON Schema goes through the same transformation strategy. This limits flexibility.

As Julian noted in the project description, "There are multiple different normalizations and rulesets we can and will define." Different use cases may require different normalization strategies: some prioritize validation performance, while others need better human readability.

Before designing the transformation rules, let’s first analyze the current implementation, the Hypothesis JSON Schema normalizer.

1. Basic Schema Normalization

1.1 Boolean Schema Normalization

Rule: Convert boolean schemas to standard form

true → {} (empty object)
false → {"not": {}}

Transformation Logic: In JSON Schema, true represents accepting all inputs, while false represents rejecting all inputs. This normalization provides consistent object representations for these fundamental concepts.

Code Reference:

if schema is True:
    return {}
elif schema is False:
    return {"not": {}}

1.2 Constant and Enumeration Normalization

1.2.1 Constant Validation

Rule: Validate "const" values and handle invalid constants

{"const": invalid_value} → {"not": {}}

Transformation Logic: If a constant value doesn't satisfy the schema's own validation requirements, the entire schema is considered unsatisfiable.

Code Reference:

if "const" in schema:
    if not make_validator(schema).is_valid(schema["const"]):
        return FALSEY
    return {"const": schema["const"]}

1.2.2 Enumeration Normalization

Rule: Normalize enumeration lists

{"enum": ["foo"]} → {"const": "foo"}
{"enum": []} → {"not": {}}
{"enum": [2, invalid_value, 1]} → {"enum": [1, 2]} (after removing invalid values and sorting)

Transformation Logic:

Filter out invalid enum values
Convert empty enums to the "false" schema ({"not": {}})
Convert single-value enums to "const" properties
Sort enum values for consistency

Code Reference:

if "enum" in schema:
    validator = make_validator(schema)
    enum_ = sorted(
        (v for v in schema["enum"] if validator.is_valid(v)), key=sort_key
    )
    if not enum_:
        return FALSEY
    elif len(enum_) == 1:
        return {"const": enum_[0]}
    return {"enum": enum_}
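
The excerpt above relies on make_validator and sort_key from the module, so it cannot run on its own. As a rough, self-contained sketch of the same filter, sort, and collapse steps, the version below takes a plain predicate instead of a validator. The names normalize_enum, is_integer, and this particular sort_key are assumptions made for illustration, not the module's own definitions.

```python
import json

FALSEY = {"not": {}}


def sort_key(value):
    # Hypothetical stand-in for the module's sort_key: group by type name,
    # then order numbers and strings by value, everything else by JSON text.
    if isinstance(value, (int, float, str)) and not isinstance(value, bool):
        return (type(value).__name__, value)
    return (type(value).__name__, json.dumps(value, sort_keys=True))


def normalize_enum(values, is_valid):
    """Filter, sort, and collapse an enum list."""
    kept = sorted((v for v in values if is_valid(v)), key=sort_key)
    if not kept:
        return dict(FALSEY)          # nothing left: unsatisfiable
    if len(kept) == 1:
        return {"const": kept[0]}    # a single value is just a const
    return {"enum": kept}


def is_integer(value):
    # Assumed surrounding schema: {"type": "integer", "enum": [...]}
    return isinstance(value, int) and not isinstance(value, bool)


print(normalize_enum([2, "x", 1], is_integer))   # {'enum': [1, 2]}
print(normalize_enum(["foo"], lambda v: True))   # {'const': 'foo'}
```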

1.3 Conditional Statement Transformation

Rule: Transform if/then/else structures into logical combinations

{"if": A, "then": B, "else": C} → {"anyOf": [{"allOf": [A, B, schema]}, {"allOf": [{"not": A}, C, schema]}]}

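Transformation Logic: Converts conditional logic into an equivalent combination of logical operators (anyOf/allOf). In the rule above, schema stands for the remaining keywords of the original schema once if, then, and else have been removed, so both branches still enforce them.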

Code Reference:

if_ = schema.pop("if", None)
then = schema.pop("then", schema)
else_ = schema.pop("else", schema)
if (if_ is not None and (then is not schema or else_ is not schema)
    and (then not in (if_, TRUTHY) or else_ != TRUTHY)):
    alternatives = [
        {"allOf": [if_, then, schema]},
        {"allOf": [{"not": if_}, else_, schema]},
    ]
    schema = canonicalish({"anyOf": alternatives})
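
To see the shape of the rewritten schema, here is a simplified sketch that performs only the rewrite itself. It skips the short-circuit checks and the recursive canonicalish call from the excerpt, and it defaults a missing then or else to the empty schema rather than to the remaining schema. The function name rewrite_conditional is hypothetical.

```python
import copy

TRUTHY = {}


def rewrite_conditional(schema):
    """Rewrite if/then/else into anyOf/allOf, with no further simplification."""
    rest = copy.deepcopy(schema)
    if_ = rest.pop("if", None)
    then = rest.pop("then", TRUTHY)    # a missing "then" accepts everything
    else_ = rest.pop("else", TRUTHY)   # likewise for a missing "else"
    if if_ is None:
        return rest                    # "then"/"else" without "if" do nothing
    return {
        "anyOf": [
            {"allOf": [if_, then, rest]},
            {"allOf": [{"not": if_}, else_, rest]},
        ]
    }


example = {
    "type": "integer",
    "if": {"minimum": 0},
    "then": {"multipleOf": 2},
    "else": {"maximum": -10},
}
print(rewrite_conditional(example))
```

Either the instance matches if and must then satisfy then, or it fails if and must satisfy else; the remaining keywords apply in both cases.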

2. Type Handling Normalization

2.1 Type Array Processing

Rule: Normalize type representations

{"type": []} → {"not": {}}
{"type": ["null"]} → {"const": null}
{"type": ["boolean"]} → {"enum": [false, true]}
{"type": ["null", "boolean"]} → {"enum": [null, false, true]}
{"type": ["number", "integer"]} → {"type": "number"} (integers are a subset of numbers)

Transformation Logic:

Empty type arrays become unsatisfiable schemas
Special type handling for null/boolean types converts them to explicit enumerations
Removal of redundant types (e.g., "integer" when "number" is present)

Code Reference:

type_ = get_type(schema)
# Empty type array check
if not type_:
    assert type_ == []
    return FALSEY
# Special type conversions
if type_ == ["null"]:
    return {"const": None}
if type_ == ["boolean"]:
    return {"enum": [False, True]}
if type_ == ["null", "boolean"]:
    return {"enum": [None, False, True]}
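
The type rules listed above can be exercised in isolation with a small sketch like the following. It works on the type list only and ignores every other keyword; the function name normalize_types and the alphabetical ordering of the remaining types are assumptions for illustration.

```python
def normalize_types(type_):
    """Map a list of JSON Schema type names to its simplified form."""
    types = set(type_)
    if not types:
        return {"not": {}}             # no type allowed: unsatisfiable
    if "number" in types:
        types.discard("integer")       # every integer is also a number
    if types == {"null"}:
        return {"const": None}
    if types == {"boolean"}:
        return {"enum": [False, True]}
    if types == {"null", "boolean"}:
        return {"enum": [None, False, True]}
    ordered = sorted(types)            # assumed ordering, for determinism
    return {"type": ordered[0] if len(ordered) == 1 else ordered}


print(normalize_types(["number", "integer"]))   # {'type': 'number'}
print(normalize_types(["null"]))                # {'const': None}
```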

2.2 Type-Specific Keyword Cleanup

Rule: Remove keywords irrelevant to the current type

When string type isn't included, remove pattern, maxLength, etc.
When array type isn't included, remove items, maxItems, etc.
When object type isn't included, remove properties, required, etc.

Transformation Logic: This removes unnecessary keywords that don't apply to the current schema's types, simplifying the schema.

Code Reference:

for t, kw in TYPE_SPECIFIC_KEYS:
    numeric = {"number", "integer"}
    if t in type_ or (t in numeric and numeric.intersection(type_)):
        continue
    for k in kw.split():
        schema.pop(k, None)
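
The cleanup loop depends on a TYPE_SPECIFIC_KEYS table that the excerpt does not show. The sketch below uses a shortened, hypothetical version of that table (the real one may list more keywords) to show how keywords for absent types are dropped while numeric keywords survive if either numeric type is present.

```python
# Hypothetical, shortened table; assumed to mirror the shape used above.
TYPE_SPECIFIC_KEYS = (
    ("number", "multipleOf maximum exclusiveMaximum minimum exclusiveMinimum"),
    ("integer", "multipleOf maximum exclusiveMaximum minimum exclusiveMinimum"),
    ("string", "maxLength minLength pattern"),
    ("array", "items additionalItems maxItems minItems uniqueItems contains"),
    ("object", "maxProperties minProperties required properties"),
)


def drop_irrelevant_keywords(schema, type_):
    """Return a copy of schema without keywords for types not in type_."""
    out = dict(schema)
    numeric = {"number", "integer"}
    for t, kw in TYPE_SPECIFIC_KEYS:
        # Numeric keywords stay if either numeric type is allowed.
        if t in type_ or (t in numeric and numeric.intersection(type_)):
            continue
        for k in kw.split():
            out.pop(k, None)
    return out


schema = {"type": "string", "pattern": "^a", "maxItems": 3, "minimum": 1}
print(drop_irrelevant_keywords(schema, ["string"]))
# {'type': 'string', 'pattern': '^a'}
```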

3. Numeric Type Normalization

3.1 Numeric Range Processing

3.1.1 Remove Redundant Exclusivity Flags

Rule: Remove default exclusivity flags

{"minimum": 10, "exclusiveMinimum": false} → {"minimum": 10}
{"maximum": 100, "exclusiveMaximum": false} → {"maximum": 100}

Transformation Logic: In their boolean (draft-04 style) form, the exclusivity flags default to false, so an explicit false adds nothing and the keyword is removed.

Code Reference:

if schema.get("exclusiveMinimum") is False:
    del schema["exclusiveMinimum"]
if schema.get("exclusiveMaximum") is False:
    del schema["exclusiveMaximum"]

3.1.2 Detect Unsatisfiable Ranges

Rule: Detect and handle contradictory range constraints

{"minimum": 10, "maximum": 5} → Remove "number" from type list

Transformation Logic: If the effective lower bound (after accounting for exclusivity) is greater than the effective upper bound, no number can satisfy the range. The "number" type is then removed from the schema's type list, which makes those constraints inapplicable.

Code Reference:

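lo, hi, exmin, exmax = get_number_bounds(schema)
lobound = next_up(lo) if exmin else lo
hibound = next_down(hi) if exmax else hi
# A missing bound is None (unbounded), so compare only when both exist
if lobound is not None and hibound is not None and lobound > hibound:
    type_.remove("number")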
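
As a self-contained illustration of the emptiness check, the sketch below assumes boolean exclusivity flags and uses math.nextafter (available from Python 3.9) in place of the module's next_up and next_down helpers. The function name number_range_is_empty is hypothetical.

```python
import math


def number_range_is_empty(schema):
    """Return True when the numeric bounds admit no float at all."""
    lo, hi = schema.get("minimum"), schema.get("maximum")
    if lo is None or hi is None:
        return False   # an open-ended range always contains some number
    if schema.get("exclusiveMinimum") is True:
        lo = math.nextafter(lo, math.inf)     # smallest float above lo
    if schema.get("exclusiveMaximum") is True:
        hi = math.nextafter(hi, -math.inf)    # largest float below hi
    return lo > hi


print(number_range_is_empty({"minimum": 10, "maximum": 5}))   # True
print(number_range_is_empty({"minimum": 5, "maximum": 5}))    # False
```

Stepping an exclusive bound to the neighbouring float turns it into an inclusive one, so a single comparison covers all four combinations of flags.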

3.2 Integer Range Processing

Rule: Adjust range boundaries for integer types

{"type": "integer", "minimum": 1.5} → {"type": "integer", "minimum": 2}
{"type": "integer", "maximum": 5.7} → {"type": "integer", "maximum": 5}

Transformation Logic:

For integer types, non-integer boundaries are adjusted to valid integers (ceiling/floor)
Removes unnecessary exclusiveMinimum/exclusiveMaximum flags

Code Reference:

if "integer" in type_:
    lo, hi = get_integer_bounds(schema)
    if lo is not None:
        schema["minimum"] = lo
        schema.pop("exclusiveMinimum", None)
    if hi is not None:
        schema["maximum"] = hi
        schema.pop("exclusiveMaximum", None)
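
The rounding that get_integer_bounds performs is not shown in the excerpt. One plausible way to compute it, assuming boolean exclusivity flags, is sketched below; tighten_integer_bounds is a hypothetical name and the module's own helper may differ in detail.

```python
import math


def tighten_integer_bounds(schema):
    """Round numeric bounds to the nearest integers that remain valid."""
    out = dict(schema)
    lo, hi = out.get("minimum"), out.get("maximum")
    if lo is not None:
        # Exclusive: smallest integer strictly above lo; inclusive: round up.
        strict = out.pop("exclusiveMinimum", False) is True
        out["minimum"] = math.floor(lo) + 1 if strict else math.ceil(lo)
    if hi is not None:
        # Exclusive: largest integer strictly below hi; inclusive: round down.
        strict = out.pop("exclusiveMaximum", False) is True
        out["maximum"] = math.ceil(hi) - 1 if strict else math.floor(hi)
    return out


print(tighten_integer_bounds({"type": "integer", "minimum": 1.5}))
# {'type': 'integer', 'minimum': 2}
print(tighten_integer_bounds({"type": "integer", "maximum": 5.7}))
# {'type': 'integer', 'maximum': 5}
```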

3.3 multipleOf Constraint Processing

Rule: Normalize multipleOf constraints

{"multipleOf": -5} → {"multipleOf": 5}
{"type": "integer", "multipleOf": 1/n} → {"type": "integer"} (for any positive integer n, since every integer is a multiple of 1/n)

Transformation Logic:

Ensures multipleOf is positive (takes absolute value)
Removes multipleOf from integer-only schemas when its value is a unit fraction 1/n, because the constraint is then always satisfied

Code Reference:

if "multipleOf" in schema:
    schema["multipleOf"] = abs(schema["multipleOf"])

# mul holds the (now non-negative) multipleOf value, if any
mul = schema.get("multipleOf")
if mul is not None and "number" not in type_ and Fraction(mul).numerator == 1:
    # Every integer is a multiple of 1/n for all natural numbers n
    schema.pop("multipleOf")
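
Both multipleOf rules can be tried out with a short standalone sketch. The function name normalize_multiple_of and the explicit type_ argument are assumptions for illustration; the unit-fraction test uses fractions.Fraction exactly as in the excerpt.

```python
from fractions import Fraction


def normalize_multiple_of(schema, type_):
    """Make multipleOf positive and drop it when it cannot reject an integer."""
    out = dict(schema)
    if "multipleOf" not in out:
        return out
    out["multipleOf"] = abs(out["multipleOf"])
    integer_only = "integer" in type_ and "number" not in type_
    # A unit fraction 1/n divides every integer, so the keyword is redundant.
    if integer_only and Fraction(out["multipleOf"]).numerator == 1:
        out.pop("multipleOf")
    return out


print(normalize_multiple_of({"multipleOf": -5}, ["number"]))
# {'multipleOf': 5}
print(normalize_multiple_of({"type": "integer", "multipleOf": 0.5}, ["integer"]))
# {'type': 'integer'}
```

Note that the check is exact: a float such as 0.1 is not stored as exactly 1/10, so Fraction(0.1) does not have numerator 1 and the keyword would be kept.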

4. Array Normalization

4.1 Items Processing

4.1.1 Handle Items Lists

Rule: Simplify items lists

{"items": [{}, {"not": {}}, {}]} → {"items": [{}], "maxItems": 1}

Transformation Logic:

Trims items beyond maxItems
If a FALSEY item is encountered, truncates the array and sets maxItems

Code Reference:

if "array" in type_ and isinstance(schema.get("items"), list):
    schema["items"] = schema["items"][: schema.get("maxItems")]
    for idx, s in enumerate(schema["items"]):
        if s == FALSEY:
            schema["items"] = schema["items"][:idx]
            schema["maxItems"] = idx
            schema.pop("additionalItems", None)
            break
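
A standalone version of the truncation step makes the effect easy to check. This is a minimal sketch that assumes items is already in list form; the name truncate_items is hypothetical.

```python
FALSEY = {"not": {}}


def truncate_items(schema):
    """Cut a list-form items at maxItems and at the first unsatisfiable entry."""
    out = dict(schema)
    items = list(out["items"])[: out.get("maxItems")]   # [:None] keeps all
    for idx, sub in enumerate(items):
        if sub == FALSEY:
            # No valid array can reach position idx, so cap the length there.
            items = items[:idx]
            out["maxItems"] = idx
            out.pop("additionalItems", None)
            break
    out["items"] = items
    return out


print(truncate_items({"items": [{}, {"not": {}}, {}]}))
# {'items': [{}], 'maxItems': 1}
```

The unsatisfiable entry sits at index 1, so only the first item schema survives and arrays are limited to one element.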

4.1.2 Simplify Unrestricted Items

Rule: Remove redundant items definitions

{"items": {}} → Remove items key

Transformation Logic: When items is an empty object (unrestricted), it's removed to simplify the schema.

Code Reference:

if "array" in type_ and schema.get("items", TRUTHY) == TRUTHY:
    schema.pop("items", None)

4.2 Contains Constraint Processing

Rule: Process contains keyword interactions

{"contains": {"not": {}}} → Remove "array" from type list
{"contains": {}} → {"minItems": 1}

Transformation Logic:

Attempts to merge contains and items constraints
If contains is FALSEY, make array type unavailable
If contains is TRUTHY, convert it to minItems >= 1

Code Reference:

if "array" in type_ and "contains" in schema:
    if isinstance(schema.get("items"), dict):
        contains_items = merged([schema["contains"], schema["items"]])
        if contains_items is not None:
            schema["contains"] = contains_items

    if schema["contains"] == FALSEY:
        type_.remove("array")
    else:
        schema["minItems"] = max(schema.get("minItems", 0), 1)
    if schema["contains"] == TRUTHY:
        schema.pop("contains")
        schema["minItems"] = max(schema.get("minItems", 1), 1)

4.3 Length Constraint Processing

Rule: Handle array length constraints

{"minItems": 5, "maxItems": 3} → Remove "array" from type list

Transformation Logic: If the minimum length exceeds the maximum length, remove "array" from the type list.

Code Reference:

if "array" in type_ and schema.get("minItems", 0) > schema.get("maxItems", math.inf):
    type_.remove("array")

5. Object Normalization

5.1 Properties Processing

Rule: Normalize property definitions

Remove properties with FALSEY values
Adjust maxProperties limit when additionalProperties is false

Transformation Logic:

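Removes properties whose schema is FALSEY; they remain forbidden because additionalProperties is FALSEY as well
When additionalProperties is FALSEY and there are no patternProperties, caps maxProperties at the number of remaining declared properties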

Code Reference:

if ("properties" in schema and not schema.get("patternProperties") 
    and schema.get("additionalProperties") == FALSEY):
    max_props = schema.get("maxProperties", math.inf)
    for k, v in list(schema["properties"].items()):
        if v == FALSEY:
            schema["properties"].pop(k)
    schema["maxProperties"] = min(max_props, len(schema["properties"]))
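
A small runnable sketch of this rule follows. It assumes sub-schemas have already been canonicalized, so a false schema appears as {"not": {}}; the name prune_properties is hypothetical.

```python
import math

FALSEY = {"not": {}}


def prune_properties(schema):
    """Drop unsatisfiable properties and cap maxProperties for closed objects."""
    out = dict(schema)
    closed = (
        out.get("additionalProperties") == FALSEY
        and not out.get("patternProperties")
    )
    if "properties" in out and closed:
        props = {k: v for k, v in out["properties"].items() if v != FALSEY}
        out["properties"] = props
        # Only declared properties may appear, so their count is an upper bound.
        out["maxProperties"] = min(out.get("maxProperties", math.inf), len(props))
    return out


schema = {
    "properties": {"a": {"type": "string"}, "b": {"not": {}}},
    "additionalProperties": {"not": {}},
}
print(prune_properties(schema))
```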

5.2 Required Processing

Rule: Normalize required property lists

Merge required and dependencies
Check for conflicts with property constraints
Sort required list alphabetically

Transformation Logic:

Merges required and dependencies property lists
Detects conflicts between required properties and property constraints
Alphabetically sorts the required list

Code Reference:

if "object" in type_ and "required" in schema:
    reqs = set(schema["required"])
    # Process dependencies
    if schema.get("dependencies"):
        dep_names = {
            k: sorted(set(v))
            for k, v in schema["dependencies"].items()
            if isinstance(v, list)
        }
        schema["dependencies"].update(dep_names)
        while reqs.intersection(dep_names):
            for r in reqs.intersection(dep_names):
                reqs.update(dep_names.pop(r))
                schema["dependencies"].pop(r)
    schema["required"] = sorted(reqs)
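
The while loop above computes a transitive closure: once a property is required, everything it depends on becomes required too, and so on. The sketch below isolates that step; absorb_dependencies is a hypothetical name, and it returns new values instead of mutating a schema.

```python
def absorb_dependencies(required, dependencies):
    """Fold list-form dependencies of required properties into required."""
    reqs = set(required)
    deps = {
        k: (sorted(set(v)) if isinstance(v, list) else v)
        for k, v in dependencies.items()
    }
    # Only list-form (property name) dependencies can be absorbed.
    pending = {k for k, v in deps.items() if isinstance(v, list)}
    while reqs & pending:
        for name in reqs & pending:
            reqs.update(deps.pop(name))   # its dependencies are required too
            pending.discard(name)
    return sorted(reqs), deps


print(absorb_dependencies(["a"], {"a": ["b"], "b": ["c"], "d": ["e"]}))
# (['a', 'b', 'c'], {'d': ['e']})
```

Here a pulls in b, b pulls in c, and the dependency on d stays because d itself is never required.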

5.3 Dependencies Processing

Rule: Remove empty dependencies

{"dependencies": {"prop": []}} → Remove this dependency

Transformation Logic: Removes dependency entries that don't actually restrict anything.

Code Reference:

for k, v in schema.get("dependencies", {}).copy().items():
    if v in ([], TRUTHY):
        schema["dependencies"].pop(k)

6. Logical Combination Normalization

6.1 AnyOf Normalization

Rule: Normalize anyOf structures

{"anyOf": [{"anyOf": [A, B]}, C]} → {"anyOf": [A, B, C]}
{"anyOf": [{}, {"type": "string"}]} → {}
Deduplicate, sort, and remove FALSEY options

Transformation Logic:

Flattens nested anyOf structures
Removes FALSEY options and deduplicates
Simplifies single-item anyOf
Special handling for type-only subschemas

Code Reference:

if "anyOf" in schema:
    i = 0
    while i < len(schema["anyOf"]):
        s = schema["anyOf"][i]
        if set(s) == {"anyOf"}:
            schema["anyOf"][i : i + 1] = s["anyOf"]
            continue
        i += 1
    schema["anyOf"] = [
        json.loads(s)
        for s in sorted(
            {encode_canonical_json(a) for a in schema["anyOf"] if a != FALSEY}
        )
    ]
    if not schema["anyOf"]:
        return FALSEY
    if len(schema) == len(schema["anyOf"]) == 1:
        return schema["anyOf"][0]
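
The flatten, deduplicate, and sort steps can be reproduced with the standard library alone. In this sketch json.dumps with sort_keys=True stands in for encode_canonical_json, and the function is assumed to receive a schema whose only keyword is anyOf; normalize_any_of is a hypothetical name.

```python
import json

FALSEY = {"not": {}}


def encode(schema):
    # Stand-in for encode_canonical_json: deterministic text with sorted keys.
    return json.dumps(schema, sort_keys=True)


def normalize_any_of(branches):
    """Flatten nested anyOf, drop FALSEY branches, deduplicate, and sort."""
    pending, flat = list(branches), []
    while pending:
        s = pending.pop()
        if set(s) == {"anyOf"}:
            pending.extend(s["anyOf"])   # splice nested branches in
        else:
            flat.append(s)
    result = [json.loads(e) for e in sorted({encode(s) for s in flat if s != FALSEY})]
    if not result:
        return dict(FALSEY)
    if len(result) == 1:
        return result[0]
    return {"anyOf": result}


nested = [{"anyOf": [{"type": "string"}, {"type": "null"}]}, {"type": "string"}, {"not": {}}]
print(normalize_any_of(nested))
# {'anyOf': [{'type': 'null'}, {'type': 'string'}]}
```

Encoding each branch to canonical text makes otherwise unhashable dictionaries comparable, which is what allows the set-based deduplication.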

6.2 AllOf Normalization

Rule: Normalize allOf structures

Deduplicate and sort
{"allOf": [{}, {"type": "string"}, {}]} → {"allOf": [{}, {"type": "string"}]}
{"allOf": [{"not": {}}, ...]} → {"not": {}}
If all items are TRUTHY, remove allOf

Transformation Logic:

Sorts and deduplicates
If any item is FALSEY, the entire schema becomes FALSEY
If all items are TRUTHY, removes allOf
Attempts to merge all conditions

Code Reference:

if "allOf" in schema:
    schema["allOf"] = [
        json.loads(enc)
        for enc in sorted(set(map(encode_canonical_json, schema["allOf"])))
    ]
    if any(s == FALSEY for s in schema["allOf"]):
        return FALSEY
    if all(s == TRUTHY for s in schema["allOf"]):
        schema.pop("allOf")
    # Attempt to merge
    elif len(schema) == len(schema["allOf"]) == 1:
        return schema["allOf"][0]
    else:
        tmp = schema.copy()
        ao = tmp.pop("allOf")
        out = merged([tmp, *ao])
        if out is not None:
            schema = out

6.3 OneOf Normalization

Rule: Normalize oneOf structures

Simplify single-item oneOf
{"oneOf": [A]} → merge A with the parent schema
If empty or containing multiple TRUTHY items, convert to FALSEY

Transformation Logic:

Sorts and removes FALSEY options
Simplifies single-item oneOf
Detects invalid oneOf combinations (empty or multiple TRUTHY)

Code Reference:

if "oneOf" in schema:
    one_of = schema.pop("oneOf")
    one_of = sorted(one_of, key=encode_canonical_json)
    one_of = [s for s in one_of if s != FALSEY]
    if len(one_of) == 1:
        m = merged([schema, one_of[0]])
        if m is not None:
            return m
    if (not one_of) or one_of.count(TRUTHY) > 1:
        return FALSEY
    schema["oneOf"] = one_of
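
The oneOf rules can be sketched in the same style. This version assumes the schema has no sibling keywords, so a single surviving branch is returned directly instead of being merged with the parent; normalize_one_of is a hypothetical name.

```python
import json

TRUTHY, FALSEY = {}, {"not": {}}


def normalize_one_of(branches):
    """Drop FALSEY branches, sort the rest, and detect trivial outcomes."""
    kept = sorted(
        (s for s in branches if s != FALSEY),
        key=lambda s: json.dumps(s, sort_keys=True),
    )
    if len(kept) == 1:
        return kept[0]               # exactly-one-of-one is just that schema
    if not kept or kept.count(TRUTHY) > 1:
        # No branch can match, or two branches always match together.
        return dict(FALSEY)
    return {"oneOf": kept}


print(normalize_one_of([{"not": {}}, {"type": "string"}]))   # {'type': 'string'}
print(normalize_one_of([{}, {}]))                             # {'not': {}}
```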

6.4 Not Normalization

Rule: Normalize not keyword

{"not": {"not": A}} → A
{"not": {"anyOf": [A, B]}} → {"not": {"anyOf": [A, B]}} (A and B are each processed separately and then recombined under anyOf)
Type exclusion optimization

Transformation Logic:

Flattens nested not structures
Optimizes based on type constraints
Attempts to merge not with existing schema

Code Reference:

if "not" in schema:
    not_ = schema.pop("not")
    negated = []
    to_negate = not_["anyOf"] if set(not_) == {"anyOf"} else [not_]
    for not_ in to_negate:
        # Type constraint handling...
        if set(not_).issubset(type_constraints):
            not_["type"] = get_type(not_)
            for t in set(type_).intersection(not_["type"]):
                if not type_keys.get(t, set()).intersection(not_):
                    type_.remove(t)
                    if t not in ("integer", "number"):
                        not_["type"].remove(t)
            not_ = canonicalish(not_)
        # Merge handling...
    if len(negated) > 1:
        schema["not"] = {"anyOf": negated}
    elif negated:
        schema["not"] = negated[0]

7. Redundancy Removal

Rule: Remove key-value pairs that don't affect validation

{"minItems": 0} → Remove key
{"items": {}} → Remove key
{"required": []} → Remove key

Transformation Logic: Deletes keywords whose value equals the keyword's default. Such a keyword accepts every instance, so removing it does not change validation. The code below lists all keywords treated this way.

Code Reference:

for kw, identity in {
    "minItems": 0,
    "items": {},
    "additionalItems": {},
    "dependencies": {},
    "minProperties": 0,
    "properties": {},
    "propertyNames": {},
    "patternProperties": {},
    "additionalProperties": {},
    "required": [],
}.items():
    if kw in schema and schema[kw] == identity:
        schema.pop(kw)

8. Schema Merging Logic

The merged() function implements schema merging with these key rules:

8.1 Basic Constraint Merging

Rule: Merge boundary constraints

For minimum/minLength/etc., take the maximum value
For maximum/maxLength/etc., take the minimum value

Transformation Logic:

A merged schema must satisfy both inputs, so the tighter bound wins
For maximum-type constraints, that is the smaller of the two values
For minimum-type constraints, that is the larger of the two values

Code Reference:

for key in {"maximum", "exclusiveMaximum", "maxLength", "maxItems", "maxProperties"} & set(s) & set(out):
    out[key] = min([out[key], s.pop(key)])
for key in {"minimum", "exclusiveMinimum", "minLength", "minItems", "minProperties"} & set(s) & set(out):
    out[key] = max([out[key], s.pop(key)])
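
To show the bound-merging rule end to end, here is a minimal sketch that merges two schemas containing only bound keywords. It assumes all bound values are numeric (including exclusiveMinimum and exclusiveMaximum) and refuses anything it does not understand; merge_bounds is a hypothetical name and covers far less than merged() does.

```python
UPPER = {"maximum", "exclusiveMaximum", "maxLength", "maxItems", "maxProperties"}
LOWER = {"minimum", "exclusiveMinimum", "minLength", "minItems", "minProperties"}


def merge_bounds(a, b):
    """Intersect two schemas by keeping the tighter of each shared bound."""
    out = dict(a)
    for key, value in b.items():
        if key not in out:
            out[key] = value                    # only one side constrains it
        elif key in UPPER:
            out[key] = min(out[key], value)     # smaller upper bound is tighter
        elif key in LOWER:
            out[key] = max(out[key], value)     # larger lower bound is tighter
        elif out[key] != value:
            raise ValueError(f"this sketch cannot merge keyword {key!r}")
    return out


print(merge_bounds({"minimum": 1, "maxLength": 10},
                   {"minimum": 3, "maxLength": 5, "minItems": 2}))
# {'minimum': 3, 'maxLength': 5, 'minItems': 2}
```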

8.2 Complex Structure Merging

Rule: Merge property structures

Merge properties considering exact matches, pattern matches, and defaults
Merge dependency relationships
Attempt to merge items structures

Transformation Logic:

Complex merging of property definitions, considering patterns and additional properties
Merging dependency relationships
Attempting to merge items structures with special handling for arrays

Code Reference:

# Properties merging (abbreviated)
out_props = out.setdefault("properties", {})
s_props = s.pop("properties", {})
for prop_name in set(out_props) | set(s_props):
    # Complex merging logic...
    ...

01 Current JSON Schema Normalization Rules - 1

Corrine
