01 Current JSON Schema Normalization Rules - 1

Introduction
The existing canonicalization implementation, canonicalise.py, employs a single ruleset and therefore offers only one transformation strategy for all JSON Schemas. This limits flexibility.
As Julian noted in the project description, "There are multiple different normalizations and rulesets we can and will define." Different use cases may call for different normalization strategies: some prioritize validation performance, while others need better human readability.
Before designing new transformation rules, let's first analyze the current implementation: the Hypothesis JSON Schema normalizer.
1. Basic Schema Normalization
1.1 Boolean Schema Normalization
Rule: Convert boolean schemas to standard form
true → {} (empty object)
false → {"not": {}}
Transformation Logic: In JSON Schema, true represents accepting all inputs, while false represents rejecting all inputs. This normalization provides consistent object representations for these fundamental concepts.
Code Reference:
if schema is True:
    return {}
elif schema is False:
    return {"not": {}}
1.2 Constant and Enumeration Normalization
1.2.1 Constant Validation
Rule: Validate "const" values and handle invalid constants
{"const": invalid_value} → {"not": {}}
Transformation Logic: If a constant value doesn't satisfy the schema's own validation requirements, the entire schema is considered unsatisfiable.
Code Reference:
if "const" in schema:
    if not make_validator(schema).is_valid(schema["const"]):
        return FALSEY
    return {"const": schema["const"]}
1.2.2 Enumeration Normalization
Rule: Normalize enumeration lists
{"enum": ["foo"]} → {"const": "foo"}
{"enum": []} → {"not": {}}
{"enum": [2, invalid_value, 1]} → {"enum": [1, 2]} (after removing invalid values and sorting)
Transformation Logic:
Filter out invalid enum values
Convert empty enums to the "false" schema ({"not": {}})
Convert single-value enums to a "const" keyword
Sort enum values for consistency
Code Reference:
if "enum" in schema:
    validator = make_validator(schema)
    enum_ = sorted(
        (v for v in schema["enum"] if validator.is_valid(v)), key=sort_key
    )
    if not enum_:
        return FALSEY
    elif len(enum_) == 1:
        return {"const": enum_[0]}
    return {"enum": enum_}
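To make the filter, sort, and collapse order concrete, here is a small self-contained sketch. It does not use the real make_validator or sort_key: the is_valid callable and sketch_sort_key are hypothetical stand-ins, and the ordering they produce is an assumption, not necessarily the library's.

```python
import json

FALSEY = {"not": {}}


def _json_type(value):
    # bool is checked before int because bool is a subclass of int in Python.
    if value is None:
        return "null"
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, (int, float)):
        return "number"
    if isinstance(value, str):
        return "string"
    return "array" if isinstance(value, list) else "object"


def sketch_sort_key(value):
    # Hypothetical total order: group by JSON type name, then compare numbers
    # numerically and everything else by its sorted-key JSON text.
    t = _json_type(value)
    return (t, value if t == "number" else json.dumps(value, sort_keys=True))


def normalize_enum(values, is_valid):
    """Sketch of the enum rule; `is_valid` stands in for make_validator(schema).is_valid."""
    kept = sorted((v for v in values if is_valid(v)), key=sketch_sort_key)
    if not kept:
        return FALSEY  # no value survives, so nothing can match
    if len(kept) == 1:
        return {"const": kept[0]}
    return {"enum": kept}


def is_int(v):
    return isinstance(v, int) and not isinstance(v, bool)


print(normalize_enum([2, "x", 1], is_int))  # {'enum': [1, 2]}
```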
1.3 Conditional Statement Transformation
Rule: Transform if/then/else structures into logical combinations
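{"if": A, "then": B, "else": C} → {"anyOf": [{"allOf": [A, B, schema]}, {"allOf": [{"not": A}, C, schema]}]}
Transformation Logic: Converts conditional logic into an equivalent combination of logical operators (anyOf/allOf), where schema stands for the remaining keywords of the original schema. The rewrite is skipped when if is missing, when neither then nor else is present, or when the conditional cannot reject anything (then is TRUTHY or identical to if, and else is TRUTHY).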
Code Reference:
if_ = schema.pop("if", None)
then = schema.pop("then", schema)
else_ = schema.pop("else", schema)
if (if_ is not None and (then is not schema or else_ is not schema)
        and (then not in (if_, TRUTHY) or else_ != TRUTHY)):
    alternatives = [
        {"allOf": [if_, then, schema]},
        {"allOf": [{"not": if_}, else_, schema]},
    ]
    schema = canonicalish({"anyOf": alternatives})
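The rewrite is sound because both forms agree on every combination of sub-results. A quick exhaustive check, treating A, B, C and the remaining keywords S as plain booleans (a simplification that ignores how each subschema is actually evaluated):

```python
from itertools import product


def if_then_else(a, b, c):
    # Draft-07 semantics: when `if` holds, `then` must hold; otherwise `else` must.
    return b if a else c


def rewritten(a, b, c, s):
    # anyOf[ allOf[A, B, S], allOf[not A, C, S] ], where S is the rest of the schema.
    return (a and b and s) or ((not a) and c and s)


# Compare both forms over all 16 combinations of sub-results.
for a, b, c, s in product([False, True], repeat=4):
    assert (if_then_else(a, b, c) and s) == rewritten(a, b, c, s)
```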
2. Type Handling Normalization
2.1 Type Array Processing
Rule: Normalize type representations
{"type": []} → {"not": {}}
{"type": ["null"]} → {"const": null}
{"type": ["boolean"]} → {"enum": [false, true]}
{"type": ["null", "boolean"]} → {"enum": [null, false, true]}
{"type": ["number", "integer"]} → {"type": "number"} (integers are a subset of numbers)
Transformation Logic:
Empty type arrays become unsatisfiable schemas
Special type handling for null/boolean types converts them to explicit enumerations
Removal of redundant types (e.g., "integer" when "number" is present)
Code Reference:
type_ = get_type(schema)
# Empty type array: no type is allowed, so nothing can match
if not type_:
    assert type_ == []
    return FALSEY
# Special type conversions
if type_ == ["null"]:
    return {"const": None}
if type_ == ["boolean"]:
    return {"enum": [False, True]}
if type_ == ["null", "boolean"]:
    return {"enum": [None, False, True]}
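The type-array rules can be gathered into one function. This is a hypothetical sketch that returns only the type-related part of the schema; the canonical type order it uses is an assumption made so the output is deterministic, and unknown type names are not handled.

```python
FALSEY = {"not": {}}

# Assumed canonical order of JSON Schema type names (for deterministic output only).
TYPE_ORDER = ["null", "boolean", "integer", "number", "string", "array", "object"]


def normalize_type_list(types):
    """Sketch of the type-array rules; `types` is a list of type names."""
    ts = sorted(set(types), key=TYPE_ORDER.index)
    if "number" in ts and "integer" in ts:
        ts.remove("integer")  # every integer is already a number
    if not ts:
        return FALSEY  # no type allowed, so nothing can match
    if ts == ["null"]:
        return {"const": None}
    if ts == ["boolean"]:
        return {"enum": [False, True]}
    if ts == ["null", "boolean"]:
        return {"enum": [None, False, True]}
    return {"type": ts[0] if len(ts) == 1 else ts}


print(normalize_type_list(["number", "integer"]))  # {'type': 'number'}
```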
2.2 Type-Specific Keyword Cleanup
Rule: Remove keywords irrelevant to the current type
When string type isn't included, remove pattern, maxLength, etc.
When array type isn't included, remove items, maxItems, etc.
When object type isn't included, remove properties, required, etc.
Transformation Logic: This removes unnecessary keywords that don't apply to the current schema's types, simplifying the schema.
Code Reference:
numeric = {"number", "integer"}
for t, kw in TYPE_SPECIFIC_KEYS:
    # Keep numeric keywords if either numeric type is allowed
    if t in type_ or (t in numeric and numeric.intersection(type_)):
        continue
    for k in kw.split():
        schema.pop(k, None)
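The loop above depends on a TYPE_SPECIFIC_KEYS table that the excerpt does not show. The sketch below assumes a simplified version of that table (the real one may list more keywords) and wraps the loop in a hypothetical drop_irrelevant_keywords helper.

```python
# Assumed, simplified table: (type, space-separated keywords that only apply to it).
TYPE_SPECIFIC_KEYS = (
    ("number", "multipleOf maximum exclusiveMaximum minimum exclusiveMinimum"),
    ("integer", "multipleOf maximum exclusiveMaximum minimum exclusiveMinimum"),
    ("string", "maxLength minLength pattern format"),
    ("array", "items additionalItems maxItems minItems uniqueItems contains"),
    ("object", "maxProperties minProperties required properties "
               "patternProperties additionalProperties dependencies propertyNames"),
)


def drop_irrelevant_keywords(schema, type_):
    """Remove keywords that cannot affect any of the allowed types (sketch)."""
    numeric = {"number", "integer"}
    for t, kw in TYPE_SPECIFIC_KEYS:
        # Numeric keywords survive if either numeric type is allowed.
        if t in type_ or (t in numeric and numeric.intersection(type_)):
            continue
        for k in kw.split():
            schema.pop(k, None)
    return schema
```

For example, with type_ = ["string"], a stray "minimum" or "items" keyword is dropped while "minLength" survives.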
3. Numeric Type Normalization
3.1 Numeric Range Processing
3.1.1 Remove Redundant Exclusivity Flags
Rule: Remove default exclusivity flags
{"minimum": 10, "exclusiveMinimum": false} → {"minimum": 10}
{"maximum": 100, "exclusiveMaximum": false} → {"maximum": 100}
Transformation Logic: When exclusivity flags are false (the default), these redundant keywords are removed.
Code Reference:
if schema.get("exclusiveMinimum") is False:
    del schema["exclusiveMinimum"]
if schema.get("exclusiveMaximum") is False:
    del schema["exclusiveMaximum"]
3.1.2 Detect Unsatisfiable Ranges
Rule: Detect and handle contradictory range constraints
{"minimum": 10, "maximum": 5} → remove "number" from the type list
Transformation Logic: If the effective lower bound exceeds the effective upper bound (after stepping exclusive bounds to the adjacent representable float), no number can satisfy the schema, so "number" is removed from the schema's type list.
Code Reference:
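lo, hi, exmin, exmax = get_number_bounds(schema)
if lo is not None and hi is not None:
    # An exclusive bound excludes the bound itself, so step to the adjacent float
    lobound = next_up(lo) if exmin else lo
    hibound = next_down(hi) if exmax else hi
    if lobound > hibound:
        type_.remove("number")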
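next_up and next_down are not shown in the excerpt. Assuming they return the adjacent representable float, math.nextafter (Python 3.9+) can play that role in a standalone sketch; number_range_is_empty is a hypothetical name.

```python
import math


def number_range_is_empty(lo, hi, exmin=False, exmax=False):
    """Return True if no float satisfies the bounds; None means unbounded (sketch)."""
    if lo is None or hi is None:
        return False  # a half-open range always has members
    # Exclusive bounds exclude the bound itself, so move to the neighbouring float.
    lobound = math.nextafter(lo, math.inf) if exmin else lo
    hibound = math.nextafter(hi, -math.inf) if exmax else hi
    return lobound > hibound
```

With this check, {"minimum": 5, "maximum": 5} is still satisfiable (only 5 matches), but making either bound exclusive empties the range.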
3.2 Integer Range Processing
Rule: Adjust range boundaries for integer types
{"type": "integer", "minimum": 1.5} → {"type": "integer", "minimum": 2}
{"type": "integer", "maximum": 5.7} → {"type": "integer", "maximum": 5}
Transformation Logic:
For integer types, non-integer bounds are rounded to valid integers (ceiling for minimum, floor for maximum)
Exclusive bounds are folded into the inclusive minimum/maximum, so exclusiveMinimum/exclusiveMaximum are removed
Code Reference:
if "integer" in type_:
    lo, hi = get_integer_bounds(schema)
    if lo is not None:
        schema["minimum"] = lo
        schema.pop("exclusiveMinimum", None)
    if hi is not None:
        schema["maximum"] = hi
        schema.pop("exclusiveMaximum", None)
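get_integer_bounds is likewise not shown. A minimal sketch of the idea, assuming draft-06-style numeric exclusiveMinimum/exclusiveMaximum values; integer_bounds is a hypothetical helper, not the library function.

```python
import math


def integer_bounds(minimum=None, maximum=None, exclusive_min=None, exclusive_max=None):
    """Fold numeric bounds into inclusive integer bounds (sketch).

    Returns (lo, hi); None means unbounded on that side.
    """
    lo = hi = None
    if minimum is not None:
        lo = math.ceil(minimum)  # smallest integer >= minimum
    if exclusive_min is not None:
        candidate = math.floor(exclusive_min) + 1  # smallest integer > exclusive_min
        lo = candidate if lo is None else max(lo, candidate)
    if maximum is not None:
        hi = math.floor(maximum)  # largest integer <= maximum
    if exclusive_max is not None:
        candidate = math.ceil(exclusive_max) - 1  # largest integer < exclusive_max
        hi = candidate if hi is None else min(hi, candidate)
    return lo, hi


print(integer_bounds(minimum=1.5, maximum=5.7))  # (2, 5)
```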
3.3 multipleOf Constraint Processing
Rule: Normalize multipleOf constraints
{"multipleOf": -5} → {"multipleOf": 5}
{"type": "integer", "multipleOf": 1/n} → {"type": "integer"} (for any positive integer n, since every integer is a multiple of 1/n)
Transformation Logic:
Ensures multipleOf is positive (takes the absolute value)
Removes redundant fractional multipleOf constraints for integer types
Code Reference:
from fractions import Fraction

if "multipleOf" in schema:
    schema["multipleOf"] = abs(schema["multipleOf"])
mul = schema.get("multipleOf")
if mul is not None and "number" not in type_ and Fraction(mul).numerator == 1:
    # Every integer is a multiple of 1/n for all natural numbers n
    schema.pop("multipleOf")
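The 1/n test relies on fractions.Fraction. A small sketch (the helper name is hypothetical) also shows a subtlety of this check: Fraction converts a float exactly from its binary value, so 0.5 becomes 1/2, but 0.1 does not become 1/10, and a multipleOf of 0.1 is therefore kept.

```python
from fractions import Fraction


def drop_redundant_multiple_of(schema, type_):
    """Drop multipleOf when only integers are allowed and it equals 1/n (sketch)."""
    mul = schema.get("multipleOf")
    # Fraction(float) is exact, so only binary-exact values such as 0.5 or 0.25
    # reduce to a fraction with numerator 1.
    if mul is not None and "number" not in type_ and Fraction(mul).numerator == 1:
        schema.pop("multipleOf")
    return schema


print(drop_redundant_multiple_of({"multipleOf": 0.5}, ["integer"]))  # {}
```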
4. Array Normalization
4.1 Items Processing
4.1.1 Handle Items Lists
Rule: Simplify items lists
{"items": [{}, {"not": {}}, {}]} → {"items": [{}], "maxItems": 1}
Transformation Logic:
Trims items beyond maxItems
If a FALSEY item is encountered at position idx, no array element can occupy that position, so the list is truncated there, maxItems is set to idx, and additionalItems is dropped
Code Reference:
if "array" in type_ and isinstance(schema.get("items"), list):
    schema["items"] = schema["items"][: schema.get("maxItems")]
    for idx, s in enumerate(schema["items"]):
        if s == FALSEY:
            schema["items"] = schema["items"][:idx]
            schema["maxItems"] = idx
            schema.pop("additionalItems", None)
            break
4.1.2 Simplify Unrestricted Items
Rule: Remove redundant items definitions
{"items": {}} → remove the items key
Transformation Logic: When items is an empty object (unrestricted), it is removed to simplify the schema.
Code Reference:
if "array" in type_ and schema.get("items", TRUTHY) == TRUTHY:
    schema.pop("items", None)
4.2 Contains Constraint Processing
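Rule: Process contains keyword interactions
{"contains": {"not": {}}} → remove "array" from the type list
{"contains": {}} → {"minItems": 1}
Transformation Logic:
If items is a single schema, attempts to merge it into contains, since the matching element must satisfy items too
If contains is FALSEY, no array can satisfy it, so "array" is removed from the type list
Otherwise at least one element is required, so minItems is raised to at least 1; a TRUTHY contains is then dropped, because minItems: 1 expresses the same constraint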
Code Reference:
if "array" in type_ and "contains" in schema:
    if isinstance(schema.get("items"), dict):
        contains_items = merged([schema["contains"], schema["items"]])
        if contains_items is not None:
            schema["contains"] = contains_items
    if schema["contains"] == FALSEY:
        type_.remove("array")
    else:
        schema["minItems"] = max(schema.get("minItems", 0), 1)
    if schema["contains"] == TRUTHY:
        schema.pop("contains")
        schema["minItems"] = max(schema.get("minItems", 1), 1)
4.3 Length Constraint Processing
Rule: Handle array length constraints
{"minItems": 5, "maxItems": 3} → remove "array" from the type list
Transformation Logic: If the minimum length exceeds the maximum length, remove "array" from the type list.
Code Reference:
if "array" in type_ and schema.get("minItems", 0) > schema.get("maxItems", math.inf):
    type_.remove("array")
5. Object Normalization
5.1 Properties Processing
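Rule: Normalize property definitions when additionalProperties is false
Remove properties whose subschema is FALSEY
Cap maxProperties at the number of remaining properties
Transformation Logic:
This rule applies only when additionalProperties is FALSEY and there are no patternProperties, so the named properties are the only ones an instance may have
A property with a FALSEY subschema can never be present, so its entry is removed
maxProperties is lowered to the number of remaining properties, since no valid instance can have more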
Code Reference:
if ("properties" in schema and not schema.get("patternProperties")
        and schema.get("additionalProperties") == FALSEY):
    max_props = schema.get("maxProperties", math.inf)
    for k, v in list(schema["properties"].items()):
        if v == FALSEY:
            schema["properties"].pop(k)
    schema["maxProperties"] = min(max_props, len(schema["properties"]))
5.2 Required Processing
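Rule: Normalize required property lists
Fold array-form dependencies of required properties into required
Check for conflicts with property constraints
Sort the required list alphabetically
Transformation Logic:
If a required property has an array-form dependency, the listed properties must also be present, so they are added to required and that dependency entry is removed; this repeats until no required property has a remaining dependency list
Detects conflicts between the required list and other object constraints (this check is omitted from the excerpt below)
Sorts the required list alphabetically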
Code Reference:
if "object" in type_ and "required" in schema:
    reqs = set(schema["required"])
    # Process dependencies
    if schema.get("dependencies"):
        dep_names = {
            k: sorted(set(v))
            for k, v in schema["dependencies"].items()
            if isinstance(v, list)
        }
        schema["dependencies"].update(dep_names)
        while reqs.intersection(dep_names):
            for r in reqs.intersection(dep_names):
                reqs.update(dep_names.pop(r))
                schema["dependencies"].pop(r)
    schema["required"] = sorted(reqs)
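The while loop computes a fixpoint, so chains of dependencies are followed transitively. A standalone sketch of the same loop, written as a hypothetical helper that works on copies instead of mutating the schema:

```python
def fold_dependencies_into_required(required, dependencies):
    """Move array-form dependencies of required properties into `required` (sketch).

    Returns (required, dependencies) without mutating the inputs.
    """
    reqs = set(required)
    deps = dict(dependencies)
    # Only array-form dependencies ("if A is present, B must be too") are folded;
    # schema-form dependencies are left untouched.
    dep_names = {k: sorted(set(v)) for k, v in deps.items() if isinstance(v, list)}
    deps.update(dep_names)
    # Repeat until no required property still has an unfolded dependency list,
    # so chains like a -> b -> c are fully followed.
    while reqs.intersection(dep_names):
        for r in reqs.intersection(dep_names):
            reqs.update(dep_names.pop(r))
            deps.pop(r)
    return sorted(reqs), deps


print(fold_dependencies_into_required(["a"], {"a": ["b"], "b": ["c"], "x": ["y"]}))
# (['a', 'b', 'c'], {'x': ['y']})
```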
5.3 Dependencies Processing
Rule: Remove empty dependencies
{"dependencies": {"prop": []}} → remove this dependency
Transformation Logic: Removes dependency entries that don't actually restrict anything.
Code Reference:
for k, v in schema.get("dependencies", {}).copy().items():
    if v in ([], TRUTHY):
        schema["dependencies"].pop(k)
6. Logical Combination Normalization
6.1 AnyOf Normalization
Rule: Normalize anyOf structures
{"anyOf": [{"anyOf": [A, B]}, C]} → {"anyOf": [A, B, C]}
{"anyOf": [{}, {"type": "string"}]} → {}
Deduplicate, sort, and remove FALSEY options
Transformation Logic:
Flattens nested anyOf structures (subschemas consisting only of anyOf)
Removes FALSEY options and deduplicates them by canonical JSON encoding
Returns FALSEY if no options remain, and unwraps a single-option anyOf that is the schema's only keyword
Special handling for type-only subschemas
Code Reference:
if "anyOf" in schema:
    i = 0
    while i < len(schema["anyOf"]):
        s = schema["anyOf"][i]
        if set(s) == {"anyOf"}:
            schema["anyOf"][i : i + 1] = s["anyOf"]
            continue
        i += 1
    schema["anyOf"] = [
        json.loads(s)
        for s in sorted(
            {encode_canonical_json(a) for a in schema["anyOf"] if a != FALSEY}
        )
    ]
    if not schema["anyOf"]:
        return FALSEY
    if len(schema) == len(schema["anyOf"]) == 1:
        return schema["anyOf"][0]
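A self-contained sketch of the same steps, runnable without the library. canonical_json is a hypothetical stand-in for encode_canonical_json (sorted keys, compact separators), and subschemas are assumed to already be dicts.

```python
import json

FALSEY = {"not": {}}


def canonical_json(value):
    # Hypothetical stand-in for encode_canonical_json.
    return json.dumps(value, sort_keys=True, separators=(",", ":"))


def normalize_any_of(schema):
    """Flatten, drop FALSEY, dedupe, sort, and unwrap an anyOf (sketch)."""
    options = list(schema["anyOf"])
    i = 0
    while i < len(options):
        s = options[i]
        if set(s) == {"anyOf"}:
            # Splice a nested pure-anyOf in place, then re-check the same index.
            options[i : i + 1] = s["anyOf"]
            continue
        i += 1
    unique = sorted({canonical_json(a) for a in options if a != FALSEY})
    schema = {**schema, "anyOf": [json.loads(s) for s in unique]}
    if not schema["anyOf"]:
        return FALSEY
    if len(schema) == len(schema["anyOf"]) == 1:
        return schema["anyOf"][0]
    return schema
```

Here a nested anyOf containing a duplicate option and a FALSEY option collapses all the way down to the single surviving subschema.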
6.2 AllOf Normalization
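Rule: Normalize allOf structures
Deduplicate and sort
{"allOf": [{}, {"type": "string"}, {}]} → {"allOf": [{"type": "string"}, {}]} (deduplicated and sorted by canonical JSON encoding; the merge step can then reduce this further)
{"allOf": [{"not": {}}, ...]} → {"not": {}}
If all items are TRUTHY, remove allOf
Transformation Logic:
Sorts and deduplicates subschemas
If any subschema is FALSEY, the entire schema becomes FALSEY
If all subschemas are TRUTHY, removes allOf
Otherwise unwraps a single-subschema allOf that is the schema's only keyword, or attempts to merge all subschemas with the rest of the schema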
Code Reference:
if "allOf" in schema:
    schema["allOf"] = [
        json.loads(enc)
        for enc in sorted(set(map(encode_canonical_json, schema["allOf"])))
    ]
    if any(s == FALSEY for s in schema["allOf"]):
        return FALSEY
    if all(s == TRUTHY for s in schema["allOf"]):
        schema.pop("allOf")
    elif len(schema) == len(schema["allOf"]) == 1:
        return schema["allOf"][0]
    else:
        # Attempt to merge the remaining keywords with every subschema
        tmp = schema.copy()
        ao = tmp.pop("allOf")
        out = merged([tmp, *ao])
        if out is not None:
            schema = out
6.3 OneOf Normalization
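Rule: Normalize oneOf structures
Simplify single-item oneOf: {"oneOf": [A]} → merge A with the parent schema
If empty, or if it contains more than one TRUTHY item, convert to FALSEY
Transformation Logic:
Sorts subschemas and removes FALSEY options, which can never be the single match
Simplifies single-item oneOf by merging it into the parent schema when possible
Detects unsatisfiable oneOf combinations: with no options nothing can match, and with two or more TRUTHY options every instance matches at least two of them, so none can match exactly one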
Code Reference:
if "oneOf" in schema:
    one_of = schema.pop("oneOf")
    one_of = sorted(one_of, key=encode_canonical_json)
    one_of = [s for s in one_of if s != FALSEY]
    if len(one_of) == 1:
        m = merged([schema, one_of[0]])
        if m is not None:
            return m
    if (not one_of) or one_of.count(TRUTHY) > 1:
        return FALSEY
    schema["oneOf"] = one_of
6.4 Not Normalization
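Rule: Normalize the not keyword
{"not": {"not": A}} → A
{"not": {"anyOf": [A, B]}} → A and B are simplified individually, then recombined as {"not": {"anyOf": [A', B']}}
Type exclusion optimization
Transformation Logic:
Flattens nested not structures
Splits a negated anyOf into its options and simplifies each one separately
Optimizes based on type constraints: if a negated subschema uses only type and type-specific keywords, any type it accepts unconditionally (no keyword constrains that type) is removed from the parent's type list
Attempts to merge each negated subschema with the rest of the schema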
Code Reference:
if "not" in schema:
    not_ = schema.pop("not")
    negated = []
    to_negate = not_["anyOf"] if set(not_) == {"anyOf"} else [not_]
    for not_ in to_negate:
        # Type constraint handling...
        if set(not_).issubset(type_constraints):
            not_["type"] = get_type(not_)
            for t in set(type_).intersection(not_["type"]):
                if not type_keys.get(t, set()).intersection(not_):
                    type_.remove(t)
                    if t not in ("integer", "number"):
                        not_["type"].remove(t)
            not_ = canonicalish(not_)
        # Merge handling...
    if len(negated) > 1:
        schema["not"] = {"anyOf": negated}
    elif negated:
        schema["not"] = negated[0]
7. Redundancy Removal
Rule: Remove key-value pairs that don't affect validation
{"minItems": 0} → remove key
{"items": {}} → remove key
{"required": []} → remove key
Transformation Logic: Deletes unnecessary keywords with default values, such as:
"minItems": 0
"items": {}
"required": []
Code Reference:
for kw, identity in {
    "minItems": 0,
    "items": {},
    "additionalItems": {},
    "dependencies": {},
    "minProperties": 0,
    "properties": {},
    "propertyNames": {},
    "patternProperties": {},
    "additionalProperties": {},
    "required": [],
}.items():
    if kw in schema and schema[kw] == identity:
        schema.pop(kw)
8. Schema Merging Logic
The merged() function implements schema merging with these key rules:
8.1 Basic Constraint Merging
Rule: Merge boundary constraints
For minimum/minLength/etc., take the maximum value
For maximum/maxLength/etc., take the minimum value
Transformation Logic: Both schemas' bounds must hold at the same time, so the tighter bound wins: for maximum-type constraints, take the minimum value; for minimum-type constraints, take the maximum value.
Code Reference:
for key in (
    {"maximum", "exclusiveMaximum", "maxLength", "maxItems", "maxProperties"}
    & set(s) & set(out)
):
    out[key] = min([out[key], s.pop(key)])
for key in (
    {"minimum", "exclusiveMinimum", "minLength", "minItems", "minProperties"}
    & set(s) & set(out)
):
    out[key] = max([out[key], s.pop(key)])
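A hypothetical sketch of just the bound-merging step for two schemas, assuming draft-06-style numeric exclusive bounds. Unlike the excerpt, which only touches keys present in both schemas, it also copies bounds that appear in only one of them, so the result is a complete set of bounds; all other keywords are ignored here.

```python
MAX_KEYS = {"maximum", "exclusiveMaximum", "maxLength", "maxItems", "maxProperties"}
MIN_KEYS = {"minimum", "exclusiveMinimum", "minLength", "minItems", "minProperties"}


def merge_bounds(a, b):
    """Merge only the bound keywords of two schemas (sketch)."""
    out = {k: v for k, v in a.items() if k in MAX_KEYS | MIN_KEYS}
    for key, value in b.items():
        if key in MAX_KEYS:
            # Both upper bounds must hold, so the smaller one wins.
            out[key] = min(out[key], value) if key in out else value
        elif key in MIN_KEYS:
            # Both lower bounds must hold, so the larger one wins.
            out[key] = max(out[key], value) if key in out else value
    return out


print(merge_bounds({"minimum": 1, "maximum": 10}, {"minimum": 3, "maximum": 20}))
# {'minimum': 3, 'maximum': 10}
```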
8.2 Complex Structure Merging
Rule: Merge property structures
Merge properties considering exact matches, pattern matches, and defaults
Merge dependency relationships
Attempt to merge items structures
Transformation Logic:
Complex merging of property definitions, considering patterns and additional properties
Merging dependency relationships
Attempting to merge items structures, with special handling for arrays
Code Reference:
# Properties merging (abbreviated)
out_props = out.setdefault("properties", {})
s_props = s.pop("properties", {})
for prop_name in set(out_props) | set(s_props):
# Complex merging logic...
Written by Corrine
