Untagged Algebraic Data Types, with ChatGPT

The following is an edited version of an article written by ChatGPT, after some back-and-forth.

This summary explores untagged algebraic data types (ADTs), focusing on their similarities and differences with traditional tagged ADTs. We examine union types (untagged sums) and typed multisets (untagged products), discussing their algebraic properties and connections to category theory. The aim is to provide a rigorous yet accessible overview of untagged ADTs, highlighting their theoretical significance and potential applications.


Introduction

Algebraic data types (ADTs) are fundamental constructs in programming languages and type theory, enabling the composition of complex types from simpler ones using sum (variant) and product (tuple) operations. Traditional ADTs are tagged, meaning each component is associated with a tag or label that disambiguates its identity and type. In contrast, untagged ADTs omit these explicit identifiers, resulting in union types (untagged sums) and typed multisets (untagged products). Understanding untagged ADTs offers insights into alternative data modeling approaches and enriches the theoretical landscape of type systems.


Union Types (Untagged Sums)

Union types represent values that may belong to multiple types without explicit tagging to indicate which type is held. For types \(A\) and \(B\), the union type \(A \cup B\) consists of values that are either of type \(A\) or type \(B\). Unlike tagged sum types, there is no inherent mechanism to distinguish between the types at runtime.

Algebraic Properties:

  • Associativity: \(A \cup (B \cup C) \cong (A \cup B) \cup C\)

  • Commutativity: \(A \cup B \cong B \cup A\)

  • Idempotence: \(A \cup A \cong A\)

These properties align union types with the mathematical notion of set union, forming a commutative idempotent monoid under the union operation.


Typed Multisets (Untagged Products)

Typed multisets are unordered collections of elements with specified types, possibly allowing multiple instances of the same type. For types \(A_1, A_2, \dots, A_n\), the typed multiset values of type \(\{ A_1, A_2, \dots, A_n \}\) contain \(n\) elements of exactly these types, without any ordering or labeling.

Algebraic Properties:

  • Associativity: \(\{A, \{B, C\}\} \cong \{\{A, B\},C\} \cong \{A, B,C\}\)

  • Commutativity: \(\{ A, B \} \cong \{ B, A \}\); order is irrelevant at both the type and value levels.

  • Non-Idempotence: \(\{ A, A \} \not\cong \{ A \}\); component type multiplicity is significant.

Typed multisets form a commutative monoid under multiset combination, accounting for the multiplicity and unordered nature of elements.


Comparing Untagged and Tagged ADTs

While both untagged and tagged ADTs enable the construction of complex types, the absence of tags in untagged ADTs leads to key differences:

  • Value Equivalence Under Permutation: In untagged products (typed multisets), values are equivalent under permutation of their elements. This means that the value \(\{a, b\}\) is considered equal to \(\{b, a\}\). In contrast, tagged product types (tuples) preserve the order of elements, and \((a, b)\) is generally not equal to \((b, a)\).

  • Type Discrimination: In tagged ADTs, tags or labels facilitate type discrimination and pattern matching, enabling operations that rely on knowing which variant or component is present. Untagged ADTs lack explicit tags, so type discrimination must be achieved through other means.

Despite these differences, untagged ADTs can emulate their tagged counterparts with the help of explicit tags or labels, highlighting a fundamental relationship between the two paradigms.


Connections to Category Theory

In category theory, tagged sum and product types correspond to coproducts and products, respectively. Untagged ADTs can be associated with other categorical constructs:

  • Union Types: Correspond to coproducts without injections, emphasizing the absence of explicit tagging.

  • Typed Multisets: Align with symmetric monoidal categories, where the tensor product is commutative up to isomorphism, reflecting the unordered nature of typed multisets. In these categories, morphisms are invariant under permutation of inputs, mirroring the equivalence of values in typed multisets.

Additionally, typed multisets can be modeled as elements of the free commutative monoid generated by the set of types, capturing the essence of multiplicity and unordered combination. This mathematical structure formalizes the way elements combine without concern for order, and where repeated types are significant.


Conclusion

Untagged ADTs offer an alternative perspective on data type construction, emphasizing unordered and untagged combinations of types. The connections to category theory provide a robust mathematical foundation, suggesting avenues for further research and potential applications in programming language design and type systems.

0
Subscribe to my newsletter

Read articles from Stephane Bersier directly inside your inbox. Subscribe to the newsletter, and don't miss out.

Written by

Stephane Bersier
Stephane Bersier