The Math Behind ML – Week 3: Set Theory, Part 1

Srikar Amara

Hey everyone, we are back with the next part of our series, The Math Behind ML: an introduction to set theory.

Let’s recap the previous week:

Linear Algebra: The Secret Arsenal - Recap

Linear algebra empowers us to manipulate and analyze data.

Imagine you have an image on a flat 2D screen… but now you want to pop it into 3D space, spin it around, shrink it, or even mix it with other data (like Iron Man—because why not?). Guess what? Linear algebra is the behind-the-scenes wizard that makes all that possible!

And like any good story, we’ve got three heroes that make the magic happen:

  • Vectors – These are like arrows pointing to where things are or where they’re heading. In machine learning, vectors are how we hold data—whether it's a single image, a line of text, or user behavior. They’re the basic building blocks of everything!

  • Matrices – Think of these as transformation tools. They rotate, stretch, squish, and flip our data into different shapes. They help us move between 2D and 3D worlds, adjust images, or even shrink huge datasets into simpler ones.

  • Dot Products – These little math tools tell us how “in sync” two vectors are. Super handy when trying to figure out angles, lighting in graphics, or how similar two things are—like matching faces, finding movie recommendations, or detecting patterns in text.

    Linear algebra is the toolbox that helps machines see, think, and learn.

Before getting into vectors, let’s revise set theory. Trust me, this revision will help solidify the upcoming concepts.


Set Theory

Sets

In mathematics, a set is a collection of objects or elements. A set does not contain any duplicate values. Imagine a country as an example. We can think of a country as a collection of cities, states, and towns. In this scenario, everything within the country is considered an element or object.

Representation of Sets

Let's look at the set of even numbers from 2 to 10, called X. We can show this set in different ways, each giving us a unique view:

  • Semantic Form (Descriptive): The set of all even numbers that are at least 2 and at most 10. (This form describes the property that defines the elements of the set in simple language.)

  • Roster Form (Enumeration): {2, 4, 6, 8, 10} (This form lists all the elements in the set explicitly.)

  • Set Builder Form (Set Comprehension): This form defines the set by specifying a property that its elements must satisfy. Here are a couple of common ways to express this:

    • {x∣x≤10 and x is a positive even number} (This reads as "the set of all x such that x is less than or equal to 10 and x is a positive even number.")

    • {x∈Z∣2≤x≤10 and x≡0(mod2)} (This is a more formal notation, reading as "the set of all x belonging to the set of integers (Z) such that x is greater than or equal to 2, less than or equal to 10, and x is congruent to 0 modulo 2," which means x is divisible by 2.)

  • Visual Representation (Venn diagram): the elements 2, 4, 6, 8, and 10 drawn inside a single closed curve labelled X.

Note for ML Context: Understanding these different representations is useful in machine learning. For example:

  • Roster form can be used for small, fixed sets of categories or labels.

  • Set builder form is helpful for defining sets based on conditions, which is important in feature selection or defining data subsets based on specific criteria.

  • Venn diagrams can help visualize the overlap or separation of different data segments or feature spaces.
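To make the roster and set-builder forms concrete, here is a minimal Python sketch. It assumes Python's built-in `set` type as a stand-in for a mathematical set; the variable names `X_roster` and `X_builder` are made up for illustration.

```python
# Roster form: list every element explicitly.
X_roster = {2, 4, 6, 8, 10}

# Set-builder form: {x ∈ Z | 2 ≤ x ≤ 10 and x ≡ 0 (mod 2)}.
# range(2, 11) stands in for "the integers from 2 to 10".
X_builder = {x for x in range(2, 11) if x % 2 == 0}

# Both forms describe the same set.
print(X_roster == X_builder)               # True

# Duplicates and ordering make no difference to a set.
print({10, 8, 6, 4, 2, 2, 2} == X_roster)  # True
```

The set comprehension reads almost word for word like the formal notation: a variable, a domain, and a condition.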


Types of sets & operations

Now that you’ve understood what a set represents, let’s dive deep into various types of sets. I’ll try to keep it crisp.

  • In the world of mathematics and computer science, sets are fundamental. Let's explore some key types:

    • Finite Set: Sets You Can Count. Imagine a box with a limited number of items. That's a finite set! It contains a specific number of elements that you can actually count, no matter how large that number might be.

      • Example: The set of all positive integers less than 100: {1, 2, 3, ..., 99}.
    • Infinite Set: The Never-Ending Collection. Now picture a box that keeps getting filled forever! An infinite set has an unlimited number of elements; it goes on endlessly.

      • Examples: The set of all positive integers {1, 2, 3, ...} and the set of all real numbers on the number line.
    • Empty Set (or Null Set): Absolutely Nothing Inside. This is a special set that contains no elements at all. Think of an empty box. It's represented by the symbol '∅' (often read as "phi", although it is a distinct symbol from the Greek letter φ) or simply by empty curly braces '{}'.

      • Example: The set of all even prime numbers greater than 2 is an empty set.
    • Singleton Set (or Unit Set): A Lonely Element. A singleton set is a set with exactly one element inside, all on its own :(

      • Example: The set containing the number 7: {7}.
    • Subset: One Set Contained Within Another. If every element of one set is also an element of a second set, then the first set is a subset of the second. The two sets may even be equal, so every set is a subset of itself.

      • Example: If Set A = {1, 2, 3} and Set B = {1, 2, 3, 4, 5}, then A is a subset of B because every number in A is also in B.
    • Proper Subset: A Part of the Whole, But Not the Whole Thing. A proper subset is like a regular subset, but with one crucial difference: it is not identical to the larger set. Every element of the proper subset belongs to the larger set, and the larger set has at least one element that the subset lacks.

      • Example: If Set A = {a, b, c}, then {a, b} is a proper subset of A because both a and b are in A, while A also contains c.
    • Universal Set: The Big Picture. Imagine a container that holds everything related to a specific topic. That's the universal set. It's the overarching set that encompasses all the elements under consideration.

      • Example: If we're talking about different types of vehicles, the universal set might be {cars, motorcycles, trucks, buses, bicycles, trains, airplanes, ships, ...}. Then, the set of {cars, trucks, buses} would be a subset of this universal set.
    • Power Set: The Set of All Possibilities. The power set of a given set is a set containing all the possible subsets of that original set, including the empty set and the set itself. It explores every combination!

      • Example: If Set A = {x, y}, its power set is {∅, {x}, {y}, {x, y}}.
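The subset, proper subset, and power set ideas above can be tried out with a short Python sketch. Python's comparison operators `<=` and `<` on sets test "subset" and "proper subset"; the `power_set` helper is a hypothetical function written for this example (it is not a built-in) and uses `frozenset` because ordinary sets cannot be stored inside another set.

```python
from itertools import chain, combinations

A = {1, 2, 3}
B = {1, 2, 3, 4, 5}

print(A <= B)      # True: A is a subset of B
print(A < B)       # True: A is also a proper subset of B
print(B <= B)      # True: every set is a subset of itself
print(B < B)       # False: but never a proper subset of itself
print(set() <= A)  # True: the empty set is a subset of every set


def power_set(s):
    """Return every subset of s as a set of frozensets (illustrative helper)."""
    items = list(s)
    # Take all combinations of size 0, 1, ..., len(items).
    all_combos = chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1)
    )
    return {frozenset(combo) for combo in all_combos}


P = power_set({"x", "y"})
print(len(P))                       # 4
print(frozenset() in P)             # True: the empty set is included
print(frozenset({"x", "y"}) in P)   # True: the set itself is included
```

A set with n elements has 2^n subsets, which is why the power set grows so quickly as the original set gets larger.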

The Rules of the Game: Fundamental Properties of Sets

Sets aren't just static collections; they follow specific rules that govern how they interact. Understanding these properties is key to grasping many concepts in mathematics, computer science, and yes, even machine learning!

  • Commutative Property: Order Doesn't Matter. Think of solving a puzzle. Whether you add the first piece before the second or vice versa, the final picture is the same. Similarly, with sets:

    • Union: Combining set A and set B gives the same result as combining set B and set A: A∪B=B∪A

    • Intersection: Finding the common elements in A and B is the same as finding the common elements in B and A: A∩B=B∩A

  • Associative Property: Grouping Doesn't Change the Outcome. Imagine stacking blocks. The final stack is the same whether you first combine the red and blue blocks, then add the green, or first combine the blue and green, then add the red. For sets:

    • Union: How you group sets for repeated union doesn't change the final combined set: (A∪B)∪C=A∪(B∪C)

    • Intersection: Similarly, the grouping in repeated intersection doesn't affect the common elements: (A∩B)∩C=A∩(B∩C)

  • Distributive Property: Spreading Operations. This is like distributing multiplication over addition in arithmetic. With sets, intersection can be distributed over union, and vice versa:

    • A∩(B∪C)=(A∩B)∪(A∩C)

    • A∪(B∩C)=(A∪B)∩(A∪C)

  • Identity Property: The Neutral Elements. Just like adding zero or multiplying by one leaves a number unchanged, there are identity elements for set operations:

    • Union: Combining any set with the empty set leaves the original set unchanged: A∪∅=A

    • Intersection: The elements common to any set and the universal set are simply the elements of the original set: A∩U=A

  • Complement Property: What's Missing. Consider a set A within a larger universal set U. The complement of A (denoted as Aᶜ) contains everything in U that's not in A:

    • The union of a set and its complement covers the entire universal set: A∪Aᶜ=U

    • A set and its complement have no elements in common: A∩Aᶜ=∅

  • Idempotent Property: Doing It Again Doesn't Change It. Combining a set with itself, whether through union or intersection, simply results in the same set:

    • A∪A=A

    • A∩A=A

  • De Morgan's Laws: Flipping Operations and Sets. These powerful laws describe how complements interact with union and intersection:

    • The complement of the union of two sets is equal to the intersection of their complements: (A∪B)ᶜ=Aᶜ∩Bᶜ

    • The complement of the intersection of two sets is equal to the union of their complements: (A∩B)ᶜ=Aᶜ∪Bᶜ

  • Law of Double Complementation: Back to the Start. Taking the complement of a set twice brings you back to the original set:

    • (Aᶜ)ᶜ=A
  • Laws of the Empty Set and Universal Set: The empty set and the universal set have unique complement properties:

    • The complement of the empty set is the entire universal set: ∅ᶜ=U

    • The complement of the universal set is the empty set: Uᶜ=∅

Understanding these properties provides a solid foundation for working with sets, which, as we discussed earlier, have numerous applications in machine learning, from data manipulation to model evaluation. They provide the logical framework for many algorithms and data handling techniques.
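A quick way to build intuition for these laws is to check them on concrete sets. The sketch below does that in Python; the particular sets `U`, `A`, `B`, `C` and the `complement` helper are assumptions chosen for illustration, and passing on one example is a sanity check rather than a proof.

```python
U = set(range(1, 11))   # universal set picked for this example
A = {1, 2, 3, 4}
B = {3, 4, 5, 6}
C = {4, 6, 8, 10}
EMPTY = set()


def complement(s, universe=U):
    """Complement of s relative to the universal set (illustrative helper)."""
    return universe - s


# In Python, | is union, & is intersection, - is difference.
checks = {
    "commutative": ((A | B) == (B | A)) and ((A & B) == (B & A)),
    "associative": (((A | B) | C) == (A | (B | C)))
    and (((A & B) & C) == (A & (B & C))),
    "distributive": ((A & (B | C)) == ((A & B) | (A & C)))
    and ((A | (B & C)) == ((A | B) & (A | C))),
    "identity": ((A | EMPTY) == A) and ((A & U) == A),
    "complement": ((A | complement(A)) == U)
    and ((A & complement(A)) == EMPTY),
    "idempotent": ((A | A) == A) and ((A & A) == A),
    "de_morgan": (complement(A | B) == (complement(A) & complement(B)))
    and (complement(A & B) == (complement(A) | complement(B))),
    "double_complement": complement(complement(A)) == A,
    "empty_and_universal": (complement(EMPTY) == U)
    and (complement(U) == EMPTY),
}

for name, holds in checks.items():
    print(f"{name}: {holds}")   # every line prints True
```

Try swapping in your own sets: as long as A, B, and C are subsets of U, every check should still come out True.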


More operations on sets:

  • Set Difference: What's Left Behind. The difference between two sets, say A and B (written as A−B), gives you a new set containing only the elements that are in A but not in B. It's like taking away the elements of B from A.

    • Example: If Set A = {1, 2, 3} and Set B = {3, 4, 5, 6}, then A−B={1,2}. Notice that the element '3', which is present in both A and B, is excluded from the result.
  • Set Complement: Everything Outside. The complement of a set A (denoted as Aᶜ, and also commonly written A′ or Ā) is defined with respect to a universal set (U), which contains all possible elements under consideration.

    The complement of A includes all the elements in U that are not in A. It's like finding everything outside the boundary of set A within the universe.

    • Example: Let's say our universal set U = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10} and Set A = {2, 4, 6, 8}. Then, the complement of A, Aᶜ, would be {1, 3, 5, 7, 9, 10}.

  • Cartesian Product: Creating Ordered Pairs. The Cartesian product of two sets, A and B (written as A×B), is a set formed by taking every possible ordered pair where the first element of the pair comes from A and the second element comes from B. The order of elements in these pairs matters!

    • Formally: A×B={(x,y)∣x∈A and y∈B}

    • Example: If Set A = {1, 2, 3} and Set B = {H, T}, then the Cartesian product A×B is: {(1,H),(1,T),(2,H),(2,T),(3,H),(3,T)}. Notice that (1,H) is different from (H,1) because the order matters in ordered pairs.
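Here is a small Python sketch of difference, complement, and Cartesian product. It assumes the standard library's `itertools.product` for the ordered pairs; the names `E` and `coins` are made up for this example.

```python
from itertools import product

# Set difference: elements of A that are not in B.
A = {1, 2, 3}
B = {3, 4, 5, 6}
print(sorted(A - B))   # [1, 2]

# Complement: the difference between the universal set and the set.
U = set(range(1, 11))  # {1, 2, ..., 10}
E = {2, 4, 6, 8}
print(sorted(U - E))   # [1, 3, 5, 7, 9, 10]  (10 is in U but not in E)

# Cartesian product: every ordered pair (x, y) with x from A and y from coins.
coins = {"H", "T"}
pairs = set(product(A, coins))
print(len(pairs))           # 6, i.e. len(A) * len(coins)
print((1, "H") in pairs)    # True
print(("H", 1) in pairs)    # False: order matters in ordered pairs
```

The size of A×B is always the product of the sizes of A and B, which is where the name comes from.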


Visualizing Sets: Fun Facts about Venn Diagrams and Their Power in Machine Learning

We've explored the fascinating world of sets – their types and how we can manipulate them. Now, let's shine a spotlight on how we often visualize these relationships: Venn Diagrams.

Fun Facts About Venn Diagrams:

  • Named After the Pioneer: These diagrams are named after John Venn, a British logician and philosopher who formally introduced them in 1880. So, next time you see those overlapping shapes, remember the person who made this visual tool so accessible!

  • Beyond Circles: While the classic Venn diagram uses overlapping circles to represent sets, the concept isn't limited to just circles. Other shapes like ellipses, rectangles, or even more abstract forms can be used, especially when dealing with a larger number of sets or more complex relationships. However, circles remain the most common and easily understandable representation.

  • Visualizing Logic: Venn diagrams are powerful tools for illustrating logical relationships between sets, such as intersection (common elements), union (all combined elements), and difference (elements unique to one set). This visual clarity makes abstract set theory much more concrete.

From Visuals to Real-World Impact: Applications of Sets in Machine Learning

The abstract ideas above become really useful in machine learning. Here are a few ways:

1. Organizing Features:

Imagine you're looking at data about different types of fruit. One "feature" could be the set of all red fruits (like apples and strawberries). Another feature could be the set of all round fruits (like apples and oranges).

  • How Sets Help: By thinking of each feature as a set, we can see which data points belong to which category.

  • Venn Diagrams: A Venn diagram could show the overlap – the fruits that are both red and round (like red apples). This helps us understand how different features relate to each other.
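The fruit example can be written out as a tiny Python sketch. The fruit names come from the example above; the variable names are made up for illustration.

```python
red_fruits = {"apple", "strawberry"}
round_fruits = {"apple", "orange"}

# The overlapping region of the Venn diagram: red AND round.
print(sorted(red_fruits & round_fruits))   # ['apple']

# Everything covered by either circle: red OR round.
print(sorted(red_fruits | round_fruits))   # ['apple', 'orange', 'strawberry']

# Only in the "red" circle: red but NOT round.
print(sorted(red_fruits - round_fruits))   # ['strawberry']
```

Each region of a two-circle Venn diagram corresponds to one of these three operations.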

2. Cleaning and Preparing Data:

Think of all the data you have as one big set.

  • How Sets Help: Sets automatically remove duplicates. So, if you have a list of customer IDs and some are repeated, thinking of it as a set instantly gives you a list of unique customers. Also, if you have two different lists of data, you can use set operations to find the customers that are in both lists (intersection) or in one list but not the other (difference).
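As a rough sketch of that cleaning step, the snippet below deduplicates two lists of customer IDs and compares them. The IDs and list names are invented for illustration.

```python
# Two raw lists of customer IDs, with repeats (made-up data).
list_a = ["c1", "c2", "c2", "c3", "c1"]
list_b = ["c3", "c4", "c4"]

# Converting to a set drops the duplicates.
unique_a = set(list_a)
unique_b = set(list_b)
print(sorted(unique_a))   # ['c1', 'c2', 'c3']

# Customers present in both lists (intersection).
in_both = unique_a & unique_b
print(sorted(in_both))    # ['c3']

# Customers in the first list but not the second (difference).
only_in_a = unique_a - unique_b
print(sorted(only_in_a))  # ['c1', 'c2']

# Sets are unordered; if the original order matters, dict.fromkeys
# keeps the first occurrence of each ID in order.
ordered_unique_a = list(dict.fromkeys(list_a))
print(ordered_unique_a)   # ['c1', 'c2', 'c3']
```

One thing to keep in mind: a set forgets the original ordering, so the `dict.fromkeys` variant is handy when the order of first appearance carries meaning.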

3. Evaluating Model Performance:

When a machine learning model tries to predict categories (like "cat" or "dog"), we can pick one category, say "cat", and think of the samples that really are cats as one set, and the samples the model labelled as cats as another set.

  • How Sets Help: The intersection of the two sets contains the correct "cat" predictions. This idea is behind important measures like precision (what fraction of the "cat" predictions were actually cats) and recall (what fraction of the actual cats the model correctly identified).
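Here is a minimal sketch of precision and recall computed with sets, for a single class. The sample IDs are invented, and the sketch assumes each sample has a unique ID and that neither set is empty (otherwise the divisions would fail).

```python
# IDs of samples that really are cats vs. IDs the model labelled "cat"
# (made-up data for illustration).
actual_cats = {1, 2, 3, 4, 5}
predicted_cats = {3, 4, 5, 6}

# Correct "cat" predictions are the overlap of the two sets.
true_positives = actual_cats & predicted_cats
print(sorted(true_positives))   # [3, 4, 5]

# Precision: of everything predicted "cat", what fraction was right?
precision = len(true_positives) / len(predicted_cats)
print(precision)                # 0.75

# Recall: of all the real cats, what fraction did the model find?
recall = len(true_positives) / len(actual_cats)
print(recall)                   # 0.6

# The two differences are the errors.
false_positives = predicted_cats - actual_cats   # predicted cat, but not a cat
false_negatives = actual_cats - predicted_cats   # real cat that was missed
print(sorted(false_positives), sorted(false_negatives))   # [6] [1, 2]
```

Both metrics share the same numerator (the intersection) and differ only in which set they divide by.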

4. Understanding Relationships (e.g., Recommendations):

Imagine a website recommending movies. Each user has a set of movies they've liked.

  • How Sets Help: By comparing the sets of liked movies for different users, we can find users with similar tastes (a large intersection of their "liked" movie sets). This helps in recommending new movies that a user might enjoy based on what others with similar tastes have liked.
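The sketch below shows one way this comparison could look in Python. It uses Jaccard similarity (intersection size divided by union size) to normalise the overlap; that specific measure is an assumption added here for illustration, since the text above only talks about a large intersection. The users and movie titles are made up.

```python
# Each user's set of liked movies (made-up data).
liked = {
    "alice": {"Inception", "Interstellar", "Arrival"},
    "bob": {"Inception", "Interstellar", "Dune"},
    "carol": {"Notting Hill", "Arrival"},
}


def jaccard(a, b):
    """Jaccard similarity: |a ∩ b| / |a ∪ b| (0.0 if both sets are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


target = "alice"

# Score every other user by how much their likes overlap with the target's.
scores = {
    user: jaccard(liked[target], movies)
    for user, movies in liked.items()
    if user != target
}
print(scores)   # {'bob': 0.5, 'carol': 0.25}

# Recommend what the most similar user liked that the target has not seen.
most_similar = max(scores, key=scores.get)
recommendations = liked[most_similar] - liked[target]
print(most_similar, sorted(recommendations))   # bob ['Dune']
```

Dividing by the union keeps a user who has liked thousands of movies from looking similar to everyone just because their set is huge.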

5. Ensuring Data Integrity (Splitting Data):

When we train a machine learning model, we usually split our data into different sets: a training set, a validation set, and a test set.

  • How Sets Help: We want to make sure these sets have no overlap (they are "disjoint sets"). If there were an overlap, the model might "cheat" by learning from data in the training set and then seeing the same data in the test set, giving a misleadingly good performance. Sets help us define and check for this lack of overlap.
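A simple way to check for this kind of leakage is sketched below. It assumes every sample has a unique ID; the split sizes and the `splits_are_disjoint` helper are hypothetical and written just for this example.

```python
# Sample IDs assigned to each split (made-up data).
sample_ids = list(range(10))
train_ids = set(sample_ids[:6])    # {0, 1, 2, 3, 4, 5}
val_ids = set(sample_ids[6:8])     # {6, 7}
test_ids = set(sample_ids[8:])     # {8, 9}


def splits_are_disjoint(*splits):
    """Return True if no element appears in more than one split."""
    seen = set()
    for split in splits:
        # isdisjoint is True when the two sets share no elements.
        if not seen.isdisjoint(split):
            return False
        seen |= split
    return True


print(splits_are_disjoint(train_ids, val_ids, test_ids))   # True

# Simulate a leak: sample 0 ends up in both train and test.
leaky_test_ids = test_ids | {0}
print(splits_are_disjoint(train_ids, val_ids, leaky_test_ids))   # False
print(sorted(train_ids & leaky_test_ids))   # [0]  <- the leaked sample
```

The intersection on the last line tells you exactly which samples leaked, which is usually more helpful for debugging than a plain True or False.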

Hold onto your hats! In our next installment, we'll unravel the magic of matrices and vectors and explore Advanced Set Concepts like Relations and Functions (Set Theory Part - II), revealing how these mathematical structures power the amazing feats of machine learning, along with coding examples showcasing their real-world utility in ML and some practical problems.
