Statistics Learning Journey – Day 2 (More on Data Table and Histogram)

Shubham Gupta

Topics I learned on Day 2 ⬇️:


  1. Relative Frequency Table

    A relative frequency table is a type of frequency table where instead of showing the actual counts (frequencies) of data values, it shows the proportions or percentages of the total.

    It helps us understand how often each value occurs relative to the whole dataset.

  • Example:

    Suppose we surveyed 20 people about their favorite color:

| Color | Frequency | Relative Frequency |
|---|---|---|
| Red | 6 | 6/20 = 0.30 (30%) |
| Blue | 8 | 8/20 = 0.40 (40%) |
| Green | 4 | 4/20 = 0.20 (20%) |
| Yellow | 2 | 2/20 = 0.10 (10%) |
| Total | 20 | 1.00 (100%) |

👉 This shows the proportion of each color relative to the total number of people surveyed. Note that this is a column relative frequency: each count is divided by the column total. In a table with more than one column of counts, we can also compute row relative frequencies by dividing each count by its row total.
For example, 40% like Blue, which is the most popular color.
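
To make the arithmetic concrete, here is a minimal Python sketch of the same calculation. The `responses` list is an assumption: it is rebuilt from the counts in the table, since the raw survey answers are not given.

```python
from collections import Counter

# Assumed raw answers, reconstructed from the counts in the table
responses = ["Red"] * 6 + ["Blue"] * 8 + ["Green"] * 4 + ["Yellow"] * 2

counts = Counter(responses)      # frequency of each color
total = sum(counts.values())     # 20 people surveyed

# Relative frequency = count / total
relative_freq = {color: count / total for color, count in counts.items()}

for color, count in counts.items():
    share = relative_freq[color]
    print(f"{color:<7}{count:>3}  {share:.2f} ({share:.0%})")
```

The relative frequencies always add up to 1 (100%), which is a quick way to check the table.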


  2. Joint Distribution

    A joint distribution describes the probability (or relative frequency) of two variables taking particular values together.
    It's usually shown in a two-way (contingency) table where rows represent one variable and columns represent another.

  • Example:

    Below is a table of raw counts of people by the sport they play and their gender. Suppose we want to know what proportion of all the people surveyed are females who play Badminton.

| Sport / Gender | Male | Female | Total |
|---|---|---|---|
| Cricket | 8 | 12 | 20 |
| Football | 13 | 12 | 25 |
| Badminton | 7 | 8 | 15 |
| Hockey | 17 | 8 | 25 |
| Total | 45 | 40 | 85 |

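  • Now let's see how to get the proportions. Each count is divided by the grand total (85):

| Sport / Gender | Male (Proportion) | Female (Proportion) | Total |
|---|---|---|---|
| Cricket | 8/85 = 0.094 | 12/85 = 0.141 | 0.235 |
| Football | 13/85 = 0.153 | 12/85 = 0.141 | 0.294 |
| Badminton | 7/85 = 0.082 | 8/85 = 0.094 | 0.176 |
| Hockey | 17/85 = 0.200 | 8/85 = 0.094 | 0.294 |
| Total | 0.529 | 0.471 | 1.000 |

Each proportion is rounded to three decimals, so adding up the rounded cells can differ from the totals in the last digit.
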
  • Interpretation:

    P(Badminton and Female) = 8/85 ≈ 0.094 → about 9.4% of all the people surveyed are females who play Badminton.
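
The same joint proportions can be sketched in a few lines of Python. The dictionary layout and the names `counts`, `grand_total`, and `joint` are illustrative choices, not part of the original table.

```python
# Counts from the sport/gender table: {sport: (male, female)}
counts = {
    "Cricket": (8, 12),
    "Football": (13, 12),
    "Badminton": (7, 8),
    "Hockey": (17, 8),
}

# Grand total over every cell of the table (85 people)
grand_total = sum(male + female for male, female in counts.values())

# Joint proportion of each (sport, gender) pair = cell count / grand total
joint = {
    (sport, gender): n / grand_total
    for sport, (male, female) in counts.items()
    for gender, n in (("Male", male), ("Female", female))
}

print(round(joint[("Badminton", "Female")], 3))  # 0.094
```

Because every cell is divided by the same grand total, all eight joint proportions together sum to 1.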


  3. Histogram

    A histogram is a type of graph that shows the frequency (or relative frequency) of numerical data values within certain ranges (called bins or intervals).

  • Example and the process of making a histogram:

    Here is a dataset of the ages of some selected students at a school:

    → 12, 11, 18, 19, 15, 9, 10, 16, 17, 13, 12, 20, 15, 7, 19, 14, 14, 10, 17, 10, 17, 18, 18, 13, 14, 15, 15, 10, 20, 12, 9, 15, 16, 13, 14, 15, 15, 18, 20

    → To draw a histogram we first need bins, so let's make them.

    → Find the range of the dataset

    Minimum: 7, maximum: 20 → Range = Max - Min = 20 - 7 = 13

    → Find a suitable number of bins for your dataset

    A common rule of thumb is to use about √n bins, where n is the number of values (here n = 39).

    √39 ≈ 6.2, so we can use 6 or 7 bins. Here we use 7.

    → Find the bin width

    Bin width = Range / Number of bins = 13 / 7 ≈ 1.9, which we round to 2.

    → Now mark the bins on the x-axis. With a width of 2, the bins are 7–8, 9–10, 11–12, 13–14, 15–16, 17–18, 19–20.

    → Now count how many values fall in each bin

    7–8: {7} → 1

    9–10: {9,9,10,10,10,10} → 6

    11–12: {11,12,12,12} → 4

    13–14: {13,13,13,14,14,14,14} → 7

    15–16: {15,15,15,15,15,15,15,16,16} → 9

    17–18: {17,17,17,18,18,18,18} → 7

    19–20: {19,19,20,20,20} → 5
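
The binning steps above can be sketched in Python as a quick check of the counts. Rounding the number of bins and the bin width up with `math.ceil` is an assumption made here so the result matches the 7 bins of width 2 used in this example; other rounding choices are equally valid.

```python
import math
from collections import Counter

ages = [12, 11, 18, 19, 15, 9, 10, 16, 17, 13, 12, 20, 15, 7, 19, 14, 14, 10, 17, 10,
        17, 18, 18, 13, 14, 15, 15, 10, 20, 12, 9, 15, 16, 13, 14, 15, 15, 18, 20]

lowest = min(ages)
data_range = max(ages) - lowest              # 20 - 7 = 13
n_bins = math.ceil(math.sqrt(len(ages)))     # sqrt(39) is about 6.2, rounded up to 7
bin_width = math.ceil(data_range / n_bins)   # 13 / 7 is about 1.9, rounded up to 2

# Map each age to its bin index: 7-8 -> 0, 9-10 -> 1, ..., 19-20 -> 6
bin_counts = Counter((age - lowest) // bin_width for age in ages)

# Print a rough text histogram, one row per bin
for i in range(n_bins):
    low = lowest + i * bin_width
    high = low + bin_width - 1
    print(f"{low:>2}-{high:<2} {'#' * bin_counts[i]} ({bin_counts[i]})")
```

The printed rows of `#` characters give a rough sideways picture of the histogram, and the seven counts add up to 39, the size of the dataset.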

    Finally, here is your histogram 👇🏻:

    [Figure: histogram of the student ages]


Conclusion

These topics and concepts are really helpful to me for visualizing and analyzing data 📊.

If you know more about any of these topics, please tell me. I am ready to dive deeper into them.

Closing Note

This wraps up Day 2 of my Statistics journey 🚀.
I'll keep sharing my learnings here and on LinkedIn. Stay tuned for Day 3!
