Statistics Learning Journey - Day 2 (More on Data Tables and Histograms)

Topics I learned on Day 2 ⬇️:
Relative Frequency Table
A relative frequency table is a type of frequency table where instead of showing the actual counts (frequencies) of data values, it shows the proportions or percentages of the total.
It helps us understand how often each value occurs relative to the whole dataset.
Example:-
Suppose we surveyed 20 people about their favorite color:
COLOR | FREQUENCY | RELATIVE FREQUENCY |
Red | 6 | 6/20 = 0.30 (30%) |
Blue | 8 | 8/20 = 0.40 (40%) |
Green | 4 | 4/20 = 0.20 (20%) |
Yellow | 2 | 2/20 = 0.10 (10%) |
TOTAL | 20 | 1.00 (100%) |
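This shows the proportion of each color relative to the total number of people surveyed. Note that this is a column relative frequency, because each count is divided by the column total. In a two-way table (counts arranged in several rows and columns), we can also compute row relative frequencies by dividing each count by its row total.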
For example, 40% like Blue, which is the most popular color.
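The calculation in the table above can be sketched in a few lines of Python. This is only a minimal illustration: the names `color_counts`, `relative_frequencies`, and `rel_freq` are made up for this example, and the counts are copied from the survey table.

```python
# Counts copied from the favorite-color survey table above.
color_counts = {"Red": 6, "Blue": 8, "Green": 4, "Yellow": 2}


def relative_frequencies(counts):
    """Divide each count by the grand total to get a proportion."""
    total = sum(counts.values())
    return {label: count / total for label, count in counts.items()}


rel_freq = relative_frequencies(color_counts)

# Print one row per color: count, proportion, and percentage.
for color, proportion in rel_freq.items():
    print(f"{color:<7}{color_counts[color]:>3}  {proportion:.2f} ({proportion:.0%})")
```

The proportions always add up to 1 (100%), which is a quick way to check a relative frequency table for mistakes.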
Joint Distribution
A joint distribution describes the probability (or relative frequency) of two variables occurring together.
It's usually shown in a two-way (contingency) table where rows represent one variable and columns represent another.
Example:-
Below is a table of raw counts of people by the sport they play and their gender. Suppose we want to know what proportion of all the people surveyed are females who play Badminton.
SPORT / GENDER | MALE | FEMALE | TOTAL |
Cricket | 8 | 12 | 20 |
Football | 13 | 12 | 25 |
Badminton | 7 | 8 | 15 |
Hockey | 17 | 8 | 25 |
TOTAL | 45 | 40 | 85 |
- Now let's see how to get the proportions: divide each count by the grand total (85).
SPORT / GENDER | MALE (Proportion) | FEMALE (Proportion) | TOTAL |
Cricket | 8/85 = 0.094 | 12/85 = 0.141 | 0.235 |
Football | 13/85 = 0.153 | 12/85 = 0.141 | 0.294 |
Badminton | 7/85 = 0.082 | 8/85 = 0.094 | 0.176 |
Hockey | 17/85 = 0.200 | 8/85 = 0.094 | 0.294 |
TOTAL | 45/85 = 0.529 | 40/85 = 0.471 | 1.000 |
Interpretation:
(Badminton & Female) = 0.094 -> About 9.4% of all the people surveyed are females who play Badminton.
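Here is a minimal Python sketch of the same joint distribution, assuming the counts from the sport and gender table above. The variable names (`sport_counts`, `joint`, `sport_marginal`, `gender_marginal`) are only illustrative. Every cell is divided by the grand total, and the row and column totals are then rebuilt from the joint proportions.

```python
# Counts copied from the sport-by-gender table above.
sport_counts = {
    "Cricket": {"Male": 8, "Female": 12},
    "Football": {"Male": 13, "Female": 12},
    "Badminton": {"Male": 7, "Female": 8},
    "Hockey": {"Male": 17, "Female": 8},
}

# Grand total of people surveyed (85 for this table).
grand_total = sum(sum(row.values()) for row in sport_counts.values())

# Joint distribution: every cell divided by the grand total.
joint = {
    sport: {gender: count / grand_total for gender, count in row.items()}
    for sport, row in sport_counts.items()
}

# Marginal proportions, i.e. the TOTAL column and TOTAL row of the table.
sport_marginal = {sport: sum(row.values()) for sport, row in joint.items()}
gender_marginal = {
    gender: sum(row[gender] for row in joint.values())
    for gender in ("Male", "Female")
}

# prints: P(Badminton and Female) = 0.094
print(f"P(Badminton and Female) = {joint['Badminton']['Female']:.3f}")
```

Working from the unrounded values avoids the small drift that appears when rounded cells are added by hand, so the totals come out to exactly 1 up to floating-point error.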
Histogram
A histogram is a type of graph that shows the frequency (or relative frequency) of numerical data values within certain ranges (called bins or intervals).
Example and the process of making a histogram:-
Here is a dataset of the ages of some selected students at a school:
-> 12, 11, 18, 19, 15, 9, 10, 16, 17, 13, 12, 20, 15, 7, 19, 14, 14, 10, 17, 10, 17, 18, 18, 13, 14, 15, 15, 10, 20, 12, 9, 15, 16, 13, 14, 15, 15, 18, 20
-> To make a histogram we need bins, so let's make them.
-> Find the range of the dataset
Minimum: 7, Maximum: 20 -> Range: Max - Min = 13
-> Find a suitable number of bins for your dataset
A common rule of thumb: use about √n bins, where n is the number of values (here n = 39).
√39 ≈ 6.2, so we can use 6 or 7 bins. Here we use 7.
-> Find the bin width
Bin width: Range / No. of bins = 13 / 7 ≈ 1.9, so we round up to 2.
-> On the x-axis, mark the bins using that width: 7-8, 9-10, 11-12, 13-14, 15-16, 17-18, 19-20.
-> Now count how many values fall in each bin
7-8: {7} -> 1
9-10: {9,9,10,10,10,10} -> 6
11-12: {11,12,12,12} -> 4
13-14: {13,13,13,14,14,14,14} -> 7
15-16: {15,15,15,15,15,15,15,16,16} -> 9
17-18: {17,17,17,18,18,18,18} -> 7
19-20: {19,19,20,20,20} -> 5
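The steps above can be sketched in Python as follows. This is a rough illustration, not the only way to do it: it assumes we round √n up to 7 bins and round the bin width up to 2, and the variable names (`ages`, `num_bins`, `bin_width`, `bin_counts`) are made up for this example.

```python
import math

# Student ages copied from the dataset above.
ages = [12, 11, 18, 19, 15, 9, 10, 16, 17, 13, 12, 20, 15, 7, 19, 14, 14,
        10, 17, 10, 17, 18, 18, 13, 14, 15, 15, 10, 20, 12, 9, 15, 16, 13,
        14, 15, 15, 18, 20]

n = len(ages)                                  # number of values
data_range = max(ages) - min(ages)             # max - min
num_bins = math.ceil(math.sqrt(n))             # square-root rule, rounded up (assumption)
bin_width = math.ceil(data_range / num_bins)   # range / bins, rounded up (assumption)

# Build inclusive integer bins starting at the minimum: (7, 8), (9, 10), ...
start = min(ages)
bins = [
    (start + i * bin_width, start + (i + 1) * bin_width - 1)
    for i in range(num_bins)
]

# Count how many ages fall inside each bin (both ends included).
bin_counts = {
    (low, high): sum(low <= age <= high for age in ages)
    for low, high in bins
}

for (low, high), count in bin_counts.items():
    print(f"{low}-{high}: {count}")
```

Because the ages are whole numbers, inclusive bins like 7-8 and 9-10 never overlap. For data with decimals, half-open bins (for example 7 up to but not including 9) are the safer choice.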
Finally, here is the histogram 👇🏻:
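A rough text version of the histogram can be printed with plain Python. This is a sketch that assumes the bin counts tallied above; `text_histogram` is a hypothetical helper written for this example, not a library function.

```python
# Bin tallies copied from the counting step above.
bin_counts = {
    "7-8": 1,
    "9-10": 6,
    "11-12": 4,
    "13-14": 7,
    "15-16": 9,
    "17-18": 7,
    "19-20": 5,
}


def text_histogram(counts, symbol="#"):
    """Return one line per bin, with a bar as long as the bin's count."""
    label_width = max(len(label) for label in counts)
    return [
        f"{label:>{label_width}} | {symbol * count} {count}"
        for label, count in counts.items()
    ]


for line in text_histogram(bin_counts):
    print(line)
```

The tallest bar is the 15-16 bin with 9 students. If matplotlib is available and the ages are stored in a list called `ages` (a hypothetical name), `plt.hist(ages, bins=range(7, 22, 2))` should draw the graphical equivalent, since the edges 7, 9, ..., 21 match the bins used here.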
Conclusion
These topics and concepts are really helpful to me for visualizing and analyzing data.
If you know more about any of these topics, please tell me. I am ready to dive deeper into them.
Closing Note
This wraps up Day 2 of my Statistics journey.
I'll keep sharing my learnings here and on LinkedIn. Stay tuned for Day 3!
Written by Shubham Gupta
