Building a COVID-19 Analytics Dashboard from Scratch

The COVID-19 pandemic has had a profound and far-reaching impact on every aspect of life, particularly in a diverse and densely populated country like India. As an aspiring data analyst, I wanted to take a data-driven approach to better understand the scale, spread, and progression of the pandemic across the nation.
In this project, I designed and developed a comprehensive COVID-19 India Dashboard using PostgreSQL, Excel, and Tableau. The goal was to transform raw, unstructured data into an interactive and insightful visual tool that could communicate key trends related to confirmed cases, recoveries, deaths, and vaccination distributions across different states and age groups.
This blog outlines my complete workflow — from data sourcing and cleaning to visualization design decisions — while highlighting the insights uncovered along the way. Whether you’re a fellow data practitioner, a student, or someone interested in data storytelling, I hope this project offers both value and inspiration.
Project Overview
The primary objective of this project was to build an end-to-end data analytics solution that transforms raw COVID-19 data into an interactive and accessible dashboard for exploring the pandemic’s impact in India. By combining structured data processing with visual storytelling, the dashboard was designed to provide meaningful insights into the following key areas:
Confirmed Cases: Trend analysis and distribution by state over time.
Recoveries & Deaths: Comparative view of recovery rates and mortality.
Vaccination Data: Age group-wise vaccination coverage and progress.
Geographic Patterns: Regional disparities in infection and recovery rates.
This project was also an opportunity to practice and showcase a full data pipeline — covering data ingestion, transformation, and visualization — using industry-relevant tools. In particular, I focused on using:
PostgreSQL for cleaning, filtering, and querying the dataset,
Excel for preliminary formatting and manual corrections, where required, and
Tableau to design a clean, interactive dashboard that emphasizes usability and insight clarity.
🗂️ Data Extraction: Sourcing COVID-19 Data from Kaggle
The dataset for this project was sourced from Kaggle, a widely used platform for real-world datasets and data science competitions. I selected a publicly available COVID-19 India dataset that included state-wise statistics on confirmed cases, recoveries, deaths, and vaccination counts. The dataset provided daily updates and included age group–level granularity for vaccination metrics.
The data was downloaded in CSV format and included multiple files such as:
covid_19_india.csv – Daily updates for each state
covid_vaccine_statewise.csv – Vaccination details, including age categories
StatewiseTestingDetails.csv – Details of state-wise testing
Kaggle Link: Dataset
This raw dataset served as the foundation for all subsequent cleaning, transformation, and visualization steps.
🧾 Preliminary Cleaning in Excel
Before importing the data into PostgreSQL, I performed a round of initial preprocessing in Microsoft Excel. This step helped with quick fixes and structural consistency that would later support smooth querying and joins in SQL.
Here are the steps taken in Excel, along with the problems they solved:
1. Date Field Standardization
Problem:
The initial dataset contained date entries in inconsistent formats. Some dates (e.g., 31/01/2021) were in ‘General’ format, while others (e.g., 1/2/2021) were already recognized as Excel ‘Date’ format. This inconsistency prevented uniform processing and direct import into PostgreSQL, which requires a yyyy-mm-dd format.
Solution:
Text to Columns for Initial Date Correction:
The column containing the dates was selected.
The “Text to Columns” feature was utilized with the “Delimited” option.
During the Text to Columns wizard, the “Date” data format was explicitly selected, and the appropriate date order (e.g., DMY for DD/MM/YYYY) was specified. Excel automatically reinterpreted and standardized the dates to its internal date format.
Converting to PostgreSQL Compatible Format (yyyy-mm-dd):
A new column was inserted adjacent to the standardized date column.
The Excel TEXT function was used to convert the dates into the desired yyyy-mm-dd string format. The formula applied was: =TEXT(cell_reference, "yyyy-mm-dd")
The values from this newly formatted column were then copied.
A third, new column was created. The copied values were pasted into this column using the “Paste Values” option. This step is crucial to remove formula dependencies and prevent #REF! errors when original columns are deleted.
The original date column and the intermediate TEXT formula column were subsequently deleted, leaving only the yyyy-mm-dd formatted date column ready for import.
2. Handling Blank Values in Numeric Columns
Problem:
During the import process for the Statewise_Testing_details table, the Negative and Positive columns were configured in PostgreSQL to disallow blank values. The Excel source data, however, contained empty cells in these columns.
Solution:
For each of the Negative and Positive columns, a new helper column was introduced.
The following Excel formula was applied to replace blank cells with a 0 (zero): =IF(ISBLANK(cell_reference),0,cell_reference)
As with the date formatting, the values from these helper columns were copied and pasted as values into the original Negative and Positive columns (or new columns replacing them), replacing blanks with zeros while removing formula dependencies. This ensures that the numeric columns meet the NOT NULL constraint in the PostgreSQL table definition (a sketch of that definition follows these steps).
Following these steps, the Excel data was properly formatted and cleaned, ready for a successful import into the PostgreSQL database.
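For reference, here is a minimal sketch of the kind of PostgreSQL table definition and import command these Excel fixes were preparing the data for. The table name, column names, and file path are illustrative assumptions based on StatewiseTestingDetails.csv, not the exact schema I used:

```sql
-- Illustrative target table: the date column expects yyyy-mm-dd values,
-- and the NOT NULL constraints are what the blanks-to-zero fix satisfies.
-- Table name, column names, and file path are assumptions, not the exact schema used.
CREATE TABLE statewise_testing_details (
    report_date   DATE   NOT NULL,
    state         TEXT   NOT NULL,
    total_samples BIGINT,
    negative      BIGINT NOT NULL,
    positive      BIGINT NOT NULL
);

-- Import the cleaned CSV (run from psql; \copy reads the file on the client side).
\copy statewise_testing_details FROM 'StatewiseTestingDetails_clean.csv' WITH (FORMAT csv, HEADER true)
```

As an aside, the date standardization could also have been handled after import with PostgreSQL's to_date() and to_char() functions, but doing it in Excel kept the import step itself simple.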
While Excel is not ideal for large-scale data processing, it is incredibly effective for quick data inspection and ensuring schema consistency before importing into a relational database.
🛢️ Data Cleaning and Transformation in PostgreSQL
After preparing the raw data in Excel, I imported the cleaned CSV files into PostgreSQL for more rigorous data transformation and querying. PostgreSQL served as the backend engine to:
Handle large datasets efficiently
Run complex queries across tables (e.g., matching vaccination stats with state-level case data)
Implement window functions for rolling averages and cumulative counts
Filter and structure data for Tableau exports (e.g., state-wise daily cases, vaccination by age group, etc.)
I created multiple custom views and CTEs (Common Table Expressions) to segment the data meaningfully before exporting final result sets to Tableau.
This SQL layer ensured that the data visualized in Tableau was accurate, well-structured, and optimized for performance.
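To give a concrete flavour of this layer, here is a minimal sketch of one such view, assuming a covid_19_india table holding cumulative confirmed counts per state per day (column names are illustrative, not my exact schema). It uses a CTE with LAG() to derive daily new cases, then window functions for a 7-day rolling average and a cumulative count:

```sql
-- Sketch of a Tableau-facing view: daily new cases, 7-day rolling average,
-- and cumulative cases per state. Assumes cumulative 'confirmed' counts keyed
-- by (state, report_date); names are illustrative assumptions.
CREATE VIEW statewise_daily_cases AS
WITH daily AS (
    SELECT
        state,
        report_date,
        confirmed
            - LAG(confirmed, 1, 0) OVER (PARTITION BY state ORDER BY report_date) AS new_cases
    FROM covid_19_india
)
SELECT
    state,
    report_date,
    new_cases,
    AVG(new_cases) OVER (
        PARTITION BY state
        ORDER BY report_date
        ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
    ) AS new_cases_7day_avg,
    SUM(new_cases) OVER (
        PARTITION BY state
        ORDER BY report_date
    ) AS cumulative_cases
FROM daily;
```

Similar views segmented the vaccination data by age group before the final result sets were exported to Tableau.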
🧩 Dashboard Planning: Structuring the Layout with draw.io
Before jumping into Tableau, I invested time in designing the dashboard layout using draw.io. Planning the structure visually beforehand helped clarify the narrative flow, avoid clutter, and ensure a logical arrangement of insights for the end user.
Using draw.io, I created a simple wireframe that defined:
Top-level KPIs: Total confirmed cases, recoveries, deaths, and vaccinations
Sectional Layouts: Time-series analysis at the top, followed by maps and comparison charts
User Interactions: Dropdown filters for state, date range, and age group
This early sketch acted as a blueprint during dashboard development and minimized rework. It also made it easier to maintain consistency and clarity across all visuals.
📊 Building the Dashboard in Tableau
With clean and well-structured data exported from PostgreSQL, I transitioned into Tableau to develop the final interactive dashboard. Tableau’s drag-and-drop interface, combined with calculated fields and parameter controls, allowed for the seamless creation of engaging visual stories.
Key visualizations implemented:
Line Charts to show trends in confirmed, recovered, and death cases over time
Symbol Maps with circle sizes encoding the case fatality rate per state
Bar Charts to represent the test positivity rate by state (see the SQL sketch after this list for how these two metrics were derived)
Filters and Drop-down Menus for dynamic exploration by state and time period
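As a hedged example of the SQL feeding two of these charts, the sketch below computes the case fatality rate and test positivity rate per state from the tables described earlier. The table and column names are assumptions, and both rates are taken over the latest cumulative figures:

```sql
-- Case fatality rate per state, from the latest cumulative confirmed/death counts
-- (table and column names are illustrative assumptions, not the exact schema used).
SELECT
    state,
    ROUND(100.0 * MAX(deaths) / NULLIF(MAX(confirmed), 0), 2) AS case_fatality_rate_pct
FROM covid_19_india
GROUP BY state
ORDER BY case_fatality_rate_pct DESC;

-- Test positivity rate per state, from the cumulative testing figures.
SELECT
    state,
    ROUND(100.0 * MAX(positive) / NULLIF(MAX(total_samples), 0), 2) AS test_positivity_rate_pct
FROM statewise_testing_details
GROUP BY state
ORDER BY test_positivity_rate_pct DESC;
```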
I focused heavily on:
Clarity: Avoiding visual overload by limiting chart types per section
Color Consistency: Using distinguishable but non-intrusive color palettes
Responsiveness: Ensuring the layout works across screen sizes (especially for Tableau Public)
The result was an interactive, informative dashboard that not only displays raw data but also tells a compelling visual story about the pandemic’s trajectory in India.
🔍 Key Insights & Observations
The final dashboard offered a range of valuable insights into the progression and distribution of COVID-19 across India. Some of the most notable observations include:
State-wise Disparities: Maharashtra, Kerala, and Karnataka consistently showed higher case counts, with Maharashtra leading in both confirmed cases and fatalities during peak waves.
Vaccination Trends by Age Group: The 45–60 age group accounted for the largest share of vaccinations, reflecting the rollout strategy.
Recovery Rate Improvements: Over time, a noticeable increase in recovery rates was observed in most states, especially during the later stages of the second wave.
Temporal Spikes: The dashboard clearly captured the sharp surge during India’s second wave in mid-2021, providing an impactful visual of how rapidly the crisis escalated.
Underreported Deaths: In some cases, the mismatch between case volume and reported deaths raised questions about underreporting and data reliability in certain regions.
These insights not only supported a deeper understanding of the pandemic’s impact but also showcased how visual analytics can surface trends that raw data alone might obscure.
⚙️ Challenges Faced & Lessons Learned
This project presented a number of challenges, each of which became a learning opportunity:
Data Inconsistencies: Public datasets often came with missing values, inconsistent date formats, and mismatched column headers. Cleaning this data taught me the importance of building robust preprocessing pipelines.
Schema Changes: The dataset structure had occasional updates, requiring schema adjustments and refreshes in SQL and Tableau. This helped me learn how to make data workflows more modular and adaptable.
Color & Layout Decisions: Striking a balance between aesthetics and usability in Tableau took iteration. Planning the layout in draw.io before building helped streamline this process.
SQL Optimization: Working with daily time-series data across multiple dimensions pushed me to use window functions, CTEs, and aggregated views effectively.
Overall, this project enhanced both my technical and design skills across the data analytics lifecycle. It also reinforced the value of planning, documentation, and iterative development in delivering impactful data projects.
✅ Conclusion & Next Steps
This project was a deep dive into the end-to-end process of turning raw public health data into an actionable, interactive dashboard. From sourcing and cleaning data using PostgreSQL and Excel to building a thoughtfully designed visualization in Tableau, the project sharpened my skills in data engineering, storytelling, and user-centric dashboard development.
Beyond technical growth, it reinforced the importance of visual communication in making complex data more accessible and impactful, especially in times of a public health crisis.
📣 Let’s Connect
If you found this project insightful or have feedback, I’d love to hear from you! I’m constantly learning and exploring new areas in data science.
🔗 GitHub Repository
📊 View Dashboard on Tableau Public
💬 Feel free to connect with me on LinkedIn or follow my learning journey on Twitter.
Thank you for reading 🙌