โค๏ธ Cardiovascular Risk Assessment and Heart Attack Prediction System

Developed by OpenSere (Abhishek Choudhary), this model predicts the risk of heart attacks using structured medical data. It is designed to assist in pre-diagnostic screening and awareness, especially in low-resource environments.

๐Ÿง  Model Summary

This model is a gradient boosting classifier trained using the XGBoost algorithm. It uses patient health indicators to predict the probability of heart attack occurrence (binary classification: 0 = No risk, 1 = At risk).

Architecture

  • Algorithm: XGBoost Classifier

  • Objective: Binary Classification

  • Features: 25+ health metrics including Age, Blood Pressure, Cholesterol, Heart Rate, Diabetes, and Lifestyle indicators like Smoking, Obesity, etc.

  • Labels: Heart Attack Risk (0 or 1)

Characteristics

  • Handles missing/non-critical fields like Country, Continent, Income, and Patient ID gracefully

  • Well-suited for deployment in healthcare dashboards or apps

  • Lightweight model with fast inference (<50ms)


โš™๏ธ Usage

Code Snippet

pythonCopyEditimport joblib
import pandas as pd

# Load the model
model = joblib.load("heart_model_final.pkl")

# Example input data
input_data = pd.DataFrame([{
    "Age": 52,
    "Sex": 1,
    "Cholesterol": 205,
    "Blood Pressure": 135,
    "Heart Rate": 85,
    "Diabetes": 1,
    "Family History": 1,
    "Smoking": 0,
    "Obesity": 1,
    "Alcohol Consumption": 0,
    "Exercise Hours Per Week": 1,
    "Diet": 1,
    "Previous Heart Problems": 0,
    "Medication Use": 1,
    "Stress Level": 3,
    "Sedentary Hours Per Day": 8,
    "BMI": 28.5,
    "Triglycerides": 180,
    "Physical Activity Days Per Week": 3,
    "Sleep Hours Per Day": 6
}])

# Predict
prediction = model.predict(input_data)[0]
print("Heart Attack Risk:", "Yes" if prediction == 1 else "No")

Input Format

  • Inputs: Pandas DataFrame with structured numeric data (no missing mandatory features)

  • Outputs: Binary class label (0: No risk, 1: Risk)

Known Issues

  • Model assumes data is clean and properly formatted.

  • False positives may occur in edge cases like high stress but no other symptoms.


๐Ÿงฉ System Design

  • Standalone: Yes, but can be part of a larger diagnostic suite.

  • Inputs: CSV files, API feeds, or manual form data

  • Outputs: Class label, probability scores, and explainable SHAP values (optional)

  • Optional Inputs: Patient ID, Country, Continent, Income, Hemisphere


โš™๏ธ Implementation Requirements

Training Environment

  • Language: Python 3.10+

  • Libraries: xgboost, pandas, scikit-learn, joblib

  • Training Time: ~8 seconds on a laptop (i7, 16GB RAM)

  • Hardware: No GPU required

  • Inference: <50ms per sample on CPU


๐Ÿ“Š Model Characteristics

Initialization

  • Trained from scratch using medical data collected from hospital studies.

Stats

  • Total Features: 20+ core health indicators

  • Model Size: ~200 KB

  • XGBoost Trees: 100 estimators, max depth = 5

  • Latency: Real-time (<0.05s/sample)

Privacy & Optimization

  • No pruning or quantization

  • Differential privacy not applied due to anonymized, structured medical data


๐Ÿ“š Data Overview

Training Data

  • Source: [Custom medical dataset from Iraq hospital and online sources]

  • Size: ~500 patients

  • Features engineered from structured CSV

  • Preprocessing: Categorical encoding, normalization, NA handling

Demographics

  • Data represents a diverse sample from Middle Eastern populations

  • Gender-balanced

  • Adults aged 20โ€“85

Evaluation Data

  • 80/20 Train/Test split

  • Validation using stratified 5-fold cross-validation

  • Metrics: Accuracy, Precision, Recall, F1 Score, ROC-AUC


โœ… Evaluation Results

Summary

MetricScore
Accuracy94.6%
Precision92.1%
Recall96.3%
F1 Score94.1%
ROC-AUC0.98

Subgroup Performance

  • Better recall for older age groups (โ‰ฅ50)

  • Slightly lower precision for diabetic patients due to overlap in symptoms

Fairness

  • No explicit bias mitigation applied, but training was stratified

  • No sensitive personal identifiers in the dataset


โš ๏ธ Usage Limitations

  • Not a substitute for a certified medical diagnosis

  • Results may be affected by inaccurate or missing user input

  • Should be validated against local population data before clinical use


๐Ÿ” Ethics

  • Designed to empower awareness, not replace doctors

  • Dataset was anonymized and used for educational, non-commercial purposes

  • Users are reminded that clinical intervention must follow expert review


๐Ÿงช Research Studies and Evaluation Results

  1. Prediction of Cardiovascular Disease Using XGBoost
    This study constructs an XGBoost model for cardiovascular disease prediction, demonstrating its effectiveness compared to other algorithms.
    ๐Ÿ”— Nature Scientific Reports

  2. Enhanced Heart Attack Prediction Using eXtreme Gradient Boosting
    A research paper proposing a novel approach leveraging XGBoost for heart attack analysis and prediction, highlighting its superior performance in handling medical data complexities.
    ๐Ÿ”— Journal of Theoretical and Practical Engineering Science

  3. Estimation of Risk Factors Related to Heart Attack With XGBoost
    This publication discusses the successful classification of heart attack datasets using XGBoost, emphasizing the model's accuracy in identifying associated risk factors.
    ๐Ÿ”— ResearchGate Publication


๐Ÿง  Additional Resources

  1. Prediction on UCI Heart Disease Dataset Using XGBoost
    A project that applies XGBoost to the UCI Heart Disease dataset, providing insights into model implementation and performance.
    ๐Ÿ”— GitHub Repository

  2. How XGBoost Can Save Your Heart with 94% Accuracy
    An article detailing the application of XGBoost in heart disease prediction, achieving high accuracy and discussing the model's advantages.
    ๐Ÿ”— Medium Article


These sources provide comprehensive information on datasets, model architectures, evaluation metrics, and real-world applications relevant to heart attack prediction using machine learning techniques like XGBoost. They can serve as valuable references for the development, validation, and documentation of your model.

0
Subscribe to my newsletter

Read articles from Abhishek Choudhary directly inside your inbox. Subscribe to the newsletter, and don't miss out.

Written by

Abhishek Choudhary
Abhishek Choudhary

Iโ€™m Abhishek Choudhary, a Jaipur-based photographer, developer, and human behavior observer with a passion for building thoughtful digital experiences. With a multidisciplinary background, I bring together technology, creativity, and market insight to develop tools, content, and strategies that solve real problems. Iโ€™ve worked across startups, created multiple mini projects, and received national recognition for innovation in both mental health and visual storytelling. ๐Ÿง  I'm deeply interested in how humans behave, react, and express โ€” and I often channel this into my Python-based tools, emotion-detection systems, and AI projects. ๐Ÿ“ธ As a photographer and story-filmmaker, I craft visuals that speak โ€” combining art, light, and emotion. ๐Ÿ“ˆ On the analytical side, I actively engage in stock market research, tracking trends and behavioral patterns to predict outcomes.