Developed by OpenSere (Abhishek Choudhary), this model predicts the risk of heart attacks using structured medical data. It is designed to assist in pre-diagnostic screening and awareness, especially in low-resource environments.

🧠 Model Summary

This model is a gradient boosting classifier trained with the XGBoost algorithm. It uses patient health indicators to predict heart attack risk as a binary outcome (0 = No risk, 1 = At risk), with an associated probability score.

Architecture

Algorithm: XGBoost Classifier
Objective: Binary Classification
Features: 25+ health metrics, including Age, Blood Pressure, Cholesterol, Heart Rate, Diabetes, and lifestyle indicators such as Smoking and Obesity
Labels: Heart Attack Risk (0 or 1)
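For readers unfamiliar with gradient boosting, the standard XGBoost formulation for a binary target is sketched below. The card does not name the exact training objective, so the usual `binary:logistic` objective for 0/1 labels is an assumption here. In this formulation the model sums the outputs of K regression trees and passes the total through the logistic function. Training then minimizes log loss plus a per-tree complexity penalty.

```latex
\begin{aligned}
\hat{p}(x) &= \sigma\Big(b + \sum_{k=1}^{K} f_k(x)\Big), \qquad \sigma(z) = \frac{1}{1 + e^{-z}} \\
\mathcal{L} &= -\sum_{i=1}^{n} \Big[ y_i \log \hat{p}(x_i) + (1 - y_i) \log\big(1 - \hat{p}(x_i)\big) \Big]
  + \sum_{k=1}^{K} \Big( \gamma T_k + \tfrac{1}{2} \lambda \lVert w_k \rVert^2 \Big)
\end{aligned}
```

Here f_k is the k-th tree, T_k its number of leaves, w_k its leaf weights, b the base margin, and γ, λ are regularization parameters. According to the Stats section, K = 100 and each tree has a maximum depth of 5. A patient is labelled 1 (At risk) when p̂(x) exceeds the decision threshold, which is 0.5 by default.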

Characteristics

Treats non-critical fields such as Country, Continent, Income, and Patient ID as optional, so they may be missing without breaking prediction
Well-suited for deployment in healthcare dashboards or apps
Lightweight model with fast inference (<50 ms per sample on CPU)

⚙️ Usage

Code Snippet

```python
import joblib
import pandas as pd

# Load the trained model from the pickled file
model = joblib.load("heart_model_final.pkl")

# Example input: one patient, with all features numerically encoded.
# Column names must match the features the model was trained on.
input_data = pd.DataFrame([{
    "Age": 52,
    "Sex": 1,
    "Cholesterol": 205,
    "Blood Pressure": 135,
    "Heart Rate": 85,
    "Diabetes": 1,
    "Family History": 1,
    "Smoking": 0,
    "Obesity": 1,
    "Alcohol Consumption": 0,
    "Exercise Hours Per Week": 1,
    "Diet": 1,
    "Previous Heart Problems": 0,
    "Medication Use": 1,
    "Stress Level": 3,
    "Sedentary Hours Per Day": 8,
    "BMI": 28.5,
    "Triglycerides": 180,
    "Physical Activity Days Per Week": 3,
    "Sleep Hours Per Day": 6
}])

# Predict the class label (0 = No risk, 1 = At risk) for the single row
prediction = model.predict(input_data)[0]
print("Heart Attack Risk:", "Yes" if prediction == 1 else "No")
```

Input Format

Inputs: pandas DataFrame of structured numeric features (all mandatory features must be present)
Outputs: Binary class label (0: No risk, 1: Risk)
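The input requirement above (numeric values, no missing mandatory features) can be checked before the model is called. What follows is a minimal standard-library sketch. The `REQUIRED_FEATURES` list is an assumption copied from the column names in the usage example. `validate_record` is a hypothetical helper and is not part of the distributed model.

```python
import math

# Assumed list of mandatory features, taken from the usage example's columns;
# the authoritative list is whatever the model was trained on.
REQUIRED_FEATURES = [
    "Age", "Sex", "Cholesterol", "Blood Pressure", "Heart Rate", "Diabetes",
    "Family History", "Smoking", "Obesity", "Alcohol Consumption",
    "Exercise Hours Per Week", "Diet", "Previous Heart Problems",
    "Medication Use", "Stress Level", "Sedentary Hours Per Day", "BMI",
    "Triglycerides", "Physical Activity Days Per Week", "Sleep Hours Per Day",
]


def validate_record(record: dict) -> list[str]:
    """Return a list of problems; an empty list means the record looks usable."""
    problems = []
    for name in REQUIRED_FEATURES:
        if name not in record:
            problems.append(f"missing feature: {name}")
            continue
        value = record[name]
        # Reject strings, None, and booleans; accept ints and floats.
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            problems.append(f"non-numeric value for {name}: {value!r}")
        elif math.isnan(value):
            problems.append(f"NaN value for {name}")
    return problems
```

A record that passes can be wrapped as `pd.DataFrame([record])` and passed to `model.predict`. Returning a list of problems, rather than raising on the first one, lets a form or dashboard report every issue at once.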

Known Issues

The model assumes input data is clean and properly formatted.
False positives may occur in edge cases, such as high stress with no other risk factors.

🧩 System Design

Standalone: Yes, but can be part of a larger diagnostic suite.
Inputs: CSV files, API feeds, or manual form data
Outputs: Class label, probability scores, and optional SHAP-value explanations
Optional Inputs: Patient ID, Country, Continent, Income, Hemisphere
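The card lists both class labels and probability scores as outputs. Suppose the deployed model exposes the probability of the "at risk" class. For an `xgboost.XGBClassifier`, that value is `model.predict_proba(input_data)[:, 1]`. A label can then be derived with a threshold. The value 0.5 below is the conventional default and an assumption here. A screening tool might prefer a lower threshold to favour recall. `to_label` and `describe` are hypothetical helpers.

```python
def to_label(probability: float, threshold: float = 0.5) -> int:
    """Map the predicted probability of the 'at risk' class to a 0/1 label."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be in [0, 1]")
    return 1 if probability >= threshold else 0


def describe(probability: float, threshold: float = 0.5) -> str:
    """Format a human-readable result in the style of the usage example."""
    label = to_label(probability, threshold)
    return f"Heart Attack Risk: {'Yes' if label == 1 else 'No'} (p = {probability:.2f})"
```

For example, `describe(0.734)` returns `"Heart Attack Risk: Yes (p = 0.73)"`. Returning the probability alongside the label gives clinicians more context than the bare class.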

⚙️ Implementation Requirements

Training Environment

Language: Python 3.10+
Libraries: xgboost, pandas, scikit-learn, joblib
Training Time: ~8 seconds on a laptop (i7, 16 GB RAM)
Hardware: No GPU required
Inference: <50 ms per sample on CPU
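One way to check the per-sample latency figure on a given machine is to time repeated single-row predictions. `measure_latency_ms` is a hypothetical helper, not part of the model package. It accepts any callable, so with the usage example it would be called as `measure_latency_ms(model.predict, input_data)`.

```python
import statistics
import time
from typing import Any, Callable


def measure_latency_ms(predict: Callable[[Any], Any], sample: Any,
                       runs: int = 100, warmup: int = 5) -> dict[str, float]:
    """Time repeated single-sample predictions and report latency in milliseconds."""
    if runs < 1:
        raise ValueError("runs must be at least 1")
    for _ in range(warmup):
        predict(sample)  # untimed calls to absorb one-off initialization costs
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        predict(sample)
        timings.append((time.perf_counter() - start) * 1000.0)
    timings.sort()
    return {
        "median_ms": statistics.median(timings),
        # Approximate 95th percentile using a simple index into the sorted timings
        "p95_ms": timings[int(0.95 * (len(timings) - 1))],
        "max_ms": timings[-1],
    }
```

Reporting the median and a high percentile gives a fuller picture than a single run, because one-off costs and OS scheduling can make individual calls slower than typical.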

📊 Model Characteristics

Initialization

Trained from scratch using medical data collected from hospital studies.

Stats

Total Features: 20+ core health indicators
Model Size: ~200 KB
XGBoost Trees: 100 estimators, max depth = 5
Latency: Real-time (<0.05 s per sample)

Privacy & Optimization

No pruning or quantization applied
Differential privacy not applied, since the structured medical data was already anonymized

📚 Data Overview

Training Data

Source: Custom medical dataset from a hospital in Iraq and online sources
Size: ~500 patients
Features engineered from structured CSV data
Preprocessing: Categorical encoding, normalization, and missing-value (NA) handling
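The three preprocessing steps can be illustrated with a small standard-library sketch. The card does not document the actual encodings. The `DIET_CODES` mapping, median imputation, and z-score normalization are therefore assumptions chosen for illustration, and `preprocess` is a hypothetical helper.

```python
import statistics

# Hypothetical ordinal encoding for one categorical column; the real mapping
# used in training is not documented in this card.
DIET_CODES = {"Unhealthy": 0, "Average": 1, "Healthy": 2}


def preprocess(rows: list[dict], numeric_cols: list[str]) -> list[dict]:
    """Encode Diet, impute missing numeric values with the column median,
    then z-score normalize each numeric column. Input rows are not modified."""
    out = [dict(r) for r in rows]  # shallow copies so callers' data is untouched
    for r in out:
        if isinstance(r.get("Diet"), str):
            r["Diet"] = DIET_CODES[r["Diet"]]  # categorical encoding
    for col in numeric_cols:
        present = [r[col] for r in out if r.get(col) is not None]
        median = statistics.median(present)  # raises if the column is entirely missing
        for r in out:
            if r.get(col) is None:
                r[col] = median  # NA handling: median imputation
        values = [r[col] for r in out]
        mean = statistics.fmean(values)
        std = statistics.pstdev(values) or 1.0  # guard against constant columns
        for r in out:
            r[col] = (r[col] - mean) / std  # normalization: zero mean, unit variance
    return out
```

In a real pipeline the medians, means, and standard deviations would be computed on the training split only. The same values would then be reused at inference time. This sketch recomputes them from whatever rows it is given.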

Demographics

Data represents a diverse sample from Middle Eastern populations
Gender-balanced
Adults aged 20–85

Evaluation Data

80/20 Train/Test split
Validation using stratified 5-fold cross-validation
Metrics: Accuracy, Precision, Recall, F1 Score, ROC-AUC
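Stratified splitting keeps the at-risk / no-risk ratio roughly constant across folds. This matters for a dataset of about 500 patients, where one unlucky fold could contain few positives. The sketch below uses the standard library and a hypothetical `stratified_folds` helper. It is not the author's training code. In practice, scikit-learn's `StratifiedKFold` and `train_test_split(..., stratify=y)` serve the same purpose.

```python
import random
from collections import defaultdict


def stratified_folds(labels: list[int], k: int = 5, seed: int = 0) -> list[int]:
    """Assign each sample index to one of k folds so that every fold keeps
    roughly the same class balance as the full dataset."""
    rng = random.Random(seed)
    by_class: dict[int, list[int]] = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)
    fold_of = [0] * len(labels)
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal this class's shuffled indices round-robin across the k folds.
        for position, idx in enumerate(indices):
            fold_of[idx] = position % k
    return fold_of
```

For each fold f, the samples with `fold_of[i] == f` form the validation set and the rest form the training set. Repeating this for all k folds gives the 5-fold estimate.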

✅ Evaluation Results

Summary

Metric	Score
Accuracy	94.6%
Precision	92.1%
Recall	96.3%
F1 Score	94.1%
ROC-AUC	0.98
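The metrics in this table follow standard definitions. The sketch below computes them from true labels, predicted labels, and predicted scores. ROC-AUC is calculated as the probability that a randomly chosen positive is scored above a randomly chosen negative. `binary_metrics` is a hypothetical helper. scikit-learn's `accuracy_score`, `precision_score`, `recall_score`, `f1_score`, and `roc_auc_score` provide the same quantities.

```python
def binary_metrics(y_true: list[int], y_pred: list[int],
                   y_score: list[float]) -> dict[str, float]:
    """Compute accuracy, precision, recall, F1, and ROC-AUC for 0/1 labels."""
    if not y_true or not (len(y_true) == len(y_pred) == len(y_score)):
        raise ValueError("inputs must be non-empty and of equal length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0

    # ROC-AUC via pairwise comparison of positive and negative scores (ties count 1/2)
    pos = [s for s, t in zip(y_score, y_true) if t == 1]
    neg = [s for s, t in zip(y_score, y_true) if t == 0]
    if not pos or not neg:
        raise ValueError("ROC-AUC needs at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)

    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "roc_auc": wins / (len(pos) * len(neg)),
    }
```

The pairwise AUC loop is quadratic in the number of samples. That is acceptable for a held-out set of about 100 patients, but a rank-based or library implementation scales better.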

Subgroup Performance

Better recall for older age groups (≥50)
Slightly lower precision for diabetic patients due to overlap in symptoms
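Subgroup findings like these come from computing precision and recall separately within each group. In the sketch below, `subgroup_precision_recall` and `age_group` are hypothetical helpers. The record layout `(features, true label, predicted label)` is also an assumption.

```python
from typing import Callable, Iterable

Record = tuple[dict, int, int]  # (feature dict, true label, predicted label)


def subgroup_precision_recall(records: Iterable[Record],
                              group_fn: Callable[[dict], str]) -> dict[str, dict[str, float]]:
    """Compute precision and recall separately for each subgroup returned by group_fn."""
    counts: dict[str, dict[str, int]] = {}
    for features, y_true, y_pred in records:
        c = counts.setdefault(group_fn(features), {"tp": 0, "fp": 0, "fn": 0})
        if y_pred == 1 and y_true == 1:
            c["tp"] += 1
        elif y_pred == 1 and y_true == 0:
            c["fp"] += 1
        elif y_pred == 0 and y_true == 1:
            c["fn"] += 1
    result = {}
    for group, c in counts.items():
        predicted_pos = c["tp"] + c["fp"]
        actual_pos = c["tp"] + c["fn"]
        result[group] = {
            # NaN marks an undefined ratio (no predicted or no actual positives in the group)
            "precision": c["tp"] / predicted_pos if predicted_pos else float("nan"),
            "recall": c["tp"] / actual_pos if actual_pos else float("nan"),
        }
    return result


def age_group(features: dict) -> str:
    """Split at age 50, matching the age boundary mentioned in the card."""
    return ">=50" if features["Age"] >= 50 else "<50"
```

The diabetic comparison works the same way with a different grouping function, for example `lambda f: "diabetic" if f["Diabetes"] == 1 else "non-diabetic"`. With only ~500 patients, per-group counts can be small, so differences between groups should be read with caution.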

Fairness

No explicit bias mitigation applied, but training was stratified
No sensitive personal identifiers in the dataset

⚠️ Usage Limitations

Not a substitute for a certified medical diagnosis
Results may be affected by inaccurate or missing user input
Should be validated against local population data before clinical use

🔐 Ethics

Designed to raise awareness, not to replace doctors
Dataset was anonymized and used for educational, non-commercial purposes
Users are reminded that clinical intervention must follow expert review

🧪 Research Studies and Evaluation Results

Prediction of Cardiovascular Disease Using XGBoost
This study constructs an XGBoost model for cardiovascular disease prediction, demonstrating its effectiveness compared to other algorithms.
🔗 Nature Scientific Reports
Enhanced Heart Attack Prediction Using eXtreme Gradient Boosting
A research paper proposing a novel approach leveraging XGBoost for heart attack analysis and prediction, highlighting its superior performance in handling medical data complexities.
🔗 Journal of Theoretical and Practical Engineering Science
Estimation of Risk Factors Related to Heart Attack With XGBoost
This publication discusses the successful classification of heart attack datasets using XGBoost, emphasizing the model's accuracy in identifying associated risk factors.
🔗 ResearchGate Publication

🧠 Additional Resources

Prediction on UCI Heart Disease Dataset Using XGBoost
A project that applies XGBoost to the UCI Heart Disease dataset, providing insights into model implementation and performance.
🔗 GitHub Repository
How XGBoost Can Save Your Heart with 94% Accuracy
An article detailing the application of XGBoost in heart disease prediction, achieving high accuracy and discussing the model's advantages.
🔗 Medium Article

These sources cover datasets, model architectures, evaluation metrics, and real-world applications relevant to heart attack prediction with machine learning techniques such as XGBoost. They can serve as references for developing, validating, and documenting this model.

❤️ Cardiovascular Risk Assessment and Heart Attack Prediction System