No More ‘Killed’: The Smart Way to Download Kaggle Datasets Without Crashing


Tired of Kaggle datasets crashing mid-download?
You've tried cloud notebooks, open data libraries, even Google Colab — but when that cell finally finishes running, boom — “KILLED”.
Sound familiar?
I’ve been there. Downloading large datasets like those from the AMEX Default Prediction competition can be a pain — especially when they’re 10+ GB and your system just taps out.
So I built a lightweight script to help you download Kaggle datasets smoothly, with:
- Proper progress bars (thanks, tqdm)
- Automatic extraction
- Minimal dependencies
- Zero cloud notebook drama
Let’s get you set up.
Here is smart_download.py
prerequisites:
- A Python environment (a venv is preferred)
- requests and tqdm installed: pip install requests tqdm
source-code:
The actual script lives in smart_download.py. Don't forget to replace the URL in the script with the dataset you need.
import requests
from pathlib import Path
from tqdm import tqdm
import zipfile
import os
import json

# Where to save data
DATA_DIR = Path("data")
DATA_DIR.mkdir(exist_ok=True)

# Load your Kaggle credentials
with open(Path.home() / ".kaggle/kaggle.json") as f:
    creds = json.load(f)

KAGGLE_USERNAME = creds["username"]
KAGGLE_KEY = creds["key"]


def download_file(file_name):
    url = f"https://www.kaggle.com/api/v1/competitions/data/download/amex-default-prediction/{file_name}"
    auth = (KAGGLE_USERNAME, KAGGLE_KEY)
    headers = {"User-Agent": "Mozilla/5.0"}
    zip_path = DATA_DIR / f"{file_name}.zip"

    print(f"⬇️ Downloading {file_name}...")
    with requests.get(url, stream=True, auth=auth, headers=headers) as r:
        r.raise_for_status()
        total_size = int(r.headers.get("content-length", 0))
        with open(zip_path, "wb") as f:
            for chunk in tqdm(r.iter_content(chunk_size=1024 * 1024),  # 1 MB chunks
                              total=total_size // (1024 * 1024),
                              unit="MB"):
                f.write(chunk)

    print(f"✅ Downloaded {file_name}. Extracting...")
    with zipfile.ZipFile(zip_path, 'r') as zip_ref:
        zip_ref.extractall(DATA_DIR)
    os.remove(zip_path)
    print(f"🧹 Cleaned up {file_name}.zip")


# Files to grab
files = ["train_data.csv", "train_labels.csv", "sample_submission.csv"]
for f in files:
    download_file(f)
final-run:
Run the script from your shell:
python smart_download.py
You'll see progress bars for each file, followed by extraction and cleanup. That’s it.
✅ That’s a Wrap!
This should save you hours of failed attempts and hair-pulling.
If it helped, consider giving it a ❤️ .