No More ‘Killed’: The Smart Way to Download Kaggle Datasets Without Crashing

Yalla Balaji
2 min read

Tired of Kaggle datasets crashing mid-download?
You've tried cloud notebooks, open data libraries, even Google Colab, and just as that cell seems about to finish, boom: "KILLED".

Sound familiar?
I’ve been there. Downloading large datasets like those from the AMEX Default Prediction competition can be a pain — especially when they’re 10+ GB and your system just taps out.

So I built a lightweight script to help you download Kaggle datasets smoothly, with:

  • Proper progress bars (thanks tqdm)

  • Automatic extraction

  • Minimal dependencies

  • Zero cloud notebook drama

Let’s get you set up.

Here is smart_download.py

Prerequisites:

  • Python environment (a venv is preferred; a minimal setup sketch follows this list)

  • requests and tqdm installed:

        pip install requests tqdm
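
If you're starting from scratch, a typical venv setup looks like this (the .venv name is just a convention):

python -m venv .venv
source .venv/bin/activate    # on Windows: .venv\Scripts\activate
pip install requests tqdm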
    

Source code:

The full script is below as smart_download.py. Don't forget to swap in the competition you actually need in the URL (see the note right after the script).
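
The script reads your API credentials from ~/.kaggle/kaggle.json, which you can generate from your Kaggle account settings page. Its format, with placeholder values, looks like this:

{"username": "your-kaggle-username", "key": "your-api-key"}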

import requests
from pathlib import Path
from tqdm import tqdm
import zipfile
import os
import json

# Where to save data
DATA_DIR = Path("data")
DATA_DIR.mkdir(exist_ok=True)

# Load your Kaggle credentials
with open(Path.home() / ".kaggle/kaggle.json") as f:
    creds = json.load(f)

KAGGLE_USERNAME = creds["username"]
KAGGLE_KEY = creds["key"]

def download_file(file_name):
    # Kaggle's v1 API endpoint for a single competition file
    url = f"https://www.kaggle.com/api/v1/competitions/data/download/amex-default-prediction/{file_name}"
    auth = (KAGGLE_USERNAME, KAGGLE_KEY)  # HTTP Basic auth with the token from kaggle.json
    headers = {"User-Agent": "Mozilla/5.0"}
    zip_path = DATA_DIR / f"{file_name}.zip"  # Kaggle delivers each file zipped

    print(f"⬇️  Downloading {file_name}...")

    with requests.get(url, stream=True, auth=auth, headers=headers) as r:
        r.raise_for_status()
        total_size = int(r.headers.get("content-length", 0))
        # Stream to disk in 1 MB chunks, counting real bytes so the
        # progress bar stays accurate even when the last chunk is short.
        with open(zip_path, "wb") as f, tqdm(
            total=total_size, unit="B", unit_scale=True
        ) as pbar:
            for chunk in r.iter_content(chunk_size=1024 * 1024):
                f.write(chunk)
                pbar.update(len(chunk))

    print(f"✅ Downloaded {file_name}. Extracting...")

    with zipfile.ZipFile(zip_path, 'r') as zip_ref:
        zip_ref.extractall(DATA_DIR)

    os.remove(zip_path)
    print(f"🧹 Cleaned up {file_name}.zip")

# Files to grab
files = ["train_data.csv", "train_labels.csv", "sample_submission.csv"]
for f in files:
    download_file(f)
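
To point the script at a different competition, only the slug in the URL changes (it's the last path segment of the competition's page URL on Kaggle). A hypothetical example, using the well-known "titanic" slug:

# Hypothetical: same endpoint pattern, different competition slug
url = f"https://www.kaggle.com/api/v1/competitions/data/download/titanic/{file_name}"

Note that standalone Kaggle datasets (as opposed to competition files) are served from a different API endpoint, so this exact pattern only applies to competitions.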

Final run:

Run the script from your terminal:

python smart_download.py

You'll see progress bars for each file, followed by extraction and cleanup. That’s it.
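
If you want to sanity-check the result, you can peek at one of the smaller extracted files. A quick sketch, assuming pandas (which isn't among the script's dependencies):

import pandas as pd

# train_labels.csv is small enough to load whole; don't try this
# with the much larger train_data.csv.
labels = pd.read_csv("data/train_labels.csv")
print(labels.shape)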

✅ That’s a Wrap!

This should save you hours of failed attempts and hair-pulling.
If it helped, consider giving it a ❤️ .
