Predict Bitcoin Prices: LSTM & SQL Approach

Problem Statement

Cryptocurrency prices are highly volatile and influenced by various unpredictable factors such as market sentiment, economic policies, and global events.
For traders and investors, being able to predict short-term price movements can help in making informed decisions and reducing risk exposure.

The goal of this project is to:

Build a data-driven system that predicts the next closing price of Bitcoin.
Use historical time-series data stored in a SQL database.
Automate the end-to-end pipeline: fetching data, preprocessing, training an LSTM model, and making predictions.

🛠 Tech Stack

Python
TensorFlow / Keras
Pandas, NumPy, Scikit-learn
SQLAlchemy (Database Connection)
MinMaxScaler (Feature Scaling)
ModelCheckpoint (Best Model Saving)
Custom Logging (Error Tracking)

🔍 Workflow with Code Snippets

1️⃣ Fetching Data from SQL

We connect to the database using SQLAlchemy and fetch Bitcoin price history.

# data_fetcher.py
import pandas as pd
from sqlalchemy import text

# Parameterized query: :coin_name is bound at execution time
query = text("""
    SELECT timestamp, current_price
    FROM CryptoMarketData
    WHERE coin_id = :coin_name
    ORDER BY timestamp ASC
""")

result = self.session.execute(query, {'coin_name': coin_name})
df = pd.DataFrame(result.fetchall(), columns=["timestamp", "current_price"])
df['timestamp'] = pd.to_datetime(df['timestamp'])  # parse timestamps into datetimes

💡 Explanation:
This query retrieves timestamp and current_price for the given coin from the CryptoMarketData table, ordered chronologically.
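
To try the query without a database server, here is a minimal sketch that swaps SQLAlchemy for Python's built-in sqlite3 module. The in-memory table, the sample rows, and the fetch_prices helper are assumptions made up for illustration; only the table and column names come from the query above.

```python
import sqlite3

# Hypothetical in-memory table mirroring the columns used in the query above.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE CryptoMarketData (coin_id TEXT, timestamp TEXT, current_price REAL)"
)
conn.executemany(
    "INSERT INTO CryptoMarketData VALUES (?, ?, ?)",
    [
        ("bitcoin", "2025-01-01 00:02:00", 101.5),   # made-up prices
        ("bitcoin", "2025-01-01 00:01:00", 100.0),
        ("ethereum", "2025-01-01 00:01:00", 50.0),
    ],
)

def fetch_prices(conn, coin_name):
    """Return (timestamp, price) rows for one coin, oldest first."""
    query = """
        SELECT timestamp, current_price
        FROM CryptoMarketData
        WHERE coin_id = :coin_name
        ORDER BY timestamp ASC
    """
    # The named parameter is bound by the driver, not formatted into the SQL string.
    return conn.execute(query, {"coin_name": coin_name}).fetchall()

rows = fetch_prices(conn, "bitcoin")
```

Binding coin_name as a parameter, rather than building the SQL with string formatting, keeps the query safe from injection. The ORDER BY matters too, because the sequences built later assume the rows are in time order.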

2️⃣ Data Preprocessing for LSTM

Before feeding the data into the model, we scale and sequence it.

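# data_preprocessor.py
import numpy as np

scaled = self.scaler.fit_transform(df[['current_price']])

# Create sequences of length `time_steps`
X, y = [], []
for i in range(self.time_steps, len(scaled)):
    X.append(scaled[i - self.time_steps:i])  # the previous time_steps prices
    y.append(scaled[i])                      # the price that follows them
X, y = np.array(X), np.array(y)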

💡 Explanation:

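Scaling maps prices into the 0 to 1 range (based on the minimum and maximum of the data the scaler was fitted on), which helps keep training stable. Note that the scaler here is fitted on the full series before the train/test split, so the test portion influences the scaling; fitting on the training portion only avoids that.
Sequences are sliding windows of time_steps consecutive prices, each used to predict the price that comes right after it.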
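
To make the shapes concrete, here is a small standalone sketch of the same two steps using plain NumPy instead of MinMaxScaler. The helper names, the five sample prices, and the window length of 3 are assumptions chosen to keep the example tiny.

```python
import numpy as np

def min_max_scale(values):
    """Scale a 1-D array to [0, 1] using its own min and max."""
    lo, hi = values.min(), values.max()
    return (values - lo) / (hi - lo)

def make_windows(scaled, time_steps):
    """Build (samples, time_steps, 1) inputs and (samples, 1) targets."""
    X, y = [], []
    for i in range(time_steps, len(scaled)):
        X.append(scaled[i - time_steps:i])  # the previous `time_steps` values
        y.append(scaled[i])                 # the value that follows them
    return np.array(X).reshape(-1, time_steps, 1), np.array(y).reshape(-1, 1)

prices = np.array([10.0, 12.0, 11.0, 15.0, 20.0])  # made-up prices
scaled = min_max_scale(prices)
X, y = make_windows(scaled, time_steps=3)

print(X.shape, y.shape)  # (2, 3, 1) (2, 1)
```

Five prices with a window of 3 give only two training samples, since the first three prices have no full window before them. The trailing dimension of 1 is the single feature (the price) that the LSTM expects.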

3️⃣ Building the LSTM Model

Our model has two stacked LSTM layers, each followed by dropout for regularization.

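# model_builder.py
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dropout, Dense

model = Sequential()
model.add(LSTM(64, return_sequences=True, input_shape=self.input_shape))
model.add(Dropout(0.2))
model.add(LSTM(64))
model.add(Dropout(0.2))
model.add(Dense(1))
model.compile(optimizer='adam', loss='mean_squared_error')  # Adam + MSE, see the design choices section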

💡 Explanation:

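The first LSTM layer returns its full output sequence (return_sequences=True), so the second LSTM layer receives one vector per timestep.
Dropout reduces overfitting by randomly zeroing 20% of a layer's outputs during training.
Dense(1) outputs a single predicted value, still on the scaled 0 to 1 range.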
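
To get a feel for how big this network is, the sketch below counts its trainable parameters by hand. It assumes a single input feature (only current_price is used) and Keras' default LSTM settings with biases enabled; the helper names are made up for illustration.

```python
def lstm_params(units, input_dim):
    """Trainable weights in one LSTM layer (default settings, bias enabled)."""
    # Four gates, each with input weights, recurrent weights, and a bias vector.
    return 4 * (units * (input_dim + units) + units)

def dense_params(units, input_dim):
    """Weights plus biases of a fully connected layer."""
    return units * input_dim + units

first_lstm = lstm_params(units=64, input_dim=1)    # one feature: current_price
second_lstm = lstm_params(units=64, input_dim=64)  # fed by the first layer's 64 outputs
output = dense_params(units=1, input_dim=64)
total = first_lstm + second_lstm + output

print(first_lstm, second_lstm, output, total)  # 16896 33024 65 49985
```

Dropout layers add no parameters, so the whole model comes to roughly 50k weights under these assumptions. The number of timesteps does not change the count, because the same LSTM weights are reused at every step.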

4️⃣ Training the Model

ModelCheckpoint saves the model during training whenever the monitored metric improves, so the best version seen so far is always on disk and ready for predictions without retraining. Because validation data is passed to fit, we monitor val_loss rather than the training loss: training loss usually keeps falling even after the model starts to overfit, whereas validation loss stops improving, so the saved file comes from the epoch that generalized best.

# trainer.py
from tensorflow.keras.callbacks import ModelCheckpoint

callbacks = [
    ModelCheckpoint(filepath=model_path, save_best_only=True, monitor='val_loss', verbose=1)
]

history = self.model.fit(
    X_train, y_train,
    epochs=self.epochs,
    batch_size=self.batch_size,
    validation_data=(X_val, y_val),
    callbacks=callbacks
)

filepath=model_path → Tells Keras where to save the .h5 model file.
save_best_only=True → Overwrites the file only when the monitored metric improves.
monitor='val_loss' → Saves the model when the validation loss decreases. Monitoring 'loss' (training loss) is also possible, but it tends to keep whatever the latest epoch produced.
verbose=1 → Prints a message whenever the model is saved.
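
To see how the choice of monitored metric changes what gets saved, here is a plain-Python sketch of the save_best_only rule. It is not Keras code, the epochs_saved helper is hypothetical, and both loss curves are made-up numbers.

```python
def epochs_saved(metric_per_epoch):
    """Return the 1-based epochs at which a save_best_only checkpoint would write."""
    best = float("inf")
    saved = []
    for epoch, value in enumerate(metric_per_epoch, start=1):
        if value < best:  # a lower loss counts as an improvement
            best = value
            saved.append(epoch)
    return saved

# Made-up loss curves for illustration only.
train_loss = [0.050, 0.030, 0.020, 0.015, 0.012]
val_loss = [0.060, 0.040, 0.035, 0.041, 0.048]

saved_on_loss = epochs_saved(train_loss)    # [1, 2, 3, 4, 5]
saved_on_val_loss = epochs_saved(val_loss)  # [1, 2, 3]
```

With these numbers, monitoring training loss overwrites the file every epoch and ends up with epoch 5, while monitoring validation loss keeps epoch 3, the last point before validation loss starts to climb.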

5️⃣ Making Predictions

We load the trained model and predict the next price.

# predictor.py
last_sequence = scaled_data[-Config.TIME_STEPS:]
input_data = np.expand_dims(last_sequence, axis=0)
predicted_scaled = self.model.predict(input_data)
predicted_price = self.preprocessor.inverse_scale(predicted_scaled)[0][0]

💡 Explanation:

We take the last TIME_STEPS scaled prices, add a batch dimension, and feed them into the model.
The output is inverse-scaled back to an actual price.
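
The shape handling and the inverse scaling can be checked without a trained model. The sketch below is a minimal illustration under a few assumptions: a window of 3 instead of Config.TIME_STEPS, made-up prices, a hand-written inverse_scale in place of the preprocessor's method, and a hard-coded array standing in for model.predict.

```python
import numpy as np

TIME_STEPS = 3  # small window for illustration only

prices = np.array([[100.0], [110.0], [105.0], [120.0], [150.0]])  # made-up prices
price_min, price_max = prices.min(), prices.max()
scaled_data = (prices - price_min) / (price_max - price_min)

# Last window has shape (time_steps, features); the model expects a batch,
# so we add a leading axis to get (1, time_steps, features).
last_sequence = scaled_data[-TIME_STEPS:]
input_data = np.expand_dims(last_sequence, axis=0)

def inverse_scale(scaled, lo, hi):
    """Undo min-max scaling."""
    return scaled * (hi - lo) + lo

# Stand-in for model.predict(input_data), which returns an array of shape (1, 1).
predicted_scaled = np.array([[0.8]])
predicted_price = inverse_scale(predicted_scaled, price_min, price_max)[0][0]

print(input_data.shape)  # (1, 3, 1)
```

A scaled prediction of 0.8 maps back to a price of about 140 here, because the fitted range runs from 100 to 150. The inverse transform must use the same minimum and maximum that were used for scaling, which is why the predictor reuses the preprocessor's scaler.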

6️⃣ End-to-End Pipeline

We connect all the components into one smooth execution.

# pipeline.py
fetcher = CryptoDataFetcher()
df = fetcher.fetch_coin_data(coin_name)

preprocessor = DataPreprocessor()
scaled_data = preprocessor.scale_data(df)
X, y = preprocessor.create_sequences(scaled_data)
X_train, y_train, X_test, y_test = preprocessor.train_test_split(X, y)

builder = LSTMModelBuilder(input_shape=(X_train.shape[1], X_train.shape[2]))
model = builder.build_model()

trainer = ModelTrainer(model, coin_name)
trainer.train(X_train, y_train, X_val=X_test, y_val=y_test)

predictor = CryptoPricePredictor(coin_name)
print(f"✅ Predicted Next Price: ₹{predictor.predict_next_price(df):.2f}")
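
The article does not show how preprocessor.train_test_split works. For time-series data the usual approach is a chronological split with no shuffling, so here is a sketch of what such a method might look like. The function name and the 80/20 ratio are assumptions, not the project's actual code.

```python
def chronological_split(X, y, train_fraction=0.8):
    """Split ordered samples so every test sample is later than every training sample."""
    split = int(len(X) * train_fraction)
    return X[:split], y[:split], X[split:], y[split:]

X = list(range(10))        # stand-ins for ten time-ordered input windows
y = list(range(100, 110))  # stand-ins for their targets
X_train, y_train, X_test, y_test = chronological_split(X, y)

print(X_train, X_test)  # [0, 1, 2, 3, 4, 5, 6, 7] [8, 9]
```

Keeping the split in time order means the model is always evaluated on prices that come after the ones it trained on, which mirrors how it is used for prediction. The same slicing works on NumPy arrays.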

How Data Flows Through My Bitcoin Price Prediction System

Inside the Model: Architecture and Design Choices

Why the Model is Built This Way

We use 60 timesteps, which with minute-level data gives the model about an hour of “memory” of Bitcoin’s price (it would be about two months only if each step were a day). That’s enough to catch short-term trends without drowning it in too much history or slowing training.

Each LSTM layer has 64 units — a good middle ground where the model is smart enough to learn complex patterns but still trains quickly.

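We add 20% dropout after each LSTM layer so the model doesn’t get too “attached” to specific neurons. During training, a random 20% of the layer’s outputs are zeroed at each step. It’s like making the model work with slightly different teammates each round, which helps it generalize better.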
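
For intuition, here is a NumPy sketch of what a dropout layer does to its inputs during training. It uses the inverted-dropout form, where the surviving values are scaled up so the expected total stays the same. The function name and the array of ones are assumptions for illustration, and real Keras layers handle this internally.

```python
import numpy as np

def dropout(x, rate, rng):
    """Inverted dropout: zero a random `rate` share of values, rescale the rest."""
    keep_mask = rng.random(x.shape) >= rate
    return np.where(keep_mask, x / (1.0 - rate), 0.0)

rng = np.random.default_rng(0)
activations = np.ones(10_000)
dropped = dropout(activations, rate=0.2, rng=rng)

zero_share = float(np.mean(dropped == 0.0))  # close to 0.2
mean_after = float(dropped.mean())           # close to 1.0
```

About one value in five becomes zero and the rest grow by a factor of 1 / 0.8, so the average activation stays near its original level. At prediction time dropout is switched off and all units are used.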

The final Dense layer has 1 neuron because we just want one thing: the next predicted price.

For training, Mean Squared Error (MSE) is a natural fit: it squares each error, so it punishes big mistakes much more than small ones.
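
A quick worked example shows the effect. The two sets of predictions below are made up so that they have the same total absolute error; Mean Absolute Error (MAE) is included only as a point of comparison.

```python
def mse(actual, predicted):
    """Mean of squared errors."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)

def mae(actual, predicted):
    """Mean of absolute errors."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

actual = [100.0, 100.0, 100.0, 100.0]
many_small = [101.0, 99.0, 101.0, 99.0]   # four errors of size 1
one_large = [100.0, 100.0, 100.0, 104.0]  # one error of size 4

print(mae(actual, many_small), mae(actual, one_large))  # 1.0 1.0
print(mse(actual, many_small), mse(actual, one_large))  # 1.0 4.0
```

MAE treats both cases the same, while MSE scores the single large miss four times worse. Training on MSE therefore pushes the model to avoid occasional large errors, even at the cost of slightly more small ones.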

And we use the Adam optimizer because it adapts the learning rate for each parameter, usually converges quickly, and is a solid default for noisy data like crypto prices.

Results

The LSTM model was trained to predict the next minute’s price for a selected cryptocurrency. Below is an example where Bitcoin’s predicted price is ₹113,951.23.

Key Takeaways

Built a modular end-to-end LSTM pipeline for minute-level price prediction.
Used SQL as the data source, ensuring the model trains on real historical data.
Created separate models for each coin so predictions remain coin-specific.
Integrated an interactive dashboard for quick, user-friendly predictions.

Future Enhancements

Current implementation focuses on learning LSTM concepts with minute-level predictions.
The pipeline is flexible for future enhancements.
Integrate MLflow to:
- Track experiments
- Manage model versions
- Streamline retraining as new data is added
Expand the model to per-hour predictions for more strategic and less noisy forecasts.
Implement a multi-coin single model to avoid retraining for every cryptocurrency.

End Note

This project started as a mini-experiment to understand how LSTM models handle time-series data — and quickly became a fully functional next-minute crypto price predictor. By combining real-time market data from CoinGecko, a clean preprocessing pipeline, and a modular LSTM architecture, we built something both educational and practical.

The real win here isn’t just the predictions — it’s the foundation we’ve created. With planned enhancements like MLflow integration, per-hour forecasting, and real-time deployment, this project can easily grow into a more robust and production-ready system.

For now, it stands as a hands-on learning milestone and a great reminder that even small projects can teach big lessons.

Here is the link:

LSTM Bitcoin price prediction

Nilanjan Sarkar
