Part 5 of NotADev

Isa
4 min read

Advancing the Model with New Features and Overcoming Challenges

As I continue my journey to enhance my trading bot, I've reached a significant milestone—the integration of new machine learning features into my model. This phase has been exciting but challenging, with the focus on improving the prediction accuracy of buy and sell signals. In this post, I'll share the details of how I worked with ChatGPT to incorporate Recursive Feature Elimination (RFE), XGBoost, and hyperparameter tuning, and what these changes mean for the bot's performance.

Integrating RFE, XGBoost, and Hyperparameter Tuning

The journey started with a realization: the existing model's feature set wasn't providing enough granularity for accurate trading decisions. To address this, I asked ChatGPT to suggest improvements. It proposed adding new features and applying RFE to determine which ones were most important. The goal was simple—let's keep only what truly matters. This led to an extensive feature engineering process that added indicators like the Average Directional Index (ADX), Rate of Change (ROC), Momentum (MOM), and many others.
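To give a flavour of that feature engineering, here's a minimal sketch of two of the simpler indicators, ROC and MOM, computed with pandas from a `close` price column. (ADX needs true range and directional movement, so it's more involved and not shown here; the function and column names are illustrative, not the bot's actual code.)

```python
import pandas as pd

def add_basic_indicators(df: pd.DataFrame, period: int = 14) -> pd.DataFrame:
    """Add Rate of Change (ROC) and Momentum (MOM) columns derived from 'close'."""
    out = df.copy()
    # ROC: percentage change in price over `period` bars
    out["roc"] = out["close"].pct_change(periods=period) * 100
    # MOM: absolute price difference over `period` bars
    out["mom"] = out["close"].diff(periods=period)
    return out

prices = pd.DataFrame({"close": [100, 102, 101, 105, 107, 110, 108, 112]})
with_ind = add_basic_indicators(prices, period=3)
```

The first `period` rows come out as NaN, which is exactly the kind of gap that has to be handled consistently between training and live prediction.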

Once these features were ready, ChatGPT used RFE to identify which ones significantly contributed to model performance. This reduced the feature set to the most relevant indicators, which in turn helped improve model training efficiency and reduce overfitting.

Here's a snippet showing how RFE was applied to the model:

from sklearn.feature_selection import RFE
from sklearn.ensemble import RandomForestClassifier

# Initializing the model for feature selection
model_for_rfe = RandomForestClassifier(random_state=42)

# Selecting top 10 features using RFE
rfe = RFE(model_for_rfe, n_features_to_select=10)
X_train_rfe = rfe.fit_transform(X_train, y_train)
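After fitting, it's worth knowing which columns survived the elimination. `RFE.get_support()` returns a boolean mask over the input features, so the selected names can be recovered directly. The snippet below is a self-contained sketch on synthetic data with illustrative indicator names, not the bot's real feature matrix:

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE

# Synthetic stand-in for the 17-feature indicator matrix
X, y = make_classification(n_samples=200, n_features=17, n_informative=6, random_state=42)
feature_names = [f"indicator_{i}" for i in range(17)]
X_df = pd.DataFrame(X, columns=feature_names)

rfe = RFE(RandomForestClassifier(random_state=42), n_features_to_select=10)
rfe.fit(X_df, y)

# get_support() marks which of the original columns were kept
selected = [name for name, keep in zip(feature_names, rfe.get_support()) if keep]
```

Keeping this `selected` list around pays off later: it's exactly the ordered feature set the prediction side must reproduce.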

Next came XGBoost—a powerful tool in the machine learning arsenal. I asked ChatGPT if it could improve the model further, and it suggested experimenting with XGBoost alongside the traditional Random Forest approach, evaluating which performed better with the trading dataset. It also recommended hyperparameter tuning, using GridSearchCV to test different combinations for the Random Forest model and find the optimal setup. The resulting model showed noticeable improvements in performance metrics, boosting the accuracy of predictions and helping with more informed trading decisions.

Here's an example of how GridSearchCV was used for hyperparameter tuning:

from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

param_grid = {
    'n_estimators': [100, 200],
    'max_depth': [10, 20],
    'min_samples_split': [2, 5],
}

rf = RandomForestClassifier(random_state=42)
grid_search = GridSearchCV(rf, param_grid, cv=3, n_jobs=-1, verbose=2)
grid_search.fit(X_train_rfe, y_train)

# Best model after hyperparameter tuning
best_rf = grid_search.best_estimator_

Challenges Faced and Lessons Learned

This process wasn't without its challenges. Integrating these new features meant dealing with various compatibility issues between the training script and the prediction logic in the trading script. For example, after training the model with 17 features, I encountered repeated errors when using the model to make predictions in real-time trading.

One of the most frustrating errors was the infamous "feature mismatch" problem. The model expected a specific order and set of features, but the data I was providing during predictions was either incorrectly ordered or incomplete.

To solve this, ChatGPT suggested ensuring strict consistency between the feature set used during training and the real-time feature extraction in the trading script. This was a valuable lesson—the importance of keeping feature engineering consistent across all stages of model development.

Here's how I ensured feature consistency during prediction:

import pandas as pd

# FEATURE_COLUMNS is the list of feature names, in the same order as during training
features_df = pd.DataFrame([features], columns=FEATURE_COLUMNS)
features_df = features_df.fillna(0)

# Making prediction
prediction_proba = model.predict_proba(features_df)
buy_proba = prediction_proba[0][1]
sell_proba = prediction_proba[0][0]
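One way to make that consistency hard to break—an approach I'm sketching here as an assumption, not necessarily what the bot does—is to persist the feature column list alongside the trained model, so the prediction script never hard-codes its own copy:

```python
import joblib
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Illustrative names; in practice this is the list produced by feature selection
feature_columns = [f"indicator_{i}" for i in range(10)]
X, y = make_classification(n_samples=100, n_features=10, random_state=42)

model = RandomForestClassifier(random_state=42).fit(X, y)

# Bundle the model with the exact feature order used in training
joblib.dump({"model": model, "feature_columns": feature_columns}, "model_bundle.joblib")

# At prediction time, load both and rebuild the DataFrame in that order
bundle = joblib.load("model_bundle.joblib")
loaded_columns = bundle["feature_columns"]
```

With the column list travelling with the model, a feature mismatch becomes a loud, early failure instead of silently reordered inputs.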

The Impact on Prediction Accuracy

These changes have had a substantial impact on the bot's performance. The new features have provided more depth for understanding market movements; RFE narrowed the inputs to what's truly important, XGBoost strengthened the predictions, and hyperparameter tuning made the model leaner and more effective. In practice, this means the bot is now better at identifying potential opportunities and minimizing false signals—a crucial aspect of successful algorithmic trading.

This stage of development marked a turning point, where the focus shifted from simply making predictions to making accurate, reliable predictions that could translate into profitable trades. While the journey is far from over, I'm excited about the improvements so far and ready to tackle the next challenges that arise.

Stay tuned for the next update, where I'll share how ChatGPT helped me align the trading script to properly consume the upgraded model features and how I managed to make it all work together seamlessly.

pxng0lin.
