Wine Classification Model

Table of contents
- Multi-Class Wine Classification Model with Random Forest
- Modules Needed
- Loading Dataset
- Mapping Quality Values 3-9 to Three Classes (0, 1, 2)
- Dropping NA Values in the Dataset
- Transforming String Values to Numeric Values
- Balancing the Quality Classes to the Same Number of Rows
- Preprocessing Data
- K-Nearest Neighbors Model
- Naive Bayes Model
- SVC Model
- Random Forest Model
- Stochastic Gradient Descent
- Check it out

Multi-Class Wine Classification Model with Random Forest
The model predicts whether a wine is Regular, Good, or Excellent from its levels of alcohol, pH, sulphates, citric acid, and other physicochemical properties.
The following machine learning algorithms were trained on the wine dataset:
K-Nearest Neighbors
Naive Bayes
SVC
Random Forest
Stochastic Gradient Descent
The Random Forest model gives the best result, with about 70% accuracy across the three wine classes.
Modules Needed
import pandas as pd
import sklearn.metrics
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
Loading Dataset
dataset = pd.read_csv("./data/winequalityN.csv")
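Before transforming anything, a quick optional sanity check never hurts; the two lines below just confirm the shape and columns that were loaded:
print(dataset.shape)
print(dataset.head())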
Mapping Quality Values 3-9 to Three Classes (0, 1, 2)
dataset.quality = dataset.quality.replace({3: 0, 4: 0, 5: 0, 6: 1, 7: 2, 8: 2, 9: 2})  # 3-5 -> 0 (Regular), 6 -> 1 (Good), 7-9 -> 2 (Excellent)
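The mapping typically leaves the three classes imbalanced, which is what the sampling step further down corrects; a one-liner makes the imbalance visible:
print(dataset.quality.value_counts())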
Dropping NA Values in the Dataset
dataset = dataset.dropna()
Transforming String Values to Numeric Values
dataset.type = dataset.type.replace({"white": 1, "red" : 0})
Balancing the Quality Classes to the Same Number of Rows
df_0 = dataset[dataset['quality']==0]
df_1 = dataset[dataset['quality']==1]
df_2 = dataset[dataset['quality']==2]
df_0 = df_0.sample(1250)
df_1 = df_1.sample(1250)
df_2 = df_2.sample(1250)
dataset = pd.concat([df_0, df_1, df_2])
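Note that sample draws randomly, so the subsample changes on every run. A minimal equivalent sketch, assuming pandas 1.1+ for groupby sampling and reusing the random_state=100 from the split below so the result is reproducible:
dataset = dataset.groupby("quality").sample(n=1250, random_state=100)  # raises if a class has fewer than 1,250 rows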
Preprocessing Data
X = dataset.iloc[:, 0:-1]
y = dataset.iloc[:, -1]
X_Train, X_Test, Y_Train, Y_Test = train_test_split(X, y, test_size=0.2, random_state=100, stratify=y)
numerical_features = X.select_dtypes(include=['float64', 'int64'])
numerical_columns = numerical_features.columns
ct = ColumnTransformer([("only numeric", StandardScaler(), numerical_columns)], remainder='passthrough')
X_Train = ct.fit_transform(X_Train)  # fit the scaler on the training split only
X_Test = ct.transform(X_Test)  # reuse the training statistics so no test data leaks into the fit
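As a quick check that the transform behaved as expected, the standardized training features should now have mean close to 0 and standard deviation close to 1 (X_Train comes back from the ColumnTransformer as a plain NumPy array):
print(X_Train.mean(axis=0).round(2))
print(X_Train.std(axis=0).round(2))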
K-Nearest Neighbors Model
from sklearn.neighbors import KNeighborsClassifier
knn_model = KNeighborsClassifier(n_neighbors=3)
knn_model.fit(X_Train, Y_Train)
y_predicted = knn_model.predict(X_Test)
print(sklearn.metrics.classification_report(Y_Test, y_predicted))
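The choice of n_neighbors=3 above is arbitrary; a small sketch of picking k by 5-fold cross-validation on the training split only (cross_val_score comes from sklearn.model_selection):
from sklearn.model_selection import cross_val_score
for k in [3, 5, 7, 9, 11]:  # candidate neighbor counts
    scores = cross_val_score(KNeighborsClassifier(n_neighbors=k), X_Train, Y_Train, cv=5)
    print(k, round(scores.mean(), 3))  # mean validation accuracy for this k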
Naive Bayes Model
from sklearn.naive_bayes import GaussianNB
nb_model = GaussianNB()
nb_model = nb_model.fit(X_Train, Y_Train)
y_pred = nb_model.predict(X_Test)
print(sklearn.metrics.classification_report(Y_Test, y_pred))
SVC Model
from sklearn.svm import SVC
svm_model = SVC()
svm_model = svm_model.fit(X_Train, Y_Train)
y_pred = svm_model.predict(X_Test)
print(sklearn.metrics.classification_report(Y_Test, y_pred))
Random Forest Model
from sklearn.ensemble import RandomForestClassifier
random_forest = RandomForestClassifier()
random_forest.fit(X_Train, Y_Train)
y_pred = random_forest.predict(X_Test)
print(sklearn.metrics.classification_report(Y_Test, y_pred))
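Since Random Forest is the winner here, two optional diagnostics may be worth a look: a confusion matrix showing which of the three classes get mixed up, and the forest's built-in feature importances. Aligning the importances with X.columns works in this particular pipeline because every column is numeric after the type mapping, so the ColumnTransformer keeps the original column order:
from sklearn.metrics import confusion_matrix
print(confusion_matrix(Y_Test, y_pred))  # rows = true class 0/1/2, columns = predicted class
importances = pd.Series(random_forest.feature_importances_, index=X.columns)
print(importances.sort_values(ascending=False).head())  # most influential features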
Stochastic Gradient Descent
from sklearn.linear_model import SGDClassifier
sgd = SGDClassifier()
sgd.fit(X_Train, Y_Train)
pred_sgd = sgd.predict(X_Test)
print(sklearn.metrics.classification_report(Y_Test, pred_sgd))
As noted above, the Random Forest model gives the best result, at roughly 70% accuracy across the three wine classes.
Check it out
Test the model yourself by running the main.py file, built with Streamlit.
streamlit run main.py
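The contents of main.py are not shown in this post; purely as an illustration, a Streamlit front end for this model could look roughly like the hypothetical sketch below (the feature names and slider ranges are assumptions, not the real app):
import streamlit as st

st.title("Wine Classification Model")
# Hypothetical inputs; the real main.py may expose different features and ranges
alcohol = st.slider("Alcohol", 8.0, 15.0, 10.0)
sulphates = st.slider("Sulphates", 0.2, 2.0, 0.5)
st.write("Selected inputs:", {"alcohol": alcohol, "sulphates": sulphates})
# The real app would scale the inputs with the fitted ColumnTransformer
# and call random_forest.predict(...) to report Regular / Good / Excellent.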
Written by Luis Jose Mendez
Hello! My name is Luis Jose, currently a student at Bicentenaria Aragua University, Venezuela, pursuing a Systems Engineering degree with a specialization in Artificial Intelligence. Passionate about Machine Learning, Deep Learning, and Computer Vision.