Wine Classification Model


Multiclass wine classification model with Random Forest.

The model predicts whether a wine is Regular, Good or Excellent from its levels of alcohol, pH, sulphates, citric acid, etc.

The following machine learning algorithms were tried on the wine dataset:

  • K-Nearest Neighbors

  • Naive Bayes

  • SVC

  • Random Forest

  • Stochastic Gradient Descent

The Random Forest model gives the best result, with 70% accuracy across the three wine classes.

Modules Needed

import pandas as pd
import sklearn.metrics  # makes sklearn.metrics.classification_report available
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split

Loading Dataset

dataset = pd.read_csv("./data/winequalityN.csv")

Changing Quality Values between 3-9 to only 3 Quality Classes 0-1-2

dataset.quality = dataset.quality.replace({3: 0, 4: 0, 5: 0, 6: 1, 7: 2, 8: 2, 9: 2})
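As a quick sanity check of the mapping above, here is a minimal sketch on a small made-up series of quality scores (not the real dataset). It assumes that classes 0, 1 and 2 correspond to Regular, Good and Excellent in that order, which the post implies but does not state explicitly. The names `quality_to_class` and `class_counts` are illustrative only.

```python
import pandas as pd

# Toy quality scores covering the full 3-9 range (illustrative data, not the real dataset)
quality = pd.Series([3, 4, 5, 6, 7, 8, 9, 5, 6])

# Same mapping as in the post: 3-5 -> 0, 6 -> 1, 7-9 -> 2
# (assumed to mean Regular, Good, Excellent respectively)
quality_to_class = {3: 0, 4: 0, 5: 0, 6: 1, 7: 2, 8: 2, 9: 2}
classes = quality.map(quality_to_class)

# Count how many rows fall into each of the three classes
class_counts = classes.value_counts().sort_index()
print(class_counts)
```

Running the same `value_counts()` on the real `dataset.quality` column is a simple way to see how imbalanced the three classes are before any resampling.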

Dropping NA Values in the Dataset

dataset = dataset.dropna()

Transforming String Values to Numeric Values

dataset.type = dataset.type.replace({"white": 1, "red" : 0})

Balancing the Dataset So Each Quality Class Has the Same Number of Rows

df_0 = dataset[dataset['quality']==0]
df_1 = dataset[dataset['quality']==1]
df_2 = dataset[dataset['quality']==2]

df_0 = df_0.sample(1250)
df_1 = df_1.sample(1250)
df_2 = df_2.sample(1250)

dataset = pd.concat([df_0, df_1, df_2])
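The balancing step above can also be sketched in a more compact form. The example below is not the code used in the post: it uses a tiny made-up frame and `groupby(...).sample(...)`, and it fixes `random_state` so the sampled rows are reproducible. The column values and the names `toy`, `balanced` and `n_per_class` are assumptions for illustration.

```python
import pandas as pd

# Toy imbalanced frame (illustrative data): 6 rows of class 0, 4 of class 1, 3 of class 2
toy = pd.DataFrame({
    "alcohol": [9.1, 9.5, 10.0, 9.8, 9.3, 9.9, 10.5, 11.0, 10.8, 11.2, 12.0, 12.5, 13.0],
    "quality": [0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2],
})

# Use the size of the smallest class as the per-class sample size
n_per_class = int(toy["quality"].value_counts().min())

# Draw the same number of rows from every class; random_state makes the draw repeatable
balanced = toy.groupby("quality").sample(n=n_per_class, random_state=100)

# Every class should now have the same number of rows
balanced_counts = balanced["quality"].value_counts().sort_index()
print(balanced_counts)
```

On the real dataset the per-class size would be 1250, as in the post. Note that `sample` without a `random_state` picks different rows on every run, so the reported metrics can vary slightly between runs.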

Preprocessing Data

X = dataset.iloc[:, 0:-1]
y = dataset.iloc[:, -1]
X_Train, X_Test, Y_Train, Y_Test = train_test_split(X, y, test_size=0.2, random_state=100, stratify=y)

numerical_features = X.select_dtypes(include=['float64', 'int64'])

numerical_columns = numerical_features.columns

ct = ColumnTransformer([("only numeric", StandardScaler(), numerical_columns)], remainder='passthrough')

X_Train = ct.fit_transform(X_Train)
X_Test = ct.transform(X_Test)
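The scaler is fitted on the training split only and then reused on the test split, so no information from the test data leaks into preprocessing. A minimal sketch of what that means, on randomly generated columns rather than the wine data (the column names and distributions here are made up for illustration):

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic numeric features (illustrative values, not the real dataset)
rng = np.random.default_rng(100)
toy = pd.DataFrame({
    "alcohol": rng.normal(10.5, 1.2, size=200),
    "pH": rng.normal(3.2, 0.15, size=200),
})
toy_train, toy_test = train_test_split(toy, test_size=0.2, random_state=100)

# Same pattern as in the post: scale the numeric columns, pass the rest through
ct = ColumnTransformer(
    [("only numeric", StandardScaler(), list(toy.columns))],
    remainder="passthrough",
)
train_scaled = ct.fit_transform(toy_train)  # learns mean and std from the training split
test_scaled = ct.transform(toy_test)        # reuses the training mean and std

# After scaling, each training column has mean ~0 and standard deviation ~1
train_means = train_scaled.mean(axis=0)
train_stds = train_scaled.std(axis=0)
print(train_means, train_stds)
```

The test columns will usually be close to, but not exactly, mean 0 and standard deviation 1, because they are scaled with statistics learned from the training rows.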

K-Nearest Neighbors Model

from sklearn.neighbors import KNeighborsClassifier
knn_model = KNeighborsClassifier(n_neighbors=3)

knn_model.fit(X_Train, Y_Train)
y_predicted = knn_model.predict(X_Test)
print(sklearn.metrics.classification_report(Y_Test, y_predicted))

Naive Bayes Model

from sklearn.naive_bayes import GaussianNB
nb_model = GaussianNB()
nb_model = nb_model.fit(X_Train, Y_Train)
y_pred = nb_model.predict(X_Test)
print(sklearn.metrics.classification_report(Y_Test, y_pred))

SVC Model

from sklearn.svm import SVC
svm_model = SVC()
svm_model = svm_model.fit(X_Train, Y_Train)
y_pred = svm_model.predict(X_Test)
print(sklearn.metrics.classification_report(Y_Test, y_pred))

Random Forest Model

from sklearn.ensemble import RandomForestClassifier
random_forest = RandomForestClassifier()

random_forest.fit(X_Train, Y_Train)
y_pred = random_forest.predict(X_Test)

print(sklearn.metrics.classification_report(Y_Test, y_pred))

Stochastic Gradient Descent

from sklearn.linear_model import SGDClassifier
sgd = SGDClassifier()
sgd.fit(X_Train, Y_Train)
pred_sgd = sgd.predict(X_Test)

# Report on the SGD predictions (pred_sgd), not the earlier y_pred from the Random Forest
print(sklearn.metrics.classification_report(Y_Test, pred_sgd))

Of the five models, Random Forest gives the best result, with 70% accuracy across the three wine classes.
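To compare the five models side by side rather than reading five separate reports, one option is to loop over them and collect the accuracy of each. The sketch below is an assumption-laden illustration, not the code behind the 70% figure: it uses synthetic three-class data from `make_classification` instead of the wine dataset, so its numbers say nothing about the real results. The names `models`, `accuracies` and `best_model` are hypothetical.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import SGDClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic three-class data standing in for the wine dataset (assumption)
X, y = make_classification(
    n_samples=600, n_features=12, n_informative=6, n_classes=3, random_state=100
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=100, stratify=y
)

# Scale with statistics learned from the training split only
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# The same five model families as in the post
models = {
    "K-Nearest Neighbors": KNeighborsClassifier(n_neighbors=3),
    "Naive Bayes": GaussianNB(),
    "SVC": SVC(),
    "Random Forest": RandomForestClassifier(random_state=100),
    "SGD": SGDClassifier(random_state=100),
}

# Fit each model and record its accuracy on the held-out test split
accuracies = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    accuracies[name] = accuracy_score(y_test, model.predict(X_test))

# Pick the model with the highest test accuracy
best_model = max(accuracies, key=accuracies.get)
for name, acc in sorted(accuracies.items(), key=lambda item: item[1], reverse=True):
    print(f"{name}: {acc:.2f}")
```

Keeping each model's predictions inside the loop also avoids reusing a stale `y_pred` variable from a previous model by accident.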

Check It Out

Test the model yourself by running the main.py file, which is built with Streamlit.

streamlit run main.py

Written by

Luis Jose Mendez

Hello! My name is Luis Jose. I am currently a student at Bicentenaria Aragua University, Venezuela, pursuing a Systems Engineering degree with a specialization in Artificial Intelligence. I am passionate about Machine Learning, Deep Learning and Computer Vision.