Deep Learning in Sentiment Analysis: A Complete Guide (With Code)

Satwik Mishra

Introduction

When analyzing text sentiment, we often face a deceptively simple question: how do we truly understand what people mean, not just what they say? This challenge becomes particularly evident to us as we process thousands of customer reviews, social media comments, and feedback messages daily.

Why is that so? You'll often notice that today's reviews (often from feedback surveys, ticketing software, or call logs) are complex, filled with sarcasm, mixed emotions, and contextual nuances that conventional sentiment analysis methods fail to capture. Sure, the simpler methods are effective in identifying basic positive or negative sentiments, but they usually fail when confronted with nuanced expressions. Consider a customer writing, "This is exactly what I needed... said no one ever," or "Great product, if you enjoy constant frustration." The complexity multiplies when we need to analyze thousands of such comments (in the form of free-flowing texts) in real time while maintaining reliable accuracy.

To address this problem, we'll combine sentiment analysis with deep learning approaches that overcome these limitations: understanding context, recognizing sarcasm, and capturing subtle emotional shifts within the text. This guide will walk you through building an advanced sentiment analysis system from scratch. Here's what we'll do:

  • Build a robust deep learning architecture that understands context

  • Implement advanced preprocessing

  • Monitor and maintain model performance

But first, let's examine why traditional sentiment analysis approaches often fall short when dealing with real-world text data.

What Makes Deep Learning Essential for Modern Sentiment Analysis?

Consider this customer review: "The product is absolutely perfect... if you enjoy solving puzzles just to complete basic tasks." Traditional sentiment analysis would likely flag this as positive due to words like "perfect," completely missing the sarcasm. The challenge multiplies when processing thousands of such comments daily.

By implementing sentiment analysis with deep learning, we're able to address these limitations by:

  • Understanding contextual relationships between words

  • Capturing long-range dependencies in text

  • Recognizing semantic contradictions

  • Learning complex emotional patterns

How Will Our Flow Work?

This is how we're going to be structuring our flow (See Fig 1):

  • Start by setting up crucial model parameters like vocabulary size (how many unique words to track), sequence length (how much text to process at once), and embedding dimensions (how rich the word representations should be).

  • Transform raw text into a format our model can understand. This involves:

    • Converting words to numbers using a tokenizer

    • Padding sequences to uniform length

    • Tracking vocabulary statistics for performance monitoring

  • Create a deep learning architecture with:

    • Embedding layer to convert words into rich vector representations

    • Bidirectional LSTM layers to capture the context in both directions

    • Dense layers for the final sentiment classification.

The model learns to understand complex patterns, sarcasm, and mixed emotions through multiple training epochs.

  • Continuously monitor model performance using:

    • Accuracy metrics

    • Loss values

    • Validation scores. If performance isn't satisfactory, we adjust the model parameters and retrain.

Fig 1: What our training flow will look like

Implementing Deep Learning in Sentiment Analysis Step-by-Step

Now that we've seen the workflow, let's go through a step-by-step process of implementing sentiment analysis using deep learning. Note that we're working with a relatively small dataset, which may not give us great training/validation scores. You can, however, use this code as a foundation to run experiments on larger datasets.

For the purposes of this blog, we're generating synthetic data using gretel.ai: 1,500 rows with two columns, the review text and its accompanying sentiment score between 0 and 1. You can see this happening in Fig 2.

Fig 2: Generating a synthetic dataset of 1500 rows using Gretel.ai

Fig 3: Prompt sample for generating the synthetic dataset

You may also choose from a wide variety of open-source sentiment datasets for this experiment. That said, let's begin by installing the necessary packages to train our model.

Setup Your Environment

# Core frameworks
pip install "tensorflow>=2.8.0"
pip install "numpy>=1.19.2"
pip install "pandas>=1.3.0"
pip install "scikit-learn>=0.24.2"
pip install "matplotlib>=3.4.3"
pip install "seaborn>=0.11.2"
pip install tqdm
pip install nltk

Import Necessary Libraries

First, let's import the necessary libraries. These are the packages we'll be using throughout our project, from handling data to building and evaluating our model.

import numpy as np
import pandas as pd
import tensorflow as tf
from typing import List, Dict, Union, Tuple
import logging
from datetime import datetime
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report
import matplotlib.pyplot as plt
import seaborn as sns

We start by bringing in the libraries that will help with:

  • Data manipulation and numerical operations: numpy, pandas

  • Deep learning model building: tensorflow

  • Logging: logging, to keep track of important events and performance

  • Date and time tracking: datetime, which will help with logging and tracking performance over time

  • Model evaluation: sklearn.model_selection for splitting the data, and sklearn.metrics for evaluating model performance

  • Visualization: matplotlib and seaborn, to plot and visualize performance and data.

Define the Sentiment Analysis Class

Now, let's define the core of our model: DeepSentimentAnalyzer. This is where we're going to have all our functionalities for training, predicting, and analyzing sentiment.

class DeepSentimentAnalyzer:
    """
    Production-ready sentiment analysis system with advanced features.
    """
    def __init__(
        self,
        max_sequence_length: int = 100,
        vocab_size: int = 10000,
        embedding_dim: int = 200,
        lstm_units: int = 128
    ):
        self.max_sequence_length = max_sequence_length
        self.vocab_size = vocab_size
        self.embedding_dim = embedding_dim
        self.lstm_units = lstm_units

        # Initialize tokenizer
        self.tokenizer = tf.keras.preprocessing.text.Tokenizer(
            num_words=vocab_size,
            oov_token='<UNK>',
            filters='!"#$%&()*+,-./:;<=>?@[\\]^_`{|}~\t\n'
        )

        # State tracking
        self.model = None
        self.training_history = None
        self.metrics = {
            'preprocessing': {},
            'training': {},
            'performance': {}
        }

        # Setup logging
        logging.basicConfig(
            level=logging.INFO,
            format='%(asctime)s - %(levelname)s - %(message)s'
        )

        # Initialize performance tracking
        self.performance_log = []

Here's a breakdown of what each part does:

  • Initialization (init):

    • max_sequence_length: Max length for input sequences.

    • vocab_size: The number of unique words to consider.

    • embedding_dim: The size of the word embeddings.

    • lstm_units: Number of LSTM units (used for learning sequential patterns in text).

  • Tokenizer: We're using TensorFlow's tokenizer to process text into sequences of integers, mapping each word to an index.

  • State Tracking: We'll keep track of the model, its training history, and performance metrics.

  • Logging: Setting up logging for monitoring the model's progress.
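
To make these parameters concrete, here's a minimal sketch of how the class could be instantiated once it's defined. The default call matches what we use later; the alternative values are purely illustrative, not a recommended configuration:

# Default configuration: 100-token sequences, 10k-word vocabulary,
# 200-dimensional embeddings, 128 LSTM units per direction
analyzer = DeepSentimentAnalyzer()

# Illustrative alternative for shorter texts (e.g., one-line reviews)
small_analyzer = DeepSentimentAnalyzer(
    max_sequence_length=50,   # shorter texts need less padding
    vocab_size=5000,          # smaller domain vocabulary
    embedding_dim=100,        # lighter word representations
    lstm_units=64             # smaller model, less risk of overfitting
)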

Preprocess Your Text Data

Before training our model, we need to preprocess the text data. Here's where we'll convert the raw text into sequences that the model can understand.

def preprocess(self,
               texts: List[str],
               labels: np.ndarray = None,
               fit_tokenizer: bool = True) -> Tuple:
    """
    Preprocess text data for model input.
    """
    if fit_tokenizer:
        self.tokenizer.fit_on_texts(texts)

    # Convert to sequences
    sequences = self.tokenizer.texts_to_sequences(texts)
    padded_data = tf.keras.preprocessing.sequence.pad_sequences(
        sequences,
        maxlen=self.max_sequence_length,
        padding='post',
        truncating='post'
    )

    # Calculate stats
    stats = self._calculate_preprocessing_stats(texts, sequences)
    self.metrics['preprocessing'].update(stats)

    if labels is not None:
        return padded_data, labels
    return padded_data

We fit the tokenizer on the input text (if fit_tokenizer is True). Then, we convert the text to sequences of integers where each word is represented by its index in the vocabulary.

To handle sequences of different lengths, we pad them to ensure they all have the same length. We also calculate and store some statistics about the preprocessing process (such as sequence length and vocabulary coverage). If labels are provided, it returns the processed data along with the labels, otherwise just the processed data.
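
To see what this tokenize-then-pad step actually produces, here's a small standalone sketch using toy sentences (not our dataset); the preprocess method wraps exactly these two TensorFlow calls:

import tensorflow as tf

toy_texts = ["the product is great", "great product", "not great at all honestly"]

tokenizer = tf.keras.preprocessing.text.Tokenizer(num_words=50, oov_token='<UNK>')
tokenizer.fit_on_texts(toy_texts)

# Each word is replaced by its vocabulary index (assigned by word frequency)
sequences = tokenizer.texts_to_sequences(toy_texts)

# Every row is forced to length 5: shorter sentences get zeros appended,
# longer ones would be cut off at the end
padded = tf.keras.preprocessing.sequence.pad_sequences(
    sequences, maxlen=5, padding='post', truncating='post'
)
print(padded)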

Build the Model

Now that the data is preprocessed, we can focus on building the sentiment analysis model. Here's the architecture we'll be using:

def build_model(self, embedding_matrix: np.ndarray = None):
    """
    Build the deep learning architecture.
    """
    model = tf.keras.Sequential([
        # Embedding layer
        tf.keras.layers.Embedding(
            input_dim=self.vocab_size,
            output_dim=self.embedding_dim,
            input_length=self.max_sequence_length,
            weights=[embedding_matrix] if embedding_matrix is not None else None,
            mask_zero=True,
            name='embedding'
        ),

        # BiLSTM layers
        tf.keras.layers.Bidirectional(
            tf.keras.layers.LSTM(
                self.lstm_units, 
                return_sequences=True
            ),
            name='bilstm_1'
        ),
        tf.keras.layers.LayerNormalization(),
        tf.keras.layers.Dropout(0.2),

        # Second BiLSTM
        tf.keras.layers.Bidirectional(
            tf.keras.layers.LSTM(self.lstm_units // 2),
            name='bilstm_2'
        ),
        tf.keras.layers.LayerNormalization(),
        tf.keras.layers.Dropout(0.2),

        # Dense layers
        tf.keras.layers.Dense(64, activation='relu'),
        tf.keras.layers.Dropout(0.2),
        tf.keras.layers.Dense(32, activation='relu'),
        tf.keras.layers.Dense(1, activation='sigmoid')
    ])

    # Compile model
    model.compile(
        optimizer='adam',
        loss='binary_crossentropy',
        metrics=['accuracy']
    )

    self.model = model
    return model

Let's go through each of these layers in more detail:

  • Embedding Layer: This layer converts words into dense vectors. If we have a pre-trained embedding matrix (see the sketch after this list), we can load it here.

  • BiLSTM Layers: These layers process the text sequentially, both forward and backward, which helps the model understand the context in both directions.

  • Dense Layers: After processing the sequence data with LSTM, we use dense layers to classify the sentiment.

  • Compilation: The model is compiled using the Adam optimizer and binary cross-entropy loss (since we are working with binary sentiment labels).
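
If you do want to pass a pre-trained embedding matrix into build_model, here's a rough sketch of how one could be assembled from GloVe vectors. The file name glove.6B.200d.txt and its availability on disk are assumptions, and the matrix shape must match vocab_size and embedding_dim:

import numpy as np

def build_glove_embedding_matrix(tokenizer, vocab_size, embedding_dim,
                                 glove_path='glove.6B.200d.txt'):
    # Load GloVe vectors into a word -> vector lookup
    embeddings_index = {}
    with open(glove_path, encoding='utf-8') as f:
        for line in f:
            values = line.split()
            word, vector = values[0], np.asarray(values[1:], dtype='float32')
            embeddings_index[word] = vector

    # Rows stay zero for padding and for words without a GloVe vector
    embedding_matrix = np.zeros((vocab_size, embedding_dim))
    for word, index in tokenizer.word_index.items():
        if index < vocab_size and word in embeddings_index:
            embedding_matrix[index] = embeddings_index[word]
    return embedding_matrix

# Usage (after the tokenizer has been fitted on the training texts):
# matrix = build_glove_embedding_matrix(analyzer.tokenizer, analyzer.vocab_size, analyzer.embedding_dim)
# model = analyzer.build_model(embedding_matrix=matrix)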

This is a solid starting point! We've covered some important pieces: importing libraries, setting up the sentiment analyzer class, preprocessing the text, and building the model architecture. Let's dive into the training and evaluation of our model, along with making predictions and tracking performance.

Train the Model

Now that we have our model architecture, it's time to train it using our data.

def train(self,
          X_train: np.ndarray,
          y_train: np.ndarray,
          validation_data: Tuple[np.ndarray, np.ndarray],
          epochs: int = 25,
          batch_size: int = 32):
    """
    Train the model with monitoring.
    """
    callbacks = [
        tf.keras.callbacks.EarlyStopping(
            monitor='val_loss',
            patience=3,
            restore_best_weights=True
        ),
        tf.keras.callbacks.ModelCheckpoint(
            'best_model.h5',
            monitor='val_accuracy',
            save_best_only=True
        )
    ]

    # Train
    history = self.model.fit(
        X_train, y_train,
        epochs=epochs,
        batch_size=batch_size,
        validation_data=validation_data,
        callbacks=callbacks,
        verbose=1
    )

    self.training_history = history
    self._update_training_metrics(history)

    return history

When initializing our DeepSentimentAnalyzer, you'll notice four main parameters that can significantly impact your model's performance. Let's break down each one and understand how to choose them wisely:

  • max_sequence_length (default=100) Think of this as your text length limit. Here's how to choose it:

  • Look at your typical text lengths. Are they product reviews (usually short), support tickets (medium), or detailed feedback (long)?

  • Check your data: print(df['text'].str.split().str.len().describe()) to see the distribution of word counts (see the snippet after this list)

  • Rule of thumb: Set it to cover 90-95% of your texts' lengths

  • Warning: Going too high wastes memory, and too low truncates important information

  • vocab_size (default=10000) This is your dictionary size. You'll need to choose this based on:

  • Your domain's language variety. Technical support needs more words than simple ratings

  • Available memory. Larger vocabulary = more memory usage

  • embedding_dim (default=200) This determines how rich your word representations are:

  • 100-200: Good for general sentiment analysis

  • 300: Standard for complex language understanding

  • 50-100: Sufficient for simple, domain-specific tasks

  • Memory usage grows linearly with this number, so balance performance with resources

  • lstm_units (default=128) This affects your model's capacity to learn patterns:

  • Too few: Model struggles with complex patterns

  • Too many: Risk of overfitting, slower training

  • Rule of thumb: Start with 128, then:

    • Increase if the loss isn't decreasing enough

    • Decrease if you see overfitting (validation loss increases)
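
To make the 90-95% rule of thumb concrete, here's a small sketch of how you might inspect the word-count distribution and pick max_sequence_length from it. The file name 'your_dataset.csv' is a placeholder for your own data with the same 'text' column, and the 95th percentile is just one reasonable cut-off:

import numpy as np
import pandas as pd

df = pd.read_csv('your_dataset.csv')           # assumed path; same schema as our dataset
word_counts = df['text'].str.split().str.len()

print(word_counts.describe())                  # mean, max, quartiles of words per text
suggested_length = int(np.percentile(word_counts, 95))
print(f"max_sequence_length covering 95% of texts: {suggested_length}")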

We'll also add callbacks to stop training early if the model's performance stops improving, and to save the best model rather than letting training run for unnecessary epochs.

For this, we use EarlyStopping to stop training when the validation loss doesn't improve for 3 consecutive epochs, and ModelCheckpoint to save the model that performs the best on validation accuracy. For illustration purposes, we're going to train the model for 25 epochs. After training, we store the training history and update training metrics.

Make Predictions

Once the model is trained, we can use it to make predictions on new, unseen data. Let's take a look at the predict method, which processes the input and provides the sentiment analysis results.

def predict(self,
            texts: Union[str, List[str]],
            batch_size: int = 32) -> List[Dict]:
    """
    Make predictions with detailed sentiment analysis.
    """
    # Handle single text input
    if isinstance(texts, str):
        texts = [texts]

    # Preprocess
    sequences = self.preprocess(texts, fit_tokenizer=False)

    # Predict
    raw_predictions = self.model.predict(
        sequences,
        batch_size=batch_size
    )

    # Create detailed results
    results = []
    for text, score in zip(texts, raw_predictions):
        sentiment_score = float(score[0])
        sentiment_label = self._get_sentiment_label(sentiment_score)
        confidence = self._get_confidence_level(sentiment_score)

        results.append({
            'text': text,
            'sentiment_score': sentiment_score,
            'sentiment': sentiment_label,
            'confidence': confidence,
            'analysis': self._generate_analysis(sentiment_score, confidence)
        })

    # Log performance
    self._log_prediction_performance(len(texts))

    return results

  • Single vs Multiple Texts: If the input is a single string, we convert it into a list for consistency.

  • Preprocessing: We preprocess the input text (without fitting the tokenizer, since it's already fitted during training).

  • Prediction: We use the trained model to predict sentiment scores for the input texts.

  • Detailed Results: For each text, we calculate the sentiment score, label, confidence level, and a human-readable analysis of the sentiment.

  • Performance Logging: After making predictions, we log the performance metrics for this batch.
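
As a quick sketch of how you'd call it once the model is trained (the review text here is made up for illustration):

# A single string works too - it's wrapped in a list internally
results = analyzer.predict("Great product, if you enjoy constant frustration.")

for result in results:
    print(result['sentiment'], result['sentiment_score'], result['confidence'])
    print(result['analysis'])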

Helper Functions

These helper methods are used to interpret the sentiment score and generate an analysis based on that score.

def _get_sentiment_label(self, score: float) -> str:
    """
    Convert score to human-readable sentiment.
    """
    if score >= 0.75:
        return "Very Positive"
    elif score >= 0.6:
        return "Positive"
    elif score >= 0.4:
        return "Neutral"
    elif score >= 0.25:
        return "Negative"
    else:
        return "Very Negative"

def _get_confidence_level(self, score: float) -> str:
    """
    Determine confidence level based on how far the score is from neutral.
    """
    distance_from_neutral = abs(score - 0.5)
    if distance_from_neutral >= 0.4:
        return "High"
    elif distance_from_neutral >= 0.25:
        return "Medium"
    else:
        return "Low"

def _generate_analysis(self, score: float, confidence: str) -> str:
    """
    Generate a human-readable analysis of the sentiment.
    """
    if confidence == "Low":
        return "The sentiment is not strongly pronounced, suggesting mixed or subtle emotions."
    elif score >= 0.75:
        return "The text shows extremely positive sentiment with strong emotional expression."
    elif score >= 0.6:
        return "The text conveys generally positive sentiment with clear positive indicators."
    elif score >= 0.4:
        return "The text appears to be relatively neutral or balanced in sentiment."
    elif score >= 0.25:
        return "The text conveys generally negative sentiment with clear negative indicators."
    else:
        return "The text shows extremely negative sentiment with strong negative emotional expression."

Note that some of the functions we're adding here are not strictly necessary for your own implementation. We're including them to make the output easier to interpret.

  • Sentiment Labels: We convert the sentiment score into a human-readable label (Very Positive, Positive, Neutral, Negative, Very Negative).

  • Confidence Levels: We calculate how far the sentiment score is from neutral and use that distance to define the confidence level (High, Medium, Low).

  • Analysis: Based on the sentiment score and confidence level, we generate a detailed explanation of the sentiment.
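
As a quick sanity check on these thresholds, here's how a few representative scores would map; the helpers are internal to the class and are called here only to illustrate the cut-offs:

analyzer = DeepSentimentAnalyzer()
for score in [0.90, 0.65, 0.48, 0.10]:
    print(score,
          analyzer._get_sentiment_label(score),      # Very Positive, Positive, Neutral, Very Negative
          analyzer._get_confidence_level(score))     # High, Low, Low, High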

Calculate Preprocessing Statistics

We're also calculating some important statistics during preprocessing to better understand the text data.

def _calculate_preprocessing_stats(self, texts: List[str], sequences: List[List[int]]) -> Dict:
    """
    Calculate preprocessing statistics.
    """
    total_words = sum(len(text.split()) for text in texts)

    # Tokens equal to the OOV index are words outside our vocabulary,
    # so they shouldn't count towards vocabulary coverage
    oov_index = self.tokenizer.word_index.get(self.tokenizer.oov_token)
    known_words = sum(
        1 for seq in sequences for token in seq
        if token != 0 and token != oov_index
    )

    return {
        'vocab_coverage': known_words / total_words if total_words > 0 else 0,
        'sequence_stats': {
            'mean_length': np.mean([len(text.split()) for text in texts]),
            'max_length': max(len(text.split()) for text in texts),
            'truncated': sum(1 for text in texts
                             if len(text.split()) > self.max_sequence_length)
        }
    }

  • Total Words: We calculate the total number of words across all texts.

  • Known Words: We count how many tokens in the sequences map to words in our vocabulary (excluding padding and the OOV token).

  • Preprocessing Stats: We return a dictionary with:

    • Vocabulary Coverage: The ratio of known words to total words.

    • Sequence Stats: Information like mean and max sequence length, and how many sequences got truncated (i.e., longer than the max_sequence_length)

Update Training Metrics

After training the model, we update and store essential metrics about the training process.

def _update_training_metrics(self, history):
    """
    Update training metrics.
    """
    self.metrics['training'].update({
        'final_accuracy': history.history['accuracy'][-1],
        'final_val_accuracy': history.history['val_accuracy'][-1],
        'final_loss': history.history['loss'][-1],
        'final_val_loss': history.history['val_loss'][-1],
        'epochs_trained': len(history.history['accuracy'])
    })

After training, we store the following metrics:

  • Final accuracy on training and validation data.

  • Final loss on training and validation data.

  • The number of epochs trained.

These metrics help track the performance and progress of the model during training.

Log Prediction Performance

Every time we make a batch of predictions, we log performance data for future reference.

def _log_prediction_performance(self, batch_size: int):
    """
    Log prediction performance.
    """
    self.performance_log.append({
        'timestamp': datetime.now(),
        'batch_size': batch_size,
        'vocab_size': len(self.tokenizer.word_index)
    })

We log important performance details, such as:

  • Timestamp: The time when the predictions were made.

  • Batch Size: The size of the batch for which predictions were made.

  • Vocabulary Size: The size of the vocabulary (how many unique words the tokenizer has learned).

This log helps in tracking how performance evolves over time and under different conditions.
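
If you want to inspect this log later, one simple option is to load it into a DataFrame (a small convenience sketch, not part of the class; pandas is already imported):

# Each entry holds a timestamp, batch size, and vocabulary size
log_df = pd.DataFrame(analyzer.performance_log)
print(log_df.tail())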

Analyze Performance

After training and making predictions, it's important to analyze the performance of the model. This function allows you to view key metrics and visualize the training process.

def analyze_performance(self, plot: bool = True) -> Dict:
    """
    Analyze model performance.
    """
    if self.training_history is None:
        raise ValueError("Model hasn't been trained yet")

    if plot:
        self._plot_training_history()

    return {
        'training_metrics': self.metrics['training'],
        'preprocessing_stats': self.metrics['preprocessing'],
        'performance_log': self.performance_log[-10:]
    }

  • Training History: We check if the model has been trained (if not, it raises an error).

  • Plotting: If plot=True, we plot the training and validation accuracy and loss curves.

  • Return Metrics: We return the training metrics, preprocessing stats, and the last 10 logs of performance.

Plot Training History

This helper function generates visualizations of the training and validation performance (accuracy and loss) over time.

def _plot_training_history(self):
    """
    Plot training history.
    """
    history = self.training_history.history
    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 4))

    ax1.plot(history['accuracy'], label='Training')
    ax1.plot(history['val_accuracy'], label='Validation')
    ax1.set_title('Model Accuracy')
    ax1.set_xlabel('Epoch')
    ax1.set_ylabel('Accuracy')
    ax1.legend()

    ax2.plot(history['loss'], label='Training')
    ax2.plot(history['val_loss'], label='Validation')
    ax2.set_title('Model Loss')
    ax2.set_xlabel('Epoch')
    ax2.set_ylabel('Loss')
    ax2.legend()

    plt.tight_layout()
    plt.show()

We plot the training and validation accuracy and loss over epochs to visualize how well the model is performing. This is going to help us assess if the model is overfitting, underfitting, or learning well.

Run the Model with Sample Data

Now, let's tie everything together and see how this system works end-to-end with some real data. The following code handles the data loading, preprocessing, training, and prediction, and also displays the results.

if __name__ == "__main__":
    # Read data from CSV file
    df = pd.read_csv('/content/navigator-batch-generate-679f8e95f7155791e7653804-data.csv')

    # Assuming column names are 'text' and 'sentiment'
    texts = df['text'].tolist()
    labels = df['sentiment'].values

    # Initialize analyzer
    analyzer = DeepSentimentAnalyzer()

    # Preprocess data
    X_train, X_test, y_train, y_test = train_test_split(
        texts, labels, test_size=0.2, random_state=42
    )

    # Prepare sequences
    X_train_seq = analyzer.preprocess(X_train, fit_tokenizer=True)
    X_test_seq = analyzer.preprocess(X_test, fit_tokenizer=False)

    # Build and train model
    model = analyzer.build_model()
    history = analyzer.train(
        X_train_seq, y_train,
        validation_data=(X_test_seq, y_test),
        epochs=25
    )

    # Analyze performance
    performance = analyzer.analyze_performance()
    print("\nPerformance Metrics:")
    print(performance)

    # Test with new reviews
    test_texts = [
        "Oh wow, this new update is EXACTLY what I needed - more bugs to keep me entertained during my coffee breaks. Thanks for making my day more 'interesting'!",
        "Customer service was AMAZING! They only made me wait 2 hours before hanging up on me!",
        "Absolutely love how this app crashes every 5 minutes. Really keeps me on my toes!",
        "The new camera on this phone is incredible! The photos are crystal clear and the night mode is a game-changer. Best purchase I've made this year.",
        "This restaurant was a complete disappointment. The food was cold, service was slow, and the prices were ridiculously high for such poor quality."
    ]

    # Make and display predictions
    results = analyzer.predict(test_texts)

    print("\nSentiment Analysis Results:")
    print("-" * 50)
    for result in results:
        print(f"Text: {result['text']}")
        print(f"Sentiment: {result['sentiment']}")
        print(f"Sentiment Score: {result['sentiment_score']:.3f}")
        print(f"Confidence: {result['confidence']}")
        print(f"Analysis: {result['analysis']}")
        print("-" * 50)

Let's go through this step by step so you know exactly what's happening:

  • Step 1: We load the dataset from a CSV file.

  • Step 2: We split the data into training and testing sets and preprocess the text.

  • Step 3: The model is trained using the training data and validated on the test data.

  • Step 4: After training, we analyze and print the model's performance.

  • Step 5: We run the model on a set of new text samples, print the results, and display detailed sentiment analysis.

Execute the code above in the following manner.

  • Go to colab.research.google.com and create a new notebook. We're using Colab to take advantage of the GPU runtimes it provides, free of cost for a limited period, for faster training.

  • Create 3 cells:

    • Cell 1: Package installations (all the pip install commands)

    • Cell 2: The DeepSentimentAnalyzer class code

    • Cell 3: The running code above

  • Run the cells in order (1, 2, then 3)

Once you've executed the above code, you should see something like the image below (See Fig 4).

Fig 4: Model training in progress on Google Colab

Understanding the Output

Model Accuracy and Loss Graphs

Fig 5: Model accuracy and loss

Remember how we mentioned earlier that we're working with a relatively small dataset (just 1500 rows)? Well, these graphs tell that story pretty clearly (See Fig 5):

Model Accuracy Plot (Left Graph)

Our model's accuracy graph looks a bit underwhelming at first glance: both lines (training and validation) flatten out quickly after a small initial improvement. Our model is doing its best, but it's working with limited examples to learn from.

Model Loss Plot (Right Graph)

The loss graph gives us some interesting insights. While training loss (blue line) and validation loss (orange line) do decrease, meaning the model is learning something, they don't drop as dramatically as we might hope.

A couple of things to note here:

  • The flat accuracy lines aren't necessarily a failure - they're a realistic outcome given our dataset size. With only 1500 examples, our model is doing what it can to learn patterns.

  • The gap between training and validation performance isn't too wide, which is actually a good sign.

  • The somewhat erratic validation loss suggests our model sometimes struggles to generalize, but that's expected with limited training data.

Trial Tests

This image shows the predictions made on some sample texts after running the model. Here's the analysis for each:

  1. Text 1:

    • "Customer service was AMAZING! They only made me wait 2 hours before hanging up on me!"

    • Sentiment: Very Negative

    • Sentiment Score: 0.152

    • Confidence: Medium

    • The model identified a very negative sentiment, with a score of 0.152. You'll see that it's able to pick up traces of sarcasm in the text!

  2. Text 2:

    • "Absolutely love how this app crashes every 5 minutes. Really keeps me on my toes!"

    • Sentiment: Neutral

    • Sentiment Score: 0.480

    • Confidence: Low

    • This one comes out as neutral, with a score close to 0.5, meaning the sentiment isn't strongly pronounced. The low confidence suggests that the model finds this text difficult to categorize clearly. Still, it's encouraging that the score falls just below 0.5.

  3. Text 3:

    • "The new camera on this phone is incredible! The photos are crystal clear, and the night mode is a game-changer. Best purchase I've made this year."

    • Sentiment: Very Positive

    • Sentiment Score: 0.903

    • Confidence: High

    • Here, the model detects a very positive sentiment with a high score of 0.903. The high confidence reflects that the model is very sure about this positive sentiment.

  4. Text 4:

    • "This restaurant was a complete disappointment. The food was cold, service was slow, and the prices were ridiculously high for such poor quality."

    • Sentiment: Very Negative

    • Sentiment Score: 0.122

    • Confidence: Medium

    • For this text, the model once again detects a very negative sentiment with a score of 0.122.

Conclusion

From these results, it's clear that the model is capable of distinguishing different sentiment types, such as Very Positive, Neutral, and Very Negative. But the accuracy and loss graphs, as we previously discussed, indicate that the model might not be performing optimally during training.

With a sufficiently large dataset, say, in the order of 50,000-100,000 examples, you should expect to see much better performance metrics:

  • Training accuracy climbing steadily above 70%

  • Validation accuracy following a similar trend

  • Loss curves showing smoother, more consistent descent

  • Better handling of nuanced cases like sarcasm and mixed sentiments

To improve the model's performance, you may need to experiment with the model architecture, for example by adding more layers or changing the number of LSTM units (a sketch of one possible variant follows the prompt below). Also, train it for more epochs (around 100 or more) for better results. If you're planning on generating a dataset for this experiment, make sure you write a prompt that produces a diverse and representative dataset. You could try something like this:

Generate customer feedback and reviews that include:
- Mix of formal and casual language
- Varied sentence lengths (short comments to detailed reviews)
- Different sentiment intensities
- Sarcastic expressions
- Mixed emotions in single reviews
- Industry-specific terminology
- Common misspellings and informal abbreviations
- Regional language variations
- Emojis and special characters
- Implicit sentiments without obvious sentiment words

Format:
text: [the review text]
sentiment: [value between 0 and 1, where:]
- 0.0-0.2: Very negative
- 0.2-0.4: Negative
- 0.4-0.6: Neutral
- 0.6-0.8: Positive
- 0.8-1.0: Very positive
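
And on the architecture side, here's a hedged sketch of one variant you could swap in for build_model: an extra stacked BiLSTM and a wider dense head. These are illustrative changes to experiment with, not a configuration we've validated on this dataset:

def build_model_variant(self, embedding_matrix: np.ndarray = None):
    # Experimental variant: three stacked BiLSTMs and a wider dense head
    model = tf.keras.Sequential([
        tf.keras.layers.Embedding(
            input_dim=self.vocab_size,
            output_dim=self.embedding_dim,
            weights=[embedding_matrix] if embedding_matrix is not None else None,
            mask_zero=True
        ),
        tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(self.lstm_units, return_sequences=True)),
        tf.keras.layers.LayerNormalization(),
        tf.keras.layers.Dropout(0.3),
        tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(self.lstm_units, return_sequences=True)),
        tf.keras.layers.LayerNormalization(),
        tf.keras.layers.Dropout(0.3),
        tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(self.lstm_units // 2)),
        tf.keras.layers.Dense(128, activation='relu'),
        tf.keras.layers.Dropout(0.3),
        tf.keras.layers.Dense(64, activation='relu'),
        tf.keras.layers.Dense(1, activation='sigmoid')
    ])
    model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
    self.model = model
    return model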

The code and approach we've shared give you a solid starting point. You can adapt this foundation to fit your needs. As you work with more data and try different settings, you'll get better at tuning the system for your specific goals. And hey, with generative AI tools evolving so quickly, a lot of the heavy lifting in building these systems is getting easier by the day, from data generation to model optimization.

