Emotional Intelligence Attention Unsupervised Learning Using Lexicon Analysis for Irony-based Advertising

Mohamad Mahmood
4 min read

Abstract

Irony has become increasingly common on social media in recent years. Users can express their ironic thoughts through audio, video, and images attached to text content. Irony may be used to mock a situation, to make a point, to express frustration, or to highlight a situation's absurdity. Whatever the reason, the use of irony on social media is likely to keep increasing. We show that attention networks can be enhanced by combining syntactic information with semantic exploration. Using learned embeddings, unsupervised learning encodes word order into a joint space. The active learning method uses this shared representation as a query to retrieve semantically similar sentences from a knowledge base, evaluating the entropy of each example's class distribution before adding instances; in this way, the algorithm identifies the instance with the maximum uncertainty and extracts the most informative example for the training set. An irony network trained on each labelled record is used to train a classifier (model). The partially trained model and the original labelled data generate pseudo-labels for the unlabeled data, and a classifier (attention network) then updates the pseudo-labels for the remaining data so that labels are predicted correctly. In an experimental evaluation on 1,021 annotated texts, the proposed model outperformed the baseline models, achieving an F1 score of 0.63 on the ironic class and 0.59 on the non-ironic class. We also found that the proposed model generalized well to new dataset instances.

https://dl.acm.org/doi/10.1145/3580496

https://dl.acm.org/doi/pdf/10.1145/3580496

Exercises

1. Preprocessing Text for Irony Detection

import nltk
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize
from nltk.stem import WordNetLemmatizer
import string

nltk.download('punkt')
nltk.download('stopwords')
nltk.download('wordnet')

def preprocess_text(text):
    # Tokenization
    tokens = word_tokenize(text.lower())

    # Remove punctuation
    tokens = [word for word in tokens if word not in string.punctuation]

    # Remove stop words
    stop_words = set(stopwords.words('english'))
    tokens = [word for word in tokens if word not in stop_words]

    # Lemmatization
    lemmatizer = WordNetLemmatizer()
    tokens = [lemmatizer.lemmatize(word) for word in tokens]

    return tokens

# Example
text = "Irony is often used to highlight the absurdity of a situation!"
print(preprocess_text(text))
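
Note that this pipeline strips punctuation and stop words wholesale. For irony detection specifically, surface cues such as exclamation marks, quotation marks, and emphatic capitalization can themselves signal ironic intent, so in practice you may want to keep (or separately encode) such tokens rather than discard them.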

2. Building an Attention Network

import tensorflow as tf
from tensorflow.keras.layers import Input, Embedding, LSTM, Dense, Attention, GlobalAveragePooling1D
from tensorflow.keras.models import Model

def build_attention_model(vocab_size, embed_dim, lstm_units):
    inputs = Input(shape=(None,))
    embedding = Embedding(input_dim=vocab_size, output_dim=embed_dim)(inputs)
    lstm_out = LSTM(lstm_units, return_sequences=True)(embedding)

    # Self-attention: query and value are both the LSTM outputs
    attention = Attention()([lstm_out, lstm_out])
    # Pool over the time dimension to get a fixed-size context vector
    context_vector = GlobalAveragePooling1D()(attention)

    outputs = Dense(1, activation='sigmoid')(context_vector)
    model = Model(inputs, outputs)
    return model

# Example
vocab_size = 5000
embed_dim = 128
lstm_units = 64
model = build_attention_model(vocab_size, embed_dim, lstm_units)
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
model.summary()
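
To confirm the model wires together end to end, here is a quick smoke test on synthetic token sequences (the shapes and data below are made up purely for illustration):

import numpy as np

# Synthetic batch: 32 sequences of 20 random token IDs, with random binary labels
X_dummy = np.random.randint(1, vocab_size, size=(32, 20))
y_dummy = np.random.randint(0, 2, size=(32, 1))

model.fit(X_dummy, y_dummy, epochs=1, batch_size=8, verbose=0)
print(model.predict(X_dummy[:2]))  # two sigmoid scores in [0, 1]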

3. Word Embedding and Joint Space Encoding

import numpy as np
from gensim.models import Word2Vec

# Example sentences
sentences = [
    "Irony is used to make a point".split(),
    "Irony highlights the absurdity of situations".split(),
    "Semantic and syntactic features are important".split()
]

# Train Word2Vec
model = Word2Vec(sentences, vector_size=100, window=5, min_count=1, workers=4)

# Encode a sentence as a joint-space vector (average of its word embeddings)
def encode_sentence(sentence, model):
    vectors = [model.wv[word] for word in sentence.split() if word in model.wv]
    return np.mean(vectors, axis=0) if vectors else np.zeros(model.vector_size)

print(encode_sentence("Irony is important", model))
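
The abstract describes using the shared representation as a query to retrieve semantically similar sentences from a knowledge base. A minimal sketch of that retrieval step, using cosine similarity over the averaged embeddings above (the "knowledge base" here is just the training sentences, for illustration):

knowledge_base = [
    "Irony is used to make a point",
    "Irony highlights the absurdity of situations",
    "Semantic and syntactic features are important",
]

def cosine_similarity(a, b):
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(np.dot(a, b) / denom) if denom else 0.0

# Rank knowledge-base sentences by similarity to the query's joint-space vector
def retrieve_most_similar(query, model, sentences):
    query_vec = encode_sentence(query, model)
    scores = [cosine_similarity(query_vec, encode_sentence(s, model)) for s in sentences]
    return sentences[int(np.argmax(scores))]

print(retrieve_most_similar("Irony is important", model, knowledge_base))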

4. Active Learning with Uncertainty Sampling

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import make_classification
from scipy.stats import entropy

# Generate synthetic dataset
X, y = make_classification(n_samples=1000, n_features=20, n_classes=2)
model = RandomForestClassifier()

# Active learning loop
labeled_indices = np.random.choice(len(X), 100, replace=False)
unlabeled_indices = list(set(range(len(X))) - set(labeled_indices))

for i in range(5):  # Perform 5 iterations
    model.fit(X[labeled_indices], y[labeled_indices])
    probs = model.predict_proba(X[unlabeled_indices])

    uncertainties = [entropy(p) for p in probs]
    most_uncertain_idx = np.argmax(uncertainties)
    labeled_indices = np.append(labeled_indices, unlabeled_indices[most_uncertain_idx])
    unlabeled_indices.pop(most_uncertain_idx)

print("Active learning completed.")

5. Pseudo-Labeling for Semi-Supervised Learning

from sklearn.model_selection import train_test_split

# Split into labeled and unlabeled data
X_train, X_unlabeled, y_train, _ = train_test_split(X, y, test_size=0.7)
model = RandomForestClassifier()

# Train initial model on labeled data
model.fit(X_train, y_train)

# Pseudo-labeling loop: re-label the unlabeled pool each round rather than
# stacking it onto the training set repeatedly, which would duplicate the
# same rows with possibly conflicting labels on every iteration
for _ in range(5):
    pseudo_labels = model.predict(X_unlabeled)
    X_combined = np.vstack((X_train, X_unlabeled))
    y_combined = np.append(y_train, pseudo_labels)
    model.fit(X_combined, y_combined)

print("Pseudo-labeling completed.")

6. Evaluating Classifier Performance

from sklearn.metrics import classification_report, confusion_matrix
import seaborn as sns
import matplotlib.pyplot as plt

# Example predictions
y_true = [1, 0, 1, 1, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1]

# Classification report
print(classification_report(y_true, y_pred))

# Confusion matrix
cm = confusion_matrix(y_true, y_pred)
sns.heatmap(cm, annot=True, fmt='d', cmap='Blues')
plt.xlabel('Predicted')
plt.ylabel('True')
plt.show()
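
Note that classification_report already prints a separate F1 score per class, which matches how the abstract reports results: one F1 for the ironic class and one for the non-ironic class, rather than a single aggregate figure.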

7. Dataset Augmentation

from textblob import Word
import random

# Example sentence
sentence = "Irony is a complex and fascinating subject."

# Synonym replacement using the WordNet synsets exposed by TextBlob's Word
def augment_with_synonyms(sentence):
    words = sentence.split()
    augmented = []
    for word in words:
        # Collect candidate synonyms from every WordNet synset of the word
        synonyms = {lemma.replace("_", " ")
                    for synset in Word(word.lower()).synsets
                    for lemma in synset.lemma_names()
                    if lemma.lower() != word.lower()}
        augmented.append(random.choice(sorted(synonyms)) if synonyms else word)
    return " ".join(augmented)

print(augment_with_synonyms(sentence))
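
A caveat: WordNet replacement like this ignores part of speech and word sense, so substitutions can shift tone or meaning. For irony data in particular, augmented sentences are worth spot-checking, since ironic effect often hinges on exact word choice.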

Written by

Mohamad Mahmood

Mohamad's interest is in Programming (Mobile, Web, Database and Machine Learning). He studies at the Center For Artificial Intelligence Technology (CAIT), Universiti Kebangsaan Malaysia (UKM).