Emotional Intelligence Attention Unsupervised Learning Using Lexicon Analysis for Irony-based Advertising


Abstract
The use of irony on social media has grown steadily in recent years. Users can express ironic thoughts through text accompanied by audio, video, and images. Irony is typically used to mock a situation, make a point, voice frustration, or highlight absurdity; whatever the motivation, its use on social media is likely to keep increasing. We show that attention networks can be enhanced by combining syntactic information with semantic exploration. Using learned embeddings, unsupervised learning encodes word order into a joint space. The active learning method uses this shared representation as a query to retrieve semantically similar sentences from a knowledge base, evaluating the entropy of each example's class distribution before adding instances. In this way, the algorithm identifies the instance with the maximum uncertainty and extracts the most informative example from the training set. An irony network trained on each labelled record is used to train a classifier (model). The partially trained model and the original labelled data generate pseudo-labels for the unlabeled data, and a classifier (attention network) then updates the pseudo-labels for the remaining data so that labels are predicted correctly. In an experimental evaluation on 1,021 annotated texts, the proposed model outperformed the baseline models, achieving an F1 score of 0.63 on the ironic class and 0.59 on the non-ironic class. We also found that the proposed model generalized well to new instances.
https://dl.acm.org/doi/10.1145/3580496
https://dl.acm.org/doi/pdf/10.1145/3580496
Exercise
1. Preprocessing Text for Irony Detection
import nltk
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize
from nltk.stem import WordNetLemmatizer
import string
nltk.download('punkt')
nltk.download('stopwords')
nltk.download('wordnet')
def preprocess_text(text):
    # Tokenization
    tokens = word_tokenize(text.lower())
    # Remove punctuation
    tokens = [word for word in tokens if word not in string.punctuation]
    # Remove stop words
    stop_words = set(stopwords.words('english'))
    tokens = [word for word in tokens if word not in stop_words]
    # Lemmatization
    lemmatizer = WordNetLemmatizer()
    tokens = [lemmatizer.lemmatize(word) for word in tokens]
    return tokens
# Example
text = "Irony is often used to highlight the absurdity of a situation!"
print(preprocess_text(text))
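The attention model built in the next exercise consumes integer token ids rather than raw tokens, so a bridging step is needed between preprocessing and modelling. Below is a minimal sketch of that step, assuming the preprocess_text function above; the build_vocab and to_ids helpers are illustrative and not part of the original exercises.
from tensorflow.keras.preprocessing.sequence import pad_sequences
# Build a token-to-id vocabulary; id 0 is reserved for padding, id 1 for
# out-of-vocabulary tokens
def build_vocab(token_lists):
    vocab = {'<pad>': 0, '<unk>': 1}
    for tokens in token_lists:
        for token in tokens:
            vocab.setdefault(token, len(vocab))
    return vocab
def to_ids(tokens, vocab):
    return [vocab.get(token, vocab['<unk>']) for token in tokens]
corpus = [
    "Irony is often used to highlight the absurdity of a situation!",
    "What a wonderful day to be stuck in traffic.",
]
token_lists = [preprocess_text(t) for t in corpus]
vocab = build_vocab(token_lists)
# Pad to a common length so the sequences can be batched
X = pad_sequences([to_ids(t, vocab) for t in token_lists], padding='post')
print(X)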
2. Building an Attention Network
import tensorflow as tf
from tensorflow.keras.layers import Input, Embedding, LSTM, Dense, Attention
from tensorflow.keras.models import Model
def build_attention_model(vocab_size, embed_dim, lstm_units):
    inputs = Input(shape=(None,))
    embedding = Embedding(input_dim=vocab_size, output_dim=embed_dim)(inputs)
    lstm_out = LSTM(lstm_units, return_sequences=True)(embedding)
    # Attention mechanism: passing the LSTM outputs as both query and value
    # makes this a self-attention layer
    attention = Attention()([lstm_out, lstm_out])
    # Pool the attended sequence into a single context vector
    context_vector = tf.reduce_sum(attention, axis=1)
    outputs = Dense(1, activation='sigmoid')(context_vector)
    model = Model(inputs, outputs)
    return model
# Example
vocab_size = 5000
embed_dim = 128
lstm_units = 64
model = build_attention_model(vocab_size, embed_dim, lstm_units)
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
model.summary()
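To sanity-check the model end to end, it can be fit on random token-id sequences with binary labels. A minimal smoke test, assuming a fixed sequence length of 20 (the data here is synthetic, not from the paper):
import numpy as np
# Synthetic batch: 32 sequences of 20 token ids, with binary irony labels
X_toy = np.random.randint(1, vocab_size, size=(32, 20))
y_toy = np.random.randint(0, 2, size=(32,))
model.fit(X_toy, y_toy, epochs=2, batch_size=8, verbose=1)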
3. Word Embedding and Joint Space Encoding
from gensim.models import Word2Vec
import numpy as np
# Example sentences
sentences = [
"Irony is used to make a point".split(),
"Irony highlights the absurdity of situations".split(),
"Semantic and syntactic features are important".split()
]
# Train Word2Vec
model = Word2Vec(sentences, vector_size=100, window=5, min_count=1, workers=4)
# Encode a sentence as a joint-space vector (average of its word embeddings)
def encode_sentence(sentence, model):
    vectors = [model.wv[word] for word in sentence.split() if word in model.wv]
    # Fall back to a zero vector when no word is in the vocabulary
    return np.mean(vectors, axis=0) if vectors else np.zeros(model.vector_size)
print(encode_sentence("Irony is important", model))
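The abstract describes using this shared representation as a query to retrieve semantically similar sentences from a knowledge base. Below is a minimal sketch of that retrieval step using cosine similarity over the averaged embeddings; the knowledge_base list and the cosine_similarity and retrieve_similar helpers are illustrative, not the paper's implementation.
from numpy.linalg import norm
def cosine_similarity(a, b):
    # Small epsilon guards against division by zero for empty sentences
    return float(np.dot(a, b) / (norm(a) * norm(b) + 1e-9))
knowledge_base = [
    "Irony is used to make a point",
    "Semantic and syntactic features are important",
    "Irony highlights the absurdity of situations",
]
def retrieve_similar(query, kb, model, top_k=2):
    q = encode_sentence(query, model)
    # Score every knowledge-base sentence against the query vector
    scored = [(cosine_similarity(q, encode_sentence(s, model)), s) for s in kb]
    return sorted(scored, reverse=True)[:top_k]
print(retrieve_similar("Irony is important", knowledge_base, model))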
4. Active Learning with Uncertainty Sampling
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import make_classification
from scipy.stats import entropy
# Generate synthetic dataset
X, y = make_classification(n_samples=1000, n_features=20, n_classes=2)
model = RandomForestClassifier()
# Active learning loop
labeled_indices = np.random.choice(len(X), 100, replace=False)
unlabeled_indices = list(set(range(len(X))) - set(labeled_indices))
for i in range(5):  # Perform 5 iterations
    model.fit(X[labeled_indices], y[labeled_indices])
    probs = model.predict_proba(X[unlabeled_indices])
    # Higher entropy means the model is less certain about the prediction
    uncertainties = [entropy(p) for p in probs]
    most_uncertain_idx = int(np.argmax(uncertainties))
    # Move the most uncertain example from the unlabeled to the labeled pool
    labeled_indices = np.append(labeled_indices, unlabeled_indices[most_uncertain_idx])
    unlabeled_indices.pop(most_uncertain_idx)
print("Active learning completed.")
5. Pseudo-Labeling for Semi-Supervised Learning
from sklearn.model_selection import train_test_split
# Split into labeled and unlabeled data (reusing X, y from the previous exercise)
X_labeled, X_unlabeled, y_labeled, _ = train_test_split(X, y, test_size=0.7)
model = RandomForestClassifier()
# Train initial model on labeled data
model.fit(X_labeled, y_labeled)
# Pseudo-labeling loop: re-label the unlabeled pool and retrain each iteration
for _ in range(5):
    pseudo_labels = model.predict(X_unlabeled)
    # Rebuild the training set from the original labeled data plus the fresh
    # pseudo-labels, so the unlabeled pool is not duplicated across iterations
    X_train = np.vstack((X_labeled, X_unlabeled))
    y_train = np.append(y_labeled, pseudo_labels)
    model.fit(X_train, y_train)
print("Pseudo-labeling completed.")
6. Evaluating Classifier Performance
from sklearn.metrics import classification_report, confusion_matrix
import seaborn as sns
import matplotlib.pyplot as plt
# Example predictions
y_true = [1, 0, 1, 1, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1]
# Classification report
print(classification_report(y_true, y_pred))
# Confusion matrix
cm = confusion_matrix(y_true, y_pred)
sns.heatmap(cm, annot=True, fmt='d', cmap='Blues')
plt.xlabel('Predicted')
plt.ylabel('True')
plt.show()
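For comparison with the abstract's headline numbers (F1 of 0.63 on the ironic class and 0.59 on the non-ironic class), the per-class F1 scores shown in the report can also be computed directly:
from sklearn.metrics import f1_score
# average=None returns one F1 score per class (here: class 0, then class 1)
print(f1_score(y_true, y_pred, average=None))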
7. Dataset Augmentation
from textblob import Word
import random
# Example sentence
sentence = "Irony is a complex and fascinating subject."
# Synonym replacement: TextBlob's Word exposes WordNet synsets, whose lemma
# names serve as candidate synonyms
def augment_with_synonyms(sentence):
    words = sentence.split()
    augmented = []
    for word in words:
        synsets = Word(word.lower()).synsets
        lemmas = {lemma.replace('_', ' ')
                  for synset in synsets
                  for lemma in synset.lemma_names()}
        # Exclude the original word so the replacement actually varies it
        candidates = [l for l in lemmas if l.lower() != word.lower()]
        augmented.append(random.choice(candidates) if candidates else word)
    return " ".join(augmented)
print(augment_with_synonyms(sentence))
Written by Mohamad Mahmood
Mohamad's interest is in Programming (Mobile, Web, Database and Machine Learning). He studies at the Center For Artificial Intelligence Technology (CAIT), Universiti Kebangsaan Malaysia (UKM).