DIN: Deep Interest Network for Click-Through Rate Prediction


DIN was published by Alibaba (Taobao) at KDD 2018. The paper mentions that DIN was successfully deployed in Alibaba's online display advertising system, serving the main traffic.
Full Paper: Deep Interest Network for Click-Through Rate Prediction
Motivation Behind DIN
DNN models following the Embedding&MLP paradigm:
Map sparse inputs to low-dimensional embedding vectors
Transform the embedding vectors into a fixed-length vector for each feature group
Concatenate all the resulting vectors before passing them to MLP layers
The bottleneck with this approach is that:
The user feature vector learned from behavior sequences stays the same irrespective of the target item.
However, in the context of a target item, only some part of the historical user behavior is relevant; the entire user history does not contribute equally to capturing user interest w.r.t. a target item.
The example in the paper explains it as: “a female swimmer will click a recommended goggle mostly due to the purchase of bathing suit rather than the shoes in her last week’s shopping list”.
Let's see what DIN proposes.
Deep Interest Network (DIN)
The feature sets used in the online display advertising system in Alibaba fall into four groups: user profile features, user behavior features, ad features, and context features.
In the Embedding&MLP paradigm:
One-hot features are converted to fixed-length feature vectors directly by the embedding layer
Variable-length sequences, such as the item ids visited by a user, are represented as multi-hot features; an embedding is looked up for each index and a pooling operation (e.g. sum pooling) produces the fixed-length feature vector, as in the sketch below
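A minimal sketch of that pooling step (numpy, with made-up vocabulary and embedding sizes; sum pooling is one common choice):

```python
import numpy as np

# Hypothetical sizes, purely for illustration.
VOCAB_SIZE, EMB_DIM = 1000, 8
rng = np.random.default_rng(0)
emb_table = rng.normal(size=(VOCAB_SIZE, EMB_DIM))  # embedding layer weights

def pool_behavior_sequence(item_ids):
    """Look up one embedding per visited item id and sum-pool them into a
    single fixed-length vector, regardless of how long the sequence is."""
    item_embs = emb_table[item_ids]  # (H, EMB_DIM)
    return item_embs.sum(axis=0)     # (EMB_DIM,)

# Users with different history lengths map to vectors of the same size.
u1 = pool_behavior_sequence([3, 17, 256])
u2 = pool_behavior_sequence([42, 7, 7, 901, 13])
assert u1.shape == u2.shape == (EMB_DIM,)
```

Note that the pooled vector comes out identical no matter which candidate item is being scored, which is exactly the bottleneck described above.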
In DIN, the multi-hot features, which contain rich user behaviors, are still converted to a fixed-length vector, but in a way that captures the information relevant in the context of the target item.
DIN Architecture
Notice the changes in the architecture w.r.t. the Embedding&MLP paradigm.
DIN introduces a local activation unit which is applied on the user behavior features and performs a weighted sum pooling to calculate the user representation for a given candidate item A:
\(v_U(A) = f(v_A, e_1, e_2, \ldots, e_H) = \sum_{j=1}^{H} a(e_j, v_A)\, e_j\)
where:
\(v_U(A)\) represents the user vector given a target item A
\(e_1, e_2, ..., e_H\) are the embeddings of the H items the user has interacted with
\(v_A\) is the embedding of the target item A
\(a(\cdot)\) is a feed-forward network whose output is the activation weight
The rest of the architecture stays the same.
IMP: Compared to traditional attention methods, where the sum of the attention weights is constrained to 1 using a softmax operation, this normalization is relaxed in DIN. Instead, the sum of the attention weights is treated as “an approximation of the intensity of activated user interests”. A sketch of this follows.
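A minimal sketch of the local activation unit and the weighted sum pooling (numpy; the layer sizes, the random weights, the ReLU hidden layer, and the element-wise product used as the interaction feature fed to \(a(\cdot)\) are illustrative assumptions, not the paper's exact configuration):

```python
import numpy as np

rng = np.random.default_rng(0)
EMB_DIM, HIDDEN = 8, 16

# Hypothetical weights of the small feed-forward activation unit a(.).
W1 = rng.normal(size=(3 * EMB_DIM, HIDDEN)) * 0.1
b1 = np.zeros(HIDDEN)
W2 = rng.normal(size=(HIDDEN, 1)) * 0.1
b2 = np.zeros(1)

def activation_unit(e_j, v_a):
    """a(e_j, v_A): scores one behavior embedding against the target item.
    The input concatenates both embeddings plus an interaction term
    (element-wise product here, as a stand-in for the paper's out product)."""
    x = np.concatenate([e_j, e_j * v_a, v_a])
    h = np.maximum(x @ W1 + b1, 0.0)  # ReLU hidden layer
    return (h @ W2 + b2).item()       # scalar activation weight w_j

def user_vector(behavior_embs, v_a):
    """v_U(A) = sum_j a(e_j, v_A) * e_j. Note: no softmax over the weights;
    their sum approximates the intensity of activated user interests."""
    weights = np.array([activation_unit(e, v_a) for e in behavior_embs])
    return (weights[:, None] * behavior_embs).sum(axis=0)

behaviors = rng.normal(size=(5, EMB_DIM))  # H = 5 visited items
target = rng.normal(size=EMB_DIM)          # candidate ad A
v_u = user_vector(behaviors, target)       # fixed-length, target-aware
```

Because the weights are not pushed through a softmax, their sum is free to vary with how strongly the user's history matches the candidate item.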
Two more novel contributions mentioned in the paper, with sketches after this list, are:
Mini-batch Aware Regularization, where L2 regularization updates only the embedding weights that are active in the forward pass, and
the Dice activation function, which may be viewed as a generalization of PReLU.
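A minimal sketch of Dice, assuming the training-time form where \(E[s]\) and \(Var[s]\) are mini-batch statistics (at inference the paper uses moving averages, and \(\alpha\) is a learned parameter; a fixed value is used here just for illustration):

```python
import numpy as np

def dice(s, alpha=0.1, eps=1e-8):
    """Dice activation: gate each unit by p(s), the probability that the
    pre-activation is above the mini-batch mean. With E[s]=0 and Var[s]=1
    held fixed, this reduces to PReLU. `s` has shape (batch, units)."""
    mean = s.mean(axis=0)  # E[s] per unit, over the mini-batch
    var = s.var(axis=0)    # Var[s] per unit, over the mini-batch
    p = 1.0 / (1.0 + np.exp(-(s - mean) / np.sqrt(var + eps)))
    return p * s + (1.0 - p) * alpha * s

x = np.random.default_rng(0).normal(size=(32, 4))
y = dice(x)  # same shape as x
```

And a sketch of the mini-batch aware regularization idea: only embedding rows whose feature ids occur in the current batch receive an L2 gradient, scaled down by the feature's global occurrence count (the function name and the occurrence-count array are illustrative):

```python
import numpy as np

def mba_l2_grad(emb_table, batch_feature_ids, n_occurrence, lam):
    """Regularization gradient for one mini-batch: lam * w_j / n_j for each
    feature id j appearing in the batch, zero for all other rows."""
    grad = np.zeros_like(emb_table)
    for j in set(batch_feature_ids):
        grad[j] = lam * emb_table[j] / n_occurrence[j]
    return grad
```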
Experiments
Dataset
The paper reports results on two public datasets, Amazon (Electronics) and MovieLens, and a large-scale dataset collected from Alibaba's online display advertising system.
Metrics
AUC and RelaImpr are used to measure performance.
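As defined in the paper, RelaImpr measures relative improvement over a base model, taking 0.5 as the AUC of a random guesser:

\(RelaImpr = \left( \frac{AUC(\text{measured model}) - 0.5}{AUC(\text{base model}) - 0.5} - 1 \right) \times 100\%\)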
Offline Results on Alibaba Dataset
Online Results
Online A/B testing in the display advertising system in Alibaba was conducted from 2017-05 to 2017-06. DIN trained with the proposed regularizer and activation function delivered up to 10.0% CTR and 3.8% RPM (Revenue Per Mille) improvement compared with the BaseModel.
Other Insights
Ads are ranked by \(CTR^{\alpha} \times bid\_price\) with \(\alpha > 1\), which controls the balance between promoting CTR and maximizing revenue (a toy example follows this list)
Inference takes less than 10 ms for hundreds of ads per visitor
Requests are batched to take full advantage of the GPU
Concurrent kernel computation allows matrix computations to be processed by multiple CUDA kernels in parallel
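A toy illustration of the ranking formula with hypothetical numbers, showing how \(\alpha > 1\) shifts the ranking towards high-CTR ads:

```python
# ad -> (predicted CTR, bid_price); made-up numbers for illustration
ads = {"ad_1": (0.04, 1.0), "ad_2": (0.02, 2.1)}

def score(ctr, bid, alpha):
    return ctr ** alpha * bid

for alpha in (1.0, 2.0):
    ranked = sorted(ads, key=lambda a: score(*ads[a], alpha), reverse=True)
    print(alpha, ranked)
# alpha=1.0 ranks ad_2 first (0.042 vs 0.040); alpha=2.0 ranks ad_1 first
# (1.6e-3 vs 8.4e-4): a larger alpha favors CTR over bid price.
```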