How to Build Effective pCTR and pCVR Models: Insights from Ad Systems

1 Introduction
In modern e-commerce advertising systems, predicted click-through rate (pCTR) and predicted conversion rate (pCVR) are foundational components that drive performance and revenue. These models estimate the likelihood that a user will click on or convert after viewing an ad, respectively, and serve as key signals in both ad ranking and bid optimization.
A common strategy in ad auctions is to sort advertisements by effective cost per mille (eCPM), the expected revenue per thousand impressions, computed as pCTR × bid × 1000. This approach ensures that ads most likely to generate revenue are prioritized, aligning platform incentives with user engagement.
Beyond ranking, pCVR plays a critical role in Optimized Cost Per Click (OCPC) [1] systems. Here, platforms dynamically adjust the bid price based on the predicted conversion probability, enabling a win-win strategy: advertisers achieve better return on investment (ROI) by paying closer to the value of actual conversions, while platforms maximize long-term revenue and user satisfaction.
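To make the idea concrete, here is a deliberately simplified sketch of conversion-aware bid adjustment. It is not the exact OCPC formula from [1]; the bounded ratio, the `market_avg_cvr` baseline, and the parameter names are illustrative assumptions.

```python
def ocpc_adjusted_bid(base_bid: float, pcvr: float, market_avg_cvr: float,
                      lower: float = 0.8, upper: float = 1.2) -> float:
    """Scale the advertiser's bid by the predicted conversion quality of this
    impression relative to an average, bounded to a safe adjustment range.
    Simplified illustration only, not the exact OCPC formula from [1]."""
    ratio = pcvr / max(market_avg_cvr, 1e-6)   # relative conversion quality
    ratio = min(max(ratio, lower), upper)      # keep the adjustment bounded
    return base_bid * ratio

# Example: an impression predicted to convert 15% better than average
print(ocpc_adjusted_bid(base_bid=2.0, pcvr=0.023, market_avg_cvr=0.020))  # -> 2.3
```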
These predictive models are not limited to advertising alone. In recommendation systems, pCTR and pCVR contribute to a more holistic understanding of user behavior—helping balance the short-term goal of click engagement with the long-term objective of driving purchases and conversions. By modeling both what users want to click and what they may buy, platforms can show better content that benefits both users and the business.
As user behavior becomes increasingly dynamic and complex, the need for accurate and robust pCTR and pCVR models continues to grow. In the sections that follow, we’ll explore how these models are built, the challenges they present, and the techniques that top ad systems use to overcome them.
2 Data Foundations
Building a high-quality dataset is fundamental to training effective pCTR and pCVR models. Since pCVR dataset construction is similar to pCTR, we concentrate on the pCTR dataset. As a supervised learning task, pCTR relies on labeled data—clicked impressions as positive samples and non-clicked impressions as negatives. However, accurately defining and collecting these samples is challenging and has a critical impact on model performance.
2.1 Label Collection and Filtering Criteria
To ensure high-quality labels, we apply several filtering rules during data collection (a code sketch of these filters follows the three rules below):
Spam User Filtering: Users exhibiting abnormal behavior—such as clicking on an unusually large number of ads or viewing many ads without clicking any—are excluded. These patterns often indicate bots, click farms, or low-quality interactions that can introduce noise into the dataset.
Session-level No-click Filtering: If a user session contains ad impressions but no clicks at all, we exclude those impressions. In recommendation and advertising systems, ad targeting is usually relevance-based. If a user doesn't click on any ad in a session, it's often due to external factors (e.g., user distraction), not necessarily because the ads were irrelevant. Including such impressions as negative samples could mislead the model.
Invalid Clicks Filtering: Clicks with very short dwell time (e.g., user bounces within 1–2 seconds) are treated as invalid positive samples. These cases often indicate accidental clicks or low user interest, and counting them as true positives could harm model accuracy.
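A minimal pandas sketch of the three filters above might look like the following; the column names (user_id, session_id, clicked, dwell_seconds) and the thresholds are assumptions for illustration.

```python
import pandas as pd

def filter_impressions(df: pd.DataFrame,
                       max_clicks_per_user: int = 200,
                       min_dwell_seconds: float = 2.0) -> pd.DataFrame:
    """Apply the three filtering rules to a raw impression log.
    Assumed columns: user_id, session_id, clicked (0/1), dwell_seconds."""
    # 1. Spam-user filtering: drop users with abnormally many clicks.
    clicks_per_user = df.groupby("user_id")["clicked"].transform("sum")
    df = df[clicks_per_user <= max_clicks_per_user]

    # 2. Session-level no-click filtering: drop sessions with zero clicks.
    clicks_per_session = df.groupby("session_id")["clicked"].transform("sum")
    df = df[clicks_per_session > 0]

    # 3. Invalid-click filtering: drop very short clicks so accidental
    #    clicks are not counted as positives.
    bounced = (df["clicked"] == 1) & (df["dwell_seconds"] < min_dwell_seconds)
    return df[~bounced]
```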
2.2 Addressing Class Imbalance
pCTR datasets are extremely imbalanced, with far more non-clicked impressions than clicked ones. To mitigate this, we apply strategic sampling techniques to balance the dataset while preserving meaningful patterns.
User-based Negative Sampling: For users with many clicks, we proportionally retain more non-click impressions. This ensures that both clicked and non-clicked behaviors are represented fairly in the dataset.
Skip-above Sampling: This strategy focuses on ad positions relative to the user's last click within a session. We retain non-clicked impressions that appear before the last click and discard those that appear after it. The intuition is that impressions after the last click are likely ignored or unseen, making them unreliable as negative samples.
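A possible implementation of skip-above sampling, assuming each impression row carries a within-session display position and a click flag:

```python
import pandas as pd

def skip_above_sample(session: pd.DataFrame) -> pd.DataFrame:
    """Keep all clicks, plus non-clicked impressions shown at or before the
    position of the user's last click in the session; drop the rest.
    Assumed columns: position (display order), clicked (0/1)."""
    if session["clicked"].sum() == 0:
        return session.iloc[0:0]   # no clicks in this session: drop it entirely
    last_click_pos = session.loc[session["clicked"] == 1, "position"].max()
    return session[(session["clicked"] == 1) | (session["position"] <= last_click_pos)]

# sampled = df.groupby("session_id", group_keys=False).apply(skip_above_sample)
```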
2.3 Hard Negative Mining
Inspired by hard negative mining, we also log real-time predicted CTR scores in the online system. If an impression has a high predicted CTR but was not clicked, we treat it as a hard negative—a sample that the model wrongly considered attractive.
We then apply a threshold-based filtering strategy: non-clicked impressions with predicted CTR above a certain value (e.g., 0.6) are retained as hard negatives. Incorporating these samples during training helps the model learn fine-grained distinctions between relevant and irrelevant ads, boosting overall accuracy and robustness.
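Assuming the online predicted CTR is logged alongside each impression (here as a hypothetical online_pctr column), the hard-negative selection reduces to a simple filter:

```python
import pandas as pd

def select_hard_negatives(df: pd.DataFrame, threshold: float = 0.6) -> pd.DataFrame:
    """Non-clicked impressions whose logged online pCTR exceeded the threshold.
    Assumed columns: clicked (0/1), online_pctr (score logged at serving time)."""
    return df[(df["clicked"] == 0) & (df["online_pctr"] >= threshold)]
```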
2.4 Feature Engineering
Feature engineering plays a crucial role in capturing user intent and ad relevance. In pCTR models, features are generally categorized into three types:
User Features: These include both static profiles (e.g., gender, age, location) and behavioral features (e.g., recent click history, preferred categories, interaction with specific sellers). Modeling user behavior over time helps capture evolving interests.
Campaign (Ad) Features: These describe the product and seller being advertised. In e-commerce, ads usually represent products, and the advertiser is the seller. Key features include product category, price, historical purchase volume, seller reputation, and brand identity.
Context Features: These capture the environment in which the ad impression occurs—such as ad placement (slot ID), timestamp, device type, OS, and network condition. Temporal and spatial context can significantly affect user behavior.
In deep learning models, ID-type features (e.g., user ID, product ID, seller ID) are often embedded into dense vectors, allowing the model to learn complex patterns across high-cardinality features.
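A minimal PyTorch sketch of ID-feature embedding; the vocabulary sizes and embedding dimension are placeholders, not values from any production system.

```python
import torch
import torch.nn as nn

class SparseIdEmbeddings(nn.Module):
    """Map high-cardinality ID features to dense vectors and concatenate them.
    Vocabulary sizes and the embedding dimension are illustrative placeholders."""
    def __init__(self, n_users=1_000_000, n_items=500_000, n_sellers=100_000, dim=16):
        super().__init__()
        self.user_emb = nn.Embedding(n_users, dim)
        self.item_emb = nn.Embedding(n_items, dim)
        self.seller_emb = nn.Embedding(n_sellers, dim)

    def forward(self, user_id, item_id, seller_id):
        # Each ID lookup yields a dense vector; downstream layers consume the concat.
        return torch.cat(
            [self.user_emb(user_id), self.item_emb(item_id), self.seller_emb(seller_id)],
            dim=-1,
        )

# emb = SparseIdEmbeddings()(torch.tensor([3]), torch.tensor([42]), torch.tensor([7]))
```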
3 Modelling Techniques
Accurately modeling predicted Click-Through Rate (pCTR) and predicted Conversion Rate (pCVR) is critical to the effectiveness of online advertising and auction systems. Over the years, modeling techniques have evolved from simple linear models to complex deep learning architectures, driven by the need to capture high-dimensional, sparse, and sequential user behavior.
3.1 Traditional Models
Logistic Regression (LR) has long been the baseline model for pCTR and pCVR tasks, especially before the deep learning era. It is favored for its simplicity, efficiency, and interpretability. LR can be viewed as a shallow neural network with a single linear layer, where each model weight reflects the contribution of a corresponding feature to the predicted outcome.
However, real-world ad systems involve sparse and high-dimensional features, requiring special handling. Techniques such as feature hashing, one-hot encoding, and regularization are commonly used to scale LR to massive datasets.
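For illustration, a tiny scikit-learn example combining feature hashing with an L2-regularized logistic regression over sparse categorical features; the toy rows and the hash width are arbitrary.

```python
from sklearn.feature_extraction import FeatureHasher
from sklearn.linear_model import LogisticRegression

# Toy impressions as sparse categorical features; hashing keeps dimensionality
# fixed no matter how many distinct IDs appear. Feature names are illustrative.
rows = [
    {"user_id": "u1", "item_id": "i9", "slot": "top"},
    {"user_id": "u2", "item_id": "i3", "slot": "side"},
    {"user_id": "u1", "item_id": "i3", "slot": "top"},
    {"user_id": "u3", "item_id": "i9", "slot": "side"},
]
labels = [1, 0, 1, 0]

hasher = FeatureHasher(n_features=2**20, input_type="dict")
X = hasher.transform(rows)                        # sparse one-hot-style matrix
clf = LogisticRegression(C=1.0, max_iter=1000)    # L2 regularization via C
clf.fit(X, labels)
print(clf.predict_proba(X)[:, 1])                 # predicted CTRs
```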
Another popular traditional method is Gradient Boosted Decision Trees (GBDT), including implementations like XGBoost and LightGBM. These models are capable of capturing non-linear feature interactions and reducing reliance on manual feature engineering, making them particularly effective in handling structured tabular data with categorical variables.
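A correspondingly small LightGBM sketch on toy tabular data; LightGBM treats pandas category columns as categorical features natively, and the tiny dataset and hyperparameters here are illustrative only.

```python
import lightgbm as lgb
import pandas as pd

# Toy tabular data; the categorical column is handled natively by LightGBM,
# which learns non-linear interactions without manual feature crosses.
df = pd.DataFrame({
    "price": [9.9, 25.0, 7.5, 120.0],
    "category": pd.Categorical(["shoes", "phone", "shoes", "tv"]),
    "hour": [9, 21, 13, 22],
    "clicked": [1, 0, 1, 0],
})
features = ["price", "category", "hour"]
model = lgb.LGBMClassifier(n_estimators=50, learning_rate=0.1, min_child_samples=1)
model.fit(df[features], df["clicked"])
print(model.predict_proba(df[features])[:, 1])
```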
3.2 Deep Learning Models
With the availability of rich user behavior data on e-commerce platforms, deep learning models have become dominant due to their ability to learn complex patterns from high-dimensional and sequential data.
To better model user interest, DIN (Deep Interest Network) [3] introduces a local activation mechanism, inspired by attention mechanisms, to dynamically compute user representations with respect to a specific target ad. Instead of learning a single static user embedding, DIN adaptively aggregates relevant user behaviors based on their similarity to the candidate item, thereby capturing contextual and personalized interests.
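The core of DIN's local activation can be sketched in a few lines; the real model scores each behavior with a small MLP rather than the plain dot product used below, so treat this only as an illustration of attention-weighted pooling.

```python
import numpy as np

def din_style_user_interest(behavior_embs: np.ndarray, target_emb: np.ndarray) -> np.ndarray:
    """Weight each historical behavior by its relevance to the candidate ad and
    sum the weighted embeddings. DIN learns the relevance scores with a small
    MLP; a dot product stands in for it here to keep the sketch short.
    behavior_embs: (T, d) past-item embeddings; target_emb: (d,)."""
    scores = behavior_embs @ target_emb              # relevance of each behavior
    weights = np.exp(scores) / np.exp(scores).sum()  # normalized for readability
    return (weights[:, None] * behavior_embs).sum(axis=0)

rng = np.random.default_rng(0)
print(din_style_user_interest(rng.normal(size=(5, 8)), rng.normal(size=8)).shape)  # (8,)
```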
To further model the temporal evolution of user interest, DIEN (Deep Interest Evolution Network) [4] employs Gated Recurrent Units (GRU) to encode sequential user behaviors. It introduces an auxiliary loss that uses the next-click behavior to supervise the learning of the current GRU hidden state, encouraging the model to capture semantically meaningful interest representations. DIEN also proposes AUGRU (GRU with Attentional Update Gate), which selectively emphasizes interest states that are more relevant to the target ad, strengthening the influence of related behaviors while suppressing irrelevant ones.
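A single AUGRU step can be written out directly from the GRU equations, with the attention score scaling the update gate; the weight shapes and initialization below are placeholders.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def augru_step(h_prev, x_t, attn_t, params):
    """One AUGRU step: a standard GRU update whose update gate is scaled by the
    attention score of this behavior w.r.t. the target ad, so irrelevant steps
    barely move the hidden state. Parameter shapes are illustrative."""
    Wu, Uu, Wr, Ur, Wh, Uh = params
    u = sigmoid(Wu @ x_t + Uu @ h_prev)              # update gate
    r = sigmoid(Wr @ x_t + Ur @ h_prev)              # reset gate
    h_tilde = np.tanh(Wh @ x_t + Uh @ (r * h_prev))  # candidate state
    u = attn_t * u                                   # attentional update gate (AUGRU)
    return (1.0 - u) * h_prev + u * h_tilde

d = 8
rng = np.random.default_rng(1)
params = [rng.normal(scale=0.1, size=(d, d)) for _ in range(6)]
h = augru_step(np.zeros(d), rng.normal(size=d), attn_t=0.3, params=params)
print(h.shape)  # (8,)
```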
3.3 Calibration Techniques
Accurate probability estimation is essential for bidding and auction pricing, so model calibration is used to align predicted probabilities with observed click-through or conversion rates.
After training, models are often evaluated or served on datasets with different class distributions due to negative downsampling. To correct this, we apply a downsampling calibration formula [5]:
$$q = \frac{p}{p+(1-p)/w}$$
Where:
\(p\) is the predicted probability on the downsampled dataset
\(w\) is the negative downsampling rate, i.e., the fraction of negative samples retained
\(q\) is the calibrated probability in the original space
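The correction is a one-line function; in this sketch, w is the fraction of negatives retained (e.g., 0.1 if one in ten negatives was kept).

```python
def recalibrate(p: float, w: float) -> float:
    """Map a probability predicted on negatively-downsampled data back to the
    original space: q = p / (p + (1 - p) / w), where w is the rate at which
    negatives were kept (e.g., w = 0.1 means 1 in 10 negatives retained)."""
    return p / (p + (1.0 - p) / w)

# A model trained with 10% of negatives predicts 0.5; the calibrated CTR is ~0.091.
print(recalibrate(0.5, w=0.1))
```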
However, this adjustment may still leave some residual misalignment. To further improve calibration, we apply isotonic regression (sketched in code after the steps below):
Sort the calibrated predictions in ascending order.
Divide them into bins of equal size or quantiles.
Compute the empirical CTR for each bin (clicks / total impressions).
Fit an isotonic regression model to map predicted scores to observed CTRs.
Use this mapping during online serving to provide better-calibrated probabilities.
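A sketch of the binning-plus-isotonic step using scikit-learn, on synthetic (hypothetical) scores and labels; the bin count and data are illustrative.

```python
import numpy as np
from sklearn.isotonic import IsotonicRegression

# Hypothetical held-out data: model scores (already downsampling-corrected)
# and observed click labels. The true CTR is deliberately lower than the score.
rng = np.random.default_rng(0)
scores = rng.uniform(0, 1, size=10_000)
clicks = (rng.uniform(0, 1, size=10_000) < scores * 0.8).astype(int)

# Bin the scores into quantile buckets and compute the empirical CTR per bin.
bins = np.quantile(scores, np.linspace(0, 1, 51))
bin_idx = np.digitize(scores, bins[1:-1])
bin_score = np.array([scores[bin_idx == b].mean() for b in range(50)])
bin_ctr = np.array([clicks[bin_idx == b].mean() for b in range(50)])

# Fit a monotone mapping from predicted score to observed CTR.
iso = IsotonicRegression(y_min=0.0, y_max=1.0, out_of_bounds="clip")
iso.fit(bin_score, bin_ctr)

# At serving time: calibrated = iso.predict(raw_scores)
print(iso.predict([0.2, 0.5, 0.9]))
```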
3.4 Modeling Conversion
A unique challenge in pCVR modeling is sample selection bias (SSB). Conventional pCVR models are trained only on clicked impressions but are expected to generalize to all impressions at inference time. This mismatch leads to biased predictions. Additionally, conversions are rare events, making the dataset extremely sparse.
ESMM (Entire Space Multi-task Model) [2] addresses these issues by modeling the entire space: from impression to click to conversion. Instead of modeling pCVR directly, ESMM decomposes the joint click-and-convert probability as:
$$pCTCVR = pCTR \times pCVR$$
so that pCVR is recovered as pCTCVR / pCTR. Here pCTCVR is the predicted probability that an impression is both clicked and converted, and pCTR is the predicted probability of a click.
Both auxiliary tasks—CTR and CTCVR—are trained over the entire impression space, enabling CVR estimation for all impressions, not just those that were clicked. ESMM uses a multi-task learning approach, sharing embeddings and lower layers across tasks. This setup allows the rich signals from CTR data to enhance training for the sparser CVR task.
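A minimal PyTorch sketch of the ESMM structure, with placeholder vocabulary sizes and tower widths; it is meant only to show the shared embedding and the pCTCVR = pCTR × pCVR composition, not the full architecture from [2].

```python
import torch
import torch.nn as nn

class ESMM(nn.Module):
    """Minimal ESMM sketch: shared embeddings feed a CTR tower and a CVR tower;
    pCTCVR = pCTR * pCVR. Only the CTR and CTCVR losses are computed, both over
    the full impression space, so pCVR is never trained on clicked data alone.
    Feature vocabulary size and layer widths are placeholders."""
    def __init__(self, n_features=100_000, dim=16, n_fields=8):
        super().__init__()
        self.emb = nn.Embedding(n_features, dim)
        tower = lambda: nn.Sequential(nn.Linear(n_fields * dim, 64), nn.ReLU(), nn.Linear(64, 1))
        self.ctr_tower, self.cvr_tower = tower(), tower()

    def forward(self, feat_ids):                       # feat_ids: (batch, n_fields)
        x = self.emb(feat_ids).flatten(start_dim=1)    # shared representation
        pctr = torch.sigmoid(self.ctr_tower(x)).squeeze(-1)
        pcvr = torch.sigmoid(self.cvr_tower(x)).squeeze(-1)
        return pctr, pcvr, pctr * pcvr                 # pCTCVR = pCTR * pCVR

model = ESMM()
feat_ids = torch.randint(0, 100_000, (4, 8))
click = torch.tensor([1., 0., 1., 0.])
convert = torch.tensor([1., 0., 0., 0.])
pctr, pcvr, pctcvr = model(feat_ids)
# Supervise pCTR with clicks and pCTCVR with click-and-convert labels.
loss = (nn.functional.binary_cross_entropy(pctr, click)
        + nn.functional.binary_cross_entropy(pctcvr, click * convert))
```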
4 Evaluation Metrics
AUC (Area Under the ROC Curve) is one of the most commonly used metrics to evaluate model performance in pCTR and pCVR tasks. It measures the model’s ability to rank positive samples (e.g., clicks or conversions) higher than negative ones, providing a global view of its discrimination capability.
However, a higher AUC on the test set doesn’t always translate into better online performance. This discrepancy arises because AUC is calculated across the entire dataset, without considering differences among users or contexts. For instance:
Users who never click any ads contribute only negative samples, diluting the global AUC without reflecting ranking quality for engaged users.
Low-traffic or obscure ad slots may skew the AUC and mask the model’s performance on high-value placements.
To address these limitations, we adopt GAUC (Group AUC), which evaluates model performance at a more fine-grained level by grouping data and computing AUC within each group.
4.1 What is GAUC?
GAUC stands for Group AUC, and it provides a weighted average of AUC values computed per group, allowing for a more fine-grained evaluation. In many ad systems, the most common grouping is by user, so GAUC often reflects how well the model performs at the individual user level. This is particularly important in personalization and ranking tasks.
4.2 How GAUC is Computed
Group Definition: First, the dataset is divided into groups based on a chosen criterion. The most common grouping is by user, but it can also be generalized to other dimensions, such as age, gender, device type, geographic region, or even predicted CTR buckets, depending on the application.
Per-Group AUC Calculation: For each group, compute the AUC based on the group’s impressions, clicks, and predictions. This gives a local performance measure for that segment.
Weighted Averaging: Compute the final GAUC as the weighted average of all group-level AUCs, where the weight is typically proportional to the number of impressions or clicks in each group.
$$\text{GAUC} = \frac{\sum_{i=1}^{N} w_i \cdot \text{AUC}_i}{\sum_{i=1}^{N} w_i}$$
Where:
\(\text{AUC}_i\) is the AUC of the \(i\)-th group
\(w_i\) is the weight (e.g., number of impressions or queries in the group)
\(N\) is the total number of groups
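A straightforward implementation of this weighted average, assuming a DataFrame with user_id, label, and score columns; groups containing only one label class are skipped because their AUC is undefined.

```python
import pandas as pd
from sklearn.metrics import roc_auc_score

def gauc(df: pd.DataFrame, group_col: str = "user_id") -> float:
    """Impression-weighted average of per-group AUCs.
    Assumed columns: user_id, label (0/1), score (model prediction)."""
    total, weight_sum = 0.0, 0.0
    for _, g in df.groupby(group_col):
        if g["label"].nunique() < 2:
            continue                                  # AUC undefined for this group
        w = len(g)                                    # weight by impressions
        total += w * roc_auc_score(g["label"], g["score"])
        weight_sum += w
    return total / weight_sum if weight_sum > 0 else float("nan")
```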
4.3 Why GAUC Matters
Personalization Insight: GAUC reflects how well the model ranks ads within a user’s context, making it more aligned with real-world performance in personalized systems.
Better Correlation with Online Metrics: GAUC often correlates more strongly with CTR lift, conversion uplift, and revenue impact observed in A/B testing.
Noise Reduction: It reduces the influence of non-informative users (e.g., those who never click) or cold-start scenarios, offering a clearer picture of performance where it truly matters.
5 References
[1] Zhu, Han, et al. "Optimized cost per click in taobao display advertising." Proceedings of the 23rd ACM SIGKDD international conference on knowledge discovery and data mining. 2017.
[2] Ma, Xiao, et al. "Entire space multi-task model: An effective approach for estimating post-click conversion rate." The 41st International ACM SIGIR Conference on Research & Development in Information Retrieval. 2018.
[3] Zhou, Guorui, et al. "Deep interest network for click-through rate prediction." Proceedings of the 24th ACM SIGKDD international conference on knowledge discovery & data mining. 2018.
[4] Zhou, Guorui, et al. "Deep interest evolution network for click-through rate prediction." Proceedings of the AAAI conference on artificial intelligence. Vol. 33. No. 01. 2019.
[5] He, Xinran, et al. "Practical lessons from predicting clicks on ads at facebook." Proceedings of the eighth international workshop on data mining for online advertising. 2014.