Tracking 101: How Your Data Is Stolen by Design

What is Tracking

Tracking refers to the set of technologies and techniques used to observe, record, correlate, and analyze a user's digital behavior.

It’s not just about who you are. It’s about what you do, when, where, for how long, how often, with whom, and from what device.

Tracking happens across multiple layers:

Application-level (cookies, pixels, fingerprinting, analytics)
Network-level (DNS, IP, TLS metadata, SNI, packet timing)
System-level (device IDs, persistent sessions, SDKs)
Contextual-level (mouse movement, scroll depth, clickstream, behavioral patterns)

Every session, every interaction, every moment online is potentially harvested.

Why It Exists (and Thrives)

Reason 1. Targeted Advertising

Tracking is the foundation of modern adtech. Each time you visit a site that serves programmatic ads, your behavioral profile is auctioned off via Real-Time Bidding (RTB), and the ads you see are tailored to you through microtargeting.

💡

Real-time bidding

Real-Time Bidding is a programmatic advertising process where your behavioural profile is sold in an auction that takes place while a web page is loading.

Every time you visit a site with ads:

A request is sent to multiple ad exchanges (platforms that connect advertisers and websites).
This request includes several pieces of information, most notably:
- Your IP address
- Device details (browser, OS, screen size)
- Geolocation
- URL you’re visiting
- Behavioural data (from cookies, trackers, fingerprinting)

Advertisers bid to show you their ad — based on how valuable your profile seems for their goal.
The highest bidder wins. Their ad is displayed. This happens in less than 100 milliseconds.

In short: you’re not just looking at a page. You’re being priced, profiled, and auctioned off in real time.
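The auction flow described above can be sketched in a few lines of Python. This is a toy model, not the protocol any real exchange uses: the field names in the request, the `toy_bidder` helper, and the advertisers are all made up for illustration, and the pricing rule is the simple "highest bidder wins" from the text (real exchanges may price differently).

```python
from dataclasses import dataclass


@dataclass
class Bid:
    advertiser: str
    price_cpm: float  # price offered per thousand impressions


def build_bid_request(ip, user_agent, geo, url, segments):
    """Bundle the fields listed above into one request (field names are illustrative)."""
    return {
        "ip": ip,
        "device": {"ua": user_agent},
        "geo": geo,
        "page": url,
        "segments": segments,  # behavioural data from cookies, trackers, fingerprinting
    }


def toy_bidder(name, base_price, wanted_segment):
    """Hypothetical advertiser: pays more when the profile matches its target segment."""
    def bid(request):
        bonus = 2.0 if wanted_segment in request["segments"] else 0.0
        return Bid(name, base_price + bonus)
    return bid


def run_auction(bids):
    """Return the highest bid, or None if nobody bid."""
    if not bids:
        return None
    return max(bids, key=lambda b: b.price_cpm)


request = build_bid_request(
    ip="203.0.113.7",  # documentation-range address, not a real user
    user_agent="Firefox/139.0 on macOS",
    geo="Milan, IT",
    url="https://example.com/article",
    segments=["running", "travel"],
)
bidders = [toy_bidder("shoe_brand", 1.0, "running"), toy_bidder("bank", 1.5, "mortgage")]
winner = run_auction([bidder(request) for bidder in bidders])
print(winner)
```

The point of the sketch is the shape of the exchange: the profile travels to every bidder, whether or not that bidder wins.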

Reason 2. Retention & Optimization

Platforms want you to stay. The longer you engage, the more valuable you become. Tracking identifies what grabs your attention — and replicates it.

Reason 3. Machine Learning Fuel

Modern AI systems, from recommendation engines to large language models, rely on vast amounts of behavioural data to learn, generalise, and predict.

Tracking provides that data, continuously and at scale.

Each click, pause, scroll, or bounce becomes a training signal. Each session adds to the dataset. Each pattern strengthens the model that defines what someone like you is likely to do next.

Examples:

YouTube’s recommender refines what to serve based on watch time, video abandonment rate, and click-through chains — all fed by real-time tracking.
Instagram and TikTok optimize dopamine loops by analyzing swipe speed, dwell time, and engagement frequency.
E-commerce platforms (like Amazon or Shein) adapt pricing, product positioning, and discount visibility based on your micro-behaviors.

And it’s not just content.

Tracking also fuels fraud detection models, sentiment analysis engines, and customer scoring systems — many of which influence decisions without your awareness or consent.

If we want to see it as a system:

Behavioral data is the oil.
Tracking is the pipeline.
Machine learning is the refinery.

Reason 4. Surveillance & Control

Whether corporate or governmental, modern surveillance systems ride on the back of tracking infrastructure.

They don’t need to build new sensors — they repurpose the ones already embedded in your browser, your phone, your apps.

Corporate surveillance

Enterprises use tracking to:

Monitor employee behavior via endpoint telemetry, session recording, keystroke analysis
Detect insider threats using behavioral baselines derived from app usage and device posture
Implement digital workplace scoring, where performance and trust are inferred from click patterns and presence signals

The same tracking logic used to serve ads can be retooled to decide if you’re “productive enough”.

Governmental surveillance

State actors exploit tracking in multiple layers:

By subpoenaing adtech platforms, they can obtain location trails, IP-to-device mapping, and behavioral fingerprints
Passive data collection (from public DNS logs, TLS metadata, etc.) allows for network-wide pattern analysis
Covert trackers embedded in state-controlled media or compromised apps can leak operational metadata in hostile environments

In some cases, corporate tracking directly feeds national intelligence systems (see: China’s data fusion practices, the NSA’s XKeyscore, or India’s Aadhaar-linked web tracking).

What started as advertising telemetry has evolved into population-scale telemetry.

💡

Telemetry

Telemetry is the automatic collection and transmission of data from a system to a remote server for monitoring, analysis, or optimization.

In the context of software and devices, it includes:

App usage patterns
System performance metrics
Error reports
Device configuration and state
Interaction logs (clicks, scrolls, keystrokes, etc.)

Telemetry is often marketed as diagnostic data.

In practice, it’s a firehose of behavioural and technical information, sent continuously — often without granular consent or visibility.

The scariest part.

And the scariest part? It’s opt-out only if you’re technical. For everyone else, it’s just there: quiet, systemic, and persistent.

Why It’s Everywhere

Invisible by Design: You’re rarely asked for real consent. And even when you are, the tracking often happens anyway, through loopholes, fingerprinting, or network metadata.
Opaque Collaboration Networks: Trackers don’t work alone. They sync identifiers across domains, share data via redirect chains, cloak domains using CNAME tricks, and respawn cookies through cache abuse.
Resilient Techniques: Blocking third-party cookies did not kill tracking. It just moved to more sophisticated methods: fingerprinting, bounce tracking, URL decoration, DNS-level correlation, TLS fingerprinting.
Incentive Architecture: The modern internet economy is based on attention extraction and behavior prediction. More tracking leads to better models, which leads to more engagement, which ultimately leads to more profit.
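URL decoration, one of the techniques named above, is easy to see in practice: a click identifier is appended to the link so it survives the jump between sites. The sketch below strips a few well-known decoration parameters from a URL. The parameter list is a small, non-exhaustive assumption, and `strip_tracking_params` is a name chosen here for illustration.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# A few widely seen click-ID and campaign parameters (not a complete list).
TRACKING_PARAMS = {"gclid", "fbclid", "msclkid"}
TRACKING_PREFIXES = ("utm_",)


def strip_tracking_params(url):
    """Return the URL with known decoration parameters removed."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in TRACKING_PARAMS
        and not key.lower().startswith(TRACKING_PREFIXES)
    ]
    return urlunsplit(parts._replace(query=urlencode(kept)))


decorated = "https://example.com/shop?item=42&utm_source=mail&fbclid=abc123"
print(strip_tracking_params(decorated))
```

A deny-list like this only catches parameters it already knows about, which is exactly why the technique is resilient.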

Why It’s a Problem

Privacy Destruction

Granular tracking makes de-anonymization trivial. Even without your name, behavioral patterns are unique enough to identify and profile you precisely.
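To make that concrete, here is a minimal sketch of how a handful of individually harmless attributes can be folded into one stable identifier. The attribute names and values are invented for illustration, and real fingerprinting scripts collect far more signals than this.

```python
import hashlib
import json


def fingerprint(attributes):
    """Hash a dict of observed attributes into a short, stable identifier."""
    # Sorting keys makes the result independent of collection order.
    canonical = json.dumps(attributes, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]


visitor_a = {
    "user_agent": "Firefox/139.0",
    "screen": "2560x1440",
    "timezone": "Europe/Rome",
    "language": "en-GB",
    "fonts": ["Helvetica", "Menlo"],
}
# Same person, same device, attributes observed in a different order.
visitor_a_again = dict(reversed(list(visitor_a.items())))
# Someone else: only the screen size differs.
visitor_b = {**visitor_a, "screen": "1920x1080"}

print(fingerprint(visitor_a) == fingerprint(visitor_a_again))
print(fingerprint(visitor_a) == fingerprint(visitor_b))
```

No name, email, or cookie is involved. The identifier comes back unchanged on every visit as long as the device does.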

Algorithmic Discrimination

Prices, visibility, and access can change based on who the algorithm thinks you are. The logic is opaque. The impact is real.

Manipulation

Tracking enables cognitive shaping: filter bubbles, nudging, hyperpersonalized content. You don’t choose what you see; it’s chosen for you.

Security Surface

Tracking creates attack surfaces: correlation vectors, side channels, forensic identifiers, and potential exfiltration points.

Tracking IS NOT Analytics

It’s important to distinguish: not all user measurement is invasive.

There are privacy-respecting analytics tools (e.g. Plausible, or self-hosted Matomo in its cookieless configuration) that don’t fingerprint, don’t use cookies, and don’t correlate sessions.

But in most real-world cases, “analytics” is a euphemism for behavioral profiling pipelines.

More About Telemetry

Mozilla Firefox

In this experiment, I will do something very straightforward: use Burp Suite Community Edition to intercept the traffic from a Firefox instance.

The browser will not be instructed to open any page. There will be no user activity, no search queries, no bookmark clicks. Just a clean launch.

Objectives

I just want to observe and analyze what Firefox does on its own, right after being opened:

What domains are contacted?
What headers or payloads are sent?
Which services get pinged before I even type a URL?
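One way to answer the first question from raw captures is to pull the Host header out of every intercepted request and count the results. The sketch below assumes the requests have been copied out of the proxy as plain text. The two samples are abbreviated from the captures shown later in this article, and `parse_request` and `count_hosts` are names chosen here, not part of any tool.

```python
from collections import Counter


def parse_request(raw):
    """Split a raw HTTP/1.1 request into method, target, and headers."""
    lines = raw.strip().splitlines()
    method, target, _version = lines[0].split(" ", 2)
    headers = {}
    for line in lines[1:]:
        if not line.strip():
            break  # a blank line ends the header section
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return {"method": method, "target": target, "host": headers.get("host", "")}


def count_hosts(raw_requests):
    """Count how many captured requests went to each host."""
    return Counter(parse_request(raw)["host"] for raw in raw_requests)


SAMPLE_REQUESTS = [
    """GET /canonical.html HTTP/1.1
Host: detectportal.firefox.com
Accept: */*
""",
    """GET /success.txt?ipv4 HTTP/1.1
Host: detectportal.firefox.com
Accept: */*
""",
]

for host, hits in count_hosts(SAMPLE_REQUESTS).most_common():
    print(host, hits)
```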

Setup

System: macOS (clean profile)

Tool: Burp Suite CE

Firefox Version: Stable release, default settings

Proxy Configuration: manual proxy to 127.0.0.1:8080

Cert: Burp CA imported and trusted via Keychain Access

Startup Mode: new profile, launched via --ProfileManager to ensure isolation

Initial Traffic Snapshot

The image below shows what happens immediately after launching Firefox, with no user activity.

As you can see, the browser sends multiple GET requests to:

/canonical.html
/success.txt?ipv4

These are directed to detectportal.firefox.com, Mozilla’s captive portal detection service.

Despite being innocuous in appearance, this behavior:

Happens without consent
Uses both HTTPS and plaintext HTTP
Leaks device presence on the network (especially over insecure Wi-Fi)

Even worse: it repeats — Firefox loops the portal check periodically while open.

canonical.html

Request

GET /canonical.html HTTP/1.1
Host: detectportal.firefox.com
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:139.0) Gecko/20100101 Firefox/139.0
Accept: */*
Accept-Language: en-GB,en;q=0.5
Accept-Encoding: gzip, deflate, br
Cache-Control: no-cache
Pragma: no-cache
Connection: keep-alive

Response

HTTP/1.1 200 OK
Server: nginx
Content-Length: 90
Via: 1.1 google
Date: Tue, 17 Jun 2025 17:55:46 GMT
Age: 52249
Content-Type: text/html
Cache-Control: public,must-revalidate,max-age=0,s-maxage=3600

<meta http-equiv="refresh" content="0;url=https://support.mozilla.org/kb/captive-portal"/>

success.txt

Request

GET /success.txt?ipv4 HTTP/1.1
Host: detectportal.firefox.com
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:139.0) Gecko/20100101 Firefox/139.0
Accept: */*
Accept-Language: en-GB,en;q=0.5
Accept-Encoding: gzip, deflate, br
Connection: keep-alive
Priority: u=4
Pragma: no-cache
Cache-Control: no-cache

Response

HTTP/1.1 200 OK
Server: nginx
Content-Length: 8
Via: 1.1 google
Date: Tue, 17 Jun 2025 19:28:55 GMT
Age: 46661
Content-Type: text/plain
Cache-Control: public,must-revalidate,max-age=0,s-maxage=3600

success
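The two exchanges above suggest how a check like this can work: fetch a known URL and compare what comes back with what was expected. The sketch below is an assumption about that general logic, not Firefox’s actual implementation. The function name and the three result labels are invented for illustration.

```python
EXPECTED_BODY = "success"  # body seen in the success.txt response above


def classify_portal_response(status, body):
    """Guess the network state from the status code and body of a probe response."""
    if status == 200 and body.strip() == EXPECTED_BODY:
        return "open"      # the probe reached the real server untouched
    if status == 200 or 300 <= status < 400:
        return "captive"   # something rewrote or redirected the probe
    return "unknown"       # errors, timeouts mapped to a status, and so on


print(classify_portal_response(200, "success\n"))
print(classify_portal_response(302, ""))
print(classify_portal_response(200, "<html>Please log in</html>"))
```

Whatever the exact logic, the probe has to leave the machine for the comparison to happen, which is why the request is visible to anyone watching the network.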

Google Chrome

With Google Chrome, the situation is radically different.

Even without user input, the browser launches with Google.com as its homepage — which means immediate and automatic tracking via:

multiple Google domains (google.com, gstatic.com, googleapis.com, etc.)
embedded service calls (ads, analytics, autofill sync, safe browsing)
JSON calls to OpenAI, extensions, and suggestion engines

I will not dissect the traffic here — not because it’s uninteresting, but because the volume of calls, third-party redirects, and embedded service interactions is nontrivial to map linearly.

Understanding Chrome’s behaviour requires correlating multiple telemetry layers, which will be the focus of a dedicated analysis later in the series.

For now, it suffices to say:

You don’t use Chrome. Chrome uses you.

Safari

Identifying Safari’s telemetry on this machine requires a completely different approach.

Unlike Chrome or Firefox, Safari relies heavily on system-level services (nsurlsessiond, locationd, apsd, suggestd, etc.), many of which operate outside the browser’s visible context and don’t send traffic directly under Safari’s name.

Additionally, most of its networking is tightly integrated with:

Apple’s private relay mechanisms (if enabled)
iCloud session sync and Siri suggestions
Launchd-driven agents, some of which trigger on startup regardless of user activity

For these reasons, a clean analysis of Safari’s background behavior requires low-level system inspection (e.g., lsof, tcpdump, or Little Snitch logs) — not just proxy interception.
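As a rough idea of what that inspection could look like, the sketch below groups remote endpoints by process from the text output of `lsof -i -n -P`. The sample output is entirely made up (documentation-range addresses, invented PIDs) and only mirrors lsof’s usual column layout. `remote_endpoints_by_process` is a name chosen here.

```python
from collections import defaultdict


def remote_endpoints_by_process(lsof_output):
    """Map each command name to the set of remote host:port pairs it talks to."""
    result = defaultdict(set)
    for line in lsof_output.splitlines():
        fields = line.split(None, 8)  # NAME is the ninth column and may contain spaces
        if len(fields) < 9 or fields[0] == "COMMAND":
            continue  # skip the header row and malformed lines
        command, name = fields[0], fields[8]
        if "->" not in name:
            continue  # listening sockets have no remote side
        remote = name.split("->", 1)[1].split()[0]
        result[command].add(remote)
    return dict(result)


# Hypothetical sample, NOT a real capture.
SAMPLE_LSOF = """\
COMMAND   PID USER  FD  TYPE DEVICE SIZE/OFF NODE NAME
nsurlsess 412 user  12u IPv4 0xabc1      0t0  TCP 192.168.1.5:51234->203.0.113.10:443 (ESTABLISHED)
apsd       98 root   9u IPv4 0xabc2      0t0  TCP 192.168.1.5:51240->198.51.100.20:5223 (ESTABLISHED)
Safari    733 user  30u IPv4 0xabc3      0t0  TCP *:49152 (LISTEN)
"""

for command, remotes in remote_endpoints_by_process(SAMPLE_LSOF).items():
    print(command, sorted(remotes))
```

Grouping by process is what makes the system daemons visible: the connections show up under their names, not Safari’s.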

I will write a dedicated article about this soon… The rabbit hole is deeper than it looks.

Other Browsers

Given the current market share and telemetry transparency (or lack thereof), I will not consider Opera in this investigation.

Despite its Chromium base, Opera routes a significant amount of traffic through its own infrastructure — including VPN-like proxies and extension bundles — making it a separate case entirely.

And honestly: no one serious in security uses Opera.

As for Microsoft Edge:

I promise I’ll dig into its artifacts as soon as I have a Windows machine under my fingers.

I promise.

(Even if that means borrowing one from a corporate graveyard.)

That’s all for now, folks. Have fun. But be a little scared, it’s cool!

Tracking: Foundations of a Systemic Theft


Gabriele Biondo
