How to Use APIs to Stop Crawlers from Harming My Site

Nia Rouseberg
3 min read

You ever check your logs and think,

“Why the hell is OpenAI crawling me 20,000 times a day?”

I did. And what followed was a wake-up call about how broken bot detection really is — and how to fix it properly with two simple API integrations: BotDetect and GeoIP ASN Lookup.

Let’s break it down.


🚨 The Problem: Fake Bots Everywhere

I run a relatively high-traffic site: ~20,000 real human visits/day.

One week, my logs blew up.

  • Over 500,000 bot requests per day

  • Most pretending to be:

    • "OpenAI-UserAgent"

    • "Googlebot/2.1"

    • "bingbot"

But my analytics (and common sense) told a different story:
Only a handful of real visits from OpenAI or Googlebot.

Then came the bigger hit:

😡 AdSense Policy Center flagged me with:

Crawler – Unknown Error
Status: Restricted Ad Serving

Boom. Revenue dropped.

Why? It turned out I was unknowingly blocking Google's own AdSense crawler (Mediapartners-Google), which doesn't use the traditional Googlebot IP ranges.


🔍 Step 1 – IP-Based Bot Verification with BotDetect API

The first step was dropping the naive approach of checking User-Agent headers (which are laughably easy to fake).
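
To see how little effort spoofing takes, here is a minimal Python sketch. It builds (but never sends) a request that claims to be Googlebot; the URL is a placeholder and the function name is made up for illustration.

```python
import urllib.request


def build_spoofed_request(url: str, fake_ua: str) -> urllib.request.Request:
    """Build (but do not send) a request carrying an arbitrary User-Agent."""
    # Any client can set this header to any string; nothing ties it to the sender.
    return urllib.request.Request(url, headers={"User-Agent": fake_ua})


req = build_spoofed_request("https://example.com/", "Googlebot/2.1")
# urllib stores header names in capitalized form, hence "User-agent".
print(req.get_header("User-agent"))
```

One line of client code is enough to impersonate any crawler, which is why the header alone proves nothing about who sent the request.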

I switched to BotDetect API:

✅ How it works:

  • Send the visitor’s IP (and optionally the User-Agent)

  • It tells you:

    • Is this a known bot?

    • Which vendor (Google, OpenAI, Bing, etc.)?

    • Is the IP in their official IP space?
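
The last question, whether an IP sits in a vendor's official IP space, comes down to a CIDR membership test. The Python sketch below shows the idea with the standard `ipaddress` module. How BotDetect does this internally isn't documented here, so treat this as an assumption about the general technique; the prefixes are illustrative sample values, not an authoritative list.

```python
import ipaddress

# Illustrative sample only: a real deployment should load the vendor's
# current published ranges instead of hardcoding them.
SAMPLE_RANGES = {
    "google": ["66.249.64.0/27", "2001:4860:4801:10::/64"],
}


def ip_in_vendor_ranges(ip: str, vendor: str, ranges=SAMPLE_RANGES) -> bool:
    """Return True if ip falls inside any CIDR block listed for vendor."""
    addr = ipaddress.ip_address(ip)
    for cidr in ranges.get(vendor, []):
        net = ipaddress.ip_network(cidr)
        # Only compare addresses of the same IP version.
        if addr.version == net.version and addr in net:
            return True
    return False


print(ip_in_vendor_ranges("66.249.64.5", "google"))
print(ip_in_vendor_ranges("203.0.113.45", "google"))
```

Unlike a header, the source IP of an established connection is hard to forge, so the check is only as good as the range list behind it.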

🧪 Example:

POST /api/bot-detect
{
  "ip": "203.0.113.45",
  "ua": "OpenAI-UserAgent"
}

Response:

{
  "is_bot": true,
  "vendor": "openai",
  "verified": false
}

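If is_bot is true but verified is false, it’s a fake: the request claims to be a crawler but comes from outside that vendor’s official IP space.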
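
A minimal Python sketch of handling that exchange. The field names (`ip`, `ua`, `is_bot`, `vendor`, `verified`) come from the example above; the function names and the three labels are hypothetical.

```python
import json
from typing import Optional


def build_payload(ip: str, ua: Optional[str] = None) -> str:
    """Serialize the request body; the User-Agent is optional."""
    body = {"ip": ip}
    if ua is not None:
        body["ua"] = ua
    return json.dumps(body)


def classify(result: dict) -> str:
    """Map a BotDetect-style response to a traffic label."""
    if not result.get("is_bot", False):
        return "not_a_bot"
    if result.get("verified", False):
        return "verified_bot"
    # Claims to be a bot, but the IP is outside the vendor's official space.
    return "fake_bot"


sample = {"is_bot": True, "vendor": "openai", "verified": False}
print(build_payload("203.0.113.45", "OpenAI-UserAgent"))
print(classify(sample))
```

Missing fields are treated as false here, so an incomplete response is never labeled as a verified bot.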

In our case?
99.2% of the “OpenAI” traffic was spoofed from cheap VPS IPs.


🧠 Step 2 – Avoid Overblocking With GeoIP ASN Lookup

Great, now we were blocking all the fakes.
But AdSense still wasn’t happy.

Here’s the twist:
Google’s Mediapartners crawler uses non-Googlebot IPs to validate ads and policy compliance.

They don’t show up in Googlebot CIDRs.

So I added a second check:
GeoIP ASN Lookup.

🔎 It tells you:

  • What ASN the IP belongs to (e.g., AS15169 for Google)

  • Which org owns it (Google LLC, Amazon, Hetzner, etc.)

  • Country and netblock metadata

Example:

{
  "asn": "AS15169",
  "org": "Google LLC",
  "country": "US"
}

So now, if BotDetect fails but ASN is still Google, we let it through.
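
A Python sketch of that fallback, assuming the response shape shown above. It matches on the ASN number rather than the org name, which is a stricter variation on the check described here, not necessarily the exact rule used on my site. The helper names and the `TRUSTED_ASNS` set are hypothetical.

```python
TRUSTED_ASNS = {15169}  # AS15169 (Google), taken from the example above


def parse_asn(value: str) -> int:
    """Turn 'AS15169' (or '15169') into the integer 15169."""
    text = value.strip().upper()
    if text.startswith("AS"):
        text = text[2:]
    return int(text)


def is_trusted_network(asn_info: dict, trusted=TRUSTED_ASNS) -> bool:
    """Return True if the lookup result belongs to a trusted ASN."""
    raw = str(asn_info.get("asn") or "")
    try:
        return parse_asn(raw) in trusted
    except ValueError:
        # Missing or malformed ASN: do not treat the network as trusted.
        return False


print(is_trusted_network({"asn": "AS15169", "org": "Google LLC", "country": "US"}))
```

Comparing numbers avoids false matches on any organization whose name merely contains the word "Google".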


🧰 The Final Logic

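$response = callBotDetect($ip, $ua);
$asn = callGeoIPASN($ip);

// Only traffic that claims to be a bot needs verification; regular visitors pass.
$claimsBot = !empty($response['is_bot']);
$verified = !empty($response['verified']);
// Fallback for Google crawlers that live outside the Googlebot ranges.
$isGoogleNet = str_contains($asn['org'] ?? '', 'Google'); // str_contains needs PHP 8+

if ($claimsBot && !$verified && !$isGoogleNet) {
    blockOrRateLimit($ip);
} else {
    allowThrough($ip);
}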
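
For readers outside PHP, here is a Python sketch of the decision with the two lookups passed in as plain functions, so the rule can be exercised without network access. It makes one assumption explicit: requests where `is_bot` is false are allowed through untouched. The stub lookups return made-up sample data and stand in for the real API calls.

```python
from typing import Callable


def decide(ip: str, ua: str,
           bot_detect: Callable[[str, str], dict],
           asn_lookup: Callable[[str], dict]) -> str:
    """Return 'block' or 'allow' for one request."""
    result = bot_detect(ip, ua)
    claims_bot = bool(result.get("is_bot", False))
    verified = bool(result.get("verified", False))
    org = asn_lookup(ip).get("org") or ""
    # Block only unverified bots that are not on a Google network.
    if claims_bot and not verified and "Google" not in org:
        return "block"
    return "allow"


# Stubs standing in for the two API calls (sample data, not real results).
def fake_bot_lookup(ip, ua):
    return {"is_bot": True, "vendor": "openai", "verified": False}


def human_lookup(ip, ua):
    return {"is_bot": False, "vendor": None, "verified": False}


def vps_asn_lookup(ip):
    return {"asn": "AS64500", "org": "Example Hosting", "country": "DE"}


def google_asn_lookup(ip):
    return {"asn": "AS15169", "org": "Google LLC", "country": "US"}


print(decide("203.0.113.45", "OpenAI-UserAgent", fake_bot_lookup, vps_asn_lookup))
print(decide("203.0.113.45", "Mediapartners-Google", fake_bot_lookup, google_asn_lookup))
```

Passing the lookups in as arguments also makes it easy to add caching or swap providers later without touching the rule itself.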

📊 The Results

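| Metric            | Before                | After               |
|-------------------|-----------------------|---------------------|
| Total bot traffic | ~500K/day             | ~20K/day            |
| Real OpenAI hits  | ~0.8% of all traffic  | Accurately verified |
| AdSense status    | Restricted Ad Serving | ✅ Clean            |
| Revenue           | Declining             | ✅ Recovered        |
| Server load       | Overwhelmed           | ✅ Normalized       |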

💡 Key Takeaways

  • Never trust User-Agent strings

  • IP verification is essential

  • But ASN-level checks prevent overblocking

  • AdSense and crawlers are more fragile than they seem

  • Combine multiple data points before making traffic decisions


🛠 Tools Used

  • 🔍 BotDetect API

    IP-based verification for Google, OpenAI, Bing, etc.

  • 🌐 GeoIP ASN Lookup API

    Returns org, ASN, and metadata for any IP.


🧪 Try It Yourself

Want to test your own traffic? Start hitting the APIs directly from any language.

