How to Use APIs to Stop Crawlers From Harming My Site


You ever check your logs and think,
“Why the hell is OpenAI crawling me 20,000 times a day?”
I did. And what followed was a wake-up call about how broken bot detection really is — and how to fix it properly with two simple API integrations: BotDetect and GeoIP ASN Lookup.
Let’s break it down.
🚨 The Problem: Fake Bots Everywhere
I run a relatively high-traffic site: ~20,000 real human visits/day.
One week, my logs blew up.
Over 500,000 bot requests per day
Most pretending to be:
"OpenAI-UserAgent"
"Googlebot/2.1"
"bingbot"
But my analytics (and common sense) told a different story:
Only a handful of real visits from OpenAI or Googlebot.
Then came the bigger hit:
😡 AdSense Policy Center flagged me with:
Crawler – Unknown Error
Status: Restricted Ad Serving
Boom. Revenue dropped.
Why? Turns out I was unknowingly blocking Google’s own Media Partners crawler (user agent Mediapartners-Google, used by AdSense), which doesn’t use the standard Googlebot IP ranges.
🔍 Step 1 – IP-Based Bot Verification with BotDetect API
The first step was dropping the naive approach of checking User-Agent headers (which are laughably easy to fake).
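To see how little effort spoofing takes, here’s a minimal sketch using Python’s standard library. It builds a request that claims to be Googlebot without sending anything; example.com is a placeholder target, not a site from this post.

```python
from urllib.request import Request

# Build (but do not send) a request that claims to be Googlebot.
# The URL is a placeholder; any client can set this header to anything.
spoofed = Request(
    "https://example.com/",
    headers={"User-Agent": "Googlebot/2.1"},
)

# urllib stores header names via str.capitalize(), hence "User-agent".
claimed_ua = spoofed.get_header("User-agent")
print(claimed_ua)
```

One header argument is all it takes, which is why the User-Agent on its own proves nothing about who is actually connecting.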
I switched to BotDetect API:
✅ How it works:
Send the visitor’s IP (optionally User-Agent)
It tells you:
Is this a known bot?
Which vendor (Google, OpenAI, Bing, etc)?
Is the IP in their official IP space?
🧪 Example:
POST /api/bot-detect
{
  "ip": "203.0.113.45",
  "ua": "OpenAI-UserAgent"
}
Response:
{
  "is_bot": true,
  "vendor": "openai",
  "verified": false
}
✅ If not verified → it’s a fake.
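Here’s a rough sketch of how that rule could be applied to a response shaped like the one above. The function name and the three labels are made up for illustration, and it assumes ordinary visitors come back with is_bot set to false, which the example response doesn’t actually show.

```python
def classify(result: dict) -> str:
    """Map a BotDetect-style response to one of three illustrative labels."""
    # Assumption: non-bot traffic is reported with is_bot == False.
    if not result.get("is_bot", False):
        return "not_a_bot"
    # The IP sits inside the vendor's official IP space.
    if result.get("verified", False):
        return "verified_bot"
    # Claims to be a known bot, but the IP does not back that up.
    return "spoofed_bot"


sample = {"is_bot": True, "vendor": "openai", "verified": False}
print(classify(sample))  # spoofed_bot
```

Keeping "not a bot" separate from "spoofed bot" matters: only the second group should ever be blocked on this signal.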
In our case?
99.2% of the “OpenAI” traffic was spoofed from cheap VPS IPs.
🧠 Step 2 – Avoid Overblocking With GeoIP ASN Lookup
Great — now we were blocking all the fakes.
But AdSense still wasn’t happy.
Here’s the twist:
Google’s Media Partners crawler uses IPs outside the Googlebot ranges to validate ads and policy compliance.
They don’t show up in Googlebot CIDRs.
So I added a second check:
✅ GeoIP ASN Lookup.
🔎 It tells you:
What ASN the IP belongs to (e.g., AS15169 for Google)
Which org owns it (Google LLC, Amazon, Hetzner, etc)
Country and netblock metadata
Example:
{
  "asn": "AS15169",
  "org": "Google LLC",
  "country": "US"
}
So now, if BotDetect verification fails but the ASN still belongs to Google, we let it through.
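One way to express that fallback is to compare the ASN against a small allowlist. This is only a sketch: the set and the function name are illustrative, and AS15169 is the only Google ASN mentioned in this post, so check any others yourself before adding them.

```python
# Illustrative allowlist; AS15169 is the only ASN named in the post.
GOOGLE_ASNS = {"AS15169"}


def is_google_network(asn_info: dict) -> bool:
    """Return True when a GeoIP ASN lookup result points at a listed Google ASN."""
    asn = str(asn_info.get("asn", "")).upper()
    return asn in GOOGLE_ASNS


sample = {"asn": "AS15169", "org": "Google LLC", "country": "US"}
print(is_google_network(sample))  # True

# AS64496 is reserved for documentation; the org name here is invented.
print(is_google_network({"asn": "AS64496", "org": "Google Fan Hosting"}))  # False
```

Matching on the ASN is stricter than looking for "Google" in the org name, since an org string can contain that word without belonging to Google. Keep in mind that an ASN-level pass also lets through anything else hosted on that network.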
🧰 The Final Logic
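$response = callBotDetect($ip, $ua);
$asn = callGeoIPASN($ip);
// Only traffic flagged as a bot but not verified is a blocking candidate;
// without the is_bot check, ordinary visitors would be blocked too.
$isFakeBot = $response['is_bot'] && !$response['verified'];
// str_contains() requires PHP 8+.
if ($isFakeBot && !str_contains($asn['org'], 'Google')) {
    blockOrRateLimit($ip);
} else {
    allowThrough($ip);
}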
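To sanity-check the decision logic without calling either API, here’s a small Python sketch that takes the two lookups as plain functions and feeds it stubbed data. Everything in it is illustrative: the function names, the lookup tables, and the IPs and ASN (documentation-reserved values, apart from the IP in the earlier example). It also assumes only traffic flagged as is_bot should ever be blocked.

```python
from typing import Callable


def decide(ip: str, ua: str,
           bot_lookup: Callable[[str, str], dict],
           asn_lookup: Callable[[str], dict]) -> str:
    """Return "allow" or "block" for one request."""
    bot = bot_lookup(ip, ua)
    # Assumption: visitors not flagged as bots, and verified crawlers, pass.
    if not bot.get("is_bot") or bot.get("verified"):
        return "allow"
    # Unverified bot: fall back to the ASN owner, as in the post.
    if "Google" in asn_lookup(ip).get("org", ""):
        return "allow"
    return "block"


# Stubbed lookup results standing in for the two API calls.
BOT_TABLE = {
    "192.0.2.10": {"is_bot": False, "verified": False},     # ordinary visitor
    "198.51.100.7": {"is_bot": True, "verified": False},    # unverified, Google ASN
    "203.0.113.45": {"is_bot": True, "verified": False},    # spoofed, cheap VPS
}
ASN_TABLE = {"198.51.100.7": {"asn": "AS15169", "org": "Google LLC"}}


def stub_bot_lookup(ip: str, ua: str) -> dict:
    return BOT_TABLE[ip]


def stub_asn_lookup(ip: str) -> dict:
    return ASN_TABLE.get(ip, {"asn": "AS64496", "org": "Example VPS"})


for ip in BOT_TABLE:
    print(ip, decide(ip, "OpenAI-UserAgent", stub_bot_lookup, stub_asn_lookup))
```

Passing the lookups in as arguments keeps the rule itself testable, and the ASN lookup only runs for the requests that actually need the fallback.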
📊 The Results
| Metric | Before | After |
| --- | --- | --- |
| Total bot traffic | ~500K/day | ~20K/day |
| Real OpenAI hits | ~0.8% of traffic claiming to be OpenAI | Accurately verified |
| AdSense status | Restricted Ad Serving | ✅ Clean |
| Revenue | Declining | ✅ Recovered |
| Server load | Overwhelmed | ✅ Normalized |
💡 Key Takeaways
Never trust User-Agent strings
IP verification is essential
But ASN-level checks prevent overblocking
AdSense and crawlers are more fragile than they seem
Combine multiple data points before making traffic decisions
🛠 Tools Used
🔍 BotDetect API
IP-based verification for Google, OpenAI, Bing, etc.
🌐 GeoIP ASN Lookup API
Returns org, ASN, and metadata for any IP.
🧪 Try It Yourself
Want to test your own traffic?
Start hitting the APIs directly with any language.
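As a starting point, here’s a minimal Python sketch that builds the bot-detect request from the example earlier. The host is a placeholder (this post only shows the path /api/bot-detect), the function name is made up, and no authentication is included because none is described here.

```python
import json
from urllib.request import Request

# Placeholder host; swap in the real base URL from your API provider.
BASE_URL = "https://api.example.com"


def build_bot_detect_request(ip: str, ua: str = "") -> Request:
    """Build the POST request shown in the post; the User-Agent is optional."""
    payload = {"ip": ip}
    if ua:
        payload["ua"] = ua
    return Request(
        BASE_URL + "/api/bot-detect",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_bot_detect_request("203.0.113.45", "OpenAI-UserAgent")
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
```

To actually send it, pass the request to urllib.request.urlopen() and parse the JSON body of the response. Add whatever API key header your provider requires first.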
Written by Nia Rouseberg
