Introduction — hunting for assurance and OSINT

Exposed SafeBase portals serve two audiences at once: prospects who need proof of your security posture, and security researchers (blue and red teamers). Each green tick reveals a control that (supposedly) exists today; every missing tick is an equally loud hint at what doesn’t.

The detection method outlined in this post does three things:

Harvests every control title from the rendered HTML body using Nuclei in headless mode.
Normalises those titles into machine-readable findings via a single regex-based template.
Outputs the matched internal security controls as JSON or CLI results.

Why collect this data?

Compliance frameworks like ISO 27001, SOC 2, PCI DSS, and the growing body of regional privacy regulations require "evidence" that security controls are in place and functioning. SafeBase offers a convenient way to provide this without endless email exchanges chasing proof or documents.

Scraping your own public trust center might seem unnecessary since you created it, but there are other good reasons to do so:

Continuous assurance – Auditors increasingly prefer ongoing evidence.
Change detection – Policy titles, groupings, and even control names can change after a platform update or acquisition. Automated scraping can flag these changes before customers notice.
Third-party mappings – Converting the list into JSON lets you cross-reference it against control libraries (e.g., NIST SP 800-53) and help demonstrate coverage automatically.
For blue teams, this helps keep public claims accurate; for red teams, it provides a quick comparison against your own control library to identify gaps or "weaknesses".
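A minimal sketch of the third-party mapping idea, assuming you already have the set of matcher names that fired for one portal. The crosswalk to NIST SP 800-53 control IDs is a hand-written, illustrative assumption (as are the `acceptable-use-policy` and `mfa` names), not an authoritative mapping. Keep in mind that a matched title is evidence of a public claim, not proof that the control operates.

```python
# Hypothetical crosswalk from template matcher names to NIST SP 800-53 control IDs.
# The pairings are illustrative assumptions, not an official mapping.
CONTROL_MAP: dict[str, list[str]] = {
    "firewall": ["SC-7"],                  # Boundary Protection
    "incident-response-policy": ["IR-1"],  # Incident Response: Policy and Procedures
    "acceptable-use-policy": ["PL-4"],     # Rules of Behavior (hypothetical matcher name)
}


def coverage(found: set[str], control_map: dict[str, list[str]]) -> dict[str, list[str]]:
    """Split mapped control IDs into evidenced and missing, and list unmapped matches."""
    all_ids = {cid for ids in control_map.values() for cid in ids}
    covered = {cid for name in found for cid in control_map.get(name, [])}
    return {
        "covered": sorted(covered),
        "missing": sorted(all_ids - covered),
        # Matched on the portal, but no crosswalk entry exists yet.
        "unmapped": sorted(found - set(control_map)),
    }


# Example: matcher names that fired for one portal ("mfa" is a made-up name).
report = coverage({"firewall", "mfa"}, CONTROL_MAP)
print(report)
```

The "unmapped" bucket is deliberate: it surfaces controls the portal advertises that your crosswalk has not yet accounted for.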

Detection

While you can view each page manually, a script is far easier at scale, so I used ProjectDiscovery's Nuclei for detection. Nuclei is a fast scanner whose engine is driven by YAML templates. That suits this job well: we can write a custom template tailored to exactly what we need and run it quickly across many targets to spot controls, or changes to them.

Matchers

The template includes 70 matchers. The full list is in the repo, but here is a glimpse of what the regex matchers look like for "Firewall" and "Incident Response Policy":

      - type: regex
        name: firewall
        regex: ['(?i)Firewall(?:[\s\S]{0,600}?data-testid="enabled")?']

      - type: regex
        name: incident-response-policy
        regex: ['(?i)Incident\s+Response\s+Policy(?:[\s\S]{0,600}?data-testid="enabled")?']

A few regex tricks keep the matching robust against small differences in the page markup:

Problem	Fix
Different capitalisation	`(?i)` flag
Mixed runs of spaces / new‑lines	`\s+` between words
Unknown gap between title and icon	`[\s\S]{0,600}?`
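To see what these tricks do, and how the optional icon group behaves, here is a small Python sketch. `control_pattern` is a hypothetical helper that rebuilds the template's pattern from a title, and the sample HTML is made up (SafeBase's real markup will differ). Nuclei is written in Go and uses Go's regexp engine, which treats these particular constructs the same way as Python's `re`.

```python
import re


def control_pattern(title: str, max_gap: int = 600, require_enabled: bool = False) -> str:
    """Build a matcher regex for a control title (hypothetical helper)."""
    # \s+ between words tolerates any run of spaces or newlines.
    words = r"\s+".join(re.escape(word) for word in title.split())
    # Lazy [\s\S]{0,N}? gap between the title and the enabled icon attribute.
    icon = r'(?:[\s\S]{0,%d}?data-testid="enabled")' % max_gap
    # (?i) ignores case; the template's trailing "?" makes the icon group optional.
    return r"(?i)" + words + (icon if require_enabled else icon + "?")


# Made-up markup for illustration only.
ENABLED_HTML = '<h3>Incident\n  response POLICY</h3><span><svg data-testid="enabled"></svg></span>'
DISABLED_HTML = '<h3>Incident Response Policy</h3><span><svg data-testid="disabled"></svg></span>'

lenient = control_pattern("Incident Response Policy")
strict = control_pattern("Incident Response Policy", require_enabled=True)

for label, html in (("enabled", ENABLED_HTML), ("disabled", DISABLED_HTML)):
    print(label, bool(re.search(lenient, html)), bool(re.search(strict, html)))
```

This prints `enabled True True` and then `disabled True False`. Because the icon group ends in `?`, the lenient pattern fires on the title alone, whether or not the enabled icon follows. If you only want enabled controls, drop the trailing `?` (the `require_enabled=True` variant), bearing in mind that a 600-character gap can occasionally reach the next control's icon.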

Eliminating False Positives

To keep false positives down, the template adds an extra DSL matcher:

      - type: dsl
        name: names
        dsl:
          - 'status_code == 200'
          - 'contains_any(body, "Powered by SafeBase")'
        condition: and

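This DSL matcher checks two things, and both must be true for it to match:

Whether the target server returns a status code of 200.
Whether "Powered by SafeBase" appears in the HTML body.

Keep in mind that Nuclei combines a template's matchers with OR unless `matchers-condition: and` is set, so on its own this matcher does not stop the control regexes from firing on non-SafeBase pages. If your target list includes arbitrary hosts, filter the results on this matcher as well.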
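Writing the same gate out in Python makes the AND-gating explicit: control matches only count once the page is confirmed as SafeBase. This is a sketch under assumptions: `is_safebase_page`, `matched_controls` and the two-entry pattern dictionary are hypothetical, and a plain case-sensitive substring test stands in for `contains_any`.

```python
import re

# Hypothetical subset of the template's control patterns.
PATTERNS = {
    "firewall": r'(?i)Firewall(?:[\s\S]{0,600}?data-testid="enabled")?',
    "incident-response-policy": r'(?i)Incident\s+Response\s+Policy(?:[\s\S]{0,600}?data-testid="enabled")?',
}


def is_safebase_page(status_code: int, body: str) -> bool:
    # Same two conditions as the DSL matcher: HTTP 200 AND the SafeBase footer text.
    return status_code == 200 and "Powered by SafeBase" in body


def matched_controls(status_code: int, body: str, patterns: dict[str, str]) -> list[str]:
    # Evaluate control regexes only after the page is confirmed as SafeBase (AND gate).
    if not is_safebase_page(status_code, body):
        return []
    return sorted(name for name, pattern in patterns.items() if re.search(pattern, body))


print(matched_controls(200, "<h3>Firewall</h3> ... Powered by SafeBase", PATTERNS))
print(matched_controls(200, "<h1>Next-gen Firewall appliances</h1>", PATTERNS))
```

The second call returns an empty list even though "Firewall" appears, which is exactly the false positive the gate is meant to suppress.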

Bypassing Cloudflare WAF

In every case where I encountered SafeBase, a Cloudflare WAF blocked my web scraping tests. Although there are other techniques, such as padding and residential proxies, I found that Nuclei's headless capabilities worked best. This approach is also accessible to everyone, since it doesn't require paying for an additional service.

The headless step (Chromium plus a one-second sleep) lets the browser execute Cloudflare's Managed Challenge JavaScript, after which it proceeds to the SafeBase page.

With the page in hand, we can start pattern-matching.

headless:
  - steps:
      - action: navigate
        args:
          url: "{{BaseURL}}"
      - action: sleep
        args:
          duration: 1s

navigate – opens the target portal in the headless Chromium instance that Nuclei manages. {{BaseURL}} is replaced with each URL in your scan list.
sleep 1s – pauses execution so any JavaScript (including Cloudflare's Managed Challenge) can run, cookies can be set, and the page's dynamic content can render. One second is enough for most portals, but raise it if your first runs return incomplete HTML.

If you aren't seeing results where you know you should, your IP is likely being blocked; in that case, I'd recommend routing the scan through a proxy.

Scanning at Scale

Now for the best part: scanning infrastructure at scale to gather this information.

1. Build your target list

domains.txt (one URL per line):
https://trust.gitlab.com
https://security.projectdiscovery.io
https://trust.your-vendor-here.com

You can also add existing hosts to this input list, gathered from your subdomain enumeration or web crawling operations.
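Hosts from subdomain enumeration usually arrive as bare hostnames, often with duplicates and stray paths. A small Python sketch can normalise them into one base URL per line before you write domains.txt. The `to_targets` helper is hypothetical, and it assumes bare hostnames should be probed over HTTPS at the root path.

```python
from urllib.parse import urlsplit


def to_targets(lines: list[str]) -> list[str]:
    """Normalise raw hosts/URLs into unique scheme://host base URLs (hypothetical helper)."""
    targets: dict[str, None] = {}  # dicts keep insertion order, so this de-duplicates in order
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if "://" not in line:
            line = "https://" + line  # assumption: bare hostnames are served over HTTPS
        parts = urlsplit(line)
        # Drop any path or query so each host is scanned once, from its root.
        targets[f"{parts.scheme.lower()}://{parts.netloc.lower()}"] = None
    return list(targets)


raw_hosts = [
    "trust.gitlab.com",
    "https://TRUST.gitlab.com/",
    "security.projectdiscovery.io/some/path",
    "# comment",
    "",
]
print("\n".join(to_targets(raw_hosts)))
```

Redirect the printed list into domains.txt, or write it from your own enumeration script.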

Example scan against security.projectdiscovery.io

2. Fire up Nuclei with concurrency

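# -hbs 50: up to 50 hosts loaded in headless Chromium at once. This is resource intensive, so adjust accordingly.
# -jsonl: structured JSON Lines output for SIEM/CI (drop it for plain-text output).
nuclei \
  -t safebase-checks-enabled.yaml \
  -headless \
  -l domains.txt \
  -hbs 50 \
  -jsonl \
  -o safebase-scan.json

-l feeds the target file. For headless templates, -hbs (headless bulk size) caps how many hosts are processed in parallel per template; -c only limits how many templates run at once, so it has little effect with a single template.
Output can go to stdout, a plain-text file (-o) or JSON Lines (-jsonl). Ship it to S3, Slack or your SIEM for alerting.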
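Once results are in JSON Lines form, grouping them per host takes only a few lines of Python. This sketch assumes one JSON object per matched named matcher, with the target in `host` and the matcher in `matcher-name` (check the field names against your Nuclei version's output). It skips the `names` SafeBase-gate matcher from the template; the sample events and the `controls_by_host` helper are illustrative.

```python
import json
from collections import defaultdict

GATE_MATCHER = "names"  # the SafeBase DSL matcher from the template, not a control


def controls_by_host(jsonl_text: str) -> dict[str, list[str]]:
    """Group matched control names per host from Nuclei JSON Lines output."""
    found: defaultdict[str, set[str]] = defaultdict(set)
    for line in jsonl_text.splitlines():
        if not line.strip():
            continue  # tolerate blank lines
        event = json.loads(line)
        name = event.get("matcher-name")
        if name and name != GATE_MATCHER:
            found[event.get("host", "unknown")].add(name)
    return {host: sorted(names) for host, names in sorted(found.items())}


# Made-up events; real output carries more fields, such as template-id and info.
sample = "\n".join(
    json.dumps(event)
    for event in [
        {"host": "https://trust.example.com", "matcher-name": "firewall"},
        {"host": "https://trust.example.com", "matcher-name": "incident-response-policy"},
        {"host": "https://trust.example.com", "matcher-name": "names"},
        {"host": "https://trust.example.com", "matcher-name": "firewall"},
    ]
)
print(json.dumps(controls_by_host(sample), indent=2))
```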

The template can be found here:
https://github.com/rxerium/internal-security-detect

3. Cron it in CI/CD

You can automate this process using GitHub Actions:

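on:
  schedule:
    - cron: "0 6 * * *"   # daily at 06:00 UTC; adjust to taste
  workflow_dispatch:
jobs:
  safebase-scan:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-go@v5
        with:
          go-version: stable
      - name: Install Nuclei
        run: go install -v github.com/projectdiscovery/nuclei/v3/cmd/nuclei@latest
      - name: Run scan
        # Hosted runners are small, so keep headless parallelism modest
        run: nuclei -t safebase-checks-enabled.yaml -headless -l domains.txt -hbs 10 -jsonl -o results.json
      - name: Upload artefact
        uses: actions/upload-artifact@v4
        with:
          name: safebase-results
          path: results.json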
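Since the job runs repeatedly, the natural follow-up is comparing consecutive runs to catch the title and control changes mentioned at the start. A minimal sketch, assuming each run has already been reduced to a host-to-control-names mapping (for example by grouping the `matcher-name` values in the JSON output). `diff_runs` is a hypothetical helper, and how you store the previous run (a cached artefact, a commit, a bucket) is left to you.

```python
def diff_runs(
    previous: dict[str, list[str]], current: dict[str, list[str]]
) -> dict[str, dict[str, list[str]]]:
    """Per host, list controls that appeared or disappeared between two scans."""
    changes: dict[str, dict[str, list[str]]] = {}
    for host in sorted(previous.keys() | current.keys()):
        before = set(previous.get(host, []))
        after = set(current.get(host, []))
        if before != after:
            changes[host] = {
                "added": sorted(after - before),
                "removed": sorted(before - after),
            }
    return changes


previous_run = {
    "https://trust.example.com": ["firewall", "incident-response-policy"],
    "https://a.example.com": ["firewall"],
}
current_run = {
    "https://trust.example.com": ["firewall"],
    "https://a.example.com": ["firewall"],
    "https://new.example.com": ["firewall"],
}
for host, delta in diff_runs(previous_run, current_run).items():
    print(host, delta)
```

A host whose controls all vanish at once is more likely a Cloudflare block or timeout than a real change, so treat wholesale removals as a prompt to re-scan rather than an immediate alert.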

Final thoughts

While this technique is limited to organisations running SafeBase, it is a powerful example of how passive OSINT can intersect with compliance, assurance, and detection goals. By leveraging publicly accessible portals, security researchers can gain insight into control frameworks, review an organisation's public claims about its internal security, and automate change detection, all without login credentials or direct access.
