SFA /1 – Conceptualising SFA: Building the Foundation (Version 1)


When I first sat down to create the Single File AI-Agent (SFA), my goal was ambitious yet straightforward: to streamline security analysis for smart contracts with a fully local, AI-driven system. The inspiration came directly from a YouTube video on "Single File AI Agents", which explored the idea of building robust, modular AI solutions without external dependencies. That video sparked my interest and shaped my vision from the start.

Why SFA?

The tedious, repetitive nature of security auditing, especially in the blockchain space, convinced me there had to be a better way. Traditionally, the process has meant manually reading lengthy Markdown audit reports, cross-referencing vulnerability databases, and performing in-depth smart contract audits, all of it time-consuming and error-prone.

I aimed for a system that could:

  • Automatically download and process Markdown-formatted security audit reports from GitHub.

  • Compute semantic embeddings for quick and effective information retrieval.

  • Store structured data locally in a SQLite database to ensure privacy and autonomy.

  • Utilise a local language model (via Ollama) to summarise vulnerabilities and suggest mitigations in real time (a minimal sketch of this step follows the list).
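
To make that last point concrete, here is a minimal sketch of how the summarisation step can talk to a local Ollama instance over its HTTP API. The model name ("llama3"), the prompt wording, and the summarise_vulnerability helper are illustrative assumptions, not the exact code from version 1:

import requests

def summarise_vulnerability(finding: str, model: str = "llama3") -> str:
    # Ask the local Ollama server (default port 11434) to summarise a
    # finding and suggest mitigations; "stream": False returns one JSON blob.
    response = requests.post(
        "http://localhost:11434/api/generate",
        json={
            "model": model,  # assumed model; any locally pulled model works
            "prompt": "Summarise this smart contract vulnerability and "
                      f"suggest mitigations:\n\n{finding}",
            "stream": False,
        },
        timeout=120,
    )
    response.raise_for_status()
    return response.json()["response"]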

Building a Strong Base

I began by setting up a local SQLite database with tables dedicated to reports and code audits. The reports table looked like this:

import sqlite3

def init_db(db_path="vectorisation.db"):
    # Create (or open) the local database and make sure the reports
    # table exists before any ingestion runs.
    conn = sqlite3.connect(db_path)
    c = conn.cursor()
    c.execute('''
        CREATE TABLE IF NOT EXISTS reports (
            id TEXT PRIMARY KEY,        -- unique report identifier
            source TEXT,                -- originating URL
            content TEXT,               -- raw Markdown content
            overall_embedding TEXT,     -- document-level embedding, stored as text
            section_embeddings TEXT,    -- per-section embeddings, stored as text
            analysis_summary TEXT,      -- LLM-generated summary
            metadata TEXT               -- any additional metadata
        )
    ''')
    conn.commit()
    conn.close()

This database kept the data consistent in one local file and made future retrieval straightforward.
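
Version 1 also created a companion table for code audits. Its exact schema isn't shown here, so treat the following as a hypothetical sketch with assumed column names:

import sqlite3

def init_code_audits_table(db_path="vectorisation.db"):
    # Hypothetical companion table for storing smart contract audit runs;
    # the column names are assumptions, not the original schema.
    conn = sqlite3.connect(db_path)
    c = conn.cursor()
    c.execute('''
        CREATE TABLE IF NOT EXISTS code_audits (
            id TEXT PRIMARY KEY,
            contract_source TEXT,    -- audited contract code or its location
            findings TEXT,           -- raw findings text
            analysis_summary TEXT    -- LLM-generated summary
        )
    ''')
    conn.commit()
    conn.close()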

Early Challenges and Solutions

One initial challenge was handling GitHub Markdown URLs. Many links pointed to files that needed conversion from "blob" URLs to "raw" URLs for direct downloading. To handle this, I created a helper function:

def convert_to_raw_url(url: str) -> str:
    # Rewrite a GitHub "blob" page URL into its raw-content equivalent
    # so the Markdown file can be downloaded directly.
    if "github.com" in url and "/blob/" in url:
        return url.replace("github.com", "raw.githubusercontent.com").replace("/blob/", "/")
    return url
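
Here is how the helper slots into the download step. The fetch_markdown wrapper and the example URL are placeholders of my own; the original fetching code isn't shown in this post:

import requests

def fetch_markdown(url: str) -> str:
    # Normalise the GitHub URL, then download the raw Markdown content.
    raw_url = convert_to_raw_url(url)
    resp = requests.get(raw_url, timeout=30)
    resp.raise_for_status()
    return resp.text

# Example: a "blob" link becomes a raw link before downloading.
report = fetch_markdown("https://github.com/owner/repo/blob/main/audit.md")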

Initial Results

[Image: output of vectorised reports matching the query 'precision loss']

Although version 1 lacked sophisticated error handling, it successfully demonstrated that the system could autonomously download Markdown files, compute embeddings using the sentence-transformers library, and store structured data locally. While rudimentary, the initial results were promising, affirming that the core concept of a fully local, AI-driven tool was viable.
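
For reference, the embedding step with sentence-transformers can be as small as the sketch below. The model choice ("all-MiniLM-L6-v2") is a common default and an assumption on my part; the JSON serialisation simply matches the TEXT columns in the schema above:

import json
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # assumed model choice

def embed_text(text: str) -> str:
    # Compute a dense vector for the text and serialise it to JSON so it
    # fits the TEXT embedding columns in the reports table.
    return json.dumps(model.encode(text).tolist())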

So, that's a start. See you in the next one.

pxng0lin.

