SBOMgen: A Fully Automated SBOM Generator with Added Vulnerability Detection

Divij Sharma

Hello readers! Hope you all are doing great :)

I'm filled with immense pride and a touch of awe as I share a creation that's very dear to us. It's a product not just of code and concepts, but of collective wisdom and spirited teamwork. We built it in the bustling innovation arena of the Smart India Hackathon, and we proudly present to you "SBOMgen: A Fully Automated SBOM Generator with Added Vulnerability Detection".

Problem Statement:

We took on an exciting challenge, SIH1449, from the National Technical Research Organisation (NTRO). Our goal was clear yet complex: develop a tool that generates a detailed software bill of materials (SBOM) for custom software, including in-house developments by various organizations. NTRO also asked for the tool to identify vulnerabilities in the dependencies used, a kind of checker that highlights security flaws.

Before going further, what exactly is an SBOM? Imagine you're handed a complex gadget and want to know what's inside it. An SBOM, or Software Bill of Materials, is a detailed list of every part that makes up a piece of software. Just as a recipe lists all the ingredients needed to make a dish, an SBOM lists all the components, libraries, and bits of code used to build a piece of software. It's a crucial tool for understanding what's under the hood of the software we use every day, especially when it comes to spotting potential vulnerabilities hiding in those components. The code block below shows an example SBOM in JSON format.

{
    "bomFormat": "CycloneDX",
    "specVersion": "1.5",
    "version": "1",
    "metadata": {
        "timestamp": "2024-01-15T16:55:11.431824",
        "component": {
            "group": "",
            "name": "Drawn2Shoe",
            "version": "0.0.0",
            "type": "application"
        }
    },
    "components": [
        {
            "group": "@mapbox",
            "name": "node-pre-gyp",
            "version": "1.0.11",
            "scope": "optional",
            "purl": "pkg:npm/%40mapbox%2Fnode-pre-gyp@1.0.11",
            "type": "library",
            "bom-ref": "pkg:npm/@mapbox/node-pre-gyp@1.0.11",
            "evidence": {
                "identity": {
                    "field": "purl",
                    "confidence": 1,
                    "methods": [
                        {
                            "technique": "manifest-analysis",
                            "confidence": 1,
                            "value": "/home/dvjsharma/Dev/Drawn2Shoe/server/package-lock.json"
                        }
                    ]
                }
            },
            "properties": [
                {
                    "name": "SrcFile",
                    "value": "/home/dvjsharma/Dev/Drawn2Shoe/server/package-lock.json"
                }
            ]
        }
    ],
    "services": [],
    "dependencies": [
        {
            "ref": "pkg:npm/@mapbox/node-pre-gyp@1.0.11",
            "dependsOn": [
                "pkg:npm/detect-libc@2.0.2",
                "pkg:npm/https-proxy-agent@5.0.1",
                "pkg:npm/make-dir@3.1.0",
                "pkg:npm/node-fetch@2.7.0",
                "pkg:npm/nopt@5.0.0",
                "pkg:npm/npmlog@5.0.1",
                "pkg:npm/rimraf@3.0.2",
                "pkg:npm/semver@6.3.1",
                "pkg:npm/semver@7.5.4",
                "pkg:npm/tar@6.2.0"
            ]
        }
    ]
}

Please note: For simplicity and clarity, the above SBOM displays only a single component and dependency. In reality, SBOMs are comprehensive documents encompassing numerous components and dependencies, often reaching considerable size and complexity. Find the complete SBOM here.

With this SBOM, security professionals can anticipate potential vulnerabilities in components or dependencies, facilitating a thorough security analysis of the software. Additionally, it's worth noting that SBOMs come in various formats, and for this project, we opted for the widely recognized CycloneDX format endorsed by OWASP.
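To make the structure above concrete, here is a small sketch that reads a CycloneDX document like the sample and prints the dependency tree encoded in its "dependencies" array, roughly the kind of view the --tree option described later produces. It is an illustration written for this post, not SBOMgen's code; the helper names dependency_map and render_tree are hypothetical.

```python
import json

# Trimmed CycloneDX document mirroring the sample above (one component, one dependency entry).
SAMPLE_SBOM = """
{
  "bomFormat": "CycloneDX",
  "specVersion": "1.5",
  "components": [
    {"group": "@mapbox", "name": "node-pre-gyp", "version": "1.0.11",
     "bom-ref": "pkg:npm/@mapbox/node-pre-gyp@1.0.11"}
  ],
  "dependencies": [
    {"ref": "pkg:npm/@mapbox/node-pre-gyp@1.0.11",
     "dependsOn": ["pkg:npm/detect-libc@2.0.2", "pkg:npm/tar@6.2.0"]}
  ]
}
"""


def dependency_map(sbom: dict) -> dict[str, list[str]]:
    """Map each bom-ref to the bom-refs it depends on, read from the 'dependencies' array."""
    return {entry["ref"]: entry.get("dependsOn", []) for entry in sbom.get("dependencies", [])}


def render_tree(ref: str, deps: dict[str, list[str]], depth: int = 0,
                ancestors: frozenset = frozenset()) -> list[str]:
    """Return indented lines for ref and its transitive dependencies.

    A ref that already appears on the current path is printed once more but not expanded,
    so dependency cycles cannot cause infinite recursion.
    """
    lines = ["  " * depth + ref]
    if ref in ancestors:
        return lines
    for child in deps.get(ref, []):
        lines.extend(render_tree(child, deps, depth + 1, ancestors | {ref}))
    return lines


if __name__ == "__main__":
    sbom = json.loads(SAMPLE_SBOM)
    deps = dependency_map(sbom)
    for component in sbom["components"]:
        print("\n".join(render_tree(component["bom-ref"], deps)))
```

Running it prints node-pre-gyp followed by its two listed dependencies, each indented by two spaces. Following "dependsOn" references rather than nesting components is what lets one library appear under several parents without being duplicated in the document.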

Now that we understand what an SBOM is, we can better grasp the problem statement: the goal was to create a comprehensive tool that generates SBOMs and checks them for potential vulnerabilities. Let's proceed to the project structure!

Project Overview:

Since this project is meant for industrial use, several factors and use cases had to be taken into account:

  • Multi-Language Support: For SBOM generation, the safest method is to scan the lock file of each language/package manager and extract all the information in a common format. Since much industry-standard software is written in multiple languages (a single product might combine a React frontend with a Python-based backend, for example), multi-language support is essential to ensure that the generated SBOM is complete.

  • Vulnerability Detection: A method should be implemented to scan the entire SBOM, generate a vulnerability report, and notify project managers about the known vulnerabilities in a timely manner.

  • Automation: Since SBOM generation is time-consuming, SBOM generation and vulnerability detection should be automated to run at regular intervals. This saves project managers from triggering the process manually and ensures long-term project security, especially for projects that are no longer maintained, or whose repositories are dead, but whose software is still in use. Additionally, the relevant personnel should be notified promptly about any detected vulnerabilities so they can take further action.

  • Internet Access: Since this project is for NTRO, there is a high likelihood that it will be deployed in situations where the internet is not available, particularly for security-sensitive projects. Therefore, the solution must be designed to accommodate both scenarios: users with internet access and users without internet access.
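At its core, the vulnerability-detection requirement means comparing each component in the SBOM against a database of known advisories, and this works offline as long as that database is stored locally. The toy sketch below shows the idea with a tiny in-memory advisory list. The Advisory structure, the match function and all advisory data are hypothetical, and it assumes plain numeric versions; real scanners such as OWASP depscan handle far richer version schemes and data sources.

```python
from dataclasses import dataclass


def parse_version(version: str) -> tuple[int, ...]:
    """Turn '1.0.11' into (1, 0, 11). Assumes plain numeric dotted versions (no pre-release tags)."""
    return tuple(int(part) for part in version.split("."))


@dataclass(frozen=True)
class Advisory:
    advisory_id: str  # hypothetical identifier, not a real CVE
    package: str
    introduced: str   # first affected version (inclusive)
    fixed: str        # first fixed version (exclusive)


def match(components: list[dict], advisories: list[Advisory]) -> list[tuple[str, str, str]]:
    """Return (package, version, advisory_id) for every component inside an affected range."""
    findings = []
    for comp in components:
        installed = parse_version(comp["version"])
        for adv in advisories:
            in_range = parse_version(adv.introduced) <= installed < parse_version(adv.fixed)
            if adv.package == comp["name"] and in_range:
                findings.append((comp["name"], comp["version"], adv.advisory_id))
    return findings


if __name__ == "__main__":
    # Locally stored, entirely made-up advisory data: no network access is needed.
    local_db = [Advisory("EXAMPLE-0001", "example-lib", "2.0.0", "2.4.1")]
    sbom_components = [{"name": "example-lib", "version": "2.3.9"},
                       {"name": "other-lib", "version": "1.0.0"}]
    for package, version, advisory in match(sbom_components, local_db):
        print(f"{package}@{version} is affected by {advisory}")
```

Comparing integer tuples rather than strings matters here: as text, "1.0.11" sorts before "1.0.2", which would silently misclassify versions.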

Keeping all of this in mind, we developed the following structure:

Global Package & CLI Tool

Considering the demands of the project, we developed a CLI tool that generates SBOMs and supports the following languages and package managers:

Each parser recursively searches the entire project directory for its lock file (or equivalent), parses it into a common format, and appends the result to a global SBOM object, which is finally written to a file. Because every parser in the list runs against the input software, no language or its dependencies is left out of the SBOM, even for projects that mix several languages.
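The parse-and-merge flow described above can be sketched as a registry mapping file names to parser functions, with every registered parser applied to every matching file found anywhere under the project. This is a minimal sketch under stated assumptions: PARSERS, parse_requirements and generate_sbom are hypothetical names rather than SBOMgen's real module layout, the toy parser only records pinned name==version lines, and purl normalisation is simplified to lower-casing.

```python
import json
import tempfile
from pathlib import Path
from typing import Callable

# Hypothetical parser signature: path of a lock/manifest file -> CycloneDX-style components.
Parser = Callable[[Path], list[dict]]


def parse_requirements(path: Path) -> list[dict]:
    """Toy parser: only pinned 'name==version' lines of a requirements.txt are recorded."""
    components = []
    for raw in path.read_text().splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        if "==" not in line:
            continue  # unpinned requirement: no exact version to report
        name, version = (part.strip() for part in line.split("==", 1))
        components.append({
            "type": "library",
            "name": name,
            "version": version,
            "purl": f"pkg:pypi/{name.lower()}@{version}",
            "properties": [{"name": "SrcFile", "value": str(path)}],
        })
    return components


# File name -> parser. A full tool would register one entry per supported package manager.
PARSERS: dict[str, Parser] = {"requirements.txt": parse_requirements}


def generate_sbom(project_dir: Path) -> dict:
    """Run every registered parser on every matching file anywhere under project_dir."""
    sbom = {"bomFormat": "CycloneDX", "specVersion": "1.5", "components": []}
    for filename, parser in PARSERS.items():
        for found in sorted(project_dir.rglob(filename)):  # recursive search
            sbom["components"].extend(parser(found))
    return sbom


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        backend = Path(tmp, "backend")
        backend.mkdir()
        (backend / "requirements.txt").write_text("Flask==3.0.0\nrequests  # unpinned\n")
        print(json.dumps(generate_sbom(Path(tmp)), indent=2))
```

Adding a language then means writing one parser and one registry entry; the discovery and merging logic stays untouched.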

Developing these parsers was challenging because formats are inconsistent across package managers. Each package manager has its own rules, which leads to varied formats and content in lock files. Additionally, some package managers do not generate lock files at all, so the entire project has to be built to extract its dependencies. This is why we could not support dependencies or sub-dependencies for certain package managers. We are still exploring ways to extract them without building the project, since building would require extensive scripting and resources, making it impractical to deploy on action servers.
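As an example of how much one format involves, npm's package-lock.json (lockfileVersion 2 and 3) records every installed package under a "packages" map keyed by its node_modules path, while each entry's "dependencies" lists only version ranges. Turning those ranges into concrete edges means mimicking Node's lookup: check the package's own nested node_modules first, then walk up toward the project root. The sketch below is a simplified, hypothetical parser written for this post (it ignores workspaces, bundled and peer dependencies), not SBOMgen's implementation.

```python
from typing import Optional


def name_from_path(path: str) -> str:
    """'node_modules/a/node_modules/@scope/b' -> '@scope/b'."""
    return path.rsplit("node_modules/", 1)[-1]


def resolve(packages: dict, parent: str, dep: str) -> Optional[str]:
    """Find the lock entry Node would load for `dep` when required from `parent`.

    Looks in parent's own node_modules first, then walks up one node_modules level at a time.
    """
    base = parent
    while True:
        candidate = f"{base}/node_modules/{dep}" if base else f"node_modules/{dep}"
        if candidate in packages:
            return candidate
        if not base:
            return None  # reached the project root without a match
        idx = base.rfind("node_modules/")
        base = base[:idx].rstrip("/") if idx > 0 else ""


def parse_package_lock(lock: dict) -> tuple[list[dict], list[dict]]:
    """Build CycloneDX-style components and dependency edges from a v2/v3 package-lock.json."""
    packages = lock.get("packages", {})

    def bom_ref(path: str) -> str:
        return f"pkg:npm/{name_from_path(path)}@{packages[path]['version']}"

    components, dependencies = [], []
    for path, meta in packages.items():
        if "node_modules/" not in path or "version" not in meta:
            continue  # skip the root project ("") and linked entries without a version
        group, _, name = name_from_path(path).rpartition("/")  # '@mapbox/x' -> ('@mapbox', '/', 'x')
        components.append({
            "group": group,
            "name": name,
            "version": meta["version"],
            "scope": "optional" if meta.get("optional") else "required",
            "type": "library",
            "bom-ref": bom_ref(path),
        })
        wanted = {**meta.get("dependencies", {}), **meta.get("optionalDependencies", {})}
        resolved = (resolve(packages, path, dep) for dep in wanted)
        depends_on = sorted(bom_ref(t) for t in resolved if t and "version" in packages[t])
        dependencies.append({"ref": bom_ref(path), "dependsOn": depends_on})
    return components, dependencies
```

Because resolution depends on where each package physically sits in node_modules, two packages can depend on different installed versions of the same library, which a name-only lookup would get wrong.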

The choice of scripting language played a crucial role in developing the parsers. We needed a fast language, and after weighing Python against Go, we opted for Python, mainly because of its extensive libraries and familiar syntax, which made development more efficient.

Furthermore, to ease SBOM generation, we also published the scripts as global pip packages that can be installed on any OS (we relied on Python libraries to ensure cross-platform support).

## For Actions and CI/CD (fewer features)
pip install sbomgen
sbomgen --help
## For CLI-based users (all features)
pip install sbomgencli
pip install owasp-depscan # to download the NVD security database locally
sbomgen --help

This global package is the central hub of the entire project's workflow. It was a significant breakthrough, giving us a streamlined way to automate the process with GitHub Actions (covered later). The beauty of it is that the packages can be installed on any remote machine with a simple pip install command.

The CLI tool also supports the following options:

  • -p: the project path; both absolute and relative paths work.

  • -f: the SBOM output format, either JSON or XML.

  • --vul: if specified, generates a vulnerability report for the repository.

  • --tree: if specified, generates the dependency tree of the repository.

  • --check: if specified, generates a report listing all outdated dependencies along with the newer versions available (not available in the pip package).

You can access all sample pictures here

If none of these options are provided, an interactive prompt asks for the same inputs. Zip files are also supported as input.
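The option handling described above maps naturally onto Python's standard argparse module. The sketch below is a hypothetical reconstruction, not SBOMgen's real entry point: the program name, the resolve_options and project_dir helpers, the lower-casing of the format value and the per-option prompts are all assumptions, and it only prints what it would scan.

```python
import argparse
import sys
import tempfile
import zipfile
from pathlib import Path
from typing import Callable


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="sbomgen-sketch",
                                     description="Hypothetical CLI mirroring the options above")
    parser.add_argument("-p", dest="path", help="project directory or .zip file (absolute or relative)")
    parser.add_argument("-f", dest="format", type=str.lower, choices=["json", "xml"],
                        help="SBOM output format")
    parser.add_argument("--vul", action="store_true", help="also generate a vulnerability report")
    parser.add_argument("--tree", action="store_true", help="also generate the dependency tree")
    parser.add_argument("--check", action="store_true", help="list outdated dependencies")
    return parser


def resolve_options(argv: list[str], ask: Callable[[str], str] = input) -> argparse.Namespace:
    """Parse argv, then prompt interactively for any required value that was not given."""
    args = build_parser().parse_args(argv)
    if args.path is None:
        args.path = ask("Project path (directory or .zip): ").strip()
    while args.format not in ("json", "xml"):  # keep asking until the answer is valid
        args.format = ask("Output format [json/xml]: ").strip().lower()
    return args


def project_dir(path: str, workdir: Path) -> Path:
    """Return the directory to scan, extracting a .zip archive into workdir first."""
    source = Path(path).expanduser().resolve()  # relative paths are resolved against the CWD
    if source.suffix.lower() == ".zip":
        with zipfile.ZipFile(source) as archive:
            archive.extractall(workdir)  # only extract archives you trust
        return workdir
    return source


if __name__ == "__main__":
    opts = resolve_options(sys.argv[1:])
    with tempfile.TemporaryDirectory() as tmp:
        target = project_dir(opts.path, Path(tmp))
        print(f"would scan {target} as {opts.format} "
              f"(vul={opts.vul}, tree={opts.tree}, check={opts.check})")
```

Passing the prompt function in as a parameter keeps the interactive fallback testable without a terminal.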

Access the CLI tool repository here

Automation: GitHub Actions & Cron Jobs

As previously mentioned, automating the SBOM generation and vulnerability detection process is extremely important. Consider a scenario where the software is being used in an industrial setting but the repository is no longer being maintained. In such cases, regular scanning and detection of vulnerabilities becomes crucial in order to ensure software security. To address this, we have developed the following system.

  • GitHub Actions: Since GitHub is a widely adopted platform for hosting software projects, running a check there is invaluable. We've set up an automated process that triggers the SBOM generation and vulnerability detection workflow on every new push or pull request. This proactive approach ensures that project managers are notified about potential vulnerabilities before a pull request is merged into the main codebase. In the GitHub Actions configuration, we specify an Ubuntu machine that clones the repository and uses our global pip package to run SBOM generation and vulnerability detection. The generated reports are stored as GitHub Artifacts and, on each trigger, emailed to the project manager. This setup helps ensure that code with known vulnerabilities is identified and addressed before it is merged into the clean codebase.

  • Cron Jobs: While GitHub Actions catches vulnerabilities introduced with each push, existing code still needs periodic checks, because new vulnerabilities are regularly disclosed in dependencies that have not changed. To address this, we've implemented cron jobs: scheduled tasks that run at fixed intervals and independently scan the existing codebase for vulnerabilities that may have surfaced over time. Together with the per-push checks, these periodic scans cover the entire repository, so vulnerabilities in existing code are identified and addressed promptly.
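Both the push-triggered workflow and the cron job end the same way: run the scanner, then mail the report. The Python sketch below illustrates that final step under several assumptions: the report file location, the lower-case "json" value for -f and a plain SMTP relay that needs no login are placeholders, and build_command, build_report_email and run_and_notify are hypothetical helpers rather than part of SBOMgen.

```python
import smtplib
import subprocess
from email.message import EmailMessage
from pathlib import Path


def build_command(project: str, fmt: str = "json", vul: bool = True) -> list[str]:
    """Assemble an sbomgen call from the -p, -f and --vul options described earlier."""
    cmd = ["sbomgen", "-p", project, "-f", fmt]
    if vul:
        cmd.append("--vul")
    return cmd


def build_report_email(report: Path, repo: str, sender: str, recipient: str) -> EmailMessage:
    """Wrap a generated report file in an email addressed to the project manager."""
    msg = EmailMessage()
    msg["Subject"] = f"[SBOMgen] SBOM and vulnerability report for {repo}"
    msg["From"] = sender
    msg["To"] = recipient
    msg.set_content(f"The latest report generated for {repo} is attached.")
    msg.add_attachment(report.read_bytes(), maintype="application",
                       subtype="octet-stream", filename=report.name)
    return msg


def run_and_notify(project: str, report: Path, repo: str,
                   smtp_host: str, sender: str, recipient: str) -> None:
    """Run the scan (raising if sbomgen exits non-zero), then mail the report."""
    subprocess.run(build_command(project), check=True)
    with smtplib.SMTP(smtp_host) as smtp:  # assumes an SMTP relay that needs no login
        smtp.send_message(build_report_email(report, repo, sender, recipient))
```

In a workflow, this would run after the checkout and pip install steps. GitHub Actions can also run the same job periodically through its schedule trigger, which takes cron syntax.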

Access the Config YAML files for the actions here

Graphical User Interface:

Although this project is primarily designed as a command-line interface (CLI) tool, a graphical user interface (GUI) adds an extra layer of versatility. We explore two distinct scenarios to cater to a broader user base. The first involves users without internet access, which calls for a self-contained solution. The second is tailored to users with internet access, allowing more dynamic interactions and real-time updates. This dual approach keeps the tool accessible and functional in diverse environments, giving a seamless experience to users with varying connectivity, especially where security is a concern and software is used locally on a private network.

GitHub-Based GUI Design: For Users with Internet Access

This app specifically targets project managers, providing them with a secure way to log in and access SBOM and vulnerability reports. The app operates on the following architecture.

In this architecture, the traditional database is replaced entirely: data is fetched securely from GitHub Artifacts using the GitHub APIs. But how does the magic happen? The actions and cron jobs deployed in the previous section play a crucial role. These automated processes store all the generated reports in GitHub Artifacts, where they are kept for a configurable retention period. It's like having a secure vault of information right at our fingertips, protected by GitHub's own security. Moreover, the backend retrieves only the specific data requested, minimizing unnecessary access and keeping interactions with GitHub Artifacts focused. This both reinforces the project's security and makes data retrieval more efficient. But how do we decide who can access which reports? This is where GitHub Authentication comes in: project managers sign in with their GitHub accounts and can then trigger actions on every repository where they have write access.
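The backend described here is Express, but the data flow is compact enough to show in Python against GitHub's REST API: list a repository's artifacts (GET /repos/{owner}/{repo}/actions/artifacts), choose the newest unexpired report, and download its zip archive. The artifact name "sbom-report" and all helper names are hypothetical, and the token is assumed to be the OAuth token obtained when the project manager signed in with GitHub.

```python
import json
import urllib.error
import urllib.parse
import urllib.request
from typing import Optional

API = "https://api.github.com"


def auth_headers(token: str) -> dict[str, str]:
    """Headers for GitHub's REST API, using the OAuth token from the GitHub sign-in."""
    return {
        "Accept": "application/vnd.github+json",
        "Authorization": f"Bearer {token}",
        "X-GitHub-Api-Version": "2022-11-28",
    }


def list_artifacts(owner: str, repo: str, name: str, token: str) -> dict:
    """Fetch one page of the repository's artifacts, filtered by artifact name."""
    query = urllib.parse.urlencode({"name": name, "per_page": 100})
    url = f"{API}/repos/{owner}/{repo}/actions/artifacts?{query}"
    with urllib.request.urlopen(urllib.request.Request(url, headers=auth_headers(token))) as resp:
        return json.load(resp)


def latest_artifact(listing: dict, name: str) -> Optional[dict]:
    """Pick the newest unexpired artifact with the given name from a list-artifacts response."""
    live = [a for a in listing.get("artifacts", []) if a["name"] == name and not a["expired"]]
    return max(live, key=lambda a: a["created_at"], default=None)  # ISO 8601 sorts as text


class _NoRedirect(urllib.request.HTTPRedirectHandler):
    def redirect_request(self, req, fp, code, msg, headers, newurl):
        return None  # surface the redirect so the token is not forwarded to the storage host


def download_artifact(artifact: dict, token: str) -> bytes:
    """Download an artifact's zip: GitHub replies 302 with a short-lived download URL."""
    opener = urllib.request.build_opener(_NoRedirect)
    req = urllib.request.Request(artifact["archive_download_url"], headers=auth_headers(token))
    try:
        with opener.open(req) as resp:
            return resp.read()
    except urllib.error.HTTPError as err:
        if err.code != 302:
            raise
        location = err.headers["Location"]
    with urllib.request.urlopen(location) as resp:  # pre-signed URL: no Authorization header
        return resp.read()
```

Fetching only the one artifact a manager asks for matches the "retrieve only what is requested" idea above, and GitHub's own permission checks on the token decide whether the request succeeds at all.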

In a nutshell, instead of using our local resources to create the SBOM and vulnerability reports, we use GitHub as a secure intermediary: we trigger actions that perform the tasks for us and store the results, which we then retrieve via APIs and display on the frontend. This makes the app very easy to set up, since its dependency requirements are minimal; the user just has to set up a basic MERN app. And since GitHub's servers are fast, the whole process is quick as well.

Tech Stack
Our setup includes a React frontend with Redux for state management, backed by Express for the server. GitHub Authentication ensures secure access. For styling, we've kept it clean with TailwindCSS and incorporated various React libraries for design and UI enhancements.

Local App: For Users without Internet Access

This dedicated app is designed for users without internet access and, by its nature, has more dependency requirements: the sbomgen and depscan packages must be installed. Although the original plan was to build an Electron app, time constraints meant we could only develop a web app. The application works without internet access because the entire vulnerability database is downloaded during the depscan installation process. Users simply enter the repository path, and the complete SBOM and vulnerability reports are generated and stored in a local SQL database for future reference. A user-friendly UI displays all the reports and offers features such as sorting and searching.
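The local app keeps its reports in MySQL. To stay self-contained, the sketch below uses Python's built-in sqlite3 module as a stand-in, which is an assumption and not what the app ships with. It stores the components of a CycloneDX SBOM and supports the search-and-sort behaviour described above; the table layout and the store_sbom and search helpers are hypothetical.

```python
import sqlite3

SORTABLE_COLUMNS = {"project", "name", "version"}


def store_sbom(conn: sqlite3.Connection, project: str, sbom: dict) -> None:
    """Save every component of a CycloneDX SBOM as one row tagged with its project."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS components ("
        "project TEXT, grp TEXT, name TEXT, version TEXT, purl TEXT)"  # 'group' is an SQL keyword
    )
    rows = [(project, c.get("group", ""), c["name"], c.get("version", ""), c.get("purl", ""))
            for c in sbom.get("components", [])]
    conn.executemany("INSERT INTO components VALUES (?, ?, ?, ?, ?)", rows)
    conn.commit()


def search(conn: sqlite3.Connection, term: str, order_by: str = "name") -> list[tuple]:
    """Substring search on component names (case-insensitive for ASCII in SQLite), sorted."""
    if order_by not in SORTABLE_COLUMNS:  # column names cannot be bound parameters
        raise ValueError(f"cannot sort by {order_by!r}")
    sql = f"SELECT project, name, version FROM components WHERE name LIKE ? ORDER BY {order_by}"
    return conn.execute(sql, (f"%{term}%",)).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")  # a file path here would persist reports between runs
    store_sbom(conn, "Drawn2Shoe", {"components": [
        {"group": "@mapbox", "name": "node-pre-gyp", "version": "1.0.11"},
        {"name": "tar", "version": "6.2.0"},
        {"name": "node-fetch", "version": "2.7.0"},
    ]})
    print(search(conn, "node"))
```

The search term is passed as a bound parameter, while the sort column is checked against a whitelist, since SQL placeholders can only stand in for values, not identifiers.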

Tech Stack
The frontend is built with React and backed by an Express server, with a MySQL database storing SBOM information.

Enhancements & Conclusion:

In conclusion, SBOMgen stands as a testament to the power of collaboration and innovation. Born from the challenges posed by the Smart India Hackathon, our fully automated SBOM Generator with Added Vulnerability Detection addresses the critical need for comprehensive software transparency and security.

Our journey does not end here; we remain committed to refining, expanding, and adapting SBOMgen to meet the evolving needs of the software development community. Thank you for joining us on this exciting journey; we look forward to a future where transparency and security go hand in hand in software development. The following ideas are currently under development:

  • We already support multiple languages and package managers, but we still need to find a way to generate SBOMs from compiled images, not just raw source files.

  • The tool should support scanning of Debian packages, Docker images, and operating systems.

  • We currently support the two most common scenarios, one based on GitHub and one using a local app. A different approach is needed for scenarios that fit neither.

  • The frontend could definitely use a revamp :)

If you have any suggestions for the project, feel free to hit me up on Twitter or LinkedIn.

Thanks for reading!

P.S. Thanks to the awesome team without whom this wouldn't have been possible: Divyansh, Akash, Gaurangi, Rishikesh & Ajay. Special thanks to Shivkant Sir & Priyansh Sir for their continuous help and guidance.



Written by

Divij Sharma

Passionate about open source development and writing. SIH'23 Finalist, Codeforces Specialist, and 3-star coder on CodeChef. Currently studying Computer Science at IIIT-J (Class of '26).