A Software Bill of Materials (SBOM) is an inventory of open-source or commercial components used to create a product. It lists components, libraries, tools, snippets, and dependencies with the corresponding version, license, and other useful details regarding authorship, origin, and usage.

SBOMs help software vendors avoid using software components in a non-compliant way, since software products are often assembled from open-source and commercial software. They are also the basis for tracking and remediating security vulnerabilities in the deployed components.

What does the SBOM include?

Creating an accurate SBOM is important because it gives a clear view of the components that are part of a product. Two types of SBOMs can be created:

☑️ The dependency software bill of materials identifies the open-source or third-party libraries added during the build process (dependencies). They are recognized via their checksum, and the license is determined from the component metadata. A dependency SBOM is typically created automatically by the build system or by specialized tools, in a standardized format like SPDX or CycloneDX to facilitate interchange and post-processing. The analysis covers the main open-source or commercial license of each identified component.

☑️ The forensic software bill of materials is the result of a deeper analysis, in which the source code of the product and its dependencies is analyzed for embedded open-source artifacts (code snippets or files) and licenses. Identification is done at the file level. Creating a forensic SBOM typically involves code scanning and requires manual work to identify the components and licenses from the data generated by the tool. Besides findings like open-source snippets copied and integrated into the product's code, the forensic SBOM also covers modified dependencies, which could generate high risks for product owners.
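As a rough illustration of the dependency SBOM described above, the following Python sketch computes a checksum for a dependency artifact and records it, together with the license from the component metadata, in a structure shaped like a CycloneDX JSON document. The component name, version, and artifact bytes are hypothetical, the helper functions are illustrative names, and the output is not validated against the CycloneDX schema; in practice this file is produced by the build system or a dedicated tool.

```python
import hashlib
import json


def sha256_of(data: bytes) -> str:
    """Return the hex SHA-256 digest used to recognize a component."""
    return hashlib.sha256(data).hexdigest()


def make_component(name: str, version: str, license_id: str, artifact: bytes) -> dict:
    """Build one entry shaped like a CycloneDX JSON component (illustrative helper)."""
    return {
        "type": "library",
        "name": name,
        "version": version,
        "hashes": [{"alg": "SHA-256", "content": sha256_of(artifact)}],
        # License as declared in the component metadata (SPDX identifier).
        "licenses": [{"license": {"id": license_id}}],
    }


def make_sbom(components: list[dict]) -> dict:
    """Wrap component entries in a minimal CycloneDX-style envelope."""
    return {
        "bomFormat": "CycloneDX",
        "specVersion": "1.5",
        "version": 1,
        "components": components,
    }


if __name__ == "__main__":
    # Hypothetical dependency; a real tool would read the artifact bytes from disk.
    artifact = b"example artifact bytes"
    sbom = make_sbom([make_component("example-lib", "1.2.3", "MIT", artifact)])
    print(json.dumps(sbom, indent=2))
```

Note that this only captures the declared license of each dependency. Snippets copied into the product's own files would not appear here, which is the gap the forensic SBOM is meant to close.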

The Software Bill of Materials lists the OSS components and parts required to make a product in a top-down, parent-child structure. When an item is a sub-component of a parent component, it can in turn have its own child components, and so on. The resulting top-level BOM includes children that are a mix of finished sub-assemblies, libraries, and code snippets. Such a multi-level structure can be illustrated as a tree with several levels. Creating a complete forensic SBOM is important to ensure a clear view of the third-party components that are part of the product and to avoid license or security risks.
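The parent-child structure can be sketched in a few lines of Python. This is a minimal model, assuming each component has only a name, a version, and a list of children; the `Component` class and the example product are hypothetical and not taken from any SBOM standard.

```python
from dataclasses import dataclass, field


@dataclass
class Component:
    """One node of a multi-level BOM (hypothetical minimal model)."""
    name: str
    version: str
    children: list["Component"] = field(default_factory=list)


def render(component: Component, depth: int = 0) -> list[str]:
    """Return one indented line per component, parents before children."""
    lines = ["  " * depth + f"{component.name} {component.version}"]
    for child in component.children:
        lines.extend(render(child, depth + 1))
    return lines


def flatten(component: Component) -> list[Component]:
    """Return the component and all of its descendants as a flat list."""
    result = [component]
    for child in component.children:
        result.extend(flatten(child))
    return result


if __name__ == "__main__":
    # Hypothetical product: a library with its own dependency, plus a copied snippet.
    product = Component("my-product", "1.0", [
        Component("web-framework", "4.2", [Component("template-engine", "3.1")]),
        Component("copied-snippet.c", "n/a"),
    ])
    print("\n".join(render(product)))
```

The flat list is convenient for license or vulnerability checks, while the tree view shows through which parent a given component entered the product.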

Compliance plays an important role in the open-source ecosystem. One of the challenges is to identify the license associated with each open-source component and determine whether it is permissive (e.g., MIT) or restrictive (e.g., GPL). In various scenarios, libraries may contain code governed by copyleft licenses. Even a small code snippet from a source like Stack Overflow can have a significant impact on the entire software due to the strong copyleft effect. What matters most is to identify these risks early and fix them, so that the final software product is compliant from a legal point of view.
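A first-pass license check along these lines could look like the sketch below. It assumes each component carries an SPDX license identifier, and it uses a small hand-picked mapping that is illustrative only; the grouping is a simplification and no substitute for legal review.

```python
# Illustrative, incomplete groupings of SPDX license identifiers.
PERMISSIVE = {"MIT", "Apache-2.0", "BSD-2-Clause", "BSD-3-Clause"}
COPYLEFT = {
    "GPL-2.0-only", "GPL-3.0-only", "AGPL-3.0-only",
    "LGPL-3.0-only", "MPL-2.0", "CC-BY-SA-4.0",
}


def classify(license_id: str) -> str:
    """Return 'permissive', 'copyleft', or 'unknown' for an SPDX identifier."""
    if license_id in PERMISSIVE:
        return "permissive"
    if license_id in COPYLEFT:
        return "copyleft"
    return "unknown"


def needs_review(components: list[tuple[str, str]]) -> list[str]:
    """Return names of components whose license is copyleft or unknown."""
    return [name for name, lic in components if classify(lic) != "permissive"]


if __name__ == "__main__":
    # Hypothetical component list: (name, SPDX license identifier).
    components = [
        ("example-lib", "MIT"),
        ("parser-core", "GPL-3.0-only"),
        ("forum-snippet", "CC-BY-SA-4.0"),
        ("vendor-sdk", "LicenseRef-Vendor"),
    ]
    for name in needs_review(components):
        print("review:", name)
```

Treating unknown licenses as needing review, rather than silently passing them, keeps unidentified components visible until someone has looked at them.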

AI and compliance

Recently, a new factor has entered the compliance discussion: Artificial Intelligence (AI), and in particular the software code it generates. Many developers use generative AI in their daily tasks, but this comes with certain risks and challenges. While generative AI can be a powerful tool for automating aspects of coding and generating code snippets, some aspects must be considered, and one of them is licensing and legal compliance.

If a software developer simply copies and pastes the generated code, this can lead to license and copyright violations or to unmet legal requirements. Software businesses must be aware of the legal risks and ensure compliance through measures such as open-source scanning and SBOM creation. Only a forensic SBOM can provide full transparency about the ownership and licenses of AI-generated code.

Generative AI coding tools are commonly built on large language models (LLMs) trained on publicly accessible source code. This training data enables the AI system to learn coding patterns. It is essential to note that the generated code may include code copied verbatim from open-source repositories. This raises concerns about open-source license compliance, and about security, when such code is integrated into the product.
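To make the idea of detecting verbatim copies more concrete, here is a toy Python sketch that fingerprints sliding windows of whitespace-normalized lines and measures how much of a candidate file also appears in a known open-source reference. The function names and the window size are assumptions chosen for illustration; real code scanners use considerably more robust matching and large reference databases.

```python
import hashlib


def fingerprints(source: str, window: int = 3) -> set[str]:
    """Hash every run of `window` consecutive non-blank, stripped lines."""
    lines = [line.strip() for line in source.splitlines() if line.strip()]
    prints = set()
    for i in range(len(lines) - window + 1):
        chunk = "\n".join(lines[i:i + window])
        prints.add(hashlib.sha256(chunk.encode("utf-8")).hexdigest())
    return prints


def shared_fraction(candidate: str, reference: str, window: int = 3) -> float:
    """Fraction of the candidate's fingerprints that also occur in the reference."""
    cand = fingerprints(candidate, window)
    if not cand:
        # Too short to fingerprint; nothing to compare.
        return 0.0
    return len(cand & fingerprints(reference, window)) / len(cand)


if __name__ == "__main__":
    # Hypothetical reference snippet and an AI-generated candidate with different indentation.
    reference = "int a = 1;\nint b = 2;\nint c = a + b;\nreturn c;\n"
    candidate = "    int a = 1;\n    int b = 2;\n    int c = a + b;\n    return c;\n"
    print(shared_fraction(candidate, reference))
```

A high fraction only signals that a file deserves a closer look. Deciding which component and license the matched code belongs to is the manual part of forensic SBOM creation.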

Having a complete forensic SBOM is more important than ever, and it should be produced as an integral part of the project’s lifecycle.

Open-source management services

Ensure transparency and mitigate risks with a forensic software bill of materials (SBOM). Our experts are ready to guide you through open-source licensing intricacies and AI-generated code compliance.
