Many AI models claim to be open source, but often they only share partial code, impose restrictive licenses (e.g., Llama, MosaicML, Gemma), or release just the final product with no transparency. In reality, “open source” is sometimes just a marketing tactic.
“Open source” should mean transparency, collaboration, and freedom, but in AI the term is often misused. Companies label models as open while hiding key parts; that is open‑source washing.
This matters because unclear or custom licenses can create legal and compliance risks, make businesses dependent on a single vendor, and compromise a company’s reputation. In practice, many so-called “open” releases share only partial components of the model, keep training data and code private, use licenses that limit how you can use the results, and disclose little about how the model was built or tested.
What is open-source washing?
Open-source washing is a misleading marketing practice in which companies present their AI models or software as “open source” but do not fully share the source code or data, or attach restrictive licenses.
How Does Open-Source Washing Happen in AI?
There are several common tactics that companies apply:
Restrictive licensing terms
- Meta Llama (3.x / 2): Meta’s Llama Community License allows broad use but adds conditions: e.g., organisations with >700M monthly active users must request a special license, and there are brand/attribution and acceptable‑use requirements. It’s a custom license (not OSI‑approved), so despite “open” positioning, it’s not open source.
- Meta Llama 3.x in the EU: Meta’s Llama 3.x models are restricted in the European Union due to regulatory concerns. The EU’s AI Act and GDPR require high levels of transparency, accountability, and data protection, which create compliance challenges for advanced AI systems. To avoid legal risks, Meta has chosen not to release multimodal versions of Llama in the EU, limiting availability to text-only models under specific licensing terms. These restrictions are regulatory rather than technical, reflecting concerns about sensitive data and responsible AI use.
- Google Gemma: Google calls Gemma “open”, but its license adds restrictions that break core open-source rules. For example, it obliges you to pass Google’s use restrictions on to anyone you share the model with, and you cannot freely distribute modified versions without following Google’s terms, which conflicts with free redistribution. It also limits how the model can be used and shared, which goes against the principle of free use. In short, Gemma is open-weight, not fully open source.
Lack of transparency in the development process
- Mistral (e.g., Mistral 7B, Mixtral 8×7B): Mistral has explicitly stated it can’t share training data details “due to the highly competitive nature of the field,” limiting visibility into dataset composition and provenance.
- Meta Llama 3 (and Llama 2): the training data is not disclosed in detail, and the license is not OSI‑approved. Even with partial evaluation notes, the community can’t fully audit data provenance or reproduce training.
- Google Gemma (Gemma 3): Google uses a custom license with extra rules, and its documentation doesn’t fully reveal what data was used or how the model was trained. Because of this lack of transparency, Gemma doesn’t meet true open-source standards.
Dual licensing
- Meta Llama (3.x / 2): Meta provides Llama under a Community License but requires a separate, special license if your product or your affiliates exceed 700 million monthly active users, a use‑based gate that shifts you to a different commercial arrangement. This effectively creates two licensing tracks for the same model.
- Stability AI – Stable Diffusion (SDXL / SD3 family): Stability offers a Community License that is free only if an individual or organisation earns less than $1 million a year. Anyone making more than that must buy an Enterprise license. The company calls this “flexible licensing,” but in reality it gives different access to the same models based on revenue.
How can we fix this?
It is not enough to label a release as “open” while withholding critical components and information. To truly address the transparency challenges in AI, organisations must embrace the original open-source principles. By embracing these values, they encourage progress, build trust, and ensure that openness is more than a marketing label:
- Check that open-source AI model releases include the source code, training data, configuration files, license documentation, and environment setup guides (a minimal checklist sketch follows this list).
- Perform independent reviews or audits to verify the openness of AI models.
- Involve open-source communities in decision-making around model release practices and support forums for feedback, issue tracking, and collaborative improvement.
- Share case studies of successful open-source AI projects that enabled innovation and collaboration.
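As a rough illustration of the first recommendation, the sketch below audits a local model release folder for the kinds of artifacts a genuinely open release should ship. The directory layout and file names (LICENSE, config.json, requirements.txt, and so on) are assumptions made for illustration, not a formal standard; a real audit would adapt the checklist to the project at hand.

```python
from pathlib import Path

# Illustrative checklist: the artifact names below are assumptions about what a
# transparent release might contain, not a formal or universal standard.
CHECKLIST = {
    "license text": ["LICENSE", "LICENSE.md", "LICENSE.txt"],
    "source / training code": ["train.py", "training", "src"],
    "model configuration": ["config.json", "config.yaml"],
    "training data or data card": ["data", "DATA_CARD.md", "datasheet.md"],
    "environment setup guide": ["requirements.txt", "environment.yml", "SETUP.md"],
}

def audit_release(release_dir: str) -> dict[str, bool]:
    """Report which checklist items are present in a local model release folder."""
    root = Path(release_dir)
    return {
        item: any((root / name).exists() for name in candidates)
        for item, candidates in CHECKLIST.items()
    }

if __name__ == "__main__":
    # Hypothetical path; point this at the release you want to inspect.
    for item, present in audit_release("./my-model-release").items():
        status = "found" if present else "MISSING"
        print(f"{item:<30} {status}")
```

A check like this only verifies that the expected artifacts exist; judging whether the license terms and data documentation are genuinely open still requires a human or independent audit, as noted above.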