Open ML/AI Definitions

Generally accepted definitions of open ML/AI are a must for Free and Open ML/AI systems and software. But despite several attempts, there is still no generally accepted de facto definition of open ML/AI software and applications.

Reasons for open ML/AI

  • Autonomy: FOSS ML/AI helps you develop and maintain software and models that suit your needs. Commercial ML/AI software is great, but your business goals are never exactly the same as the business goals of your vendor.

  • Share & Copy: FOSS ML/AI gives you the freedom to run, install and use the solution the way you want, without worrying about rising fees. Calculating and predicting costs when using commercial LLMs has become complex because of the way tokens are defined, counted and weighted.

  • Collaboration: FOSS ML/AI can be shared and used in a non-exclusive way by everyone, and serves the public good.

  • No Lock-in: FOSS ML/AI reinforces independence from vendors and provides choice in service providers for maintaining and hosting the software.

  • Innovation: FOSS ML/AI encourages innovation, which we all benefit from.

  • Security & Privacy: FOSS ML/AI provides the transparency needed to lower security and privacy risks.
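
The cost point in the Share & Copy item can be made concrete with a minimal sketch. The prices, the separate rate for cached input tokens, and the names `TokenPrices` and `estimate_cost` are hypothetical assumptions for illustration only; they are not any vendor's actual pricing scheme, and real schemes differ per provider and change over time.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TokenPrices:
    """Hypothetical prices in USD per one million tokens (not real vendor prices)."""
    input_per_million: float
    cached_input_per_million: float
    output_per_million: float


def estimate_cost(prices, input_tokens, cached_input_tokens, output_tokens):
    """Estimate the cost in USD of a workload, weighting each token class by its own price."""
    if min(input_tokens, cached_input_tokens, output_tokens) < 0:
        raise ValueError("token counts must be non-negative")
    total = (
        input_tokens * prices.input_per_million
        + cached_input_tokens * prices.cached_input_per_million
        + output_tokens * prices.output_per_million
    )
    return total / 1_000_000


# Assumed example values: three token classes, each with a different weight.
example_prices = TokenPrices(
    input_per_million=2.0,
    cached_input_per_million=0.5,
    output_per_million=8.0,
)
cost = estimate_cost(
    example_prices,
    input_tokens=1_000_000,
    cached_input_tokens=2_000_000,
    output_tokens=500_000,
)
print(f"Estimated cost: {cost:.2f} USD")
```

Even in this simplified form, the result depends on how many tokens fall into each class, which is hard to know in advance. With self-hosted FOSS models the cost is instead driven by hardware and energy, which you control.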

Problems in defining open ML/AI

ML/AI systems are only partly comparable to software, so FOSS licenses do not fit well. Besides data, more is needed to reproduce or reuse an existing trained model.

The problems with arriving at a truly open ML/AI definition stem from:

  • Data and data usage: how to deal with the attribution requirement for text or images used in training data?

  • How to deal with ethical issues when not all training data can be made available due to privacy restrictions? This applies, for example, when medical data or the accounting data of companies or individuals is used.

  • Which ‘open data’ licenses are acceptable to use?

  • How to deal with ethical and moral issues to prevent misuse of models? For example, a model could be misused to make it easier to create bioweapons.

  • How to provide the transparency needed for all parameters that are required to replicate, reuse or improve a model?
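
The transparency question can be illustrated with a minimal sketch of a release checklist. The class `ModelRelease`, its field names, and the choice of four artifacts are hypothetical assumptions for illustration; they are not an official list from any definition or license.

```python
from dataclasses import dataclass, fields


@dataclass
class ModelRelease:
    """Hypothetical checklist of what a model release makes available."""
    training_data_or_description: bool = False
    training_code: bool = False
    model_weights: bool = False
    training_configuration: bool = False  # e.g. hyperparameters and random seeds


def missing_artifacts(release):
    """Return the names of the checklist items that the release does not provide."""
    return [f.name for f in fields(release) if not getattr(release, f.name)]


# Assumed example: a release that publishes only the trained weights.
weights_only = ModelRelease(model_weights=True)
gaps = missing_artifacts(weights_only)
print("Missing for replication:", ", ".join(gaps))
```

A release like `weights_only` can be run and perhaps fine-tuned, but it cannot be replicated or audited, which is why a definition based only on software licenses falls short.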

The OSI Open Source AI Definition – 1.0

The OSI 1.0 definition is currently one of the best definitions available.

The OSI Open Source AI Definition – 1.0

The OSI chose not to make full transparency and reproducibility part of this definition. See the FAQ. You should form your own opinion on this choice. It shows that arriving at a generally usable open definition for ML/AI systems is different from defining open source for software.

For over two years, the Open Source Initiative (OSI) convened a global, multi-stakeholder process to define Open Source AI, which resulted in the release of version 1.0 of the Open Source AI Definition (OSAID). During this open process it became clear that organizations that care about open, fair and public-interest ML/AI need to pay particular attention to data sharing and data governance, and to establish a shared position on both.

Caution

The OSI Open Source AI Definition is not perfect. It has been, and still is, heavily criticized. However, it is a first step towards a better definition. As of 2025, the FSF is only starting a process to arrive at its own definition, and no timeline has been set for this process.