ML Tools#

AI Explainability 360#

The AI Explainability 360 toolkit is an open-source library that supports interpretability and explainability of datasets and machine learning models. The AI Explainability 360 Python package includes a comprehensive set of algorithms that cover different dimensions of explanations along with proxy explainability metrics.

It is open source software (OSS) from IBM, released under the Apache License 2.0. Keep IBM's track record on openness in OSS product development in mind when evaluating it. The documentation can be found here: https://aix360.readthedocs.io/en/latest/

Item

Value

SBB License

Apache License 2.0

Core Technology

Python

Project URL

http://aix360.mybluemix.net/

Source Location

IBM/AIX360

Tag(s)

Data analytics, ML, ML Tool, Python

Apollo#

Apollo is a high performance, flexible architecture which accelerates the development, testing, and deployment of Autonomous Vehicles.

Apollo 2.0 supports vehicles autonomously driving on simple urban roads. Vehicles are able to cruise on roads safely, avoid collisions with obstacles, stop at traffic lights, and change lanes if needed to reach their destination.

Apollo 5.5 enhances the complex urban road autonomous driving capabilities of previous Apollo releases, by introducing curb-to-curb driving support. With this new addition, Apollo is now a leap closer to fully autonomous urban road driving. The car has complete 360-degree visibility, along with upgraded perception deep learning model and a brand new prediction model to handle the changing conditions of complex road and junction scenarios, making the car more secure and aware.

Item

Value

SBB License

Apache License 2.0

Core Technology

C++

Project URL

http://apollo.auto/

Source Location

ApolloAuto/apollo

Tag(s)

ML, ML Tool

Data Science Version Control (DVC)#

Data Science Version Control or DVC is an open-source tool for data science and machine learning projects. With a simple and flexible Git-like architecture and interface it helps data scientists:

  1. manage machine learning models – versioning, including data sets and transformations (scripts) that were used to generate models;
  2. make projects reproducible;
  3. make projects shareable;
  4. manage experiments with branching and metrics tracking;

It aims to replace tools like Excel and Docs that are commonly used as a knowledge repository and a ledger for the team, ad-hoc scripts used to track, move and deploy different model versions, and ad-hoc data file suffixes and prefixes.
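
The idea of versioning large data files next to Git can be sketched in a few lines of Python. This is a simplified illustration and not DVC's actual implementation: the cache layout, the `.pointer.json` file name and the helper names `track` and `checkout` are assumptions made for this example. The data file is copied into a content-addressed cache, and only a small pointer file (the part that would be committed to Git) records which version belongs to the project state.

```python
import hashlib
import json
import shutil
import tempfile
from pathlib import Path


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def track(data_file: Path, cache_dir: Path) -> Path:
    """Copy data_file into a content-addressed cache and write a pointer file.

    The pointer file stands in for what would be committed to Git.
    """
    checksum = file_digest(data_file)
    cached = cache_dir / checksum[:2] / checksum[2:]
    cached.parent.mkdir(parents=True, exist_ok=True)
    if not cached.exists():
        shutil.copy2(data_file, cached)
    pointer = data_file.with_name(data_file.name + ".pointer.json")
    pointer.write_text(json.dumps({"path": data_file.name, "hash": checksum}))
    return pointer


def checkout(pointer: Path, cache_dir: Path) -> Path:
    """Restore the data file version recorded in a pointer file."""
    meta = json.loads(pointer.read_text())
    cached = cache_dir / meta["hash"][:2] / meta["hash"][2:]
    target = pointer.with_name(meta["path"])
    shutil.copy2(cached, target)
    return target


workdir = Path(tempfile.mkdtemp())
cache = workdir / "cache"
data = workdir / "train.csv"

data.write_text("x,y\n1,2\n")
pointer_v1 = track(data, cache)
saved_v1 = pointer_v1.read_text()   # stands in for a Git commit of the pointer

data.write_text("x,y\n1,2\n3,4\n")  # the data set changes and is tracked again
track(data, cache)

pointer_v1.write_text(saved_v1)     # stands in for "git checkout" of the old pointer
restored = checkout(pointer_v1, cache)
print(restored.read_text())
```

Both versions of the data set stay in the cache, so switching between them only requires switching the small pointer file.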

Item

Value

SBB License

Apache License 2.0

Core Technology

Python

Project URL

https://dvc.org/

Source Location

iterative/dvc

Tag(s)

ML, ML Tool, Python

Espresso#

Espresso is an open-source, modular, extensible end-to-end neural automatic speech recognition (ASR) toolkit based on the deep learning library PyTorch and the popular neural machine translation toolkit fairseq. Espresso supports distributed training across GPUs and computing nodes, and features various decoding approaches commonly employed in ASR, including look-ahead word-based language model fusion, for which a fast, parallelized decoder is implemented.

The research paper can be found at https://arxiv.org/pdf/1909.08723.pdf

Item

Value

SBB License

MIT License

Core Technology

Python

Project URL

freewym/espresso

Source Location

freewym/espresso

Tag(s)

ML, ML Tool, Python, speech recognition

EuclidesDB#

EuclidesDB is a multi-model machine learning feature database that is tightly coupled with PyTorch and provides a backend for including and querying data on the model feature space. Some features of EuclidesDB are listed below:

  • Written in C++ for performance;
  • Uses protobuf for data serialization;
  • Uses gRPC for communication;
  • LevelDB integration for database serialization;
  • Many indexing methods implemented (Annoy, Faiss, etc.);
  • Tight PyTorch integration through libtorch;
  • Easy integration for new custom fine-tuned models;
  • Easy client language binding generation;
  • Free and open-source with permissive license;

Item

Value

SBB License

Apache License 2.0

Core Technology

C++

Project URL

https://euclidesdb.readthedocs.io/en/latest/index.html

Source Location

perone/euclidesdb

Tag(s)

ML, ML Tool

Fabrik#

Fabrik is an online collaborative platform to build, visualize and train deep learning models via a simple drag-and-drop interface. It allows researchers to collaboratively develop and debug models using a web GUI that supports importing, editing and exporting networks written in widely popular frameworks like Caffe, Keras, and TensorFlow.

Item

Value

SBB License

GNU General Public License (GPL) 3.0

Core Technology

Javascript, Python

Source Location

Cloud-CV/Fabrik

Tag(s)

Data Visualization, ML, ML Tool

Face_recognition#

The world’s simplest facial recognition API for Python and the command line.

Recognize and manipulate faces from Python or from the command line with the world’s simplest face recognition library.

Built using dlib's state-of-the-art face recognition, which is based on deep learning. The model has an accuracy of 99.38% on the Labeled Faces in the Wild benchmark.

This also provides a simple face_recognition command line tool that lets you do face recognition on a folder of images from the command line!

Full API documentation can be found here: https://face-recognition.readthedocs.io/en/latest/

Git quick-scan report:

  • Date of git statistics quick-scan report: 2019/12/19
  • Number of files in the git repository: 96
  • Total lines of code (of all files): 70415
  • Most recent commit in this repository: Tue Dec 3 16:53:45 2019 +0530
  • Number of authors: 33

First commit info:

  • Author: Adam Geitgey
  • Date: Fri Mar 3 16:29:23 2017 -0800

Item

Value

SBB License

MIT License

Core Technology

Python

Project URL

ageitgey/face_recognition

Source Location

ageitgey/face_recognition

Tag(s)

Computer vision, face detection, ML, ML Tool, Python

Guild AI#

Guild AI is an open source toolkit that automates and optimizes machine learning experiments.

  • Run unmodified training scripts, capturing each run result as a unique experiment
  • Automate trials using grid search, random search, and Bayesian optimization
  • Compare and analyze runs to understand and improve models
  • Back up training-related operations such as data preparation and testing
  • Archive runs to S3 or other remote systems
  • Run operations remotely on cloud accelerators
  • Package and distribute models for easy reproducibility
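
The trial automation mentioned in the list above can be illustrated with a minimal pure-Python sketch of grid search and random search. It does not use Guild AI itself: the `train` function is a hypothetical stand-in for a training script, and the flag names are assumptions made for this example.

```python
import itertools
import random


def train(learning_rate: float, batch_size: int) -> float:
    """Hypothetical stand-in for a training script; returns a loss to minimise."""
    return (learning_rate - 0.01) ** 2 + abs(batch_size - 64) / 1000


def grid_search(space: dict) -> list:
    """Run one trial for every combination of the listed flag values."""
    names = list(space)
    runs = []
    for values in itertools.product(*(space[name] for name in names)):
        flags = dict(zip(names, values))
        runs.append({"flags": flags, "loss": train(**flags)})
    return runs


def random_search(space: dict, trials: int, seed: int = 0) -> list:
    """Run a fixed number of trials, sampling each flag value independently."""
    rng = random.Random(seed)
    runs = []
    for _ in range(trials):
        flags = {name: rng.choice(values) for name, values in space.items()}
        runs.append({"flags": flags, "loss": train(**flags)})
    return runs


space = {"learning_rate": [0.001, 0.01, 0.1], "batch_size": [32, 64, 128]}

grid_runs = grid_search(space)            # 3 x 3 = 9 trials
random_runs = random_search(space, trials=5)

best = min(grid_runs, key=lambda run: run["loss"])
print(len(grid_runs), best["flags"])
```

Each run keeps its flags together with its result, which is what makes later comparison of runs possible.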

Item

Value

SBB License

Apache License 2.0

Core Technology

Python

Project URL

https://guild.ai/

Source Location

guildai/guildai

Tag(s)

ML Tool

Kedro#

Kedro is a workflow development tool that helps you build data pipelines that are robust, scalable, deployable, reproducible and versioned. We provide a standard approach so that you can:

  • spend more time building your data pipeline,
  • worry less about how to write production-ready code,
  • standardise the way that your team collaborates across your project,
  • work more efficiently.

Features:

  • A standard and easy-to-use project template, allowing your collaborators to spend less time understanding how you’ve set up your analytics project
  • Data abstraction, managing how you load and save data so that you don’t have to worry about the reproducibility of your code in different environments
  • Configuration management, helping you keep credentials out of your code base
  • Pipeline visualisation with Kedro-Viz (https://github.com/quantumblacklabs/kedro-viz), making it easy to see how your data pipeline is constructed
  • Seamless packaging, allowing you to ship your projects to production, e.g. using Kedro-Docker (https://github.com/quantumblacklabs/kedro-docker) or Kedro-Airflow (https://github.com/quantumblacklabs/kedro-airflow)
  • Versioning for your datasets and machine learning models whenever your pipeline runs
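
The pipeline and data abstraction ideas listed above can be sketched without Kedro as follows. This is a conceptual illustration only, not the Kedro API: the `Node` type, the `run_pipeline` helper and the data set names are assumptions made for this example. Each node names the data sets it reads and the one it writes, and the runner works out the execution order from those names.

```python
from typing import Callable, Dict, List, NamedTuple


class Node(NamedTuple):
    """One pipeline step: a function plus the named data sets it reads and writes."""
    func: Callable
    inputs: List[str]
    output: str


def run_pipeline(nodes: List[Node], catalog: Dict[str, object]) -> Dict[str, object]:
    """Run nodes in dependency order, reading and writing named data sets."""
    data = dict(catalog)
    pending = list(nodes)
    while pending:
        # A node is ready once all of its inputs are available.
        ready = [n for n in pending if all(name in data for name in n.inputs)]
        if not ready:
            raise ValueError("unresolvable inputs or circular dependency")
        for n in ready:
            data[n.output] = n.func(*(data[name] for name in n.inputs))
            pending.remove(n)
    return data


def clean(rows):
    """Drop missing values."""
    return [row for row in rows if row is not None]


def mean(values):
    """Average of a non-empty list of numbers."""
    return sum(values) / len(values)


nodes = [
    Node(mean, ["clean_values"], "average"),      # declared out of order on purpose
    Node(clean, ["raw_values"], "clean_values"),
]
result = run_pipeline(nodes, {"raw_values": [1.0, None, 3.0]})
print(result["average"])
```

Because nodes only refer to data sets by name, the functions themselves stay free of file paths and credentials.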

Documentation on: https://kedro.readthedocs.io/

The React visualization for Kedro is at: https://github.com/quantumblacklabs/kedro-viz

Item

Value

SBB License

Apache License 2.0

Core Technology

Python

Project URL

quantumblacklabs/kedro

Source Location

quantumblacklabs/kedro

Tag(s)

ML, ML Tool, Python

Ludwig#

Ludwig is a toolbox built on top of TensorFlow that allows users to train and test deep learning models without writing code. Ludwig provides two main functionalities: training models and using them to predict. It is based on datatype abstraction, so that the same data preprocessing and postprocessing is performed on different datasets that share data types, and the same encoding and decoding models developed for one task can be reused for different tasks.

All you need to provide is a CSV file containing your data, a list of columns to use as inputs, and a list of columns to use as outputs; Ludwig will do the rest. Simple commands can be used to train models both locally and in a distributed way, and to use them to predict on new data.
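
As a hedged sketch of what "a list of input columns and a list of output columns" can look like in practice, the snippet below builds a declarative configuration with `input_features` and `output_features` entries. The column names `review_text` and `sentiment`, the helper names, and the CSV path are assumptions made for this example. The `train_model` function assumes a recent Ludwig release where `LudwigModel(config)` and `train(dataset=...)` are available; older releases used different parameter names.

```python
def make_config(input_columns: dict, output_columns: dict) -> dict:
    """Build a Ludwig-style configuration from {column name: data type} mappings."""
    return {
        "input_features": [
            {"name": name, "type": dtype} for name, dtype in input_columns.items()
        ],
        "output_features": [
            {"name": name, "type": dtype} for name, dtype in output_columns.items()
        ],
    }


def train_model(config: dict, csv_path: str):
    """Train on a CSV file; requires Ludwig to be installed (pip install ludwig)."""
    from ludwig.api import LudwigModel  # imported lazily so the sketch loads without Ludwig

    model = LudwigModel(config)
    model.train(dataset=csv_path)
    return model


# Hypothetical columns: free text in, a class label out.
config = make_config({"review_text": "text"}, {"sentiment": "category"})
print(config)
```

With Ludwig installed, `train_model(config, "reviews.csv")` would start training on a file that contains both columns.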

A programmatic API is also available for using Ludwig from your Python code. A suite of visualization tools allows you to analyze models’ training and test performance and to compare them.

Ludwig is built with extensibility principles in mind and is based on data type abstractions, making it easy to add support for new data types as well as new model architectures.

It can be used by practitioners to quickly train and test deep learning models as well as by researchers to obtain strong baselines to compare against and have an experimentation setting that ensures comparability by performing standard data preprocessing and visualization.

Item

Value

SBB License

Apache License 2.0

Core Technology

Python

Project URL

https://ludwig.ai/latest/

Source Location

uber/ludwig

Tag(s)

ML, ML Tool

makesense.ai#

makesense.ai is a free-to-use online tool for labelling photos. Thanks to the use of a browser it does not require any complicated installation – just visit the website and you are ready to go. It also doesn’t matter which operating system you’re running on – we do our best to be truly cross-platform. It is perfect for small computer vision deep learning projects, making the process of preparing a dataset much easier and faster.

Item

Value

SBB License

GNU General Public License (GPL) 3.0

Core Technology

Typescript

Project URL

https://www.makesense.ai/

Source Location

SkalskiP/make-sense

Tag(s)

Computer vision, ML, ML Tool, Photos

MLflow#

MLflow offers a way to simplify ML development by making it easy to track, reproduce, manage, and deploy models. MLflow is an open source platform designed to manage the entire machine learning lifecycle and work with any machine learning library. It offers:

  • Record and query experiments: code, data, config, results
  • Packaging format for reproducible runs on any platform
  • General format for sending models to diverse deploy tools
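
Recording an experiment, the first item in the list above, could look roughly like the following sketch. It assumes MLflow is installed and uses its tracking functions `start_run`, `log_params` and `log_metrics`; the helper name `log_experiment` and the example parameter and metric names are assumptions made for this illustration.

```python
def log_experiment(params: dict, metrics: dict) -> str:
    """Record one training run with MLflow tracking and return its run ID.

    Requires MLflow to be installed (pip install mlflow). By default the run
    is written to a local ./mlruns directory.
    """
    import mlflow  # imported lazily so the sketch loads without MLflow

    with mlflow.start_run() as run:
        mlflow.log_params(params)      # configuration, e.g. hyperparameters
        mlflow.log_metrics(metrics)    # results, e.g. accuracy or loss
        return run.info.run_id


# Hypothetical values for one run.
example_params = {"learning_rate": 0.01, "epochs": 5}
example_metrics = {"accuracy": 0.93}
```

Calling `log_experiment(example_params, example_metrics)` stores the run, after which it can be browsed and compared with other runs in the MLflow UI.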

Item

Value

SBB License

Apache License 2.0

Core Technology

Python

Project URL

https://mlflow.org/

Source Location

mlflow/mlflow

Tag(s)

ML, ML Tool, Python

MLPerf#

A broad ML benchmark suite for measuring performance of ML software frameworks, ML hardware accelerators, and ML cloud platforms.

The MLPerf effort aims to build a common set of benchmarks that enables the machine learning (ML) field to measure system performance for both training and inference from mobile devices to cloud services. We believe that a widely accepted benchmark suite will benefit the entire community, including researchers, developers, builders of machine learning frameworks, cloud service providers, hardware manufacturers, application providers, and end users.

Item

Value

SBB License

MIT License

Core Technology

Python

Project URL

https://mlperf.org/

Source Location

mlperf/reference

Tag(s)

ML, ML Tool, Performance

Model Card Toolkit (MCT)#

A FOSS ML Toolkit by Google ML Research. See also the blog article on https://ai.googleblog.com/2020/07/introducing-model-card-toolkit-for.html

The Model Card Toolkit (MCT) streamlines and automates generation of Model Cards [1], machine learning documents that provide context and transparency into a model’s development and performance. Integrating the MCT into your ML pipeline enables the sharing of model metadata and metrics with researchers, developers, reporters, and more.

Some use cases of model cards include:

  • Facilitating the exchange of information between model builders and product developers.
  • Informing users of ML models to make better-informed decisions about how to use them (or how not to use them).
  • Providing model information required for effective public oversight and accountability.

Item

Value

SBB License

Apache License 2.0

Core Technology

Python

Project URL

tensorflow/model-card-toolkit

Source Location

tensorflow/model-card-toolkit

Tag(s)

ML Tool

ModelDB#

A system to manage machine learning models.

ModelDB is an end-to-end system to manage machine learning models. It ingests models and associated metadata as models are being trained, stores model data in a structured format, and surfaces it through a web-frontend for rich querying. ModelDB can be used with any ML environment via the ModelDB Light API. ModelDB native clients can be used for advanced support in spark.ml and scikit-learn.

The ModelDB frontend provides rich summaries and graphs showing model data. The frontend provides functionality to slice and dice this data along various attributes (e.g. operations like filter by hyperparameter, group by datasets) and to build custom charts showing model performance.

Item

Value

SBB License

MIT License

Core Technology

Python, Javascript

Project URL

VertaAI/modeldb

Source Location

mitdbg/modeldb

Tag(s)

Administration, ML, ML Tool

Netron#

Netron is a viewer for neural network, deep learning and machine learning models.

Netron supports ONNX (.onnx, .pb), Keras (.h5, .keras), CoreML (.mlmodel) and TensorFlow Lite (.tflite). Netron has experimental support for Caffe (.caffemodel), Caffe2 (predict_net.pb), MXNet (-symbol.json), TensorFlow.js (model.json, .pb) and TensorFlow (.pb, .meta).

Item

Value

SBB License

GNU General Public License (GPL) 2.0

Core Technology

Python, Javascript

Project URL

https://www.lutzroeder.com/ai/

Source Location

lutzroeder/Netron

Tag(s)

Data viewer, ML, ML Tool

NLP Architect#

NLP Architect is an open-source Python library for exploring the state-of-the-art deep learning topologies and techniques for natural language processing and natural language understanding. It is intended to be a platform for future research and collaboration.

Features:

  • Core NLP models used in many NLP tasks and useful in many NLP applications
  • Novel NLU models showcasing novel topologies and techniques
  • Optimized NLP/NLU models showcasing different optimization algorithms on neural NLP/NLU models
  • Model-oriented design:
    • Train and run models from command-line.
    • API for using models for inference in Python.
    • Procedures to define custom processes for training, inference or anything related to processing.
    • CLI sub-system for running procedures
  • Based on optimized Deep Learning frameworks:
  • Essential utilities for working with NLP models – Text/String pre-processing, IO, data-manipulation, metrics, embeddings.

Item

Value

SBB License

Apache License 2.0

Core Technology

Python

Project URL

http://nlp_architect.nervanasys.com/

Source Location

NervanaSystems/nlp-architect

Tag(s)

ML, ML Tool, NLP, Python

ONNX#

ONNX provides an open source format for AI models. It defines an extensible computation graph model, as well as definitions of built-in operators and standard data types. Initially we focus on the capabilities needed for inferencing (evaluation).

Open Neural Network Exchange (ONNX) is an open standard format for representing machine learning models. ONNX is supported by a community of partners who have implemented it in many frameworks and tools.

Caffe2, PyTorch, Microsoft Cognitive Toolkit, Apache MXNet and other tools are developing ONNX support. Enabling interoperability between different frameworks and streamlining the path from research to production will increase the speed of innovation in the AI community. We are at an early stage and we invite the community to submit feedback and help us further evolve ONNX.

Companies behind ONNX include AWS, Facebook and Microsoft Corporation.

Item

Value

SBB License

MIT License

Core Technology

Python

Project URL

http://onnx.ai/

Source Location

onnx/onnx

Tag(s)

ML, ML Tool

OpenML#

OpenML is an online machine learning platform for sharing and organizing data, machine learning algorithms and experiments. It claims to be designed to create a frictionless, networked ecosystem that you can readily integrate into your existing processes, code and environments. It also allows people from all over the world to collaborate and build directly on each other’s latest ideas, data and results, irrespective of the tools and infrastructure they happen to use. These are nice ideas for building an open science movement. The people behind OpenML are mostly (data) scientists, so using this product for real-world business use cases will take some extra effort.

Although OpenML is presented as a foundation based on openness, a quick inspection showed that the OpenML platform is not as open as you might want. Also, the OSS software is not designed to be run on premise. So be careful before making large (time) investments in the OpenML platform.

Item

Value

SBB License

BSD License 2.0 (3-clause, New or Revised) License

Core Technology

Java

Project URL

https://openml.org

Source Location

openml/OpenML

Tag(s)

ML, ML Tool

Orange#

Orange is a comprehensive, component-based software suite for machine learning and data mining, developed at the Bioinformatics Laboratory.

Orange is available by default on the Anaconda Navigator dashboard. It includes a range of data visualization, exploration, preprocessing and modeling techniques. It can be used through a nice and intuitive user interface or, for more advanced users, as a module for the Python programming language.

One of the nice features is the option for visual programming. You can do visual interactive data exploration for rapid qualitative analysis with clean visualizations. The graphical user interface allows you to focus on exploratory data analysis instead of coding, while clever defaults make fast prototyping of a data analysis workflow extremely easy.

Item

Value

SBB License

GNU General Public License (GPL) 3.0

Core Technology

Python

Project URL

https://orange.biolab.si/

Source Location

biolab/orange3

Tag(s)

Data Visualization, ML, ML Tool, Python

PySyft#

A library for encrypted, privacy-preserving deep learning. PySyft is a Python library for secure, private deep learning. PySyft decouples private data from model training, using Multi-Party Computation (MPC) within PyTorch. View the paper on arXiv.

Item

Value

SBB License

Apache License 2.0

Core Technology

Python

Project URL

OpenMined/PySyft

Source Location

OpenMined/PySyft

Tag(s)

ML Tool, Python, Security, Security Tools

RAPIDS#

The RAPIDS suite of software libraries gives you the freedom to execute end-to-end data science and analytics pipelines entirely on GPUs. It relies on NVIDIA® CUDA® primitives for low-level compute optimization, but exposes that GPU parallelism and high-bandwidth memory speed through user-friendly Python interfaces.

RAPIDS also focuses on common data preparation tasks for analytics and data science. This includes a familiar DataFrame API that integrates with a variety of machine learning algorithms for end-to-end pipeline accelerations without paying typical serialization costs. RAPIDS also includes support for multi-node, multi-GPU deployments, enabling vastly accelerated processing and training on much larger dataset sizes.

Item

Value

SBB License

Apache License 2.0

Core Technology

C++

Project URL

http://rapids.ai/

Source Location

rapidsai/

Tag(s)

ML, ML Hosting, ML Tool

SHAP#

SHAP (SHapley Additive exPlanations) is a unified approach to explain the output of any machine learning model. SHAP connects game theory with local explanations, uniting several previous methods [1-7] and representing the only possible consistent and locally accurate additive feature attribution method based on expectations (see our papers for details and citations).
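
To make the notion of an additive, locally accurate attribution concrete, the sketch below computes exact Shapley values by brute force for a tiny hypothetical model. It does not use the SHAP library and simplifies the setting: features that are "absent" from a coalition are replaced by values from a single background point rather than by an expectation over a data set, and the cost grows exponentially with the number of features.

```python
import itertools
import math


def shapley_values(model, x, background):
    """Exact Shapley values of model(x) relative to a single background point.

    Features outside a coalition take their value from `background`.
    The cost grows as 2**n, so this is only practical for a few features.
    """
    n = len(x)

    def value(coalition):
        mixed = [x[i] if i in coalition else background[i] for i in range(n)]
        return model(mixed)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight for coalitions of this size that exclude feature i.
            weight = (
                math.factorial(size) * math.factorial(n - size - 1) / math.factorial(n)
            )
            for subset in itertools.combinations(others, size):
                coalition = set(subset)
                phi[i] += weight * (value(coalition | {i}) - value(coalition))
    return phi


def model(features):
    """Hypothetical model with an interaction between the first two features."""
    a, b, c = features
    return 2 * a + a * b + 3 * c


x = [1.0, 2.0, 3.0]
background = [0.0, 0.0, 0.0]
phi = shapley_values(model, x, background)
print(phi)
print(sum(phi), model(x) - model(background))
```

The attributions sum to the difference between the model output at `x` and at the background point, which is the local accuracy property; the interaction term `a * b` is split equally between the two features involved.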

There are also sample notebooks that demonstrate different use cases for SHAP in the GitHub repository.

Item

Value

SBB License

MIT License

Core Technology

Python

Project URL

slundberg/shap

Source Location

slundberg/shap

Tag(s)

ML, ML Tool

Snorkel#

Snorkel is a system for rapidly creating, modeling, and managing training data, currently focused on accelerating the development of structured or “dark” data extraction applications for domains in which large labeled training sets are not available or easy to obtain.

Item

Value

SBB License

Apache License 2.0

Core Technology

Python

Project URL

https://www.snorkel.org/

Source Location

HazyResearch/snorkel

Tag(s)

ML, ML Tool

Streamlit#

The fastest way to build custom ML tools. Streamlit lets you create apps for your machine learning projects with deceptively simple Python scripts. It supports hot-reloading, so your app updates live as you edit and save your file. No need to mess with HTTP requests, HTML, JavaScript, etc. All you need is your favorite editor and a browser.

Documentation on: https://streamlit.io/docs/

Item

Value

SBB License

Apache License 2.0

Core Technology

JavaScript, Python

Project URL

https://streamlit.io/

Source Location

streamlit/streamlit

Tag(s)

ML, ML Framework, ML Hosting, ML Tool, Python

TensorWatch#

TensorWatch is a debugging and visualization tool designed for data science, deep learning and reinforcement learning from Microsoft Research. It works in Jupyter Notebook to show real-time visualizations of your machine learning training and perform several other key analysis tasks for your models and data.

TensorWatch is designed to be flexible and extensible so you can also build your own custom visualizations, UIs, and dashboards. Besides the traditional “what-you-see-is-what-you-log” approach, it also has a unique capability to execute arbitrary queries against your live ML training process, return a stream as a result of the query and view this stream using your choice of a visualizer (we call this Lazy Logging Mode).

TensorWatch is under heavy development with a goal of providing a platform for debugging machine learning in one easy to use, extensible, and hackable package.

Item

Value

SBB License

MIT License

Core Technology

Python

Project URL

microsoft/tensorwatch

Source Location

microsoft/tensorwatch

Tag(s)

ML, ML Tool

VisualDL#

VisualDL is an open-source cross-framework web dashboard that richly visualizes the performance and data flowing through your neural network training. VisualDL is a deep learning visualization tool that can help design deep learning jobs. It includes features such as scalar, parameter distribution, model structure and image visualization.

Item

Value

SBB License

Apache License 2.0

Core Technology

C++

Project URL

http://visualdl.paddlepaddle.org/

Source Location

PaddlePaddle/VisualDL

Tag(s)

ML, ML Tool

What-If Tool#

The What-If Tool (WIT) provides an easy-to-use interface for expanding understanding of a black-box ML model. With the plugin, you can perform inference on a large set of examples and immediately visualize the results in a variety of ways. Additionally, examples can be edited manually or programmatically and re-run through the model in order to see the results of the changes. It contains tooling for investigating model performance and fairness over subsets of a dataset.

The purpose of the tool is to give people a simple, intuitive, and powerful way to play with a trained ML model on a set of data through a visual interface with absolutely no code required.

Item

Value

SBB License

Apache License 2.0

Core Technology

Python

Project URL

https://pair-code.github.io/what-if-tool/

Source Location

tensorflow/tensorboard

Tag(s)

ML, ML Tool

What-If Tool#

Visually probe the behavior of trained machine learning models, with minimal coding.

The What-If Tool (WIT) provides an easy-to-use interface for expanding understanding of a black-box classification or regression ML model. With the plugin, you can perform inference on a large set of examples and immediately visualize the results in a variety of ways. Additionally, examples can be edited manually or programmatically and re-run through the model in order to see the results of the changes. It contains tooling for investigating model performance and fairness over subsets of a dataset.

The purpose of the tool is to give people a simple, intuitive, and powerful way to play with a trained ML model on a set of data through a visual interface with absolutely no code required.

The tool can be accessed through TensorBoard or as an extension in a Jupyter or Colab notebook.

Item

Value

SBB License

Apache License 2.0

Core Technology

Python (notebooks)

Project URL

https://pair-code.github.io/what-if-tool/

Source Location

PAIR-code/what-if-tool

Tag(s)

ML Tool

End of SBB list