Machine learning (ML) is a rapidly advancing technology, made possible by the Internet, that already has a significant impact on our everyday lives. With machine learning you can solve challenging problems that affect people around the world. Machine learning and artificial intelligence (AI) are rapidly emerging technologies that have the potential to change our world at a speed that humankind has never experienced before.

Machine learning and artificial intelligence are not the same, although the technologies currently developed for ML do help research and development in AI. From an engineering perspective, ML can be defined more strictly. Trying to define AI raises more philosophical discussions about what intelligence is. This publication focuses on Free and Open machine learning. But beware that the terms machine learning and artificial intelligence are intertwined, and many so-called AI applications are in fact driven by machine learning technology.

You should be aware of the commercial buzz and fads surrounding AI and ML: machine learning, deep learning and the many tools developed for them are not ‘a universal solvent’ for all current problems. There is no magic machine learning tool or method yet that can solve all your complex challenges. Machine learning is just a tool for solving certain types of problems. Maybe in the future machine learning can be applied to a broader landscape of problems than is currently possible. But do not try to solve all your problems with one (new) technology or toolset.

Artificial Intelligence and Machine Learning are once again at the forefront of global discourse, garnering increased attention from practitioners, industry leaders, policymakers, and the general public.

But despite the hype and the money invested in machine learning technology over the past five years, one big question remains: Can machine learning technology help us solve hard and complex business problems like climate change, health and welfare for all humans, and other urgent problems?

This publication gives you a reality check. You learn what is easily possible using new machine learning technologies and tools, what the current potential is and what still remains wishful thinking for the future. We like transparency, so we focus solely on free and open machine learning technologies.

Hope and Hype

Innovation needs openness. This also holds for machine learning technologies. Without real openness, new developments and innovations in machine learning are impossible. As a practitioner in your business domain, with your unique expertise, you can start making a difference. This publication gives you a starting point for applying free and open machine learning technology to your unique use cases.

What is covered in this book?

Nowadays many people are talking about the transformative power of machine learning and how it will revolutionize the economy, but what does that mean for your business and how do you start? How do you get solid, independent advice on learning and applying machine learning? Can you improve or disrupt your business using FOSS machine learning tools that are widely available? This book gives you an introduction to get started with applying FOSS machine learning.

Machine learning concepts are mostly taught by academics for academics. That’s why most learning material is dry and math-heavy. The theory behind machine learning is great, but it also requires a very deep understanding of statistics and math. There is a large gap between theory and practice. Practice counts, because in a practical business context you want to determine whether you can solve your problems with machine learning tools, or at a minimum do a short and cost-efficient trial run to determine whether a project has potential and further investment makes sense.

To apply machine learning to real business use cases, other skills besides some feeling for statistics and math are required. For example, you need some knowledge of all the typical IT aspects that are still needed before you can make use of the new paradigm that machine learning brings.

This publication is created for applying free and open machine learning in practice, to real-world use cases. This is where the rubber meets the road. So the core focus is on the ‘How’ questions. Key concepts are outlined, and a conceptual and logical reference architecture for free and open machine learning is given. This is to empower you to make use of FOSS machine learning technology in a simple and efficient way.

The field of machine learning is making rapid progress. Do you know what kinds of applications for direct business use are already possible today? Are you aware of the currently low entry barriers to taking direct advantage of machine learning? Is your knowledge of the free and open source solutions available in the machine learning ecosystem up to date? How do you classify safety, security and privacy risks when using machine learning? These and other relevant questions about using machine learning in a business context are the foundation of this book.

Within the FOSS machine learning domain, new toolsets, applications and companies are being created on a daily basis. So it is difficult to get a grip on which ML applications are viable and which are hype, fads or simply a hoax, especially when the terms ML and AI are intertwined. This publication guides you through tangible, working open source machine learning software.

The FOSS machine learning software building blocks mentioned in this publication are widely used for real business use cases, which may closely resemble your own use case. And because much of the ML software and tooling needed is free and open source software (FOSS), the available solutions and tools can be studied and improved.

Given that machine learning tools and techniques are already an increasingly large part of our everyday lives, it is crucial for professionals in the IT industry to gain more knowledge of machine learning. You should start asking critical questions and maybe try some simple experiments. What will you do with machine learning tools and applications in the coming three years? Are you really aware of the evolving safety and privacy concerns that come with this technology? Do you really understand and control how it works?

This publication is all about taking advantage of the new FOSS machine learning technologies for your business. The major machine learning concepts are explained, but the main emphasis of this book is to give insight into the various possibilities that are available within the open source machine learning ecosystem. This way you can start applying machine learning in your business today, without hidden dependencies or unknown strings attached to a vendor or cloud hosting provider.

This publication gives an overview of all important FOSS machine learning frameworks and FOSS machine learning support tools that you can use for prototyping or for real business use cases and production systems.

This publication does not explain or dive into the statistics and deep mathematical algorithms behind machine learning. The algebra that forms the foundation of machine learning algorithms and software libraries is explained only where needed for practical use and experiments. If you are interested in learning the mathematical foundations on which machine learning is built, you can find good free and open material in the reference section of this book.

This publication aims to cover the high-level machine learning concepts and gives you the information you need to start working with free and open machine learning for your business use case.

So this publication concentrates on the aspects of machine learning where software, business and technology touch each other.

Domains touching

(* When we write Open Source Software or OSS in this report, we explicitly mean FOSS as defined by the Free Software Foundation.)

Who should read this book?

This book is created for everyone who wants to learn about and get started with machine learning without being forced into a specific solution from the outset. Creating machine learning applications is possible using FOSS building blocks only, and on premises. So you do not need to start with expensive cloud infrastructure or commercial software packages. If you like IT architecture and simple concepts, and want to be empowered to play with machine learning and create your own solution, then this publication is for you.

This book is primarily written with software developers, system administrators, security architects, privacy controllers, IT managers, directors, business owners, system engineers, quality managers, IT architects and other curious people interested in open technologies in mind.

This book outlines crucial machine learning concepts, but will not go into mathematical or technical details. After reading this book you will have a more complete and realistic overview of the possibilities of applying machine learning (ML) to your use cases.

Why another book on Machine Learning?

There are many books, courses and tutorials that teach you what machine learning is. However, most of these books and courses are focused on hands-on learning and require you to program. Many other books explain concepts without a clear focus on how tools can be used to solve real business use cases. And a publication that is truly open and focused on the broad landscape needed for Free and Open machine learning was simply not available.

Despite the enormous buzz and attention around machine learning, it has proven hard to apply machine learning to real, profitable use cases. Applying machine learning starts with understanding the core concepts, the business architecture needs and constraints, and the technology components involved. Some notion of the typical pitfalls and challenges of applying machine learning for business use is also needed.

Is Machine Learning complex?

You might get the impression when visiting presentations from commercial vendors that machine learning is simple. The hard work is already done, and all you have to do is get your credit card and make use of the incredible machine learning cloud offering. This machine learning as a service (MaaS) takes your company to the next level, and the advice of the sales consultant is clear: using their MaaS offering is so simple that entering your credit card number is probably the hardest part. Maybe it takes a minute, maybe more. But in the end you discover that solving problems using machine learning is not that simple after all. The great offerings of many large and small vendors selling MaaS from a fantastic cloud platform do not solve your business problem in a simple way. As with all new technologies, and especially IT technology, the advantages are overpromised and getting a return on your investment is not simple. You are confronted with complex terminology; a machine learning black box from your vendor that is, of course, great at billing; data collection and data cleaning problems you had never heard of; and security, privacy and even safety issues. And if you think it cannot get worse: legal and ethical issues will also slow your project down.

By using an open approach (tools, methods, datasets) to machine learning, a lot of risks can be mitigated. For example, it is easier to control spending in the important ramp-up phase of your project. If you need more performance, you can always move hosting to a cloud platform at a later stage. But you need to start with a flexible and scalable architecture that does not limit future goals.

Tremendous advances have been made over the past few years in making machine learning more accessible. This publication outlines some great OSS applications that are ready to be used, even if you really hate difficult mathematical formulas. Multiple developments are in progress that now really make it possible to drop in your data and let a complex machine learning algorithm do the hard work.

But don’t be fooled. Even solving only ‘some type of problems’ using machine learning tools is a relatively ‘hard’ problem. Getting results is possible only when you are equipped with the right knowledge, tools and resources. Solving soft business problems with machine learning requires far more than a good computer scientist alone. Using machine learning for soft problems requires a variety of disciplines and a lot of creativity, experimentation and tenacity.

Organization of this book

The topics explored in this publication include:

  • Why Free and Open Machine Learning. This section outlines why we all should promote and advocate for openness and freedom regarding this promising technology.

  • What is Machine Learning. This is the section to read if you are short on time and want a simple outline of complex machine learning concepts.

  • Machine Learning for business problems. New technologies come with new opportunities for innovation. This section outlines common business use cases that are possible today using machine learning technology.

  • Machine learning reference architecture. Starting with machine learning can be overwhelming. This section gives an overview of the business and technology aspects that you face when applying machine learning to real business use cases. It also helps you develop your own machine learning solution architecture.

  • Security, Privacy and Safety. The things you do not see are often the most important aspects. Security, privacy and safety are already very complex to deal with in normal IT solutions. For machine learning, these non-functional aspects must be taken into your design upfront, from a system perspective. This section outlines the key aspects of security, privacy and safety you should be aware of when creating machine learning applications.

  • Natural language processing (NLP). Hard-to-solve speech and text processing problems are now far more easily solved using machine learning algorithms. This section outlines what is still one of the most used applications of machine learning: NLP.

  • Machine learning implementation challenges. Knowing what machine learning can do and how it works is no guarantee that creating a machine learning application will succeed. The failure rate of normal IT projects has already been very high for decades. Machine learning projects are complex and risky. This section gives guidance on avoiding pitfalls when applying machine learning for real business use.

  • FOSS System Building Blocks for machine learning. This publication presents an opinionated list of FOSS building blocks that can be used when creating machine learning applications. Starting with FOSS machine learning building blocks means you start with no strings attached. Switching to cloud hosting solutions later is always possible, but machine learning needs experimentation and play, with open data and open tools.

  • Learning Resources. Some very good learning resources for machine learning and NLP are open, that is, licensed under a Creative Commons license. After reading this publication, a next step can be to dive deeper into a specific machine learning aspect, framework or technology. This section provides references to open learning resources, including references to hands-on tutorials.

Errata, updates and support

We made a serious effort to create a first readable version of this book. However, if you notice typos or spelling and grammar errors, please notify us so we can improve this publication. You can create a pull request on GitHub or simply send us an email.

Since the world of machine learning is rapidly evolving, this book will be continuously updated. That’s why there is an open online version of this book available that always incorporates the latest updates.


If you would like to help promote the Free and Open Machine Learning principles and make this book better: Please CONTRIBUTE! See the HELP section.