
# Machine learning

Branch of [AI] focusing on developing models and algorithms that can learn patterns from data without being explicitly
programmed for every task, and subsequently make accurate inferences about new data.

It is a pattern recognition ability that enables models to make decisions or predictions without explicit, hard-coded
instructions.

<!-- Remove this line to uncomment if used
## Table of contents <!-- omit in toc -->

1. [TL;DR](#tldr)
1. [Approaches](#approaches)
   1. [Deep learning](#deep-learning)
1. [Architectures](#architectures)
   1. [Mixture of Experts](#mixture-of-experts)
1. [Further readings](#further-readings)
   1. [Sources](#sources)

## TL;DR

All machine learning is AI, but not all AI is machine learning.

Rule-based models become increasingly brittle as more data is added to them.<br/>
They require accurate, universal criteria to be defined for every result they need to achieve, which does not scale.

ML models operate by a logic that is learned through experience, not explicitly programmed into them.<br/>
They train by analyzing data and making predictions on it; the prediction errors are calculated, and the model's
parameters are adjusted to reduce those errors.<br/>
The training process is repeated until the model reaches the desired accuracy.
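
The predict, measure, adjust loop above can be sketched in Python as follows. This is a minimal illustration, assuming a
linear model `y = w*x + b`, a hypothetical toy dataset, mean squared error as the error measure, and plain gradient
descent as the adjustment step; real models and optimizers are far larger and more sophisticated.

```python
# Hypothetical toy data, generated by the rule y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]


def train(xs, ys, learning_rate=0.02, epochs=2000):
    """Fit y ≈ w*x + b by repeatedly predicting, measuring the error and adjusting."""
    w, b = 0.0, 0.0  # start from an untrained model
    n = len(xs)
    for _ in range(epochs):
        # 1. Predict with the current parameters.
        predictions = [w * x + b for x in xs]
        # 2. Calculate the prediction errors.
        errors = [p - y for p, y in zip(predictions, ys)]
        # 3. Adjust the parameters against the gradient of the mean squared error.
        grad_w = 2 / n * sum(e * x for e, x in zip(errors, xs))
        grad_b = 2 / n * sum(errors)
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


w, b = train(xs, ys)
print(f"w={w:.3f}, b={b:.3f}")  # should approach w=2, b=1
```

Each pass over the data is one iteration of the loop; the learning rate controls how big each adjustment is.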

ML works through mathematical logic. Relevant characteristics (A.K.A. _features_) of each data point **must** be
expressed numerically, so that the data can be fed into the mathematical algorithm that will _learn_ to map a given
input to the desired output.
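
A minimal sketch of expressing features numerically, assuming hypothetical house records with one numeric and one
categorical characteristic. The categorical value is one-hot encoded, which is only one common choice among several.

```python
# Hypothetical raw records: one numeric and one categorical characteristic each.
houses = [
    {"surface_m2": 70, "city": "Rome"},
    {"surface_m2": 120, "city": "Milan"},
    {"surface_m2": 95, "city": "Rome"},
]


def encode(records):
    """Turn each record into a list of numbers a model can consume."""
    cities = sorted({r["city"] for r in records})  # fixed column order for the one-hot part
    vectors = []
    for r in records:
        one_hot = [1.0 if r["city"] == c else 0.0 for c in cities]
        vectors.append([float(r["surface_m2"])] + one_hot)
    return cities, vectors


cities, vectors = encode(houses)
print(cities)   # ['Milan', 'Rome']
print(vectors)  # [[70.0, 0.0, 1.0], [120.0, 1.0, 0.0], [95.0, 0.0, 1.0]]
```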

ML is mainly divided into the following types:

- _Supervised_ learning.

  Models learn from _labelled_ data. Every input has a corresponding _correct_ output.<br/>
  Models make predictions, compare those with the true outputs, and adjust themselves to reduce errors and improve
  accuracy over time.<br/>
  The goal is to train models that make accurate predictions on new, unseen data.

- _Unsupervised_ learning.

  Models work **without** labelled data.<br/>
  They learn patterns on their own by grouping similar data points or finding hidden structures without human
  intervention.<br/>
  Useful for tasks like clustering (grouping), dimensionality reduction (compression), anomaly detection and association
  rule learning.

- _Reinforcement_ learning.

  Teaches agents to make decisions through trial and error to maximize cumulative rewards.<br/>
  Allows machines to learn by interacting with an environment and receiving feedback based on their actions. This
  feedback comes in the form of rewards or penalties.<br/>
  Agents use the feedback to optimize their decision-making over time.

- _Semi-supervised_ learning.

  Hybrid machine learning approach using both supervised and unsupervised learning.<br/>
  Uses a small amount of labelled data, combined with a large amount of unlabelled data to train models.<br/>
  The goal is to learn a function that accurately predicts outputs based on inputs, like with supervised learning, but
  with much less labelled data.<br/>
  Particularly valuable when acquiring labelled data is expensive or time-consuming, yet unlabelled data is plentiful
  and easy to collect.

- _Self-supervised_ learning.

  Subset of unsupervised learning.<br/>
  Models train using data that does not have any labels or answers provided. Instead of needing people to label the
  data, the models themselves find patterns and create their own labels from the data automatically.<br/>
  Especially useful when there is a lot of data, but only a small part of it is labelled or labelling the data would
  take a lot of time and effort.
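
The difference between supervised and unsupervised learning can be sketched on the same hypothetical 1-D points. The
supervised half assumes a nearest-centroid classifier trained on labels; the unsupervised half uses a naive k-means
that receives no labels at all. Both algorithms are picked only for illustration.

```python
# Hypothetical 1-D data points with two obvious groups.
points = [1.0, 1.5, 2.0, 8.0, 8.5, 9.0]

# Supervised: every input comes with its correct label.
labels = ["small", "small", "small", "large", "large", "large"]


def fit_centroids(points, labels):
    """Learn one centroid (mean value) per label from the labelled data."""
    sums, counts = {}, {}
    for p, label in zip(points, labels):
        sums[label] = sums.get(label, 0.0) + p
        counts[label] = counts.get(label, 0) + 1
    return {label: sums[label] / counts[label] for label in sums}


def predict(centroids, x):
    """Assign the label of the closest centroid to a new, unseen input."""
    return min(centroids, key=lambda label: abs(x - centroids[label]))


centroids = fit_centroids(points, labels)
print(predict(centroids, 7.0))  # large


# Unsupervised: same points, no labels; k-means finds the groups on its own.
def k_means(points, k, iterations=10):
    centers = points[:k]  # naive initialization: the first k points
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: abs(p - centers[i]))
            clusters[nearest].append(p)
        # Move each center to the mean of its cluster (keep it where it is if the cluster is empty).
        centers = [sum(c) / len(c) if c else centers[i] for i, c in enumerate(clusters)]
    return sorted(centers)


print(k_means(points, k=2))  # [1.5, 8.5]
```

The clusters k-means finds carry no names: it is up to a human to interpret them, whereas the supervised model outputs
the labels it was trained on. With naive initialization, k-means results can depend on the order of the input points.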

_Deep learning_ has emerged as the state-of-the-art approach for AI models across nearly every domain.<br/>
It relies on distributed _networks_ of mathematical operations that can learn the intricate nuances of very complex
data.<br/>
It requires very large amounts of data and computational resources.

## Approaches

### Deep learning

Approach in which multiple layers of nodes (a _deep_ neural network) extract meaning, relationships, and other complex
patterns from large volumes of raw (unstructured and unlabelled) data, and make their own predictions about what the
data represents.<br/>
Neural networks were initially created with the idea of closely simulating the human brain.

Deep neural networks include:

- An input layer.
- 3 or more hidden layers (modern networks often have hundreds).
- An output layer.

The multiple layers enable **unsupervised** learning, since they can automatically extract features from large,
unlabelled and unstructured datasets.
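
A minimal sketch of a forward pass through a deep network with an input layer, 3 hidden layers and an output layer, in
plain Python. The layer sizes, the ReLU activation, the softmax output and the random, untrained weights are all
assumptions made for illustration; training would adjust these weights as described in the TL;DR.

```python
import math
import random


def relu(z):
    return max(0.0, z)


def dense_layer(inputs, weights, biases, activation):
    """Each node sums its weighted inputs, adds its bias and applies the activation."""
    return [
        activation(sum(w * x for w, x in zip(node_weights, inputs)) + b)
        for node_weights, b in zip(weights, biases)
    ]


def random_layer(n_inputs, n_nodes, rng):
    """Untrained, randomly initialized layer (training would learn these values)."""
    weights = [[rng.uniform(-1.0, 1.0) for _ in range(n_inputs)] for _ in range(n_nodes)]
    biases = [0.0] * n_nodes
    return weights, biases


rng = random.Random(42)
# Hypothetical sizes: 4 input features, 3 hidden layers of 8 nodes, 2 output classes.
sizes = [4, 8, 8, 8, 2]
layers = [random_layer(n_in, n_out, rng) for n_in, n_out in zip(sizes, sizes[1:])]


def forward(features):
    """Feed the input layer's values through every hidden layer, then the output layer."""
    x = features
    for weights, biases in layers[:-1]:  # hidden layers
        x = dense_layer(x, weights, biases, relu)
    weights, biases = layers[-1]  # output layer, no activation
    logits = dense_layer(x, weights, biases, lambda z: z)
    # Softmax turns the raw outputs into class probabilities.
    top = max(logits)
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


print(forward([0.5, -1.0, 2.0, 0.1]))  # 2 probabilities that sum to 1
```

Frameworks like PyTorch or JAX perform the same computation with tensors and compute the gradients needed for training
automatically.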

Deep learning encompasses a range of neural network architectures, including multi-layer perceptrons (MLPs),
convolutional neural networks (CNNs), recurrent neural networks (RNNs), graph networks, and transformers.<br/>
It is commonly applied to domains like computer vision, natural language processing, and robotics.

CNNs have proven well suited to image and video recognition, including medical imaging.<br/>
RNNs, including long short-term memory (LSTM) networks, excel in sequence prediction, language translation, and speech
recognition.<br/>
Generative adversarial networks (GANs) enabled the generation of realistic images and AI-driven art.

## Architectures

### Mixture of Experts

Divides a single model into multiple specialized sub-networks (_experts_), along with a learned routing mechanism
(_gate_ or _router_) that dynamically selects which experts to activate for any given input.<br/>
Only a small subset of the experts is used for each input, typically 1 or 2 of them.

This allows building models with a very large **total** number of parameters, while only activating a fraction of them
for each input.<br/>
This makes them more efficient to pre-train and run than dense models with the same total number of parameters.
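
The parameter arithmetic can be sketched with a hypothetical helper. The sizes below are made up, and the model is
idealized as a block of shared parameters plus identical experts; real MoE models also include routers and other
per-layer components.

```python
def moe_parameter_counts(shared_params, params_per_expert, num_experts, top_k):
    """Return (total, active per input) parameter counts for an idealized MoE model."""
    total = shared_params + num_experts * params_per_expert
    active = shared_params + top_k * params_per_expert
    return total, active


# Hypothetical sizes: 1B shared parameters, 8 experts of 5B parameters each, top-2 routing.
total, active = moe_parameter_counts(1_000_000_000, 5_000_000_000, num_experts=8, top_k=2)
print(total, active)  # 41000000000 11000000000
print(f"{active / total:.0%} of the parameters are used per input")  # 27% of the parameters are used per input
```

All of the `total` parameters must still be stored, while only the `active` ones take part in computing each output.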

A small router network:

1. Takes the input.
1. Produces a probability distribution over the available experts.
1. Selects the top-k experts.
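
The router steps above can be sketched as follows, assuming a linear router (one weight vector per expert), softmax
probabilities, and renormalization of the selected top-k probabilities, which is one common formulation. The experts
and router weights are hypothetical stand-ins; real experts are usually small feed-forward networks.

```python
import math


def softmax(logits):
    top = max(logits)
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def route(x, router_weights, k=2):
    """Score every expert, turn the scores into probabilities and keep the top-k."""
    # One logit per expert: dot product between the input and that expert's router weights.
    logits = [sum(w * xi for w, xi in zip(expert_w, x)) for expert_w in router_weights]
    probs = softmax(logits)
    top_k = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    # Renormalize so that the selected experts' gate weights sum to 1.
    total = sum(probs[i] for i in top_k)
    return {i: probs[i] / total for i in top_k}


def moe_layer(x, experts, router_weights, k=2):
    """Run only the selected experts and mix their outputs by the gate weights."""
    gates = route(x, router_weights, k)
    out = [0.0] * len(x)
    for i, g in gates.items():
        y = experts[i](x)
        out = [o + g * yi for o, yi in zip(out, y)]
    return out, gates


def make_expert(scale):
    """Hypothetical expert that just scales its input."""
    return lambda x: [scale * xi for xi in x]


experts = [make_expert(s) for s in (1.0, 2.0, 3.0, 4.0)]
router_weights = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5], [-1.0, -1.0]]  # hypothetical learned values
out, gates = moe_layer([2.0, 1.0], experts, router_weights, k=2)
print(sorted(gates))  # [0, 2]
```

Experts 1 and 3 are never executed for this input, which is where the compute savings come from.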

Training MoE models requires load balancing to prevent the router from always routing to the same few experts, ideally
ensuring that experts get roughly equal use instead.<br/>
All expert weights still need to be stored and loaded in memory, even though only a few experts are active per input.
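
One common way of encouraging balance is an auxiliary loss added to the training objective. The sketch below follows
the form popularized by the Switch Transformer, `N * Σ f_i * P_i` with the usual scaling coefficient omitted; treat it
as an illustration of one technique, not the method every MoE model uses. The router probabilities are hypothetical.

```python
def load_balancing_loss(router_probs, num_experts):
    """Auxiliary loss of the form num_experts * sum_i(f_i * P_i).

    router_probs: one probability distribution over the experts per token.
    f_i: fraction of tokens whose top-1 expert is i.
    P_i: average router probability assigned to expert i.
    """
    n_tokens = len(router_probs)
    counts = [0] * num_experts
    mean_probs = [0.0] * num_experts
    for probs in router_probs:
        top1 = max(range(num_experts), key=lambda i: probs[i])
        counts[top1] += 1
        for i, p in enumerate(probs):
            mean_probs[i] += p / n_tokens
    fractions = [c / n_tokens for c in counts]
    return num_experts * sum(f * p for f, p in zip(fractions, mean_probs))


# Hypothetical router outputs for 4 tokens and 4 experts.
balanced = [
    [0.7, 0.1, 0.1, 0.1],
    [0.1, 0.7, 0.1, 0.1],
    [0.1, 0.1, 0.7, 0.1],
    [0.1, 0.1, 0.1, 0.7],
]
collapsed = [[0.7, 0.1, 0.1, 0.1]] * 4  # every token goes to expert 0
print(load_balancing_loss(balanced, 4))   # approximately 1.0
print(load_balancing_loss(collapsed, 4))  # approximately 2.8
```

The loss is higher when routing collapses onto few experts, so minimizing it alongside the main objective pushes the
router towards spreading tokens across experts.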

MoE is used across many domains, including vision models, multimodal models, speech recognition, and recommendation
systems.

## Further readings

- [Mixtral of Experts]

### Sources

- GeeksforGeeks' [Machine Learning Tutorial][geeksforgeeks / machine learning tutorial]
- IBM's [What is machine learning?][ibm / what is machine learning?]
- Oracle's [What is machine learning?][oracle / what is machine learning?]
- [Machine learning, explained]
- IBM's [What is mixture of experts?][ibm / what is mixture of experts?]
- [Adaptive Mixtures of Local Experts]
- IBM's [What is artificial intelligence (AI)?][ibm / what is artificial intelligence (ai)?]

<!--
  Reference
  ═╬═Time══
  -->

<!-- In-article sections -->
<!-- Knowledge base -->
[AI]: README.md

<!-- Files -->
[Adaptive Mixtures of Local Experts]: study%20material/JacobsJordanNowlanHinton_NeuralComputation_1991.pdf

<!-- Upstream -->
<!-- Others -->
[geeksforgeeks / Machine Learning Tutorial]: https://www.geeksforgeeks.org/machine-learning/
[IBM / What is artificial intelligence (AI)?]: https://www.ibm.com/think/topics/artificial-intelligence
[IBM / What is machine learning?]: https://www.ibm.com/think/topics/machine-learning
[IBM / What is mixture of experts?]: https://www.ibm.com/think/topics/mixture-of-experts
[Machine learning, explained]: https://mitsloan.mit.edu/ideas-made-to-matter/machine-learning-explained
[Mixtral of Experts]: https://arxiv.org/abs/2401.04088
[Oracle / What is machine learning?]: https://www.oracle.com/artificial-intelligence/machine-learning/what-is-machine-learning/