oam/knowledge base/ai/ml.md
2026-02-23 20:50:47 +01:00


Machine learning

Branch of AI focusing on developing models and algorithms that can learn patterns from data without being explicitly programmed for every task, and subsequently make accurate inferences about new data.

It gives models a pattern-recognition ability that enables them to make decisions or predictions without explicit, hard-coded instructions.

  1. TL;DR
  2. Approaches
    1. Deep learning
  3. Architectures
    1. Mixture of Experts
  4. Further readings
    1. Sources

TL;DR

All machine learning is AI, but not all AI is machine learning.

Rule-based models become increasingly brittle as more data is added to them.
They require accurate, universal criteria that define the results they need to achieve, which does not scale.

ML models operate by a logic that is learned from experience rather than explicitly programmed into them.
They train by analyzing data and predicting outcomes; prediction errors are measured, and the model's parameters are adjusted to reduce them.
The training process is repeated until the model reaches acceptable accuracy.

ML works through mathematical logic. The relevant characteristics (a.k.a. features) of each data point must be expressed numerically, so that the data can be fed to the mathematical algorithm that will learn to map a given input to the desired output.
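The train-predict-measure-adjust loop described above can be sketched with a toy one-parameter model trained by gradient descent. The task (learning y = 2x) and all values are made up for illustration:

```python
# Minimal sketch of the ML training loop: numeric features in,
# predictions out, measured errors drive parameter adjustments.
# Toy task: learn y = 2x from a handful of (x, y) examples.

data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]

w = 0.0        # the single learnable parameter
lr = 0.02      # learning rate: step size of each adjustment

for epoch in range(500):        # repeat until the model is accurate
    for x, y in data:
        pred = w * x            # predict the result
        error = pred - y        # measure the prediction error
        w -= lr * error * x     # adjust to reduce the error (gradient step)

print(round(w, 2))  # converges close to 2.0
```

Real models differ only in scale: they adjust millions or billions of parameters instead of one, but the loop is the same.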

ML is mainly divided into the following types:

  • Supervised learning.

    Models learn from labelled data. Every input has a corresponding correct output.
    Models make predictions, compare those with the true outputs, and adjust themselves to reduce errors and improve accuracy over time.
The goal is to make accurate predictions on new, unseen data.

  • Unsupervised learning.

    Models work without labelled data.
    They learn patterns on their own by grouping similar data points or finding hidden structures without human intervention.
Useful for uncovering hidden structure in data: clustering, dimensionality reduction, compression, anomaly detection, and association rule learning.

  • Reinforcement learning.

    Teaches agents to make decisions through trial and error to maximize cumulative rewards.
    Allows machines to learn by interacting with an environment and receiving feedback based on their actions. This feedback comes in the form of rewards or penalties.
    Agents use the feedback to optimize their decision-making over time.

  • Semi-supervised learning.

    Hybrid machine learning approach using both supervised and unsupervised learning.
    Uses a small amount of labelled data, combined with a large amount of unlabelled data to train models.
    The goal is to learn a function that accurately predicts outputs based on inputs, like with supervised learning, but with much less labelled data.
    Particularly valuable when acquiring labelled data is expensive or time-consuming, yet unlabelled data is plentiful and easy to collect.

  • Self-supervised learning.

    Subset of unsupervised learning.
    Models train using data that does not have any labels or answers provided. Instead of needing people to label the data, the models themselves find patterns and create their own labels from the data automatically.
Especially useful when there is a lot of data, but labelling it would take a lot of time and effort.
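Unsupervised learning, the least intuitive of the types above, can be illustrated with a bare-bones 1-D k-means clustering pass. The points and initial centroids are arbitrary toy values; no labels are ever given to the algorithm:

```python
# Minimal sketch of unsupervised learning: 1-D k-means with k=2.
# The algorithm groups similar points on its own, without labels.

points = [1.0, 1.2, 0.8, 8.0, 8.3, 7.9]
centroids = [0.0, 10.0]                  # rough initial guesses

for _ in range(10):                      # a few refinement rounds
    clusters = [[], []]
    for p in points:
        # assign each point to its nearest centroid
        i = min((0, 1), key=lambda c: abs(p - centroids[c]))
        clusters[i].append(p)
    # move each centroid to the mean of its assigned points
    centroids = [sum(c) / len(c) for c in clusters]

print([round(c, 1) for c in centroids])  # → [1.0, 8.1]
```

The two centroids settle on the two natural groups in the data; that grouping is the "hidden structure" the model discovered by itself.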

Deep learning has emerged as the state-of-the-art approach for AI models across nearly every domain.
It relies on distributed networks of mathematical operations, giving it the ability to learn the intricate nuances of very complex data.
It requires very large amounts of data and computational resources.

Approaches

Deep learning

Approach in which multiple layers of nodes (a deep neural network) extract meaning, relationships, and other complex patterns from large volumes of raw (unstructured and unlabelled) data and make their own predictions about what the data represents.
Deep neural networks were initially designed with the idea of loosely simulating the human brain.

Deep neural networks include:

  • An input layer.
  • Three or more (now usually hundreds of) hidden layers.
  • An output layer.

The multiple hidden layers let the network build increasingly abstract representations of its input, which is what enables it to learn from raw, unlabelled data.
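The input → hidden → output structure can be sketched as a tiny forward pass. The layer sizes and weights below are arbitrary illustrative values, not trained ones:

```python
# Tiny deep-network forward pass: input layer -> 2 hidden layers -> output.

def relu(v):
    # common activation: keeps positive values, zeroes out negatives
    return [max(0.0, x) for x in v]

def dense(inputs, weights, biases):
    # one fully connected layer: each output is a weighted sum plus a bias
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]

x = [0.5, -1.0]                                              # input layer (2 features)
h1 = relu(dense(x, [[1.0, -0.5], [0.3, 0.8]], [0.1, 0.0]))   # hidden layer 1
h2 = relu(dense(h1, [[0.7, 0.2], [-0.4, 0.9]], [0.0, 0.1]))  # hidden layer 2
out = dense(h2, [[1.0, 1.0]], [0.0])                         # output layer (1 value)

print(out)
```

A real deep network stacks the same `dense` + activation pattern hundreds of layers deep, with weights learned by training rather than chosen by hand.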

Deep learning encompasses a range of neural network architectures, including multi-layer perceptrons (MLPs), convolutional neural networks (CNNs), recurrent neural networks (RNNs), graph neural networks (GNNs), and transformers.
Results are usually applied to domains like computer vision, natural language processing, and robotics.

CNNs proved to be well suited for image and video recognition, including medical imaging.
RNNs and LSTMs excel at sequence prediction, language translation, and speech recognition. Generative adversarial networks (GANs) enabled the generation of realistic images and AI-driven art.

Architectures

Mixture of Experts

Divides a single model into multiple, specialized sub-networks (experts) along with a learned routing mechanism (gate or router) that dynamically selects which experts to activate for any given input.
Inference only activates a small subset of the experts at a time, typically one or two of them.

It makes it possible to build models with a very large total number of parameters, while only activating a fraction of them per input.
This makes them more efficient to pre-train and run.

A small router network:

  1. Takes the input.
  2. Produces a probability distribution over the available experts.
  3. Selects the top-k experts.
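The three routing steps above can be sketched as follows. The expert scores here are made-up values for a single input; in a real MoE layer a small learned network computes them from the input itself:

```python
import math

# Minimal sketch of a MoE router: softmax over per-expert scores, then top-k.

def softmax(scores):
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

expert_scores = [0.2, 1.5, -0.3, 2.1]   # step 1: one score per expert, from the input
probs = softmax(expert_scores)          # step 2: probability distribution over experts

k = 2                                   # step 3: keep only the k most probable experts
top_k = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
print(top_k)  # → [3, 1]: only these two experts run for this input
```

Only the selected experts' sub-networks execute; their outputs are typically combined, weighted by the router's probabilities.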

Training MoE models requires load balancing to prevent the router from always selecting the same few experts, typically by encouraging roughly equal utilization across experts.
All expert weights still need to be stored and loaded in memory.

MoE is used across many domains, including vision models, multimodal models, speech recognition, and recommendation systems.

Further readings

Sources