chore(kb/ai): review and expand notes

2026-02-23 12:04:23 +00:00 · 2026-02-21 13:18:57 +01:00
parent 34d35dadec
commit 7b84ab9af1
3 changed files with 105 additions and 27 deletions
--- a/base/ai/lms.md
+++ b/base/ai/lms.md
@@ -100,7 +100,7 @@ recognition, machine translation, natural language generation, optical character
 handwriting recognition, grammar induction, information retrieval, and other tasks.

 They are currently predominantly based on _transformers_, which have superseded recurrent neural networks as the most
-effective technology.
+effective architecture.

 Training LLMs involves feeding them vast amounts of data, and computing weights to optimize their parameters.<br/>
 The training process typically includes multiple stages, and requires substantial computational resources.<br/>
@@ -163,11 +163,11 @@ For domain-specific applications, consider fine-tuning a small model to mimic th
 Standard models' behaviour is just autocompletion. Models just try to infer or recall what the most probable next word
 would be.

-_Chain of Thought_ techniques tell models to _show their work_.
+_Chain of Thought_ techniques tell models to _show their work_ by breaking prompts in smaller, more manageable steps,
+and solving on each of them singularly before giving back the final answer.<br/>
+The result is more accurate, but it costs more tokens and requires a bigger context window.<br/>
 It _feels_ like a model is calculating or thinking, but what it is really just increasing the chances that the answer
-is correct by breaking questions in smaller, more manageable steps, and solving on each of them before giving back the
-final answer.<br/>
-The result is more accurate, but it costs more tokens and requires a bigger context window.
+is logically sound.

 The _ReAct loop_ (Reason + Act) paradigm forces models to loop over chain-of-thoughts.<br/>
 A model breaks the request in smaller steps, plans the next action, acts on it using [functions][function calling]
@@ -207,28 +207,37 @@ Deciding which tool to call, using that tool, and then using the results to gene
 just inferring the next token.

 > [!caution]
-> Allowing a LLM to call functions can have real-world consequences.<br/>
+> Allowing LLMs to call functions can have real-world consequences.<br/>
 > This includes financial loss, data corruption or exfiltration, and security breaches.

 ## Concerns

- Training requires massive amounts of resource and hence consumes a vast amount of energy and cooling.
- Lots of people currently thinks of LLMs as _real intelligence_, when it is not.
- People currently gives too much credibility to LLM answers, and trust them more than they trust their teachers,
-  accountants, lawyers or even doctors.
- AI companies could bias their models to say specific things, subtly promote ideologies, influence elections, or even
-  rewrite history in the mind of those who trust the LLMs.
- Models can be vulnerable to specific attacks (e.g. prompt injection) that would change the LLM's behaviour, bias it,
-  or hide malware in their tools.
- People is using LLMs mindlessly too much, mostly due to the convenience they offer but also because they don't
-  understand what those are or how they work. This is causing lack of critical thinking and overreliance.
- Model training and execution requires resources that are normally not available to the common person. This encourages
-  people to depend from, and hence give power to, AI companies.
- Models tend to **not** accept gracefully that they don't know something, and hallucinate as a result.<br/>
-  More recent techniques are making models more efficient, but they just delay this problem.
- Models can learn and exhibit deceptive behavior.<br/>
-  Standard techniques could fail to remove it, and instead empower it while creating a false impression of safety.<br/>
+- Lots of people currently thinks of LLMs as _real, rational, intelligence_, when they are not.<br/>
+  LLMs are really nothing more than glorified **guessing machines** that are _designed_ to interact naturally. It's
+  humans that are biased by evolution toward _attributing_ sentience and agency to entities they interact with.
+- People is mindlessly using LLMs too much, mostly due to the convenience they offer but also because they don't
+  understand what those are or how they work. This is causing lack of critical thinking, and overreliance.
+- People is giving too much credibility to LLM answers, and trust them more than they trust their teachers, accountants,
+  lawyers or even doctors.
+- LLMs are **incapable** of distinguishing facts from beliefs, and are completely disembodied from the world.<br/>
+  They do not _understand_ concepts and are unaware of time, change, and causality. They just **approximate** reasoning
+  by _mimicking_ language based on how connected are the tokens in their own training data.
+- Models are very limited in their ability to revise beliefs. Once some pattern is learned, it is extremely difficult to
+  unwire it due to the very nature of how models function.
+- AI companies could steer and bias their models to say specific things, subtly promote ideologies, influence elections,
+  or even rewrite history in the mind of those who trust the LLM.
+- Models can be vulnerable to attacks (e.g. prompt injection) that can change the LLM's behaviour, bias it, or hide
+  malware in the tools they manage and use.
+- Model training and execution requires massive amounts of data and computation, resources that are normally **not**
+  available to the common person. Aside from the vast amount of energy and cooling they consume, this encourages people
+  to depend from, and hence give power to, AI companies.
+- Models _can_ learn and exhibit deceptive behavior.<br/>
+  Standard revision techniques could fail to remove it, and instead empower it while creating a false impression of
+  safety.<br/>
  See [Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training].
+- Models are painfully inconsistent, often unaware of their limitations, irritatingly overconfident, and tend to **not**
+  accept gracefully that they don't know something, ending up preferring to hallucinate as the result.<br/>
+  More recent techniques are making models more efficient, but they just delay this problem.

 ## Run LLMs Locally

@@ -257,6 +266,7 @@ Refer:
 - [Introduction to Large Language Models]
 - GeeksForGeeks' [What are LLM parameters?][geeksforgeeks / what are llm parameters?]
 - IBM's [What are LLM parameters?][ibm / what are llm parameters?]
+- [This is not the AI we were promised], presentation by Michael John Wooldridge at the Royal Society

 <!--
  Reference
@@ -302,5 +312,6 @@ Refer:
 [Run LLMs Locally: 6 Simple Methods]: https://www.datacamp.com/tutorial/run-llms-locally-tutorial
 [SEQUOIA: Serving exact Llama2-70B on an RTX4090 with half-second per token latency]: https://infini-ai-lab.github.io/Sequoia-Page/
 [Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training]: https://arxiv.org/abs/2401.05566
+[This is not the AI we were promised]: https://www.youtube.com/watch?v=CyyL0yDhr7I
 [What are Language Models in NLP?]: https://www.geeksforgeeks.org/nlp/what-are-language-models-in-nlp/
 [What is chain of thought (CoT) prompting?]: https://www.ibm.com/think/topics/chain-of-thoughts
--- a/base/ai/vllm-metal.md
+++ b/base/ai/vllm-metal.md
@@ -0,0 +1,58 @@
+# vLLM Metal plugin
+
+Community maintained hardware plugin for vLLM on Apple Silicon.
+
+<!-- Remove this line to uncomment if used
+## Table of contents <!-- omit in toc -->
+
+1. [TL;DR](#tldr)
+1. [Further readings](#further-readings)
+   1. [Sources](#sources)
+
+## TL;DR
+
+Plugin that enables vLLM to run on Apple Silicon Macs using MLX as the primary compute backend, enabling higher
+performances.
+
+<details>
+  <summary>Setup</summary>
+
+> [!important]
+> Use Python v3.10 to v3.12 as per 2026-02-21.<br/>
+> Python 3.13 is not yet supported.
+
+```sh
+# Install from sources.
+git clone 'https://github.com/vllm-project/vllm-metal.git' \
+&& cd 'vllm-metal' \
+&& pip install -e '.' 'https://github.com/vllm-project/vllm/releases/download/v0.15.1/vllm-0.15.1.tar.gz'
+
+# Use the provided installation script.
+curl -fsSL 'https://raw.githubusercontent.com/vllm-project/vllm-metal/main/install.sh' | bash
+```
+
+</details>
+
+Refer [vLLM] for usage.
+
+## Further readings
+
+- [vLLM]
+- [Codebase]
+
+### Sources
+
+<!--
+  Reference
+  ═╬═Time══
+  -->
+
+<!-- In-article sections -->
+<!-- Knowledge base -->
+[vLLM]: vllm.md
+
+<!-- Files -->
+<!-- Upstream -->
+[Codebase]: https://github.com/vllm-project/vllm-metal
+
+<!-- Others -->
--- a/base/ai/vllm.md
+++ b/base/ai/vllm.md
@@ -31,8 +31,8 @@ capabilities, and enterprise-scale LLM serving.
 <details>
  <summary>Setup</summary>

-Prefer using [vllm-project/vllm-metal] on Apple silicon.<br/>
-Install with `curl -fsSL 'https://raw.githubusercontent.com/vllm-project/vllm-metal/main/install.sh' | bash`
+> [!tip]
+> Prefer using [vLLM-metal] on Apple silicon.

 ```sh
 pip install 'vllm'
@@ -51,6 +51,7 @@ vllm --help

 # Start the vLLM OpenAI Compatible API server.
 vllm serve 'meta-llama/Llama-2-7b-hf'
+vllm serve '/path/to/local/model'
 vllm serve … --port '8000' --gpu-memory-utilization '0.9'
 vllm serve … --tensor-parallel-size '2' --uds '/tmp/vllm.sock'

@@ -79,15 +80,23 @@ vllm run-batch --model 'meta-llama/Meta-Llama-3-8B-Instruct' -o 'results.jsonl'

 </details>

-<!-- Uncomment if used
 <details>
  <summary>Real world use cases</summary>

 ```sh
+# Use models pulled with Ollama.
+# vLLM expects a Hugging Face model directory structure containing `config.json`, `tokenizer.json`, and other files, but
+# Ollama stores models as a single blob files in GGUF format.
+# vllm-metal (via MLX) cannot directly load a raw GGUF blob.
+# FIXME: not working.
+jq -r '.layers|sort_by(.size)[-1].digest|sub(":";"-")' \
+  "$HOME/.ollama/models/manifests/registry.ollama.ai/library/codellama/13b" \
+| xargs -pI '%%' \
+    vllm serve "$HOME/.ollama/models/blobs/%%" --served-model-name 'codellama-13b' \
+      --generation-config 'vllm' --tokenizer 'codellama/CodeLlama-13b-Instruct-hf' --load-format 'gguf'
 ```

 </details>
-->

 ## Further readings

@@ -111,7 +120,7 @@ vllm run-batch --model 'meta-llama/Meta-Llama-3-8B-Instruct' -o 'results.jsonl'
 [Blog]: https://blog.vllm.ai/
 [Codebase]: https://github.com/vllm-project/
 [Documentation]: https://docs.vllm.ai/en/
-[vllm-project/vllm-metal]: https://github.com/vllm-project/vllm-metal
+[vLLM-metal]: vllm-metal.md
 [Website]: https://vllm.ai/

 <!-- Others -->