diff --git a/knowledge base/ai/lms.md b/knowledge base/ai/lms.md index ea0b108..43eef33 100644 --- a/knowledge base/ai/lms.md +++ b/knowledge base/ai/lms.md @@ -100,7 +100,7 @@ recognition, machine translation, natural language generation, optical character handwriting recognition, grammar induction, information retrieval, and other tasks. They are currently predominantly based on _transformers_, which have superseded recurrent neural networks as the most -effective technology. +effective architecture. Training LLMs involves feeding them vast amounts of data, and computing weights to optimize their parameters.
The training process typically includes multiple stages, and requires substantial computational resources.
@@ -163,11 +163,11 @@ For domain-specific applications, consider fine-tuning a small model to mimic th Standard models' behaviour is just autocompletion. Models just try to infer or recall what the most probable next word would be. -_Chain of Thought_ techniques tell models to _show their work_. +_Chain of Thought_ techniques tell models to _show their work_ by breaking prompts into smaller, more manageable steps, +and solving each of them individually before giving back the final answer.
+The result is more accurate, but it costs more tokens and requires a bigger context window.
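In practice, chain-of-thought often amounts to nothing more than an extra instruction in the prompt. The sketch below builds two request bodies for a generic OpenAI-compatible completions endpoint; the model name, the token budgets, and the exact wording of the instruction are illustrative assumptions, not part of any specific API:

```shell
# Plain request: the model autocompletes straight to an answer.
# 'example-model' and the token budgets are placeholders.
plain='{"model": "example-model", "prompt": "What is 17 * 24?", "max_tokens": 16}'

# Chain-of-thought request: same question, plus an explicit instruction to
# reason stepwise, and a larger token budget for the intermediate steps.
cot='{"model": "example-model", "prompt": "What is 17 * 24? Think step by step and show your work before the final answer.", "max_tokens": 256}'

printf '%s\n' "$cot"
```

The only differences are the added instruction and the larger `max_tokens`, which is the cost trade-off described above.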
It _feels_ like a model is calculating or thinking, but what it is really just increasing the chances that the answer -is correct by breaking questions in smaller, more manageable steps, and solving on each of them before giving back the -final answer.
-The result is more accurate, but it costs more tokens and requires a bigger context window. +is logically sound. The _ReAct loop_ (Reason + Act) paradigm forces models to iterate over chain-of-thought reasoning.
A model breaks the request in smaller steps, plans the next action, acts on it using [functions][function calling] @@ -207,28 +207,37 @@ Deciding which tool to call, using that tool, and then using the results to gene just inferring the next token. > [!caution] -> Allowing a LLM to call functions can have real-world consequences.
+> Allowing LLMs to call functions can have real-world consequences.
> This includes financial loss, data corruption or exfiltration, and security breaches. ## Concerns -- Training requires massive amounts of resource and hence consumes a vast amount of energy and cooling. -- Lots of people currently thinks of LLMs as _real intelligence_, when it is not. -- People currently gives too much credibility to LLM answers, and trust them more than they trust their teachers, - accountants, lawyers or even doctors. -- AI companies could bias their models to say specific things, subtly promote ideologies, influence elections, or even - rewrite history in the mind of those who trust the LLMs. -- Models can be vulnerable to specific attacks (e.g. prompt injection) that would change the LLM's behaviour, bias it, - or hide malware in their tools. -- People is using LLMs mindlessly too much, mostly due to the convenience they offer but also because they don't - understand what those are or how they work. This is causing lack of critical thinking and overreliance. -- Model training and execution requires resources that are normally not available to the common person. This encourages - people to depend from, and hence give power to, AI companies. -- Models tend to **not** accept gracefully that they don't know something, and hallucinate as a result.
- More recent techniques are making models more efficient, but they just delay this problem. -- Models can learn and exhibit deceptive behavior.
- Standard techniques could fail to remove it, and instead empower it while creating a false impression of safety.
+- Lots of people currently think of LLMs as _real, rational intelligence_, when they are not.
LLMs are really nothing more than glorified **guessing machines** that are _designed_ to interact naturally. It's + humans that are biased by evolution toward _attributing_ sentience and agency to entities they interact with. +- People are using LLMs mindlessly too much, mostly due to the convenience they offer but also because they don't + understand what these models are or how they work. This is causing a lack of critical thinking and overreliance. +- People are giving too much credibility to LLM answers, trusting them more than they trust their teachers, accountants, + lawyers or even doctors. +- LLMs are **incapable** of distinguishing facts from beliefs, and are completely disembodied from the world.
+ They do not _understand_ concepts and are unaware of time, change, and causality. They just **approximate** reasoning + by _mimicking_ language based on how connected the tokens are in their own training data. +- Models are very limited in their ability to revise beliefs. Once some pattern is learned, it is extremely difficult to + unwire it due to the very nature of how models function. +- AI companies could steer and bias their models to say specific things, subtly promote ideologies, influence elections, + or even rewrite history in the minds of those who trust the LLM. +- Models can be vulnerable to attacks (e.g. prompt injection) that can change the LLM's behaviour, bias it, or hide + malware in the tools they manage and use. +- Model training and execution require massive amounts of data and computation, resources that are normally **not** + available to the common person. Aside from the vast amount of energy and cooling they consume, this encourages people + to depend on, and hence give power to, AI companies. +- Models _can_ learn and exhibit deceptive behavior.
+ Standard revision techniques could fail to remove it, and instead empower it while creating a false impression of + safety.
See [Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training]. +- Models are painfully inconsistent, often unaware of their limitations, irritatingly overconfident, and tend to **not** + accept gracefully that they don't know something, ending up hallucinating as a result.
+ More recent techniques are making models more efficient, but they just delay this problem. ## Run LLMs Locally @@ -257,6 +266,7 @@ Refer: - [Introduction to Large Language Models] - GeeksForGeeks' [What are LLM parameters?][geeksforgeeks / what are llm parameters?] - IBM's [What are LLM parameters?][ibm / what are llm parameters?] +- [This is not the AI we were promised], presentation by Michael John Wooldridge at the Royal Society + +1. [TL;DR](#tldr) +1. [Further readings](#further-readings) + 1. [Sources](#sources) + +## TL;DR + +Plugin that enables vLLM to run on Apple Silicon Macs using MLX as the primary compute backend, enabling higher +performance. + +
+ Setup + +> [!important] +> Use Python v3.10 to v3.12 as of 2026-02-21.
+> Python 3.13 is not yet supported. + +```sh +# Install from sources. +git clone 'https://github.com/vllm-project/vllm-metal.git' \ +&& cd 'vllm-metal' \ +&& pip install -e '.' 'https://github.com/vllm-project/vllm/releases/download/v0.15.1/vllm-0.15.1.tar.gz' + +# Use the provided installation script. +curl -fsSL 'https://raw.githubusercontent.com/vllm-project/vllm-metal/main/install.sh' | bash +``` + +
+ +Refer [vLLM] for usage. + +## Further readings + +- [vLLM] +- [Codebase] + +### Sources + + + + + +[vLLM]: vllm.md + + + +[Codebase]: https://github.com/vllm-project/vllm-metal + + diff --git a/knowledge base/ai/vllm.md b/knowledge base/ai/vllm.md index 559088e..f9057fe 100644 --- a/knowledge base/ai/vllm.md +++ b/knowledge base/ai/vllm.md @@ -31,8 +31,8 @@ capabilities, and enterprise-scale LLM serving.
Setup -Prefer using [vllm-project/vllm-metal] on Apple silicon.
-Install with `curl -fsSL 'https://raw.githubusercontent.com/vllm-project/vllm-metal/main/install.sh' | bash` +> [!tip] +> Prefer using [vLLM-metal] on Apple silicon. ```sh pip install 'vllm' @@ -51,6 +51,7 @@ vllm --help # Start the vLLM OpenAI Compatible API server. vllm serve 'meta-llama/Llama-2-7b-hf' +vllm serve '/path/to/local/model' vllm serve … --port '8000' --gpu-memory-utilization '0.9' vllm serve … --tensor-parallel-size '2' --uds '/tmp/vllm.sock' @@ -79,15 +80,23 @@ vllm run-batch --model 'meta-llama/Meta-Llama-3-8B-Instruct' -o 'results.jsonl'
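Once `vllm serve` is up, the served model can be queried over vLLM's OpenAI-compatible HTTP API (port 8000 by default). A minimal sketch, assuming the default port and the model name used above:

```shell
# Minimal body for the OpenAI-compatible chat completions endpoint.
# The model name must match the one given to `vllm serve`.
body='{"model": "meta-llama/Llama-2-7b-hf", "messages": [{"role": "user", "content": "Hello!"}], "max_tokens": 32}'

# Query the local server; print a note instead of failing when it is not running.
curl -fsS 'http://localhost:8000/v1/chat/completions' \
  -H 'Content-Type: application/json' \
  -d "$body" \
|| echo 'vLLM server not reachable on port 8000'
```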
- ## Further readings @@ -111,7 +120,7 @@ vllm run-batch --model 'meta-llama/Meta-Llama-3-8B-Instruct' -o 'results.jsonl' [Blog]: https://blog.vllm.ai/ [Codebase]: https://github.com/vllm-project/ [Documentation]: https://docs.vllm.ai/en/ -[vllm-project/vllm-metal]: https://github.com/vllm-project/vllm-metal +[vLLM-metal]: vllm-metal.md [Website]: https://vllm.ai/