diff --git a/knowledge base/ai/large language model.md b/knowledge base/ai/large language model.md
index c4c7a8a..4dbd079 100644
--- a/knowledge base/ai/large language model.md
+++ b/knowledge base/ai/large language model.md
@@ -95,14 +95,12 @@ Next step is [agentic AI][agent].
## Run LLMs Locally
-Use one of the following:
+Refer to:
-- [Ollama]
-- [LMStudio]
-- [vLLM]
-- [Jan]
-- [llama.cpp]
-- [Llamafile]
+- [Local LLM Hosting: Complete 2026 Guide - Ollama, vLLM, LocalAI, Jan, LM Studio & More]
+- [Run LLMs Locally: 6 Simple Methods]
+
[Ollama] | [Jan] | [LMStudio] | [Docker model runner] | [llama.cpp] | [vLLM] | [Llamafile]
## Further readings
@@ -110,6 +108,7 @@ Use one of the following:
- [Run LLMs Locally: 6 Simple Methods]
- [OpenClaw: Who are you?]
+- [Local LLM Hosting: Complete 2026 Guide - Ollama, vLLM, LocalAI, Jan, LM Studio & More]
[Agent]: agent.md
+[Docker model runner]: ../docker.md#running-llms-locally
[LMStudio]: lmstudio.md
[Ollama]: ollama.md
[vLLM]: vllm.md
@@ -133,9 +133,10 @@ Use one of the following:
[Gemini]: https://gemini.google.com/
[Grok]: https://grok.com/
[Jan]: https://www.jan.ai/
-[llama.cpp]: https://github.com/ggml-org/llama.cpp
+[llama.cpp]: llama.cpp.md
[Llama]: https://www.llama.com/
[Llamafile]: https://github.com/mozilla-ai/llamafile
+[Local LLM Hosting: Complete 2026 Guide - Ollama, vLLM, LocalAI, Jan, LM Studio & More]: https://www.glukhov.org/post/2025/11/hosting-llms-ollama-localai-jan-lmstudio-vllm-comparison/
[Mistral]: https://mistral.ai/
[OpenClaw: Who are you?]: https://www.youtube.com/watch?v=hoeEclqW8Gs
[Run LLMs Locally: 6 Simple Methods]: https://www.datacamp.com/tutorial/run-llms-locally-tutorial
diff --git a/knowledge base/ai/llama.cpp.md b/knowledge base/ai/llama.cpp.md
new file mode 100644
index 0000000..ba9aae4
--- /dev/null
+++ b/knowledge base/ai/llama.cpp.md
@@ -0,0 +1,67 @@
+# llama.cpp
+
+> TODO
+
+LLM inference engine written in C/C++.
+Widely used as the base for AI tools like [Ollama] and [Docker model runner].
+
+
+
+1. [TL;DR](#tldr)
+1. [Further readings](#further-readings)
+ 1. [Sources](#sources)
+
+## TL;DR
+
+
+
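+llama.cpp ships a `llama-server` binary that exposes an OpenAI-compatible HTTP API.
+As a minimal sketch (the model name and defaults are assumptions for a local setup), a chat completion request sent as
+`POST /v1/chat/completions` to the local server looks like:
+
+```json
+{
+  "model": "local-model",
+  "messages": [
+    { "role": "user", "content": "Say hello" }
+  ]
+}
+```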
+
+
+
+
+## Further readings
+
+- [Codebase]
+
+### Sources
+
+
+
+
+
+[Docker model runner]: ../docker.md#running-llms-locally
+[Ollama]: ollama.md
+
+
+
+[Codebase]: https://github.com/ggml-org/llama.cpp
+
+
diff --git a/knowledge base/ai/lmstudio.md b/knowledge base/ai/lmstudio.md
index 90fd8da..227d9eb 100644
--- a/knowledge base/ai/lmstudio.md
+++ b/knowledge base/ai/lmstudio.md
@@ -1,6 +1,7 @@
# LMStudio
-Allows running LLMs locally.
+Allows running LLMs locally.
+Considered the most accessible tool for local LLM deployment, particularly for users with no technical background.
@@ -11,6 +12,31 @@ Allows running LLMs locally.
## TL;DR
+Focused on single-user scenarios without built-in rate limiting or authentication.
+
+Offers highly mature and stable OpenAI-compatible API.
+
+Supports full streaming, embeddings API, experimental function calling for compatible models, and limited multimodal
+support.
+
+Supports GGUF and Hugging Face Safetensors formats.
+Has a built-in converter for some models, and can run split GGUF models.
+
+Implements experimental tool calling support following the OpenAI function calling API format.
+Models trained on function calling (e.g., Hermes 2 Pro, Llama 3.1, and Functionary) can invoke external tools through
+the local API server. However, tool calling should **not** yet be considered suitable for production.
+Streaming tool calls or advanced features like parallel function invocation are not currently supported.
+Some models show better tool calling behavior than others.
+
+The UI eases defining function schemas and testing tool calls interactively.
+
+Considered ideal for:
+
+- Beginners new to local LLM deployment.
+- Users who prefer graphical interfaces over command-line tools.
+- Developers needing good performance on lower-spec hardware (especially with integrated GPUs).
+- Anyone wanting a polished professional user experience.
+
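+As a sketch, function schemas follow the OpenAI function calling format; `get_weather` and its parameters below are
+illustrative, not part of LMStudio itself:
+
+```json
+{
+  "type": "function",
+  "function": {
+    "name": "get_weather",
+    "description": "Get the current weather for a city",
+    "parameters": {
+      "type": "object",
+      "properties": {
+        "city": { "type": "string" }
+      },
+      "required": ["city"]
+    }
+  }
+}
+```
+
+Such schemas are passed in the `tools` array of a chat completion request to the local API server.
+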
Setup
diff --git a/knowledge base/ai/ollama.md b/knowledge base/ai/ollama.md
index 36a2b1e..272e539 100644
--- a/knowledge base/ai/ollama.md
+++ b/knowledge base/ai/ollama.md
@@ -1,6 +1,7 @@
# Ollama
-The easiest way to get up and running with large language models.
+One of the easiest ways to get up and running with large language models.
+Emerged as one of the most popular tools for local LLM deployment.
@@ -11,6 +12,21 @@ The easiest way to get up and running with large language models.
## TL;DR
+Leverages [llama.cpp].
+
+Primarily supports the GGUF file format, with quantization levels Q2_K through Q8_0.
+Offers automatic conversion of models from Hugging Face and allows customization through a Modelfile.
+
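+A minimal Modelfile sketch (the base model and values are illustrative):
+
+```
+FROM llama3.1
+PARAMETER temperature 0.7
+SYSTEM "You are a concise assistant."
+```
+
+Build it with `ollama create my-model -f Modelfile`, then run it with `ollama run my-model`.
+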
+Supports tool calling functionality via API.
+Models can decide when to invoke tools and how to use returned data.
+Works with models specifically trained for function calling (e.g., Mistral, Llama 3.1, Llama 3.2, and Qwen2.5). However,
+it does not currently allow forcing a specific tool to be called, or receiving tool call responses in streaming mode.
+
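+As a sketch of a tool-calling request to `POST /api/chat` (the tool definition is illustrative):
+
+```json
+{
+  "model": "llama3.1",
+  "messages": [
+    { "role": "user", "content": "What is the weather in Paris?" }
+  ],
+  "tools": [
+    {
+      "type": "function",
+      "function": {
+        "name": "get_weather",
+        "description": "Get the current weather for a city",
+        "parameters": {
+          "type": "object",
+          "properties": {
+            "city": { "type": "string" }
+          },
+          "required": ["city"]
+        }
+      }
+    }
+  ]
+}
+```
+
+The model answers either with a plain message or with a `tool_calls` entry naming the tool and its arguments.
+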
+Considered ideal for developers who prefer CLI interfaces and automation, need reliable API integration, value
+open-source transparency, and want efficient resource utilization.
+
+Excellent for building applications that require seamless migration from OpenAI.
+
Setup
@@ -118,6 +134,8 @@ ollama signout
+[llama.cpp]: llama.cpp.md
+
[Blog]: https://ollama.com/blog
diff --git a/knowledge base/ai/vllm.md b/knowledge base/ai/vllm.md
index db1d298..a981416 100644
--- a/knowledge base/ai/vllm.md
+++ b/knowledge base/ai/vllm.md
@@ -11,6 +11,23 @@ Open source library for LLM inference and serving.
## TL;DR
+Engineered specifically for high-performance, production-grade LLM inference.
+
+Offers production-ready, highly mature OpenAI-compatible API.
+Has full support for streaming, embeddings, tool/function calling with parallel invocation capability, vision-language
+model support, rate limiting, and token-based authentication. Optimized for high throughput and batched requests.
+
+Supports PyTorch and Safetensors (primary) formats, GPTQ and AWQ quantization, and native Hugging Face model hub integration.
+Does **not** natively support GGUF (requires conversion).
+
+Offers production-grade, fully-featured, OpenAI-compatible tool calling functionality via API.
+Support includes parallel function calls, the `tool_choice` parameter for controlling tool selection, and streaming
+support for tool calls.
+
+Considered the gold standard for production deployments requiring enterprise-grade tool orchestration.
+Best for production-grade performance and reliability, high concurrent request handling, multi-GPU deployment
+capabilities, and enterprise-scale LLM serving.
+
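+As a sketch, a request forcing a specific tool via `tool_choice` against the OpenAI-compatible endpoint (the model and
+tool names are illustrative):
+
+```json
+{
+  "model": "meta-llama/Llama-3.1-8B-Instruct",
+  "messages": [
+    { "role": "user", "content": "What is the weather in Paris?" }
+  ],
+  "tools": [
+    {
+      "type": "function",
+      "function": {
+        "name": "get_weather",
+        "description": "Get the current weather for a city",
+        "parameters": {
+          "type": "object",
+          "properties": { "city": { "type": "string" } },
+          "required": ["city"]
+        }
+      }
+    }
+  ],
+  "tool_choice": { "type": "function", "function": { "name": "get_weather" } }
+}
+```
+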
Setup