chore(ai): expand notes

This commit is contained in:
Michele Cereda
2026-02-11 01:14:19 +01:00
parent 52648cd511
commit 418a3b9914
5 changed files with 139 additions and 10 deletions

View File

@@ -95,14 +95,12 @@ Next step is [agentic AI][agent].
## Run LLMs Locally
Use one of the following:
Refer:
- [Ollama]
- [LMStudio]
- [vLLM]
- [Jan]
- [llama.cpp]
- [Llamafile]
- [Local LLM Hosting: Complete 2026 Guide - Ollama, vLLM, LocalAI, Jan, LM Studio & More].
- [Run LLMs Locally: 6 Simple Methods].
[Ollama] | [Jan] | [LMStudio] | [Docker model runner] | [llama.cpp] | [vLLM] | [Llamafile]
## Further readings
@@ -110,6 +108,7 @@ Use one of the following:
- [Run LLMs Locally: 6 Simple Methods]
- [OpenClaw: Who are you?]
- [Local LLM Hosting: Complete 2026 Guide - Ollama, vLLM, LocalAI, Jan, LM Studio & More]
<!--
Reference
@@ -119,6 +118,7 @@ Use one of the following:
<!-- In-article sections -->
<!-- Knowledge base -->
[Agent]: agent.md
[Docker model runner]: ../docker.md#running-llms-locally
[LMStudio]: lmstudio.md
[Ollama]: ollama.md
[vLLM]: vllm.md
@@ -133,9 +133,10 @@ Use one of the following:
[Gemini]: https://gemini.google.com/
[Grok]: https://grok.com/
[Jan]: https://www.jan.ai/
[llama.cpp]: https://github.com/ggml-org/llama.cpp
[llama.cpp]: llama.cpp.md
[Llama]: https://www.llama.com/
[Llamafile]: https://github.com/mozilla-ai/llamafile
[Local LLM Hosting: Complete 2026 Guide - Ollama, vLLM, LocalAI, Jan, LM Studio & More]: https://www.glukhov.org/post/2025/11/hosting-llms-ollama-localai-jan-lmstudio-vllm-comparison/
[Mistral]: https://mistral.ai/
[OpenClaw: Who are you?]: https://www.youtube.com/watch?v=hoeEclqW8Gs
[Run LLMs Locally: 6 Simple Methods]: https://www.datacamp.com/tutorial/run-llms-locally-tutorial

View File

@@ -0,0 +1,67 @@
# llama.cpp
> TODO
LLM inference engine written in C/C++.<br/>
Widely used as the base for AI tools like [Ollama] and [Docker model runner].
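As a sketch, a downloaded GGUF model can be run directly from the CLI or served over an HTTP API (file paths and the model name below are illustrative, not from this repository):

```shell
# One-shot generation from a local GGUF model file.
llama-cli -m ./models/llama-3.1-8b-q4_k_m.gguf -p "Hello" -n 128

# Serve the same model over an OpenAI-compatible HTTP API on port 8080.
llama-server -m ./models/llama-3.1-8b-q4_k_m.gguf --port 8080
```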
<!-- Remove this line to uncomment if used
## Table of contents <!-- omit in toc -->
1. [TL;DR](#tldr)
1. [Further readings](#further-readings)
1. [Sources](#sources)
## TL;DR
<!-- Uncomment if used
<details>
<summary>Setup</summary>
```sh
```
</details>
-->
<!-- Uncomment if used
<details>
<summary>Usage</summary>
```sh
```
</details>
-->
<!-- Uncomment if used
<details>
<summary>Real world use cases</summary>
```sh
```
</details>
-->
## Further readings
- [Codebase]
### Sources
<!--
Reference
═╬═Time══
-->
<!-- In-article sections -->
<!-- Knowledge base -->
[Docker model runner]: ../docker.md#running-llms-locally
[Ollama]: ollama.md
<!-- Files -->
<!-- Upstream -->
[Codebase]: https://github.com/ggml-org/llama.cpp
<!-- Others -->

View File

@@ -1,6 +1,7 @@
# LMStudio
Allows running LLMs locally.
Allows running LLMs locally.<br/>
Considered the most accessible tool for local LLM deployment, particularly for users with no technical background.
<!-- Remove this line to uncomment if used
## Table of contents <!-- omit in toc -->
@@ -11,6 +12,31 @@ Allows running LLMs locally.
## TL;DR
Focused on single-user scenarios without built-in rate limiting or authentication.
Offers a highly mature and stable OpenAI-compatible API.
Supports full streaming, embeddings API, experimental function calling for compatible models, and limited multimodal
support.
Supports GGUF and Hugging Face Safetensors formats.<br/>
Has a built-in converter for some models, and can run split GGUF models.
Implements experimental tool calling support following the OpenAI function calling API format.<br/>
Models trained on function calling (e.g., Hermes 2 Pro, Llama 3.1, and Functionary) can invoke external tools through
the local API server. However, tool calling should **not** yet be considered suitable for production.<br/>
Streaming tool calls or advanced features like parallel function invocation are not currently supported.<br/>
Some models show better tool calling behavior than others.
The UI eases defining function schemas and testing tool calls interactively.
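A minimal sketch of such a tool-calling request against LM Studio's local server, assuming the default `http://localhost:1234/v1` endpoint; the model identifier and the `get_weather` tool are hypothetical examples:

```python
import json
from urllib import request

# OpenAI-style function schema; `get_weather` is a hypothetical example tool.
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

payload = {
    "model": "hermes-2-pro-llama-3-8b",  # assumed model identifier
    "messages": [{"role": "user", "content": "What is the weather in Rome?"}],
    "tools": tools,
}


def call_lmstudio(payload: dict) -> dict:
    """POST the request to LM Studio's OpenAI-compatible local endpoint."""
    req = request.Request(
        "http://localhost:1234/v1/chat/completions",  # LM Studio's default port
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:
        return json.load(resp)
```

If the loaded model was trained on function calling, the response should contain a `tool_calls` entry naming the function and its arguments.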
Considered ideal for:
- Beginners new to local LLM deployment.
- Users who prefer graphical interfaces over command-line tools.
- Developers needing good performance on lower-spec hardware (especially with integrated GPUs).
- Anyone wanting a polished professional user experience.
<details>
<summary>Setup</summary>

View File

@@ -1,6 +1,7 @@
# Ollama
The easiest way to get up and running with large language models.
One of the easiest ways to get up and running with large language models.<br/>
Emerged as one of the most popular tools for local LLM deployment.
<!-- Remove this line to uncomment if used
## Table of contents <!-- omit in toc -->
@@ -11,6 +12,21 @@ The easiest way to get up and running with large language models.
## TL;DR
Leverages [llama.cpp].
Supports primarily the GGUF file format with quantization levels Q2_K through Q8_0.<br/>
Offers automatic conversion of models from Hugging Face and allows customization through Modelfile.
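Customization via Modelfile can be sketched as follows (the base model and parameter values are illustrative):

```
FROM llama3.1
PARAMETER temperature 0.7
SYSTEM You are a concise assistant.
```

Building the custom model from it would then be `ollama create my-assistant -f ./Modelfile`.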
Supports tool calling functionality via API.<br/>
Models can decide when to invoke tools and how to use returned data.<br/>
Works with models specifically trained for function calling (e.g., Mistral, Llama 3.1, Llama 3.2, and Qwen2.5). However,
it does not currently allow forcing a specific tool to be called, nor does it return tool call responses in streaming mode.
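A sketch of such a tool-calling request against Ollama's native `/api/chat` endpoint; the `get_stock_price` tool is a hypothetical example:

```python
import json
from urllib import request

# Tool definitions follow the OpenAI function-calling format.
# `get_stock_price` is a hypothetical example tool.
tools = [{
    "type": "function",
    "function": {
        "name": "get_stock_price",
        "description": "Fetch the latest price for a ticker symbol",
        "parameters": {
            "type": "object",
            "properties": {"ticker": {"type": "string"}},
            "required": ["ticker"],
        },
    },
}]

payload = {
    "model": "llama3.1",  # a model trained for function calling
    "messages": [{"role": "user", "content": "Price of ACME?"}],
    "tools": tools,
    "stream": False,  # tool call responses are not available in streaming mode
}


def call_ollama(payload: dict) -> dict:
    """POST the chat request to the local Ollama server."""
    req = request.Request(
        "http://localhost:11434/api/chat",  # Ollama's default port
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:
        return json.load(resp)
```

When the model decides a tool is needed, the response message should carry a `tool_calls` field rather than plain text content.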
Considered ideal for developers who prefer CLI interfaces and automation, need reliable API integration, value
open-source transparency, and want efficient resource utilization.
Excellent for building applications that require seamless migration from OpenAI.
<details>
<summary>Setup</summary>
@@ -118,6 +134,8 @@ ollama signout
<!-- In-article sections -->
<!-- Knowledge base -->
[llama.cpp]: llama.cpp.md
<!-- Files -->
<!-- Upstream -->
[Blog]: https://ollama.com/blog

View File

@@ -11,6 +11,23 @@ Open source library for LLM inference and serving.
## TL;DR
Engineered specifically for high-performance, production-grade LLM inference.
Offers a production-ready, highly mature OpenAI-compatible API.<br/>
Fully supports streaming, embeddings, tool/function calling with parallel invocation, vision-language models, rate
limiting, and token-based authentication. Optimized for high throughput and batched requests.
Supports PyTorch and Safetensors (primary), GPTQ and AWQ quantization, native Hugging Face model hub.<br/>
Does **not** natively support GGUF (requires conversion).
Offers production-grade, fully-featured, OpenAI-compatible tool calling functionality via API.<br/>
Support includes parallel function calls, the `tool_choice` parameter for controlling tool selection, and streaming
support for tool calls.
Considered the gold standard for production deployments requiring enterprise-grade tool orchestration.<br/>
Best for production-grade performance and reliability, high concurrent request handling, multi-GPU deployment
capabilities, and enterprise-scale LLM serving.
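The setup above can be sketched as a single `vllm serve` invocation with tool calling and token authentication enabled (the model name, parser choice, and token are illustrative, and flag availability may vary by version):

```shell
# Launch the OpenAI-compatible server with tool calling and token auth.
vllm serve meta-llama/Llama-3.1-8B-Instruct \
  --enable-auto-tool-choice \
  --tool-call-parser llama3_json \
  --api-key my-secret-token
```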
<details>
<summary>Setup</summary>