chore(ai): expand notes

This commit is contained in:
Michele Cereda
2026-02-11 01:14:19 +01:00
parent 52648cd511
commit 418a3b9914
5 changed files with 139 additions and 10 deletions

View File

@@ -95,14 +95,12 @@ Next step is [agentic AI][agent].
## Run LLMs Locally
Use one of the following:
Refer:
- [Ollama]
- [LMStudio]
- [vLLM]
- [Jan]
- [llama.cpp]
- [Llamafile]
- [Local LLM Hosting: Complete 2026 Guide - Ollama, vLLM, LocalAI, Jan, LM Studio & More].
- [Run LLMs Locally: 6 Simple Methods].
[Ollama] | [Jan] | [LMStudio] | [Docker model runner] | [llama.cpp] | [vLLM] | [Llamafile]
## Further readings
@@ -110,6 +108,7 @@ Use one of the following:
- [Run LLMs Locally: 6 Simple Methods]
- [OpenClaw: Who are you?]
- [Local LLM Hosting: Complete 2026 Guide - Ollama, vLLM, LocalAI, Jan, LM Studio & More]
<!--
Reference
@@ -119,6 +118,7 @@ Use one of the following:
<!-- In-article sections -->
<!-- Knowledge base -->
[Agent]: agent.md
[Docker model runner]: ../docker.md#running-llms-locally
[LMStudio]: lmstudio.md
[Ollama]: ollama.md
[vLLM]: vllm.md
@@ -133,9 +133,10 @@ Use one of the following:
[Gemini]: https://gemini.google.com/
[Grok]: https://grok.com/
[Jan]: https://www.jan.ai/
[llama.cpp]: https://github.com/ggml-org/llama.cpp
[llama.cpp]: llama.cpp.md
[Llama]: https://www.llama.com/
[Llamafile]: https://github.com/mozilla-ai/llamafile
[Local LLM Hosting: Complete 2026 Guide - Ollama, vLLM, LocalAI, Jan, LM Studio & More]: https://www.glukhov.org/post/2025/11/hosting-llms-ollama-localai-jan-lmstudio-vllm-comparison/
[Mistral]: https://mistral.ai/
[OpenClaw: Who are you?]: https://www.youtube.com/watch?v=hoeEclqW8Gs
[Run LLMs Locally: 6 Simple Methods]: https://www.datacamp.com/tutorial/run-llms-locally-tutorial

View File

@@ -0,0 +1,67 @@
# llama.cpp
> TODO
LLM inference engine written in C/C++.<br/>
Widely used as the base for AI tools like [Ollama] and [Docker model runner].
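As a sketch, a downloaded GGUF model can be run directly from the CLI or served over an HTTP API (file paths and the model name below are illustrative, not from this repository):

```shell
# One-shot generation from a local GGUF model file.
llama-cli -m ./models/llama-3.1-8b-q4_k_m.gguf -p "Hello" -n 128

# Serve the same model over an OpenAI-compatible HTTP API on port 8080.
llama-server -m ./models/llama-3.1-8b-q4_k_m.gguf --port 8080
```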
<!-- Remove this line to uncomment if used
## Table of contents <!-- omit in toc -->
1. [TL;DR](#tldr)
1. [Further readings](#further-readings)
1. [Sources](#sources)
## TL;DR
<!-- Uncomment if used
<details>
<summary>Setup</summary>
```sh
```
</details>
-->
<!-- Uncomment if used
<details>
<summary>Usage</summary>
```sh
```
</details>
-->
<!-- Uncomment if used
<details>
<summary>Real world use cases</summary>
```sh
```
</details>
-->
## Further readings
- [Codebase]
### Sources
<!--
Reference
═╬═Time══
-->
<!-- In-article sections -->
<!-- Knowledge base -->
[Docker model runner]: ../docker.md#running-llms-locally
[Ollama]: ollama.md
<!-- Files -->
<!-- Upstream -->
[Codebase]: https://github.com/ggml-org/llama.cpp
<!-- Others -->

View File

@@ -1,6 +1,7 @@
# LMStudio
Allows running LLMs locally.
Allows running LLMs locally.<br/>
Considered the most accessible tool for local LLM deployment, particularly for users with no technical background.
<!-- Remove this line to uncomment if used
## Table of contents <!-- omit in toc -->
@@ -11,6 +12,31 @@ Allows running LLMs locally.
## TL;DR
Focused on single-user scenarios without built-in rate limiting or authentication.
Offers a highly mature and stable OpenAI-compatible API.
Supports full streaming, embeddings API, experimental function calling for compatible models, and limited multimodal
support.
Supports GGUF and Hugging Face Safetensors formats.<br/>
Has a built-in converter for some models, and can run split GGUF models.
Implements experimental tool calling support following the OpenAI function calling API format.<br/>
Models trained on function calling (e.g., Hermes 2 Pro, Llama 3.1, and Functionary) can invoke external tools through
the local API server. However, tool calling should **not** yet be considered suitable for production.<br/>
Streaming tool calls or advanced features like parallel function invocation are not currently supported.<br/>
Some models show better tool calling behavior than others.
The UI eases defining function schemas and testing tool calls interactively.
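A minimal sketch of such a tool-calling request against LM Studio's local server, assuming the default `http://localhost:1234/v1` endpoint; the model identifier and the `get_weather` tool are hypothetical examples:

```python
import json
from urllib import request

# OpenAI-style function schema; `get_weather` is a hypothetical example tool.
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

payload = {
    "model": "hermes-2-pro-llama-3-8b",  # assumed model identifier
    "messages": [{"role": "user", "content": "What is the weather in Rome?"}],
    "tools": tools,
}


def call_lmstudio(payload: dict) -> dict:
    """POST the request to LM Studio's OpenAI-compatible local endpoint."""
    req = request.Request(
        "http://localhost:1234/v1/chat/completions",  # LM Studio's default port
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:
        return json.load(resp)
```

If the loaded model was trained on function calling, the response should contain a `tool_calls` entry naming the function and its arguments.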
Considered ideal for:
- Beginners new to local LLM deployment.
- Users who prefer graphical interfaces over command-line tools.
- Developers needing good performance on lower-spec hardware (especially with integrated GPUs).
- Anyone wanting a polished professional user experience.
<details>
<summary>Setup</summary>

View File

@@ -1,6 +1,7 @@
# Ollama
The easiest way to get up and running with large language models.
One of the easiest ways to get up and running with large language models.<br/>
Emerged as one of the most popular tools for local LLM deployment.
<!-- Remove this line to uncomment if used
## Table of contents <!-- omit in toc -->
@@ -11,6 +12,21 @@ The easiest way to get up and running with large language models.
## TL;DR
Leverages [llama.cpp].
Supports primarily the GGUF file format with quantization levels Q2_K through Q8_0.<br/>
Offers automatic conversion of models from Hugging Face and allows customization through Modelfile.
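Customization via Modelfile can be sketched as follows (the base model and parameter values are illustrative):

```
FROM llama3.1
PARAMETER temperature 0.7
SYSTEM You are a concise assistant.
```

Building the custom model from it would then be `ollama create my-assistant -f ./Modelfile`.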
Supports tool calling functionality via API.<br/>
Models can decide when to invoke tools and how to use returned data.<br/>
Works with models specifically trained for function calling (e.g., Mistral, Llama 3.1, Llama 3.2, and Qwen2.5). However,
it does not currently allow forcing a specific tool to be called, nor does it return tool call responses in streaming mode.
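A sketch of such a tool-calling request against Ollama's native `/api/chat` endpoint; the `get_stock_price` tool is a hypothetical example:

```python
import json
from urllib import request

# Tool definitions follow the OpenAI function-calling format.
# `get_stock_price` is a hypothetical example tool.
tools = [{
    "type": "function",
    "function": {
        "name": "get_stock_price",
        "description": "Fetch the latest price for a ticker symbol",
        "parameters": {
            "type": "object",
            "properties": {"ticker": {"type": "string"}},
            "required": ["ticker"],
        },
    },
}]

payload = {
    "model": "llama3.1",  # a model trained for function calling
    "messages": [{"role": "user", "content": "Price of ACME?"}],
    "tools": tools,
    "stream": False,  # tool call responses are not available in streaming mode
}


def call_ollama(payload: dict) -> dict:
    """POST the chat request to the local Ollama server."""
    req = request.Request(
        "http://localhost:11434/api/chat",  # Ollama's default port
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:
        return json.load(resp)
```

When the model decides a tool is needed, the response message should carry a `tool_calls` field rather than plain text content.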
Considered ideal for developers who prefer CLI interfaces and automation, need reliable API integration, value
open-source transparency, and want efficient resource utilization.
Excellent for building applications that require seamless migration from OpenAI.
<details>
<summary>Setup</summary>
@@ -118,6 +134,8 @@ ollama signout
<!-- In-article sections -->
<!-- Knowledge base -->
[llama.cpp]: llama.cpp.md
<!-- Files -->
<!-- Upstream -->
[Blog]: https://ollama.com/blog

View File

@@ -11,6 +11,23 @@ Open source library for LLM inference and serving.
## TL;DR
Engineered specifically for high-performance, production-grade LLM inference.
Offers a production-ready, highly mature OpenAI-compatible API.<br/>
Fully supports streaming, embeddings, tool/function calling with parallel invocation, vision-language models, rate
limiting, and token-based authentication. Optimized for high throughput and batched requests.
Supports PyTorch and Safetensors (primary), GPTQ and AWQ quantization, native Hugging Face model hub.<br/>
Does **not** natively support GGUF (requires conversion).
Offers production-grade, fully-featured, OpenAI-compatible tool calling functionality via API.<br/>
Support includes parallel function calls, the `tool_choice` parameter for controlling tool selection, and streaming
support for tool calls.
Considered the gold standard for production deployments requiring enterprise-grade tool orchestration.<br/>
Best for production-grade performance and reliability, high concurrent request handling, multi-GPU deployment
capabilities, and enterprise-scale LLM serving.
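The setup above can be sketched as a single `vllm serve` invocation with tool calling and token authentication enabled (the model name, parser choice, and token are illustrative, and flag availability may vary by version):

```shell
# Launch the OpenAI-compatible server with tool calling and token auth.
vllm serve meta-llama/Llama-3.1-8B-Instruct \
  --enable-auto-tool-choice \
  --tool-call-parser llama3_json \
  --api-key my-secret-token
```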
<details>
<summary>Setup</summary>