mirror of
https://gitea.com/mcereda/oam.git
synced 2026-03-04 07:54:25 +00:00
chore(ai): expand notes
@@ -95,14 +95,12 @@ Next step is [agentic AI][agent].

## Run LLMs Locally

Use one of the following:

Refer:

- [Ollama]
- [LMStudio]
- [vLLM]
- [Jan]
- [llama.cpp]
- [Llamafile]
- [Local LLM Hosting: Complete 2026 Guide - Ollama, vLLM, LocalAI, Jan, LM Studio & More]
- [Run LLMs Locally: 6 Simple Methods]

[Ollama] | [Jan] | [LMStudio] | [Docker model runner] | [llama.cpp] | [vLLM] | [Llamafile]
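Any of the tools above gets a model answering in a couple of commands. A minimal sketch with [Ollama] (the model name is illustrative, and a running Ollama installation is assumed):

```shell
# Download a model, then chat with it interactively from the terminal.
ollama pull llama3.2
ollama run llama3.2
```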
## Further readings

@@ -110,6 +108,7 @@ Use one of the following:

- [Run LLMs Locally: 6 Simple Methods]
- [OpenClaw: Who are you?]
- [Local LLM Hosting: Complete 2026 Guide - Ollama, vLLM, LocalAI, Jan, LM Studio & More]
<!--
Reference

@@ -119,6 +118,7 @@ Use one of the following:

<!-- In-article sections -->
<!-- Knowledge base -->
[Agent]: agent.md
[Docker model runner]: ../docker.md#running-llms-locally
[LMStudio]: lmstudio.md
[Ollama]: ollama.md
[vLLM]: vllm.md

@@ -133,9 +133,10 @@ Use one of the following:

[Gemini]: https://gemini.google.com/
[Grok]: https://grok.com/
[Jan]: https://www.jan.ai/
[llama.cpp]: https://github.com/ggml-org/llama.cpp
[llama.cpp]: llama.cpp.md
[Llama]: https://www.llama.com/
[Llamafile]: https://github.com/mozilla-ai/llamafile
[Local LLM Hosting: Complete 2026 Guide - Ollama, vLLM, LocalAI, Jan, LM Studio & More]: https://www.glukhov.org/post/2025/11/hosting-llms-ollama-localai-jan-lmstudio-vllm-comparison/
[Mistral]: https://mistral.ai/
[OpenClaw: Who are you?]: https://www.youtube.com/watch?v=hoeEclqW8Gs
[Run LLMs Locally: 6 Simple Methods]: https://www.datacamp.com/tutorial/run-llms-locally-tutorial
67 knowledge base/ai/llama.cpp.md Normal file
@@ -0,0 +1,67 @@
# llama.cpp

> TODO

LLM inference engine written in C/C++.<br/>
Widely used as the base for AI tools such as [Ollama] and [Docker model runner].
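As a rough sketch of direct usage (`llama-cli` and `llama-server` are the binaries shipped by the project; the model path is illustrative):

```shell
# Chat with a local GGUF model from the terminal.
llama-cli -m ./models/model.Q4_K_M.gguf -p "Hello"

# Or expose an OpenAI-compatible HTTP server on port 8080.
llama-server -m ./models/model.Q4_K_M.gguf --port 8080
```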
<!-- Remove this line to uncomment if used
## Table of contents <!-- omit in toc -->

1. [TL;DR](#tldr)
1. [Further readings](#further-readings)
1. [Sources](#sources)

## TL;DR

<!-- Uncomment if used
<details>
<summary>Setup</summary>

```sh
```

</details>
-->

<!-- Uncomment if used
<details>
<summary>Usage</summary>

```sh
```

</details>
-->

<!-- Uncomment if used
<details>
<summary>Real world use cases</summary>

```sh
```

</details>
-->
## Further readings

- [Codebase]

### Sources

<!--
Reference
-->

<!-- In-article sections -->
<!-- Knowledge base -->
[Docker model runner]: ../docker.md#running-llms-locally
[Ollama]: ollama.md

<!-- Files -->
<!-- Upstream -->
[Codebase]: https://github.com/ggml-org/llama.cpp

<!-- Others -->
@@ -1,6 +1,7 @@
# LMStudio

Allows running LLMs locally.
Allows running LLMs locally.<br/>
Considered the most accessible tool for local LLM deployment, particularly for users with no technical background.

<!-- Remove this line to uncomment if used
## Table of contents <!-- omit in toc -->
@@ -11,6 +12,31 @@ Allows running LLMs locally.

## TL;DR

Focused on single-user scenarios, without built-in rate limiting or authentication.

Offers a highly mature and stable OpenAI-compatible API.

Supports full streaming, an embeddings API, experimental function calling for compatible models, and limited multimodal
support.

Supports the GGUF and Hugging Face Safetensors formats.<br/>
Has a built-in converter for some models, and can run split GGUF models.

Implements experimental tool calling support following the OpenAI function calling API format.<br/>
Models trained on function calling (e.g., Hermes 2 Pro, Llama 3.1, and Functionary) can invoke external tools through
the local API server. However, tool calling should **not** yet be considered suitable for production.<br/>
Streaming tool calls and advanced features like parallel function invocation are not currently supported.<br/>
Some models show better tool calling behaviour than others.

The UI makes it easy to define function schemas and test tool calls interactively.

Considered ideal for:

- Beginners new to local LLM deployment.
- Users who prefer graphical interfaces over command-line tools.
- Developers needing good performance on lower-spec hardware (especially with integrated GPUs).
- Anyone wanting a polished, professional user experience.
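Once a model is loaded, the local server's OpenAI-compatible endpoint can be exercised with plain `curl` (a sketch; port `1234` is LM Studio's default, and the model name is illustrative):

```shell
curl http://localhost:1234/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen2.5-7b-instruct",
    "messages": [{ "role": "user", "content": "Say hello in one word." }]
  }'
```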
<details>
<summary>Setup</summary>
@@ -1,6 +1,7 @@
# Ollama

The easiest way to get up and running with large language models.
One of the easiest ways to get up and running with large language models.<br/>
Emerged as one of the most popular tools for local LLM deployment.

<!-- Remove this line to uncomment if used
## Table of contents <!-- omit in toc -->
@@ -11,6 +12,21 @@ The easiest way to get up and running with large language models.

## TL;DR

Leverages [llama.cpp].

Primarily supports the GGUF file format, with quantization levels Q2_K through Q8_0.<br/>
Offers automatic conversion of models from Hugging Face, and allows customization through a Modelfile.
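Modelfile customization boils down to a few directives (a sketch; the base model, the custom model's name, and the system prompt are illustrative):

```shell
# Define a customized model on top of a base model.
cat > Modelfile <<'EOF'
FROM llama3.2
PARAMETER temperature 0.3
SYSTEM "You are a terse assistant that answers in one sentence."
EOF

# Build and run the customized model.
ollama create terse-llama -f Modelfile
ollama run terse-llama
```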
Supports tool calling functionality via API.<br/>
Models can decide when to invoke tools and how to use returned data.<br/>
Works with models specifically trained for function calling (e.g., Mistral, Llama 3.1, Llama 3.2, and Qwen2.5).
However, it does not currently allow forcing a specific tool to be called, nor receiving tool call responses in
streaming mode.
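Tool calling goes through the `/api/chat` endpoint: the request declares the available functions, and the model may answer with a `tool_calls` message instead of plain text (a sketch; the `get_weather` tool is hypothetical, and the model name is illustrative):

```shell
curl http://localhost:11434/api/chat -d '{
  "model": "llama3.1",
  "messages": [{ "role": "user", "content": "What is the weather in Paris?" }],
  "stream": false,
  "tools": [{
    "type": "function",
    "function": {
      "name": "get_weather",
      "description": "Get the current weather for a city",
      "parameters": {
        "type": "object",
        "properties": { "city": { "type": "string" } },
        "required": ["city"]
      }
    }
  }]
}'
```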
Considered ideal for developers who prefer CLI interfaces and automation, need reliable API integration, value
open-source transparency, and want efficient resource utilization.

Excellent for building applications that require seamless migration from OpenAI.

<details>
<summary>Setup</summary>
@@ -118,6 +134,8 @@ ollama signout

<!-- In-article sections -->
<!-- Knowledge base -->
[llama.cpp]: llama.cpp.md

<!-- Files -->
<!-- Upstream -->
[Blog]: https://ollama.com/blog
@@ -11,6 +11,23 @@ Open source library for LLM inference and serving.

## TL;DR

Engineered specifically for high-performance, production-grade LLM inference.

Offers a production-ready, highly mature OpenAI-compatible API.<br/>
Has full support for streaming, embeddings, tool/function calling with parallel invocation capability, vision-language
model support, rate limiting, and token-based authentication. Optimized for high throughput and batch requests.

Primarily supports the PyTorch and Safetensors formats, GPTQ and AWQ quantization, and the Hugging Face model hub
natively.<br/>
Does **not** natively support GGUF (requires conversion).

Offers production-grade, fully-featured, OpenAI-compatible tool calling functionality via API.<br/>
Support includes parallel function calls, the `tool_choice` parameter for controlling tool selection, and streaming
support for tool calls.
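Serving with tool calling enabled takes a couple of flags (a sketch; the model and the parser choice are illustrative, and the server listens on port 8000 by default):

```shell
# Start an OpenAI-compatible server with tool calling enabled.
vllm serve meta-llama/Llama-3.1-8B-Instruct \
  --enable-auto-tool-choice \
  --tool-call-parser llama3_json

# Check the server is up.
curl http://localhost:8000/v1/models
```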
Considered the gold standard for production deployments requiring enterprise-grade tool orchestration.<br/>
Best for production-grade performance and reliability, high concurrent request handling, multi-GPU deployment
capabilities, and enterprise-scale LLM serving.

<details>
<summary>Setup</summary>