diff --git a/knowledge base/ai/large language model.md b/knowledge base/ai/large language model.md index c4c7a8a..4dbd079 100644 --- a/knowledge base/ai/large language model.md +++ b/knowledge base/ai/large language model.md @@ -95,14 +95,12 @@ Next step is [agentic AI][agent]. ## Run LLMs Locally -Use one of the following: +Refer to: -- [Ollama] -- [LMStudio] -- [vLLM] -- [Jan] -- [llama.cpp] -- [Llamafile] +- [Local LLM Hosting: Complete 2026 Guide - Ollama, vLLM, LocalAI, Jan, LM Studio & More]. +- [Run LLMs Locally: 6 Simple Methods]. + +[Ollama] | [Jan] | [LMStudio] | [Docker model runner] | [llama.cpp] | [vLLM] | [Llamafile] ## Further readings @@ -110,6 +108,7 @@ Use one of the following: - [Run LLMs Locally: 6 Simple Methods] - [OpenClaw: Who are you?] +- [Local LLM Hosting: Complete 2026 Guide - Ollama, vLLM, LocalAI, Jan, LM Studio & More] [Agent]: agent.md +[Docker model runner]: ../docker.md#running-llms-locally [LMStudio]: lmstudio.md [Ollama]: ollama.md [vLLM]: vllm.md @@ -133,9 +133,10 @@ Use one of the following: [Gemini]: https://gemini.google.com/ [Grok]: https://grok.com/ [Jan]: https://www.jan.ai/ -[llama.cpp]: https://github.com/ggml-org/llama.cpp +[llama.cpp]: llama.cpp.md [Llama]: https://www.llama.com/ [Llamafile]: https://github.com/mozilla-ai/llamafile +[Local LLM Hosting: Complete 2026 Guide - Ollama, vLLM, LocalAI, Jan, LM Studio & More]: https://www.glukhov.org/post/2025/11/hosting-llms-ollama-localai-jan-lmstudio-vllm-comparison/ [Mistral]: https://mistral.ai/ [OpenClaw: Who are you?]: https://www.youtube.com/watch?v=hoeEclqW8Gs [Run LLMs Locally: 6 Simple Methods]: https://www.datacamp.com/tutorial/run-llms-locally-tutorial diff --git a/knowledge base/ai/llama.cpp.md b/knowledge base/ai/llama.cpp.md new file mode 100644 index 0000000..ba9aae4 --- /dev/null +++ b/knowledge base/ai/llama.cpp.md @@ -0,0 +1,67 @@ +# llama.cpp + +> TODO + +LLM inference engine written in C/C++.
+Widely used as a base for AI tools like [Ollama] and [Docker model runner]. + + + +1. [TL;DR](#tldr) +1. [Further readings](#further-readings) + 1. [Sources](#sources) + +## TL;DR + + + + + + + +## Further readings + +- [Codebase] + +### Sources + + + + + +[Docker model runner]: ../docker.md#running-llms-locally +[Ollama]: ollama.md + + + +[Codebase]: https://github.com/ggml-org/llama.cpp + + diff --git a/knowledge base/ai/lmstudio.md b/knowledge base/ai/lmstudio.md index 90fd8da..227d9eb 100644 --- a/knowledge base/ai/lmstudio.md +++ b/knowledge base/ai/lmstudio.md @@ -1,6 +1,7 @@ # LMStudio -Allows running LLMs locally. +Allows running LLMs locally.
+Considered the most accessible tool for local LLM deployment, particularly for users with no technical background. @@ -11,6 +12,31 @@ Allows running LLMs locally. ## TL;DR +Focused on single-user scenarios without built-in rate limiting or authentication. + +Offers a highly mature and stable OpenAI-compatible API. + +Supports full streaming, embeddings API, experimental function calling for compatible models, and limited multimodal +support. + +Supports GGUF and Hugging Face Safetensors formats.
+Has a built-in converter for some models, and can run split GGUF models. + +Implements experimental tool calling support following the OpenAI function calling API format.
+Models trained on function calling (e.g., Hermes 2 Pro, Llama 3.1, and Functionary) can invoke external tools through +the local API server. However, tool calling should **not** yet be considered suitable for production.
+Neither streaming tool calls nor advanced features like parallel function invocation are currently supported.
+Some models show better tool calling behavior than others. + +The UI makes it easy to define function schemas and test tool calls interactively. + +Considered ideal for: + +- Beginners new to local LLM deployment. +- Users who prefer graphical interfaces over command-line tools. +- Developers needing good performance on lower-spec hardware (especially with integrated GPUs). +- Anyone wanting a polished, professional user experience. +
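The OpenAI-compatible server described above can be exercised with just the Python standard library. A minimal sketch, assuming LM Studio's local server is enabled on its default port 1234; `local-model` is a placeholder for whichever model is currently loaded:

```python
import json
import urllib.request

# LM Studio's local server speaks the OpenAI chat completions API.
# Port 1234 is its default; the model name is a placeholder.
BASE_URL = "http://localhost:1234/v1"


def build_chat_request(model: str, prompt: str, stream: bool = False) -> dict:
    """Build an OpenAI-style chat completion payload."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": stream,
    }


def chat(payload: dict) -> dict:
    """POST the payload to the local server (requires LM Studio running)."""
    req = urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


payload = build_chat_request("local-model", "Summarize llama.cpp in one sentence.")
# chat(payload)  # uncomment with the server running
```

The same payload shape works against any OpenAI-compatible endpoint, which is what makes switching between these tools cheap.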
Setup diff --git a/knowledge base/ai/ollama.md index 36a2b1e..272e539 100644 --- a/knowledge base/ai/ollama.md +++ b/knowledge base/ai/ollama.md @@ -1,6 +1,7 @@ # Ollama -The easiest way to get up and running with large language models. +One of the easiest ways to get up and running with large language models.
+Emerged as one of the most popular tools for local LLM deployment. @@ -11,6 +12,21 @@ The easiest way to get up and running with large language models. ## TL;DR +Leverages [llama.cpp]. + +Primarily supports the GGUF file format, with quantization levels Q2_K through Q8_0.
+Offers automatic conversion of models from Hugging Face and allows customization through a Modelfile. + +Supports tool calling functionality via its API.
+Models can decide when to invoke tools and how to use returned data.
+Works with models specifically trained for function calling (e.g., Mistral, Llama 3.1, Llama 3.2, and Qwen2.5). However, +it does not currently allow forcing a specific tool to be called or receiving tool call responses in streaming mode. + +Considered ideal for developers who prefer CLI interfaces and automation, need reliable API integration, value +open-source transparency, and want efficient resource utilization. + +Excellent for building applications that require seamless migration from OpenAI. +
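The tool calling flow above can be sketched with the standard library alone. The `get_weather` tool, the `llama3.1` model tag, and Ollama's default port 11434 are assumptions for illustration:

```python
import json
import urllib.request

# Ollama's REST API listens on port 11434 by default; /api/chat accepts
# an OpenAI-style "tools" list for models trained on function calling.
OLLAMA_URL = "http://localhost:11434/api/chat"

# Hypothetical tool definition, for illustration only.
get_weather_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}


def build_tool_request(model: str, prompt: str) -> dict:
    """Build a /api/chat payload; the model decides whether to call the tool."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "tools": [get_weather_tool],
        "stream": False,  # tool call responses cannot be streamed
    }


def chat(payload: dict) -> dict:
    """POST to the local Ollama server (requires `ollama serve` running)."""
    req = urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


payload = build_tool_request("llama3.1", "What's the weather in Turin?")
# reply = chat(payload)
# reply["message"].get("tool_calls") holds the requested invocations, if any.
```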
Setup @@ -118,6 +134,8 @@ ollama signout +[llama.cpp]: llama.cpp.md + [Blog]: https://ollama.com/blog diff --git a/knowledge base/ai/vllm.md b/knowledge base/ai/vllm.md index db1d298..a981416 100644 --- a/knowledge base/ai/vllm.md +++ b/knowledge base/ai/vllm.md @@ -11,6 +11,23 @@ Open source library for LLM inference and serving. ## TL;DR +Engineered specifically for high-performance, production-grade LLM inference. + +Offers a production-ready, highly mature OpenAI-compatible API.
+Has full support for streaming, embeddings, tool/function calling with parallel invocation capability, vision-language +model support, rate limiting, and token-based authentication. Optimized for high-throughput and batch requests. + +Supports PyTorch and Safetensors (primary), GPTQ and AWQ quantization, native Hugging Face model hub.
+Does **not** natively support GGUF (requires conversion). + +Offers production-grade, fully-featured, OpenAI-compatible tool calling functionality via API.
+Support includes parallel function calls, the `tool_choice parameter` for controlling tool selection, and streaming +support for tool calls. + +Considered the gold standard for production deployments requiring enterprise-grade tool orchestration.
+Best for production-grade performance and reliability, high concurrent request handling, multi-GPU deployment +capabilities, and enterprise-scale LLM serving. +
Setup