chore(kb/ai): review and expand notes

This commit is contained in:
Michele Cereda
2026-02-20 22:20:45 +01:00
parent c862d1208c
commit e2cfa1d235
4 changed files with 89 additions and 5 deletions

View File

@@ -11,6 +11,8 @@ Works in a terminal, IDE, browser, and as a desktop app.
1. [TL;DR](#tldr)
1. [Grant access to tools](#grant-access-to-tools)
1. [Using skills](#using-skills)
1. [Limit tool execution](#limit-tool-execution)
1. [Memory](#memory)
1. [Run on local models](#run-on-local-models)
1. [Further readings](#further-readings)
1. [Sources](#sources)
@@ -259,6 +261,44 @@ Reference optional files in `SKILL.md` to instruct Claude of what they contain a
> [!tip]
> Prefer keeping `SKILL.md` under 500 lines. Move detailed reference material to supporting files.
## Limit tool execution
Leverage [Sandboxing][documentation/sandboxing] to provide filesystem and network isolation for tool execution.<br/>
The sandboxed bash tool uses OS-level primitives to enforce defined boundaries upfront, and controls network access
through a proxy server running outside the sandbox.<br/>
Attempts to access resources outside the sandbox trigger immediate notifications.
> [!warning]
> Effective sandboxing requires **both** filesystem and network isolation.<br/>
> Without network isolation, compromised agents could exfiltrate sensitive files like SSH keys.<br/>
> Without filesystem isolation, compromised agents could backdoor system resources to gain network access.<br/>
> When configuring sandboxing, make sure custom settings do not bypass either isolation layer.
The sandboxed tool:
- Grants read and write access to the current working directory and its subdirectories by _default_.
- Grants read access to the rest of the filesystem by _default_, except for specifically denied directories.
- Blocks modifying files outside the current working directory without **explicit** permission.
- Allows defining custom allowed and denied paths through settings.
- Allows accessing only approved domains.
- Prompts the user when tools request access to new domains.
- Allows implementing custom rules on **outgoing** traffic.
- Applies restrictions to all scripts, programs, and subprocesses spawned by commands.
On macOS, Claude Code uses the built-in Seatbelt framework. On Linux and WSL2, it requires installing
[containers/bubblewrap] before activation.
Sandboxes _can_ be configured to execute commands within the sandbox **without** requiring approval.<br/>
Commands that cannot be sandboxed fall back to the regular permission flow.
Customize sandbox behavior through the `settings.json` file.
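The example below is an illustrative sketch only: the `sandbox` keys shown are assumptions about the schema, **not**
taken from the documentation; verify them against [Sandboxing][documentation/sandboxing] before use.
```sh
# Hypothetical project-level settings: key names below are assumptions, check the Sandboxing docs.
# Overwrites any existing '.claude/settings.json'.
cat > '.claude/settings.json' <<'EOF'
{
  "sandbox": {
    "enabled": true,
    "network": {
      "allowedDomains": [ "github.com", "pypi.org" ]
    }
  }
}
EOF
```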
## Memory
TODO
Refer to [Manage Claude's memory][documentation/manage claude's memory].
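As a stopgap until this section is written, a minimal sketch of the usual workflow: project memory lives in a
`CLAUDE.md` file at the repository root, user-wide memory in `~/.claude/CLAUDE.md` (paths per the linked
documentation; the entry below is purely illustrative).
```sh
# Append an illustrative note to the project memory file; Claude reads it at the start of each session.
cat >> 'CLAUDE.md' <<'EOF'
- Run the test suite with `make test` before committing.
EOF
```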
## Run on local models
Claude _can_ use other models and engines by setting the `ANTHROPIC_AUTH_TOKEN`, `ANTHROPIC_BASE_URL` and
@@ -338,6 +378,8 @@ Claude Code version: `v2.1.41`.<br/>
[Blog]: https://claude.com/blog
[Codebase]: https://github.com/anthropics/claude-code
[Documentation]: https://code.claude.com/docs/en/overview
[Documentation/Manage Claude's memory]: https://code.claude.com/docs/en/memory
[Documentation/Sandboxing]: https://code.claude.com/docs/en/sandboxing
[Documentation/Skills]: https://code.claude.com/docs/en/skills
[Website]: https://claude.com/product/overview
@@ -345,6 +387,7 @@ Claude Code version: `v2.1.41`.<br/>
[Agent Skills]: https://agentskills.io/
[AWS API MCP Server]: https://github.com/awslabs/mcp/tree/main/src/aws-api-mcp-server
[Claude Skills vs. MCP: A Technical Comparison for AI Workflows]: https://intuitionlabs.ai/articles/claude-skills-vs-mcp
[containers/bubblewrap]: https://github.com/containers/bubblewrap
[Cost Explorer MCP Server]: https://github.com/awslabs/mcp/tree/main/src/cost-explorer-mcp-server
[pffigueiredo/claude-code-sheet.md]: https://gist.github.com/pffigueiredo/252bac8c731f7e8a2fc268c8a965a963
[Prat011/awesome-llm-skills]: https://github.com/Prat011/awesome-llm-skills

View File

@@ -187,7 +187,11 @@ just inferring the next token.
what those are or how they work. This causes a lack of critical thinking and overreliance.
- Model training and execution require resources that are normally not available to the common person. This encourages
people to depend on, and hence give power to, AI companies.
- Models tend to **not** accept gracefully that they don't know something, and hallucinate as a result.
- Models tend to **not** accept gracefully that they don't know something, and hallucinate as a result.<br/>
More recent techniques are making models more efficient, but they just delay this problem.
- Models can learn and exhibit deceptive behavior.<br/>
Standard techniques could fail to remove it, and instead empower it while creating a false impression of safety.<br/>
See [Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training].
## Run LLMs Locally
@@ -202,6 +206,7 @@ Refer:
- [SEQUOIA: Serving exact Llama2-70B on an RTX4090 with half-second per token latency]
- [Optimizing LLMs for Performance and Accuracy with Post-Training Quantization]
- [Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training]
### Sources
@@ -252,4 +257,5 @@ Refer:
[Optimizing LLMs for Performance and Accuracy with Post-Training Quantization]: https://developer.nvidia.com/blog/optimizing-llms-for-performance-and-accuracy-with-post-training-quantization/
[Run LLMs Locally: 6 Simple Methods]: https://www.datacamp.com/tutorial/run-llms-locally-tutorial
[SEQUOIA: Serving exact Llama2-70B on an RTX4090 with half-second per token latency]: https://infini-ai-lab.github.io/Sequoia-Page/
[Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training]: https://arxiv.org/abs/2401.05566
[What is chain of thought (CoT) prompting?]: https://www.ibm.com/think/topics/chain-of-thoughts

View File

@@ -31,10 +31,13 @@ capabilities, and enterprise-scale LLM serving.
<details>
<summary>Setup</summary>
Prefer using [vllm-project/vllm-metal] on Apple silicon.<br/>
Install with `curl -fsSL 'https://raw.githubusercontent.com/vllm-project/vllm-metal/main/install.sh' | bash`
```sh
pip install 'vllm'
pipx install 'vllm'
uv tool install 'vllm' # 'vllm-metal' on apple silicon
uv tool install 'vllm'
```
</details>
@@ -43,8 +46,35 @@ uv tool install 'vllm' # 'vllm-metal' on apple silicon
<summary>Usage</summary>
```sh
vllm serve 'meta-llama/Llama-2-7b-hf' --port '8000' --gpu-memory-utilization '0.9'
vllm serve 'meta-llama/Llama-2-70b-hf' --tensor-parallel-size '2' --port '8000'
# Get help.
vllm --help
# Start the vLLM OpenAI Compatible API server.
vllm serve 'meta-llama/Llama-2-7b-hf'
vllm serve … --port '8000' --gpu-memory-utilization '0.9'
vllm serve … --tensor-parallel-size '2' --uds '/tmp/vllm.sock'
# Chat.
vllm chat
vllm chat --url 'http://vllm.example.org:8000/v1'
vllm chat --quick "hi"
# Generate text completion.
vllm complete
vllm complete --url 'http://vllm.example.org:8000/v1'
vllm complete --quick "The future of AI is"
# Bench vLLM.
vllm bench latency --model '…' --input-len '32' --output-len '1' --enforce-eager --load-format 'dummy'
vllm bench serve --host 'localhost' --port '8000' --model '…' \
--random-input-len '32' --random-output-len '4' --num-prompts '5'
vllm bench throughput --model '…' --input-len '32' --output-len '1' --enforce-eager --load-format 'dummy'
# Run prompts in batch and save results to files.
vllm run-batch --input-file 'offline_inference/openai_batch/openai_example_batch.jsonl' --output-file 'results.jsonl' \
--model 'meta-llama/Meta-Llama-3-8B-Instruct'
vllm run-batch --model 'meta-llama/Meta-Llama-3-8B-Instruct' -o 'results.jsonl' \
-i 'https://raw.githubusercontent.com/vllm-project/vllm/main/examples/offline_inference/openai_batch/openai_example_batch.jsonl'
```
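Once the server is up, any OpenAI-compatible client can talk to it. A quick smoke test with `curl` (host, port, and
model name match the examples above and are assumptions about the local setup):
```sh
# List the models exposed by the server.
curl 'http://localhost:8000/v1/models'
# Request a text completion from the OpenAI-compatible endpoint.
curl 'http://localhost:8000/v1/completions' \
  -H 'Content-Type: application/json' \
  -d '{ "model": "meta-llama/Llama-2-7b-hf", "prompt": "The future of AI is", "max_tokens": 32 }'
```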
</details>
@@ -79,8 +109,9 @@ vllm serve 'meta-llama/Llama-2-70b-hf' --tensor-parallel-size '2' --port '8000'
<!-- Files -->
<!-- Upstream -->
[Blog]: https://blog.vllm.ai/
[Codebase]: https://github.com/vllm-project/vllm
[Codebase]: https://github.com/vllm-project/
[Documentation]: https://docs.vllm.ai/en/
[vllm-project/vllm-metal]: https://github.com/vllm-project/vllm-metal
[Website]: https://vllm.ai/
<!-- Others -->

View File

@@ -36,6 +36,10 @@ uv tool list
uv tool run 'vllm'
uvx 'vllm' # alias for `uv tool run`
# Create virtual environments.
uv venv '.venv'
uv venv '.venv' --allow-existing --python 'python3.12' --seed
# Clear the cache.
uv cache clean
```
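As a follow-up to the virtual environment lines above, a short sketch of activating the environment and installing
into it (package names are placeholders; assumes a POSIX shell):
```sh
# Activate the environment created by `uv venv`, then install packages into it.
source '.venv/bin/activate'
uv pip install 'requests' 'rich'
```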