chore(kb/ai): review and expand notes
@@ -11,6 +11,8 @@ Works in a terminal, IDE, browser, and as a desktop app.
1. [TL;DR](#tldr)
1. [Grant access to tools](#grant-access-to-tools)
1. [Using skills](#using-skills)
1. [Limit tool execution](#limit-tool-execution)
1. [Memory](#memory)
1. [Run on local models](#run-on-local-models)
1. [Further readings](#further-readings)
1. [Sources](#sources)
@@ -259,6 +261,44 @@ Reference optional files in `SKILL.md` to instruct Claude of what they contain a
> [!tip]
> Prefer keeping `SKILL.md` under 500 lines. Move detailed reference material to supporting files.

## Limit tool execution

Leverage [Sandboxing][documentation/sandboxing] to provide filesystem and network isolation for tool execution.<br/>
The sandboxed bash tool uses OS-level primitives to enforce the defined boundaries upfront, and controls network access
through a proxy server running outside the sandbox.<br/>
Attempts to access resources outside the sandbox trigger immediate notifications.

> [!warning]
> Effective sandboxing requires **both** filesystem and network isolation.<br/>
> Without network isolation, compromised agents could exfiltrate sensitive files like SSH keys.<br/>
> Without filesystem isolation, compromised agents could backdoor system resources to gain network access.<br/>
> When configuring sandboxing, make sure the settings do not bypass either form of isolation.

The sandboxed tool:

- Grants _default_ read and write access to the current working directory and its subdirectories.
- Grants _default_ read access to the entire computer, except specific denied directories.
- Blocks modifying files outside the current working directory without **explicit** permission.
- Allows defining custom allowed and denied paths through settings.
- Allows accessing only approved domains.
- Prompts the user when tools request access to new domains.
- Allows implementing custom rules on **outgoing** traffic.
- Applies restrictions to all scripts, programs, and subprocesses spawned by commands.

On macOS, Claude Code uses the built-in Seatbelt framework. On Linux and WSL2, it requires installing
[containers/bubblewrap] before activation.
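
For reference, a minimal sketch of the Linux prerequisite, assuming the distributions ship [containers/bubblewrap] under
the usual `bubblewrap` package name:

```sh
# Debian-based distributions and WSL2 (Ubuntu).
sudo apt-get install 'bubblewrap'

# Fedora-based distributions.
sudo dnf install 'bubblewrap'
```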

Sandboxes _can_ be configured to execute commands within the sandbox **without** requiring approval.<br/>
Commands that cannot be sandboxed fall back to the regular permission flow.

Customize sandbox behavior through the `settings.json` file.
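
For illustration only, a minimal sketch of such a customization in a project-level `.claude/settings.json`. The keys
shown are assumptions made for the example rather than the confirmed schema; check [Documentation/Sandboxing] for the
actual options.

```sh
# Illustrative only: the JSON keys below are assumed, verify them against the sandboxing documentation.
cat > '.claude/settings.json' <<'EOF'
{
  "sandbox": {
    "enabled": true,
    "network": {
      "allowedDomains": ["github.com", "registry.npmjs.org"]
    }
  }
}
EOF
```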

## Memory

TODO

Refer [Manage Claude's memory][documentation/manage claude's memory].

## Run on local models

Claude _can_ use other models and engines by setting the `ANTHROPIC_AUTH_TOKEN`, `ANTHROPIC_BASE_URL` and
@@ -338,6 +378,8 @@ Claude Code version: `v2.1.41`.<br/>
[Blog]: https://claude.com/blog
[Codebase]: https://github.com/anthropics/claude-code
[Documentation]: https://code.claude.com/docs/en/overview
[Documentation/Manage Claude's memory]: https://code.claude.com/docs/en/memory
[Documentation/Sandboxing]: https://code.claude.com/docs/en/sandboxing
[Documentation/Skills]: https://code.claude.com/docs/en/skills
[Website]: https://claude.com/product/overview
@@ -345,6 +387,7 @@ Claude Code version: `v2.1.41`.<br/>
[Agent Skills]: https://agentskills.io/
[AWS API MCP Server]: https://github.com/awslabs/mcp/tree/main/src/aws-api-mcp-server
[Claude Skills vs. MCP: A Technical Comparison for AI Workflows]: https://intuitionlabs.ai/articles/claude-skills-vs-mcp
[containers/bubblewrap]: https://github.com/containers/bubblewrap
[Cost Explorer MCP Server]: https://github.com/awslabs/mcp/tree/main/src/cost-explorer-mcp-server
[pffigueiredo/claude-code-sheet.md]: https://gist.github.com/pffigueiredo/252bac8c731f7e8a2fc268c8a965a963
[Prat011/awesome-llm-skills]: https://github.com/Prat011/awesome-llm-skills
@@ -187,7 +187,11 @@ just inferring the next token.
what those are or how they work. This is causing a lack of critical thinking and overreliance.
- Model training and execution require resources that are normally not available to the common person. This encourages
  people to depend on, and hence give power to, AI companies.
- Models tend to **not** accept gracefully that they don't know something, and hallucinate as a result.
- Models tend to **not** accept gracefully that they don't know something, and hallucinate as a result.<br/>
  More recent techniques are making models more efficient, but they just delay this problem.
- Models can learn and exhibit deceptive behavior.<br/>
  Standard techniques could fail to remove it, and instead empower it while creating a false impression of safety.<br/>
  See [Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training].

## Run LLMs Locally
@@ -202,6 +206,7 @@ Refer:
- [SEQUOIA: Serving exact Llama2-70B on an RTX4090 with half-second per token latency]
- [Optimizing LLMs for Performance and Accuracy with Post-Training Quantization]
- [Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training]

### Sources
@@ -252,4 +257,5 @@ Refer:
[Optimizing LLMs for Performance and Accuracy with Post-Training Quantization]: https://developer.nvidia.com/blog/optimizing-llms-for-performance-and-accuracy-with-post-training-quantization/
[Run LLMs Locally: 6 Simple Methods]: https://www.datacamp.com/tutorial/run-llms-locally-tutorial
[SEQUOIA: Serving exact Llama2-70B on an RTX4090 with half-second per token latency]: https://infini-ai-lab.github.io/Sequoia-Page/
[Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training]: https://arxiv.org/abs/2401.05566
[What is chain of thought (CoT) prompting?]: https://www.ibm.com/think/topics/chain-of-thoughts
@@ -31,10 +31,13 @@ capabilities, and enterprise-scale LLM serving.
<details>
<summary>Setup</summary>

Prefer using [vllm-project/vllm-metal] on Apple silicon.<br/>
Install with `curl -fsSL 'https://raw.githubusercontent.com/vllm-project/vllm-metal/main/install.sh' | bash`

```sh
pip install 'vllm'
pipx install 'vllm'
uv tool install 'vllm' # 'vllm-metal' on apple silicon
uv tool install 'vllm'
```

</details>
@@ -43,8 +46,35 @@ uv tool install 'vllm' # 'vllm-metal' on apple silicon
<summary>Usage</summary>

```sh
vllm serve 'meta-llama/Llama-2-7b-hf' --port '8000' --gpu-memory-utilization '0.9'
vllm serve 'meta-llama/Llama-2-70b-hf' --tensor-parallel-size '2' --port '8000'
# Get help.
vllm --help

# Start the vLLM OpenAI Compatible API server.
vllm serve 'meta-llama/Llama-2-7b-hf'
vllm serve … --port '8000' --gpu-memory-utilization '0.9'
vllm serve … --tensor-parallel-size '2' --uds '/tmp/vllm.sock'

# Chat.
vllm chat
vllm chat --url 'http://vllm.example.org:8000/v1'
vllm chat --quick "hi"

# Generate text completion.
vllm complete
vllm complete --url 'http://vllm.example.org:8000/v1'
vllm complete --quick "The future of AI is"

# Bench vLLM.
vllm bench latency --model '…' --input-len '32' --output-len '1' --enforce-eager --load-format 'dummy'
vllm bench serve --host 'localhost' --port '8000' --model '…' \
  --random-input-len '32' --random-output-len '4' --num-prompts '5'
vllm bench throughput --model '…' --input-len '32' --output-len '1' --enforce-eager --load-format 'dummy'

# Run prompts in batch and save results to files.
vllm run-batch --input-file 'offline_inference/openai_batch/openai_example_batch.jsonl' --output-file 'results.jsonl' \
  --model 'meta-llama/Meta-Llama-3-8B-Instruct'
vllm run-batch --model 'meta-llama/Meta-Llama-3-8B-Instruct' -o 'results.jsonl' \
  -i 'https://raw.githubusercontent.com/vllm-project/vllm/main/examples/offline_inference/openai_batch/openai_example_batch.jsonl'
```

</details>
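
As a quick sanity check, assuming the `vllm serve` instance from the block above is listening on `localhost:8000` with
`meta-llama/Llama-2-7b-hf` loaded, the OpenAI-compatible endpoints can be queried directly:

```sh
# List the models the server is exposing.
curl 'http://localhost:8000/v1/models'

# Request a completion through the OpenAI-compatible API.
curl 'http://localhost:8000/v1/completions' \
  -H 'Content-Type: application/json' \
  -d '{"model": "meta-llama/Llama-2-7b-hf", "prompt": "The future of AI is", "max_tokens": 32}'
```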
@@ -79,8 +109,9 @@ vllm serve 'meta-llama/Llama-2-70b-hf' --tensor-parallel-size '2' --port '8000'
<!-- Files -->
<!-- Upstream -->
[Blog]: https://blog.vllm.ai/
[Codebase]: https://github.com/vllm-project/vllm
[Codebase]: https://github.com/vllm-project/
[Documentation]: https://docs.vllm.ai/en/
[vllm-project/vllm-metal]: https://github.com/vllm-project/vllm-metal
[Website]: https://vllm.ai/

<!-- Others -->
@@ -36,6 +36,10 @@ uv tool list
uv tool run 'vllm'
uvx 'vllm' # alias for `uv tool run`

# Create virtual environments.
uv venv '.venv'
uv venv '.venv' --allow-existing --python 'python3.12' --seed

# Clear the cache.
uv cache clean
```