
Claude Code

TODO

Agentic coding tool that reads and edits files, runs commands, and integrates with external tools.
It works in the terminal, in IDEs, in the browser, and as a desktop app.

  1. TL;DR
  2. Run on local models
  3. Further readings
    1. Sources

TL;DR

Warning

Normally requires an Anthropic account.
One can use Claude Code Router or Ollama to run it against a local or shared LLM instance instead.

Uses a scope system to determine where configurations apply and who they're shared with.
When multiple scopes are active, the more specific ones take precedence.

| Scope | Location | Area of effect | Shared |
| --- | --- | --- | --- |
| Managed (a.k.a. System) | System-level managed-settings.json | All users on the host | Yes (usually deployed by IT) |
| User | ~/.claude/ directory | Single user, across all projects | No |
| Project | .claude/ directory in a repository | All collaborators, repository only | Yes (usually committed to the repository) |
| Local | .claude/*.local.* files | Single user, repository only | No (usually gitignored) |
Setup

```sh
brew install --cask 'claude-code'
```
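
Should Homebrew not be available, the NPM package is an alternative; the package name below is assumed from Anthropic's documentation.

```sh
# Alternative installation via NPM.
npm install -g '@anthropic-ai/claude-code'
```
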
Usage

```sh
# Start in interactive mode.
claude

# Start an interactive session with an initial prompt.
claude "fix the build error"

# Run a one-off task in non-interactive ("print") mode, then exit.
claude -p 'Hi! Are you there?'
claude -p "explain this function"

# Resume the most recent conversation from the current directory.
claude -c

# Pick a previous conversation to resume.
claude -r
```
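
Print mode also lends itself to scripting, since it accepts piped input and structured output. The examples below assume the stdin piping behaviour and the --output-format flag work as documented upstream.

```sh
# Pipe data in, get the answer on stdout.
cat 'build.log' | claude -p 'explain the build error in this log'

# Ask for machine-readable output (useful in scripts and CI).
claude -p 'summarise the latest changes' --output-format 'json'
```
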
Real world use cases

```sh
# Run Claude Code on a model served locally by Ollama.
ANTHROPIC_AUTH_TOKEN='ollama' ANTHROPIC_BASE_URL='http://localhost:11434' ANTHROPIC_API_KEY='' \
  claude --model 'lfm2.5-thinking:1.2b'
```

Run on local models

Claude Code can use other models and engines by setting the ANTHROPIC_AUTH_TOKEN, ANTHROPIC_BASE_URL and ANTHROPIC_API_KEY environment variables.

E.g.:

```sh
# Run Claude Code on a model served locally by Ollama.
ANTHROPIC_AUTH_TOKEN='ollama' ANTHROPIC_BASE_URL='http://localhost:11434' ANTHROPIC_API_KEY='' \
  claude --model 'lfm2.5-thinking:1.2b'
```
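
Assuming settings files honour an env map, the same variables could be made persistent at the user scope instead of being exported on every invocation. This is a sketch only; it overwrites any existing user settings file.

```sh
# Hypothetical user-scope settings making the local Ollama endpoint the default.
mkdir -p "${HOME}/.claude"
cat > "${HOME}/.claude/settings.json" <<'EOF'
{
  "env": {
    "ANTHROPIC_AUTH_TOKEN": "ollama",
    "ANTHROPIC_BASE_URL": "http://localhost:11434"
  }
}
EOF
```
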

Warning

Performance tends to drop substantially depending on the context size and on the host executing the model.

Examples

Prompt: 'Hi! Are you there?'.
The model was run once right before the tests started, so that loading times would not skew the measurements.
Requests were sent in headless mode (claude -p 'prompt').
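
A rough sketch of how such measurements can be reproduced, assuming the context size is configured on the Ollama side (e.g. via OLLAMA_CONTEXT_LENGTH) before starting its server:

```sh
# Env var names taken from the example above.
export ANTHROPIC_AUTH_TOKEN='ollama' ANTHROPIC_BASE_URL='http://localhost:11434' ANTHROPIC_API_KEY=''

# Warm the model up once so that loading time is not measured.
claude --model 'glm-4.7-flash:q4_K_M' -p 'warm up' > /dev/null

# Time a few identical headless requests and average the results by hand.
for _ in 1 2 3; do
  time claude --model 'glm-4.7-flash:q4_K_M' -p 'Hi! Are you there?' > /dev/null
done
```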

glm-4.7-flash:q4_K_M on an M3 Pro MacBook Pro 36 GB

Model: glm-4.7-flash:q4_K_M.
Host: M3 Pro MacBook Pro 36 GB.
Claude Code version: v2.1.41.

| Engine | Context | RAM usage | Used swap | Average response time | System remained responsive |
| --- | --- | --- | --- | --- | --- |
| llama.cpp (ollama) | 4096 | 19 GB | No | 19s | No |
| llama.cpp (ollama) | 8192 | 19 GB | No | 48s | No |
| llama.cpp (ollama) | 16384 | 20 GB | No | 2m 16s | No |
| llama.cpp (ollama) | 32768 | 22 GB | No | 7.12s | No |
| llama.cpp (ollama) | 65536 | 25 GB | No? (unsure) | 10.25s | Meh (minor stutters) |
| llama.cpp (ollama) | 131072 | 33 GB | No | 3m 42s | No (major stutters) |

Further readings

Sources