
Claude Code

TODO

Agentic coding tool that reads and edits files, runs commands, and integrates with external tools.
It works in the terminal, in IDEs, in the browser, and as a desktop app.

  1. TL;DR
  2. Run on local models
  3. Further readings
    1. Sources

TL;DR

Warning

Normally requires an Anthropic account.
One can use Claude Code Router or Ollama to run it against a local or shared LLM instance instead.

Uses a scope system to determine where configurations apply and who they're shared with.
When multiple scopes are active, the more specific ones take precedence.

| Scope | Location | Area of effect | Shared |
| --- | --- | --- | --- |
| Managed (a.k.a. System) | System-level managed-settings.json | All users on the host | Yes (usually deployed by IT) |
| User | ~/.claude/ directory | Single user, across all projects | No |
| Project | .claude/ directory in a repository | All collaborators, repository only | Yes (usually committed to the repository) |
| Local | .claude/*.local.* files | Single user, repository only | No (usually gitignored) |
Setup

```sh
brew install --cask 'claude-code'
```
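
Should Homebrew not be available, the NPM package is an alternative; the package name below is assumed from Anthropic's documentation.

```sh
# Alternative installation via NPM.
npm install -g '@anthropic-ai/claude-code'
```
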
Usage

```sh
# Start in interactive mode.
claude

# Start an interactive session with an initial prompt.
claude "fix the build error"

# Run a one-off task in non-interactive ("print") mode, then exit.
claude -p 'Hi! Are you there?'
claude -p "explain this function"

# Resume the most recent conversation from the current directory.
claude -c

# Pick a previous conversation to resume.
claude -r
```
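
Print mode also lends itself to scripting, since it accepts piped input and structured output. The examples below assume the stdin piping behaviour and the --output-format flag work as documented upstream.

```sh
# Pipe data in, get the answer on stdout.
cat 'build.log' | claude -p 'explain the build error in this log'

# Ask for machine-readable output (useful in scripts and CI).
claude -p 'summarise the latest changes' --output-format 'json'
```
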
Real world use cases

```sh
# Run Claude Code on a model served locally by Ollama.
ANTHROPIC_AUTH_TOKEN='ollama' ANTHROPIC_BASE_URL='http://localhost:11434' ANTHROPIC_API_KEY='' \
  claude --model 'lfm2.5-thinking:1.2b'
```

Run on local models

Claude Code can use other models and engines by setting the ANTHROPIC_AUTH_TOKEN, ANTHROPIC_BASE_URL and ANTHROPIC_API_KEY environment variables.

E.g.:

```sh
# Run Claude Code on a model served locally by Ollama.
ANTHROPIC_AUTH_TOKEN='ollama' ANTHROPIC_BASE_URL='http://localhost:11434' ANTHROPIC_API_KEY='' \
  claude --model 'lfm2.5-thinking:1.2b'
```
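
Assuming settings files honour an env map, the same variables could be made persistent at the user scope instead of being exported on every invocation. This is a sketch only; it overwrites any existing user settings file.

```sh
# Hypothetical user-scope settings making the local Ollama endpoint the default.
mkdir -p "${HOME}/.claude"
cat > "${HOME}/.claude/settings.json" <<'EOF'
{
  "env": {
    "ANTHROPIC_AUTH_TOKEN": "ollama",
    "ANTHROPIC_BASE_URL": "http://localhost:11434"
  }
}
EOF
```
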

Warning

Performance tends to drop substantially depending on the context size and on the host executing the model.

Examples

Prompt: 'Hi! Are you there?'.
The model was run once right before the tests started, so that loading times would not skew the measurements.
Requests were sent in headless mode (claude -p 'prompt').
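
A rough sketch of how such measurements can be reproduced, assuming the context size is configured on the Ollama side (e.g. via OLLAMA_CONTEXT_LENGTH) before starting its server:

```sh
# Env var names taken from the example above.
export ANTHROPIC_AUTH_TOKEN='ollama' ANTHROPIC_BASE_URL='http://localhost:11434' ANTHROPIC_API_KEY=''

# Warm the model up once so that loading time is not measured.
claude --model 'glm-4.7-flash:q4_K_M' -p 'warm up' > /dev/null

# Time a few identical headless requests and average the results by hand.
for _ in 1 2 3; do
  time claude --model 'glm-4.7-flash:q4_K_M' -p 'Hi! Are you there?' > /dev/null
done
```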

glm-4.7-flash:q4_K_M on an M3 Pro MacBook Pro 36 GB

Model: glm-4.7-flash:q4_K_M.
Host: M3 Pro MacBook Pro 36 GB.
Claude Code version: v2.1.41.

| Engine | Context | RAM usage | Used swap | Average response time | System remained responsive |
| --- | --- | --- | --- | --- | --- |
| llama.cpp (ollama) | 4096 | 19 GB | No | 19s | No |
| llama.cpp (ollama) | 8192 | 19 GB | No | 48s | No |
| llama.cpp (ollama) | 16384 | 20 GB | No | 2m 16s | No |
| llama.cpp (ollama) | 32768 | 22 GB | No | 7.12s | No |
| llama.cpp (ollama) | 65536 | 25 GB | No? (unsure) | 10.25s | Meh (minor stutters) |
| llama.cpp (ollama) | 131072 | 33 GB | No | 3m 42s | No (major stutters) |

Further readings

Sources