## What is Cascade Agent?
Cascade Agent is a natural language interface and document intelligence engine built into the Cascade Protocol CLI. It does two things:
- Conversational assistant — Describe what you want in plain English; the agent figures out the right CLI commands to run.
- Document intelligence — Extracts structured clinical entities (medications, conditions, lab results, etc.) from free-text narrative sections in C-CDA documents using an on-device LLM.
The agent streams responses in real time, shows every command it runs, and maintains conversation context so you can follow up naturally.
## Prerequisites

- Node.js 18+ — runtime
- Cascade CLI — includes the Agent
- AI provider — API key, or the Local provider
Cascade Agent has been bundled with the CLI since v0.5. Install or update the CLI to get both:

```bash
npm install -g @the-cascade-protocol/cli
```
Verify both are available:
```bash
cascade --version
cascade agent --version
```
> ⚠ Deprecation notice: standalone cascade-agent binary
>
> The separate cascade-agent package and its binary are deprecated as of v1.0.0. Use cascade agent (a subcommand of the CLI) instead. The standalone binary still works but prints a warning and will be removed in v2.0.0.
## Configure an AI Provider
Cascade Agent supports five AI providers. You only need one. The Local provider runs entirely on-device with no API key or internet connection.
| Provider | Flag | Cost / account | Default model |
|---|---|---|---|
| Anthropic (Claude) | -p anthropic | Paid — console.anthropic.com | claude-opus-4-6 |
| OpenAI (GPT) | -p openai | Paid — platform.openai.com | gpt-4o |
| Google (Gemini) | -p google | Free tier — aistudio.google.com | gemini-2.0-flash |
| Ollama | -p ollama | Free — runs on your machine | llama3.2 |
| Local (Qwen3.5-4B) | -p local | Free, fully on-device — no account | Qwen3.5-4B (~2.5 GB) |
The Local provider is powered by node-llama-cpp and downloads the Qwen3.5-4B model (~2.5 GB) on first use. It is the recommended provider for document intelligence tasks because no text ever leaves your machine. A smaller Qwen3.5-2B variant (~1.5 GB) is also available for memory-constrained machines.
### Set up a cloud provider

Run the interactive setup to save your API key:

```bash
cascade agent login
```
Or use environment variables, which always take precedence over saved keys:

```bash
export ANTHROPIC_API_KEY=sk-ant-...
export OPENAI_API_KEY=sk-...
export GOOGLE_AI_API_KEY=AI...
```
Note: AI provider subscriptions (Claude.ai, ChatGPT Plus, Gemini Advanced) are separate from API access. You need an API key from each provider's developer console. Google AI Studio offers a free tier with generous rate limits.
## Use the Agent

### Interactive REPL

Start a conversation where the agent remembers context across requests:

```bash
cascade agent             # uses your configured provider
cascade agent -p google   # override provider for this session
cascade agent -p local    # fully on-device Qwen3.5-4B
```
Inside the REPL:
| Input | Action |
|---|---|
| any text | Send a request to the agent |
| /clear | Reset conversation history |
| /help | Show usage examples |
| /exit | Quit |
### One-shot mode

Run a single request and exit — useful in scripts:

```bash
cascade agent "validate ~/health-data/patient.ttl"
cascade agent "how many lab results are in this record?" ~/records/patient.json
cascade agent -p openai -m gpt-4o "initialize a Pod at ~/my-pod and import patient.json"
```
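One-shot mode composes well with plain shell scripting. A minimal dry-run sketch — the helper name, file names, and the `echo` are illustrative; replace `echo` with the real invocation once you have verified it against your install:

```shell
# Ask the same one-shot question about several record files.
# Dry run: `echo` prints each command instead of executing it.
ask_each() {
  question=$1; shift
  for f in "$@"; do
    echo cascade agent "\"$question\"" "$f"
    # Real invocation (uncomment once the CLI is installed):
    # cascade agent "$question" "$f"
  done
}

ask_each "how many lab results are in this record?" record-001.json record-002.json
```

Because each one-shot call is stateless, a loop like this is safe to interrupt and re-run.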
### Pod auto-detection (v1.1.0)

When you start the REPL from inside a directory that contains index.ttl (a Pod root), the agent automatically detects it and announces it at startup:

```
Cascade Agent v1.1.0 (local provider)
✓ Pod detected: ./my-health-pod
cascade> _
```
The detected pod is passed automatically to pod commands, so you can say "show me my medications" without specifying a path.
### Multi-line input (v1.1.0)

End a line with \ to continue typing on the next line — useful for longer requests:

```
cascade> Import hospital-records.json and primary-care.json \
     ... into my pod, reconcile against existing records, \
     ... and show me a summary of any conflicts found
```
## Document Intelligence
C-CDA clinical documents often contain free-text narrative sections alongside structured data. Cascade Agent can extract structured clinical entities from these narratives using an on-device LLM, then write them to your Pod as validated RDF.
### How it works

#### Step 1 — Start the extraction server
cascade agent serve starts a local HTTP server on port 8765 that accepts narrative text and returns extracted clinical entities. On first run it downloads the Qwen3.5-4B model (~2.5 GB).
```bash
cascade agent serve
```

```
Cascade Agent extraction server
Checking model... Qwen3.5-4B not found, downloading (~2.5 GB)
… download complete
Listening on http://localhost:8765
POST /extract — GET /health
```
Keep the server running in a terminal tab while you run the next steps.
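You can sanity-check the server from another terminal. A sketch — the JSON request shape below is an assumption for illustration, not a documented schema; only the two endpoints (GET /health, POST /extract) come from this page:

```shell
# Probe the extraction server, then submit one narrative sentence.
# NOTE: the {"text": ...} payload shape is an assumption; verify it
# against the server's actual request schema before relying on it.
payload='{"text":"Patient reports taking lisinopril 10 mg daily."}'

if curl -sf http://localhost:8765/health > /dev/null 2>&1; then
  curl -s -X POST http://localhost:8765/extract \
    -H 'Content-Type: application/json' \
    -d "$payload"
else
  echo "extraction server is not running; start it with: cascade agent serve" >&2
fi
```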
#### Step 2 — Import and extract

After importing a C-CDA document (cascade pod import), the import command reports whether any narrative sections are queued for extraction. Then run:

```bash
cascade pod extract ~/my-pod
```
Extracted entities are routed by confidence:
| Confidence | Destination | Action needed |
|---|---|---|
| ≥ 0.85 | clinical/ai-extracted.ttl | Auto-accepted, none |
| 0.50 – 0.84 | analysis/review-queue.json | Human review (see below) |
| < 0.50 | analysis/discarded-extractions.ttl | Discarded, auditable |
#### Step 3 — Review borderline entities

Entities with confidence between 0.50 and 0.84 are placed in a review queue. Approve or discard them interactively:

```bash
cascade agent review --pod ~/my-pod
```
The review command walks through each queued entity, showing the source narrative and the extracted data, and asks you to accept or discard. Accepted entities are written to clinical/ai-extracted.ttl; discarded ones are moved to the audit log.
### All extraction is on-device

- The extraction server uses Qwen3.5-4B running locally via node-llama-cpp. No text is sent to any external API.
- Extracted entities are tagged with cascade:dataProvenance cascade:AIExtracted and a PROV-O activity node recording the model ID and confidence score.
- All three output paths (auto-accepted, review queue, discarded) are preserved for audit purposes.
## What you can ask

The agent understands any task expressible with the Cascade CLI: validating documents, initializing Pods, importing and reconciling records, and querying what's in them.

For batch jobs the agent writes a shell loop rather than making one call per file, so converting thousands of records is a single tool invocation.
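The generated loop looks roughly like this sketch. The temp directory and placeholder files exist only to make the example self-contained, and the per-file import command is commented out so the loop dry-runs:

```shell
# Demo setup: two empty placeholder record files (illustrative only).
mkdir -p /tmp/cascade-demo-records
: > /tmp/cascade-demo-records/a.json
: > /tmp/cascade-demo-records/b.json

# One pass over the directory, one CLI call per file.
count=0
for f in /tmp/cascade-demo-records/*.json; do
  [ -e "$f" ] || continue              # skip if the glob matched nothing
  # cascade pod import ~/my-pod "$f"   # the real per-file call
  count=$((count + 1))
done
echo "processed $count file(s)"
```

Running the whole batch inside one shell loop means the agent pays the model-inference cost once, not once per record.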
## Managing Providers & Models

```bash
# List configured providers
cascade agent provider

# Switch active provider
cascade agent provider google

# Show current model
cascade agent model

# Switch model using a shortcut
cascade agent model flash    # gemini-2.0-flash
cascade agent model opus     # claude-opus-4-6
cascade agent model sonnet   # claude-sonnet-4-6

# Use any full model ID
cascade agent model gemini-1.5-pro
```
Settings are saved to ~/.config/cascade-agent/config.json. Environment variables (ANTHROPIC_API_KEY, OPENAI_API_KEY, GOOGLE_AI_API_KEY) always take precedence over saved keys.
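Because the environment always wins, a quick way to see which credential source is active — the key name and config path come from this page; the check itself is just a sketch:

```shell
# Report where the agent will find Anthropic credentials:
# environment variable first, then the saved config file.
cfg="$HOME/.config/cascade-agent/config.json"

if [ -n "${ANTHROPIC_API_KEY:-}" ]; then
  echo "source: ANTHROPIC_API_KEY environment variable"
elif [ -f "$cfg" ]; then
  echo "source: saved key in $cfg"
else
  echo "source: none found; run: cascade agent login"
fi
```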
### Model shortcuts
| Shortcut | Resolves to |
|---|---|
| opus | claude-opus-4-6 |
| sonnet | claude-sonnet-4-6 |
| haiku | claude-haiku-4-5 |
| gpt4o | gpt-4o |
| o3 | o3 |
| flash | gemini-2.0-flash |
| pro | gemini-1.5-pro |
## Session Logs

Every session is logged automatically. Review past sessions with:

```bash
cascade agent logs
```