Cascade Agent v1.1.3 — Bundled with Cascade CLI v0.5.8+. Requires Node.js 18+. All health data processed locally.

What is Cascade Agent?

Cascade Agent is a natural language interface and document intelligence engine built into the Cascade Protocol CLI. It does two things:

  1. Conversational assistant — Describe what you want in plain English; the agent figures out the right CLI commands to run.
  2. Document intelligence — Extracts structured clinical entities (medications, conditions, lab results, etc.) from free-text narrative sections in C-CDA documents using an on-device LLM.

Example session:

▶ Convert all FHIR files in ~/records to Cascade RDF and save them to ~/output
⚙ shell $ mkdir -p ~/output && for f in ~/records/*.json; do cascade convert ...
↳ (command succeeded — 1,201 files converted)
Done. 1,201 patient records converted and saved to ~/output as .ttl files.

The agent streams responses in real time, shows every command it runs, and maintains conversation context so you can follow up naturally.

1. Prerequisites

Requirement    Purpose
Node.js 18+    Runtime
Cascade CLI    Includes the agent
AI provider    API key or local model

Cascade Agent is bundled with the CLI since v0.5. Install or update the CLI to get both:

npm install -g @the-cascade-protocol/cli

Verify both are available:

cascade --version
cascade agent --version

⚠  Deprecation notice: cascade-agent standalone binary

The separate cascade-agent package and its binary are deprecated as of v1.0.0. Use cascade agent (subcommand of the CLI) instead. The standalone binary still works but prints a warning and will be removed in v2.0.0.

2. Configure an AI Provider

Cascade Agent supports five AI providers. You only need one. The Local provider runs entirely on-device with no API key or internet connection.

Provider             Flag           Cost / account                       Default model
Anthropic (Claude)   -p anthropic   Paid — console.anthropic.com         claude-opus-4-6
OpenAI (GPT)         -p openai      Paid — platform.openai.com           gpt-4o
Google (Gemini)      -p google      Free tier — aistudio.google.com      gemini-2.0-flash
Ollama               -p ollama      Free — runs on your machine          llama3.2
Local (Qwen3.5-4B)   -p local       Free, fully on-device — no account   Qwen3.5-4B (~2.5 GB)

The Local provider is powered by node-llama-cpp and downloads the Qwen3.5-4B model (~2.5 GB) on first use. It is the recommended provider for document intelligence tasks because no text ever leaves your machine. A smaller Qwen3.5-2B variant (~1.5 GB) is also available for memory-constrained machines.

Set up a cloud provider

Run the interactive setup to save your API key:

cascade agent login

Or use environment variables, which always take precedence over saved keys:

export ANTHROPIC_API_KEY=sk-ant-...
export OPENAI_API_KEY=sk-...
export GOOGLE_AI_API_KEY=AI...

Note: AI provider subscriptions (Claude.ai, ChatGPT Plus, Gemini Advanced) are separate from API access. You need an API key from each provider's developer console. Google AI Studio offers a free tier with generous rate limits.
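The precedence rule above can be sketched in a few lines. This is an illustrative model of the lookup order, not the CLI's actual implementation; the `resolve_api_key` function and `saved_keys` mapping are assumptions for the example:

```python
import os

# Environment variable names the document lists for each cloud provider.
ENV_VARS = {
    "anthropic": "ANTHROPIC_API_KEY",
    "openai": "OPENAI_API_KEY",
    "google": "GOOGLE_AI_API_KEY",
}

def resolve_api_key(provider, saved_keys, env=None):
    """Return the key for `provider`: the environment variable wins,
    falling back to a key saved by `cascade agent login`."""
    env = os.environ if env is None else env
    var = ENV_VARS.get(provider)
    if var and env.get(var):
        return env[var]
    return saved_keys.get(provider)
```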

3. Use the Agent

Interactive REPL

Start a conversation where the agent remembers context across requests:

cascade agent                   # uses your configured provider
cascade agent -p google         # override provider for this session
cascade agent -p local          # use the fully on-device local model (Qwen3.5-4B)

Inside the REPL:

Input       Action
any text    Send a request to the agent
/clear      Reset conversation history
/help       Show usage examples
/exit       Quit

One-shot mode

Run a single request and exit — useful in scripts:

cascade agent "validate ~/health-data/patient.ttl"
cascade agent "how many lab results are in this record?" ~/records/patient.json
cascade agent -p openai -m gpt-4o "initialize a Pod at ~/my-pod and import patient.json"

Pod auto-detection (v1.1.0)

When you start the REPL from inside a directory that contains index.ttl (a Pod root), the agent automatically detects it and announces it at startup:

Output
Cascade Agent v1.1.0  (local provider)
✓ Pod detected: ./my-health-pod

cascade> _

The detected pod is passed automatically to pod commands, so you can say "show me my medications" without specifying a path.
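
Conceptually, detection amounts to looking for an index.ttl in the current directory or an ancestor. The sketch below is an assumption about how this could work, not the CLI's actual code:

```python
from pathlib import Path

def detect_pod(start):
    """Return the nearest directory (start or an ancestor)
    containing index.ttl, i.e. a Pod root; None if no Pod is found."""
    start = Path(start).resolve()
    for candidate in (start, *start.parents):
        if (candidate / "index.ttl").is_file():
            return candidate
    return None
```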

Multi-line input (v1.1.0)

End a line with \ to continue typing on the next line — useful for longer requests:

cascade> Import hospital-records.json and primary-care.json \
... into my pod, reconcile against existing records, \
... and show me a summary of any conflicts found

4. Document Intelligence

C-CDA clinical documents often contain free-text narrative sections alongside structured data. Cascade Agent can extract structured clinical entities from these narratives using an on-device LLM, then write them to your Pod as validated RDF.

How it works

1. cascade pod import — parses the C-CDA and flags narrative sections
2. cascade agent serve — runs the local LLM extraction server
3. cascade pod extract — sends narratives to the server and routes results by confidence
4. cascade agent review — human review of borderline entities

Step 1 — Start the extraction server

cascade agent serve starts a local HTTP server on port 8765 that accepts narrative text and returns extracted clinical entities. On first run it downloads the Qwen3.5-4B model (~2.5 GB).

cascade agent serve
Output
⌄ Cascade Agent extraction server
  Checking model... Qwen3.5-4B not found, downloading (~2.5 GB)
  … download complete
  Listening on http://localhost:8765
  POST /extract  —  GET /health

Keep the server running in a terminal tab while you run the next steps.
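
If you want to call the server directly, a minimal client might look like the following. The endpoints (POST /extract, GET /health) are documented above, but the request and response field names ("text", "entities") are illustrative assumptions:

```python
import json
from urllib import request

def extract_entities(narrative, base_url="http://localhost:8765"):
    """POST a narrative section to the local extraction server and
    return the parsed JSON response."""
    payload = json.dumps({"text": narrative}).encode("utf-8")
    req = request.Request(
        f"{base_url}/extract",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))
```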

Step 2 — Import and extract

After importing a C-CDA document (cascade pod import), the import command will tell you if any narrative sections are queued for extraction. Then run:

cascade pod extract ~/my-pod

Extracted entities are routed by confidence:

Confidence    Destination                          Action needed
≥ 0.85        clinical/ai-extracted.ttl            Auto-accepted, none
0.50 – 0.84   analysis/review-queue.json           Human review (see below)
< 0.50        analysis/discarded-extractions.ttl   Discarded, auditable
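
The routing rule in the table reduces to two thresholds. A sketch of the logic, using the file paths the document lists (the function itself is illustrative, not the CLI's code):

```python
AUTO_ACCEPT = 0.85   # at or above: written straight to the Pod
REVIEW_FLOOR = 0.50  # below: discarded (but kept for audit)

def route_entity(confidence):
    """Map an extraction confidence score to its destination file."""
    if confidence >= AUTO_ACCEPT:
        return "clinical/ai-extracted.ttl"
    if confidence >= REVIEW_FLOOR:
        return "analysis/review-queue.json"
    return "analysis/discarded-extractions.ttl"
```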

Step 3 — Review borderline entities

Entities with confidence between 0.50 and 0.84 are placed in a review queue. Approve or discard them interactively:

cascade agent review --pod ~/my-pod

The review command walks through each queued entity, showing the source narrative and the extracted data, and asks you to accept or discard. Accepted entities are written to clinical/ai-extracted.ttl; discarded ones are moved to the audit log.

All extraction is on-device

  • The extraction server uses Qwen3.5-4B running locally via node-llama-cpp. No text is sent to any external API.
  • Extracted entities are tagged with cascade:dataProvenance cascade:AIExtracted and a PROV-O activity node recording the model ID and confidence score.
  • All three output paths (auto-accepted, review queue, discarded) are preserved for audit purposes.
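
The provenance tagging might look roughly like the Turtle sketch below. The cascade:dataProvenance cascade:AIExtracted triple and the PROV-O activity come from the bullets above; the prefix IRI, subject identifiers, and the property names for model ID and confidence are illustrative assumptions:

```turtle
@prefix cascade: <https://example.org/cascade#> .   # prefix IRI assumed
@prefix prov:    <http://www.w3.org/ns/prov#> .

<#med-lisinopril-001>                               # subject name assumed
    cascade:dataProvenance cascade:AIExtracted ;
    prov:wasGeneratedBy <#extraction-activity-001> .

<#extraction-activity-001>
    a prov:Activity ;
    cascade:modelId "Qwen3.5-4B" ;                  # property name assumed
    cascade:confidence 0.91 .                       # property name assumed
```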

What you can ask

The agent understands any task expressible with the Cascade CLI. Some examples:

▸ Convert all the FHIR bundles in ~/Downloads/records to Cascade RDF
▸ Validate these .ttl files and tell me which ones have errors
▸ Initialize a new Pod at ~/my-health-data and import patient.xml into it
▸ How many condition records are in this Pod?
▸ Show me the medications and lab results from both providers, reconcile any duplicates
▸ There are 8 items in the review queue at ~/my-pod — walk me through them

For batch jobs the agent writes a shell loop rather than making one call per file, so converting thousands of records is a single tool invocation.

Managing Providers & Models

# List configured providers
cascade agent provider

# Switch active provider
cascade agent provider google

# Show current model
cascade agent model

# Switch model using a shortcut
cascade agent model flash       # gemini-2.0-flash
cascade agent model opus        # claude-opus-4-6
cascade agent model sonnet      # claude-sonnet-4-6

# Use any full model ID
cascade agent model gemini-1.5-pro

Settings are saved to ~/.config/cascade-agent/config.json. Environment variables (ANTHROPIC_API_KEY, OPENAI_API_KEY, GOOGLE_AI_API_KEY) always take precedence over saved keys.

Model shortcuts

Shortcut   Resolves to
opus       claude-opus-4-6
sonnet     claude-sonnet-4-6
haiku      claude-haiku-4-5
gpt4o      gpt-4o
o3         o3
flash      gemini-2.0-flash
pro        gemini-1.5-pro

Session Logs

Every session is logged automatically. Review past sessions with:

cascade agent logs