## What is Cascade Agent?
Cascade Agent is a natural language interface and document intelligence engine built into the Cascade Protocol CLI. It does two things:
- Conversational assistant — Describe what you want in plain English; the agent figures out the right CLI commands to run.
- Document intelligence — Extracts structured clinical entities (medications, conditions, lab results, etc.) from free-text narrative sections in C-CDA documents using an on-device LLM.
The agent streams responses in real time, shows every command it runs, and maintains conversation context so you can follow up naturally.
## Prerequisites

- Node.js 18+ — runtime
- Cascade CLI — includes the Agent
- AI provider — API key, or the Local provider
Cascade Agent has been bundled with the CLI since v0.5. Install or update the CLI to get both:

```bash
npm install -g @the-cascade-protocol/cli
```
Verify both are available:
```bash
cascade --version
cascade agent --version
```
> ⚠ Deprecation notice: standalone cascade-agent binary
>
> The separate cascade-agent package and its binary are deprecated as of v1.0.0. Use cascade agent (a subcommand of the CLI) instead. The standalone binary still works but prints a warning and will be removed in v2.0.0.
## Configure an AI Provider
Cascade Agent supports five AI providers. You only need one. The Local provider runs entirely on-device with no API key or internet connection.
| Provider | Flag | Cost / account | Default model |
|---|---|---|---|
| Anthropic (Claude) | -p anthropic | Paid — console.anthropic.com | claude-opus-4-6 |
| OpenAI (GPT) | -p openai | Paid — platform.openai.com | gpt-4o |
| Google (Gemini) | -p google | Free tier — aistudio.google.com | gemini-2.0-flash |
| Ollama | -p ollama | Free — runs on your machine | llama3.2 |
| Local (Qwen3.5-4B) | -p local | Free, fully on-device — no account | Qwen3.5-4B (~2.5 GB) |
The Local provider is powered by node-llama-cpp and downloads the Qwen3.5-4B model (~2.5 GB) on first use. It is the recommended provider for document intelligence tasks because no text ever leaves your machine. A smaller Qwen3.5-2B variant (~1.5 GB) is also available for memory-constrained machines.
### Set up a cloud provider

Run the interactive setup to save your API key:

```bash
cascade agent login
```
Or use environment variables, which always take precedence over saved keys:

```bash
export ANTHROPIC_API_KEY=sk-ant-...
export OPENAI_API_KEY=sk-...
export GOOGLE_AI_API_KEY=AI...
```
Note: AI provider subscriptions (Claude.ai, ChatGPT Plus, Gemini Advanced) are separate from API access. You need an API key from each provider's developer console. Google AI Studio offers a free tier with generous rate limits.
## Use the Agent

### Interactive REPL

Start a conversation where the agent remembers context across requests:

```bash
cascade agent             # uses your configured provider
cascade agent -p google   # override provider for this session
cascade agent -p local    # fully on-device Qwen3.5-4B
```
Inside the REPL:
| Input | Action |
|---|---|
| any text | Send a request to the agent |
| /clear | Reset conversation history |
| /help | Show usage examples |
| /exit | Quit |
### One-shot mode

Run a single request and exit — useful in scripts:

```bash
cascade agent "validate ~/health-data/patient.ttl"
cascade agent "how many lab results are in this record?" ~/records/patient.json
cascade agent -p openai -m gpt-4o "initialize a Pod at ~/my-pod and import patient.json"
```
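One-shot mode composes well with plain shell scripting. A minimal dry-run sketch — the helper name, file names, and the `echo` are illustrative; replace `echo` with the real invocation once you have verified it against your install:

```shell
# Ask the same one-shot question about several record files.
# Dry run: `echo` prints each command instead of executing it.
ask_each() {
  question=$1; shift
  for f in "$@"; do
    echo cascade agent "\"$question\"" "$f"
    # Real invocation (uncomment once the CLI is installed):
    # cascade agent "$question" "$f"
  done
}

ask_each "how many lab results are in this record?" record-001.json record-002.json
```

Because each one-shot call is stateless, a loop like this is safe to interrupt and re-run.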
### Pod auto-detection (v1.1.0)

When you start the REPL from inside a directory that contains index.ttl (a Pod root), the agent automatically detects it and announces it at startup:

```
Cascade Agent v1.1.0 (local provider)
✓ Pod detected: ./my-health-pod
cascade> _
```
The detected pod is passed automatically to pod commands, so you can say "show me my medications" without specifying a path.
### Multi-line input (v1.1.0)

End a line with \ to continue typing on the next line — useful for longer requests:

```
cascade> Import hospital-records.json and primary-care.json \
     ... into my pod, reconcile against existing records, \
     ... and show me a summary of any conflicts found
```
## Document Intelligence
C-CDA clinical documents often contain free-text narrative sections alongside structured data. Cascade Agent can extract structured clinical entities from these narratives using an on-device LLM, then write them to your Pod as validated RDF.
### How it works

#### Step 1 — Start the extraction server
cascade agent serve starts a local HTTP server on port 8765 that accepts narrative text and returns extracted clinical entities. On first run it downloads the Qwen3.5-4B model (~2.5 GB).
```bash
cascade agent serve
```

```
Cascade Agent extraction server
Checking model... Qwen3.5-4B not found, downloading (~2.5 GB)
… download complete
Listening on http://localhost:8765
POST /extract — GET /health
```
Keep the server running in a terminal tab while you run the next steps.
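You can sanity-check the server from another terminal. A sketch — the JSON request shape below is an assumption for illustration, not a documented schema; only the two endpoints (GET /health, POST /extract) come from this page:

```shell
# Probe the extraction server, then submit one narrative sentence.
# NOTE: the {"text": ...} payload shape is an assumption; verify it
# against the server's actual request schema before relying on it.
payload='{"text":"Patient reports taking lisinopril 10 mg daily."}'

if curl -sf http://localhost:8765/health > /dev/null 2>&1; then
  curl -s -X POST http://localhost:8765/extract \
    -H 'Content-Type: application/json' \
    -d "$payload"
else
  echo "extraction server is not running; start it with: cascade agent serve" >&2
fi
```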
#### Step 2 — Import and extract

After importing a C-CDA document (cascade pod import), the import command reports whether any narrative sections are queued for extraction. Then run:

```bash
cascade pod extract ~/my-pod
```
Extracted entities are routed by confidence:
| Confidence | Destination | Action needed |
|---|---|---|
| ≥ 0.85 | clinical/ai-extracted.ttl | Auto-accepted, none |
| 0.50 – 0.84 | analysis/review-queue.json | Human review (see below) |
| < 0.50 | analysis/discarded-extractions.ttl | Discarded, auditable |
#### Step 3 — Review borderline entities

Entities with confidence between 0.50 and 0.84 are placed in a review queue. Approve or discard them interactively:

```bash
cascade agent review --pod ~/my-pod
```
The review command walks through each queued entity, showing the source narrative and the extracted data, and asks you to accept or discard. Accepted entities are written to clinical/ai-extracted.ttl; discarded ones are moved to the audit log.
### All extraction is on-device

- The extraction server uses Qwen3.5-4B running locally via node-llama-cpp. No text is sent to any external API.
- Extracted entities are tagged with cascade:dataProvenance cascade:AIExtracted and a PROV-O activity node recording the model ID and confidence score.
- All three output paths (auto-accepted, review queue, discarded) are preserved for audit purposes.
## What you can ask

The agent understands any task expressible with the Cascade CLI: validating documents, initializing Pods, importing and reconciling records, and querying what's in them.

For batch jobs the agent writes a shell loop rather than making one call per file, so converting thousands of records is a single tool invocation.
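The generated loop looks roughly like this sketch. The temp directory and placeholder files exist only to make the example self-contained, and the per-file import command is commented out so the loop dry-runs:

```shell
# Demo setup: two empty placeholder record files (illustrative only).
mkdir -p /tmp/cascade-demo-records
: > /tmp/cascade-demo-records/a.json
: > /tmp/cascade-demo-records/b.json

# One pass over the directory, one CLI call per file.
count=0
for f in /tmp/cascade-demo-records/*.json; do
  [ -e "$f" ] || continue              # skip if the glob matched nothing
  # cascade pod import ~/my-pod "$f"   # the real per-file call
  count=$((count + 1))
done
echo "processed $count file(s)"
```

Running the whole batch inside one shell loop means the agent pays the model-inference cost once, not once per record.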
## Managing Providers & Models

```bash
# List configured providers
cascade agent provider

# Switch active provider
cascade agent provider google

# Show current model
cascade agent model

# Switch model using a shortcut
cascade agent model flash    # gemini-2.0-flash
cascade agent model opus     # claude-opus-4-6
cascade agent model sonnet   # claude-sonnet-4-6

# Use any full model ID
cascade agent model gemini-1.5-pro
```
Settings are saved to ~/.config/cascade-agent/config.json. Environment variables (ANTHROPIC_API_KEY, OPENAI_API_KEY, GOOGLE_AI_API_KEY) always take precedence over saved keys.
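Because the environment always wins, a quick way to see which credential source is active — the key name and config path come from this page; the check itself is just a sketch:

```shell
# Report where the agent will find Anthropic credentials:
# environment variable first, then the saved config file.
cfg="$HOME/.config/cascade-agent/config.json"

if [ -n "${ANTHROPIC_API_KEY:-}" ]; then
  echo "source: ANTHROPIC_API_KEY environment variable"
elif [ -f "$cfg" ]; then
  echo "source: saved key in $cfg"
else
  echo "source: none found; run: cascade agent login"
fi
```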
### Model shortcuts
| Shortcut | Resolves to |
|---|---|
| opus | claude-opus-4-6 |
| sonnet | claude-sonnet-4-6 |
| haiku | claude-haiku-4-5 |
| gpt4o | gpt-4o |
| o3 | o3 |
| flash | gemini-2.0-flash |
| pro | gemini-1.5-pro |
## Session Logs

Every session is logged automatically. Review past sessions with:

```bash
cascade agent logs
```