Healthcare Analytics — NL-to-SQL
Natural language queries against a real star-schema warehouse with SCD2, dbt artifact context, and transparent SQL reasoning
A publicly hosted web app where recruiters type natural language questions about healthcare claims data and get live SQL results — not pre-canned responses. The system generates SQL using dbt artifact context (manifest.json + catalog.json) for accuracy, runs it against an embedded DuckDB warehouse, and shows both the results and the generated SQL with clause-level explanations.
Problem
What It Solves
Healthcare analytics work often depends on SQL knowledge, schema familiarity, and slow handoffs between business questions and technical reporting. This project explores how a natural language interface can make warehouse data more accessible while still keeping the SQL transparent and inspectable.
Solution
What I Built
The app combines a dbt-built dimensional warehouse, DuckDB query execution, and an LLM-powered text-to-SQL backend. It uses dbt artifacts for schema context, validates generated SQL before execution, limits queries to read-only behavior, and presents both charted results and the SQL behind them.
Architecture
How It Works
- Synthetic healthcare data is generated and modeled into a star schema with dbt.
- GitHub Actions runs data generation, dbt build, validation, and artifact publishing.
- Cloudflare R2 stores the DuckDB warehouse and build outputs for app consumption.
- The Streamlit UI lets users ask questions, view generated SQL, and inspect charted results.
- The text-to-SQL backend uses dbt manifest and catalog metadata to ground query generation.
Skills
Skills Demonstrated
Data Modeling
- Star schema design
- SCD2 dimensions
- dbt models
- Data quality checks
AI / Automation
- Text-to-SQL workflow
- LLM schema context
- SQL validation
- Result explanation
Cloud Skills
- Streamlit Cloud app deployment
- Cloudflare R2 object storage
- GitHub Actions CI
- GitHub Pages docs
Analytics App
- Streamlit UI
- DuckDB execution
- Plotly chart selection
- Recruiter-friendly demo flow
Experience
What I Learned
- How to use dbt artifacts as machine-readable context for an analytics application.
- How to make LLM-generated SQL safer with validation, timeouts, and read-only execution.
- How to package a data project so the warehouse, docs, app, and CI workflow tell one coherent story.
Roadmap
Next Improvements
- Deploy the curated Streamlit chart experience.
- Expose the text-to-SQL flow in the hosted UI.
- Move toward a React SPA with a Cloudflare Pages Function backend.