← All Projects
ML / AI In Progress

Healthcare Analytics — NL-to-SQL

Natural language queries against a real star-schema warehouse with SCD2, dbt artifact context, and transparent SQL reasoning

DuckDBdbtPythonStreamlitAnthropic APIPlotlyCloudflare R2GitHub Actions

A publicly hosted web app where recruiters type natural language questions about healthcare claims data and get live SQL results — not pre-canned responses. The system generates SQL using dbt artifact context (manifest.json + catalog.json) for accuracy, runs it against an embedded DuckDB warehouse, and shows both the results and the generated SQL with clause-level explanations.

What It Solves

Healthcare analytics work often depends on SQL knowledge, schema familiarity, and slow handoffs between business questions and technical reporting. This project explores how a natural language interface can make warehouse data more accessible while still keeping the SQL transparent and inspectable.

What I Built

The app combines a dbt-built dimensional warehouse, DuckDB query execution, and an LLM-powered text-to-SQL backend. It uses dbt artifacts for schema context, validates generated SQL before execution, limits queries to read-only behavior, and presents both charted results and the SQL behind them.

How It Works

  1. Synthetic healthcare data is generated and modeled into a star schema with dbt.
  2. GitHub Actions runs data generation, dbt build, validation, and artifact publishing.
  3. Cloudflare R2 stores the DuckDB warehouse and build outputs for app consumption.
  4. The Streamlit UI lets users ask questions, view generated SQL, and inspect charted results.
  5. The text-to-SQL backend uses dbt manifest and catalog metadata to ground query generation.

Skills Demonstrated

Data Modeling

  • Star schema design
  • SCD2 dimensions
  • dbt models
  • Data quality checks

AI / Automation

  • Text-to-SQL workflow
  • LLM schema context
  • SQL validation
  • Result explanation

Cloud Skills

  • Streamlit Cloud app deployment
  • Cloudflare R2 object storage
  • GitHub Actions CI
  • GitHub Pages docs

Analytics App

  • Streamlit UI
  • DuckDB execution
  • Plotly chart selection
  • Recruiter-friendly demo flow

What I Learned

  • How to use dbt artifacts as machine-readable context for an analytics application.
  • How to make LLM-generated SQL safer with validation, timeouts, and read-only execution.
  • How to package a data project so the warehouse, docs, app, and CI workflow tell one coherent story.

Next Improvements

  • Deploy the curated Streamlit chart experience.
  • Expose the text-to-SQL flow in the hosted UI.
  • Move toward a React SPA with a Cloudflare Pages Function backend.