# Architecture
## Architecture Overview

Energy Copilot is built entirely on Databricks, using the platform as both the data and compute backbone. The architecture follows three layers: Data Platform (ingestion, processing, storage), Serving Layer (Lakebase and SQL Warehouse), and Application Layer (FastAPI backend + React frontend).
```
┌──────────────────────────────────────────────────────────────────┐
│                      EXTERNAL DATA SOURCES                       │
│ AEMO NEMWEB │ BOM Weather │ AER CDR │ OpenElectricity │ STTM Gas │
│ CER LGC Registry │ AEMC │ ISP 2024 │ OpenNEM Facilities │ WEM    │
└──────────────────────────────────────────────────────────────────┘
                      │  (30 pipeline jobs)
                      ▼
┌──────────────────────────────────────────────────────────────────┐
│               MEDALLION LAKEHOUSE (Unity Catalog)                │
│                                                                  │
│ BRONZE (raw)          SILVER (clean)          GOLD (curated)     │
│ ─────────────         ────────────────        ────────────────── │
│ · Raw NEMWEB CSV      · Deduped prices        · nem_prices_5min  │
│ · API JSON dumps      · Validated gen data    · nem_generation_  │
│ · Weather feeds       · Normalised weather      by_fuel          │
│ · ISP documents       · Parsed AER docs       · nem_fcas_prices  │
│ · STTM gas data       · DNSP enriched data    · market_briefs    │
│                                               · dnsp_asset_*     │
│ 108+ tables total                                                │
└──────────────────────────────────────────────────────────────────┘
            │                          │
            ▼                          ▼
┌─────────────────────┐   ┌──────────────────────────────┐
│     ML PLATFORM     │   │        SERVING LAYER         │
│ · MLflow            │   │ Lakebase (primary)           │
│   experiments       │   │ · Postgres-compatible        │
│ · Model Registry    │   │ · Continuous sync from       │
│ · XGBoost           │   │   Gold Delta tables          │
│ · Prophet           │   │ · 10–38ms query latency      │
│ · Isolation Forest  │   │ SQL Warehouse (fallback)     │
│ · LightGBM          │   │ · 400–1000ms latency         │
│ 5 experiments       │   │ · All 108+ tables            │
└─────────────────────┘   └──────────────────────────────┘
                      │
                      ▼
┌──────────────────────────────────────────────────────────────────┐
│                         FASTAPI BACKEND                          │
│           64 routers │ 636+ endpoints │ Python 3.11              │
│ · Market data (prices, gen, FCAS, interconnectors, weather)      │
│ · DNSP endpoints (AIO, assets, RAB, vegetation, workforce)       │
│ · AI/ML endpoints (forecasts, anomaly, copilot SSE stream)       │
│ · Genie proxy (natural language → SQL results)                   │
│ · Settlement + compliance + environmental endpoints              │
└──────────────────────────────────────────────────────────────────┘
                      │
                      ▼
┌──────────────────────────────────────────────────────────────────┐
│                          REACT FRONTEND                          │
│     React 18 │ TypeScript │ Vite │ Tailwind CSS │ Recharts       │
│     564 pages │ React Router v6 │ SSE streaming copilot          │
│ · Front Office dashboards (live market, generation, FCAS, gas)   │
│ · Middle Office ETRM (deals, portfolio, risk, forward curves)    │
│ · Back Office (settlement, compliance, environmentals)           │
│ · DNSP Intelligence Suite (10 sub-groups)                        │
│ · AI Copilot chat interface with live tool call streaming        │
└──────────────────────────────────────────────────────────────────┘
            │                          │
            ▼                          ▼
┌─────────────────────┐   ┌──────────────────────────────┐
│        FMAPI        │   │     12 GENIE AI/BI SPACES    │
│ (Claude Sonnet 4.5) │   │    Natural language SQL      │
│      51 tools       │   │         interface            │
└─────────────────────┘   └──────────────────────────────┘
```

## Data Platform: Medallion Lakehouse
### Bronze Layer — Raw Ingestion
The Bronze layer stores raw data exactly as received from source systems, with minimal transformation. All Bronze tables are append-only and partitioned by ingest date.
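The append-only contract can be sketched as a small ingest helper. This is an illustrative sketch, not the project's actual code: the function and the `_source`/`_ingest_date` field names are assumptions.

```python
from datetime import date, datetime, timezone


def stamp_bronze_record(record: dict, source: str) -> dict:
    """Return a copy of a raw record with Bronze ingest metadata attached.

    The payload itself is left untouched -- Bronze stores data as received;
    only partition/audit columns are added.
    """
    stamped = dict(record)
    stamped["_source"] = source
    stamped["_ingested_at"] = datetime.now(timezone.utc).isoformat()
    stamped["_ingest_date"] = date.today().isoformat()  # partition column
    return stamped


# Append-only: records are only ever added, never updated in place, e.g.
# df.write.format("delta").mode("append") \
#     .partitionBy("_ingest_date").saveAsTable("bronze.nem_dispatch_raw")
```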
| Schema | Purpose |
|---|---|
| `bronze` | Raw NEM dispatch CSVs, API JSON dumps, weather feeds |
| `silver` | Cleaned, validated, deduplicated intermediate tables |
| `gold` | Curated, business-ready tables for dashboards and ML |
| `ml` | Feature store tables, model outputs, evaluation metrics |
| `tools` | UC SQL functions registered for the AI agent |
### Silver Layer — Validation and Enrichment

Silver pipelines perform:
- Deduplication: using `QUALIFY` + `ROW_NUMBER() OVER (PARTITION BY …)` patterns
- Schema enforcement: Pydantic-style column type coercion in PySpark
- Null handling: configurable fill strategies per column
- Time zone normalisation: all timestamps stored as UTC with AEST offset columns
- Quality flags: `is_valid` and `quality_score` columns added for downstream filtering
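A minimal sketch of the deduplication pattern named above, building the `QUALIFY` + `ROW_NUMBER()` statement from caller-supplied table and column names (the helper and its arguments are illustrative, not part of the codebase):

```python
def dedup_query(table: str, keys: list[str], order_col: str) -> str:
    """Build a Databricks SQL query that keeps the latest row per key,
    using the QUALIFY + ROW_NUMBER() pattern. Table and column names
    are caller-supplied placeholders."""
    partition = ", ".join(keys)
    return (
        f"SELECT * FROM {table} "
        f"QUALIFY ROW_NUMBER() OVER (PARTITION BY {partition} "
        f"ORDER BY {order_col} DESC) = 1"
    )


# e.g. spark.sql(dedup_query("bronze.nem_dispatch_prices",
#                            ["region_id", "settlement_date"],
#                            "_ingested_at"))
```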
### Gold Layer — Curated Analytics

Gold tables are optimised for low-latency analytical queries:

- Partitioned by `interval_date` (cast from the dispatch timestamp)
- Auto-optimise and auto-compaction enabled (`delta.autoOptimize.optimizeWrite = true`)
- Change Data Feed enabled on tables used by the AI agent
- `COMMENT` annotations on every column for the Genie semantic layer
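The table settings above can be expressed as Delta SQL DDL. The helper below is hypothetical, but the statement shapes follow standard Databricks `ALTER TABLE` syntax:

```python
def gold_table_ddl(table: str, column_comments: dict[str, str]) -> list[str]:
    """Generate DDL for a Gold table: enable write optimisation,
    auto-compaction, and Change Data Feed, then attach a COMMENT to
    each column for the Genie semantic layer."""
    stmts = [
        f"ALTER TABLE {table} SET TBLPROPERTIES ("
        "'delta.autoOptimize.optimizeWrite' = 'true', "
        "'delta.autoOptimize.autoCompact' = 'true', "
        "'delta.enableChangeDataFeed' = 'true')"
    ]
    for col, comment in column_comments.items():
        stmts.append(
            f"ALTER TABLE {table} ALTER COLUMN {col} COMMENT '{comment}'"
        )
    return stmts
```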
### Unity Catalog Structure

```
energy_copilot_catalog    (or energy_copilot in prod)
├── bronze/   # 11+ raw tables
├── silver/   # 13+ validated tables
├── gold/     # 84+ curated tables
├── ml/       # Feature store + model outputs
└── tools/    # 14 UC SQL functions
```

## Serving Layer
### Lakebase (Primary Path — 10–38ms)

Seven high-traffic Gold tables are continuously synced to a managed Postgres instance via Databricks Synced Tables. The FastAPI backend queries Postgres directly via psycopg3, bypassing the SQL Warehouse entirely for these hot-path tables:
- `gold.nem_prices_5min_dedup_synced`
- `gold.nem_interconnectors_dedup_synced`
- `gold.nem_generation_by_fuel_synced`
- `gold.dashboard_snapshots_synced`
- `gold.asx_futures_eod_synced`
- `gold.emissions_factors_synced`
- `gold.gas_hub_prices_synced`
### SQL Warehouse (Fallback — 400–1000ms)

If Lakebase is unavailable or its data is stale (older than 30 minutes), the app falls back to `databricks-sql-connector`. Every API response includes `X-Data-Source` and `X-Query-Ms` headers for observability.
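The Lakebase-first, warehouse-fallback behaviour can be sketched as a small routing function. The injected callables stand in for the psycopg3 and `databricks-sql-connector` clients; the function name and return shape are assumptions for illustration:

```python
import time
from typing import Callable


def tiered_fetch(
    query: str,
    lakebase: Callable[[str], list],
    warehouse: Callable[[str], list],
    is_stale: Callable[[], bool] = lambda: False,
) -> tuple[list, str, float]:
    """Try Lakebase first; fall back to the SQL Warehouse if Lakebase
    errors or its synced data is stale. Returns (rows, source, elapsed_ms)
    so the caller can populate the X-Data-Source / X-Query-Ms headers."""
    start = time.perf_counter()
    try:
        if is_stale():
            raise RuntimeError("lakebase sync is stale")
        rows, source = lakebase(query), "lakebase"
    except Exception:
        rows, source = warehouse(query), "sql_warehouse"
    return rows, source, (time.perf_counter() - start) * 1000.0
```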
### Dashboard Snapshots (<10ms)

`job_10_dashboard_snapshots` runs every 5 minutes, pre-computing JSON payloads for 15 high-traffic dashboard endpoints and storing them in `gold.dashboard_snapshots`. These are synced to Lakebase and served as pre-built responses — zero SQL at request time.
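A minimal sketch of the snapshot pattern, with hypothetical column and function names: the job serialises each payload once, so the request path is a key lookup that returns the stored JSON string unchanged:

```python
import json
from datetime import datetime, timezone
from typing import Optional


def build_snapshot(endpoint: str, payload: dict) -> dict:
    """Shape of one pre-computed row (illustrative columns): the payload
    is serialised to JSON at job time, not at request time."""
    return {
        "endpoint": endpoint,
        "payload_json": json.dumps(payload),
        "generated_at": datetime.now(timezone.utc).isoformat(),
    }


def serve_snapshot(snapshots: dict[str, dict], endpoint: str) -> Optional[str]:
    """Zero-SQL request path: return the stored JSON body as-is, or None
    on a miss (the caller would then fall through to a live query)."""
    row = snapshots.get(endpoint)
    return row["payload_json"] if row else None
```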
## Application Layer

### FastAPI Backend

The backend is structured as a FastAPI application with 64 modular routers, each responsible for a domain:
```python
# main.py — router registration (excerpt)
app.include_router(prices_router, prefix="/api/prices")
app.include_router(generation_router, prefix="/api/generation")
app.include_router(fcas_router, prefix="/api/fcas")
app.include_router(dnsp_router, prefix="/api/dnsp")
app.include_router(chat_router, prefix="/api/chat")
# ... 59 more routers
```

Key features:
- Lifespan health check: validates Databricks + Lakebase connectivity at startup
- Structured logging: per-request JSON logs with `request_id`, `path`, `status_code`, `duration_ms`
- TTL cache: in-memory cache with configurable TTLs (10s for prices, 30s for generation, 60s for forecasts)
- Tiered data access: Lakebase → SQL Warehouse → pre-built snapshots
- SSE streaming: the `/api/chat` endpoint streams typed events (`text`, `tool_call`, `tool_result`, `done`, `error`) for real-time copilot responses
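The typed event stream can be illustrated with a small frame formatter. The SSE wire format (`event:`/`data:` lines, blank-line terminator) is standard; the helper itself is an illustrative sketch, not the project's code:

```python
import json

# Event types mirror the list above.
ALLOWED_EVENTS = {"text", "tool_call", "tool_result", "done", "error"}


def sse_event(event: str, data: dict) -> str:
    """Format one typed Server-Sent Event frame: an `event:` line naming
    the type, a `data:` line with the JSON payload, and a blank-line
    terminator so the browser's EventSource can split frames."""
    if event not in ALLOWED_EVENTS:
        raise ValueError(f"unknown event type: {event}")
    return f"event: {event}\ndata: {json.dumps(data)}\n\n"


# A FastAPI handler would yield these frames from an async generator
# wrapped in StreamingResponse(..., media_type="text/event-stream").
```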
### React Frontend

The frontend is built with Vite and served as static files from the Databricks App:
```
src/
├── pages/        # 564 page components
├── components/   # Shared UI components
├── hooks/        # useMarketData, useWebSocket, etc.
├── api/          # client.ts — typed API client
└── App.tsx       # React Router v6 route definitions
```

## AI & ML Architecture
Section titled “AI & ML Architecture”Claude Sonnet 4.5 via FMAPI
Section titled “Claude Sonnet 4.5 via FMAPI”The copilot uses Claude Sonnet 4.5 routed through the Databricks Foundation Model API (FMAPI). This keeps all API calls within the Databricks network perimeter — no external Anthropic API calls:
```python
client = anthropic.Anthropic(
    base_url=f"{DATABRICKS_HOST}/serving-endpoints",
    api_key=DATABRICKS_TOKEN,
)
response = client.messages.create(
    model="databricks-claude-sonnet-4-5",
    tools=ALL_51_TOOLS,
    messages=conversation_history,
)
```

The agentic loop runs up to 5 tool-call rounds per request, streaming intermediate results via SSE.
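The bounded loop might look roughly like this; `call_model` and `run_tool` are injected stand-ins for the FMAPI client and the 51 registered tools, and the message shapes are simplified assumptions rather than the actual schema:

```python
from typing import Callable


def agent_loop(
    call_model: Callable[[list], dict],
    run_tool: Callable[[dict], dict],
    messages: list,
    max_rounds: int = 5,
) -> list:
    """Minimal shape of the bounded tool-call loop: call the model,
    execute any requested tools, feed results back, and stop on a
    plain-text reply or after max_rounds."""
    for _ in range(max_rounds):
        reply = call_model(messages)
        messages.append({"role": "assistant", "content": reply})
        tool_calls = reply.get("tool_calls", [])
        if not tool_calls:
            break  # model answered in plain text -- we're done
        for call in tool_calls:
            # Each tool result goes back into the conversation so the
            # model can use it in the next round.
            messages.append({"role": "tool", "content": run_tool(call)})
    return messages
```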
### MLflow Model Registry

Five ML experiments are tracked in MLflow:
- `price_forecast` — LightGBM multi-horizon (5 regions, 6 horizons)
- `demand_forecast` — LightGBM demand forecasting
- `wind_forecast` — LightGBM wind generation
- `solar_forecast` — LightGBM solar generation (night-excluded)
- `anomaly_detection` — Isolation Forest + Z-score ensemble
Models are registered with the alias `production` and loaded via the `models:/<name>@production` URI in inference pipelines.
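Alias-based loading can be sketched as follows; the URI format is MLflow's standard alias syntax, while the helper itself is illustrative:

```python
def model_uri(name: str, alias: str = "production") -> str:
    """Build the alias-based MLflow model URI used at inference time.
    Aliases decouple pipelines from version numbers: promoting a new
    version just moves the alias, with no pipeline code change."""
    return f"models:/{name}@{alias}"


# At load time (requires mlflow; shown for context only):
# import mlflow
# model = mlflow.pyfunc.load_model(model_uri("price_forecast"))
```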
### Genie AI/BI Spaces

12 Genie spaces provide natural language SQL interfaces over Gold tables:
| Space | Primary Tables |
|---|---|
| NEM Prices & Demand | gold.nem_prices_5min, gold.weather_nem_regions |
| Generation & Fuel Mix | gold.nem_generation_by_fuel |
| Interconnectors | gold.nem_interconnectors |
| FCAS Markets | gold.nem_fcas_prices |
| Settlement & Finance | gold.settlement_statements |
| DNSP Compliance | gold.dnsp_aio_metrics, gold.dnsp_stpis_metrics |
| Asset Intelligence | gold.dnsp_asset_register |
| Vegetation Risk | gold.dnsp_vegetation_risk |
| Workforce Analytics | gold.dnsp_workforce_metrics |
| Environmental & LGCs | gold.lgc_registry, gold.emissions_factors |
| Forward Curves | gold.asx_futures_eod |
| Gas Markets | gold.gas_hub_prices |
## Deployment Architecture

The entire platform is managed via Databricks Asset Bundles (DAB):
```
databricks.yml            # Root config
resources/
├── app.yml               # Databricks App
├── jobs.yml              # 30 serverless jobs
├── pipelines.yml         # DLT pipelines
├── model_serving.yml     # Model Serving endpoints
└── experiments.yml       # MLflow experiments
```