Pipeline Jobs

Energy Copilot runs 30 scheduled serverless pipeline jobs in Databricks, orchestrating data ingestion from all external sources through the Bronze → Silver → Gold medallion architecture.

All jobs use serverless compute via `environment_key`: no cluster management, no spot instance configuration.

| # | Job Name | Schedule | Description | Target Tables |
|---|----------|----------|-------------|---------------|
| 00 | `job_00_setup` | One-time | Creates schemas, tables, seeds reference data, backfills 90 days of synthetic NEM data | All schemas |
| 01 | `01_nemweb_ingest` | Continuous DLT | DLT Autoloader: NEMWEB dispatch prices, SCADA, interconnectors, FCAS Bronze → Silver → Gold | `gold.nem_prices_5min`, `gold.nem_generation_by_fuel`, `gold.nem_interconnectors`, `gold.nem_fcas_prices` |
| 02 | `02_openelec_ingest` | Every 30 min | OpenElectricity API incremental ingest | `gold.openelec_generation` |
| 03 | `03_weather_ingest` | Every hour | BOM ACCESS-G weather observations + NWP forecast | `gold.weather_nem_regions` |
| 04 | `04_solar_ingest` | Every 30 min | APVI rooftop solar by state | `gold.rooftop_solar_state` |
| 05 | `05_forecast_pipeline` | Every 30 min | MLflow model inference: price, demand, wind, solar forecasts | `gold.price_forecasts`, `gold.demand_forecasts`, `gold.wind_forecasts`, `gold.solar_forecasts` |
| 06 | `06_market_summary` | Daily 05:30 AEST | Claude Sonnet 4.5 daily NEM market brief | `gold.market_briefs` |
| 07 | `07_gas_ingest` | Daily 08:00 | STTM gas hub prices (Sydney, Adelaide, Brisbane) | `gold.gas_hub_prices` |
| 08 | `08_data_quality` | Daily 06:00 | Data completeness, freshness, null rate checks | `gold.data_quality_metrics` |
| 09 | `09_simulator` | Continuous (manual) | Writes synthetic NEM market data every 30s for demo purposes | All Gold tables |
| 10 | `10_dashboard_snapshots` | Every 5 min | Pre-computes JSON responses for 15 high-traffic dashboard endpoints | `gold.dashboard_snapshots` |
| 11 | `11_grant_lakebase_perms` | One-time (post-deploy) | Grants App SP SELECT on Lakebase gold schema | N/A |
| 12 | `12_recreate_synced_tables` | One-time (post-deploy) | Creates continuous Delta → Lakebase synced tables | Lakebase gold schema |
| 13 | `13_nemweb_bronze_to_gold` | Every 5 min | Fast path: NEMWEB CSV directly to Gold (bypasses Silver for latency) | `gold.nem_prices_5min`, `gold.nem_generation_by_fuel` |
| 14 | `14_cer_lgc_weekly` | Weekly Sunday 02:00 | CER LGC registry weekly update | `gold.lgc_registry` |
| 15 | `15_dnsp_outage_ingest` | Every 15 min | DNSP planned outage notifications | `gold.dnsp_outages` |
| 16 | `16_aer_reliability_ingest` | Annual (Jan) | AER published DNSP reliability data (SAIDI/SAIFI actuals) | `gold.dnsp_stpis_actuals` |
| 17 | `17_aer_enforcement_ingest` | Weekly Monday 03:00 | AER enforcement register scrape | `gold.aer_enforcement` |
| 18 | `18_aer_compliance_calendar` | Monthly 1st | AER compliance calendar update | `gold.compliance_calendar` |
| 19 | `19_esv_incidents_ingest` | Quarterly | ESV electrical incident register | `gold.esv_incidents` |
| 20 | `20_esv_regulations_ingest` | Annual | ESV regulatory requirements library | `gold.esv_regulations` |
| 21 | `21_aemo_procedures_ingest` | Monthly | AEMO market procedures library | `gold.aemo_procedures` |
| 22 | `22_safeguard_mechanism_ingest` | Annual (Nov) | CER Safeguard Mechanism data | `gold.safeguard_mechanism` |
| 23 | `23_aemc_rule_changes_ingest` | Weekly Monday 04:00 | AEMC rule change register | `gold.aemc_rule_changes` |
| 24 | `24_btm_der_ingest` | Monthly | Behind-the-meter DER connection data | `gold.dnsp_hosting_capacity`, `gold.der_connections` |
| 25 | `25_rab_determinations_ingest` | Quarterly | AER RAB determination data | `gold.dnsp_rab_schedule` |
| 26 | `26_consumer_protection_ingest` | Monthly | AER consumer protection data | `gold.consumer_protection` |
| 27 | `27_tnsp_tuos_ingest` | Monthly | TNSP transmission use of system charges | `gold.tuos_charges` |
| 28 | `28_rit_register_ingest` | Quarterly | AER Regulatory Investment Test register | `gold.rit_register` |
| 29 | `29_offshore_wind_ingest` | Monthly | Offshore wind project pipeline from DCCEEW | `gold.offshore_wind_projects` |
| — | `nemweb_downloader` | Utility | Polling utility with exponential backoff for NEMWEB file downloads | Used by jobs 01 and 13 |

Job run health is tracked in the gold.data_quality_metrics table and displayed on the Data Quality report:

```
# Check all pipeline statuses
GET /api/reporting/pipeline-runs?days=1
```

The response shows the last run status for each pipeline:

```json
{
  "pipelines": [
    {
      "pipeline_name": "13_nemweb_bronze_to_gold",
      "last_run": "2025-03-21T05:35:00Z",
      "status": "SUCCESS",
      "duration_seconds": 45,
      "rows_written": 15
    },
    ...
  ]
}
```
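As a sketch of how such a response might be consumed for alerting, the helper below flags pipelines whose last successful run is older than a freshness threshold. The `stale_pipelines` function is hypothetical (not part of the actual reporting code); the field names follow the response shape above:

```python
from datetime import datetime, timedelta

def stale_pipelines(runs: list[dict], max_age_minutes: int, now: datetime) -> list[str]:
    """Return names of pipelines that failed or have not run recently enough."""
    cutoff = now - timedelta(minutes=max_age_minutes)
    stale = []
    for run in runs:
        # Replace the trailing "Z" so fromisoformat parses it on older Pythons
        last_run = datetime.fromisoformat(run["last_run"].replace("Z", "+00:00"))
        if run["status"] != "SUCCESS" or last_run < cutoff:
            stale.append(run["pipeline_name"])
    return stale
```

A job like `13_nemweb_bronze_to_gold` (every 5 minutes) would warrant a much tighter threshold than a daily or weekly job, so in practice the threshold would be looked up per pipeline.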

All pipelines implement exponential backoff retry logic for transient failures:

```python
# nemweb_downloader.py — exponential backoff
import random
import time

import requests


def download_with_retry(url: str, max_retries: int = 5) -> bytes:
    """Download a file, retrying transient failures with jittered backoff."""
    for attempt in range(max_retries):
        try:
            response = requests.get(url, timeout=30)
            response.raise_for_status()  # raises HTTPError on 429/503
            return response.content
        except requests.RequestException:  # covers HTTP errors, timeouts, connection resets
            if attempt == max_retries - 1:
                raise  # out of retries; surface the last error
            # Exponential backoff with jitter: ~1s, ~2s, ~4s, ~8s, ...
            wait = (2 ** attempt) + random.uniform(0, 1)
            time.sleep(wait)
```

For NEMWEB specifically, backoff is important because AEMO’s portal can return 429 (rate limit) or 503 (maintenance) responses.

To add a new ingestion pipeline:

1. Create `pipelines/NN_source_name_ingest.py`
2. Add Bronze and Gold table DDL to `setup/02_create_tables.sql`
3. Add the job definition to `resources/jobs.yml`
4. Run `databricks bundle deploy` to create the job
5. Trigger a manual backfill run to populate historical data
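For step 3, a serverless job definition in `resources/jobs.yml` might look like the following sketch. The job name, notebook path, environment name, schedule, and timezone are all illustrative; the `environment_key` and Quartz cron syntax follow Databricks Asset Bundles conventions:

```yaml
resources:
  jobs:
    job_30_example_ingest:            # hypothetical new pipeline
      name: 30_example_ingest
      tasks:
        - task_key: ingest
          notebook_task:
            notebook_path: ../pipelines/30_example_ingest.py
          environment_key: serverless_env   # serverless compute, no cluster config
      environments:
        - environment_key: serverless_env
          spec:
            client: "1"
      schedule:
        quartz_cron_expression: "0 */30 * * * ?"   # every 30 minutes
        timezone_id: Australia/Brisbane
```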