
Vegetation Risk ML

The Vegetation Risk ML model is an XGBoost multi-class classifier that assigns each network span (the section of overhead conductor between two poles) to one of four risk categories: Low, Medium, High, or Critical. It runs weekly and produces a prioritised inspection schedule that helps distribution network service providers (DNSPs) allocate vegetation management resources efficiently.

| Metric | Value |
| --- | --- |
| Accuracy | 88.7% |
| F1-macro | 86.3% |
| F1 (Critical class) | 84.1% |
| Recall (Critical class) | 89.3% |
| AUC (one-vs-rest, Critical) | 0.923 |
| False negative rate (Critical) | 10.7% |

The model is tuned for high recall on the Critical class: flagging some false positives is preferable to missing a genuinely critical span.
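The Critical-class recall and false negative rate reported above are complements (89.3% + 10.7% = 100%). The sketch below shows how both can be computed for one class treated one-vs-rest. The function name `recall_and_fnr` and the sample labels are illustrative assumptions, not part of the platform.

```python
from collections import Counter


def recall_and_fnr(y_true, y_pred, positive="Critical"):
    """Return (recall, false_negative_rate) for one class, one-vs-rest."""
    # Count (is_truly_positive, is_predicted_positive) pairs.
    counts = Counter((t == positive, p == positive) for t, p in zip(y_true, y_pred))
    tp = counts[(True, True)]
    fn = counts[(True, False)]
    if tp + fn == 0:
        raise ValueError(f"no true {positive!r} samples")
    recall = tp / (tp + fn)
    return recall, 1.0 - recall


# Illustrative labels only, not real model output.
y_true = ["Critical", "Critical", "Critical", "Critical", "High", "Low"]
y_pred = ["Critical", "Critical", "Critical", "High", "Critical", "Low"]

recall, fnr = recall_and_fnr(y_true, y_pred)
print(f"recall={recall:.2f} fnr={fnr:.2f}")
```

In this toy data one of the four truly Critical spans is missed, and one High span is flagged as Critical. The second error is the kind of false positive the tuning accepts, and it does not affect recall.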

| Feature | Source Data | Engineering |
| --- | --- | --- |
| fire_history_score | AFAC historical bushfire perimeter data | Weighted count of fires within 1 km in past 20 years |
| inspection_age_days | DNSP inspection management system | CURRENT_DATE - last_inspection_date |
| clearance_age_days | DNSP vegetation management system | CURRENT_DATE - last_clearance_date |
| vegetation_growth_rate | Species database + climate zone | Growth rate estimate (m/year) |
| span_length_m | GIS network model | Direct measurement |
| conductor_height_m | GIS network model | Average height above ground |
| bmo_zone_flag | State government BMO mapping | Binary: 0 or 1 |
| last_clearance_distance_m | Inspection reports | Last measured clearance (m) |
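The two age features are plain date differences. A minimal sketch, assuming the last inspection and clearance dates are available as Python `date` objects; the helper name `age_in_days` and the example dates are hypothetical.

```python
from datetime import date


def age_in_days(last_event: date, today: date) -> int:
    """Whole days elapsed between last_event and today."""
    if last_event > today:
        raise ValueError("last_event is in the future")
    return (today - last_event).days


# Hypothetical dates for a single span.
today = date(2024, 7, 1)
features = {
    "inspection_age_days": age_in_days(date(2024, 1, 1), today),
    "clearance_age_days": age_in_days(date(2023, 7, 1), today),
}
print(features)
```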

Additional features computed during training but not used in production (too expensive to compute at inference time):

  • estimated_current_clearance_m: extrapolation from last clearance + growth rate × time
  • vegetation_type_index: species composition score from remote sensing
  • slope_aspect_index: fire spread risk based on slope and aspect
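
The extrapolation behind estimated_current_clearance_m is not spelled out above. One plausible reading, shown here purely as an assumption, is that vegetation grows towards the conductor at a constant rate, so the clearance shrinks linearly from the last measured value and cannot go below zero. The parameter `days_since_measurement` is a hypothetical name.

```python
def estimated_current_clearance_m(
    last_clearance_distance_m: float,
    vegetation_growth_rate: float,  # m/year
    days_since_measurement: int,
) -> float:
    """Linear extrapolation of clearance (assumed model, floored at 0 m)."""
    years = days_since_measurement / 365.25
    return max(0.0, last_clearance_distance_m - vegetation_growth_rate * years)


# 3.0 m measured two years ago, vegetation growing 1.0 m/year.
estimate = estimated_current_clearance_m(3.0, 1.0, 731)
print(round(estimate, 2))
```
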
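
# Vegetation risk model training
import xgboost as xgb
from sklearn.preprocessing import LabelEncoder
import mlflow

RISK_CLASSES = ['Low', 'Medium', 'High', 'Critical']

# Class weights to prioritise Critical recall.
# Keys are integer labels, assumed to follow RISK_CLASSES order (0 = Low ... 3 = Critical).
# Note that LabelEncoder sorts string labels alphabetically, which does not give this order.
class_weights = {0: 1.0, 1: 1.5, 2: 2.5, 3: 4.0}  # Critical gets 4× weight

# X_train and y_train are prepared upstream; y_train holds the integer labels above.
sample_weight = [class_weights[y] for y in y_train]

with mlflow.start_run(run_name="vegetation_risk_v2.1"):
    model = xgb.XGBClassifier(
        objective='multi:softprob',
        num_class=4,
        n_estimators=500,
        max_depth=5,
        learning_rate=0.03,
        subsample=0.8,
        colsample_bytree=0.8,
    )
    # Per-row weights are an argument to fit(), not to the constructor.
    model.fit(X_train, y_train, sample_weight=sample_weight)

    mlflow.log_metrics({
        "accuracy": 0.887,
        "f1_macro": 0.863,
        "recall_critical": 0.893,
        "auc_critical": 0.923
    })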
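
To make the effect of the class weights concrete, the sketch below expands them into per-row weights and compares each class's share of the total training weight with its raw share of rows. The label counts are invented for illustration, and labels are assumed to be integers 0 to 3 in Low to Critical order.

```python
class_weights = {0: 1.0, 1: 1.5, 2: 2.5, 3: 4.0}

# Illustrative label distribution: 6 Low, 2 Medium, 1 High, 1 Critical.
y_train = [0] * 6 + [1] * 2 + [2] * 1 + [3] * 1

# One weight per training row, looked up from the row's class.
sample_weight = [class_weights[y] for y in y_train]

total = sum(sample_weight)
share = {
    c: sum(w for y, w in zip(y_train, sample_weight) if y == c) / total
    for c in class_weights
}

raw_critical = y_train.count(3) / len(y_train)
print(f"Critical: {raw_critical:.0%} of rows, {share[3]:.0%} of total weight")
```

With these toy counts the single Critical row is 10% of the data but carries roughly a quarter of the total weight, which is how the weighting pushes the model towards higher Critical recall.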

Spans within Bushfire Mitigation Obligation (BMO) zones receive elevated risk scores due to stricter clearance requirements and higher consequence of non-compliance:

def apply_bmo_adjustment(risk_score: float, bmo_flag: bool) -> float:
    """
    Bump risk probability for BMO-zone spans.
    A Medium score in a BMO zone becomes equivalent to a High score.
    """
    if bmo_flag:
        return min(1.0, risk_score * 1.35)
    return risk_score

This adjustment ensures that inspections of BMO-zone spans are given higher priority even when the base model score is moderate.
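
As a worked example of the adjustment, the sketch below applies it to three made-up spans and re-ranks them. The span records and the `adjusted_score` field are illustrative assumptions; the function body repeats the one shown above so the block runs on its own.

```python
def apply_bmo_adjustment(risk_score: float, bmo_flag: bool) -> float:
    """Scale the score by 1.35 inside BMO zones, capped at 1.0."""
    if bmo_flag:
        return min(1.0, risk_score * 1.35)
    return risk_score


# Illustrative spans, not real data.
spans = [
    {"span_id": "A", "risk_score": 0.50, "bmo_zone_flag": 1},
    {"span_id": "B", "risk_score": 0.60, "bmo_zone_flag": 0},
    {"span_id": "C", "risk_score": 0.80, "bmo_zone_flag": 1},
]

for span in spans:
    span["adjusted_score"] = apply_bmo_adjustment(
        span["risk_score"], bool(span["bmo_zone_flag"])
    )

# Highest adjusted score first.
ranked = sorted(spans, key=lambda s: s["adjusted_score"], reverse=True)
print([(s["span_id"], round(s["adjusted_score"], 3)) for s in ranked])
```

Span A (0.50 × 1.35 = 0.675) overtakes the non-BMO span B at 0.60, and span C hits the 1.0 cap because 0.80 × 1.35 exceeds it.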

After each weekly model run, the platform computes the diff against the previous week’s scores:

-- Newly flagged high-risk spans (moved from Medium/Low to High/Critical)
-- Assumes the span attributes (feeder_id, suburb, bmo_zone_flag) are columns of the gold table
SELECT
    curr.span_id,
    curr.feeder_id,
    curr.suburb,
    curr.bmo_zone_flag,
    prev.risk_class AS previous_risk_class,
    curr.risk_class AS current_risk_class,
    curr.risk_score AS current_risk_score,
    curr.top_driver AS primary_risk_factor
FROM energy_copilot.gold.dnsp_vegetation_risk curr
JOIN energy_copilot.gold.dnsp_vegetation_risk prev
    ON curr.span_id = prev.span_id
    AND prev.model_run_date = curr.model_run_date - INTERVAL '7 DAYS'
WHERE curr.model_run_date = CURRENT_DATE()
    AND curr.risk_class IN ('High', 'Critical')
    AND prev.risk_class IN ('Low', 'Medium')
ORDER BY curr.risk_score DESC;
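
The same week-over-week comparison can be sketched in Python over two in-memory lists of score rows, which may help when testing the alert logic outside the warehouse. The function name `newly_flagged` and the row dictionaries are assumptions for illustration; only the class names and the join-and-filter logic come from the query above.

```python
ELEVATED = {"High", "Critical"}
LOWER = {"Low", "Medium"}


def newly_flagged(prev_rows, curr_rows):
    """Spans that moved from Low/Medium last week to High/Critical this week."""
    prev_by_span = {row["span_id"]: row for row in prev_rows}
    flagged = []
    for row in curr_rows:
        prev = prev_by_span.get(row["span_id"])
        if prev is None:
            continue  # mirrors the inner join: no prior score, no alert
        if row["risk_class"] in ELEVATED and prev["risk_class"] in LOWER:
            flagged.append({**row, "previous_risk_class": prev["risk_class"]})
    # Highest current score first, like ORDER BY ... DESC.
    return sorted(flagged, key=lambda r: r["risk_score"], reverse=True)


# Illustrative rows only.
prev_week = [
    {"span_id": "S1", "risk_class": "Medium", "risk_score": 0.41},
    {"span_id": "S2", "risk_class": "High", "risk_score": 0.66},
    {"span_id": "S3", "risk_class": "Low", "risk_score": 0.12},
]
this_week = [
    {"span_id": "S1", "risk_class": "High", "risk_score": 0.71},
    {"span_id": "S2", "risk_class": "Critical", "risk_score": 0.90},
    {"span_id": "S3", "risk_class": "Critical", "risk_score": 0.83},
    {"span_id": "S4", "risk_class": "High", "risk_score": 0.95},
]

alerts = newly_flagged(prev_week, this_week)
print([(a["span_id"], a["previous_risk_class"]) for a in alerts])
```

S2 is excluded because it was already High last week, and S4 is excluded because it has no previous score to compare against.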

Newly flagged spans are surfaced as alerts in:

  • The Vegetation Risk dashboard alert feed
  • Email notification to the vegetation management team (configurable)
  • AI Copilot get_vegetation_risk_scores tool

The model outputs class probabilities for all four risk levels. The predicted class is the one with the highest probability, and the confidence score is that probability:

{
  "span_id": "SA-NW-00847-A",
  "risk_class": "High",
  "confidence": 0.73,
  "class_probabilities": {
    "Low": 0.04,
    "Medium": 0.13,
    "High": 0.73,
    "Critical": 0.10
  },
  "top_risk_factors": [
    {"feature": "inspection_age_days", "contribution": 0.38},
    {"feature": "fire_history_score", "contribution": 0.29},
    {"feature": "bmo_zone_flag", "contribution": 0.18}
  ]
}
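
The rule for deriving risk_class and confidence from class_probabilities is an argmax. A minimal sketch using the probabilities from the payload above; the function name `summarise_prediction` is hypothetical.

```python
def summarise_prediction(class_probabilities: dict) -> tuple:
    """Return (risk_class, confidence): the most probable class and its probability."""
    risk_class = max(class_probabilities, key=class_probabilities.get)
    return risk_class, class_probabilities[risk_class]


probs = {"Low": 0.04, "Medium": 0.13, "High": 0.73, "Critical": 0.10}
risk_class, confidence = summarise_prediction(probs)
print(risk_class, confidence)
```
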
# High-risk spans with confidence scores
GET /api/dnsp/vegetation/high-risk?dnsp=ausgrid&min_score=0.6&include_confidence=true
# Span-level prediction detail
GET /api/dnsp/vegetation/span/SA-NW-00847-A
# Weekly newly-flagged alerts
GET /api/dnsp/vegetation/new-alerts?dnsp=ergon
# Model performance metrics (latest run)
GET /api/dnsp/vegetation/model-performance
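
A small sketch of how a client might assemble these request URLs with the Python standard library. The host in `BASE_URL` is a placeholder assumption, since the real base URL and any authentication are not described here; only the paths and query parameters come from the list above. No request is sent.

```python
from urllib.parse import quote, urlencode

BASE_URL = "https://example.invalid"  # placeholder host, not the real service


def build_url(path: str, **params) -> str:
    """Join the base URL, a path, and optional query parameters."""
    query = urlencode(params)
    return f"{BASE_URL}{path}" + (f"?{query}" if query else "")


high_risk_url = build_url(
    "/api/dnsp/vegetation/high-risk",
    dnsp="ausgrid",
    min_score=0.6,
    include_confidence="true",
)
# Percent-encode the span ID in case it contains reserved characters.
span_url = build_url("/api/dnsp/vegetation/span/" + quote("SA-NW-00847-A", safe=""))

print(high_risk_url)
print(span_url)
```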