
Asset Failure Prediction

The Asset Failure Prediction model is an XGBoost binary classifier that scores every DNSP distribution network asset on its probability of failing within the next 12 months. It runs as a scheduled job (monthly) and as a real-time Model Serving endpoint for on-demand scoring.

| Metric | Value |
|---|---|
| Accuracy | 92.3% |
| AUC (ROC) | 0.961 |
| Precision (High risk) | 88.4% |
| Recall (High risk) | 91.2% |
| F1 (High risk) | 89.8% |
| False positive rate | 8.1% |

The high AUC (0.961) reflects strong discriminative power: the model reliably separates high-risk assets from low-risk ones. The 8.1% false positive rate is acceptable for this use case because missing a failure costs more than inspecting an asset that turns out to be healthy.
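As a quick consistency check on the table above, F1 is the harmonic mean of precision and recall. The sketch below recomputes it from the reported High-risk precision and recall. The function and variable names are illustrative and are not taken from the project code.

```python
def f1_score(precision: float, recall: float) -> float:
    """Return the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values copied from the metrics table (High risk class)
precision_high = 0.884
recall_high = 0.912

f1_high = f1_score(precision_high, recall_high)
print(f"F1 (High risk): {f1_high:.3f}")  # prints: F1 (High risk): 0.898
```

The result rounds to 89.8%, which matches the reported F1.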

The model uses 7 features:

| Feature | Type | Importance Rank | Description |
|---|---|---|---|
| age_years | Float | 1 (highest) | Asset age from commissioning date |
| health_index | Float | 2 | Composite health score (0–100) |
| fault_count_5yr | Integer | 3 | Number of faults in past 5 years |
| peak_load_ratio | Float | 4 | Peak load / thermal rating (0–1+) |
| days_since_maintenance | Integer | 5 | Days since last inspection or test |
| insulation_condition_score | Float | 6 | Insulation condition (0–100) |
| weather_exposure_index | Float | 7 (lowest) | Climate and geographic exposure score |

Feature importance is computed via SHAP values and displayed per-asset in the Asset Intelligence Hub UI.
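One plausible way to turn per-asset SHAP values into a short list of risk drivers is to rank features by the absolute size of their contribution. The sketch below assumes the SHAP values are already available as a feature-to-value mapping. The helper name `top_risk_factors`, the mapping layout, and most of the numbers are illustrative, not the project's actual implementation.

```python
from typing import Dict, List


def top_risk_factors(shap_values: Dict[str, float], k: int = 3) -> List[dict]:
    """Return the k features with the largest absolute SHAP contribution."""
    ranked = sorted(shap_values.items(), key=lambda item: abs(item[1]), reverse=True)
    return [{"feature": name, "shap_value": value} for name, value in ranked[:k]]


# Illustrative per-asset SHAP values (hypothetical numbers, not model output)
example_shap = {
    "age_years": 0.21,
    "health_index": 0.42,
    "fault_count_5yr": 0.05,
    "peak_load_ratio": 0.04,
    "days_since_maintenance": 0.28,
    "insulation_condition_score": 0.02,
    "weather_exposure_index": -0.03,  # negative values lower the predicted risk
}

for factor in top_risk_factors(example_shap):
    print(factor["feature"], factor["shap_value"])
```

Ranking by absolute value keeps strongly risk-reducing features visible as well. A UI that should show only risk-increasing drivers would filter to positive values first.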

Training data: 3 years of asset failure records across 6 Australian DNSPs (de-identified).

# models/ -- XGBoost asset failure model training excerpt
import mlflow
import xgboost as xgb
from mlflow.tracking import MlflowClient
from sklearn.model_selection import train_test_split

FEATURES = [
    'age_years', 'health_index', 'fault_count_5yr',
    'peak_load_ratio', 'days_since_maintenance',
    'insulation_condition_score', 'weather_exposure_index'
]

# X_train, y_train, X_val and y_val are prepared upstream (for example with
# train_test_split over the FEATURES columns); that step is not shown here.

with mlflow.start_run(run_name="asset_failure_v3.2"):
    model = xgb.XGBClassifier(
        n_estimators=300,
        max_depth=6,
        learning_rate=0.05,
        scale_pos_weight=4.0,  # Handle class imbalance (failures are rare)
        eval_metric='auc',
        early_stopping_rounds=20,  # Set here; fit() no longer accepts it in XGBoost 2.0+
    )
    model.fit(X_train, y_train, eval_set=[(X_val, y_val)])
    # Metric values as reported for this model version
    mlflow.log_metrics({
        "auc": 0.961,
        "accuracy": 0.923,
        "precision_high": 0.884,
        "recall_high": 0.912,
    })
    mlflow.xgboost.log_model(model, "model")

# Register with production alias
registered = mlflow.register_model("runs:/.../model", "asset_failure_predictor")
client = MlflowClient()
client.set_registered_model_alias("asset_failure_predictor", "production", registered.version)
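
The training excerpt uses scale_pos_weight=4.0 to compensate for rare failures. The source does not say how that value was chosen. A common heuristic from the XGBoost documentation is the ratio of negative to positive examples, sketched below with a hypothetical label vector. The function name and the 20% failure rate are assumptions for illustration only.

```python
from typing import Sequence


def suggested_scale_pos_weight(labels: Sequence[int]) -> float:
    """Return count(negative) / count(positive) for binary 0/1 labels."""
    positives = sum(1 for y in labels if y == 1)
    negatives = len(labels) - positives
    if positives == 0:
        raise ValueError("labels contain no positive (failure) examples")
    return negatives / positives


# Hypothetical training labels: 200 failures among 1,000 assets
example_labels = [1] * 200 + [0] * 800

weight = suggested_scale_pos_weight(example_labels)
print(weight)  # prints: 4.0
```

Under this heuristic a value of 4.0 corresponds to four non-failures per failure in the training set. In practice the value is often tuned against validation metrics instead of being taken directly from the class ratio.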

The model is registered as asset_failure_predictor with the production alias in Unity Catalog:

energy_copilot_catalog.ml.asset_failure_predictor (production alias → v3)

The inference pipeline loads via:

model = mlflow.xgboost.load_model("models:/asset_failure_predictor@production")

A Model Serving endpoint (asset-failure-predictor) provides real-time scoring:

POST /serving-endpoints/asset-failure-predictor/invocations
Content-Type: application/json
Authorization: Bearer <DATABRICKS_TOKEN>

{
  "dataframe_records": [
    {
      "age_years": 45,
      "health_index": 28,
      "fault_count_5yr": 3,
      "peak_load_ratio": 0.87,
      "days_since_maintenance": 820,
      "insulation_condition_score": 31,
      "weather_exposure_index": 0.72
    }
  ]
}

# Response:
{
  "predictions": [
    {
      "failure_probability_12m": 0.834,
      "risk_class": "High",
      "top_risk_factors": [
        {"feature": "health_index", "shap_value": 0.42},
        {"feature": "days_since_maintenance", "shap_value": 0.28},
        {"feature": "age_years", "shap_value": 0.21}
      ]
    }
  ]
}
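
The same request can be assembled from Python. The following is a minimal sketch using only the standard library. The workspace host is a placeholder, `build_scoring_request` is a hypothetical helper, and the network call is left commented out so the example runs without credentials.

```python
import json
import urllib.request
from typing import Dict, List

ENDPOINT_PATH = "/serving-endpoints/asset-failure-predictor/invocations"


def build_scoring_request(
    host: str, token: str, records: List[Dict[str, float]]
) -> urllib.request.Request:
    """Build (but do not send) a POST request in the dataframe_records format."""
    body = json.dumps({"dataframe_records": records}).encode("utf-8")
    return urllib.request.Request(
        url=host.rstrip("/") + ENDPOINT_PATH,
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )


request = build_scoring_request(
    host="https://example-workspace.invalid",  # placeholder host (assumption)
    token="<DATABRICKS_TOKEN>",
    records=[{
        "age_years": 45,
        "health_index": 28,
        "fault_count_5yr": 3,
        "peak_load_ratio": 0.87,
        "days_since_maintenance": 820,
        "insulation_condition_score": 31,
        "weather_exposure_index": 0.72,
    }],
)

# To send the request against a real workspace:
# with urllib.request.urlopen(request, timeout=30) as response:
#     predictions = json.load(response)["predictions"]
```

Keeping request construction separate from sending makes the payload easy to inspect and unit test before any token is involved.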

The Asset Intelligence Hub (/dnsp/asset-intelligence) integrates the model predictions:

  • Risk matrix: assets plotted by failure probability vs consequence
  • Failure probability gauge: per-asset percentage with confidence interval
  • SHAP waterfall chart: which features contributed most to this asset’s score
  • Peer comparison: how does this asset’s risk compare to similar assets?

Screenshot: Asset Intelligence Hub showing an individual asset’s failure prediction with SHAP waterfall chart explaining the key risk drivers.

| Failure Probability | Risk Class | Recommended Action |
|---|---|---|
| 0–0.10 | Low | Routine maintenance per schedule |
| 0.10–0.30 | Medium | Enhanced monitoring, accelerate next inspection |
| 0.30–0.60 | High | Prioritise for condition assessment this year |
| 0.60–0.80 | Very High | Include in next maintenance program |
| 0.80–1.00 | Critical | Urgent inspection and potential replacement |
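
The bands in this table can be applied with a simple lookup. The sketch below is one possible reading of it. The table does not say which band a boundary value such as 0.10 belongs to, so the code assumes each boundary falls into the higher band. `RISK_BANDS` and `classify_risk` are illustrative names.

```python
from typing import List, Tuple

# (exclusive upper bound, risk class, recommended action), taken from the table
RISK_BANDS: List[Tuple[float, str, str]] = [
    (0.10, "Low", "Routine maintenance per schedule"),
    (0.30, "Medium", "Enhanced monitoring, accelerate next inspection"),
    (0.60, "High", "Prioritise for condition assessment this year"),
    (0.80, "Very High", "Include in next maintenance program"),
    (1.00, "Critical", "Urgent inspection and potential replacement"),
]


def classify_risk(probability: float) -> Tuple[str, str]:
    """Map a 12-month failure probability to (risk class, recommended action)."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    for upper_bound, risk_class, action in RISK_BANDS:
        if probability < upper_bound:
            return risk_class, action
    # probability == 1.0 falls through the loop and belongs to the top band
    return RISK_BANDS[-1][1], RISK_BANDS[-1][2]


print(classify_risk(0.45))  # ('High', 'Prioritise for condition assessment this year')
print(classify_risk(0.92))  # ('Critical', 'Urgent inspection and potential replacement')
```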