Asset Failure Prediction

Overview

The Asset Failure Prediction model is an XGBoost binary classifier that scores every distribution network asset held by a DNSP (Distribution Network Service Provider) on its probability of failing within the next 12 months. It runs as a monthly scheduled job and is also exposed as a real-time Model Serving endpoint for on-demand scoring.

Model Performance

Metric	Value
Accuracy	92.3%
AUC (ROC)	0.961
Precision (High risk)	88.4%
Recall (High risk)	91.2%
F1 (High risk)	89.8%
False positive rate	8.1%

The AUC of 0.961 reflects strong discriminative power: given one asset that fails and one that does not, the model ranks the failing asset higher about 96% of the time. The 8.1% false positive rate is acceptable for this use case because missing a failure costs more than inspecting a healthy asset.
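
The F1 figure in the table follows from the reported precision and recall, since F1 is their harmonic mean. A quick check, using only the values from the table above:

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported in the Model Performance table
f1 = f1_score(0.884, 0.912)
print(f"F1 (High risk): {f1:.1%}")  # prints "F1 (High risk): 89.8%"
```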

Feature Set

The model uses 7 features:

Feature	Type	Importance Rank	Description
`age_years`	Float	1 (highest)	Asset age from commissioning date
`health_index`	Float	2	Composite health score (0–100)
`fault_count_5yr`	Integer	3	Number of faults in past 5 years
`peak_load_ratio`	Float	4	Peak load / thermal rating (0–1+)
`days_since_maintenance`	Integer	5	Days since last inspection or test
`insulation_condition_score`	Float	6	Insulation condition (0–100)
`weather_exposure_index`	Float	7 (lowest)	Climate and geographic exposure score
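
Three of these features are plain derivations from raw asset attributes. The sketch below shows how they might be computed. The function signatures, the input fields and the 365.25-day year are illustrative assumptions, not the pipeline's actual code.

```python
from datetime import date


def age_years(commissioned: date, as_of: date) -> float:
    """Asset age in years, measured from the commissioning date."""
    return (as_of - commissioned).days / 365.25  # assumed year length


def peak_load_ratio(peak_load: float, thermal_rating: float) -> float:
    """Peak load divided by thermal rating (same units); above 1 means overloaded."""
    if thermal_rating <= 0:
        raise ValueError("thermal_rating must be positive")
    return peak_load / thermal_rating


def days_since_maintenance(last_maintenance: date, as_of: date) -> int:
    """Whole days since the last inspection or test."""
    return (as_of - last_maintenance).days


# Hypothetical asset, scored as of 1 January 2025
as_of = date(2025, 1, 1)
features = {
    "age_years": age_years(date(1980, 1, 1), as_of),
    "peak_load_ratio": peak_load_ratio(8.7, 10.0),
    "days_since_maintenance": days_since_maintenance(date(2024, 12, 2), as_of),
}
```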

Feature importance is computed via SHAP values and displayed per-asset in the Asset Intelligence Hub UI.
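
Given the per-asset SHAP values, picking the main drivers for display is a small ranking step. The helper below is a sketch only: the name `top_risk_factors` and the choice to rank by absolute SHAP value are assumptions, since the source does not say how the UI orders features. The first three values are taken from the example endpoint response later in this page; the last two are made up for illustration.

```python
def top_risk_factors(shap_by_feature: dict, k: int = 3) -> list:
    """Return the k features with the largest absolute SHAP contribution."""
    ranked = sorted(
        shap_by_feature.items(),
        key=lambda item: abs(item[1]),  # assumed ordering: magnitude, not sign
        reverse=True,
    )
    return [{"feature": name, "shap_value": value} for name, value in ranked[:k]]


shap_values = {
    "age_years": 0.21,
    "health_index": 0.42,
    "days_since_maintenance": 0.28,
    "fault_count_5yr": -0.05,  # illustrative value
    "peak_load_ratio": 0.03,   # illustrative value
}
factors = top_risk_factors(shap_values)
```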

Model Training

Training data: 3 years of asset failure records across 6 Australian DNSPs (de-identified).

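# models/ -- XGBoost asset failure model training excerpt
import mlflow
import mlflow.xgboost
import xgboost as xgb
from mlflow.tracking import MlflowClient
from sklearn.model_selection import train_test_split

FEATURES = [
    'age_years', 'health_index', 'fault_count_5yr',
    'peak_load_ratio', 'days_since_maintenance',
    'insulation_condition_score', 'weather_exposure_index'
]

# X_train, y_train, X_val and y_val are prepared earlier in the script
# (not shown in this excerpt), e.g. with train_test_split on the FEATURES columns.

with mlflow.start_run(run_name="asset_failure_v3.2"):
    model = xgb.XGBClassifier(
        n_estimators=300,
        max_depth=6,
        learning_rate=0.05,
        scale_pos_weight=4.0,  # Handle class imbalance (failures are rare)
        eval_metric='auc',
        early_stopping_rounds=20,  # Set here: XGBoost 2.x no longer accepts it in fit()
    )
    # Training stops once validation AUC has not improved for 20 rounds
    model.fit(X_train, y_train, eval_set=[(X_val, y_val)])

    mlflow.log_metrics({
        "auc": 0.961,
        "accuracy": 0.923,
        "precision_high": 0.884,
        "recall_high": 0.912,
    })
    mlflow.xgboost.log_model(model, "model")

    # Register the logged model, then point the production alias at the new version
    # (the run ID in the URI is elided in this excerpt)
    registered = mlflow.register_model("runs:/.../model", "asset_failure_predictor")
    client = MlflowClient()
    client.set_registered_model_alias("asset_failure_predictor", "production", registered.version)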
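
The `scale_pos_weight` setting compensates for failures being the minority class. The XGBoost documentation suggests the ratio of negative to positive examples as a typical starting value. The helper below shows that calculation. The source does not say how 4.0 was chosen, so treating it as this ratio is an assumption, and the label list is synthetic.

```python
def suggested_scale_pos_weight(labels) -> float:
    """Ratio of negative (0) to positive (1) labels in a binary target."""
    positives = sum(1 for y in labels if y == 1)
    negatives = sum(1 for y in labels if y == 0)
    if positives == 0:
        raise ValueError("no positive examples in labels")
    return negatives / positives


# Synthetic example: 20 failures among 100 assets gives a weight of 4.0
y_example = [1] * 20 + [0] * 80
weight = suggested_scale_pos_weight(y_example)
```

Under that reading, a weight of 4.0 would correspond to about one failure for every four non-failures in the training set.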

MLflow Model Registry

The model is registered as asset_failure_predictor with the production alias in Unity Catalog:

energy_copilot_catalog.ml.asset_failure_predictor (production alias → v3)

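The inference pipeline loads the model by alias. Because the model is registered in Unity Catalog, the URI uses the full three-level name:

model = mlflow.xgboost.load_model("models:/energy_copilot_catalog.ml.asset_failure_predictor@production")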

Model Serving Endpoint

A Model Serving endpoint (asset-failure-predictor) provides real-time scoring:

POST /serving-endpoints/asset-failure-predictor/invocations
Content-Type: application/json
Authorization: Bearer <DATABRICKS_TOKEN>

{
  "dataframe_records": [
    {
      "age_years": 45,
      "health_index": 28,
      "fault_count_5yr": 3,
      "peak_load_ratio": 0.87,
      "days_since_maintenance": 820,
      "insulation_condition_score": 31,
      "weather_exposure_index": 0.72
    }
  ]
}

# Response:
{
  "predictions": [
    {
      "failure_probability_12m": 0.834,
      "risk_class": "Critical",
      "top_risk_factors": [
        {"feature": "health_index", "shap_value": 0.42},
        {"feature": "days_since_maintenance", "shap_value": 0.28},
        {"feature": "age_years", "shap_value": 0.21}
      ]
    }
  ]
}
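
A client for this endpoint can be sketched with the standard library alone. The example below builds the `dataframe_records` payload shown above and posts it with a bearer token. The function names and the `host` argument (the workspace URL) are assumptions for illustration, and error handling and retries are left out.

```python
import json
import urllib.request

FEATURES = [
    "age_years", "health_index", "fault_count_5yr",
    "peak_load_ratio", "days_since_maintenance",
    "insulation_condition_score", "weather_exposure_index",
]


def build_payload(assets: list) -> dict:
    """Keep only the model's features and reject records that lack any of them."""
    records = []
    for asset in assets:
        missing = [name for name in FEATURES if name not in asset]
        if missing:
            raise ValueError(f"missing features: {missing}")
        records.append({name: asset[name] for name in FEATURES})
    return {"dataframe_records": records}


def score_assets(host: str, token: str, assets: list, timeout: float = 30.0) -> list:
    """POST the assets to the serving endpoint and return the predictions list."""
    request = urllib.request.Request(
        url=f"{host}/serving-endpoints/asset-failure-predictor/invocations",
        data=json.dumps(build_payload(assets)).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)["predictions"]


# The asset from the request example above, plus a hypothetical ID field
asset = {
    "asset_id": "TX-0001",  # hypothetical; dropped by build_payload
    "age_years": 45, "health_index": 28, "fault_count_5yr": 3,
    "peak_load_ratio": 0.87, "days_since_maintenance": 820,
    "insulation_condition_score": 31, "weather_exposure_index": 0.72,
}
payload = build_payload([asset])
```

Validating the feature set on the client side means a malformed record fails before a request is sent, rather than as an endpoint error.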

UI Integration in Asset Intelligence Hub

The Asset Intelligence Hub (/dnsp/asset-intelligence) presents the model's predictions in four views:

Risk matrix: assets plotted by failure probability against consequence
Failure probability gauge: per-asset percentage with confidence interval
SHAP waterfall chart: the features that contributed most to this asset’s score
Peer comparison: this asset’s risk relative to similar assets

Screenshot: Asset Intelligence Hub showing an individual asset’s failure prediction with SHAP waterfall chart explaining the key risk drivers.

Interpreting Predictions

Failure Probability	Risk Class	Recommended Action
0–0.10	Low	Routine maintenance per schedule
0.10–0.30	Medium	Enhanced monitoring, accelerate next inspection
0.30–0.60	High	Prioritise for condition assessment this year
0.60–0.80	Very High	Include in next maintenance program
0.80–1.00	Critical	Urgent inspection and potential replacement
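
The bands above can be expressed as a simple lookup. The table lists each boundary value in two bands (0.10 appears in both Low and Medium, for example), so the sketch below assumes a boundary value belongs to the higher band. That convention and the function name are assumptions, not documented behaviour.

```python
# (exclusive upper bound, risk class), taken from the table above
RISK_BANDS = [
    (0.10, "Low"),
    (0.30, "Medium"),
    (0.60, "High"),
    (0.80, "Very High"),
]


def risk_class(failure_probability: float) -> str:
    """Map a 12-month failure probability to its risk class."""
    if not 0.0 <= failure_probability <= 1.0:
        raise ValueError("failure_probability must be between 0 and 1")
    for upper_bound, label in RISK_BANDS:
        if failure_probability < upper_bound:
            return label
    return "Critical"  # 0.80 up to and including 1.00
```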