What to Expect From a Predictive Analytics Pilot: A Utility Engineer's Guide

I've talked with utility engineers who've been burned by analytics pilots before: three months of IT meetings, a six-figure consulting engagement, and a dashboard that nobody uses because the data quality issues were never resolved. That experience creates reasonable skepticism about any vendor claiming they can produce a risk map in 14 days.

So let me be specific about what those 14 days actually involve, what your team needs to do, and where the pilot can fail. The goal of this post isn't to sell you on a pilot — it's to give you enough detail to assess whether this is a good use of your team's time before you commit.

Before day 1: what you need to prepare

The honest precondition for producing a first risk map in 14 days is that your data exists in a reasonably accessible form. Specifically:

  • GIS pipe network data: A GIS export of your distribution main geometry with asset attributes: installation year (even if partial), material, diameter, and asset ID. An ArcGIS feature class or a shapefile both work, and we can handle either the ESRI ArcGIS Utility Network or a traditional data model. Incomplete installation year data is fine; we'll flag those segments and handle them with imputation.
  • SCADA pressure historian access: Read-only access to at least 12 months of historical pressure data from your distribution system pressure loggers. AVEVA PI/OSIsoft, Wonderware Historian, and OPC-DA/UA sources are all supported. If your historian is on a network segment with restricted external access, your IT team will need to configure a data export or a read-only API connection.
  • Break event records: At least 5 years of main break event records with location (address or GPS coordinates), break date, pipe asset ID if recorded, and break type if recorded. These can come from your CMMS (Maximo, Cityworks, etc.) as a CSV export or from maintenance records. Expect matching break records to GIS asset IDs to take some work; that matching is part of the first week's data cleaning.

You'll also need to designate an internal point of contact — typically a GIS analyst or distribution engineer — who can answer questions about the data during the integration week. Expect 8–12 hours of total internal time over the first two weeks, concentrated in days 1–5.

Days 1–7: data ingestion and cleaning

The first week is almost entirely data work. Your point of contact will receive a secure data transfer link and a data dictionary with the specific fields and formats we need. Once the GIS, SCADA, and break records are received, our integration engineer works through the data quality issues that are present in virtually every utility dataset we've touched.

Common issues resolved in this phase: missing or imprecise pipe installation years (handled by street segment vintage estimation and spatial interpolation from neighboring records); break records that reference addresses rather than asset IDs (resolved by spatial join within a configurable distance tolerance); pressure logger IDs in SCADA that don't match the GIS asset IDs for the same physical sensor location (resolved by coordinate matching and manual confirmation for ambiguous cases).
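To make the spatial join concrete, here is a minimal sketch of matching a geocoded break location to the nearest pipe segment within a distance tolerance. It is an illustration, not our production code: it assumes each segment is a straight line in projected coordinates (meters), and the function names, the asset IDs, and the 15 m default tolerance are all hypothetical.

```python
import math

def point_segment_distance(px, py, ax, ay, bx, by):
    """Distance from point P to line segment AB in planar (projected) coordinates."""
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0:
        # Degenerate segment: both endpoints coincide.
        return math.hypot(px - ax, py - ay)
    # Project P onto AB and clamp to the segment's endpoints.
    t = ((px - ax) * dx + (py - ay) * dy) / seg_len_sq
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))

def match_break_to_segment(break_xy, segments, tolerance_m=15.0):
    """Return the asset ID of the nearest segment within tolerance, else None.

    tolerance_m is a hypothetical default; the right value depends on
    geocoding accuracy and how densely the mains are spaced.
    """
    best_id, best_dist = None, float("inf")
    for asset_id, (ax, ay, bx, by) in segments.items():
        d = point_segment_distance(break_xy[0], break_xy[1], ax, ay, bx, by)
        if d < best_dist:
            best_id, best_dist = asset_id, d
    return best_id if best_dist <= tolerance_m else None

# Two parallel mains 50 m apart (made-up geometry).
segments = {
    "MAIN-001": (0.0, 0.0, 100.0, 0.0),
    "MAIN-002": (0.0, 50.0, 100.0, 50.0),
}
matched = match_break_to_segment((40.0, 4.0), segments)     # 4 m from MAIN-001
unmatched = match_break_to_segment((40.0, 25.0), segments)  # 25 m from both mains
print(matched, unmatched)
```

A break that falls outside the tolerance of every segment stays unmatched rather than being forced onto the nearest main, which is why some records end up in a manual review queue.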

At the end of week one, we'll send you a data quality report: how many segments have complete attribute data, what percentage of your break records were successfully matched to GIS segments, and how much pressure historian coverage you have across your network. If there are data quality issues serious enough to affect the quality of the initial model, we'll flag them here, not at the end of the 90-day pilot.
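As a rough illustration of the three headline numbers in that report, the sketch below computes them from simple in-memory records. The field names, the set of "required" attributes, and the definition of historian coverage as the share of pressure zones with at least one logger are assumptions made for the example, not a description of the actual report.

```python
def data_quality_summary(segments, breaks, logger_zones):
    """Summarize attribute completeness, break matching, and logger coverage.

    All field names here are hypothetical placeholders.
    """
    required = ("install_year", "material", "diameter_mm")
    complete = sum(
        1 for s in segments if all(s.get(f) is not None for f in required)
    )
    matched = sum(1 for b in breaks if b.get("asset_id") is not None)
    zones = {s["pressure_zone"] for s in segments}
    covered = zones & set(logger_zones)

    def pct(part, whole):
        return round(100.0 * part / whole, 1) if whole else 0.0

    return {
        "segments_complete_pct": pct(complete, len(segments)),
        "breaks_matched_pct": pct(matched, len(breaks)),
        "zones_with_logger_pct": pct(len(covered), len(zones)),
    }

# Made-up sample data.
segments = [
    {"install_year": 1962, "material": "CI", "diameter_mm": 150, "pressure_zone": "A"},
    {"install_year": None, "material": "CI", "diameter_mm": 150, "pressure_zone": "A"},
    {"install_year": 1985, "material": "DI", "diameter_mm": 200, "pressure_zone": "B"},
    {"install_year": 2001, "material": "PVC", "diameter_mm": 200, "pressure_zone": "B"},
]
breaks = [
    {"asset_id": "MAIN-001"}, {"asset_id": "MAIN-001"}, {"asset_id": None},
    {"asset_id": "MAIN-003"}, {"asset_id": "MAIN-004"},
]
summary = data_quality_summary(segments, breaks, logger_zones=["A"])
print(summary)
```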

Days 8–14: first risk map

The initial model runs with the ingested data and produces a risk score for each pipe segment with sufficient data coverage. The risk score is a 0–100 composite that combines the pipe attribute features (age, material, soil classification, break history) with the SCADA-derived features (pressure variance, transient frequency, anomaly flags) and the spatial features (proximity to previous break events, pressure zone characteristics).
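To show what a 0–100 composite can look like, here is a deliberately simplified sketch that combines one sub-score per feature group into a single number. The weighted average, the weights, and the assumption that each sub-score is already normalized to the 0–1 range are all illustrative; the model itself is trained on your historical data and does not use fixed weights like these.

```python
# Hypothetical weights for the three feature groups named above.
WEIGHTS = {"attributes": 0.4, "scada": 0.4, "spatial": 0.2}

def composite_risk_score(subscores, weights=WEIGHTS):
    """Combine per-group sub-scores (each in 0..1) into a 0..100 score."""
    total = 0.0
    for group, weight in weights.items():
        # Clamp so a single out-of-range input cannot push the score past 0..100.
        value = min(1.0, max(0.0, subscores[group]))
        total += weight * value
    return round(100.0 * total / sum(weights.values()), 1)

# Made-up sub-scores for one segment.
score = composite_risk_score({"attributes": 0.8, "scada": 0.5, "spatial": 0.25})
print(score)
```

The point of the sketch is the shape of the output: every scored segment gets one comparable number, and the per-group contributions are what the written summary refers to as the primary drivers.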

At day 14, you'll receive access to the dashboard and a risk map showing all scored segments color-coded by risk tier. A written summary identifies the top 20 highest-scoring segments with the primary drivers behind each score. This first map is genuinely useful as a prioritization input — but we'd encourage you to treat it as a starting point for calibration rather than the final word.

Days 15–90: calibration and validation

The 14-day map is based on the statistical model trained on your historical data. The 90-day pilot phase is where that model gets calibrated against your team's operational knowledge. Your distribution superintendent knows which corridors have been problematic and which segments are known concerns. That local knowledge should be compared against the model's rankings — both to validate where the model agrees with field experience and to investigate the interesting cases where it doesn't.

Discrepancies are informative in both directions. If the model flags a segment as high risk that your team considers low priority, that's worth investigating: either the model is seeing something your team doesn't (perhaps a pressure transient pattern that started recently), or there's a data quality issue with that segment's attributes. If your team has a segment on their watch list that scores low in the model, the model may be missing a failure mechanism that isn't captured in the features — and that's a calibration input.
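One simple way to surface those discrepancies is to compare the model's scores against the team's watch list and list the disagreements in each direction. The sketch below assumes a dictionary of scores keyed by asset ID; the cutoffs of 70 and 30 and the function name are hypothetical.

```python
def find_discrepancies(scores, watch_list, high=70.0, low=30.0):
    """List segments where model scores and the team's watch list disagree.

    high and low are hypothetical cutoffs for "high risk" and "low risk".
    """
    watch = set(watch_list)
    return {
        # Model says high risk, but the team is not watching the segment.
        "model_only": sorted(
            a for a, s in scores.items() if s >= high and a not in watch
        ),
        # Team is watching the segment, but the model scores it low.
        "team_only": sorted(a for a in watch if a in scores and scores[a] <= low),
        # Watch-list segments the model could not score at all.
        "unscored": sorted(a for a in watch if a not in scores),
    }

# Made-up scores and watch list.
scores = {"MAIN-001": 88.0, "MAIN-002": 12.0, "MAIN-003": 75.0, "MAIN-004": 45.0}
watch_list = ["MAIN-002", "MAIN-003", "MAIN-009"]
result = find_discrepancies(scores, watch_list)
print(result)
```

Each of the three lists calls for a different follow-up: a data check or a closer look at recent pressure behavior for the first, a conversation about failure mechanisms for the second, and a data coverage question for the third.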

The calibration period is also when the alert thresholds are tuned for your specific network. Default thresholds are set conservatively to minimize missed events, which typically means too many alerts in the first weeks. We'll work with your operations staff to adjust sensitivity based on their review of the initial alerts — this process is described in more detail in our alert fatigue post.
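As a sketch of what threshold tuning from operator review can look like, the example below picks the highest alert threshold that still keeps a target share of the alerts operators marked as actionable. The anomaly score scale, the review labels, and the selection rule are assumptions for illustration, not the tuning procedure itself.

```python
def tune_threshold(reviewed_alerts, min_recall=1.0):
    """Return the highest threshold that keeps at least min_recall of actionable alerts.

    reviewed_alerts is a list of (anomaly_score, was_actionable) pairs taken
    from operator review. Returns None if nothing was marked actionable.
    """
    actionable = [score for score, ok in reviewed_alerts if ok]
    if not actionable:
        return None
    # Try candidate thresholds from strictest to most permissive.
    candidates = sorted({score for score, _ in reviewed_alerts}, reverse=True)
    for threshold in candidates:
        kept = sum(1 for score in actionable if score >= threshold)
        if kept / len(actionable) >= min_recall:
            return threshold
    return None

# Made-up review results from the first weeks of alerts.
reviewed = [
    (0.35, False), (0.42, False), (0.55, True),
    (0.61, False), (0.78, True), (0.90, True),
]
threshold = tune_threshold(reviewed)
alerts_after = sum(1 for score, _ in reviewed if score >= threshold)
print(threshold, len(reviewed), alerts_after)
```

In this made-up sample, raising the threshold drops the two lowest-scoring alerts without losing any that operators acted on. Lowering min_recall trades a few missed events for fewer alerts, which is a decision for your operations staff rather than a default.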

What the pilot cannot promise

We're not going to claim a specific ROI target or break reduction percentage for a pilot that hasn't started. The outcome metrics from our active deployments (62% median emergency dig reduction across the pilot cohort) reflect networks where the model has been running for at least 12 months and the capital replacement decisions driven by the model have been implemented. A 90-day pilot produces a risk map and a calibrated alert engine; it doesn't produce a measurable break reduction, because capital replacement cycles run longer than 90 days.

What a successful pilot does produce: a defensible, quantitative prioritization of your highest-risk segments, with the data provenance to explain the ranking to your utilities board or city council. That's the foundation that most of our utility partners have used to justify capital replacement programs that would have been difficult to approve under the old "we think this corridor is risky" framing.

If your utility is in a position where capital investment decisions are being made primarily on experience and corridor familiarity rather than quantitative risk data, the pilot is worth evaluating. If you're already running a mature M36-compliant asset management program with recent physical inspection data covering more than 20% of your network, the pilot will be additive but the incremental value will be lower.

Marcus Tran is Customer Success Lead at Watsynq. He manages the onboarding and pilot validation process for all new utility deployments.