Reliability Strategy
When Doing More Is Doing Harm
If your PM compliance is at 95% and your equipment is still failing, the problem isn’t your team. It’s the assumption that more maintenance equals more reliability.
When I review a maintenance program, the first thing I look at isn’t the equipment list or the CMMS dashboard. I look at the PM schedule and ask two questions: how did each task get on this list, and when was the last time anyone challenged whether it should still be there?
Most of the time, nobody can answer. The PMs are there because they’ve always been there. The intervals are there because the OEM said so, or because something failed once five years ago, or because a previous reliability lead set them up before they retired. The compliance number is high. The dashboard is green. And the plant is still having unplanned shutdowns.
Here’s what I’ve learned after years of walking PM programs across different industries: there’s a point where preventative maintenance stops preventing failures and starts causing them. The data on this has been around for almost 50 years. It just hasn’t made it into how most teams plan their work.
What 50 Years of Data Actually Shows
In 1978, two engineers at United Airlines — Stanley Nowlan and Howard Heap — published a study commissioned by the U.S. Department of Defense that quietly upended how aviation thinks about maintenance. Their work has since been validated by independent studies from the U.S. Navy, NASA, ARC Advisory Group, and the Swedish industrial sector. The findings have been remarkably consistent.
Across those studies, only 11–23% of failure modes are actually age-related.¹
Read that again. The majority of equipment failures have nothing to do with how long a component has been running. They’re random, or they’re induced — created by the maintenance work itself. Every time we open a machine to do a scheduled rebuild on a component that didn’t need it, we restart the high-risk early-failure period that engineers call infant mortality. Defective replacement parts. Misaligned reassembly. A gasket that doesn’t seat right. A seal damaged on installation. The 2 a.m. call that comes three weeks after a “routine” PM.
This isn’t a knock on preventative maintenance. PM is essential — for the right components, at the right interval, for the right failure mode. The myth is that more PM is always better. The data has been telling us otherwise for decades.
Where the Assumption Breaks Down
| The Old Assumption | What the Data Shows |
|---|---|
| If we replace the part before it fails, we prevent the failure. | Replacing a healthy part can introduce new defects. 68% of the time, the failure pattern is infant-mortality.² |
| Higher PM compliance = higher reliability. | Above ~90% PM-to-corrective ratio, plants often start over-maintaining and inducing failures.⁴ |
| Calendar-based intervals are the safe default. | Calendar-based repair/replace tasks only fit the 11–23% of failure modes that are actually age-related.¹ |
| If a component failed, we should PM it more often. | Frequency may be the cause. The right answer is often a different strategy, not a shorter interval. |
Why This Happens in Real Plants
Nobody sets out to over-maintain. The patterns I see across plants are honest, well-intentioned habits that compound over years.
What a Smarter PM Program Looks Like
The fix isn’t to do less maintenance. It’s to do the right maintenance, on the right components, at the right interval. Here’s what we’ve seen work in real facilities.
Match the Strategy to the Failure Mode
Not every asset needs the same approach. Components with predictable wear-out — gaskets, filters, brake pads — benefit from time-based PM. Components with random failure modes — bearings, motors, electronics — benefit far more from condition-based monitoring: vibration analysis, oil analysis, thermography, ultrasound. The first question for any task on the schedule shouldn’t be “how often” — it should be “what failure mode are we actually addressing here, and is calendar PM the right tool for it?”
Run a PM Optimization Review Every 2–3 Years
The PM list grows. It needs to be pruned. Pull every PM task on the schedule and ask three questions for each one:
- What failure mode does this prevent?
- What’s the consequence if we miss it?
- Is there a condition-based alternative?
Tasks that can’t answer the first two are candidates for removal. Tasks that answer yes to the third are candidates for migration. Industry data suggests roughly 30% of PM is performed too frequently — there’s real margin to free up.⁵
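The three questions above can be sketched as a simple triage pass over the PM list. This is only an illustration: the `PMTask` fields, the example tasks, and the decision labels are made up for this sketch. It also assumes that a task missing either of the first two answers counts as a removal candidate.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PMTask:
    name: str
    failure_mode: Optional[str]        # What failure mode does this prevent?
    consequence: Optional[str]         # What's the consequence if we miss it?
    condition_based_alternative: bool  # Is there a condition-based alternative?


def triage(task: PMTask) -> str:
    """Apply the three review questions to one PM task."""
    # Assumption: a blank answer to either of the first two questions
    # makes the task a removal candidate for a human to review.
    if not task.failure_mode or not task.consequence:
        return "review for removal"
    if task.condition_based_alternative:
        return "candidate for migration"
    return "keep"


# Hypothetical tasks, invented for illustration only.
tasks = [
    PMTask("Annual gearbox teardown", None, None, False),
    PMTask("Monthly motor bearing regrease", "bearing wear", "line stoppage", True),
    PMTask("Quarterly filter change", "filter clogging", "loss of flow", False),
]

results = {t.name: triage(t) for t in tasks}
for name, decision in results.items():
    print(f"{name}: {decision}")
```

The output is a worklist for the review team, not a verdict. Someone who knows the asset still makes the call on each task.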
Track Induced Failures as Their Own Category
Add a flag in your work-order system: was this failure preceded by a PM event in the last 30 days? If yes, that’s a candidate for an induced-failure review. Once you start tracking it, you’ll find more than you expected. And once it’s visible, the team can act on it: tighter procedures, better parts handling, post-PM verification.
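As a rough sketch of that flag, assuming you can export PM completion dates and failure dates per asset from your work-order system: the asset ID, the dates, and the function name below are hypothetical, and the 30-day window is treated as inclusive.

```python
from datetime import date, timedelta

WINDOW = timedelta(days=30)  # look-back window taken from the text above


def preceded_by_pm(failure_date, pm_dates, window=WINDOW):
    """Return True if any PM was completed within `window` before the failure."""
    return any(timedelta(0) <= failure_date - pm <= window for pm in pm_dates)


# Hypothetical export: PM completion dates keyed by asset ID.
pm_history = {"PUMP-101": [date(2024, 3, 1), date(2024, 6, 1)]}

# Hypothetical failure events as (asset ID, failure date).
failures = [
    ("PUMP-101", date(2024, 3, 20)),  # 19 days after the March PM
    ("PUMP-101", date(2024, 5, 15)),  # 75 days after the March PM
    ("PUMP-101", date(2024, 7, 1)),   # 30 days after the June PM
]

# Failures that qualify for an induced-failure review.
flagged = [
    (asset, failed_on)
    for asset, failed_on in failures
    if preceded_by_pm(failed_on, pm_history.get(asset, []))
]

for asset, failed_on in flagged:
    print(f"{asset} failed on {failed_on}: review for induced failure")
```

A flagged failure is only a candidate for review. The timing alone does not show that the PM caused the failure.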
Measure What Actually Matters
PM compliance is an activity metric, not an outcome metric. Pair it with MTBF (mean time between failures), planned-vs-unplanned ratio, and the percentage of PM tasks driven by condition data versus calendar. When those metrics start moving in the right direction together, you’re actually building reliability — not just doing more work.
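One way to put these numbers side by side is sketched below. The definitions are common simplifications rather than a standard: MTBF is operating hours divided by failure count, the planned-vs-unplanned ratio is expressed as planned hours over total maintenance hours, and each PM task carries a `trigger` of either `"condition"` or `"calendar"`. All input figures are invented for illustration.

```python
from typing import List, Optional


def mtbf_hours(operating_hours: float, failure_count: int) -> Optional[float]:
    """Mean time between failures; None when no failures were recorded."""
    if failure_count == 0:
        return None
    return operating_hours / failure_count


def planned_share(planned_hours: float, unplanned_hours: float) -> float:
    """Fraction of maintenance labor hours that were planned."""
    total = planned_hours + unplanned_hours
    return planned_hours / total if total else 0.0


def condition_based_share(triggers: List[str]) -> float:
    """Fraction of PM tasks triggered by condition data rather than calendar."""
    if not triggers:
        return 0.0
    return sum(1 for t in triggers if t == "condition") / len(triggers)


# Hypothetical figures for one asset over one year.
mtbf = mtbf_hours(operating_hours=8000, failure_count=4)
planned = planned_share(planned_hours=600, unplanned_hours=200)
cbm = condition_based_share(["condition"] * 3 + ["calendar"] * 7)

print(f"MTBF: {mtbf} h, planned share: {planned:.0%}, condition-based PM: {cbm:.0%}")
```

Tracking the three together, period over period, shows whether the extra work is actually producing reliability, which a compliance percentage on its own does not.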
Pick one critical asset. Pull the last 12 months of work orders. Then ask:
- How many failures occurred within 30 days of a PM event?
- How many of our PM tasks address a documented failure mode — versus “we’ve always done it”?
- Which tasks could be replaced by a condition-based check?
If the answers are uncomfortable, that’s the work. You don’t need a new program — you need to look honestly at the one you have.
The goal of reliability isn’t to do as much maintenance as possible. It’s to keep equipment running safely, predictably, and at the lowest total cost — which sometimes means doing less, not more. Your team already has the discipline to execute. What they need is a program built on what the failure data is actually telling you, not on assumptions baked in twenty years ago.
If I had to give one piece of advice to plant leaders who feel stuck on this: don’t try to overhaul the entire program at once. Pick one critical asset, look hard at its PM history and its failure history, and ask whether the two actually connect. That’s usually all it takes to see the gap. From there, the work becomes obvious — and your team will be the ones who close it.
Alex Schoeni, CMRP – Reliability Solutions
Alex is a Certified Maintenance & Reliability Professional (CMRP) with over a decade of experience in industrial reliability and maintenance. He spent nearly 10 years as a Nuclear Engineering Lab Technician and Instructor with the US Navy before transitioning to Reliability Solutions, where he teaches root cause analysis and machine maintenance practices at the national level. Today, Alex leads the development and implementation of Reliable Manufacturing processes at manufacturing sites worldwide.
