OMDEC | Optimal Maintenance Decisions Inc.

Reliability-centered knowledge - Preface to Course Notes

By Murray Wiseman (Extracted from Reliability-centered Knowledge)

 

Using Maintenance Databases for Reliability Analysis and Improvement

Part 1. A Reliability-centered Knowledge Base

Part 2. Using Maintenance Data

Part 3. Condition Based Maintenance

Part 4. Reliability Centered Maintenance

 

This book provides the course notes for a CBM (condition based maintenance[1]) training session that covers, in three parts:

 

  1. Database attributes required for reliability analysis,
  2. Use of information from such databases, and particularly,
  3. Optimal interpretation of observations generated by condition based maintenance activities.

 

Reliability-centered maintenance (RCM) forms the philosophical framework of this work. Effective CBM flows from the application of RCM. Reliability-centered knowledge implies that structured and valid information will drive reliability improvement. We have, for this reason, included Part 4, “Reliability-centered maintenance”. The book and course draw liberally from RCM practice, using such RCM concepts as “failure mode causality depth selection”, “decision analysis”, and “age exploration”.

 

Parts 1 and 2 usually fill the first morning of the course, providing an introduction and background for EXAKT. Part 3 begins with a theoretical development of CBM, a history of CBM, and a discussion of the reasons for selecting CBM as a proactive task. The second section of Part 3 presents the anatomy of CBM, specifically its three sub-processes: data acquisition, signal processing, and decision making. The last of these leads into the introduction of EXAKT CBM decision optimization. The fundamentals of CBM are explored further, and the RCM concept of the “P-F interval”[2] is described and reconciled with the methodology of EXAKT. The relationship among data, risk, and cost is then developed using a time-based maintenance example. This approach is then shown to be extendable to CBM using the Weibull PHM[3] model. Finally, the need for automated decision making is explained as a consequence of the growing volumes of data and the diminishing resources that characterize today’s maintenance departments.
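As an illustration only, the relationship among risk and cost in a time-based policy can be sketched with the classic age-replacement cost model: the expected cost per cycle divided by the expected cycle length. This is a minimal sketch, not EXAKT's implementation. The Weibull parameters, the two costs, and the function names are all assumptions invented for the example.

```python
import math

def weibull_reliability(t, beta, eta):
    """Probability of surviving to age t under a two-parameter Weibull model."""
    return math.exp(-((t / eta) ** beta))

def cost_rate(tp, beta, eta, cost_preventive, cost_failure, steps=2000):
    """Expected cost per unit time when replacing preventively at age tp.

    Numerator: expected cost of one cycle (preventive or failure replacement).
    Denominator: expected cycle length, the integral of R(t) from 0 to tp.
    """
    r_tp = weibull_reliability(tp, beta, eta)
    expected_cost = cost_preventive * r_tp + cost_failure * (1.0 - r_tp)
    # Trapezoidal integration of R(t) over [0, tp].
    dt = tp / steps
    expected_length = sum(
        0.5 * (weibull_reliability(i * dt, beta, eta)
               + weibull_reliability((i + 1) * dt, beta, eta)) * dt
        for i in range(steps)
    )
    return expected_cost / expected_length

def best_replacement_age(beta, eta, cost_preventive, cost_failure, candidates):
    """Return the candidate replacement age with the lowest expected cost rate."""
    return min(
        candidates,
        key=lambda tp: cost_rate(tp, beta, eta, cost_preventive, cost_failure),
    )

# Hypothetical wear-out item: shape 2.5, scale 1000 hours,
# preventive replacement costs 1000, failure replacement costs 5000.
candidate_ages = range(100, 3001, 100)
print(best_replacement_age(2.5, 1000.0, 1000.0, 5000.0, candidate_ages))
```

With a shape parameter above one, the lowest cost rate falls at an intermediate age. With a shape parameter of exactly one, the cost rate keeps falling as the replacement age grows, so the search simply returns the largest candidate.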

 

At this point participants (or readers) are invited to work through a step-by-step exercise in which they encounter most of the basic features of EXAKT, including the five principal database tables and their structure. They proceed to build a decision model using a reduced set of haul truck transmission oil analysis data. In the exercise that follows, they deploy the model they have just created. That is, they set up an intelligent agent (EXAKTd) and examine its automated analysis, reporting, and database functionality.

 

Next, the issue of data validation is explored. The example comes from a CBM project at the Cardinal River Coals mine in which invalid data, missing data, faulty failure definition, the impact of oil changes on oil analysis data, and cost sensitivity analysis are all encountered, and the corresponding EXAKT remedial functions are explored. This discussion is then reinforced by an exercise in which the class, working in pairs on their own laptops, replicates all of the results of the Cardinal River Coals project. The exercise includes an introduction to general transformations[4] in EXAKT.
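The kinds of transformation mentioned here (rolling averages, rates, and ratios) can be sketched in a few lines. This is a generic illustration and does not reproduce how EXAKT defines or applies its transformations. The function names and the oil analysis readings are hypothetical.

```python
def rolling_average(values, window):
    """Trailing mean over up to `window` of the most recent readings."""
    averaged = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1): i + 1]
        averaged.append(sum(chunk) / len(chunk))
    return averaged

def rate_of_change(values, ages):
    """Change per unit of working age between consecutive inspections."""
    rates = [0.0]  # the first reading has no earlier reading to compare with
    for i in range(1, len(values)):
        elapsed = ages[i] - ages[i - 1]
        rates.append((values[i] - values[i - 1]) / elapsed if elapsed > 0 else 0.0)
    return rates

def ratio(numerators, denominators):
    """Element-wise ratio; None where the denominator is zero."""
    return [n / d if d else None for n, d in zip(numerators, denominators)]

# Hypothetical iron and lead readings (ppm) at given working ages (hours).
ages = [250, 500, 750, 1000]
iron = [10.0, 14.0, 30.0, 34.0]
lead = [2.0, 2.0, 5.0, 4.0]

print(rolling_average(iron, 2))
print(rate_of_change(iron, ages))
print(ratio(iron, lead))
```

Each derived series can then be examined as a candidate risk factor alongside the raw readings.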

 

At this time, an advanced topic is introduced: the analysis of complex items[5]. A complex item is defined and the data structure for representing complex items in a model is described. The mapping of CMMS database fields to EXAKT’s key fields of B, EF, and ES (Beginning, Ending by Failure, and Ending by Suspension) is explained at some length and then reinforced immediately with an exercise using a two-failure-mode gearbox as an example.
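A mapping of this kind might be sketched as follows. The work-order fields (`unit`, `date`, `type`, `failure_mode`) and their values are hypothetical, since real CMMS schemas differ. The sketch also assumes the common competing-risk convention that, when one failure mode is being modeled, a failure by any other mode ends the history as a suspension. Whether a given EXAKT model is set up this way is not stated here.

```python
def event_code(work_order, target_mode):
    """Return B, EF, or ES for a work order, or None if it is not a lifetime event."""
    kind = work_order["type"]
    if kind == "installation":
        return "B"   # Beginning of a history
    if kind == "failure_replacement":
        # Ending by Failure only for the failure mode being modeled;
        # a failure by another mode is treated as a suspension (assumption).
        return "EF" if work_order.get("failure_mode") == target_mode else "ES"
    if kind == "preventive_replacement":
        return "ES"  # Ending by Suspension
    return None

def to_events(work_orders, target_mode):
    """Split work orders into coded events and records left for manual review.

    Work orders are assumed to be listed in chronological order.
    """
    events, unmapped = [], []
    for wo in work_orders:
        code = event_code(wo, target_mode)
        if code is None:
            unmapped.append(wo)
        else:
            events.append((wo["unit"], wo["date"], code))
    return events, unmapped

# Hypothetical history of a gearbox with two failure modes.
work_orders = [
    {"unit": "GB-01", "date": "2001-01-10", "type": "installation"},
    {"unit": "GB-01", "date": "2001-09-02", "type": "failure_replacement",
     "failure_mode": "bearing"},
    {"unit": "GB-01", "date": "2001-09-02", "type": "installation"},
    {"unit": "GB-01", "date": "2002-03-15", "type": "failure_replacement",
     "failure_mode": "gear tooth"},
    {"unit": "GB-01", "date": "2002-03-16", "type": "inspection"},
]

events, unmapped = to_events(work_orders, target_mode="bearing")
for event in events:
    print(event)
```

Running the same work orders with `target_mode="gear tooth"` swaps the roles of the two endings, which is why each failure mode needs its own coded view of the same maintenance history.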

 

The final exercise provides an introduction to, and practice in, the use of history-specific transformations for the purpose of smoothing erratic data. Additional sophistication is demonstrated by eliminating a “drooping” artifact produced by the basic smoothing algorithm. This final exercise also introduces the testing of the shape-factor-equal-to-one[6] hypothesis and the reasoning behind its use in this specific case. This ends the formal part of the course. Finally, the attendees are asked to search their respective records and databases for potentially good CBM optimization projects. The criterion for “good” is articulated as a balanced compromise between data availability (inspection and event) on the one hand and the gravity of the consequences of failure on the other.
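The practical meaning of a shape factor equal to one can be seen in a short calculation. The sketch below does not perform the hypothesis test itself. It only compares, under a two-parameter Weibull model with invented parameter values, the chance that a new item and an aged item each survive the same further interval.

```python
import math

def reliability(t, beta, eta):
    """Weibull probability of surviving to age t (beta: shape, eta: scale)."""
    return math.exp(-((t / eta) ** beta))

def conditional_survival(age, horizon, beta, eta):
    """Probability of surviving a further `horizon`, given survival to `age`."""
    return reliability(age + horizon, beta, eta) / reliability(age, beta, eta)

eta = 1000.0  # hypothetical scale parameter, in hours
for beta in (1.0, 3.0):
    new_item = conditional_survival(0.0, 200.0, beta, eta)
    aged_item = conditional_survival(800.0, 200.0, beta, eta)
    print(f"shape={beta}: new item {new_item:.3f}, aged item {aged_item:.3f}")
```

When the shape factor is one, the two probabilities are the same: age carries no information about risk, so replacing an item because of its age alone buys nothing. When the shape factor is greater than one, the aged item is less likely to survive the interval than the new one.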

I hope you enjoy the course. I invite your comments at murray@omdec.com.

 

Murray Wiseman

Optimal Maintenance Decisions (OMDEC) Inc.

 

Introduction by Andrew Jardine

Over the past decade, in my work as principal investigator at the CBM laboratory and during my travels and speaking engagements, people have asked me what inspired the EXAKT development project. The answer to that question is quite simple. Condition based maintenance is the most desirable form of maintenance, yet former students, now maintenance professionals, told me that they often find that their current CBM programs, such as oil analysis, don’t deliver the intended results. I asked how “exactly” their staff interpreted condition monitoring data. In other words, how did they decide whether or not to remove an item for repair? Their answers led me to investigate whether a more rigorous decision methodology might improve the payback on the rather large investment they were making in condition based maintenance.

 

I found that two approaches were being used to interpret and act upon CBM data. The first arrived at decisions by drawing on solid experience and engineering knowledge that a known level of a monitored variable indicates the initiation of a particular failure mode. The second relied on “trend analysis” as the basis for making the “maintain-now-or-continue-operating” decision. Looking closely at the data from both cases, I found that, while the former generally achieved the expected benefits, the latter failed to provide a measurable return on the investment in the fixed and running costs of the CBM program.

 

In the first case, CBM detection of, for example, diesel fuel in lubricating oil reflects the “ground truth” of a failed condition: a leaking of fuel past the sealing surfaces of some interface, perhaps the piston, ring, and cylinder wall. Similarly, coolant in the lube oil reflects the breakdown of some interface, possibly a gasket, separating the cooling and lubricating fluids. However, where “data trending” is the principal method for decisions, the relationship between monitored data and the failure mechanism is often vague. We rely on a palpable deviation from some “normal” trend to alert us to a problem.

 

Although this sounds like a reasonable approach, it works only if the data clearly reflects a developing failure. But such is often not the case. Usually, several separate or inter-related phenomena affect the monitored data. Although common sense would have us believe that monitored signals from the machine must contain its health information, we know little about the nature of that relationship. For example, if the operator of a nuclear reactor alters the temperature of the sealing fluid in the cooling water pump, then the leak rate, normally used to monitor seal health, would tend to decrease, even if the seal were, indeed, beginning to fail. The interpretation of trends thus becomes complicated. Add to this random noise, the effects of load variation, and more than one failure mode, and you can imagine that attempting trend analysis of multiple data streams emanating from complex systems might frustrate the well-intentioned maintenance planner or engineer.

 

Resolving this problem posed a unique challenge. The condition monitoring phrase “Equipment Health” brings to mind the idea of human health. I looked at the medical field, where the problem of symptom-based prognostics is well known. The concept of “risk factors” that associate medical test results with specific illnesses seemed perfectly analogous to the problem of risk-based decisions in maintenance. Cox’s proportional hazard model, introduced in the 1970s, had proved useful in the detection of illnesses and in the prediction of human survival. I applied these ideas first to jet propulsion engines and discovered that we could model the risk of engine failure in terms of the oil analysis results for iron and chromium and the engine’s accumulated flight hours since overhaul. That work proved very encouraging, so much so that we set out to develop a general purpose software platform for PHM (proportional hazard modeling) prediction. Over the past decade, at the CBM laboratory of the University of Toronto, we gradually improved the program by applying it to many industrial CBM situations. It has now reached the stage where it should be made commercially available to the mainstream of the physical asset management community. That is the reason OMDEC was spun off from the CBM lab.
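As an illustration only, a risk model of this kind is commonly written as a Weibull proportional hazard function: a Weibull baseline in working age, multiplied by an exponential term in the monitored variables. The sketch below uses that general form. The parameter values are invented for the example and are not results from the engine study or from any fitted EXAKT model.

```python
import math

def phm_hazard(age, covariates, beta, eta, gammas):
    """Weibull PHM hazard for an item of working age `age` (age > 0).

    h = (beta / eta) * (age / eta)**(beta - 1) * exp(sum(gamma_i * z_i))
    where z_i are the current covariate readings.
    """
    baseline = (beta / eta) * (age / eta) ** (beta - 1.0)
    return baseline * math.exp(sum(g * z for g, z in zip(gammas, covariates)))

# Hypothetical parameters: shape, scale (flight hours), and one
# coefficient each for iron and chromium readings (ppm).
beta, eta = 1.8, 9000.0
gammas = (0.05, 0.30)

# Same accumulated hours since overhaul, different oil analysis results.
hazard_low = phm_hazard(3000.0, (10.0, 1.0), beta, eta, gammas)
hazard_high = phm_hazard(3000.0, (40.0, 3.0), beta, eta, gammas)

print(f"hazard ratio, high vs low readings: {hazard_high / hazard_low:.2f}")
```

Because the two engines have the same age, the baseline cancels and the ratio depends only on the difference in readings weighted by the coefficients. That is the sense in which the hazards are “proportional”.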

 

I have often been asked why we called the program “EXAKT”, implying that CBM is an “exact” science, while in fact the methodology of EXAKT is based on probabilities and statistics. Certainly, I can see why some people think that the name “EXAKT” and the probabilistic nature of failure are incongruous. Most managers, however, understand risk. They instinctively weigh probabilities when making decisions in the normal course of their activities. If they were told “exactly” the risk levels associated with alternative decisions, they would find such information helpful indeed. Otherwise stated, if they knew “exactly” with what level of confidence they may accept a residual life estimate for some operating physical asset, they could adjust their operational and maintenance plans accordingly.

 

Self-guided tutorial exercises are a good way to give potential users a level of comfort. EXAKT is actually a usable tool. But, because EXAKT evolved as a research platform, some people have formed the impression that it is too difficult for them. This book sets out to dissolve that impression. Besides a sound treatment of the founding principles of RCM knowledge, it contains step-by-step tutorials that convey a number of common techniques for solving data problems.

 

I take great pleasure in writing this introduction to “Reliability-centered Knowledge”. I am certain that it will add substantially to the success of its readers’ CBM endeavours.

 

 

Andrew Jardine

Principal Investigator, CBM Lab

Professor, Mechanical and Industrial Engineering

University of Toronto


 

Contents:

Part 1. A Reliability-centered Knowledge Base

Chapter 1.

Introduction

The Work Order UML Class Diagram

Incorporating RCM knowledge attributes

The Seven Knowledge elements of RCM

The “failure code” problem

Chapter 2. Requirements of Information

Data Structure

Implementing a Reliability Knowledge Base

Other “FMEA” data types and definitions

Conclusions

Part 2. Using Maintenance Data

Chapter 3. Analyzing data

Introduction

The problem with failure rates

How to use maintenance data?

Age Exploration Procedures

Random Failure

Failure Finding Intervals

Measuring Reliability Improvement

Refining the maintenance program

Extending inspection intervals where no experience is available – opportunity sampling

Assessing the effectiveness of a CBM Program

Improving the program through failure mode assessment

Software analytic tools

CBM (on-condition maintenance) benefits analysis

Engineering Change Assessment

Recording Events

Component age

Significant components

Chapter 4. Monte Carlo Simulation

Introduction

Modeling a simple system using SPAR

Objective of the analysis

The system function

Running the program

Remarks

Repair effectiveness

Applying Preventive Maintenance

Optimizing PM

Chapter 5. Case based reasoning

Introduction

Efficient Troubleshooting

Performance measurement

Case Base Development

Terminology

Building a knowledge domain

Building a case

The seed case base

Conclusions

Part 3. Condition Based Maintenance

Chapter 6. Deciding on CBM?

Introduction

Why do CBM?

History of CBM

Chapter 7. Anatomy of CBM

Data Acquisition

Signal Processing

Decision Making

Chapter 8. CBM Fundamentals

The fundamental premise of CBM

CBM Program Criteria

CBM Monitoring Frequency

Estimating the PF Interval

Chapter 9. The Elusive P-F Curve

Are failures required – multiple levels of intrusiveness?

Discussion of Case 2

Discussion of Case 1

Chapter 10. Optimizing CBM

Developing a Maintenance Risk Model

The traditional risk model

Combining Data and Risk

The Optimal Risk

A Time Based Maintenance Model

Blending in Cost

A Condition Based Maintenance Model

Automated CBM Decision Making

Example 1 Creating a decision model

Example 2 Data validation

Example 3 Complex Items

Summary

References

Part 4. Reliability Centered Maintenance

Chapter 11. Pillars of RCM

Introduction

Chapter 12. Failure Modes and Effects Analysis

Question 1 – Functional Analysis

The process

Example 1

Example 2

Example 3

Example 4

Question 2 – Failure analysis

The process

Example 1

Example 2

Example 3

Question 3 – Failure modes analysis

The process

Example 1

Example 2

Example 3

Question 4 – Effects analysis

The process

Example 1

Example 2

Example 3

Chapter 13. The RCM Decision Algorithm

Questions 5, 6, and 7

The process

Example 1

Example 2

Example 3

Example 4

Chapter 14. Can RCM and Streamlined RCM peacefully co-exist?

Introduction

Why streamline RCM?

RCM/RCM Turbo dictionary

Example 1

Conclusions

Chapter 15. Integrating Reliability Information

UML Class Diagrams

Chapter 16. Managing Strategy

Introduction

Extending the Maintenance Audit

Physical asset management inputs, outputs, and control

Physical Asset Management Effectiveness Indicators (KPIs)

Choosing between model 1 and model 2

Drilling down from the KPIs

How to start

Chapter 17. Appendices

Appendix 1.

The role of the RCM Facilitator

Appendix 2.

Sizing the analysis

Selecting the significant items

Appendix 3.

Failure finding intervals for complex items (multiple failure modes and devices)

Appendix 4.

Truck description

Appendix 5.

Terminology used:

Various definitions of “Life”

Appendix 6.

Time to Failure - Relationship among hazard, reliability, and probability density functions

Appendix 7.

Random failure survival curve

Appendix 8.

Inherent reliability characteristics

Appendix 9.

Failure mode depth of causality

Appendix 10.

Expected failure time

Appendix 11.

Exercise (Example 2 Data validation)

Exercise 4 data smoothing and fixing shape factor to 1

Appendix 12.

Data for RCM Turbo

Appendix 13.

Default decision diagram answers in the absence of operating experience

Appendix 14.

Additional Relcode examples

 



[1] Also called Predictive Maintenance (PdM), Condition Monitoring (CM), and On-condition maintenance.

[2] The term P-F Interval was coined by John Moubray to represent the concept described by Nowlan and Heap for the period between the appearance of a potential failure and the occurrence of a functional failure. See Chapter 8 of the course notes “The Elusive P-F Interval”.

[3] The PHM (proportional hazard model) extends the age-based reliability model developed by Waloddi Weibull in the 1950s to one that adds condition monitoring and performance data to the age-reliability relationship.

[4] It is often necessary to transform available data into new combinations such as “rolling averages”, rates, or ratios to find the key risk factors associated with failure.

[5] Complex items are items that are subject to more than one failure mode.

[6] The shape factor is a parameter estimated by the software. If the shape factor is equal to one, it means that failure behaviour is random, indicating that time based overhaul will not be economically advantageous.

  www.omdec.com   info@omdec.com