ABB, Oceana Sensor and Omdec demonstrate - Condition Based Maintenance (CBM)

This presentation describes an evolving CBM demonstration by Oceana Sensor, ABB, and OMDEC at the International Maintenance Excellence Conference.

In building this demo, we asked ourselves:

  • What do we really want to prove?
  • What should a CBM strategy target?
  • How should we measure its benefits?
  • How should we support and improve the strategy?
Bear with me, as I try to convey how the demo sheds considerable light on these essential questions.

We state our objective simply. When faced with a set of operating conditions, a set of requirements for our asset, and a set of condition monitoring data, how, exactly, do we interpret what all this information is telling us? How do we decide whether:

  1. to intervene immediately, or
  2. to continue operation until the next observation?
How can we know whether our decision process leads to the best decision - the so-called optimal one? What does the word "optimal" really mean?

Here is what we, in developing the demo, thought it means. Please tell me if you disagree. As our stated goal for the demo, and for CBM in general, we desire to declare a potential failure at the right time.

That, of course, raises the question, “What is the right time?”

The answer we came up with is "At the right level of risk".

How to define risk?

Here is a textbook graph, variations of which you have all seen. The vertical axis quantifies the bottom-line cost (per unit of production) of owning an asset. On the horizontal axis, we indicate some policy (scaled to risk) for deciding when to maintain the asset. We may consider the horizontal axis as a spectrum of varying risk. What is risk? Most people see risk as the combination of the probability that something will go wrong and the gravity of the consequences if it does.

Generally speaking, we can be conservative in our CBM data interpretation policy, panicking too quickly at the slightest increase in a monitored variable. Looking solely at the issue of cost, that policy, gravitating towards the left end of the spectrum, is expensive.

On the other hand we can run our CBM program liberally, throwing caution to the winds, and setting our action limits high. That too will be costly. As a matter of fact, it will approach the cost of having no CBM program at all.

Some point on the risk spectrum, therefore, should correspond to lowest cost. If cost were our main consideration, then we would want to set our CBM data interpretation policy at that point.

Availability

We know, however, that cost may not be the only consideration. Availability can be more important than cost in a production constrained environment. And, in general, maximum availability does not coincide with minimum cost.

Reliability

But cost and availability may not suffice to specify our policy for an asset in its operating context. We may need to specify a survival probability during an upcoming mission. A CBM interpretation policy that targets survival probability as an objective will not, generally, coincide with our maximum availability or minimum cost CBM policies. The $64 question is, then, given our current operating context (our policy objectives), “where on the risk spectrum do we want to operate?”

Data Interpretation Policy

And the $65 question is “How do we set our potential failure declaration rule to respect that policy?”

We usually need help to answer those two questions. Help can take the form of an agent. An agent applies our stated objectives to the current data set. It interprets (formulates decisions based on) the CBM data in accordance with our objectives.

The agent could be a computer algorithm, or a human applying a set of rules based on experience or written down in a standard operating procedure. It monitors data from the CBM and other operating databases. It takes into account maintenance and process events that have occurred recently. With due consideration of all relevant factors, it issues the "right" decision.
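The agent's role can be illustrated with a minimal sketch. It assumes the simplest possible rule: compare an estimated risk value at each inspection against a limit chosen to reflect our objectives. The names `Inspection`, `decide`, and `hazard_limit` are hypothetical and used only for illustration; the actual agent weighs more factors than this.

```python
from dataclasses import dataclass


@dataclass
class Inspection:
    working_age: float  # age of the asset at this inspection (illustrative units)
    hazard: float       # estimated risk at this inspection (assumed to be given)


def decide(inspection: Inspection, hazard_limit: float) -> str:
    """Return one of the two decisions the agent must choose between.

    hazard_limit is a hypothetical action limit; where it sits on the
    risk spectrum depends on the objectives (cost, availability, reliability).
    """
    if inspection.hazard >= hazard_limit:
        return "intervene immediately"
    return "continue operation until the next observation"


print(decide(Inspection(working_age=1200.0, hazard=0.004), hazard_limit=0.01))
print(decide(Inspection(working_age=1800.0, hazard=0.020), hazard_limit=0.01))
```

Lowering `hazard_limit` moves the policy toward the conservative (left) end of the risk spectrum; raising it moves the policy toward the liberal end.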

Now we are getting closer to defining what we wanted our demonstration to prove.

Here is the demo hardware. An obvious challenge to building a CBM demo is the design of the failure mechanism. On one hand we wanted the failure to behave realistically. On the other, we did not wish to injure anyone or cause excessive damage.

After considerable trial and error, we decided upon the failure mechanism in the photo on the far right.

The tee component is held in position by friction applied by the perpendicular compression spring. The friction force resists the load exerted by the lateral extension spring.

Failure in this case is the loss of the function: “to hold the tee in place against a spring force in the presence of vibrations induced by an unbalanced rotor”.

At failure, the tee strikes the golf ball, sending it down the ramp, where it activates a switch indicating that functional failure has occurred. To initiate a test run, the operator inserts the tee into the spring-loaded holding device and places the golf ball into position. He powers the motor drive, and hits.

At each CBM inspection, a decision is displayed (blue box) as an updated residual life estimate (“0” in this case) and optimal recommendation (“replace immediately” in this case).

The white box continuously updates the KPIs that evaluate the effectiveness of the active CBM policy.

The graph on the top right is the hazard rate graph. Also known as the conditional probability of failure graph (hazard is proportional to the conditional probability), it tells us the probability that the asset will fail shortly after a given point in its life cycle, given that it has survived to that point. The dark blue curve is the hazard rate for both potential and functional failures. The pink curve represents the potential failures alone. The closeness of these two curves tells us that the CBM policy applied by the agent is effective. That is, our CBM policy is detecting most potential failures in time to pre-empt functional failures.
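To make the idea of conditional probability of failure concrete, here is a minimal life-table sketch. It assumes a complete list of failure ages with no suspensions (censored lives), which is a simplification, and the function name and sample ages are hypothetical rather than taken from the demo.

```python
def conditional_failure_probability(failure_ages, interval_width):
    """Life-table estimate of the conditional probability of failure.

    For each age interval, divide the number of units that failed in the
    interval by the number still surviving at the start of the interval.
    Assumes every unit was run to failure (no suspensions).
    """
    if not failure_ages:
        return []
    n_intervals = int(max(failure_ages) // interval_width) + 1
    failures = [0] * n_intervals
    for age in failure_ages:
        failures[int(age // interval_width)] += 1

    at_risk = len(failure_ages)
    result = []
    for count in failures:
        result.append(count / at_risk if at_risk else 0.0)
        at_risk -= count
    return result


# Four illustrative failure ages, grouped into intervals of width 10.
print(conditional_failure_probability([5, 15, 15, 25], 10))
```

Plotting these values against age gives a curve of the same kind as the hazard rate graph: each point answers the question "given survival to here, how likely is failure in the next interval?"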

This is exactly what we want our CBM policy and program to do. But at what cost? We could be operating way to the left on the risk spectrum that you will recall from slide 4. Shifting our attention back to the KPI report, we note the total cost of maintenance (both preventive and reactive) per unit of working age (or per unit of production). We also read the savings compared to a laissez-faire policy.

Additionally we note the savings compared to some original policy (prior to adopting the current model). These KPIs give us some idea of how optimal our CBM policy is. We'll pursue that theme in a moment on slide 8.

But before we do, look at the bar chart on the bottom right. It tells us how good our residual life estimates have been. At every inspection the agent produces an estimated time to failure (functional or potential). If we plot the errors in those estimates for all inspections, we get the histogram shown. It tells us that most (412) estimates were less than 10% off. Approximately 5% of our estimates were as much as 80% off. The point is that we have demonstrated a method for evaluating the accuracy, or predictability, of our CBM program (also known as a “predictive” maintenance program). All CBM programs should monitor their own effectiveness.
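A histogram of this kind can be built with very little code. The sketch below assumes that "error" means the absolute difference between the estimated and the actual remaining life, expressed as a fraction of the actual remaining life, and that errors are grouped into 10% bins. Both the definition and the sample numbers are assumptions made for illustration.

```python
def relative_error_histogram(estimated, actual, bin_width=0.10, n_bins=10):
    """Count residual life estimates by the size of their relative error.

    estimated, actual: remaining-life values at each inspection (actual > 0).
    Bin 0 holds errors under 10%, bin 1 holds 10% up to 20%, and so on;
    anything beyond the last bin is counted in the last bin.
    """
    counts = [0] * n_bins
    for est, act in zip(estimated, actual):
        relative_error = abs(est - act) / act
        index = min(int(relative_error / bin_width), n_bins - 1)
        counts[index] += 1
    return counts


# Three illustrative inspections: 5%, 25% and 55% off.
print(relative_error_histogram([95, 125, 45], [100, 100, 100]))
```

A program whose counts pile up in the first bin or two is predicting well; a long tail toward the right-hand bins signals that the model deserves a second look.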

How does the EXAKT predictive agent work? It applies a model (called a proportional hazard model) that was constructed from past patterns of data. When a functional or potential failure occurred, it was recorded. Those events were correlated with monitored CBM data to provide a risk model. A transition probability analysis provides a predictive model of the monitored variables. Finally, economic considerations are blended into the model. A model is a yardstick that assists users in making decisions.
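For readers who have not met a proportional hazard model, here is a minimal sketch of one common form: a Weibull baseline hazard that depends on working age, multiplied by an exponential term that depends on the monitored variables. Whether the demo's model takes exactly this form is an assumption here, and the parameter values below are invented for illustration, not fitted to any data.

```python
import math


def phm_hazard(age, covariates, beta, eta, gammas):
    """Hazard under a Weibull proportional hazards model (assumed form).

    h(t, Z) = (beta / eta) * (t / eta) ** (beta - 1) * exp(sum(gamma_i * Z_i))

    age:        working age t of the asset (age > 0)
    covariates: monitored condition values Z_i at this inspection
    beta, eta:  Weibull shape and scale of the baseline hazard
    gammas:     weights expressing how strongly each covariate drives risk
    """
    baseline = (beta / eta) * (age / eta) ** (beta - 1)
    covariate_effect = math.exp(sum(g * z for g, z in zip(gammas, covariates)))
    return baseline * covariate_effect


# Same age, two different vibration readings (all numbers hypothetical).
print(phm_hazard(age=50.0, covariates=[0.0], beta=2.0, eta=100.0, gammas=[0.7]))
print(phm_hazard(age=50.0, covariates=[1.0], beta=2.0, eta=100.0, gammas=[0.7]))
```

The second reading yields the higher hazard at the same working age, which is the essence of combining age data with condition data in a single risk estimate.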

In this case our “yardstick” is the green, yellow, and red graph of Slide 8. The position and shape of the boundaries between “good” and “bad” depend on failure probabilities, on cost or repair time proactive-to-reactive ratios, and on our organizational objectives for the asset.

In this case the objectives have been set to minimize cost. We see that the cost of applying the proposed model is 65 cents, which is a savings of 51% over some previous policy.

However, nothing is free. Look at the mean-time-to-replacement column. It is only 1791 compared to 3944. This means that we have achieved lower cost at the price of more frequent maintenance. That policy is optimum, but only with respect to the objective of cost.
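The arithmetic behind these figures is worth a quick check. The sketch below assumes that the 51% savings is measured relative to the previous policy's cost, and that 3944 is the mean time to replacement under that previous policy; the source figures are quoted as given and the variable names are ours.

```python
proposed_cost = 0.65       # cost under the proposed model, from the report
savings_fraction = 0.51    # savings over the previous policy, from the report

# If savings = (previous - proposed) / previous, then:
implied_previous_cost = proposed_cost / (1 - savings_fraction)

mttr_proposed = 1791       # mean time to replacement, proposed model
mttr_previous = 3944       # mean time to replacement, assumed previous policy

# How many times more often we replace under the proposed model.
replacement_frequency_ratio = mttr_previous / mttr_proposed

print(round(implied_previous_cost, 2))        # about 1.33
print(round(replacement_frequency_ratio, 2))  # about 2.2
```

Under those assumptions, we intervene roughly 2.2 times as often and still cut the cost roughly in half, which is only possible when a preventive replacement costs much less than a functional failure.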

The point is that the model allows us to set whatever compromise among cost, availability, and reliability is appropriate to our current operating context. We may, thereafter, operate (or have our agent operate) our CBM program accordingly.

Where does an optimal decision model come from? It comes, usually, from the data of past experience. The CMMS historical database holds valuable data. This is particularly true when work orders have been documented by the technician using the principles of "living" RCM.

Here is an example of a well documented work order. Notice the inclusion of the 5 RCM knowledge elements:

  1. What function was lost, compromised or threatened?
  2. In what way - (including whether the loss of function was total, partial, or potential)?
  3. What was the cause (at a practical depth in the causality chain)?
  4. What happened - (the sequence of relevant events preceding, during, and following the failure)? and
  5. How did/could it matter (consequences to the user/owner/society)?
These knowledge elements are key to good data for reliability analysis.


Notice the field RCMREF. It has been filled with a number that refers to a record in the RCM knowledge base. This work order, then, is an instance of an already known item-function-failure-cause. Hence the values for the 5 knowledge elements may be referenced rather than duplicated on the work order form.


Work orders, documented using the principles of living RCM, permit EXAKT’s work order processor (EWOP) to generate an Events table such as this one. Note that each event is an instance of an RCM record. Reliability engineers and Weibull experts among you will quickly recognize this table as the ideal fuel of reliability analysis (RA).

Note the Event code, e.g. E836FF. “E” denotes that this is an “ending” event. “836” refers to a record in the knowledge base for an item-function-failure-cause. And “FF” means that this was a “functional” failure. Hence the failure “code” contains the precise information required for RA. Contrast this method with the failure codes used in the drop down lists of the typical CMMS work order. Data compiled from such lists provide little value for RA.
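The structure of the event code lends itself to simple parsing. The sketch below splits a code such as E836FF into its three parts, assuming the general layout is one letter, then a numeric knowledge base reference, then a letter suffix. Only “E” and “FF” are described above, so the parser does not attempt to interpret other values, and the names `EventCode` and `parse_event_code` are hypothetical.

```python
import re
from typing import NamedTuple


class EventCode(NamedTuple):
    position: str      # "E" marks an ending event
    rcm_ref: int       # record number in the RCM knowledge base
    failure_type: str  # "FF" marks a functional failure


# Assumed layout: one letter, digits, then one or more letters.
_EVENT_PATTERN = re.compile(r"^([A-Z])(\d+)([A-Z]+)$")


def parse_event_code(code: str) -> EventCode:
    """Split an event code into position, RCM reference, and failure type."""
    match = _EVENT_PATTERN.match(code)
    if match is None:
        raise ValueError(f"unrecognized event code: {code!r}")
    return EventCode(match.group(1), int(match.group(2)), match.group(3))


print(parse_event_code("E836FF"))
```

Because the RCM reference is embedded in the code itself, each row of the Events table can be joined directly to its item-function-failure-cause record.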


The EWOP generates the Events table. The Events table provides the data in a perfect form ready for any type of reliability analysis software or procedure – for example, Weibull, EXAKT, Jackknife, Pareto, and many others.

Further, the EWOP appends new knowledge to the RCM knowledge base. Where a technician has discovered an item-function-failure-cause that the original RCM FMEA analysis had not anticipated, he performs that FMEA directly within the work order, on the spot, when the relevant observations are immediately before him. The EWOP transfers that knowledge from the work order to the RCM table. A supervisor, a reliability engineer, or anyone versed in the RCM “language” and with adequate knowledge of the equipment audits the new records. He discusses the proposed new record with the originator, so that a consensus is reached on the appropriate content and level of detail.