IMEC 2005 – Speaker Notes and Slides
November 2, 2005, 14:00 – 14:30
University of Toronto
Condition-Based Maintenance (CBM) Demonstration
Presenter: Murray Wiseman, VP, Optimal Maintenance Decisions (OMDEC) Inc.

Slide 1
This presentation describes an evolving CBM demonstration by Oceana Sensors, ABB, and OMDEC. In building this demo, we asked ourselves:
Bear with me as I try to convey how the demo sheds considerable light on these essential questions.

Slide 2
We state our objective simply. When faced with a set of operating conditions, a set of requirements for our asset, and a set of condition monitoring data, how, exactly, do we interpret what all this information is telling us? How do we decide whether:
How can we know whether our decision process leads to the best decision, the so-called optimal one? What does the word "optimally" really mean?

Slide 3
Here is what we, in developing the demo, thought it means; please tell me if you disagree. As our stated goal for the demo, and for CBM in general, we want to declare a potential failure at the right time. That, of course, raises the question: “What is the right time?” The answer we came up with is: “At the right level of risk.” How, then, do we define risk?

Slide 4
Here is a textbook graph, variations of which you have all seen. The vertical axis quantifies the bottom-line cost (per unit of production) of owning an asset. The horizontal axis represents a policy, scaled to risk, for deciding when to maintain the asset; we may think of it as a spectrum of risk increasing from left to right. What is risk? Most people see risk as the combination of the probability that something will go wrong and the gravity of the consequences if it does.
Generally speaking, we can be conservative in our CBM data interpretation policy, panicking at the slightest increase in a monitored variable. Looking solely at cost, that policy, which gravitates towards the left end of the spectrum, is expensive.
On the other hand, we can run our CBM program liberally, throwing caution to the winds and setting our action limits high. That too will be costly; in fact, its cost will approach that of having no CBM program at all. Some point on the risk spectrum, therefore, should correspond to the lowest cost. If cost were our main consideration, we would set our CBM data interpretation policy at that point. [Animation: Availability]
We know, however, that cost may not be the only consideration. Availability can be more important than cost in a production-constrained environment. And, in general, maximum availability does not coincide with minimum cost.

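To make the shape of the slide 4 curve concrete, here is a minimal sketch, not the method used in the demo. It assumes the textbook age-replacement model: the asset's life follows a Weibull distribution with hypothetical shape `BETA` and scale `ETA`, and the policy replaces the asset preventively at working age `T`, so a larger `T` means operating further to the right on the risk spectrum. Cost per unit of working age is the expected cost per renewal cycle divided by the expected cycle length; availability is expected uptime divided by uptime plus expected downtime. All costs (`c_prev`, `c_fail`) and downtimes (`d_prev`, `d_fail`) are invented for illustration.

```python
import math

# Hypothetical asset: Weibull life with shape BETA (> 1 means wear-out) and scale ETA.
BETA, ETA = 2.5, 1000.0

def reliability(t):
    """Probability that the asset survives to working age t."""
    return math.exp(-((t / ETA) ** BETA))

def expected_uptime(T, steps=1000):
    """Expected cycle length: the integral of R(t) from 0 to T (trapezoid rule)."""
    h = T / steps
    total = 0.5 * (reliability(0.0) + reliability(T))
    total += sum(reliability(i * h) for i in range(1, steps))
    return total * h

def cost_rate(T, c_prev=1.0, c_fail=10.0):
    """Expected maintenance cost per unit of working age when replacing at age T."""
    R = reliability(T)
    return (c_prev * R + c_fail * (1.0 - R)) / expected_uptime(T)

def availability(T, d_prev=10.0, d_fail=20.0):
    """Expected fraction of time up when preventive and failure repairs take d_prev and d_fail."""
    R = reliability(T)
    up = expected_uptime(T)
    return up / (up + d_prev * R + d_fail * (1.0 - R))

# Scan the "risk spectrum": small T is conservative, large T approaches run-to-failure.
candidates = range(50, 3001, 10)
t_min_cost = min(candidates, key=cost_rate)
t_max_avail = max(candidates, key=availability)
print("minimum-cost replacement age:", t_min_cost)
print("maximum-availability replacement age:", t_max_avail)
```

With these invented numbers the failure-to-preventive ratio is much larger for cost (10:1) than for downtime (2:1), so the minimum-cost policy replaces at a much younger age than the maximum-availability policy: the "best" point on the spectrum depends on the objective.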
[Animation: Reliability]
But cost and availability may not suffice to specify our policy for an asset in its operating context. We may need to specify a survival probability during an upcoming mission. A CBM interpretation policy that targets survival probability will not, in general, coincide with our maximum-availability or minimum-cost CBM policies. The $64 question, then, is: given our current operating context (our policy objectives), “where on the risk spectrum do we want to operate?”
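The survival-probability objective can be made concrete with the conditional reliability R(t + m) / R(t): the probability that an asset that has already reached working age t survives a further mission of length m. The sketch below reuses a hypothetical Weibull life (invented `BETA` and `ETA`, not data from the demo) and finds the latest working age at which a mission can start while still meeting a required survival probability. It deliberately ignores condition monitoring data; a CBM policy would also condition on the monitored variables.

```python
import math

BETA, ETA = 2.5, 1000.0  # hypothetical Weibull shape and scale

def reliability(t):
    return math.exp(-((t / ETA) ** BETA))

def mission_survival(age, mission):
    """P(no failure during [age, age + mission] | no failure before age)."""
    return reliability(age + mission) / reliability(age)

def latest_start_age(mission, required, step=1.0, max_age=10 * ETA):
    """Largest age (on a grid of `step`) at which mission_survival still meets `required`.

    Assumes BETA > 1, so mission survival only decreases with age.
    Returns None if even a new asset cannot meet the requirement.
    """
    if mission_survival(0.0, mission) < required:
        return None
    age = 0.0
    while age + step <= max_age and mission_survival(age + step, mission) >= required:
        age += step
    return age

start = latest_start_age(mission=100.0, required=0.95)
print("latest working age to start a 100-unit mission at 95% survival:", start)
```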
[Animation: Data interpretation policy]
And the $65 question is: “How do we set our potential failure declaration rule to respect that policy?”

Slide 5
We usually need help to answer those two questions. Help can take the form of an agent, which applies our stated objectives to the current data set. It interprets the CBM data (that is, formulates decisions based on it) in accordance with our objectives. The agent could be a computer algorithm, or a human applying a set of rules based on experience or written down in a standard operating procedure. It monitors data from the CBM and other operating databases, and it takes into account maintenance and process events that have occurred recently. With due consideration of all relevant factors, it issues the "right" decision. Now we are getting closer to defining what we wanted our demonstration to prove.

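A minimal sketch of such an agent, under stated assumptions: the event kinds (`"inspection"`, `"renewal"`), the single monitored `reading`, and the decision rule `simple_rule` are all hypothetical stand-ins for a real CBM database and a real policy. What it illustrates is the structure described above: the agent replays recent events, lets maintenance events reset the working age, and at each inspection applies a decision rule that encodes our objectives.

```python
from dataclasses import dataclass

@dataclass
class Event:
    time: float
    kind: str             # "inspection" or "renewal" (hypothetical event kinds)
    reading: float = 0.0  # monitored CBM variable, used only at inspections

def run_agent(events, decide):
    """Replay events in time order and return (time, decision) for every inspection."""
    last_renewal = 0.0
    decisions = []
    for ev in sorted(events, key=lambda e: e.time):
        if ev.kind == "renewal":
            last_renewal = ev.time  # maintenance renews the asset: working age restarts
        elif ev.kind == "inspection":
            age = ev.time - last_renewal
            decisions.append((ev.time, decide(age, ev.reading)))
    return decisions

def simple_rule(age, reading, limit=1.0):
    """Hypothetical rule standing in for our objectives: act when a crude risk score reaches a limit."""
    risk = (age / 1000.0) * reading
    return "replace immediately" if risk >= limit else "continue"

events = [
    Event(100, "inspection", 2.0),
    Event(600, "inspection", 2.0),
    Event(700, "renewal"),
    Event(800, "inspection", 2.0),
]
decisions = run_agent(events, simple_rule)
print(decisions)
```

Passing the rule in as a function keeps the agent separate from the policy, so the same replay loop can apply a conservative or a liberal rule.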
Slide 6
Here is the demo hardware. An obvious challenge in building a CBM demo is the design of the failure mechanism. On one hand, we wanted the failure to behave realistically; on the other, we did not wish to injure anyone or cause excessive damage. After considerable trial and error, we settled on the failure mechanism in the photo on the far right. The tee is held in position by friction produced by the perpendicular compression spring. That friction force resists the load exerted by the lateral extension spring.
Failure, in this case, is the loss of the function “to hold the tee in place against a spring force in the presence of vibrations induced by an unbalanced rotor”. At failure, the tee strikes the golf ball, sending it down the ramp, where it activates a switch indicating that a functional failure has occurred.
To initiate a test run, the operator inserts the tee into the spring-loaded holding device and places the golf ball in position. He powers up the motor drive and hits “Begin life cycle” on the console. An intelligent agent (not quite as intelligent as Agent Smith[1]) applies a proportional hazards model to the data in near real time and issues an “optimal” maintenance decision. At that time, a more intrusive (visual) inspection will confirm an impending (i.e. “potential”) failure. Consequently, the equipment may be stopped and PM performed[2], thus avoiding a functional failure, which is the goal of CBM.

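These notes do not give the model's equation. For reference, a widely used proportional hazards model in CBM, and the form usually associated with EXAKT (an assumption here, since the notes do not state it), is the Weibull PHM with time-dependent covariates. Here t is working age, β and η are the Weibull shape and scale parameters, Z_i(t) are the monitored condition variables, and γ_i are their fitted weights. The second line is a hedged sketch of a threshold decision rule, with d chosen to meet the stated objective.

```latex
h\bigl(t, \mathbf{Z}(t)\bigr) = \frac{\beta}{\eta}\left(\frac{t}{\eta}\right)^{\beta - 1}
  \exp\!\left(\sum_{i=1}^{m} \gamma_i Z_i(t)\right)

\text{declare a potential failure when } h\bigl(t, \mathbf{Z}(t)\bigr) \ge d
```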
Slide 7
At each CBM inspection, a decision is displayed (blue box) as an updated residual life estimate (“0” in this case) and an optimal recommendation (“replace immediately” in this case). The white box continuously updates the KPIs that evaluate the effectiveness of the active CBM policy.
The graph on the top right is the hazard rate graph. The hazard rate graph, also known as the conditional probability of failure graph (over a short interval, the conditional probability of failure is approximately the hazard rate multiplied by the length of the interval), tells us the probability of failure at the point in the life cycle at which we ask the question. The dark blue curve is the hazard rate for both potential and functional failures; the pink curve represents potential failures alone. The closeness of the two curves tells us that the CBM policy applied by the agent is effective: it is detecting most potential failures in time to pre-empt functional failures. This is exactly what we want our CBM policy and program to do. But at what cost? We could be operating far to the left on the risk spectrum that you will recall from slide 4.
Shifting our attention back to the KPI report, we note the total cost of maintenance (both preventive and reactive) per unit of working age (or per unit of production). We also read the savings compared to a laissez-faire policy.

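The relationship between the hazard rate and the conditional probability of failure can be checked numerically. For a short interval Δt starting at working age t, the probability of failing in that interval, given survival to t, is exactly 1 − R(t + Δt)/R(t), and it is approximately h(t)·Δt. The sketch below compares the two for a hypothetical Weibull life (invented `BETA` and `ETA`, baseline only, no covariates).

```python
import math

BETA, ETA = 2.5, 1000.0  # hypothetical Weibull shape and scale

def reliability(t):
    return math.exp(-((t / ETA) ** BETA))

def hazard(t):
    """Weibull hazard rate h(t)."""
    return (BETA / ETA) * (t / ETA) ** (BETA - 1)

def conditional_failure_prob(t, dt):
    """Exact P(failure in (t, t + dt] | survived to t)."""
    return 1.0 - reliability(t + dt) / reliability(t)

DT = 5.0
comparison = [(age, hazard(age) * DT, conditional_failure_prob(age, DT))
              for age in (200.0, 500.0, 800.0)]
for age, approx, exact in comparison:
    print(f"age {age:6.0f}: h(t)*dt = {approx:.6f}, exact = {exact:.6f}")
```

The approximation is best when the interval is short relative to the age, which is why the hazard curve can be read as a conditional probability of failure per unit of working age.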
Additionally, we note the savings compared to some original policy (in place before we adopted the current model). These KPIs give us some idea of how optimal our CBM policy is; we will pursue that theme in a moment on slide 8. But before we do, look at the bar chart on the bottom right. It tells us how good our residual life estimates have been. At every inspection the agent produces an estimated time to failure (functional or potential). If we plot the errors in those estimates for all inspections, we get the histogram shown. It tells us that most estimates (412) were less than 10% off, while approximately 5% of our estimates were as much as 80% off. The point is that we have demonstrated a method for evaluating the accuracy, or predictability, of our CBM program (also known as a “predictive” maintenance program). All CBM programs should monitor their own effectiveness!

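The residual-life accuracy check can be sketched as follows. This is an assumption about how such a chart could be built, not the demo's code: once the actual time to failure (functional or potential) is known for an inspection, the relative error of the estimate made at that inspection is |estimate − actual| / actual, and the errors are counted in 10% bins. The estimates and actual lives below are invented.

```python
def relative_errors(estimates, actuals):
    """Relative error of each residual-life estimate; pairs with a non-positive actual are skipped."""
    return [abs(est - act) / act for est, act in zip(estimates, actuals) if act > 0]

def error_histogram(errors, bin_width=0.10, n_bins=10):
    """Count errors in bins [0, 10%), [10%, 20%), ...; the last bin also collects every error of 90% or more."""
    counts = [0] * n_bins
    for e in errors:
        counts[min(int(e / bin_width), n_bins - 1)] += 1
    return counts

# Hypothetical residual-life estimates and the lives actually observed afterwards
estimates = [95.0, 320.0, 45.0, 170.0]
actuals = [100.0, 250.0, 100.0, 200.0]
errors = relative_errors(estimates, actuals)
counts = error_histogram(errors)
share_within_10pct = counts[0] / len(errors)
print(counts, share_within_10pct)
```

Errors that fall exactly on a bin edge may land in either neighbouring bin because of floating-point rounding; that does not matter for a chart like this one.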
Slide 8
How does the EXAKT predictive agent work? It applies a model (called a proportional hazards model) that was constructed from past patterns of data. Whenever a functional or potential failure occurred, it was recorded. Those events were correlated with monitored CBM data to provide a risk model. A transition probability analysis provides a predictive model of the monitored variables. Finally, economic considerations are blended into the model. A model is a yardstick that assists users in making decisions.
In this case our “yardstick” is the green, yellow, and red graph of slide 8. The position and shape of the boundaries between “good” and “bad” depend on the failure probabilities, on the proactive-to-reactive ratio of costs or repair times, and on our organizational objectives for the asset.
In this case the objectives have been set to minimize cost. We see that the cost of applying the proposed model is 65 cents, a savings of 51% over some previous policy. However, nothing is free. Look at the mean-time-to-replacement column: it is only 1791, compared to 3944. This means that we have achieved lower cost at the price of more frequent maintenance. That policy is optimal, but only with respect to the objective of cost.

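The transition probability analysis mentioned above can be illustrated, under simplifying assumptions, with a discrete-state Markov chain: the monitored variable is grouped into a few condition bands, the probabilities of moving between bands at consecutive inspections are estimated by counting, and the chain then predicts the band distribution some inspections ahead. The bands and histories are hypothetical, and EXAKT's own procedure may differ in detail.

```python
def estimate_transition_matrix(histories, n_states):
    """Count band-to-band moves between consecutive inspections and normalise each row."""
    counts = [[0] * n_states for _ in range(n_states)]
    for history in histories:
        for current, following in zip(history, history[1:]):
            counts[current][following] += 1
    matrix = []
    for i, row in enumerate(counts):
        total = sum(row)
        if total:
            matrix.append([c / total for c in row])
        else:
            # No observed moves out of this band: assume it stays put.
            matrix.append([1.0 if j == i else 0.0 for j in range(n_states)])
    return matrix

def predict(distribution, matrix, steps):
    """Propagate a probability distribution over bands `steps` inspections ahead."""
    n = len(distribution)
    for _ in range(steps):
        distribution = [sum(distribution[i] * matrix[i][j] for i in range(n)) for j in range(n)]
    return distribution

# Hypothetical band histories (0 = low, 1 = medium, 2 = high reading), one list per life cycle
histories = [[0, 0, 1, 1, 2], [0, 1, 2, 2]]
P = estimate_transition_matrix(histories, n_states=3)
two_ahead = predict([1.0, 0.0, 0.0], P, steps=2)  # starting from the low band
print(two_ahead)
```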
The point is that the model allows us to set whatever compromise among cost, availability, and reliability is appropriate to our current operating context. We may then operate our CBM program (or have our agent operate it) accordingly.

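How a chosen compromise could translate into a green, yellow, and red chart can be sketched by assuming the Weibull proportional hazards form and a hazard-threshold rule: replace when the hazard reaches `d_replace`, where the objective (cost, availability, or reliability) determines `d_replace`. Taking logarithms, the red boundary on the weighted sum of covariates is a curve that falls with working age. The yellow band, defined here by a second, lower threshold `d_warn`, is purely a hypothetical illustration, since these notes do not say how the demo defines it; all parameter values are invented.

```python
import math

BETA, ETA = 2.5, 1000.0  # hypothetical Weibull shape and scale of the baseline hazard

def hazard(age, weighted_sum):
    """Weibull PHM hazard; weighted_sum stands for sum(gamma_i * Z_i) over the monitored variables."""
    return (BETA / ETA) * (age / ETA) ** (BETA - 1) * math.exp(weighted_sum)

def red_boundary(age, d_replace):
    """Weighted-sum value at which the hazard equals d_replace at this working age."""
    return math.log(d_replace) - math.log(BETA / ETA) - (BETA - 1) * math.log(age / ETA)

def zone(age, weighted_sum, d_replace, d_warn):
    h = hazard(age, weighted_sum)
    if h >= d_replace:
        return "red"      # declare a potential failure now
    if h >= d_warn:
        return "yellow"   # hypothetical warning band
    return "green"

# A more risk-averse objective means a lower threshold, which lowers the red boundary.
for d in (0.005, 0.002):
    print(d, [round(red_boundary(age, d), 3) for age in (250.0, 500.0, 750.0)])
zones = tuple(zone(500.0, ws, 0.005, 0.0025) for ws in (2.0, 1.5, 1.0))
print(zones)
```

Only the threshold changes when the objective changes; the fitted model stays the same, which is what lets one model serve different operating contexts.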
Slide 9
Where does an optimal decision model come from? Usually, from the data of past experience. The CMMS historical database holds valuable data, particularly when technicians have documented work orders using the principles of "living" RCM. Here is an example of a well-documented work order. Notice the inclusion of the 5 RCM knowledge elements:
These knowledge elements are key to good data for reliability analysis.
Notice the field RCMREF. It has been filled with a number that refers to a record in the RCM knowledge base. This work order, then, is an instance of an already known item-function-failure-cause; hence the values of the 5 knowledge elements may be referenced rather than duplicated on the work order form.

Slide 10
Work orders documented using the principles of living RCM permit EXAKT’s work order processor (EWOP) to generate an Events table such as this one. Note that each event is an instance of an RCM record. The reliability engineers and Weibull experts among you will quickly recognize this table as the ideal fuel for reliability analysis (RA). Note the Event code, e.g. E836FF. “E” denotes an “ending” event. “836” refers to a knowledge base record for an item-function-failure-cause. And “FF” means that this was a “functional” failure. Hence the failure “code” contains the precise information required for RA. Contrast this method with the failure codes in the drop-down lists of the typical CMMS work order; data compiled from such lists provide little value for RA.

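Because the event code packs the RA information into one string, it can be split mechanically. The sketch below assumes the layout of the single example given (one letter, the RCM record number, then a two-letter outcome); only the meanings of "E" and "FF" are stated in these notes, so every other letter is reported as unknown. `KNOWN_MEANINGS`, `parse_event_code`, and `describe` are hypothetical names, not EWOP functions.

```python
import re
from typing import NamedTuple

# Only the meanings stated for the example code E836FF are known here;
# any other letters would need the EWOP's own code list.
KNOWN_MEANINGS = {"E": "ending event", "FF": "functional failure"}

# Assumed layout: one letter, the RCM record number, then a two-letter outcome.
EVENT_CODE = re.compile(r"^([A-Z])(\d+)([A-Z]{2})$")

class EventCode(NamedTuple):
    event_type: str   # e.g. "E"
    rcm_record: int   # knowledge-base record for the item-function-failure-cause
    outcome: str      # e.g. "FF"

def parse_event_code(code):
    match = EVENT_CODE.match(code)
    if match is None:
        raise ValueError(f"unrecognised event code: {code!r}")
    return EventCode(match.group(1), int(match.group(2)), match.group(3))

def describe(code):
    parsed = parse_event_code(code)
    kind = KNOWN_MEANINGS.get(parsed.event_type, "unknown event type")
    outcome = KNOWN_MEANINGS.get(parsed.outcome, "unknown outcome")
    return f"{kind}, RCM record {parsed.rcm_record}, {outcome}"

print(describe("E836FF"))
```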
Slide 11
Note that we are making a clear distinction between reliability analysis (RA) and reliability-centered maintenance analysis (RCMA). The former is the study of what did happen; the latter is the study of what could happen. We are proposing, in essence, that, using the EWOP, the two processes cross-fertilize and populate a common knowledge base: an RCM knowledge base. RA and RCMA, using the methodology of the EWOP, continuously improve that knowledge base. The process augments and refines our understanding of the failure behavior of each of our significant physical assets.

Slide 12
The EWOP generates the Events table, which provides the data in a form ready for any type of reliability analysis software or procedure, for example Weibull, EXAKT, Jackknife, Pareto, and many others. Further, the EWOP appends new knowledge to the RCM knowledge base. Where a technician has discovered an item-function-failure-cause that the original RCM FMEA analysis had not anticipated, he performs that FMEA directly within the work order, on the spot, while the relevant observations are immediately before him. The EWOP transfers that knowledge from the work order to the RCM table. A supervisor, a reliability engineer, or anyone versed in the RCM “language” and with adequate knowledge of the equipment audits the new records. He discusses each proposed new record with its originator so that consensus is reached on the appropriate content and level of detail.

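As one example of the analyses the Events table feeds, here is a minimal Weibull fit by median-rank regression, using Bernard's approximation for the ranks. The row layout (start and end working ages plus a functional-failure flag) is a hypothetical simplification of an Events table. Suspensions, lives that ended without a failure, are simply dropped here; a proper analysis would adjust the ranks for them.

```python
import math

def failure_lives(rows):
    """rows: (start_age, end_age, ended_in_functional_failure) tuples; keep failed lives only."""
    return [end - start for start, end, failed in rows if failed]

def weibull_median_rank_fit(lifetimes):
    """Estimate Weibull (beta, eta) by least squares of ln(-ln(1 - F)) on ln(t)."""
    t = sorted(lifetimes)
    n = len(t)
    if n < 2:
        raise ValueError("need at least two failure lives")
    xs, ys = [], []
    for i, ti in enumerate(t, start=1):
        F = (i - 0.3) / (n + 0.4)  # Bernard's median-rank approximation
        xs.append(math.log(ti))
        ys.append(math.log(-math.log(1.0 - F)))
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    beta = sxy / sxx                 # slope of the Weibull plot
    intercept = my - beta * mx       # equals -beta * ln(eta)
    eta = math.exp(-intercept / beta)
    return beta, eta

# Hypothetical rows, not data from the demo
rows = [(0, 820, True), (0, 1310, True), (0, 460, True), (0, 1500, False), (200, 1150, True)]
beta, eta = weibull_median_rank_fit(failure_lives(rows))
print(f"beta = {beta:.2f}, eta = {eta:.0f}")
```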
Slide 13
How does the growth and exploitation of the knowledge base support our global maintenance strategy? Most maintenance experts and consultants agree that KPIs help the physical asset manager to know and achieve his targets. Let's look at one KPI as an example: quality loss. Quality loss may be due to process or machine malfunction; we, in maintenance, are concerned primarily with the latter. [Animation: CMMS]
We can drill down from the KPI using our now well-documented CMMS. [Animation: Reliability analysis software]
With our rich data source, we may call upon scores of reliability analysis software tools to study and draw conclusions from our database. [Animation: Improved maintenance policies]
The outputs of those analyses and studies are, without doubt, improved maintenance policies. Improved policies result in improved KPIs, and so the cycle is closed.

Most maintenance information systems today place too little emphasis on the dashed box on the left: the CMMS historical database populated via living RCM. The EWOP approach enables and encourages the enrichment of that important intellectual asset. An EWOP-enabled CMMS will drive continuous reliability improvement.

Slide 14
Here is another way of expressing the continuous improvement cycle. In physical asset management we strive to find a strategy that results in a desired performance. [Animation: Maintenance Policy]
Our strategy embodies our maintenance policy, which includes our mix of reactive and proactive tasks and the policies whereby we carry out those tasks (for example, a specific CBM data interpretation policy). [Animation: KPIs and Reliability Analysis]
We measure the results of our maintenance policies with KPIs and drill down with reliability analysis tools. Those KPIs should have been set to meet our organization’s performance requirements, which fall into two categories: (1) the balance sheet and (2) social responsibility. [Animation: feedback loop 4]
If we aren't meeting our Performance requirements, we need to adjust our KPI targets. [Animation: feedback loop 3]
And if we're not meeting our KPI targets, we need to adjust our maintenance policy.

Understanding the detailed nature of these relationships is, I submit to you, the essence of effective physical asset management. Thank you for your attention.
[1] The Matrix (film, 1999).
[2] PM here consists of pushing the tee back to its starting position, which renews the asset with respect to this failure mode.