Advanced Residual
The purpose of this document is to provide a high-level conceptual approach for
constructing an Advanced Residual Life Estimation Platform (ARLEP) for
aircraft engines operating in the field. Correlating the engine’s test cell
results to sustainment in the field provides the foundation for implementing
Performance-Based Maintenance (PBM) strategies on engines while still in the depot.
Performance-based maintenance will lower unscheduled engine removal rates in
the field and increase the likelihood of meeting depot performance warranties
with minimal work scopes and parts replacement or refurbishments.
Low performance is the leading
cause of engine failure in the field and the second leading cause of engine
removals for any reason.[1] Depots have to guarantee that these repaired/overhauled
engines and modules will indeed sustain their performance in the field for at
least the warranty period duration. Tests on engines and modules are executed
in test cells at the depots to ensure their performance sustainment in the
field. Engines and modules in test cells must perform better than a predefined
set of minimum standards/thresholds before being ‘certified’ as serviceable so
that they may be reintroduced into the supply system. Engines/modules that fail
the test are rejected and require additional rework before being tested again.
The test cells’ minimal thresholds for engine/module
performance are based on average performance degradation curves provided by the
OEM. As engines/modules are used in the field, degradation occurs over time as
illustrated (blue curve) in Figure 1. A minimal
threshold is set to a performance value. It is anticipated, from this
performance value, that the unit in the field will not reach the minimal performance
level in a period shorter than the warranty period. However, in
reality many unscheduled removal events occur – significantly more than
expected – in which the cause of failure was loss of performance. This
phenomenon is the basis for the notion that the actual performance curve (purple
curve) degrades more rapidly over time than the OEM design curve suggests. But
such a conclusion is not necessarily correct. Another possible explanation is that
the performance sustainment of the engine is not a linear function of the
various performance metrics and readings. That is to say, higher values of all
performance measurements in a tested engine do not necessarily translate to
longer periods in which the engine will sustain its performance in the field.
Regardless of the reason for the poor predictability of depot testing,
maintenance personnel in the depot require a tool that will predict engine
performance sustainment periods with high fidelity. Predictions from test cell
measurements should be based on actual field data. Such a tool will provide the
means to plan the scope of repairs/overhauls so as to increase the likelihood of
meeting depot performance warranties. Ultimately it will lead to lower
unscheduled engine removal rates.
Cox’s proportional hazard model (PHM) provides an
advanced data analysis technique for incorporating engines’ test cell
measurements into performance sustainment period predictions of the engine in
the field. Cox’s PHM is considered the best practical methodology for
providing long-term equipment failure predictions and handling the complicated
nature of failure modes. PHM is gaining recognition as the most appropriate
model for incorporating operational conditions, sensory data, and
external conditions in reliability analysis or life estimation of
equipment.
In short, the PHM will provide probability density distributions for each
failure mode identified by the Failure Modes, Effects and Criticality Analysis
(FMECA) as being a cause for inducting engines from the field to
the depot for a major overhaul. Performance sustainment predictions of the
highest-level assembly, i.e. the engine, will be obtained by simulating the
engine in an AT-LAST[2] module.
Implementation will consist of two phases. At the end of the first phase –
baseline predictions – a performance sustainment estimation platform will be
deployed. The platform will provide sustainment predictions of engines given
their age, and the ages of their modules and components. In the second phase –
advanced predictions – test cell measurements of engines will be incorporated
into performance sustainment predictions.
The tasks associated with each phase are illustrated in Figure 2,
and are elaborated in the following sections.
Figure 2: Phased development plan
The development plan of Phase I consists of the following stages:
a) Failure Modes, Effects & Criticality Analysis (FMECA)
b) Data Mining and Cleansing
c) Weibull/Lognormal Failure Data Analysis
d) AT-LAST Simulation
a) Failure Modes, Effects & Criticality Analysis (FMECA): The FMECA report
provides the framework for collecting data and gives meaning to the data. The
FMECA, using the reliability-centered maintenance (RCM) process, shall link an
item in the engine’s hierarchical structure with the failure modes associated
with loss of performance. The FMECA will be conducted only for T700 Series
engines.
The FMECA report includes:
i) Bill of materials
(BOM) – Engine structure down to part level including technological
upgrades. The engine’s bill of materials is organized as hierarchical breakdown
structures. An engine is broken down to modules, modules to parts and parts to
sub-parts.
ii) List of loss-of-performance failure modes by
hierarchical structure – A list comprising all failure modes that are
characterized by ‘loss of performance’ symptoms (symptoms that can be
physically measured), where the repair of these failure modes requires an
overhaul at the depot. These failure modes are related to hierarchical structure – e.g. engine level
failure modes, module-specific failure modes, etc.
The FMECA must be conducted in close collaboration with subject matter
experts on T700 series engines.
b) Data Mining and Cleansing: Statistical failure
data analysis requires the gathering of historical failure (and censored) data
of similar equipment. Experience shows that it is very common that data is
missing, contains errors, or is incomplete. This historical failure data must
be preprocessed to ensure coherency and correctness. Preprocessing helps the
user validate the data and perform corrections. It may involve graphical and
statistical analysis through the use of a variety of plots and through the
calculation of basic statistical parameters. The objective of this effort is to
generate sets of ‘engine histories’, i.e. times to failure (and suspension)
associated with removal causes.
c) Weibull/Lognormal Failure Data Analysis: The
Weibull distribution, due to its flexibility of representing various behaviors
of the propensity for failure, is the most common age dependent life
distribution in reliability engineering. The second most common distribution is
the lognormal distribution often used to represent a two-stage failure mode
such as fatigue in which a long ‘incubation’ period is followed by a rapid
progression of the failure.
The two most common techniques for estimating
distribution parameters are (i) rank regression, in which failure (and
suspension) times are ranked by increasing order along with probability
estimates, followed by linear/non-linear regression for finding the line or
curve that best fits (describes) the data points, and (ii) maximum likelihood
estimation (MLE), in which it is assumed that the observed outcomes are the
most likely set of outcomes. A likelihood function is constructed that
measures the probability of obtaining the observed outcomes as a function of
explicit distribution parameters. The estimated parameters are the distribution
parameters that maximize the likelihood function. Statisticians prefer the MLE
method, especially in cases where heavily censored data sets are expected. The
MLE method incorporates the exact times of censored data in the analysis and
not just their order as in rank regression methods.
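As a minimal sketch of the MLE approach for right-censored data (not the procedure of any particular software package), the following pure-Python example fits a two-parameter Weibull distribution. The data values, the function names, and the bisection bracket are illustrative assumptions. For a fixed shape parameter the scale parameter has a closed form, so only a one-dimensional equation in the shape needs to be solved.

```python
import math

def weibull_mle(times, failed, lo=0.01, hi=50.0, tol=1e-10):
    """Fit a 2-parameter Weibull to right-censored data by maximum likelihood.

    times  : operating times (failures and suspensions), all > 0
    failed : True for a failure, False for a suspension (right-censored point)
    Returns (shape, scale). Assumes the shape MLE lies inside [lo, hi].
    """
    r = sum(1 for f in failed if f)
    if r == 0:
        raise ValueError("at least one failure is required")
    # Scale times by the largest value so that u ** shape cannot overflow.
    t_max = max(times)
    u = [t / t_max for t in times]
    mean_log_fail = sum(math.log(x) for x, f in zip(u, failed) if f) / r

    def score(shape):
        # Profile likelihood equation for the shape; its root is the MLE.
        s0 = sum(x ** shape for x in u)
        s1 = sum(x ** shape * math.log(x) for x in u)
        return s1 / s0 - 1.0 / shape - mean_log_fail

    # score() increases with the shape, so bisection finds the root.
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if score(mid) > 0.0:
            hi = mid
        else:
            lo = mid
    shape = 0.5 * (lo + hi)
    scale = t_max * (sum(x ** shape for x in u) / r) ** (1.0 / shape)
    return shape, scale

def weibull_log_likelihood(times, failed, shape, scale):
    """Right-censored Weibull log-likelihood: failures add log h, every point adds log R."""
    ll = 0.0
    for t, f in zip(times, failed):
        if f:
            ll += math.log(shape / scale) + (shape - 1.0) * math.log(t / scale)
        ll -= (t / scale) ** shape
    return ll

# Hypothetical engine histories: hours to removal, with suspensions flagged False.
times = [410.0, 620.0, 750.0, 900.0, 980.0, 1200.0, 1200.0, 1500.0]
failed = [True, True, True, True, False, True, False, False]
shape_hat, scale_hat = weibull_mle(times, failed)
```

Note that the suspension times enter the sums with their exact values, which is the property of MLE highlighted above.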
Another important element in failure data analysis is the
confidence attributed to the estimates. For example, the MLE parameter
estimation procedure includes the calculation of standard errors. The standard
error of an estimate shows how precise the estimate is. Larger standard errors
mean less precise estimation and less confidence that the estimated parameters
actually represent the real values. The standard error depends on the sample
size. One of the properties of MLE estimators is that they are asymptotically
normal, so their standard errors and confidence bounds can be obtained from the
Fisher information matrix (Fisher Matrix Bounds).
d) AT-LAST Simulation: Monte Carlo techniques will
be employed to produce expected performance sustainment period distribution
curves of engines (see Figure 3).
An accurate estimation of the performance sustainment distribution curve
of a specific engine will require the initialization of all the components and
failure modes in the specific engine to their current ages and to the damage
that each of the failure modes has accumulated (i.e. cumulative hazard). SPAR
technology is uniquely capable of initializing a model with any set of values
of ages and cumulative damage.
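The idea of initializing each failure mode to its current age can be illustrated with a small Monte Carlo sketch. This is not the SPAR or AT-LAST implementation; it assumes independent Weibull failure modes acting in series, and all parameter values and function names are hypothetical. The residual life of each mode is drawn from the Weibull distribution conditioned on survival to the current age, using the cumulative hazard already accumulated.

```python
import math
import random

def sample_residual_life(age, shape, scale, rng):
    """Draw one residual life for a Weibull failure mode that has survived to `age`."""
    # Cumulative hazard already accumulated at the current age.
    h_age = (age / scale) ** shape
    # Inverse-transform sampling of the conditional survival function
    # R(age + x) / R(age) = exp(-(H(age + x) - H(age))).
    u = 1.0 - rng.random()                      # u lies in (0, 1]
    total_life = scale * (h_age - math.log(u)) ** (1.0 / shape)
    return max(0.0, total_life - age)           # guard against rounding below zero

def simulate_engine(modes, n_runs=10000, seed=1):
    """Return sorted simulated residual lives of an engine.

    modes : list of (current_age, shape, scale) tuples, one per failure mode.
    The engine is removed when the first failure mode occurs (series logic).
    """
    rng = random.Random(seed)
    lives = [min(sample_residual_life(a, b, e, rng) for a, b, e in modes)
             for _ in range(n_runs)]
    lives.sort()
    return lives

# Hypothetical failure modes: (age in hours, Weibull shape, Weibull scale in hours).
modes = [(300.0, 2.5, 1500.0), (800.0, 1.8, 2500.0), (0.0, 1.0, 6000.0)]
lives = simulate_engine(modes)
median_life = lives[len(lives) // 2]
```

The sorted list of simulated lives is an empirical version of a performance sustainment distribution curve: the fraction of runs exceeding a given time estimates the probability of sustaining performance to that time.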
Figure 3: Performance sustainment distribution curves
The development plan of Phase II consists of the following stages:
a) Advanced FMECA
b) Test Cell Data Mining and Cleansing
c) PHM Failure Data Analysis
d) Advanced AT-LAST Simulation
a) Advanced FMECA: The objective is to associate each test cell measurement with
the Phase I failure modes. Any measurement that can be either considered as a
driver (or a cause) of the deterioration or the progression of a failure mode,
or that reflects (or affects) the deterioration or progression of a failure
mode, should be listed and considered. The aim is to narrow down the list of
plausible covariates, although the actual significant covariates will be
determined in the third stage (PHM analysis).
b) Test Cell Data Mining and Cleansing: The objective is to generate historical
comprehensive sets of test cell measurements by engine serial number that will
include any test cell measurements that may be significant covariates. Ideally,
a set of measurements for every engine will consist of identical categories.
Nevertheless it is frequently necessary to deal with incomplete data. If a
record lacks one or more covariate values, it is considered incomplete. But the
record may still be used in the MLE estimation. If one uses the
Expectation-Maximization (EM) algorithm, the estimation procedure takes into
account all possible values of missing data (with their probabilities) to find
the most likely values. The EM algorithm is a general method of finding the
maximum-likelihood estimate of parameters in an underlying distribution from a
given data set when the data is incomplete or has missing values.
c) PHM Failure Data Analysis: The two most common techniques for estimating the
PHM coefficients are (i) Cox’s partial likelihood method, which estimates the
coefficients without making any assumptions about the form of the base hazard,
and (ii) the maximum likelihood estimation (MLE) method, in which an explicit form
for the base hazard (e.g. Weibull hazard) is assumed. (See appendix for more
mathematical details on both methods.)
An integral part of the PHM analysis is the systematic and scientific
discrimination between the significant covariates and non-influential data.
Although there is no straightforward procedure to identify significant
covariates, the significance of the covariates can be determined through the
analysis of various statistics. Several standard statistical tests are
available to assist the modeler in identifying significant covariates (See
appendix for more details).
Another important challenge in PHM analysis that will
require attention is covariate correlation. It should be expected that test
cell readings will be highly correlated. Incorporating highly correlated test
cell readings as covariates in the PHM contradicts the assumption
that the covariates are independent. The standard errors calculated for correlated
covariates in the MLE method will be relatively large, indicating an inaccurate
model. One method to address this issue is to transform the data space by using a
technique such as Principal Components Analysis (PCA) or Partial Least Squares
(PLS). PCA can be used to transform the covariates into principal components that
are uncorrelated. Thus a more accurate model that uses these transformed covariates
may be obtained. PCA retains and uses all of the available information in the
data. An alternative method might result in a somewhat arbitrary elimination of
useful covariates and thus a loss of information. (For more details on PCA see
appendix.)
Frequently, historical failure data is neither abundant
nor meticulously collected. A commonly used practice when data is
scarce is to set some parameters to prior believed values and then to
estimate the other parameters using the MLE procedure. Integrating Bayesian
statistics into the MLE analysis makes it possible to specify prior
knowledge or belief (obtained, for example, from physical models) in the PHM
parameter estimation. Prior knowledge can be described either by fixing certain
parameters and estimating the other parameters using PHM MLE, or by specifying
prior distributions of certain parameters (e.g. maintenance factors) and
analyzing the empirical data using PHM MLE to obtain more credible posterior
distributions (or estimates) of these parameters, provided that the prior
knowledge or beliefs are indeed appropriate. See appendix for more details on
Bayesian statistics.
d) Advanced AT-LAST Simulation: The Phase I
simulation model will be enhanced to incorporate test cell data as
covariates in PHM distributions. The form of the expected performance
sustainment period distribution curve may not look different from the Phase I
curves, but incorporating test cell data improves the accuracy of the
projections, as depicted in Figure 4.
Figure 4: Accurate Projections
The MLE method assumes that the
observed outcomes are the most likely set of outcomes. The likelihood function
measures the probability of obtaining the observed outcomes as a function of
explicit distribution parameters. The estimated parameters are the distribution
parameters that maximize the likelihood function. The general likelihood
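function with right-censored data (i.e. with r failures and n-r right-censored
points) takes the form:
$$L(\theta) = \prod_{i=1}^{r} f(T_i;\theta)\,\prod_{j=r+1}^{n}\left[1 - F(T_j;\theta)\right]$$
(Eq: 1 Right Censored Likelihood)
Where:
$T_1, T_2, \ldots, T_r$ = failure times
$T_{r+1}, T_{r+2}, \ldots, T_n$ = suspension times
$f$, $F$ = probability density and cumulative distribution functions of the time to failure, with $f(t;\theta) = h(t;\theta)\left[1 - F(t;\theta)\right]$ and $1 - F(t;\theta) = \exp\left(-\int_0^t h(s;\theta)\,ds\right)$ for hazard function $h$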
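Specifically for Weibull PHM, it
is assumed that the hazard function takes the form:
$$h(t, z(t)) = \frac{\beta}{\eta}\left(\frac{t}{\eta}\right)^{\beta-1}\exp\left(\sum_{k=1}^{m}\gamma_k z_k(t)\right)$$
(Eq: 2 PHM Weibull Hazard Function)
where $\beta$ is the shape parameter, $\eta$ is the scale parameter, $z_k(t)$ are the covariates and $\gamma_k$ are their coefficients.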
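A short sketch may help make the Weibull PHM hazard concrete. It assumes the commonly used form in which a Weibull baseline hazard with shape beta and scale eta is multiplied by the exponential of a weighted sum of covariates; the function name and the numerical values are hypothetical.

```python
import math

def weibull_phm_hazard(t, beta, eta, gammas, z):
    """Hazard at working age t > 0 for covariate values z (one value per coefficient)."""
    # Weibull baseline hazard: depends on working age only.
    baseline = (beta / eta) * (t / eta) ** (beta - 1.0)
    # Covariate term: scales the baseline up or down proportionally.
    return baseline * math.exp(sum(g * zk for g, zk in zip(gammas, z)))

# Two hypothetical engines of the same age with different test cell readings.
h_low = weibull_phm_hazard(800.0, 2.0, 1500.0, [0.8, -0.3], [0.2, 1.0])
h_high = weibull_phm_hazard(800.0, 2.0, 1500.0, [0.8, -0.3], [1.2, 1.0])
hazard_ratio = h_high / h_low
```

The ratio of the two hazards depends only on the covariate difference and not on the age, which is the 'proportional' property of the model.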
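In substituting (2) into (1) the
likelihood function takes the form:
$$L(\beta,\eta,\gamma) = \prod_{i=1}^{r}\frac{\beta}{\eta}\left(\frac{T_i}{\eta}\right)^{\beta-1}\exp\left(\sum_{k=1}^{m}\gamma_k z_k(T_i)\right)\;\prod_{j=1}^{n}\exp\left(-\int_{0}^{T_j}\frac{\beta}{\eta}\left(\frac{s}{\eta}\right)^{\beta-1}\exp\left(\sum_{k=1}^{m}\gamma_k z_k(s)\right)ds\right)$$
(Eq: 3 PHM Weibull Likelihood)
where each factor uses the covariate history of the corresponding item. The first product runs over the r failures and the second over all n items (failures and suspensions).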
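The Cox partial likelihood function
is the product over all failure times of the conditional probability of failure
of the item that actually failed at time $t_i$. That is, if $R_i$
is the set of indices of items not failed and not suspended just before time $t_i$
then:
$$L_i = \frac{h(t_i, \mathbf{z}_i(t_i))}{\sum_{j \in R_i} h(t_i, \mathbf{z}_j(t_i))}$$
(Eq: 4 Cox Partial Likelihood For Single Event)
where $\mathbf{z}_j(t)$ is the covariate vector of item j.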
This is the required conditional
probability for time $t_i$. The partial likelihood function is the
product of the terms $L_i$ taken over all failure times:
$$L = \prod_{i \in \wp} L_i$$
(Eq: 5 Cox Partial Likelihood)
where $\wp$ is the set of indices of
failed items and $t_i$ is the time of failure of item $i \in \wp$. Assuming a proportional hazard model (i.e.
$h(t, \mathbf{z}(t)) = h_0(t)\exp(\gamma' \mathbf{z}(t))$) then the partial likelihood
function takes the form:
$$L(\gamma) = \prod_{i \in \wp}\frac{\exp(\gamma' \mathbf{z}_i(t_i))}{\sum_{j \in R_i}\exp(\gamma' \mathbf{z}_j(t_i))}$$
(Eq: 6 Cox PHM Partial Likelihood)
By maximizing the partial likelihood with respect to the coefficients $\gamma$, it is possible to estimate the effects of the
covariates without making any assumptions about the form of the base hazard $h_0(t)$.
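The partial likelihood can be sketched in a few lines of Python. This is an illustrative example only: it assumes time-fixed covariates and no tied failure times, and the data, function names, and search interval are hypothetical. With a single covariate the log partial likelihood is concave, so a simple ternary search is enough to locate the maximizing coefficient.

```python
import math

def cox_log_partial_likelihood(gammas, times, failed, covariates):
    """Log of the Cox partial likelihood (time-fixed covariates, no tied failures)."""
    def linear(j):
        return sum(g * z for g, z in zip(gammas, covariates[j]))

    total = 0.0
    for i, t_i in enumerate(times):
        if not failed[i]:
            continue          # suspensions contribute only through the risk sets
        # Risk set: items neither failed nor suspended just before t_i.
        risk_set = [j for j, t_j in enumerate(times) if t_j >= t_i]
        total += linear(i) - math.log(sum(math.exp(linear(j)) for j in risk_set))
    return total

def maximize_concave(f, lo=-10.0, hi=10.0, iters=200):
    """Ternary search for the maximizer of a concave function of one variable."""
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if f(m1) < f(m2):
            lo = m1
        else:
            hi = m2
    return 0.5 * (lo + hi)

# Hypothetical data: removal times, failure flags, one test cell covariate per engine.
times = [5.0, 8.0, 12.0, 15.0, 20.0, 26.0]
failed = [True, True, False, True, True, False]
covariates = [[1.2], [0.3], [0.9], [1.0], [0.1], [0.4]]

def objective(g):
    return cox_log_partial_likelihood([g], times, failed, covariates)

gamma_hat = maximize_concave(objective)
```

No baseline hazard appears anywhere in the code, which reflects the fact that it cancels out of each ratio in the partial likelihood.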
Several standard statistical tests are available to assist the modeler in
identifying significant covariates. The Wald test can be used to check various
hypotheses of interest about the parameters. The test checks whether the
difference between an assumed and an estimated parameter value is significant
by reporting an appropriate p-value. If the p-value is small then the assumed
parameter value can be rejected. The Wald test is conducted on the shape
parameter (the hypothesis that $\beta = 1$ is tested, and if the p-value is
small then the hypothesis that the working age is not an important variable is
rejected) and on all the covariate coefficients (the hypothesis that
$\gamma_i = 0$ is tested, and if the p-value is small then the hypothesis that
covariate $z_i(t)$ is insignificant is rejected).
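A minimal sketch of the Wald test for a single parameter follows. It relies on the asymptotic normality of MLE estimates and assumes a two-sided test; the function name and the example estimates and standard errors are hypothetical.

```python
import math

def wald_p_value(estimate, hypothesized, std_error):
    """Two-sided p-value for the hypothesis that the parameter equals `hypothesized`."""
    z = (estimate - hypothesized) / std_error
    # P(|Z| > |z|) for a standard normal Z, written with the complementary error function.
    return math.erfc(abs(z) / math.sqrt(2.0))

# Hypothetical estimates: shape parameter tested against 1, a coefficient against 0.
p_shape = wald_p_value(1.8, 1.0, 0.3)
p_covariate = wald_p_value(0.05, 0.0, 0.2)
```

A small value such as p_shape suggests rejecting the tested value, while a large value such as p_covariate gives no grounds for keeping that covariate in the model.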
Another technique for comparing models is to check
whether a simpler sub-model can replace a more complicated one by using the
chi-squared test based on the deviance change. The deviance is a numerical
value obtained for every sub-model during the estimation procedure. The basic
sub-model has the smallest deviance. The difference between the basic sub-model
deviance and the deviance of another sub-model is called the deviance change,
and is used for testing the hypothesis that two sub-models are statistically
equivalent. For every deviance change, a p-value is calculated. For the basic
sub-model, the deviance change = 0, and the p-value = 1, by definition. If the
p-value for a sub-model is small, then this sub-model is considered not good
enough to replace the basic one. If two non-basic sub-models are compared, then
the one with the higher p-value can be considered as the one that better
represents the data.
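The deviance-change test can be sketched as follows. The sketch assumes that the deviance change is compared against a chi-squared distribution whose degrees of freedom equal the number of parameters removed from the basic sub-model; the function names and deviance values are hypothetical. The chi-squared tail probability is computed from closed-form expressions valid for integer degrees of freedom, so only the standard library is needed.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability of a chi-squared variable with integer df >= 1."""
    if x <= 0.0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) times a finite power series in x/2.
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= half / i
            total += term
        return math.exp(-half) * total
    # Odd df: normal tail plus a finite series.
    total = math.erfc(math.sqrt(half))
    term = math.sqrt(2.0 * x / math.pi) * math.exp(-half)
    for i in range(1, (df - 1) // 2 + 1):
        total += term
        term *= x / (2 * i + 1)
    return total

def deviance_change_p_value(deviance_sub, deviance_basic, params_removed):
    """p-value for replacing the basic sub-model with a simpler sub-model."""
    return chi2_sf(deviance_sub - deviance_basic, params_removed)

# Hypothetical deviances: basic model versus a sub-model with two covariates removed.
p_value = deviance_change_p_value(148.2, 141.9, 2)
```

For a deviance change of zero the function returns 1, in line with the convention for the basic sub-model described above.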
The method of Cox generalized residuals can be applied to
test for evidence that the data points are well represented by the Weibull PHM.
Residuals (i.e. the cumulative
hazards) are calculated for every observed failure or suspension. The
Kolmogorov-Smirnov test (K-S test) checks whether the residuals
follow a negative exponential distribution, as would be expected
if the model fits the data. The test calculates the distance between the
theoretical exponential distribution and the distribution estimated from the
residuals (adjusted for suspensions) and reports a p-value. If the p-value is
small then the hypothesis that the model fits the data well is
rejected.
Principal Components Analysis
(PCA)
The objective of PCA is to calculate a set of principal components or “artificial
variables”, $w_1, \ldots, w_r$, that contain the same
information as a set of “original” variables, $x_1, \ldots, x_r$, but which are
uncorrelated. This is accomplished
(assuming r original variables and r artificial variables) by
using the following linear transformation:
$$w_j = g_{1j}x_1 + g_{2j}x_2 + \cdots + g_{rj}x_r, \qquad j = 1, 2, \ldots, r$$
(Eq: 7 Definition of artificial variable)
The vectors $g_j = (g_{1j}, g_{2j}, \ldots, g_{rj})$ for j = 1, 2, …, r
are calculated by solving the following equations:
$$X'X\,g_j = \lambda_j\,g_j, \qquad g_j'\,g_j = 1$$
(Eq: 8 Equations for calculating vector $g_j$)
Where:
X is the k x r matrix of original data with k samples.
X’ and $g_j'$ are the transposes of matrix X and of vector $g_j$ respectively.
$\lambda_1, \ldots, \lambda_r$ are the r eigenvalues of the correlation matrix X’X.
It can be shown that the $w_j$ are uncorrelated, and that the $w_j$
corresponding to the largest $\lambda_j$ explains the
largest proportion of the variation in the original data set. The $w_j$
corresponding to the second largest eigenvalue explains the next largest
proportion of variation and so on. If
the original data were correlated, fewer w’s (artificial variables) than x’s
(original variables) can be used to accurately describe the original data set,
because each subsequent, smaller eigenvalue corresponds to less and less
information. The number of artificial
variables used is limited to those that explain a significant amount of
variation.
Once the set of artificial variables is selected, these
variables are used as covariates in the PHM analysis rather than the original
variables.
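The PCA steps described above can be sketched in pure Python. The example is illustrative only: the three test cell readings are hypothetical, the columns are standardized so that the matrix being decomposed is the sample correlation matrix, and the eigenvectors are found with a basic Jacobi rotation scheme that is adequate for a small symmetric matrix.

```python
import math

def standardize(rows):
    """Center each column and scale it to unit sample variance."""
    k, r = len(rows), len(rows[0])
    out = [[0.0] * r for _ in range(k)]
    for j in range(r):
        col = [row[j] for row in rows]
        mean = sum(col) / k
        sd = math.sqrt(sum((v - mean) ** 2 for v in col) / (k - 1))
        for i in range(k):
            out[i][j] = (col[i] - mean) / sd
    return out

def correlation_matrix(z):
    k, r = len(z), len(z[0])
    return [[sum(row[a] * row[b] for row in z) / (k - 1) for b in range(r)]
            for a in range(r)]

def jacobi_eigen(matrix, sweeps=50, tol=1e-22):
    """Eigenvalues and eigenvectors (stored as columns) of a small symmetric matrix."""
    n = len(matrix)
    a = [row[:] for row in matrix]
    v = [[float(i == j) for j in range(n)] for i in range(n)]
    for _ in range(sweeps):
        if sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j) < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                diff = a[q][q] - a[p][p]
                theta = math.pi / 4 if diff == 0.0 else 0.5 * math.atan(2.0 * a[p][q] / diff)
                c, s = math.cos(theta), math.sin(theta)
                for m in (a, v):              # rotate columns p and q of a and of v
                    for i in range(n):
                        mp, mq = m[i][p], m[i][q]
                        m[i][p], m[i][q] = c * mp - s * mq, s * mp + c * mq
                for i in range(n):            # rotate rows p and q of a
                    ap, aq = a[p][i], a[q][i]
                    a[p][i], a[q][i] = c * ap - s * aq, s * ap + c * aq
    return [a[i][i] for i in range(n)], v

def principal_components(rows):
    """Return eigenvalues (descending), loading vectors g_j, and scores w_j per sample."""
    z = standardize(rows)
    eigvals, vecs = jacobi_eigen(correlation_matrix(z))
    order = sorted(range(len(eigvals)), key=lambda j: -eigvals[j])
    lambdas = [eigvals[j] for j in order]
    loadings = [[vecs[i][j] for i in range(len(vecs))] for j in order]
    scores = [[sum(g[i] * row[i] for i in range(len(row))) for g in loadings] for row in z]
    return lambdas, loadings, scores

# Hypothetical, strongly correlated test cell readings for eight engines.
rows = [[620.0, 410.0, 2.1], [640.0, 425.0, 1.8], [610.0, 402.0, 2.6],
        [655.0, 433.0, 1.5], [630.0, 418.0, 2.0], [665.0, 441.0, 1.1],
        [600.0, 398.0, 2.9], [645.0, 427.0, 1.9]]
lambdas, loadings, scores = principal_components(rows)
explained = [lam / sum(lambdas) for lam in lambdas]
```

The list `explained` gives the proportion of variation carried by each artificial variable, which is the quantity used to decide how many of them to retain as PHM covariates.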
Extensions to this analysis that may be applicable in
this work include the inclusion of discrete variables along with continuous
sensor data and considering sensor-time profiles rather than single point
values. These extensions have been
discussed in the literature.
The Bayesian approach assumes that the data is known and that $\theta$ (the distribution
parameters of the entire population) are random variables. $\theta$ has a distribution of possible values and the observed
data provides evidence for different values. The classical statistical approach
(the ‘frequentist approach’) assumes that the parameters of the entire
population are fixed but unknown, and that the data is a random sample from that
population. In Bayesian estimation we have a distribution $p(\theta)$ over possible values for $\theta$, which is called
the prior distribution. By analyzing the data D, we can update our
beliefs to take into account the observed data. This leads to a distribution $p(\theta \mid D)$ over all possible values for $\theta$ given D, which is called the posterior
distribution. Bayes’ theorem is used to obtain the posterior:
$$p(\theta \mid D) = \frac{p(D \mid \theta)\,p(\theta)}{\int p(D \mid \theta)\,p(\theta)\,d\theta}$$
where $p(D \mid \theta) = L(\theta)$ is the likelihood of the observed data.
(Eq: 10 Likelihood
and Posterior)
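As an illustration of how a prior and a likelihood combine into a posterior, the sketch below evaluates Bayes' theorem on a grid of values for a single parameter. It is a simplified, assumed setting rather than the full PHM case: the Weibull scale is fixed at a prior believed value, the prior on the shape is a hypothetical bell-shaped curve, and the data are made up. The posterior is normalized numerically over the grid.

```python
import math

def weibull_log_likelihood(shape, scale, times, failed):
    """Right-censored Weibull log-likelihood."""
    ll = 0.0
    for t, f in zip(times, failed):
        if f:
            ll += math.log(shape / scale) + (shape - 1.0) * math.log(t / scale)
        ll -= (t / scale) ** shape
    return ll

def grid_posterior(grid, log_prior, log_like):
    """Posterior probabilities on a grid: proportional to prior times likelihood."""
    logs = [log_prior(x) + log_like(x) for x in grid]
    peak = max(logs)                       # subtract the peak to avoid underflow
    weights = [math.exp(v - peak) for v in logs]
    total = sum(weights)
    return [w / total for w in weights]

# Hypothetical data and prior belief.
times = [410.0, 620.0, 750.0, 900.0, 980.0, 1200.0, 1200.0, 1500.0]
failed = [True, True, True, True, False, True, False, False]
scale_fixed = 1300.0                                   # parameter fixed at a prior value
grid = [0.5 + 0.01 * i for i in range(451)]            # candidate shapes from 0.5 to 5.0

def log_prior(shape):
    return -0.5 * ((shape - 2.0) / 0.5) ** 2           # belief centered on a shape of 2

def log_like(shape):
    return weibull_log_likelihood(shape, scale_fixed, times, failed)

posterior = grid_posterior(grid, log_prior, log_like)
posterior_mean = sum(x * p for x, p in zip(grid, posterior))
```

With a flat prior the posterior peaks where the likelihood peaks, so the MLE is recovered as a special case; an informative prior pulls the estimate toward the prior belief, more strongly when data are scarce.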
Jardine A.K.S.,
Anderson P. M. and Mann D. S., ‘Application of the Weibull Proportional Hazard
Model to Aircraft and Marine Engine Failure Data’, Quality & Reliability
Engineering International, 5, 77-82 (1987)
EXAKT v. 3.00
Guide and Manual, CBM laboratory,
Department of Mechanical Industrial Engineering, University of Toronto (2001)
Kalbfleisch J.D.
and Prentice R. L., The Statistical Analysis of Failure Data, Wiley,
1980.
Lawless J. F.,
Statistical Models and Methods for Lifetime Data, Wiley, 1982.
MacGregor, John
F., et al., Multivariate Statistical
Process Control Of Batch Processes Using PCA and PLS, IFAC ADCHEM 1994
Preprints, May 4-27, Kyoto, Japan.
Dubi A., Monte
Carlo Applications in Systems Engineering, Wiley, 1999.
Dubi A. and
Goldfeld A., SPAR v. 5.0 User Guide, Department of Nuclear Engineering,
Ben Gurion University of the Negev (2002).
[1] Control exchange is the leading reason for removals that
do not lead to engine maintenance.
[2] An aircraft engine specialized SPAR Monte Carlo Simulation
software process