OMDEC | Optimal Maintenance Decisions Inc.

Advanced Residual Life Estimation for Aircraft Engines


By Naaman Gurvitz

Clockwork Solutions

I. Objective

The purpose of this document is to provide a high-level conceptual approach for constructing an Advanced Residual Life Estimation Platform (ARLEP) for aircraft engines operating in the field. Correlating an engine’s test cell results with its performance sustainment in the field provides the foundation for implementing Performance-Based Maintenance (PBM) strategies on engines while they are still in the depot. Performance-based maintenance is expected to lower unscheduled engine removal rates in the field and to increase the likelihood of meeting depot performance warranties with minimal work scopes and minimal parts replacement or refurbishment.

         

II. Background

Low performance is the leading cause of engine failure in the field and the second leading cause of engine removals for any reason.[1] Depots must guarantee that repaired or overhauled engines and modules will sustain their performance in the field for at least the duration of the warranty period. To this end, engines and modules are tested in depot test cells, where they must perform better than a predefined set of minimum standards (thresholds) before being ‘certified’ as serviceable so that they can be reintroduced into the supply system. Engines or modules that fail the test are rejected and require additional rework before being tested again.

Figure 1: Performance vs. Flight Hours

The test cells’ minimum thresholds for engine/module performance are based on average performance degradation curves provided by the OEM. As engines and modules are used in the field, their performance degrades over time, as illustrated by the blue curve in Figure 1. The threshold is set at a performance value from which the unit is not expected to reach the minimum acceptable field performance level in less than the warranty period. In reality, however, significantly more unscheduled removals caused by loss of performance occur than expected. This has led to the notion that the actual performance curve (purple curve) degrades more rapidly than the OEM design curve suggests. Such a conclusion is not necessarily correct. Another possible explanation is that the engine’s performance sustainment is not a linear function of the various performance metrics and readings. That is, higher values of all performance measurements in a tested engine do not necessarily translate into longer periods over which the engine sustains its performance in the field.

Regardless of the reason for the poor predictability of depot testing, depot maintenance personnel require a tool that predicts engine performance sustainment periods with high fidelity, with predictions from test cell measurements based on actual field data. Such a tool will provide the means to plan repair/overhaul scopes that increase the likelihood of meeting depot performance warranties and, ultimately, will lead to lower unscheduled engine removal rates.

III. Phased Approach

Cox’s proportional hazards model (PHM) provides an advanced data analysis technique for incorporating an engine’s test cell measurements into predictions of its performance sustainment period in the field. The PHM is considered one of the best practical methodologies for providing long-term equipment failure predictions and for handling the complicated nature of failure modes. It is gaining recognition as the most appropriate model for incorporating operating conditions, sensor data, and external conditions into reliability analysis and life estimation of equipment.

In short, the PHM will provide life distributions for each failure mode identified by the FMECA (Failure Modes, Effects and Criticality Analysis) as a cause for inducting engines from the field into the depot for a major overhaul. Performance sustainment predictions for the highest-level assembly, i.e., the engine, will be obtained by simulating the engine in an AT-LAST[2] module.

Implementation will consist of two phases. At the end of the first phase (baseline predictions), a performance sustainment estimation platform will be deployed. The platform will provide sustainment predictions for engines given their age and the ages of their modules and components. In the second phase (advanced predictions), engine test cell measurements will be incorporated into the performance sustainment predictions. The tasks associated with each phase are illustrated in Figure 2 and elaborated in the following sections.


Figure 2: Phased development plan

 

IV. Phase I: Baseline Predictions

The development plan consists of the following stages:

 

a)      Failure Modes, Effects & Criticality Analysis (FMECA)

b)      Data Mining and Cleansing

c)      Weibull/Lognormal Failure Data Analysis

d)      AT-LAST Simulation

 

a) Failure Modes, Effects & Criticality Analysis (FMECA): The FMECA report provides the framework for collecting data and gives meaning to the data. The FMECA, performed using the reliability-centered maintenance (RCM) process, shall link each item in the engine’s hierarchical structure with the failure modes associated with loss of performance. The FMECA will be conducted only for T700 Series engines.

The FMECA report includes:

i) Bill of materials (BOM) – Engine structure down to part level, including technological upgrades. The engine’s bill of materials is organized as a hierarchical breakdown structure: an engine is broken down into modules, modules into parts, and parts into sub-parts.

ii) List of loss-of-performance failure modes by hierarchical structure – A list of all failure modes characterized by ‘loss of performance’ symptoms (symptoms that can be physically measured) whose repair requires an overhaul at the depot. These failure modes are related to the hierarchical structure, e.g., engine-level failure modes, module-specific failure modes, etc.

The FMECA must be conducted in close collaboration with subject matter experts on T700 Series engines.

b) Data Mining and Cleansing: Statistical failure data analysis requires gathering historical failure (and censored) data for similar equipment. Experience shows that such data is very commonly missing, erroneous, or incomplete. The historical failure data must therefore be preprocessed to ensure coherency and correctness. Preprocessing helps the user validate the data and make corrections; it may involve graphical and statistical analysis using a variety of plots and the calculation of basic statistical parameters. The objective of this effort is to generate sets of ‘engine histories’, i.e., failure (and suspension) times associated with removal causes.
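The target output of this stage can be sketched in Python. This is a minimal illustration only: the record layout (`RemovalRecord` and its fields) and the helper `build_histories` are hypothetical, and a real depot data set would need its own field mapping and many more validation rules. Loss-of-performance removals become failures, other removals and engines still installed at the data cut-off become suspensions, and records with missing or inconsistent times are set aside for manual review:

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Set, Tuple


@dataclass
class RemovalRecord:
    """Hypothetical raw record; real depot data will use its own fields."""
    serial: str
    hours_at_install: Optional[float]
    hours_at_removal: Optional[float]      # None: still installed at the data cut-off
    removal_cause: Optional[str]           # None: no removal yet
    hours_at_cutoff: Optional[float] = None


@dataclass
class LifeObservation:
    serial: str
    time: float       # operating hours accumulated since installation
    failed: bool      # True: failure mode of interest; False: suspension


def build_histories(
    records: Iterable[RemovalRecord], failure_causes: Set[str]
) -> Tuple[List[LifeObservation], List[RemovalRecord]]:
    """Split raw records into clean (time, failed) observations and rejected records."""
    histories, rejected = [], []
    for rec in records:
        end = rec.hours_at_removal if rec.hours_at_removal is not None else rec.hours_at_cutoff
        # Missing or non-increasing times cannot be used without manual correction.
        if rec.hours_at_install is None or end is None or end <= rec.hours_at_install:
            rejected.append(rec)
            continue
        failed = rec.removal_cause in failure_causes
        histories.append(LifeObservation(rec.serial, end - rec.hours_at_install, failed))
    return histories, rejected


records = [
    RemovalRecord("E1", 0, 1200, "low performance"),
    RemovalRecord("E2", 100, 900, "foreign object damage"),
    RemovalRecord("E3", 50, None, None, hours_at_cutoff=1500),
    RemovalRecord("E4", 500, 300, "low performance"),   # removal before install: data error
]
histories, rejected = build_histories(records, {"low performance"})
```

In this made-up example E1 is a failure at 1200 h, E2 and E3 are suspensions at 800 h and 1450 h, and E4 is rejected because its removal precedes its installation. Which removal causes count as failures would come from the FMECA failure-mode list.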

c) Weibull/Lognormal Failure Data Analysis: Because of its flexibility in representing various behaviors of the propensity for failure, the Weibull distribution is the most common age-dependent life distribution in reliability engineering. The second most common is the lognormal distribution, often used to represent a two-stage failure mode such as fatigue, in which a long ‘incubation’ period is followed by rapid progression of the failure.

The two most common techniques for estimating distribution parameters are (i) rank regression, in which failure (and suspension) times are ranked in increasing order along with probability estimates, followed by linear or non-linear regression to find the line or curve that best fits (describes) the data points, and (ii) maximum likelihood estimation (MLE), in which the parameters are chosen so that the observed outcomes are the most likely set of outcomes. A likelihood function is constructed that measures the probability of obtaining the observed outcomes as a function of explicit distribution parameters, and the estimated parameters are those that maximize the likelihood function. Statisticians prefer the MLE method, especially when heavily censored data sets are expected, because it uses the exact times of censored observations in the analysis, not just their order as in rank regression methods.
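The two estimation approaches can be compared on right-censored Weibull data with a short Python sketch (NumPy and SciPy assumed; function names are illustrative). The rank regression shown uses Bernard's median-rank approximation on complete data only; handling suspensions in rank regression (for example with adjusted ranks) is omitted for brevity. The MLE maximizes the right-censored likelihood given in the appendix (Eq. 1), in which failures contribute ln f(T) and suspensions ln R(T):

```python
import numpy as np
from scipy.optimize import minimize


def weibull_rank_regression(failure_times):
    """Median-rank regression on complete (uncensored) data using Bernard's approximation."""
    t = np.sort(np.asarray(failure_times, dtype=float))
    n = len(t)
    ranks = np.arange(1, n + 1)
    F = (ranks - 0.3) / (n + 0.4)               # median-rank estimate of the CDF
    x = np.log(t)
    y = np.log(-np.log(1.0 - F))                 # Weibull plot: y = beta * x - beta * ln(eta)
    beta, intercept = np.polyfit(x, y, 1)
    return beta, np.exp(-intercept / beta)


def weibull_neg_log_likelihood(log_params, times, failed):
    """-ln L for right-censored Weibull data: failures add ln f(T), suspensions add ln R(T)."""
    beta, eta = np.exp(log_params)               # log scale keeps beta and eta positive
    u = times / eta
    log_h = np.log(beta / eta) + (beta - 1.0) * np.log(u)
    log_R = -u ** beta
    # ln f = ln h + ln R, so the sum is ln h over failures plus ln R over all items.
    return -(np.sum(log_h[failed]) + np.sum(log_R))


def weibull_mle(times, failed):
    times = np.asarray(times, dtype=float)
    failed = np.asarray(failed, dtype=bool)
    start = np.log(weibull_rank_regression(times[failed]))   # rank regression as a starting point
    result = minimize(weibull_neg_log_likelihood, start, args=(times, failed),
                      method="Nelder-Mead", options={"xatol": 1e-8, "fatol": 1e-8})
    return tuple(np.exp(result.x))
```

Using the rank-regression fit only as a starting point for the optimizer is a design choice of this sketch, not a requirement of the method.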

Another important element in failure data analysis is the confidence attributed to the estimates. For example, the MLE parameter estimation procedure includes the calculation of standard errors. The standard error of an estimate shows how precise the estimate is: larger standard errors mean less precise estimation and less confidence that the estimated parameters represent the true values. The standard error depends on the sample size. Because MLE estimators are asymptotically normal, approximate confidence bounds can be obtained from the Fisher information matrix (Fisher matrix bounds).
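Fisher matrix bounds can be sketched as follows, assuming a right-censored two-parameter Weibull model and using a finite-difference Hessian of the negative log-likelihood as the observed Fisher information (helper names are illustrative). The bounds are computed on ln β and ln η and then exponentiated, a common way of keeping bounds on positive parameters positive; this choice is an assumption of the sketch:

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm


def neg_log_likelihood(log_params, times, failed):
    """-ln L for right-censored Weibull data, parameterized by (ln beta, ln eta)."""
    beta, eta = np.exp(log_params)
    u = times / eta
    log_f_part = np.sum(np.log(beta / eta) + (beta - 1.0) * np.log(u[failed]))
    return -(log_f_part - np.sum(u ** beta))


def numerical_hessian(f, x, step=1e-4):
    """Central-difference Hessian of a scalar function f at point x."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    H = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            di = np.zeros(n)
            di[i] = step
            dj = np.zeros(n)
            dj[j] = step
            H[i, j] = (f(x + di + dj) - f(x + di - dj) - f(x - di + dj) + f(x - di - dj)) / (4 * step ** 2)
    return H


def fisher_matrix_bounds(times, failed, confidence=0.95):
    times = np.asarray(times, dtype=float)
    failed = np.asarray(failed, dtype=bool)

    def f(p):
        return neg_log_likelihood(p, times, failed)

    start = np.array([np.log(1.5), np.log(times.mean())])
    fit = minimize(f, start, method="Nelder-Mead",
                   options={"xatol": 1e-9, "fatol": 1e-9, "maxiter": 2000})
    cov = np.linalg.inv(numerical_hessian(f, fit.x))   # inverse observed Fisher information
    se = np.sqrt(np.diag(cov))                         # standard errors of ln beta and ln eta
    z = norm.ppf(0.5 + confidence / 2.0)               # 1.96 for 95% two-sided bounds
    estimate = np.exp(fit.x)
    lower, upper = np.exp(fit.x - z * se), np.exp(fit.x + z * se)
    return estimate, lower, upper
```

Running it on a larger sample gives narrower bounds, reflecting the dependence of the standard error on sample size noted above.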

d) AT-LAST Simulation: Monte Carlo techniques will be employed to produce expected performance sustainment period distribution curves for engines (see Figure 3). An accurate estimate of the performance sustainment distribution curve of a specific engine requires initializing all components and failure modes in that engine to their current ages and to the damage each failure mode has accumulated (i.e., its cumulative hazard). SPAR technology is uniquely capable of initializing a model with any set of ages and cumulative damage values.

                                    Figure 3: Performance sustainment distribution curves
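The idea of initializing each failure mode with its accumulated damage can be illustrated with a simplified Monte Carlo sketch in Python. This is not the AT-LAST or SPAR implementation: it assumes independent Weibull failure modes acting in series (the engine loses performance at the first failure-mode occurrence), and all parameter values are hypothetical. For a mode with cumulative hazard H(t) = (t/η)^β that has already accumulated H_now, the residual life is obtained by drawing E ~ Exp(1) and solving H(t) = H_now + E:

```python
import numpy as np


def sample_residual_life(beta, eta, cum_hazard_now, rng, size):
    """Residual life of one Weibull failure mode that has already accumulated cum_hazard_now.

    With H(t) = (t / eta) ** beta, failure occurs when H reaches cum_hazard_now + E,
    where E ~ Exp(1); inverting H gives the failure age, and the current age is subtracted.
    """
    e = rng.exponential(1.0, size)
    age_now = eta * cum_hazard_now ** (1.0 / beta)
    age_at_failure = eta * (cum_hazard_now + e) ** (1.0 / beta)
    return age_at_failure - age_now


def simulate_sustainment(modes, n_runs=100_000, seed=0):
    """Engine sustainment time = first occurrence among independent failure modes (series logic)."""
    rng = np.random.default_rng(seed)
    lives = np.column_stack([
        sample_residual_life(m["beta"], m["eta"], m["cum_hazard"], rng, n_runs) for m in modes
    ])
    return lives.min(axis=1)


# Hypothetical engine: two loss-of-performance modes with made-up parameters and ages.
modes = [
    {"beta": 2.5, "eta": 3000.0, "cum_hazard": (1500.0 / 3000.0) ** 2.5},
    {"beta": 1.2, "eta": 8000.0, "cum_hazard": (400.0 / 8000.0) ** 1.2},
]
sustainment = simulate_sustainment(modes)
print("P(sustains 500 more hours) ~", np.mean(sustainment > 500.0))
```

Repeating the estimate over a grid of horizons would trace the sustainment distribution curve. For independent modes, the estimate at 500 h can be checked against the exact value exp(−Σ[H(a + 500) − H(a)]).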

V. Phase II: Advanced Predictions

The development plan for Phase II consists of the following stages:

 

a)      Advanced FMECA

b)      Test Cell Data Mining and Cleansing

c)      PHM Failure Data Analysis

d)      Advanced AT-LAST Simulation

a) Advanced FMECA: The objective is to associate test cell measurements with the Phase I failure modes. Any measurement that can be considered a driver (or cause) of the deterioration or progression of a failure mode, or that reflects (or affects) that deterioration or progression, should be listed and considered. The aim is to narrow down the list of plausible covariates; the actually significant covariates will be determined in the third stage (PHM analysis).

b) Test Cell Data Mining and Cleansing: The objective is to generate comprehensive historical sets of test cell measurements by engine serial number, including any test cell measurements that may be significant covariates. Ideally, the set of measurements for every engine will consist of identical categories; nevertheless, it is frequently necessary to deal with incomplete data. A record that lacks one or more covariate values is considered incomplete, but it may still be used in the MLE estimation. With the Expectation-Maximization (EM) algorithm, the estimation procedure takes into account all possible values of the missing data (weighted by their probabilities) to find the most likely parameter values. The EM algorithm is a general method for finding maximum-likelihood estimates of the parameters of an underlying distribution from a data set that is incomplete or has missing values.
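The mechanics of EM on incomplete records can be shown with a simpler model than the full PHM: a Python sketch that estimates the mean and covariance of test cell readings assumed to be multivariate normal and missing at random, with missing readings stored as NaN. The normality assumption and the function name are illustrative only; inside an actual PHM fit, the E-step would instead average over missing covariate values within the hazard model:

```python
import numpy as np


def em_mvn_missing(X, n_iter=500, tol=1e-8):
    """EM estimates of the mean and covariance of a multivariate normal with NaN entries.

    E-step: replace each missing block by its conditional expectation given the observed
    entries, and add the conditional covariance to the second-moment sum.
    M-step: recompute the mean and covariance from the completed sufficient statistics.
    """
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    missing = np.isnan(X)
    mu = np.nanmean(X, axis=0)
    sigma = np.diag(np.nanvar(X, axis=0))
    for _ in range(n_iter):
        sum_x = np.zeros(p)
        sum_xx = np.zeros((p, p))
        for i in range(n):
            m, o = missing[i], ~missing[i]
            x_hat = X[i].copy()
            cond_cov = np.zeros((p, p))
            if m.all():
                x_hat = mu.copy()
                cond_cov = sigma.copy()
            elif m.any():
                # Regression coefficients Sigma_mo Sigma_oo^{-1}, via a linear solve.
                coef = np.linalg.solve(sigma[np.ix_(o, o)], sigma[np.ix_(o, m)]).T
                x_hat[m] = mu[m] + coef @ (X[i, o] - mu[o])
                cond_cov[np.ix_(m, m)] = sigma[np.ix_(m, m)] - coef @ sigma[np.ix_(o, m)]
            sum_x += x_hat
            sum_xx += np.outer(x_hat, x_hat) + cond_cov
        new_mu = sum_x / n
        new_sigma = sum_xx / n - np.outer(new_mu, new_mu)
        done = max(np.abs(new_mu - mu).max(), np.abs(new_sigma - sigma).max()) < tol
        mu, sigma = new_mu, new_sigma
        if done:
            break
    return mu, sigma
```

Incomplete records are kept rather than discarded: the correlation between readings is what lets the E-step fill in the gaps.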

c) PHM Failure Data Analysis: The Proportional Hazards Model (PHM), proposed by David Cox in 1972, is a valuable statistical data analysis tool. It assumes that the hazard function is the product of a baseline hazard rate and a term containing explanatory variables. It is defined as:

$$h(t, \mathbf{z}) = h_0(t)\,\exp(\boldsymbol{\gamma}\cdot\mathbf{z})$$

where h0(t) is the baseline hazard function and exp(γ·z) is the covariates term. Here, z is a vector of covariates and γ is a vector of covariate coefficients. Covariates are concomitant factors that influence life behavior. The model is used to identify significant covariates and to quantify their effect on the propensity for failure (the hazard) as a function of the covariate values and the working age (time). The objective of PHM failure data analysis is to estimate the covariate coefficients, thus providing a quantitative measure of the importance of each covariate and of its impact on the propensity for failure and on life distributions.
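A short Python sketch makes the proportional structure concrete, assuming a Weibull baseline hazard (as in the appendix) and covariates held constant over time; the coefficient values and readings are hypothetical. Because the covariates enter only through exp(γ·z), the ratio of the hazards of two engines does not depend on age:

```python
import numpy as np


def weibull_baseline_hazard(t, beta, eta):
    """h0(t) for an assumed Weibull baseline with shape beta and scale eta."""
    return (beta / eta) * (t / eta) ** (beta - 1.0)


def phm_hazard(t, z, gamma, beta, eta):
    """h(t, z) = h0(t) * exp(gamma . z)."""
    return weibull_baseline_hazard(t, beta, eta) * np.exp(np.dot(gamma, z))


def phm_reliability(t, z, gamma, beta, eta):
    """R(t, z) = exp(-(t / eta) ** beta * exp(gamma . z)), valid for covariates constant over time."""
    return np.exp(-((t / eta) ** beta) * np.exp(np.dot(gamma, z)))


# Hypothetical coefficients for two standardized test cell readings.
gamma = np.array([0.8, -0.3])
engine_a = np.array([1.0, 0.5])
engine_b = np.array([0.0, 0.5])
for t in (200.0, 1500.0):
    print(t, phm_hazard(t, engine_a, gamma, 2.0, 3000.0) / phm_hazard(t, engine_b, gamma, 2.0, 3000.0))
```

Both printed ratios equal exp(0.8), since the two hypothetical engines differ by one unit in the first reading only.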

The two most common techniques for estimating the PHM coefficients are (i) Cox’s partial likelihood method, which estimates the coefficients without making any assumptions about the form of the baseline hazard, and (ii) the maximum likelihood estimation (MLE) method, in which an explicit form for the baseline hazard (e.g., a Weibull hazard) is assumed. (See the appendix for more mathematical details on both methods.)

An integral part of the PHM analysis is the systematic and scientific discrimination between significant covariates and non-influential data. Although there is no straightforward procedure for identifying significant covariates, their significance can be determined through the analysis of various statistics. Several standard statistical tests are available to assist the modeler in identifying significant covariates. (See the appendix for more details.)

Another important challenge in PHM analysis is covariate correlation. Test cell readings should be expected to be highly correlated. Incorporating highly correlated test cell readings as candidate covariates in the PHM contradicts the assumption that the covariates are independent: the standard errors calculated for correlated covariates in the MLE method will be relatively large, indicating an inaccurate model. One way to address this issue is to transform the data space using a technique such as Principal Components Analysis (PCA) or Partial Least Squares (PLS). PCA transforms the covariates into principal components that are uncorrelated, so a more accurate model may be obtained using these transformed covariates. PCA retains and uses all of the available information in the data, whereas an alternative approach might result in a somewhat arbitrary elimination of useful covariates and thus a loss of information. (For more details on PCA, see the appendix.)

Historical failure data is frequently neither abundant nor meticulously collected. A common practice when data is scarce is to set some parameters to values believed a priori and then to estimate the other parameters using the MLE procedure. Integrating Bayesian statistics into the MLE analysis makes it possible to specify prior knowledge or belief (obtained, for example, from physical models) in the PHM parameter estimation. Prior knowledge can be expressed either by fixing certain parameters and estimating the others using PHM MLE, or by specifying prior distributions for certain parameters (e.g., maintenance factors) and analyzing the empirical data using PHM MLE to obtain more credible posterior distributions (or estimates) of these parameters, provided the prior knowledge or beliefs are indeed appropriate. See the appendix for more details on Bayesian statistics.
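For the simplest case of fixing a parameter from prior belief, a two-parameter Weibull with its shape β fixed, the MLE of the scale has a closed form. A minimal Python sketch with made-up data (the function name is illustrative):

```python
import numpy as np


def weibull_scale_given_shape(times, failed, beta_fixed):
    """MLE of the Weibull scale eta when the shape beta is fixed from prior knowledge.

    Setting d(ln L)/d(eta) = 0 in the right-censored Weibull likelihood gives
    eta ** beta = (sum over all items of T_i ** beta) / (number of failures r).
    """
    times = np.asarray(times, dtype=float)
    r = int(np.sum(failed))
    if r == 0:
        raise ValueError("at least one failure is needed to estimate the scale")
    return (np.sum(times ** beta_fixed) / r) ** (1.0 / beta_fixed)


# Hypothetical sparse data: three failures and one suspension (1200 h), with beta fixed at 2.
eta_hat = weibull_scale_given_shape([500, 800, 1200, 1500], [True, True, False, True], beta_fixed=2.0)
```

Here η̂ = (4,580,000 / 3)^(1/2), about 1236 h. With only three failures, estimating β as well would be much less stable, which is the motivation for fixing it.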

d) Advanced AT-LAST Simulation: The Phase I simulation model will be enhanced to incorporate test cell data as covariates in the PHM distributions. The expected performance sustainment period distribution curves may not look different from the Phase I curves, but incorporating test cell data improves the accuracy of the projections, as depicted in Figure 4.

 

Figure 4: Accurate Projections


 

Appendix

PHM Maximum Likelihood Estimation Method

The MLE method chooses the distribution parameters under which the observed outcomes are the most likely set of outcomes. The likelihood function measures the probability of obtaining the observed outcomes as a function of explicit distribution parameters, and the estimated parameters are those that maximize it. The general likelihood function for right-censored data (i.e., with r failures and n − r right-censored points) takes the form:

$$L = \prod_{i=1}^{r} f(T_i)\prod_{i=r+1}^{n} R(T_i)$$

(Eq: 1 Right Censored Likelihood)

Where:

T1, T2, …, Tr = failure times

Tr+1, Tr+2, …, Tn = suspension times

f(t) = probability density function; R(t) = reliability (survival) function

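Specifically, for the Weibull PHM it is assumed that the hazard function takes the form:

$$h(t, \mathbf{z}(t)) = \frac{\beta}{\eta}\left(\frac{t}{\eta}\right)^{\beta-1}\exp\left(\sum_{k}\gamma_k z_k(t)\right)$$

(Eq: 2 PHM Weibull Hazard Function)

where β is the shape parameter, η is the scale parameter, and γk are the coefficients of the covariates zk(t).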

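Substituting (2) into (1), with f(t) = h(t) R(t) and R(t) = exp(−H(t)), where H(t) is the cumulative hazard, the likelihood function for items i = 1, …, n with covariate histories zi(t) takes the form:

$$L(\beta, \eta, \boldsymbol{\gamma}) = \prod_{i=1}^{r}\frac{\beta}{\eta}\left(\frac{T_i}{\eta}\right)^{\beta-1}\exp\left(\boldsymbol{\gamma}\cdot\mathbf{z}_i(T_i)\right)\;\prod_{i=1}^{n}\exp\left(-\int_{0}^{T_i}\frac{\beta}{\eta}\left(\frac{s}{\eta}\right)^{\beta-1}\exp\left(\boldsymbol{\gamma}\cdot\mathbf{z}_i(s)\right)ds\right)$$

(Eq: 3 PHM Weibull Likelihood)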
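Eq. 3 can be maximized numerically. The Python sketch below (NumPy and SciPy assumed; names illustrative) assumes covariates that are constant for each item, so the integral in the reliability term reduces to (T_i/η)^β exp(γ·z_i); time-dependent covariates would require evaluating that integral piecewise. It optimizes ln β and ln η to keep them positive, and divides the log-likelihood by n purely for numerical scaling:

```python
import numpy as np
from scipy.optimize import minimize


def phm_weibull_neg_log_likelihood(params, times, failed, Z):
    """-ln L of Eq. 3 (divided by n) for covariates that are constant for each item."""
    beta, eta = np.exp(params[:2])               # log scale keeps beta and eta positive
    gamma = params[2:]
    lin = Z @ gamma                              # gamma . z_i for every item
    u = times / eta
    log_h = np.log(beta / eta) + (beta - 1.0) * np.log(u) + lin
    cum_h = u ** beta * np.exp(lin)              # integral of h over [0, T_i] when z_i is constant
    return -(np.sum(log_h[failed]) - np.sum(cum_h)) / len(times)


def fit_phm_weibull(times, failed, Z):
    times = np.asarray(times, dtype=float)
    failed = np.asarray(failed, dtype=bool)
    Z = np.asarray(Z, dtype=float)
    start = np.concatenate([[0.0, np.log(times.mean())], np.zeros(Z.shape[1])])
    result = minimize(phm_weibull_neg_log_likelihood, start, args=(times, failed, Z), method="BFGS")
    beta, eta = np.exp(result.x[:2])
    return beta, eta, result.x[2:]
```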

Cox Partial Likelihood Estimation Method

The Cox partial likelihood is the product, over all failure times, of the conditional probability that the item that actually failed at time ti is the one that fails, given that one failure occurs at ti. That is, if Ri is the set of indices of items neither failed nor suspended just before time ti, then:

$$L_i = \frac{h_i(t_i)}{\sum_{j \in R_i} h_j(t_i)}$$

(Eq: 4 Cox Partial Likelihood For Single Event)

This is the required conditional probability for time ti. The partial likelihood function is the product of the terms Li taken over all failure times:

$$L = \prod_{i \in F} L_i$$

(Eq: 5 Cox Partial Likelihood)

where F is the set of indices of failed items and ti is the failure time of item i ∈ F. Assuming a proportional hazards model, i.e., hi(t) = h0(t) exp(γ·zi), the baseline hazard h0(ti) cancels from each term and the partial likelihood function takes the form:

$$L(\boldsymbol{\gamma}) = \prod_{i \in F}\frac{\exp(\boldsymbol{\gamma}\cdot\mathbf{z}_i)}{\sum_{j \in R_i}\exp(\boldsymbol{\gamma}\cdot\mathbf{z}_j)}$$

(Eq: 6 Cox PHM Partial Likelihood)

By maximizing the partial likelihood with respect to the coefficients γ, it is possible to estimate the effects of the covariates without making any assumptions about the form of the baseline hazard h0(t).
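A direct Python transcription of the Cox partial likelihood (Eq. 6), assuming time-constant covariates and no tied failure times (ties would need a correction such as Breslow's or Efron's), shows that the baseline hazard never has to be specified. Function names are illustrative:

```python
import numpy as np
from scipy.optimize import minimize


def cox_neg_log_partial_likelihood(gamma, times, failed, Z):
    """-ln L(gamma) from Eq. 6; assumes time-constant covariates and no tied failure times."""
    lin = Z @ gamma
    total = 0.0
    for i in np.flatnonzero(failed):
        at_risk = times >= times[i]          # R_i: neither failed nor suspended just before t_i
        total += lin[i] - np.log(np.sum(np.exp(lin[at_risk])))
    return -total


def fit_cox(times, failed, Z):
    times = np.asarray(times, dtype=float)
    failed = np.asarray(failed, dtype=bool)
    Z = np.asarray(Z, dtype=float)
    result = minimize(cox_neg_log_partial_likelihood, np.zeros(Z.shape[1]),
                      args=(times, failed, Z), method="BFGS")
    return result.x
```

Data used to check it can come from any baseline hazard, since only the ordering of failure times and the risk sets enter the partial likelihood.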

Significant Covariates Tests

Several standard statistical tests are available to assist the modeler in identifying significant covariates. The Wald test can be used to check various hypotheses of interest about the parameters. It checks whether the difference between a hypothesized and an estimated parameter value is significant by reporting a p-value; if the p-value is small, the hypothesized value can be rejected. The Wald test is conducted on the shape parameter (the hypothesis β = 1 is tested, i.e., that the hazard does not change with working age; if the p-value is small, the hypothesis that working age is not an important variable is rejected) and on all the covariate coefficients (the hypothesis γi = 0 is tested; if the p-value is small, the hypothesis that covariate zi(t) is insignificant is rejected).
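The Wald statistic itself is simple to compute once an estimate and its standard error (for example, from the Fisher matrix) are available. A Python sketch using only the standard library, with hypothetical numbers:

```python
import math


def wald_test(estimate, std_error, hypothesized):
    """Two-sided Wald test: W = (estimate - hypothesized) / SE is approximately N(0, 1) under H0."""
    w = (estimate - hypothesized) / std_error
    p_value = math.erfc(abs(w) / math.sqrt(2.0))   # equals 2 * (1 - Phi(|w|))
    return w, p_value


# Hypothetical estimates and standard errors, e.g. from the Fisher matrix of a PHM fit.
print(wald_test(1.9, 0.2, 1.0))    # shape parameter: H0 beta = 1
print(wald_test(0.12, 0.10, 0.0))  # covariate coefficient: H0 gamma_i = 0
```

With these made-up inputs the shape test gives W = 4.5 and a p-value far below 0.01, so working age matters; the covariate test gives W = 1.2 and p ≈ 0.23, so that covariate would not be retained on this evidence alone.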

Another technique for comparing models is to check whether a simpler sub-model can replace a more complicated one by using the chi-squared test based on the deviance change. The deviance is a numerical value obtained for every sub-model during the estimation procedure. The basic sub-model has the smallest deviance. The difference between the basic sub-model deviance and the deviance of another sub-model is called the deviance change, and is used for testing the hypothesis that two sub-models are statistically equivalent. For every deviance change, a p-value is calculated. For the basic sub-model, the deviance change = 0, and the p-value = 1, by definition. If the p-value for a sub-model is small, then this sub-model is considered not good enough to replace the basic one. If two non-basic sub-models are compared, then the one with the higher p-value can be considered as the one that better represents the data.
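A Python sketch of the deviance-change test, assuming the usual likelihood-ratio setting: deviance = −2 ln(maximized likelihood), the sub-model is nested in the basic model, and the deviance change is compared with a chi-squared distribution whose degrees of freedom equal the number of parameters removed. The deviance values are hypothetical:

```python
from scipy.stats import chi2


def deviance_change_test(deviance_basic, n_params_basic, deviance_sub, n_params_sub):
    """Chi-squared test of whether a nested sub-model can replace the basic model.

    Assumes deviance = -2 ln(maximized likelihood); the degrees of freedom are the
    number of parameters removed from the basic model to obtain the sub-model.
    """
    change = deviance_sub - deviance_basic
    df = n_params_basic - n_params_sub
    if df == 0:
        return change, 1.0                   # the basic model compared with itself
    return change, chi2.sf(change, df)


# Hypothetical deviances: basic model with 5 parameters, two sub-models with one covariate dropped.
print(deviance_change_test(1520.4, 5, 1522.1, 4))
print(deviance_change_test(1520.4, 5, 1531.9, 4))
```

A deviance change of 1.7 for one removed parameter gives p ≈ 0.19, so that sub-model could replace the basic one; a change of 11.5 gives p well below 0.01, so the dropped covariate should be kept.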

The method of Cox generalized residuals can be applied to test whether the data points are well represented by the Weibull PHM. A residual (i.e., the cumulative hazard at the observed time) is calculated for every observed failure or suspension. The Kolmogorov-Smirnov (K-S) test checks whether the residuals follow a negative exponential distribution with unit mean, as would be expected if the model fits the data. The test calculates the distance between the theoretical exponential distribution and the distribution estimated from the residuals (adjusted for suspensions) and reports a p-value. If the p-value is small, the hypothesis that the model fits the data well is rejected.
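A simplified Python sketch of the residual check for a Weibull PHM with time-constant covariates (function names illustrative). It treats every observation as a failure and so omits the adjustment for suspensions described above; also, because the parameters would normally be estimated from the same data, the K-S p-value is only approximate:

```python
import numpy as np
from scipy.stats import kstest


def cox_snell_residuals(times, Z, beta, eta, gamma):
    """Cumulative hazard of each item at its observed time under a Weibull PHM (constant covariates)."""
    times = np.asarray(times, dtype=float)
    return (times / eta) ** beta * np.exp(np.asarray(Z, dtype=float) @ np.asarray(gamma, dtype=float))


def residual_ks_test(times, Z, beta, eta, gamma):
    """K-S test of the residuals against the unit exponential distribution.

    Simplification: all items are treated as failures, i.e. no adjustment for suspensions.
    """
    result = kstest(cox_snell_residuals(times, Z, beta, eta, gamma), "expon")
    return result.statistic, result.pvalue
```

Comparing the K-S statistic under the correct parameters with that under a deliberately wrong shape parameter is a quick sanity check of the procedure: the wrong model produces a much larger distance and a tiny p-value.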

Principal Components Analysis (PCA)

The objective of PCA is to calculate a set of principal components, or “artificial variables”, w1, …, wr, that contain the same information as a set of “original” variables, x1, …, xr, but are uncorrelated. Assuming r original variables and r artificial variables, this is accomplished by the following linear transformation:

$$w_j = g_{1j}x_1 + g_{2j}x_2 + \dots + g_{rj}x_r = \mathbf{g}_j'\mathbf{x}, \qquad j = 1, 2, \dots, r$$

(Eq: 7 Definition of artificial variable)

The vectors gj = (g1j, g2j, …, grj)' for j = 1, 2, …, r are calculated by solving the following equations:

$$(\mathbf{X}'\mathbf{X} - \lambda_j\mathbf{I})\,\mathbf{g}_j = \mathbf{0}, \qquad \mathbf{g}_j'\mathbf{g}_j = 1, \qquad j = 1, 2, \dots, r$$

(Eq: 8 Equations for calculating vector gj)

Where:

X is the k × r matrix of original data (k samples of r variables), standardized so that X'X is the correlation matrix.

X' and gj' are the transposes of matrix X and vector gj, respectively.

I is the r × r identity matrix.

λ1, …, λr are the r eigenvalues of the correlation matrix X'X.

It can be shown that the wj are uncorrelated and that the wj corresponding to the largest λj explains the largest proportion of the variation in the original data set. The wj corresponding to the second largest eigenvalue explains the next largest proportion of variation, and so on. If the original data are correlated, fewer w’s (artificial variables) than x’s (original variables) can accurately describe the original data set, because each successively smaller eigenvalue corresponds to less and less information. The number of artificial variables used is limited to those that explain a significant amount of variation.

Once the set of artificial variables is selected, these variables are used as covariates in the PHM analysis rather than the original variables.
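The PCA transformation can be sketched with NumPy's symmetric eigensolver. The 95% explained-variance cut-off used to choose the number of artificial variables is an assumption of this sketch, not a rule from the text, and the readings in the usage lines are synthetic:

```python
import numpy as np


def principal_components(X, min_explained=0.95):
    """PCA via eigen-decomposition of the correlation matrix (Eqs. 7 and 8)."""
    X = np.asarray(X, dtype=float)
    k = X.shape[0]
    Z = (X - X.mean(axis=0)) / X.std(axis=0)       # standardized readings, one column per variable
    corr = Z.T @ Z / k                              # plays the role of X'X in Eq. 8
    eigvals, eigvecs = np.linalg.eigh(corr)         # real eigenpairs of a symmetric matrix, ascending
    order = np.argsort(eigvals)[::-1]               # largest eigenvalue (most variation) first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    explained = np.cumsum(eigvals) / np.sum(eigvals)
    q = min(int(np.searchsorted(explained, min_explained)) + 1, len(eigvals))
    G = eigvecs[:, :q]                              # columns are the vectors g_j, with g_j' g_j = 1
    W = Z @ G                                       # artificial variables w_j = g_j' x, per sample
    return W, G, eigvals


# Synthetic readings: the first two are strongly correlated, the third is independent.
rng = np.random.default_rng(6)
x1 = rng.normal(size=1000)
readings = np.column_stack([x1, x1 + 0.1 * rng.normal(size=1000), rng.normal(size=1000)])
W, G, eigvals = principal_components(readings)
print(W.shape, eigvals.round(3))
```

The returned W columns would then be passed to the PHM fit in place of the original readings; here two artificial variables replace three correlated readings.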

Extensions to this analysis that may be applicable in this work include the inclusion of discrete variables along with continuous sensor data and considering sensor-time profiles rather than single point values.  These extensions have been discussed in the literature.

Bayesian Statistics

The Bayesian approach treats the data as known and θ (the distribution parameters of the entire population) as random variables: θ has a distribution of possible values, and the observed data provide evidence for different values. The classical statistical (‘frequentist’) approach assumes that the parameters of the entire population are fixed but unknown, and that the data are a random sample from that population. In Bayesian estimation we have a distribution p(θ) over possible values of θ, called the prior distribution. By analyzing the data D, we can update our beliefs to take the observed data into account. This leads to a distribution p(θ|D) over all possible values of θ given D, called the posterior distribution. Bayes’ theorem is used to obtain the posterior:

$$p(\theta \mid D) = \frac{p(D \mid \theta)\,p(\theta)}{p(D)}$$

(Eq: 9 Bayes Theorem)

The posterior gives a distribution over parameter values. All Bayesian inference, including estimation (point estimates, measures of spread, Bayesian intervals or credible sets), hypothesis testing, and prediction (i.e., estimating the values of potentially observable but currently unobserved quantities), is based on the posterior distribution. For example, if we want a point estimate, we can use the same principle as in MLE and choose the value of θ that maximizes the posterior (the maximum a posteriori, or MAP, method). Since p(D) is constant for a given data set D and a particular model, we can write:

$$\underbrace{p(\theta \mid D)}_{\text{Posterior}} \;\propto\; \underbrace{p(D \mid \theta)}_{\text{Likelihood}}\;\underbrace{p(\theta)}_{\text{Prior}}$$

(Eq: 10 Likelihood and Posterior)

However, one should bear in mind that Bayesian inference can be a very challenging task, computationally and otherwise.
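A minimal MAP example in which the result can be checked in closed form: a Weibull model with the shape β fixed, the rate λ = η^(−β) unknown, and a Gamma(a, b) prior on λ. The prior parameters, data, and function names are hypothetical, and the conjugate setup is chosen only so that the numerical MAP can be verified against (r + a − 1)/(Σ T_i^β + b); realistic PHM priors generally require numerical or sampling methods, which is where the computational difficulty noted above arises:

```python
import numpy as np
from scipy.optimize import minimize_scalar


def log_posterior(lam, times, failed, beta, a, b):
    """Unnormalized ln p(lambda | D): ln likelihood + ln prior, with p(D) dropped (Eq. 10).

    Model: Weibull with known shape beta and rate lambda = eta ** (-beta), so
    ln L = r ln(lambda) - lambda * sum(T_i ** beta) + const (suspensions enter only the sum).
    Prior: Gamma(shape a, rate b), ln p(lambda) = (a - 1) ln(lambda) - b * lambda + const.
    """
    r = np.sum(failed)
    s = np.sum(np.asarray(times, dtype=float) ** beta)
    return (r + a - 1.0) * np.log(lam) - (s + b) * lam


def map_estimate(times, failed, beta, a, b):
    """Numerically maximize the posterior over ln(lambda), which has the same maximizer."""
    res = minimize_scalar(lambda u: -log_posterior(np.exp(u), times, failed, beta, a, b),
                          bounds=(-60.0, 10.0), method="bounded", options={"xatol": 1e-10})
    return np.exp(res.x)


# Hypothetical data: two failures and one suspension; beta fixed at 2; hypothetical prior Gamma(3, 2e6).
times, failed, beta, a, b = [400.0, 950.0, 1300.0], [True, True, False], 2.0, 3.0, 2.0e6
lam_map = map_estimate(times, failed, beta, a, b)
lam_mle = np.sum(failed) / np.sum(np.asarray(times) ** beta)
print("eta (MAP):", lam_map ** (-1.0 / beta), " eta (MLE):", lam_mle ** (-1.0 / beta))
```

The prior pulls the MAP estimate of η away from the MLE toward values consistent with the prior belief; with more failures, the likelihood dominates and the two estimates converge.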

References:

Jardine A. K. S., Anderson P. M. and Mann D. S., ‘Application of the Weibull Proportional Hazard Model to Aircraft and Marine Engine Failure Data’, Quality & Reliability Engineering International, 5, 77-82 (1987).

EXAKT v. 3.00 Guide and Manual, CBM Laboratory, Department of Mechanical and Industrial Engineering, University of Toronto (2001).

Kalbfleisch J. D. and Prentice R. L., The Statistical Analysis of Failure Time Data, Wiley (1980).

Lawless J. F., Statistical Models and Methods for Lifetime Data, Wiley (1982).

MacGregor J. F., et al., Multivariate Statistical Process Control of Batch Processes Using PCA and PLS, IFAC ADCHEM 1994 Preprints, May 4-27, Kyoto, Japan (1994).

Dubi A., Monte Carlo Applications in Systems Engineering, Wiley (1999).

Dubi A. and Goldfeld A., SPAR v. 5.0 User Guide, Department of Nuclear Engineering, Ben Gurion University of the Negev (2002).

 

 



[1] Control exchange is the leading reason for removals that do not lead to engine maintenance.

[2] A SPAR Monte Carlo simulation software process specialized for aircraft engines.

  www.omdec.com   info@omdec.com