Different Analysis Scenarios for Examining Repairable Systems
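This article reviews the different analysis scenarios that can be used when examining the reliability of repairable systems. Five different methods will be reviewed, specifically: the mean value of the system's times between failure (MTBF), simulation based on the distribution of the system's TBFs, RBD simulation based on component failure distributions, the non-homogeneous Poisson process (NHPP) model with a power law intensity function, and the General Renewal Process (GRP) model.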
The following procedure will be used to create failure data for a hypothetical system that can be analyzed with all five of these methods in order to compare the analysis results.
Simulation
For simplicity, we have chosen to use a race car as our hypothetical system. The system is broken down into two major subsystems, the Front Assembly and the Rear Assembly. The Front Assembly is composed of:
The Rear Assembly is composed of:
Table 1 shows the failure distributions and corresponding parameters that were set to represent the "true" reliability behavior of these components. Figure 1 shows the RBDs for the system.

Table 1: "True" failure distributions for the components in the system
Figure 1: RBDs for the system

Three simulations were performed based on the RBD shown in Figure 1 and the distributions given in Table 1. The mission time of the first simulation was 2500 km, the second was 1976 km and the third was 800 km. In addition, to replicate a more realistic scenario of systems operating in the field, preventive maintenance was performed on the brakes every 305 km. The RBD is intended to represent an actual system that operates in the field, and each of the three simulations represents one fielded system. The distributions that generate the failures are known, but we will pretend that we do not know them and try to estimate the reliability using the five analysis methods mentioned in the introduction.

Table 2 shows the events obtained from the three simulations. As shown in this table, the event time was recorded along with the corresponding component that initiated the event. (Note that the event times are on a cumulative scale.) We will now assume that this is all we know about the design and that each simulation represents a system in operation, where System 1 has operated for 2500 km so far, System 2 for 1976 km and System 3 for 800 km.

Table 2: Simulation results

Approach #1: Using the Mean Value of the System's Times Between Failure (TBFs)

With the first analysis approach, the objective is to estimate the MTBF of the design (system) and, based on this estimate, possibly make predictions about future events. Under this model, we look only at the times between failure (TBFs) for each system, which are shown in Table 3. In this table, all of the preventive maintenance events have been removed, since they do not represent failures.

Table 3: Times between failures (TBFs) for each system
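There are two ways to use the TBF data of Table 3 to obtain an MTBF. The first is to simply sum all the system ages and divide by the total number of events, or:

$$MTBF = \frac{\sum_{i=1}^{K} T_i}{\sum_{i=1}^{K} N_i} \qquad (1)$$

where:
$K$ is the number of systems,
$T_i$ is the current age of system $i$,
$N_i$ is the number of failure events observed on system $i$.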
The MTBF calculated using this approach for this data set is 329.75 km. However, this result could be very misleading, since it assumes random failure behavior (i.e., a constant rate of occurrence of events, also known as a constant failure rate). If the system exhibits an aging pattern (wearout) or an infant mortality pattern, this equation will average out all of the TBFs, and the mean could be overestimated in the case of wearout or underestimated in the case of infant mortality.

Obtaining the MTBF from the distribution of the TBFs would provide a better estimator. Under this approach, a distribution is fitted to the TBFs to represent their behavior. The data set for this example is entered in the Weibull++ software and Figure 2 shows the analysis results. Notice that there is one suspension for each system, which is the time between the last event and the current age of the system.
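The simple-average calculation of Eqn. (1) can be sketched in a few lines of Python. The system ages come from the text above; the individual failure counts are not recoverable here, so the total of 16 failures is an assumption inferred from the reported result (5276 km / 329.75 km = 16).

```python
# System ages (km) are taken from the article. The failure counts are NOT
# listed in the text: a total of 16 failures is an assumption inferred from
# the reported MTBF (5276 km / 329.75 km = 16).
system_ages_km = [2500.0, 1976.0, 800.0]
total_failures = 16  # assumed, see note above


def simple_mtbf(ages, n_failures):
    """Eqn. (1): total accumulated operating age divided by total failures."""
    if n_failures <= 0:
        raise ValueError("at least one failure is required")
    return sum(ages) / n_failures


mtbf_km = simple_mtbf(system_ages_km, total_failures)
print(f"Simple-average MTBF: {mtbf_km:.2f} km")
```

Note that the calculation uses only totals: shuffling the order of the failures, or moving them all to the end of life, would not change the result, which is exactly why it cannot reveal wearout or infant mortality.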
Figure 2: Analyzing the mean value of the TBFs in Weibull++

Under this analysis, the best-fit distribution is the Weibull with beta = 1.10 and eta = 337 km. Based on this distribution, the MTBF can be calculated with the Weibull++ QCP and is found to be MTBF = 324.5 km. This estimate is preferred over the one obtained using Eqn. (1) because the behavior of the TBFs is considered and the estimate is based on a best-fit model rather than on an assumed constant rate of occurrence of failures (ROCOF). (Note that a constant ROCOF is similar to assuming an exponential distribution or, more correctly, a homogeneous Poisson process.)

However, caution must be used when selecting this approach. Even though this analysis gives a better estimate than the one given by Eqn. (1), it is still an average and, as will be shown later, averages work well only when sufficient data are present AND when systems have reached a "steady state" AND when predicting future events.

A second caution concerns the misuse of this approach. In many cases, analysts misinterpret this model as being the failure distribution of the system, and they perform additional estimations, such as reliability, BX calculations, etc. These types of results are incorrect, since the model simply describes the behavior of the TBFs and, in essence, is a model of the MTBF. In addition to the MTBF, we can also use this model to obtain, for example, the percentage of the TBFs that fall within a given range, but this does not represent a reliability/unreliability calculation. For example, if we compute P(t ≤ 200 km) for this model, we get 43%. This does not mean that the probability of failure of the system is 43%. Rather, it means that 43% of the TBFs are expected to be 200 km or less. Obviously, this is very different from a reliability/unreliability statement.
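The two numbers quoted above can be checked directly from the fitted parameters, without any special software. The following is a minimal sketch that uses the standard Weibull mean and CDF formulas with the rounded values beta = 1.10 and eta = 337 km; the function names are illustrative only. Because the parameters are rounded, the mean should land close to, but not exactly on, the reported 324.5 km.

```python
import math

BETA = 1.10      # Weibull shape parameter reported in the article
ETA_KM = 337.0   # Weibull scale parameter reported in the article (km)


def weibull_mean(beta, eta):
    """Mean of a two-parameter Weibull: eta * Gamma(1 + 1/beta)."""
    return eta * math.gamma(1.0 + 1.0 / beta)


def weibull_cdf(t, beta, eta):
    """Fraction of values expected to be less than or equal to t."""
    return 1.0 - math.exp(-((t / eta) ** beta))


mean_tbf_km = weibull_mean(BETA, ETA_KM)
frac_tbf_le_200 = weibull_cdf(200.0, BETA, ETA_KM)

print(f"Mean TBF from the fitted Weibull: {mean_tbf_km:.1f} km")
print(f"Expected fraction of TBFs <= 200 km: {frac_tbf_le_200:.1%}")
```

The second figure is a statement about the population of TBFs, not about the probability that a system of a given age survives its next 200 km.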
Also notice that this statement is independent of the chronological order of the TBFs: the 43% of the TBFs that are less than or equal to 200 km could have occurred at the beginning of the life of the system, at the latter stages, or just randomly. All we can tell from this model is that we expect 43% of the TBFs to be less than or equal to 200 km. The following graphic demonstrates this point. It depicts the failure events of a system in chronological order. In the graphic, Ti represents the cumulative time to event and ti represents the times between events. In addition, the vertical line represents the time when the system has accumulated 200 km of operation, and all of the times between events that are less than or equal to 200 km are contained within the circle.
If we were to estimate the reliability at 200 km, it would be defined as the probability that the system will operate for 200 km without a failure. It can easily be seen from the graph that this is different from the percentage of TBFs that are less than or equal to 200 km (circled events). The percentage of events included in this circle is what was calculated previously to be 43%. Therefore, we can conclude that reliability predictions are not valid with this model. However, the model can be used to predict the expected number of failures (ENOF) over time by:

$$ENOF(t) = \frac{t}{MTBF} \qquad (2)$$
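As a small worked example, the ENOF estimate is simply the system age divided by the MTBF. The sketch below uses the MTBF of 324.5 km from the fitted distribution; the ages in the loop are hypothetical and are not necessarily the ones used in the article's tables.

```python
MTBF_KM = 324.5  # MTBF obtained from the fitted TBF distribution (km)


def expected_failures(age_km, mtbf_km):
    """Expected number of failures by a given age, assuming a constant ROCOF."""
    if mtbf_km <= 0:
        raise ValueError("MTBF must be positive")
    return age_km / mtbf_km


# Hypothetical system ages (km), chosen only for illustration.
for age_km in (500, 1000, 2500, 5000):
    print(f"{age_km:>5} km -> ENOF = {expected_failures(age_km, MTBF_KM):.2f}")
```

Because the estimate grows linearly with age, it cannot reflect an increasing or decreasing rate of occurrence of failures.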
Table 4 provides the estimated number of failures based on the above equation, at different system ages and using the calculated MTBF of 324.5 km. The estimate is compared to the "true" number of failures, which is determined from the original distributions and RBD.

Table 4: Expected failures, using MTBF

From this table, we can see that the estimate moves closer to the "true" value at higher ages. This is expected, since an average is a better predictor once the system has aged toward steady state.

The second analysis method that we
will consider is based on the approach used previously, with the
exception that it uses the actual distribution of the TBFs instead
of the MTBF. This can be done easily using the BlockSim software.
A single block is created in BlockSim with a failure distribution
obtained from the TBFs. In this example, we obtained a Weibull distribution
with beta = 1.10 and eta = 337 km. Since
this is a repairable system, a repair distribution also is needed
in BlockSim. Since we are ignoring the downtime in this example,
the corrective maintenance duration is set to zero.

Under these settings, we run multiple BlockSim simulations for different system ages and we record the results. In this case, the metric of interest is the expected number of failures (ENOF), as shown in Table 5.

Table 5: Expected failures, using TBF distribution in BlockSim
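The idea behind this second approach can be illustrated with a small Monte Carlo sketch: a single block whose times between failures follow the fitted Weibull (beta = 1.10, eta = 337 km) and whose repairs take zero time. This is not BlockSim's algorithm, only a plain renewal-process simulation under the same assumptions; the function names, number of runs, seed and ages are illustrative choices.

```python
import random

BETA = 1.10      # Weibull shape of the TBF distribution (from the article)
ETA_KM = 337.0   # Weibull scale of the TBF distribution (km)


def simulate_failure_count(age_km, beta, eta, rng):
    """Count failures of one system up to age_km, with instantaneous repair."""
    clock = 0.0
    failures = 0
    while True:
        # random.weibullvariate takes (scale, shape) in that order.
        clock += rng.weibullvariate(eta, beta)
        if clock > age_km:
            return failures
        failures += 1


def expected_failures_by_simulation(age_km, beta=BETA, eta=ETA_KM,
                                    runs=20000, seed=1):
    """Average the failure count over many simulated systems."""
    rng = random.Random(seed)
    total = sum(simulate_failure_count(age_km, beta, eta, rng)
                for _ in range(runs))
    return total / runs


# Hypothetical system ages (km), chosen only for illustration.
for age_km in (500.0, 1000.0, 2500.0):
    enof = expected_failures_by_simulation(age_km)
    print(f"{age_km:>6.0f} km -> simulated ENOF = {enof:.2f}")
```

Since every TBF is drawn from the same distribution regardless of system age, the simulated ENOF stays close to age / MTBF, which is consistent with the small improvement over the first approach.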
From this table, we can see that little improvement is achieved using this approach for this example. However, it is a useful approach when we need to model large-scale systems composed of multiple repairable subsystems.

With the third analysis approach, we determine the individual
failure distributions of the components from the data. This is done
by obtaining the times between failure for each individual component.
For example, Table 6 gives the TBFs for the engine and Figure 3 shows the Weibull++ analysis used to obtain the failure distribution of the engine based on this data set.

Table 6: Engine TBFs for each system

Figure 3: Using Weibull++ to obtain the TBF distribution for the engine

The distributions of all the components can be determined in a similar manner, and are given in Table 7.

Table 7: Component failure distributions
It should be noted that for the brakes, all of the preventive maintenance actions were considered as suspensions when building the model. In addition, the data from all the rear brakes were considered as one data set (regardless of the side), and similarly for the front brakes.

It can be seen that this method requires sufficient failure information at the component level. If component failures are scarce, then it would be difficult and possibly inaccurate to implement this method.

These distributions were entered in BlockSim and simulations were performed for different system ages. A preventive maintenance policy for the brakes was included in the model as well. The results are given in Table 8.

Table 8: Expected failures, using component distributions in BlockSim
With the fourth analysis approach, the individual cumulative times to event for each system are considered and the non-homogeneous Poisson process (NHPP) model with a power law intensity function is fitted to the data. The model is given by the following equation:

$$E[N(T)] = \lambda T^{\beta} \qquad (3)$$

where $E[N(T)]$ is the expected number of failures by system age $T$, and the corresponding failure intensity (ROCOF) is:

$$\rho(T) = \lambda \beta T^{\beta - 1}$$

This model is an extension of the homogeneous Poisson process, in which the failure rate is assumed to be constant (i.e., exponential distribution). In the case of the NHPP, however, the failure rate could be increasing, decreasing or constant (as in the Weibull distribution), based on the value of beta in Eqn. (3). The assumption of this model is that, after each failure, the system is restored to the same condition it was in just prior to the failure ("as-bad-as-old"). This assumption is usually adequate when dealing with large systems; however, it becomes less applicable for smaller systems (fewer components), where a replacement has a significant impact on the system (renewal).

Using the RGA software, the
NHPP with a power law intensity function is fitted to the
cumulative failure times of each system (recorded as mileage
in this example). As shown in Figure 4, the beta is 1.65,
which indicates an increasing ROCOF for this system/design
(i.e. wearout). In other words, as the systems age,
more events are observed and the TBF intervals decrease.
The expected number of failures at different ages can be
computed based on the model and the results are given in
Table 9.
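To make the fitting step more concrete, the sketch below estimates the power law parameters by maximum likelihood for several systems, assuming each system is observed from age zero up to its current age (time-terminated data). This is a standard estimator for this model and is not necessarily the exact procedure RGA uses. The failure ages are hypothetical, since the simulated events of Table 2 are not reproduced here; only the three end-of-observation ages come from the article. The names `lam` and `fit_power_law_nhpp` are illustrative.

```python
import math

# (current age in km, cumulative failure ages in km) for each system.
# The failure ages are HYPOTHETICAL and serve only to exercise the code.
systems = [
    (2500.0, [420.0, 910.0, 1350.0, 1700.0, 2050.0, 2300.0, 2450.0]),
    (1976.0, [510.0, 1020.0, 1400.0, 1750.0, 1900.0]),
    (800.0, [390.0, 700.0]),
]


def fit_power_law_nhpp(data, lo=1e-3, hi=20.0, tol=1e-10):
    """Maximum likelihood estimates (lam, beta) for E[N(T)] = lam * T**beta."""
    ends = [end for end, _ in data]
    n = sum(len(failures) for _, failures in data)
    sum_log_x = sum(math.log(x) for _, failures in data for x in failures)
    t_max = max(ends)

    def score(beta):
        # Derivative of the log-likelihood in beta after eliminating lam.
        # Weights are scaled by t_max to keep the powers well behaved.
        weights = [(end / t_max) ** beta for end in ends]
        mean_log_t = sum(w * math.log(end)
                         for w, end in zip(weights, ends)) / sum(weights)
        return n / beta + sum_log_x - n * mean_log_t

    # score() decreases monotonically in beta, so bisection finds the root.
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if score(mid) > 0:
            lo = mid
        else:
            hi = mid
    beta = 0.5 * (lo + hi)
    lam = n / sum(end ** beta for end in ends)
    return lam, beta


def expected_failures(age_km, lam, beta):
    """Expected cumulative number of failures of one system by age_km."""
    return lam * age_km ** beta


lam, beta = fit_power_law_nhpp(systems)
print(f"beta = {beta:.3f}, lambda = {lam:.3e}")
for age_km in (500.0, 1000.0, 2500.0):
    print(f"{age_km:>6.0f} km -> ENOF = {expected_failures(age_km, lam, beta):.2f}")
```

A beta greater than 1 makes the expected number of failures grow faster than linearly with age, which is how the model captures wearout at the system level.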
Figure 4: Analyzing system level data with the NHPP model in RGA

Table 9: Expected failures, using the NHPP model

The last approach, using
the General Renewal Process (GRP) model, is an improvement
to the NHPP approach. As mentioned in the previous section,
the NHPP model assumes that the system is "as-bad-as-old"
after each failure (i.e. in the same condition as
it was prior to the failure). The GRP model relaxes this
assumption by including an additional parameter, q, which is a measure of the degree of restoration (renewal) and is determined from the data. The data set used is the same as the one used with the NHPP approach, i.e., the cumulative times to event of each system.

The GRP model is fitted to
the data using Weibull++, as shown in Figure 5. The results
are given in Table 10.
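The role of q can be illustrated with a small simulation based on a virtual-age formulation. The sketch below assumes a Kijima Type I model, in which each repair removes a fraction (1 - q) of the age accumulated since the previous failure, so that q = 0 behaves as "as-good-as-new" and q = 1 as "as-bad-as-old" (the power law NHPP). The article does not state which GRP variant or parameterization was used, and the values of eta and q below are hypothetical; only beta = 1.65 is borrowed from the NHPP fit.

```python
import math
import random


def simulate_grp_count(age_km, beta, eta, q, rng):
    """Failures of one system up to age_km under a Kijima Type I virtual age."""
    clock = 0.0
    virtual_age = 0.0
    failures = 0
    while True:
        u = 1.0 - rng.random()  # uniform on (0, 1], avoids log(0)
        # Weibull time to next failure, conditional on the current virtual age.
        x = eta * ((virtual_age / eta) ** beta - math.log(u)) ** (1.0 / beta)
        x = max(x - virtual_age, 0.0)
        clock += x
        if clock > age_km:
            return failures
        failures += 1
        virtual_age += q * x  # q = 0: full renewal, q = 1: minimal repair


def grp_expected_failures(age_km, beta, eta, q, runs=20000, seed=1):
    """Monte Carlo estimate of the expected number of failures by age_km."""
    rng = random.Random(seed)
    total = sum(simulate_grp_count(age_km, beta, eta, q, rng)
                for _ in range(runs))
    return total / runs


BETA = 1.65     # shape borrowed from the NHPP fit in the article
ETA_KM = 400.0  # HYPOTHETICAL scale parameter (km)

for q in (0.0, 0.5, 1.0):  # hypothetical restoration settings
    enof = grp_expected_failures(1000.0, BETA, ETA_KM, q)
    print(f"q = {q:.1f} -> simulated ENOF at 1000 km = {enof:.2f}")
```

With beta greater than 1, the expected number of failures increases as q moves from 0 toward 1, because less of the accumulated wear is removed at each repair.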
Figure 5: Analyzing system level data with the GRP model in Weibull++

Table 10: Expected failures, using the GRP model

In this article, five different
analysis methods were used to model the failure behavior
of a repairable system. The data were generated using simulation
based on predefined failure distributions. The expected
number of failures was used as a metric for comparing the
results from each analysis approach to the "true" behavior
of the system (which is known, since the generating failure
distributions are known). The table and plot in Figure 6
compare the results of the five different methods.
Figure 6: Comparing the results of the five analysis methods

It can be seen that the RBD simulation approach offers the most realistic estimates in this example. Of course, the estimation always depends on the number of observed events, which is why the analysis method should be chosen based on the available information. If, for example, very few failure events had been observed, the RBD simulation approach based on component distributions would be very hard to adopt and would most likely be a bad estimator. The simulation using the system’s TBF distribution could offer a better estimator when dealing with few failures at the component level, but it typically becomes a good predictor only when extrapolating to longer system ages. In addition, this method cannot be used for reliability/unreliability calculations. The MTBF method is recommended only for quick, "back of the envelope" calculations, since the simulation based on the TBF distribution is similar and slightly more accurate. Finally, the GRP model is typically more accurate than the NHPP model (which is actually a special case of the GRP). Even though it is more complicated, the GRP is recommended over the NHPP.

Therefore, the following recommendations can be made:

In addition to these recommendations, this article can be used to further understand the assumptions behind each analysis method, the data required and the type of results that can be obtained.