Analyzing Data from Equipment Downtime Logs
For product
manufacturers who rely on repairable manufacturing equipment, downtime logs
can be a valuable source of life data for reliability, maintainability and
availability analyses. In order to prepare the data for reliability analysis,
the analyst must convert the information in the equipment downtime logs into
times-to-failure and times-to-repair. This can be a time-consuming and
error-prone process when performed manually.
This article describes the
process for converting equipment downtime logs to usable reliability data and
introduces Weibull++ MT, a special version of ReliaSoft’s Weibull++ life
data analysis software package that provides utilities to automate the
process.
Variety Among Equipment Downtime Logs
Equipment downtime logs may be constructed in a variety of formats and the
type of data in the log determines the process that must be used to convert
the log entries to life data (i.e. data that can be used for
reliability analysis). Typically, an equipment downtime log will contain the
dates and times when events occurred, the dates and times when the system was
restored to operation and an indication of the component that was responsible
for each event. The “events” can represent system failures as well as
other events of interest, such as user interventions or planned maintenance
activities. Failure events and other non-failure events will be treated
differently in the conversion process. In addition, the responsible components
can represent various levels in the system configuration (e.g. system,
subsystem, assembly and part) and these levels will also impact the
analysis.
Shift patterns for
the operation of the equipment must also be taken into account during the
conversion process because the accumulated age of the components will be
different depending on the hours of operation for the system. Finally, some
items may continue to accumulate age while the system is down due to the
failure of another component, whereas other items will only accumulate age
when the system is operating. This characteristic of the component must be
taken into account when determining the times-to-failure.
The following
example will be used to demonstrate the process required to convert one type
of equipment downtime log into life data. A similar process with specific
adjustments can be used to convert and analyze the data in other types of
downtime logs.
Example: Converting a Downtime Log to Life Data
Consider a simple system composed of two components, A and B. The shift of
operation for the system goes from 8:00 a.m. to 5:00 p.m. Monday through
Friday. When a failure is observed, the system undergoes repair until the
system is operational again, regardless of the shift pattern for system
operation. In other words, the repair would continue beyond the end of the
shift, if necessary, until the system is operational. The downtime log for
this system is presented in Figure 1.
Figure 1: Sample equipment downtime log
The sample
equipment downtime log contains a record of events from 12:00 p.m. on January
1, 1997 through 1:00 p.m. on March 18, 1997. All events reported in the log
are failures and repair involves the replacement of the responsible component.
The log contains the following information:
- The date and
time when the system failed.
- The date and
time when the system was repaired and restored to operation.
- The component
responsible for the failure.
- An indication
(in the OTF column) of whether the responsible component continues to age
even when the system is down due to the failure of another component.
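As a rough illustration of the four fields listed above, one row of the log could be held in a small record like the one below. This is only a sketch: the names `DowntimeEntry`, `parse_entry` and `ages_while_down` are hypothetical, the timestamp format is assumed from the dates quoted in this article, and the text encoding of the OTF column ("Yes"/"No") is an assumption because Figure 1 is not reproduced here.

```python
from dataclasses import dataclass
from datetime import datetime

# Assumed timestamp layout, e.g. "1/2/97 4:00 PM" (month/day/2-digit year).
LOG_TIME_FORMAT = "%m/%d/%y %I:%M %p"


@dataclass(frozen=True)
class DowntimeEntry:
    """One row of the downtime log (hypothetical structure)."""
    failed_at: datetime      # date/time when the system failed
    restored_at: datetime    # date/time when the system was restored
    component: str           # component responsible for the failure
    ages_while_down: bool    # OTF column: ages while another item is down?


def parse_entry(failed: str, restored: str, component: str, otf: str) -> DowntimeEntry:
    """Build an entry from raw text fields; the OTF encoding is assumed."""
    return DowntimeEntry(
        failed_at=datetime.strptime(failed, LOG_TIME_FORMAT),
        restored_at=datetime.strptime(restored, LOG_TIME_FORMAT),
        component=component.strip(),
        ages_while_down=otf.strip().lower() in {"y", "yes", "true", "1"},
    )


# Row 1 as described in the text: component A, which does not age
# while the system is down for another component's failure.
row1 = parse_entry("1/2/97 4:00 PM", "1/2/97 7:49 PM", "A", "No")
print(row1.failed_at, row1.restored_at, row1.component, row1.ages_while_down)
```

Keeping the aging behavior on the record itself means a later conversion step can choose the appropriate procedure per component without consulting a separate table.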
This information
can be used to obtain times-to-failure and times-to-repair for each component.
The procedure to analyze component B differs from the procedure for
component A because component B continues to accumulate age even when the
system is down due to the failure of another component. Both procedures for
conversion are presented next.
Analysis for Component A
We begin the analysis by looking at component A. The first time that component
A is known to have failed is recorded in row 1 of the equipment downtime log
table in Figure 1. The first data point for component A, $TTF_A[1]$,
is the sum of the hours of operation for each day from the date/time when
events began to be recorded in the downtime log to the first failure
date/time: 5 hr on January 1 (12:00 p.m. to 5:00 p.m.) plus 8 hr on January 2
(8:00 a.m. to 4:00 p.m.). Thus, $TTF_A[1]$ = 5 hr + 8 hr = 13 hr. This
is shown graphically in Figure 2.
Figure 2: First time-to-failure for component A
This represents a
right censored data point (i.e. suspension) because we do not know how
long the equipment operated before events began to be recorded in the downtime
log. The time-to-repair for component A as the result of this failure,
$TTR_A[1]$, is the total time between the date/time when the failure occurred
and the date/time when the system was repaired, or $TTR_A[1]$
= (1/2/97 7:49 PM - 1/2/97 4:00 PM) = 3 hr, 49 min = 3.817 hr.
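The time-to-repair arithmetic above can be checked with a few lines of Python. This is a minimal sketch; the function name `time_to_repair_hours` is hypothetical, and the two timestamps are the row 1 values quoted in the text.

```python
from datetime import datetime


def time_to_repair_hours(failed_at: datetime, restored_at: datetime) -> float:
    """Elapsed clock time from failure to restoration, in hours.

    Repair continues regardless of the shift pattern, so no shift
    adjustment is applied here.
    """
    return (restored_at - failed_at).total_seconds() / 3600.0


failed = datetime(1997, 1, 2, 16, 0)     # 1/2/97 4:00 PM
restored = datetime(1997, 1, 2, 19, 49)  # 1/2/97 7:49 PM
print(f"{time_to_repair_hours(failed, restored):.3f} hr")  # 3.817 hr
```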
Continuing with
component A, the second system failure due to component A is found in row 4 at
3:26 p.m. on January 12, 1997. Remember that component A does not age when the
system is down due to the failure of component B. Therefore, to compute the
second time-to-failure, $TTF_A[2]$, we must look at the age the component
accumulated since it was last replaced (row 1), which does not include the
time between operating shifts or the time when the system was down for repair
due to component B. This is shown graphically in Figure 3.
Figure 3: Second time-to-failure for component A
To describe this
mathematically, we will use a shift-hours function, written here as
$SH(t_1, t_2)$, which returns the shift hours worked between two
dates/times. For this example, given an 8 a.m. to 5 p.m. shift,
$SH$(1/1/97 3:00 AM, 1/1/97 6:00 PM) = 9 shift hours. Furthermore,
DTO represents the date and time a failure occurred, DTR represents the date
and time a repair was completed and numerical subscripts represent the row
number for the entry in the downtime log. Therefore, the total possible hours
(TPH) that component A could have operated from the time it was first repaired
to the time it failed the second time, if the failure of component B had not
caused the system to shut down, is:

$$TPH = SH(DTR_1, DTO_4)$$

The time that
component A was not operating (NOP) during normal hours of operation is the
time that the system was down due to the failure of component B (rows 2 and 3
of the log), or:

$$NOP = SH(DTO_2, DTR_2) + SH(DTO_3, DTR_3)$$

Thus, the second
time-to-failure for component A, $TTF_A[2]$, is the
total possible hours minus the time that component A was not operating due to
the failure of component B, or:

$$TTF_A[2] = TPH - NOP$$

To compute the
time-to-repair for this failure, we determine the time between the occurrence
of the failure and the completion of the repair, or:

$$TTR_A[2] = DTR_4 - DTO_4$$
The same process
can be repeated for the rest of the observed failures of component A.
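The shift-hours bookkeeping described above lends itself to a short sketch. The code below assumes the 8:00 a.m. to 5:00 p.m., Monday through Friday shift from this example; the names `shift_hours` and `time_to_failure` are hypothetical, and the dates in the last example are made up for illustration because Figure 1 is not reproduced here.

```python
from datetime import datetime, time, timedelta

SHIFT_START = time(8, 0)   # assumed shift start, 8:00 a.m.
SHIFT_END = time(17, 0)    # assumed shift end, 5:00 p.m.


def shift_hours(start: datetime, end: datetime) -> float:
    """Scheduled shift hours (Mon-Fri) that fall between start and end."""
    total = timedelta(0)
    day = start.date()
    while day <= end.date():
        if day.weekday() < 5:  # Monday=0 ... Friday=4
            lo = max(start, datetime.combine(day, SHIFT_START))
            hi = min(end, datetime.combine(day, SHIFT_END))
            if hi > lo:
                total += hi - lo
        day += timedelta(days=1)
    return total.total_seconds() / 3600.0


def time_to_failure(restored, failed, other_downtimes=()):
    """Total possible hours minus shift hours lost to other components.

    other_downtimes is a sequence of (down_at, up_at) pairs during which
    the system was down for another component's failure. Passing no
    pairs returns the total possible hours unchanged.
    """
    tph = shift_hours(restored, failed)
    nop = sum(shift_hours(down, up) for down, up in other_downtimes)
    return tph - nop


# Values quoted in the text.
print(shift_hours(datetime(1997, 1, 1, 3, 0), datetime(1997, 1, 1, 18, 0)))   # 9.0
print(shift_hours(datetime(1997, 1, 1, 12, 0), datetime(1997, 1, 2, 16, 0)))  # 13.0

# Hypothetical second failure with two made-up downtimes for another component.
ttf = time_to_failure(
    restored=datetime(1997, 1, 2, 19, 49),
    failed=datetime(1997, 1, 9, 15, 26),
    other_downtimes=[
        (datetime(1997, 1, 6, 10, 0), datetime(1997, 1, 6, 12, 30)),
        (datetime(1997, 1, 7, 16, 0), datetime(1997, 1, 8, 9, 0)),
    ],
)
print(round(ttf, 3))  # 38.933
```

Only the portion of each downtime that overlaps a scheduled shift is subtracted, which matches the definition of NOP as time not operating during normal hours of operation.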
Analysis for Component B
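Since component B continues to accumulate age even when the system is down,
the process to determine the times-to-failure for component B is less complex
than the process for component A. The time-to-failure for component B is
calculated in the same way that the total possible hours (TPH) were calculated
for component A, regardless of the time that the system was down due to the
failure of another component. Writing $SH(t_1, t_2)$ for the shift hours
worked between two dates/times, the first three times-to-failure are
calculated as follows, where row $k$ is the row that records the third failure
of component B, and the remaining times are calculated in a similar manner:

$$TTF_B[1] = SH(\text{1/1/97 12:00 PM}, DTO_2)$$
$$TTF_B[2] = SH(DTR_2, DTO_3)$$
$$TTF_B[3] = SH(DTR_3, DTO_k)$$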
The process to
compute the times-to-repair for component B is the same as the process for
component A. For example, for the first failure of component B (row 2):

$$TTR_B[1] = DTR_2 - DTO_2$$
The complete data
set with times-to-failure and times-to-repair for components A and B is
presented in Table 1. Note that the last points for components A and B are
right censored (i.e. suspensions) because we know that each component
was operating successfully at the end of the observation period. We do not
know what may have happened after the observation period ended. The
reliability information in this table can be analyzed with standard
reliability, maintainability and availability analysis techniques.

Table 1: Times-to-failure and times-to-repair for components A and B
Using Weibull++ MT to Automate the Analysis
Although product manufacturers can realize substantial benefits by obtaining
life data from equipment downtime logs, the conversion process can be
cumbersome and error-prone when performed manually. Through automation of
tedious and repetitive calculations, the Weibull++ MT software speeds up and
simplifies the process. In addition, the software allows the user to transfer
the data to a Weibull++ data folio for further life data analysis or to
BlockSim for system reliability, maintainability and availability analysis
based on component data. Weibull++ MT is an industry-specific version
of ReliaSoft’s Weibull++ 6 designed to meet the needs of the machine tool
supplier community and other organizations with
similar needs. The MT edition includes all the features and functionality of
Weibull++ 6 and adds a “machine tools” interface for specialized data
entry and conversion. Figure 4 shows an example of the Weibull++ MT interface,
with the data entry and shift pattern functionality displayed. Weibull++ MT is
on the Web at http://www.ReliaSoft.com/Weibull/mt.
Figure 4: Example of the Weibull++ MT interface
