Reliability Edge Newsletter

Quarter 1, 2002:  Volume 3, Issue 1

Analyzing Data from Equipment Downtime Logs

For product manufacturers that rely on repairable manufacturing equipment, downtime logs can be a valuable source of life data for reliability, maintainability and availability analyses. To prepare the data for reliability analysis, the analyst must convert the information in the equipment downtime logs into times-to-failure and times-to-repair. This conversion is time-consuming and error-prone when performed manually.

This article describes the process for converting equipment downtime logs to usable reliability data and introduces Weibull++ MT, a special version of ReliaSoft’s Weibull++ life data analysis software package that provides utilities to automate the process.

Variety Among Equipment Downtime Logs 
Equipment downtime logs may be constructed in a variety of formats and the type of data in the log determines the process that must be used to convert the log entries to life data (i.e. data that can be used for reliability analysis). Typically, an equipment downtime log will contain the dates and times when events occurred, the dates and times when the system was restored to operation and an indication of the component that was responsible for each event. The “events” can represent system failures as well as other events of interest, such as user interventions or planned maintenance activities. Failure events and other non-failure events will be treated differently in the conversion process. In addition, the responsible components can represent various levels in the system configuration (e.g. system, subsystem, assembly and part) and these levels will also impact the analysis. 

Shift patterns for the operation of the equipment must also be taken into account during the conversion process because the accumulated age of the components will be different depending on the hours of operation for the system. Finally, some items may continue to accumulate age while the system is down due to the failure of another component, whereas other items will only accumulate age when the system is operating. This characteristic of the component must be taken into account when determining the times-to-failure. 

The following example will be used to demonstrate the process required to convert one type of equipment downtime log into life data. A similar process with specific adjustments can be used to convert and analyze the data in other types of downtime logs. 

Example: Converting a Downtime Log to Life Data 
Consider a simple system composed of two components, A and B. The system operates on a single shift from 8:00 a.m. to 5:00 p.m., Monday through Friday. When a failure is observed, the system undergoes repair until it is operational again, regardless of the shift pattern. In other words, the repair continues beyond the end of the shift, if necessary, until the system is restored. The downtime log for this system is presented in Figure 1.

Figure 1: Sample equipment downtime log

The sample equipment downtime log contains a record of events from 12:00 p.m. on January 1, 1997 through 1:00 p.m. on March 18, 1997. All events reported in the log are failures and repair involves the replacement of the responsible component. The log contains the following information:

  • The date and time when the system failed. 
  • The date and time when the system was repaired and restored to operation. 
  • The component responsible for the failure. 
  • An indication (in the OTF column) of whether the responsible component continues to age even when the system is down due to the failure of another component.

This information can be used to obtain times-to-failure and times-to-repair for each component. The procedure for component B differs from the procedure for component A because component B continues to accumulate age even when the system is down due to the failure of another component. Both conversion procedures are presented next.

Analysis for Component A 
We begin the analysis by looking at component A. The first recorded failure of component A appears in row 1 of the equipment downtime log in Figure 1. The first time-to-failure data point for component A, [1], is the sum of the shift hours of operation on each day from the date/time when events began to be recorded in the downtime log to the date/time of the first failure: 5 hr on January 1 (12:00 p.m. to 5:00 p.m.) plus 8 hr on January 2 (8:00 a.m. to 4:00 p.m.). Thus, [1] = 5 hr + 8 hr = 13 hr. This is shown graphically in Figure 2.

Figure 2: First time-to-failure for component A

This is a right-censored data point (i.e. suspension) because we do not know how long the equipment operated before events began to be recorded in the downtime log. The first time-to-repair for component A, [1], is the total time between the date/time when the failure occurred and the date/time when the system was repaired, or [1] = (1/2/97 7:49 PM - 1/2/97 4:00 PM) = 3 hr, 49 min = 3.817 hr.
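The arithmetic above can be checked with a short sketch. The helper name `shift_hours` is hypothetical (it is not taken from the article or from Weibull++ MT), and the shift pattern is hard-coded to the 8:00 a.m. to 5:00 p.m., Monday through Friday schedule of this example.

```python
from datetime import datetime, time, timedelta

SHIFT_START = time(8, 0)   # 8:00 a.m. (from the example)
SHIFT_END = time(17, 0)    # 5:00 p.m. (from the example)

def shift_hours(start, end):
    """Hypothetical helper: scheduled shift hours (Mon-Fri) between two datetimes."""
    total = timedelta(0)
    day = start.date()
    while day <= end.date():
        if day.weekday() < 5:  # Monday=0 ... Friday=4
            s = max(start, datetime.combine(day, SHIFT_START))
            e = min(end, datetime.combine(day, SHIFT_END))
            if e > s:
                total += e - s
        day += timedelta(days=1)
    return total.total_seconds() / 3600.0

log_start = datetime(1997, 1, 1, 12, 0)   # first entry time of the log
failure_1 = datetime(1997, 1, 2, 16, 0)   # row 1: failure of component A
repair_1 = datetime(1997, 1, 2, 19, 49)   # row 1: system restored

# First time-to-failure: shift hours accumulated before the first failure.
ttf_1 = shift_hours(log_start, failure_1)

# First time-to-repair: elapsed clock time, regardless of the shift pattern.
ttr_1 = (repair_1 - failure_1).total_seconds() / 3600.0

print(ttf_1, round(ttr_1, 3))
```

Note that the time-to-repair is measured in clock hours rather than shift hours, because the repair continues past the end of the shift.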

Continuing with component A, the second system failure due to component A is found in row 4 at 3:26 p.m. on January 12, 1997. Remember that component A does not age when the system is down due to the failure of component B. Therefore, to compute the second time-to-failure, [2], we must look at the age the component accumulated since the system was last restored to operation after a failure of component A, excluding both the time between operating shifts and the time when the system was down for repair due to component B. This is shown graphically in Figure 3.

Figure 3: Second time-to-failure for component A

To describe this mathematically, we will use a shift-hours function, which returns the shift hours worked during a range of times. For this example, given an 8 a.m. to 5 p.m. shift, the range (1/1/97 3:00 AM, 1/1/97 6:00 PM) contains 9 shift hours. Furthermore, DTO represents the date and time a failure occurred, DTR represents the date and time a repair was completed and numerical subscripts represent the row number for the entry in the downtime log. Therefore, the total possible hours (TPH) that component A could have operated from the time it was first repaired to the time it failed the second time, if the failure of component B had not caused the system to shut down, is:

[Equation: total possible hours (TPH) for component A]

The time that component A was not operating (NOP) during normal hours of operation is the time that the system was down due to the failure of component B, or:

[Equation: non-operating time (NOP) for component A]

Thus, the second time-to-failure for component A, [2], is the total possible hours minus the time that component A was not operating due to the failure of component B, or:

[Equation: second time-to-failure for component A]

To compute the time-to-repair for this failure, we determine the time between the occurrence of the failure and the completion of the repair, or:

[Equation: second time-to-repair for component A]
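The four relationships described above can be written out as follows. This is a reconstruction from the surrounding definitions rather than a copy of the original equations: the function name SH (shift hours) and the symbols TTF_A and TTR_A are assumed labels, and rows 2 and 3 are taken to be failures of component B because the failures of component A appear in rows 1 and 4.

```latex
\begin{align*}
TPH      &= SH(DTR_1,\, DTO_4) \\
NOP      &= SH(DTO_2,\, DTR_2) + SH(DTO_3,\, DTR_3) \\
TTF_A[2] &= TPH - NOP \\
TTR_A[2] &= DTR_4 - DTO_4
\end{align*}
```

NOP is counted in shift hours because only downtime that falls within normal hours of operation reduces the age accumulated by component A.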

The same process can be repeated for the rest of the observed failures of component A. 

Analysis for Component B 
Since component B continues to accumulate age even when the system is down, the process to determine the times-to-failure for component B is less complex than the process for component A. The time-to-failure for component B is calculated in the same way that the total possible hours (TPH) were calculated for component A, regardless of the time that the system was down due to the failure of another component. The first three times-to-failure are calculated as follows and the remaining times are calculated in a similar manner:

[Equations: first three times-to-failure for component B]

The process to compute the times-to-repair for component B is the same as the process for component A. For example:

[Equation: time-to-repair for component B]
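In general form, the calculations for component B can be sketched as below. As with component A, this is a reconstruction under stated assumptions: SH, TTF_B and TTR_B are assumed labels, T_0 denotes the start of the log (12:00 p.m. on January 1, 1997), r_k denotes the log row of the k-th failure of component B, and rows 2 and 3 are inferred to be the first two failures of component B.

```latex
\begin{align*}
TTF_B[1] &= SH(T_0,\, DTO_2) \\
TTF_B[2] &= SH(DTR_2,\, DTO_3) \\
TTF_B[k] &= SH(DTR_{r_{k-1}},\, DTO_{r_k}) \qquad (k \ge 2) \\
TTR_B[k] &= DTR_{r_k} - DTO_{r_k}
\end{align*}
```

No NOP term is subtracted because component B keeps aging while the system is down for the repair of component A.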

The complete data set with times-to-failure and times-to-repair for components A and B is presented in Table 1. Note that the last points for components A and B are right censored (i.e. suspensions) because we know that each component was operating successfully at the end of the observation period. We do not know what may have happened after the observation period ended. The reliability information in this table can be analyzed with standard reliability, maintainability and availability analysis techniques.
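The full conversion can be sketched in a few lines of Python. This is a minimal illustration under several assumptions, not the method used by Weibull++ MT: the function names are hypothetical, the `otf` dictionary stands in for the OTF column, the first point for every component is treated as a suspension (the article states this only for component A), and apart from row 1 the log entries and the end of the observation period are made-up values, not the data in Figure 1.

```python
from datetime import datetime, time, timedelta

def shift_hours(start, end, shift_start=time(8, 0), shift_end=time(17, 0)):
    """Scheduled shift hours (Monday-Friday) between two datetimes."""
    total = timedelta(0)
    day = start.date()
    while day <= end.date():
        if day.weekday() < 5:  # Monday=0 ... Friday=4
            s = max(start, datetime.combine(day, shift_start))
            e = min(end, datetime.combine(day, shift_end))
            if e > s:
                total += e - s
        day += timedelta(days=1)
    return total.total_seconds() / 3600.0

def accumulated_age(log, comp, t0, t1, ages_when_down):
    """Age accumulated by comp between t0 and t1, in shift hours."""
    age = shift_hours(t0, t1)  # total possible hours (TPH)
    if not ages_when_down:
        # Subtract shift hours lost to failures of other components (NOP).
        for dto, dtr, failed in log:
            if failed != comp:
                age -= shift_hours(max(dto, t0), min(dtr, t1))
    return age

def convert_log(log, obs_start, obs_end, otf):
    """Return {component: [(ttf_hours, is_suspension, ttr_hours), ...]}."""
    data = {}
    for comp, ages_when_down in otf.items():
        points, restart = [], obs_start
        for dto, dtr, failed in log:
            if failed != comp:
                continue
            ttf = accumulated_age(log, comp, restart, dto, ages_when_down)
            ttr = (dtr - dto).total_seconds() / 3600.0  # clock hours
            # Assumption: the first point is a suspension (unknown prior age).
            points.append((ttf, len(points) == 0, ttr))
            restart = dtr
        # Final suspension: component still operating when observation ends.
        last = accumulated_age(log, comp, restart, obs_end, ages_when_down)
        points.append((last, True, None))
        data[comp] = points
    return data

# Row 1 is from the article; the other rows and obs_end are HYPOTHETICAL.
log = [
    (datetime(1997, 1, 2, 16, 0), datetime(1997, 1, 2, 19, 49), "A"),
    (datetime(1997, 1, 3, 10, 0), datetime(1997, 1, 3, 12, 0), "B"),
    (datetime(1997, 1, 6, 11, 0), datetime(1997, 1, 6, 13, 0), "A"),
]
obs_start = datetime(1997, 1, 1, 12, 0)
obs_end = datetime(1997, 1, 10, 17, 0)   # hypothetical end of observation
otf = {"A": False, "B": True}            # does the component age while down?

data = convert_log(log, obs_start, obs_end, otf)
for comp, points in data.items():
    print(comp, points)
```

Passing the shift times as parameters keeps the sketch adaptable to other single-shift patterns; multiple shifts per day or holidays would need a richer calendar model.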

Table 1: Times-to-failure and times-to-repair for components A and B 

Using Weibull++ MT to Automate the Analysis 
Although product manufacturers can realize substantial benefits by obtaining life data from equipment downtime logs, the conversion process can be cumbersome and error-prone when performed manually. By automating the tedious and repetitive calculations, the Weibull++ MT software speeds up and simplifies the process. In addition, the software allows the user to transfer the data to a Weibull++ data folio for further life data analysis or to BlockSim for system reliability, maintainability and availability analysis based on component data.

Weibull++ MT is an industry-specific version of ReliaSoft’s Weibull++ 6, designed to meet the needs of the machine tool supplier community and other organizations with similar needs. The MT edition includes all the features and functionality of Weibull++ 6 and adds a “machine tools” interface for specialized data entry and conversion. Figure 4 shows an example of the Weibull++ MT interface, with the data entry and shift pattern functionality displayed. Weibull++ MT is on the Web at http://www.ReliaSoft.com/Weibull/mt.

Figure 4: Example of the Weibull++ MT interface