As a former air safety investigator, I was often presented with an accident or incident where one of the key elements of the event was a workaround or deviation from published procedures established by the organization or mandated by the manufacturer.
It’s a common problem in all organizations. It is rooted in our innate ability to solve problems, coupled with resource-driven pressure to get the job done better, cheaper, and faster. By resources I mean time, money, and labor.
Learn from experience
It is said that our experience is the sum of our mistakes. Fortunately, today's world offers plenty of ways to share both our own errors and those of others. By reading about other mechanics’ experiences, some of them bad, we have the opportunity to learn and improve our own performance daily.
Here is a famous accident that made the news many years ago. An accepted workaround, supported by internal work instructions, had a DC-10 engine hung with the pylon attached to the engine, rather than separately as called out by the manufacturer. The process failed to anticipate a failure of ground support equipment. The engine is left overnight with a forklift supporting its forward portion. The forklift loses pressure and the forks settle, twisting the rear engine mount, which then cracks. The next day the engine change is completed, but the crack goes unnoticed. As the aircraft departs the airport, the left engine separates from the airplane along with most of the left wing lift devices. The aircraft goes down.
Here’s another (it didn’t make the news):
An aircraft elevator jackscrew was removed and sent for overhaul. When the overhauled unit was received, it was sent back to the aircraft for reinstallation. Some time later it was installed, and during the required inspection it was found to have broken limit switch seals and a damaged switch housing. The unit was returned to the overhaul agency, which estimated the damage at $15,000. Further investigation revealed that the unit was the wrong part number for that aircraft. The project manager objected, because the unit had originally been removed from that same aircraft. However, it turned out that the unit had been modified and installed years before by the previous owner, a non-U.S. air carrier. The assumption, and the accepted practice, was that because it was the unit previously removed, it was acceptable for reinstallation.
Standard operating procedures
The air carrier's SOP (standard operating procedure) required the mechanic to verify a part's acceptability for installation, which includes ensuring that the part is the right one for the aircraft. But their logic and "rule of thumb" said the part was acceptable to install because of its assumed installation history. In most cases this had worked for them. But this time ... not so much.
The root or contributing cause of many incidents and accidents lies in the failure of maintenance personnel to follow standard operating procedures. These procedures often contain some kind of double-check, such as inspection buy-back, ops check read-backs, lockout/tagout, etc.
Key departures from maintenance SOP include:
- Failure to perform an adequate turnover during a work stoppage or shift change, resulting in missing key information;
- Failure to follow a checklist or procedural step as directed by the aircraft maintenance manual;
- Use of improper tooling or improper tool substitutions, or misuse of tooling;
- Improper management of processes and their controls.
The point of maintenance processes and controls is to ensure that high levels of safety and workmanship are maintained on the airplane to which they apply. Following them provides the means to avoid hazards and to reduce the creation of latent hazards. They also do something that most people fail to realize, but that becomes obvious on reflection: they promote repeatability. If standards are high and followed, the quality will consistently reflect the standards. If a workaround is in place, then a lesser standard may be what gets repeated until it becomes the norm.
Tolerance creep
The thing is, workarounds are insidious. They are subject to “tolerance creep.”
Tolerance creep is defined as the gradual deterioration of a standard or limit based on the assumption that previous experience shows the limit is flexible. As each evaluation of the limit is made for the same item, or for similar items on other aircraft, further “judgment calls” allow the limit to be exceeded on the strength of logical-sounding assumptions that promote general consensus.
A good example of tolerance creep outside the hangar is fuel prices. Once gas got to $4 a gallon, there was a lot of public outcry. But as time passed and that price became the norm, people stopped protesting and have by and large accepted that fuel prices are going to be higher.
In any organization, once personnel have established that something works, even if it violates a standard, it becomes an accepted norm over time and is continually extended and reinforced through use in the work environment. In the end, no one may know where the process came from … it will simply be accepted as the way things are. Complacency has set in.
As a result, latent hazards often go undetected until an event reveals their presence. In fact, the decision to take a shortcut in a procedure, or to ignore it altogether, may not produce any immediate negative consequences. These latent “states” wait for the right set of circumstances to reveal themselves. The most embarrassing of them may, in fact, be the established means by which things get done.
Procedural deviations
Aircraft accidents are rare events and the least likely outcome of workarounds, but their severity greatly magnifies the consequences of such violations.
For example, in January 2003 a Beech 1900D crashed on takeoff from Charlotte, NC. Findings from the NTSB report include:
- The accident airplane's elevator control system was incorrectly rigged during the detail six maintenance check, and the incorrect rigging restricted the airplane's elevator travel to 7 degrees airplane nose down, or about one-half of the downward travel specified by the airplane manufacturer.
- The changes in the elevator control system resulting from the incorrect rigging were not conspicuous to the flight crew.
- The repair station quality assurance inspector did not provide adequate on-the-job training and supervision to the mechanic who examined and incorrectly adjusted the elevator control system on the accident airplane.
- Because the repair station’s quality assurance inspector and the mechanic did not diligently follow the elevator control system rigging procedure as written, they missed a critical step that would have likely detected the mis-rig and thus prevented the accident.
This is tragic. It is quite possible that these things had happened before, but never in a way that destroyed an aircraft, until all the right circumstances came together. The last finding says it all.
In creating a progressive organization based on best practices, management must lead the way in ruthlessly examining internal processes for compliance, currency, and safety in the context of a dynamic working environment. Internal self-audit and evaluation methods are the most effective way of examining operations and challenging complacency. By using all the tools at our disposal, we drive the risk posed by procedural deviations way down, to levels that help ensure every flight remains an uneventful journey for the flying public.
It may be that some people’s sole purpose in life is to be an example of what not to do in this world. Don’t become the poster child for what not to do when fixing an airplane. Take the long way home ... no workarounds.
Vern Berry began his aviation career as an A&P mechanic in 1979. His experience in the aviation industry includes key management roles in quality and safety for both MRO and air carrier operations. He and his wife currently reside in upstate New York, where he writes and manages a consulting firm at www.blowntireaviation.com.