A Perfunctory “Press on Regardless” Turns Nasty

The aircraft, an Airbus A319, was dispatched under the provisions of the operator’s Minimum Equipment List (MEL) with the Auxiliary Power Unit (APU) generator on line, substituting for the No. 1 main generator, which had been selected off after a fault on the previous flight had caused it to trip off-line.

During the cruise, the APU generator disconnected from the system, probably because of a recurrence of the original fault. This caused the loss of a substantial number of aircraft services, including many flight instruments and all means of Radio Telephony (RTF) communication.

In theory, manual reconfiguration of the electrical system should have recovered many of the services, but the flight crew was unable to achieve this. Since they were without RTF communications, the crew considered that the best option was to select the emergency transponder code and continue the flight in accordance with the flight plan.

On the prior sector from Stansted to Alicante, Spain, the #1 Integrated Drive Generator (IDG) had failed. Attempted resets were unsuccessful. Approval was available for redispatch under the MEL with IDG1 selected off and with the APU (not normally left running) providing generator services in lieu. The APU generator was supplying the Main AC busbar #1 (AC1), and IDG2, the other engine-driven generator, was powering AC2.

In the cruise at Flight Level 320, the crew heard a clunk and lost multiple services and systems. These included all the captain’s flight instruments: Primary Flight Display (PFD), Upper Electronic Centralized Aircraft Monitoring display (ECAM), and the Multi-purpose Control and Display Unit (MCDU). The autopilot kicked out with an aural Master Warning alarm tone, and autothrust was lost (also triggering an aural alert).

In addition, many illuminatory lights were lost. That’s no big deal in daylight, but it would be at night. Of more concern was the loss of all lights on the overhead panels. It just “went dark”. This meant not only the illuminatory lighting but all integrated captions.

Consequently, if a warning (RED) or caution (AMBER) or advisory (GREEN) caption would have been illuminated to indicate a further systems loss or deficiency (which was likely under the circumstances), no such advice was available and no further correspondence was to be entered into.

The center pedestal lighting and captions similarly went to sleep. Such sweeping multiple electrical failures in an all-electric jet would have gotten the crew’s rapt attention. There’s no mention in the report about what might have been happening aft of the cockpit door.

The copilot, noting that the Commander no longer had any flight instruments, assumed control. He ascertained that the FBW (fly-by-wire) aircraft was now in a reversionary Alternate Law mode. The captain started running the appropriate checklist. He pushed the AC Essential Feed push-button but this had no effect.

He also noted that its integral caption was dead so he couldn’t discern whether its pre-push status had been NORMAL or ALTERNATE (all the captions were dead, remember?). Can one have faith that the button’s electronic function is still available in such circumstances? Might this be considered a design dichotomy?

The conundrum deepened when the pilot tried to transmit a Mayday call. He tried VHF1 and VHF2 but obviously his own Radio Management Panel was dead. The copilot tried his and then they tried the Observer station’s (i.e., jump-seat) VHF3. All were dodo’d, as in “dead as a ..”

In such circumstances, it’s not unknown for a crew to transmit “in the blind” on the assumption that the loss of receiver side-tone (only) may be the problem (as in it being muted or oversquelched) and that a one-way comms transmission is way better than no comms at all. Perhaps a passenger’s cell-phone might have been a fall-back solution?

Pressing on with the ECAM actions, the pilot selected ATC2 (the alternative ATC transponder) and dialed up the Distress code of 7700. This at least (he hoped) would cause the bells to ring and captions to flash on ATC radar screens along his route of flight. ATC would keep the flight cocooned in an airspace bubble and assume that the low-cost flight was either in electronic limbo — or into big-time savings on its electricity bill.

The crew’s selected course of action was to fly the flight-plan to destination, although they’d have no updated weather for it. When the gear was selected down it failed to extend, but the mechanical drop system worked, although reassuring indications were still very sparse. A safe landing was made at Bristol despite the cumulative deficiencies.

In the Airbus system, a Generator Control Unit (GCU) monitors the IDG’s outputs and opens a Generator Line Contactor (GLC) if it detects an out-of-limits condition (volts, amps, cycles, etc.). Selecting a GEN to OFF also opens that GLC. In the event of IDG failure, the APU GEN can pick up the load via a Bus Transfer Contactor (BTC).

The BTC’s other important function is to keep the two IDGs isolated from each other. The manufacturer’s Master MEL allows for non-ETOPS dispatch with one IDG inop, provided the APU GEN is online. Fault monitoring within the GCU checks that the inoperative IDG’s GLC has isolated it from the supply system. It does this by monitoring each phase of GEN current through current transformers (CTs) in each GEN.

If a fault current is detected, the GCU opens the associated BTC. Note that well, and now note this: As this function is intended to protect against any failure of the GLC contacts to spring open after a fault, it remains in effect even when the faulty IDG is selected OFF. Think of it as a form of redundancy bootstrapping that can also cross-tie your L & R shoe-laces together.
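The protection logic just described can be sketched in a few lines of Python. This is a deliberately simplified illustration, not Airbus’s implementation: the function name `btc_lockout` and its two boolean inputs are assumptions made for the example.

```python
def btc_lockout(glc_commanded_open: bool, ct_current_sensed: bool) -> bool:
    """Simplified, hypothetical model of the GCU check described above.

    If the Generator Line Contactor (GLC) has been commanded open but a
    current transformer (CT) still reports generator current, the logic
    assumes the GLC contacts failed to open and opens the associated BTC.
    """
    return glc_commanded_open and ct_current_sensed


# IDG selected OFF and the CTs read zero: the BTC stays closed,
# so the APU GEN can keep feeding the busbar.
print(btc_lockout(glc_commanded_open=True, ct_current_sensed=False))

# IDG selected OFF, but a CT channel reports a current that is not really
# there: the model cannot tell this from a stuck GLC, so it locks out the BTC.
print(btc_lockout(glc_commanded_open=True, ct_current_sensed=True))
```

The point of the sketch is that the check has no input that distinguishes a real fault current from a faulty measurement of one.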

The distribution system has an AC Essential Bus (AC ESS) with all the stuff you’d never want to lose on it, and that’s normally powered from AC1. It has two DC busbars (1 & 2) powered from AC1 and AC2 resp via transformer rectifiers. A DC Essential Busbar, powering similarly vital stuff, normally sucks its power from DC1 via a DC Battery Busbar (DC BAT). Each ESS busbar supplies an ESS SHED busbar.

Thus, a loss of AC1 kills the AC ESS busbar and the flow-on (or rather, OFF) effect is the death of AC ESS SHED, DC ESS and DC ESS SHED. DC1 busbar is also lost but after 5 seconds it auto-transfers to feed from DC2 via DC BAT. However, it will not then supply the DC ESS busbar. For those who prefer distribution schematics, there’s one for G-EZAC’s predicament at tinyurl.com/ynmyv2.
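For readers who think better in code than in schematics, the cascade can be approximated as a small dependency model. It is a rough sketch built only from the description above: the names `bus_state` and `lost` are invented for the example, and it assumes normal feeds only, no crew action on AC ESS FEED, and no batteries or emergency generator.

```python
from typing import Dict, List


def bus_state(ac1_live: bool, ac2_live: bool, dc1_transferred: bool) -> Dict[str, bool]:
    """Return which busbars are powered in a simplified model of the text.

    dc1_transferred is True once the automatic DC1 transfer (about five
    seconds after the loss, per the article) has taken place.
    """
    s = {"AC1": ac1_live, "AC2": ac2_live}
    s["AC ESS"] = s["AC1"]                # normally powered from AC1
    s["AC ESS SHED"] = s["AC ESS"]
    s["DC2"] = s["AC2"]                   # via its transformer rectifier
    dc1_normal = s["AC1"]                 # via its transformer rectifier
    dc1_backfed = dc1_transferred and s["DC2"]   # from DC2 via DC BAT
    s["DC1"] = dc1_normal or dc1_backfed
    s["DC BAT"] = s["DC1"]                # tied to DC1 in this simplification
    s["DC ESS"] = dc1_normal              # not resupplied after the transfer
    s["DC ESS SHED"] = s["DC ESS"]
    return s


def lost(state: Dict[str, bool]) -> List[str]:
    """List the unpowered busbars in alphabetical order."""
    return sorted(name for name, live in state.items() if not live)


# AC1 lost, before and after the automatic DC1 transfer.
print(lost(bus_state(ac1_live=False, ac2_live=True, dc1_transferred=False)))
print(lost(bus_state(ac1_live=False, ac2_live=True, dc1_transferred=True)))
```

In this model the transfer brings back DC1 and DC BAT but leaves all four essential busbars dead, which matches the situation the crew faced.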

You’ll recall from above that the first thing the captain did was push the (now blazingly blank) AC ESS FEED pushbutton to manually operate a change-over contactor and transfer the AC ESS busbar to AC2, in an attempt to “bring everything back”. It didn’t. Tell you why in a bit.

But why should all the radios also fail, you might ask (or one would hope you would)? Well, the airplane (G-EZAC) had been fitted with an upgraded AMU (Audio Management Unit) that integrates ALL radios (as in eggs in one basket). Unlike earlier versions, its operation depended solely upon the continued operation of DC ESS.
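The eggs-in-one-basket problem is easy to show as a lookup from equipment to supply busbars. In the sketch below, the AMU, Flight Data Recorder and Cockpit Voice Recorder supplies are the ones stated in this article; the "hypothetical dual-fed" AMU and its second supply (DC2) are invented purely for contrast and do not describe any real unit. The helper name `unavailable` is likewise an assumption.

```python
from typing import Dict, List, Set

# Equipment -> busbars that can power it. An item works if any one is live.
SUPPLIES: Dict[str, List[str]] = {
    "AMU (as fitted to G-EZAC)": ["DC ESS"],            # per the article
    "Flight Data Recorder": ["AC2"],                    # per the article
    "Cockpit Voice Recorder": ["AC ESS SHED"],          # per the article
    "AMU (hypothetical dual-fed)": ["DC ESS", "DC2"],   # invented for contrast
}

# Busbars the article lists as lost once AC1 dropped off.
DEAD_BUSES: Set[str] = {"AC1", "AC ESS", "AC ESS SHED", "DC ESS", "DC ESS SHED"}


def unavailable(supplies: Dict[str, List[str]], dead: Set[str]) -> List[str]:
    """Return equipment whose every supply busbar is dead."""
    return sorted(
        item for item, buses in supplies.items()
        if all(bus in dead for bus in buses)
    )


print(unavailable(SUPPLIES, DEAD_BUSES))
```

Run against the busbars lost on G-EZAC, the single-fed AMU and the voice recorder drop out, while anything with a supply outside the essential network survives.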

Now one might ask, especially in light of all the Airbus A380 negative publicity, “whatever’s happened to Failure Modes, Effects and Criticality Analysis (FMECA) within Airbus?” Airbus blandly advises that “this meets present certification standards”. Yeah. Sure. Right.

Luckily, the Flight Data Recorder just happened to be on the AC2 busbar and remained powered. It recorded the point at which AC1, AC ESS and DC ESS had lost power. The system losses were all caused by the one triggering event. BTC2 had opened. However, on the ground it was looking like a “no fault found” outcome.

Everything was operating normally except that the aircraft took a few attempts to accept ground power. Bench testing of contactors, pushbuttons and relays found no anomalies. Laboratory-level testing eventually found an intermittent fault within GCU1. A pseudo current was being incorrectly detected by one of the CTs within the “failed” and inoperative GEN (IDG1).

This tied in with a fault monitoring code recorded on the flight into Alicante, and it was repeated on the flight out to Bristol. The GCU’s monitoring system had incorrectly interpreted the fault as a failure of GLC1 to open, and had thus locked out the Bus Transfer Contactor (BTC2), repeatedly rejecting the APU GEN’s feed.

An ostensibly failed, OFF and isolated IDG1 was still calling the shots. Testing also revealed a logic fault within the GAPCU (the combined Ground and Auxiliary Power Control Unit). The Cockpit Voice Recorder (CVR) is powered by the AC ESS SHED busbar, so we’re not able to hear the puzzlement quotient or the crew’s epithetical cussing of “what’s it not doing now?”.

Within a fault-tolerant system’s protocols, Byzantine means “behavior in a failure-uncontrolled manner”. Thus, this fault logic train-wreck is really a Byzantine failure mode, so once again we ask about FMECA. And when step #1 in a critical failure checklist has you pressing a pushbutton whose own internal status lights have failed, you might well long for an analog rotary switch and remote caption that gets its power from elsewhere.

It was determined that having a status-lit pushbutton in lieu of a solenoid-held paddle-switch contributed to the Flash Airlines 737 captain’s (and F/O’s) confusion about the engaged status of the autopilot, long enough to allow the unrecoverable unusual attitude to develop way beyond simple task saturation. In some applications, perhaps we should “extract the digit” and revert to analog.

The UKAAIB investigation additionally found that a number of hardware faults also existed within GCUs (and GAPCUs) of the type used on the A320 genus, the A330 and A340. The contents of a Static Random Access Memory (SRAM) component could alter and result in a GCU Failsafe fault (that being a similar “denial-of-service” condition to G-EZAC’s enigma).

However, this condition could normally be reset by repeatedly pushing the GEN’s ON/OFF push-button. An Operator’s Information Telex (OIT) had been sent out warning of this, but as it was in the maintenance chain, it didn’t go to flight-crews.

Other failures were identified that could cause the loss of the DC ESS busbar (and cause a total comms loss) so it was recommended that Airbus should advise all operators of this possibility. (Editor’s note: Perhaps all ATC world-wide also, following on from recent events in Brazil.)

Airbus considers that the certification standard for comms loss (1 × 10⁻⁵) is being met. It was also suggested that Airbus might segregate some comms capabilities by spreading the bus-load. Airbus is “studying the feasibility”.

The UKAAIB says blithely, “It’s undesirable that a system’s misinterpretation of a single fault should cause the loss of multiple busbars”, and has asked Airbus to revise its fault monitoring logic. This will happen in a software release sometime in the indeterminate future. On the subject of the redispatch under MEL, the AAIB expressed it as “a matter of particular concern that…”.

Airbus in turn stated that their “System Safety Assessment predicts a sufficiently low probability of recurrence to allow their safety objectives to be met in this dispatch configuration.” Airbus is however considering an OIT to warn crews to check with their maintenance department before redispatching with IDG1 inop.

Strangely, joining in the same sangfroid approach, the UKAAIB thereupon agreed not to issue any MMEL recommendations. But hopefully all Airbus simulator instructors are amending their syllabi to include joy-rides similar to the G-EZAC crew’s.

The Great Unspoken would seem to be whether it’s acceptable to end up on single GEN ops at night or in weather. Beyond that, we’d also have to ask: “What happens to an all electric FBW jet when the plug is pulled out of the wall?”

It’s not the same as the total segregation of two engines under ETOPS rules. No matter how many generators you’ve got, there’s only one non-redundant electrical distribution system — and it’s all tied together and interdependent. How many other Byzantine failure modes might lurk sight unseen within?