Loss Prevention Bulletin 251, October 2016
Incident
The Challenger Space Shuttle disaster
John Wilkinson, Human Instrumental Ltd, UK
Summary
The space shuttle Challenger disintegrated 73 seconds
after launch on 28 January 1986 killing all seven
astronauts aboard. An O-ring seal in the right solid
rocket booster (SRB) failed at lift off causing a breach in
the SRB joint seal. This let pressurised hot gas escape
and ignite, affecting nearby SRB attachment hardware
and an external fuel tank leading to structural failure.
NASA management knew the design of the SRB had a
potentially catastrophic flaw in the O-rings but did not
address this effectively. They also appeared to have
disregarded warnings from engineers and not to have
passed on their technical concerns.

Keywords: Production pressure, culture, risk assessment, design, hindsight bias

This review is based on:
• the original (brief) LPB coverage1 in a wider review of
communication failures;
• the original US Presidential Commission's report of the
investigation (the Rogers report)2;
• the US Congress Committee on Science and Technology's
review3 of the Rogers report and NASA's own
investigation;
• the seminal account by Diane Vaughan (published in 1997
but recently republished as an enlarged 2016 edition;
the only change is a new foreword on Columbia)4; and
• the subsequent Columbia Accident Investigation Board's
(CAIB) report of the 2003 Columbia space shuttle disaster5.

In considering the disaster on this 30th anniversary, the author
has aimed to stand back from the later Columbia accident.
Since 2003 Challenger is mostly seen and studied through the
lens of Columbia (as an example of an organisational learning
failure) but it is worth looking at what was known before this so
that the original accident is seen more clearly. Even though the
CAIB report acknowledges this risk explicitly, there is inevitably
a risk of hindsight bias and selectivity in such post-Columbia
accounts of Challenger. Therefore, the focus here is more
on Vaughan's original and exhaustive account of Challenger
alone.
Like Andrew Hopkins (of 'Lessons from Longford' fame)
Vaughan is a sociologist, appropriate enough for the sociotechnical systems involved both in space travel and in the
process industries. Explaining major accidents of any kind
requires both engineering / technical expertise and an
understanding of how organisations (as social structures) and people work. This sociological input produces better learning
from such events and improves the chances of avoiding future
disasters. This paper summarises the accident, its technical
and immediate causes and the contributing organisational
factors. Clear lessons emerge for the process industries. One
of the big enemies of learning from accidents is a defensive
'checklist' approach e.g. 'we don't have that equipment, that
process, that goal - so this doesn't apply to us'. This approach
screens out potential learning opportunities. It is much better
to say 'OK, this doesn't look like a direct correlation, but what
can we learn?' This turns learning into a potentially much more
productive process rather than a checklist approach.

The accident
Challenger launched at 11.38 a.m. EST on 28 January. It
disintegrated 73 seconds into the first two-minute ascent
stage killing all seven astronauts on board. They included the
well-publicised presence of Christa McAuliffe, a teacher due
to teach elementary pupils from space. Rather like the Space
Lab today, the shuttle launches were then seen as sufficiently
routine to allow such diversity.
The technical explanation for the disaster is relatively
straightforward. There were two Solid-propellant Rocket
Boosters (SRBs) attached to the space shuttle. The Solid
Rocket Motor (SRM) was contained within the four main
central segments of the assembled SRB. The SRBs provided
80% of the thrust required at lift-off to get the whole shuttle
assembly off the ground and into space. The shuttle itself
initially consisted of the orbiter vehicle, the external fuel
tank and the SRBs. The solid fuel in the SRBs was reacted
to produce very hot high-pressure gas which expanded and
accelerated on moving through the rear nozzle to provide
thrust. The SRBs were jettisoned two minutes into the ascent
and were later recovered and reused. The use of solid fuel was
a well-recognised solution to provide the necessary extra thrust
required to get the shuttle off the ground and into space. It
was also a relatively cheap choice. The third attachment to the
shuttle for lift-off was the external liquid fuel tank consisting of
a hydrogen tank, an oxygen tank and an inter-tank which fed
the three main shuttle rocket engines with a hydrogen-oxygen
mix. The external fuel tank was jettisoned once the shuttle had
escaped the earth's atmosphere and was not recoverable.
The SRBs were prefabricated by Morton Thiokol (the
contractor who designed, manufactured and maintained
the SRBs) from seven original sections into four cylindrical
segments each with factory-sealed joints. Propellant was
poured into each segment where it solidified. The four
segments were assembled after transport to the Kennedy
Space Centre and so the remaining joints were known as 'field'
joints. The pressure generated at lift-off ignition created a
very small gap in the SRB joints. The O-rings were designed to
seal these gaps against the high pressure hot propellant gases
developing inside. The seal was achieved by using quarter-inch
diameter Viton rubber-like O-rings. There were two of these,
the primary and secondary O-rings, the secondary acting as a
back-up in case any of the hot propellant gases generated on
ignition should erode and pass the primary.

© Institution of Chemical Engineers 0260-9576/16/$17.63 + 0.00
The air temperature at the launch was the lowest recorded
for any previous shuttle lift-off. This hardened the O-rings and
adversely affected their ability to achieve an effective seal. On
the previous coldest launch in January 1985, a primary joint
was breached and eroded but the secondary seal worked
as intended. For low temperature to impact on the seated
seals fully required about three days' exposure — a relatively
rare event. On Challenger's launch in January 1986, the hot
combustion gases produced on ignition inside the SRM on the
right-hand SRB were able to erode and then 'blow by' both the
primary and secondary O-rings on the aft field joint. Cameras
captured the resulting smoke puffs at the joint showing that the
grease, joint insulation and O-ring material were being burned
and eroded by the hot propellant gases. The escaping gases
ignited and the ensuing flame started to damage the adjacent
SRB aft field joint attachment hardware and then was deflected
onto the external fuel tank. The hydrogen tank located aft
within the external fuel tank either failed or was weakened and
the liquid fuel inside subsequently leaked and started burning.
The original flames by this time had also caused the SRB lower
strut connecting it to the external fuel tank to break. The SRB
then rotated away and the external fuel tank itself failed leading
to a major release of hydrogen and a subsequent fireball (not
an explosion)4[p391]. The shuttle was also by then breaking up
mechanically in the normal atmospheric turbulence associated
with the launch because the external fuel tank was a key
structural part (the 'backbone') of the whole shuttle assembly.

Lessons learned

The lessons are listed here but the detail which underpins the
organisational causes is discussed further below.

Lessons for the process industries
• External pressures on organisations, such as the production
pressures on NASA, can establish ways of doing things in
the organisational culture, structure and processes which
incrementally align reality with what the organisation
wishes for: its goals. Managing these pressures and
being mindful of their potential distorting effects is difficult
and requires vigilance over time and a proper sense of
chronic unease.
• To prevent such pressures distorting an organisation's
arrangements it is important to establish a clear baseline
or rationale for e.g. engineering and technical decisions,
so that any incremental movement away from this can be
spotted.
• Incremental changes can lead to the normalisation process
so that each individual anomaly is explained or justified but
the full picture is not seen until after a significant adverse
event. Each event is rationalised and validated against e.g.
risk assessment processes but not evaluated ("Is this really
doing what we want? Against what baseline?").
• Risk assessment should not be about maintaining or
defending the status quo; the process should not
take over from the purpose. A questioning attitude and
mind-set is required. There is always the possibility that
something new is happening which designers could not
foresee.
• Organisations need sufficient checks and balances
for safety to ensure that safety is not over-ridden by
organisational structures and processes. These can include:
sufficiently independent and resourced safety oversight
and an adequate baseline for key arrangements such as
engineering and design decisions. If key decision makers
cannot see the baseline (or if the baseline is wrong) they
cannot easily spot significant deviations from it, especially
when these are incremental.
• Whether a new design is developed or an old one used
or modified, there are risks to be managed. New designs
bring in more potential for 'Unknown unknowns'. In
the case of the SRBs, the existing designs (such as the
Titan rockets) were not a straight 'read across' to the
space shuttle, and introduced misunderstandings about
redundancy.

Lessons for investigators
• If the full underlying causes (organisational and some
extra-organisational) are not understood and learned from, and
the organisation's structure and arrangements changed and
maintained accordingly, then accidents can and will repeat.
• Just relying on the official investigation reports for major
accidents can be misleading and incomplete. Even with
good investigations and reports, what the press and others
choose to focus on is not necessarily the full picture, and
nor is a company digest or flyer. Companies need to think
for themselves and exercise judgement about the full range
of lessons learned and consider the full picture presented.
This implies that they know what good looks like for an
investigation and what the underlying organisational
factors may be.
• Learning is a process and not just an outcome.
Organisations can learn something from most incidents
if they view learning in this way. Using a screening out or
defensive checklist approach will inhibit learning.
• The hindsight bias can warp investigator judgements and
skew the lessons drawn from accidents like Challenger.
Investigators need to establish the full baseline against
which key decisions and actions occurred. The history
of O-ring anomalies and how to interpret them may look
obvious after the Challenger failure but was not obvious
to those involved at the time. Based on what they knew or
what was available to them, they acted rationally and in line
with the prevailing safety processes.
• Investigations which produce stereotypes (heroes or
villains in whatever guise, such as 'management') are good
stories but unlikely to change anything or produce real
learning. People generally behave in ways that make sense
to them at the time. The first job in an investigation is to
understand things from their viewpoint.
• The full impact of human factor issues on issues such as
critical communication arrangements (like those affecting
the final teleconferences) and fatigue can be missed if
investigators either do not prioritise human factors or do
not value them sufficiently. These factors can be major
contributors to poor decision-making.

The organisational causes
• The underlying causes of the disaster are complex and
organisational. These are discussed below.

Launch delays
The launch was put back five times from the original 22 January
date before the disastrous launch on 28 January. The shuttle
before this was delayed seven times over 25 days before
finally launching on 12 January. This affected the subsequent
Challenger launch. The last two delays were due to weather
and a fault respectively. Delays were a major concern for NASA
because the launch schedule had become central in their
competition for scarce funding. Production pressures were at
their peak before the Challenger launch.

The O-rings and the launch decision
The problem with the O-rings was documented from
1977, long before the first shuttle flight in 1981. Evidence
accumulated from 1977 to 1985. During a final teleconference
running up to around midnight of the day before the launch,
engineers from Morton Thiokol, the SRB manufacturer, and
NASA managers debated whether the launch should go ahead
because of the predicted very low temperatures expected
and the likely effect on the O-rings. As the Commission,
the Committee, the press and others investigated "...they
created a documentary record that became the basis for
the historically accepted explanation of this historic event;
production pressures and managerial wrongdoing."4[pxxxiv] The
Rogers Commission "...found that NASA middle managers
had routinely violated safety rules requiring information about
the O-ring problems be passed up the launch decision chain
to top technical decision makers..."ibid[pxxxiv] The top-down
pressures on NASA included competition, scarce resources
and production pressures. These led finally to a flawed and
deliberate launch decision.

Vaughan's very thorough investigation provides a more
nuanced view, and ultimately a more convincing one.
Her conclusions also make more sense in the light of the
subsequent Columbia disaster. Rather than the simplistic
popular account derived from the Rogers Commission and
the Committee's reports, she argues that "No extraordinary
actions by individuals explain what happened: no intentional
managerial wrongdoing, no rule violations, no conspiracy.
The cause of the disaster was a mistake embedded in the
banality of organisational life and facilitated by an environment
of scarcity and competition, elite bargaining, uncertain
technology, incrementalism, patterns of information,
routinisation, organisational and interorganisational structures,
and a complex culture."ibid[pxxxvi]

The normalisation of deviance
Vaughan divides this into three elements: the production of
culture; the culture of production; and structural secrecy. The
gradual and incremental acceptance of the O-ring anomalies
was the 'produced culture'; the scarcity of resourcing and
the demands of competition over a long period conspired
to establish a culture of production; structural secrecy
prevented key information from flowing effectively through the
organisation. All of these elements affected decision-making
including the final fatal launch decision.
• The normalisation of deviance helps explain:
- why the evidence of risk in the SRBs was originally
accepted in the selected design;
- why it was assessed as safe when the shuttle was
declared operational in 1984;
- why it continued to be assessed as safe; and
- why the final launch took place despite some key
engineers having and expressing misgivings.

Accepting more risk
• More risk was accepted incrementally over a long period.
The risk was seen as acceptable (and accepted) and
anomalies were explained for each case after launch and
recovery. Each successful launch reinforced this. Those
involved in decisions on the SRB and the launch acted and
made decisions that made sense to them (were normal)
at each relevant time. Morton Thiokol, Marshall (the
Marshall Space Flight Center (MSFC), NASA's rocketry
and spacecraft propulsion research centre, which had
technical oversight of Morton) and others followed the
NASA rules, arrangements and structures for the twin key
safety management system procedures: the Acceptable
Risk Process (ARP) and the Flight Readiness Review (FRR).
There were compounding errors, e.g. flawed base data
on O-ring temperature limits, no effective demonstration
of the correlation of temperature data against previous
O-ring failures, and errors in communications such as on the
understanding of O-ring redundancy between Marshall
and Morton and the way that the O-ring risk was
categorised.
• Redundancy misunderstood
The baseline for the redundancy misunderstanding
was that the SRB seal design was seen as a significant
improvement over previous designs such as the earlier
US Titan rocket which only had a primary seal. Failure of a
primary was not seen as so significant when a secondary
was in place to protect against this. The problem arises
through dependency such as the cold temperature issue.
In the process sector nowadays, the triggering of any safety
or protective system (such as a pressure relief valve) is a
safety event in itself. In the latter case, maintenance could
be a common cause factor affecting both operational and
safety valves.
• NASA processes, procedures and structures incrementally
accommodated the O-ring anomalies to align with the
overall goal of timely and repeated successful shuttle
launches and recoveries. These weak signals were seen
but were expected and on a case-by-case basis accepted:
engineers did risk assessments and communicated the
results to managers. The latter were also mostly engineers
but with different goals and priorities set by the culture of
production. Hindsight does not show so clearly that the
context for tuning in to weak signals was against a much
wider range of anomalies detected after each launch.
• Structural secrecy
A large organisation generating huge amounts of
information, specialised engineering roles and language,
the acceptance of risk on a case-by-case basis against
established (but flawed) technical criteria and in accord
with established risk processes: all of these conspired to
prevent key technical information from flowing through the
management chain. No individual was hiding anything but
the organisation's own structure was acting as a barrier.
• Oversight
The final barrier should have been the safety oversight
but NASA's safety programme was famously described as
'silent'. In fact, it was drastically reduced, especially
after the shuttle programme entered its operational phase.
Internal regulation was also subject to the effects of
interdependence, i.e. being part of the same organisation
the internal bodies were regulating. The external regulator
was even smaller and had a narrow scope. These bodies
had in truth little chance of finding the O-ring issue, not
least because it was seen and maintained as an acceptable
risk.

[Photograph credit: Everett Historical / Shutterstock.com]

Design and culture
Design is an inherently uncertain process, the more so in
areas of risky technology such as innovative space missions.
However, designers in any industry make trade-offs all the time
and also are conservative: adopting the solid fuel option for
the SRBs was conservative at the time because it was a better
tried and tested approach. The fact that there were known
risks associated with this was in that sense good because they
were 'Known knowns' and could in principle be managed.
New designs would potentially have 'Unknown unknowns'. For
the SRBs and the shuttle as a whole such 'Unknown unknowns'
were bound to emerge in such a risky area of technology but
NASA generally expected these and was vigilant for them.

There is also the well-rooted view that the transition from
an experimental space vehicle to an operational one was
somehow also deviant. In terms of the overall space shuttle
programme, this was simply an in-built project milestone and
the criteria for passing this were met. Hindsight suggests
this was a flawed decision and that such an inherently risky
enterprise could never be truly seen as operational. Therefore,
the original programme could perhaps be criticised but in
that context, the decision was rational. In its own terms the
mission was a success story. NASA have also been accused
of being too 'can do' but if that is reworded as 'being good
at solving problems' then it doesn't sound so damning, and
problem-solving is what NASA engineers, managers and
others were very good at. Culture and control were also
eroded by the need to be business-like and put work out
to contract. However, the latter was not 'wrong' in itself.
Provided that safety, quality and sufficient technical oversight
were maintained, this can and did work. The larger problem
was that of the ensuing organisational and project complexity:
complex organisations can produce surprises, and
tightly-coupled systems such as those involved in space flight are
particularly prone to this.

Cost cutting and mission safety
One widely-held view is that key contributing causes of the
accident were NASA cost / safety trade-offs, prompted
by budget cuts and other pressures on the organisation.
These decisions are held to have adversely affected safety
programmes, hardware testing and technical design. Vaughan
found it difficult to find concrete evidence that the first two
affected mission safety but she investigated the extensive
paper trail for the third. The example she chose was the
original award of the SRB contract to Morton Thiokol and the
consequent decision to not pursue a proposed safety feature,
escape rockets. Her conclusion is that despite their apparent
salience in hindsight "...these were not the cost / safety
trade-offs they appeared to be after the tragedy."4[p423]
The SRBs were a cheaper option. Rockets using solid fuel
have fewer moving parts and so are cheaper to use than
liquid fuelled ones even though solid fuel is more expensive.
However, solid fuel rockets could not be shut down after
ignition which had major implications for mission safety.
Previous rockets had escape rockets to allow crews to escape
during the dangerous first two minutes of SRB-assisted ascent.
Orbiter was too large for this option without significantly
reducing its payload so the proposed escape rockets were
scrapped.
On the face of it, this looked like a pure cost or business
decision that compromised safety but in fact NASA had done
an extensive assessment of the option and concluded that
escape rockets were simply not viable. Any trigger event that
could provide warning that escape was necessary would in
effect be the event itself or closely co-incident with it. There
was also no practical means identified which would both
cover all scenarios during the first two-minute ascent and also
significantly increase crew survivability.4[p424] NASA concluded
instead "...that first stage ascent must be assured."ibid In
other words they just needed to get this stage right — for
example, through conservative design and other tried and
tested means. All design involves trade-offs of course, but this
example just became more visible than most after the disaster.
The same argument is made in the choice of a segmented
over a seamless design for the SRB. Straightforwardly, if a
design with no joints is selected, then joints cannot fail, and
a joint did fail. But NASA had had the four contract bids and
proposals assessed by a Source Evaluation Board (SEB) against
four 'mission suitability' criteria. There were three segmented
designs and one seamless / monolithic one proposed by
Lockheed.
However, Vaughan points out that segmented SRBs
were more widely used at the time so the bid ratio looks
understandable in this 'social context'.4[p430] Her closer
examination of the SEB assessment also shows that the
Lockheed seamless design was rejected not just because it
was more expensive than Thiokol's but because the design
was inadequate in ways that were significant and not easily
correctable. The Thiokol design had issues but these were
assessed as ‘readily correctable' and the segmented design
itself as 'not sacrificing performance quality'. This was
confirmed by a subsequent further General Accounting
Office (GAO) review after a Lockheed protest that the costs
were miscalculated. The GAO agreed a reduction in the
original $122 million cost estimates for Lockheed (but did
not find any new issue with the Thiokol design) but this was
still $56 million more than Thiokol's. The original SEB bid
assessment was repeated and found still valid.
Vaughan acknowledges that her analysis of the cost / safety
trade-offs is ne...