Past Entries
Part 1: (The Basics of) Rocketry and Spaceflight
Introduction
We've been working with rockets for a century now, but we have yet to make them safe. Rockets are fickle things, and it truly does not take much to make them fail. Although our safety record has improved over the years, make no mistake: a few errors or a small bout of carelessness, and we're right back to the era of exploding rockets on every other launch. To avoid accidents in the future, it's important to understand why they happened in the past.
When you're firing off an unmanned rocket, one concern matters above all: finishing the mission. Everything else - cost overruns, flight delays, reuse of rocket components, development of new technologies, and so on - comes second. Full stop. On manned missions, only one concern is more important: the survival of the crew. Some more accounting-minded individuals believe that astronauts should be factored in as just another cost, and as an engineer, I can certainly respect and appreciate that viewpoint. The issue with that perspective is that it almost always underestimates the true cost of losing personnel, for two reasons. First, astronauts are not average individuals but highly trained mission specialists who are difficult to replace. Second, there is public perception: losing astronauts drives a many-billion-dollar stake into the heart of public support for any space program. Either way, losing rockets and astronauts is painful and expensive, and accepting such losses in pursuit of one's objectives is never cost-effective. The failures discussed here bear that out.
But let's face it: we live in the real world, one of deadlines, schedules, budgets, national ambitions, and expectations. We cannot set aside unlimited money and time to let rockets develop under the safest possible conditions. For one, some missions are too valuable to wait: national security payloads that would keep soldiers alive, servicing missions for astronomically expensive scientific equipment that would be lost unless lives and rockets were risked to save it, objectives of national pride, and many others. We cannot be so unrealistic as to believe that we can always put absolute safety first, because that's not the real world. Nevertheless, we have to do what we can to ensure that, even in this real world, we are safe enough never to lose a mission or an astronaut. It is impossible to reach 100% and stay there forever, but no lesser goal is acceptable. An aircraft that aimed for and achieved only a 95% success rate would be a complete and utter failure.
There are many ways a mission can end in failure, and not all of them are as spectacular and visible as a rocket exploding in flight. An upper stage that malfunctions far beyond the sight of onlookers, dropping its satellite into an unusable orbit, is just as much a failure as a rocket that explodes in midair. A crewed capsule that burns up on reentry is as much a failure as one stranded in space that cannot be rescued before its crew runs out of oxygen. And so on. Not every failure looks like an explosion, and not every failure causes the mission itself to fail, but all failures need to be prevented. We will look at a rather wide range of failures in our discussion today.
General Process: Failure Investigation
Investigating rocket failures is not unlike a police investigation, albeit one aimed at finding the cause of a failure rather than evidence of a crime. You gather all the evidence available to you, enlist all the experts you need, and piece by piece put together the causes, both primary and secondary, that allowed the failure to occur. Broadly speaking, such an investigation aims to answer these questions:
1. What was the primary cause of the failure?
2. What other factors contributed to the failure?
3. How could that failure have been prevented?
4. What should we do in the future to ensure that this does not happen again?
5. If there were people on board, could we have saved them? If so, how?
The data analyzed varies from mission to mission but is generally quite wide-ranging: the telemetry (measurements and data) transmitted by the rocket's electronics; cameras on the rocket and on the ground (both official and amateur footage); other devices tracking the rocket (our GPS and ground-based missile detection radars are not for nothing, you know); written communications before, during, and after the flight; debris recovered from the flight; statements from workers involved in building and operating the rocket; and so on. Experts gather and analyze all of this to produce a report on all the causes of the failure, both primary and secondary. Depending on the organization launching the rocket and the one owning the payload, this may or may not all be disclosed to the public. In general, government agencies have to disclose more than private companies, and national security missions tend to be rather tight-lipped about failures. But failing to investigate thoroughly, whether or not the public ever learns all the causes, is the biggest failure of all, and it can often be fatal.
The standard procedure after a failure is to ground your craft until the root causes of the failure are determined. Often the problem is a recurring one, and launching another craft can make it fail in exactly the same way. This will certainly lead to long, unpleasant launch delays and almost certainly cost you some flights you would otherwise have had. It's not pleasant to go through, but it is absolutely necessary. The delay can run from a few months to a few years depending on the failure and the organization, but until you can be confident that the craft is ready to fly, it is reckless, from a safety perspective, to launch again.
Spacecraft
I'm going to focus on a few launch vehicles - one Russian, the rest American. There are many failures from many different organizations, and it simply isn't feasible to cover them all. But just four rocket families - the Space Shuttle, Atlas, Falcon 9, and Proton - will help to shed light on many of the factors that allow rockets to fail - sometimes fully, sometimes partially. Here, I will briefly touch upon the craft, their design, and their history.
Space Shuttle
This was NASA's successor to the Apollo Moon program, a very complicated craft with a broad set of objectives. In the aftermath of Apollo, the US became quite wary of spending money on spaceflight: Apollo had been highly unpopular for its astronomical cost even before the Moon landing, and afterwards the price of launches simply had to come down. The design was very ambitious: a spaceplane riding on a giant fuel tank flanked by a pair of solid rocket boosters, promising a reusable, cost-effective, and safe way to accomplish many wide-reaching goals by carrying both people and cargo.
Ultimately, it made space neither cheap nor safe. Nevertheless, for just over 30 years it was the workhorse of NASA's space capabilities, and it is one of the most consequential craft ever built. It pains me to talk about its failures, because I really do like the craft and its design; I just can't say that they were practical. Its launches are among the most beautiful of any rocket I have ever seen.
Out of 135 Shuttle missions, only two ended in failure: one during launch, the other during reentry. For all its faults, the Shuttle has a 98.5% success rate, not one to sneer at in the rocketry business. But each disaster killed seven people and grounded the Shuttle for more than two years. While a 98.5% success rate is fantastic for a rocket, it is by no means safe; as I mentioned before, an aircraft with that success rate would never be used for civilian purposes. In addition, only six orbiters were ever built and only five flew in space, so 40% of the flying fleet was lost to accidents. Despite its rather impressive success record, losing two Shuttles was quite painful to the program's health.
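To make the "fantastic for a rocket, but not safe" point concrete, here is a minimal Python sketch that recomputes the Shuttle figures above and asks how often a vehicle with that per-flight record would get through 135 flights without a single loss. It assumes, purely for illustration, that every flight is independent and equally likely to succeed, which real launches are not; the function names `fraction` and `loss_free_probability` are hypothetical.

```python
# Back-of-the-envelope reliability arithmetic for the Shuttle figures above.
# Assumption (for illustration only): flights are independent and share one
# fixed per-flight success probability. Real launch campaigns are not like this.


def fraction(part: int, whole: int) -> float:
    """Return part / whole, e.g. successful missions out of all missions."""
    if whole <= 0:
        raise ValueError("whole must be positive")
    return part / whole


def loss_free_probability(per_flight_success: float, flights: int) -> float:
    """Probability of completing `flights` flights with zero failures,
    under the independence assumption stated above."""
    return per_flight_success ** flights


shuttle_rate = fraction(133, 135)  # 2 losses in 135 missions
fleet_lost = fraction(2, 5)        # 2 of the 5 orbiters that flew in space

print(f"Shuttle mission success rate: {shuttle_rate:.1%}")
print(f"Share of the flown fleet lost: {fleet_lost:.0%}")
print(f"Chance of 135 loss-free flights: {loss_free_probability(shuttle_rate, 135):.0%}")
```

Under that crude assumption, the script reports a 98.5% success rate and 40% of the flown fleet lost, and a vehicle with the Shuttle's per-flight record would fly 135 times without a loss only about 13% of the time: one way to see why a number that is excellent by rocketry standards still falls far short of what we expect from aircraft.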
We will be talking about both failures. The first was the loss of Challenger, which exploded mid-flight on the 25th Shuttle launch, on January 28, 1986. The second was the loss of Columbia, the first Shuttle ever flown, on the 113th Shuttle flight (and Columbia's 28th), during reentry on February 1, 2003, almost exactly 17 years after the first accident. Both disasters were investigated and reported in unusually public detail, largely with the cooperation of Congress. Challenger was investigated by the Rogers Commission, whose report was highly critical of the insufficiently strict safety culture at NASA that allowed the accident to occur. Columbia was investigated by the Columbia Accident Investigation Board, which produced one of the most consequential reports ever written on rocket safety and the conditions that lead to rocket failure. Given this wealth of information, these two will be covered in the most depth.
Atlas
The Atlas family is one of the oldest American rocket lines ever produced. First created by General Dynamics as a liquid-fueled ICBM, it quickly proved better suited to spaceflight than to maintaining a nuclear arsenal. Over the sixty years of its existence, from its birth as an ICBM in 1957 to the present day, it has gone through many iterations and many owners. The most modern version is the Atlas V, operated by United Launch Alliance and powered by the Russian RD-180 engine, with one of the best success records any rocket could hope to achieve.
We will focus on two variants, the Atlas-Centaur (AC) and the Atlas V, two rather modern iterations of the Atlas, and on two incidents for each. The Atlas-Centaur incidents were outright mission failures; the Atlas V incidents were near-misses that shed a lot of light on how serious rocket safety can be. Because both rockets were privately operated, far less information about their failures is publicly available, but that is part of what makes them interesting.
Falcon 9
The workhorse of SpaceX, the American darling of the New Space industry. A rocket that promises to be cheap, reusable, and capable of unlocking economies of scale that will turn space into a highly lucrative economy. A rocket that, by any measure, moves quickly in its development, in pursuit of its founder's lofty goal of colonizing Mars.
Out of 37 launches, 34 were fully successful. Of the three that were not, one was a partial failure (a secondary payload was left in a useless orbit because the burn needed to deliver it was deemed too much of a risk to the ISS), one was destroyed mid-flight, and one exploded during fueling while on the launch pad. The partial failure is not particularly interesting; the other two will be discussed here.
Proton
The Proton line of rockets, which first flew in 1965 and was originally designed as a massive ICBM, is a family of heavy rockets operated by the USSR and later Russia. Though not known for perfect reliability, it has been a workhorse of the Soviet/Russian space fleet, with over 400 launches over its 52-year history.
Overall, the family has an 88% launch success rate. Many of its failures came, of course, in Proton's early years, before the design was well developed. Yet even the modern Proton-M, operated since 2001, has had its share of failures, with roughly one failure every year and a half and about a 90% success rate over its 99 launches. Nevertheless, it has been a very valuable and important rocket, helping Russia's space program survive the end of the Soviet Union (and a sudden, total collapse of government funding) by providing launch services for commercial and foreign government customers.
The Proton's failures are many, with some common threads but no single technical cause. Rather than discuss each failure individually, we will look at the circumstances that led to such a safety record in recent years. Just a few weeks ago, the Proton-M returned to flight after a year out of service, a grounding whose investigation revealed many of the causes behind its frequent failures. That return-to-flight launch proceeded without a hitch; I will link it below.
Shuttle Disasters
On January 28, 1986, the Space Shuttle Challenger was lost mid-flight in an explosion that destroyed the craft and killed the entire crew. The Rogers Commission was formed shortly after the disaster to explore its causes. One of its members was the esteemed physicist Richard Feynman, who became notorious as a justified pain in the ass and helped get to the bottom of the root cause and many of the secondary reasons behind the accident.
The direct cause was the failure of the O-ring seals in a joint of the right solid rocket booster. Hot gas leaked out onto the external tank (the large fuel tank to which the Orbiter is attached), and the resulting structural failure destroyed the entire vehicle. The O-rings lost their resilience in the cold and failed on the unusually cold morning of the launch. With the Shuttle designed as it was, there was no feasible way the crew could have survived. As with aircraft, part of the reason in-flight catastrophes generally kill all crew members is that there is often nothing you can do to save people in midair.
The secondary causes were many, and rather damning. One was, of course, the design of the Space Shuttle, which proved far more complex than was justifiable, making accidents like this more likely. But even worse was NASA's culture, which ignored repeated warnings from engineers at Morton Thiokol, the contractor that built the boosters, that the O-ring joints were dangerous. Pressure to sustain a high launch rate fostered a complacency that let such a detail slip through. It didn't help that this particular launch had already been delayed multiple times, leaving NASA in quite a hurry to get it off the ground. That hurry led to deadly carelessness.
Almost exactly 17 years later, on February 1, 2003, the Space Shuttle Columbia disintegrated on reentry, scattering debris for thousands of miles, destroying the craft, and killing everyone on board. Contingency plans put in place after Challenger let NASA launch what may be the biggest, most ambitious investigation ever conducted into a mission failure, producing the seminal work in rocket safety known as the Columbia Accident Investigation Board (CAIB) report.
Some say that NASA did not learn its lesson after Challenger; I cannot say that I share their view. The Columbia disaster was a far different beast, with a rather unintuitive cause and 17 years without a loss to fall back on. The CAIB report written in its wake laid bare many of the small, deeply buried factors that can ultimately lead to such a disaster. If you ever happen to be building your own rocket, the entire report, with its thousands of pages of appendices, is a must-read.
The external tank is covered in spray-on foam insulation, which keeps its super-cold propellants from warming and prevents ice from forming on its surface. That foam has a tendency to flake off the tank in flight. During Columbia's launch, a piece of it struck the leading edge of the orbiter's left wing at a relative speed of roughly 500 mph, breaching the heat-resistant panels that protect it: a slight, yet ultimately fatal, structural failure. At the end of its 16-day mission, superheated gas entered the damaged wing during reentry, and Columbia broke apart.
Explaining how this damage escaped notice is not simple. Engineers reviewing launch camera footage did notice that foam hit the left wing, but concluded that the foam should not have damaged the craft. Few at NASA believed it to be a significant risk, and the calls to investigate further were muted at best. That judgment proved disastrous.
The problem did, however, go somewhat deeper than that. The properties of the spray-on foam were poorly understood, and the modelling that predicted the strike would not be so damaging was inaccurate. So despite the foam strike (whose damage would not have been readily visible), the crew was given the go-ahead to land the Shuttle.
It is often said that the Shuttle suffered from being too complex; I hope this sheds at least some light on why. Many of its parts were poorly understood, insufficiently developed or tested, or simply ill-suited to a spacecraft. This, along with an unjustified confidence in the safety of its rockets, led NASA to these two unfortunate failures. The foam strike in particular should be a lesson that no issue is too small in rocket design, and that every possible failure mode should be investigated in case it could bring the craft down. Rocket failures are almost always caused by something that looks small, even trivial, and no such detail should be allowed to slip through and end in disaster. Decades of experience are available from which to learn how to prevent such failures.
Atlas
This entry is based in large part on this article, a review of two of the last launches of Atlas by General Dynamics. Read the full thing, and the comments, if you are so inclined - it will go into more detail than I will.
The two flights of interest are AC-070 and AC-071, two flights of the Atlas-Centaur rocket. General Dynamics was working in commercial launch, an often brutal market in which cost is king. In its rush to make its prices more competitive, the company made a series of mistakes that undermined the business.
These two flights are interesting for one particular reason: they failed for the same reason. The RL-10 engines that power the Centaur upper stage use an expander cycle: hydrogen, warmed as it cools the engine's combustion chamber and nozzle, drives the turbine that runs the pumps, so no propellant has to be burned just to power them. The catch was that, with no propellant burned to spin up the pumps, getting the engines started could be tricky. Until then, however, this had not been a problem in the long history of a well-regarded American engine.
On AC-070, one of the two RL-10s failed to start, leaving the cargo in a useless orbit. General Dynamics rushed to find what had kept the engine from starting, and quickly settled on an explanation: the brushes used to clean the engines before flight shed small flakes, leaving debris in the engines that made them hard to start. So they patched the issue by baking those particles out before launch, then quickly launched AC-071... which failed in exactly the same way. Now things became quite troublesome.
This time, General Dynamics went into far more depth, exploring every possible cause of the failure, as it should have done the first time. Eventually it found the real issue: a leaky valve in the engine which, combined with colder-than-usual liquid hydrogen (a recent change meant to improve engine performance and keep the rocket competitive), constricted the flow of fuel through the engine and prevented it from starting.
The fault here was twofold: first, the company did not properly consider how cutting corners out of desperation to remain competitive would endanger its craft; second, it failed to conduct a proper failure investigation. In the comments on the article linked above, one employee recalls how a company VP pushed the foreign object debris theory in pursuit of a monetary bonus rather than giving the issue the care it deserved. The result was a rapid and unfortunate end to General Dynamics' credibility as a launch company, despite its having a great rocket that, under different management, went on to do more great things.
The Atlas line passed from General Dynamics to Martin Marietta and then to Lockheed Martin, which developed the Atlas-Centaur's modern successor, the Atlas V; since 2006, the Atlas V has been operated by United Launch Alliance. Of its 71 launches, 70 were perfect successes, and one was a partial failure that the customer declared a success. This might as well be a 100 percent success record, but two near-misses are a good example of how even such a successful rocket needs to stay mindful of the possibility of failure.
The first was the partial failure of June 15, 2007, when the NROL-30 launch left its two National Reconnaissance Office (NRO) satellites in a slightly lower orbit than intended: a leaky valve on the second stage let fuel escape, causing the stage to shut down 4 seconds too soon. The second was the March 22, 2016 launch of the OA-6 ISS resupply mission, in which the first stage shut down four seconds early because of an improper mixture ratio within the RD-180 engine, and the upper stage had to burn for an extra minute to make up the shortfall. The NRO satellites carried enough extra propellant to easily make up the difference, and the upper stage of the Atlas (the Centaur) had enough margin to make the OA-6 mission a success. But both could have led to failure under different circumstances, and each led to a brief grounding of the Atlas V.
Both missions were essentially successful, but they do offer some important lessons. The first is to mind your margins: either mission could have ended in failure had the craft carried slightly less margin. The second is that issues can surface years or even decades after a part is developed; the RD-180 mixing valve failure only came up more than fifteen years after the engine first flew. It is important to take these near-failures seriously and to count your blessings; not every failure will cause you to lose the mission, but if you don't take your near-misses as seriously as your actual mission failures, you will have some unfortunate results in the end.
Falcon 9
Two failures are considered here: the 2015 CRS-7 mission, which blew up mid-flight, and the Amos-6 mission, which blew up on the pad during a preflight test. As with most commercial providers, SpaceX is fairly tight-lipped about the causes of its rocket failures. But it's hard not to notice that its tendency to move quickly probably plays a part.
CRS-7 was the third ISS resupply mission in eight months to end in failure. The rocket exploded mid-flight, and SpaceX traced the cause to a strut holding a helium tank in the second stage, which failed at a load well below the strength it was rated for.
Since the cargo was NASA's, NASA took part in the investigation and privately vented its frustrations at SpaceX over the failure. It paid SpaceX 80% of the contract, as was standard procedure, and quietly settled with the company over the loss of the cargo. While it's hard to know what happened behind the scenes, SpaceX did announce that it had changed suppliers for the strut, and the issue did not recur on the next launch, so it was likely properly fixed.
The second failure was the loss of Amos-6 during a static fire test (firing the rocket while holding it down to simulate the actual launch), in an explosion that destroyed the rocket and the payload and heavily damaged the pad. This was a far more damning loss than the first for multiple reasons, not least that losing a rocket on the pad was a first for US launches in decades.
The cause of this failure is less clear. The leading theory is another failure of a helium tank in the upper stage, caused by liquid oxygen freezing on contact with the colder helium and destroying part of the tank. Yet this one was plagued by endless conspiracy theories, including UFOs and ULA snipers, some of them entertained by SpaceX CEO Elon Musk himself. That the cause is still not fully clear should give some pause to anyone who wishes to use SpaceX's rockets. It could very well have been an unacceptable operator error.
Although SpaceX is a well-regarded innovator in the rocketry field, with lofty ambitions that include rapid reusability of rockets, Mars colonization, and order-of-magnitude reductions in launch costs, it is in a more precarious situation than its rather impressive fanbase would ever be willing to admit. Its margins are so thin that the company is likely losing money, in hopes of recouping its costs if and when those ambitions come to fruition. But the direct costs of losing these missions, plus the months of delays that come with pulling its only rocket off launch duty (which customers tend to be very unhappy about), are a danger to the company's survival. In moving quickly to innovate, it is important not to lose track of the most important factor of all: finishing the mission.
Proton
This Russian workhorse has had quite a few failures; its most recent iteration, the Proton-M, has had one about every 1.5 years. Yet despite its below-average record, its heavy-lift capability at a relatively low price makes it an important rocket for Russia. We will only briefly cover a few of the failures themselves, focusing instead on the political and economic history that shaped the rocket's development.
The collapse of the Soviet Union was not kind to government programs, and rocketry was no exception. The programs that survived the end of the USSR were generally those that retooled to serve foreign buyers, and Proton found a niche in the commercial satellite launch business. Although it did not match the safety record of more reliable launchers, it could easily beat them on cost, and its ability to launch heavy craft was nothing to sneer at. For years it brought billions of dollars' worth of launch business to an otherwise dangerously troubled Russia. This was an impressive performance that saved Russian space from the depths of hell. But necessity and responsibility often forced Proton to sacrifice reliability in favor of launch volume.
As the Russian economy recovered, another important mission essentially fell to Russia: maintaining the International Space Station. Without the Space Shuttle, the US was hardly in a position to launch many of the missions it wanted on its own, and was forced to rely on others: upstart companies within the US and, even more so, the troubled but well-developed Russian space industry. So for years, Russia launched in volume, losing some rockets but always keeping up a high launch rate. Until 2016, when it took some of its rockets out of service for maintenance, Russia generally launched somewhere between one-third and one-half of all rockets worldwide. In 2016 that fell to around one-fourth, as Russia essentially halved its launches to implement some much-needed reforms.
The need for reform was clearest after a particularly unfortunate failure in 2013, in which angular velocity sensors installed upside down caused a Proton rocket to "correct" its course in exactly the wrong direction; it veered wildly off course seconds after liftoff and crashed near the pad, destroying three valuable GLONASS (Russia's counterpart to GPS) satellites. This set off a reform of the Russian space industry over the next few years.
In June 2016, the Proton launch of Intelsat 31 suffered an anomaly in which the second stage underperformed, but the Briz-M upper stage made up the difference and delivered the payload to the correct orbit (one of the few times the Briz-M saved the rocket rather than being the problem itself). One finding led to another, and the Proton investigation soon revealed a deep and troubling trend within the Russian space industry: falsified certifications for some parts, engines built with substandard parts to save money, and many similar issues. A short investigation soon became a full-blown catalyst for reform.
The truth of the matter, however, was that the issues discovered had existed for decades. That they had gone unaddressed for so long had less to do with ignorance of the issues than with the burden the Russian space program had to bear. In truth, this is one of the few cases where there was something more important than ensuring mission success: ensuring the survival of the roughly $150 billion science project known as the ISS. The common man will never know just how close the ISS came to going out of commission over the past two decades, saved only by Russia's Proton and Soyuz workhorses. As the commercial space industry started to recover, Russia was able to dedicate more effort to internal improvements, as it is doing now.
These failures, however, have brought their fair share of cynicism. Despite the necessity of performing a difficult task, a factor often underappreciated even by the Russian people themselves, the Proton's subpar record has taken its toll both on the reputation of Russian space ventures and on public confidence that Russian space will once again see its glory days. And yet Russian space is finally starting to take care of the ugly work needed to reach that goal: removing bloat from the overly large space industry, embracing modern manufacturing, weeding out corruption, and hammering out the issues within its craft. The Proton-M returned to flight just recently and launched its payload (EchoStar 21) without a hitch; only time will tell whether the reforms in progress will fix the troubles that have been allowed to fester and grow within Russian space over the past three decades.
Conclusion
As the alarming length of this piece might indicate, the process of discovering, diagnosing, and fixing rocket failures is a gargantuan one. And yet even this behemoth barely scratches the surface of the issues ingrained in dealing with rocket failures. The political, economic, and technical aspects of this field are many, and only years of experience can reveal the true scope of the work needed to ensure that rockets launch successfully. This is where the real difficulty of rocketry lies, and where the mythos of "rocket science" belongs. If all this were summarized in a very short message, it would be this: mind your circumstances, but always remain vigilant.