Tuesday, May 18, 2010

The Human Face of “Mission Critical”

One of the interesting aspects of testing is that different products call for very different levels of testing. At one end is the somewhat trivial job of testing a vanity web site to make sure that information is displayed correctly. If something is wrong, it's a nuisance, but most people's lives will not be greatly affected. At the other end are "mission critical" applications, where a failure can have a devastating impact, such as loss of life.

This point was brought home to me last night as I was discussing one of my assignments for the Black Box Software Testing: Foundations class I'm currently taking through the Association for Software Testing. As we have discussed mission critical applications, I've realized that, with the possible exception of testing routers for Cisco, most of the applications I have worked with would not directly affect people's lives if they were to stop working. Annoyance aside, virtualization software, capacitive touch devices, video games and immigration software are not genuinely life or death issues. My father, on the other hand, actually did work on applications that were truly mission critical: they determined the mixture of drugs and medications used in a neonatal intensive care nursery.

My father lived a dual life in the medical field. On one hand, he was a pediatric physician (now somewhat retired, though he still volunteers at a free clinic a few nights a week). At the same time, he was, and still is, an avid programmer; he dates back to the days of warehouse-sized supercomputers and programs loaded on paper punch cards. During his years as an active physician, he always wanted to bring the power of computing and calculation to the hospital floor, and he worked hard to develop programs that did this. Interestingly enough, some of the programs he wrote in the late '70s and early '80s are still in use today.

While we were talking about testing needs and what constitutes "good enough" testing, he shared a story about a critical bug that was found nine years after a program had been coded, tested and put into regular use. The program took readings from monitoring equipment in the pediatric Intensive Care Unit. A doctor or nurse would enter data about the patient, including vital signs such as heart rate, pulse and breathing along with other condition details, and the program would calculate an intravenous fluid mixture combining the nutrients and medications that patient needed. Most of these patients were premature babies or other infants with early health issues. My dad said they discovered that, in a very rare case, a loop in the code could overwrite a variable. As a result, a dangerous amount of potassium had been mixed, and it could have killed the infant had it not been caught in time.

This discovery came after nine years of active use, during which the system had seemed to work flawlessly for many thousands of patients. My dad went in and fixed the code, the nursing staff ran tests, and the group reviewed the work and determined it could go back into production. That program is still in use today, 19 years after the discovery.

We have to realize that we can test for days, weeks or months, and we can be aggressive and focused, but we cannot test everything and we cannot guarantee that every issue has been addressed. In many cases, good enough is OK, but there are some instances where good enough just won't do. My dad's story was sobering, and it gave me a clear picture of the human cost of a difficult-to-find error. I may never get the chance to test something this important to other people's lives, but if I do, I will remember this story, and the very real human price that can be paid when even a seemingly simple program that has been used for years ultimately reveals a problem.
