Saturday, August 27, 2011

How Fragile is Your System?

My kids learned a somewhat sad, but I think important, lesson this week. Since March we have been working through a process with the main fish tank in our house: a 65-gallon tank holding a colony of convict cichlids (Archocentrus nigrofasciatus). In March, the colony spawned a large clutch of babies, and we've been actively working to raise them. The tank is well maintained, has plenty of filtration, and has a high-capacity air pump feeding air stones (read: a lot of bubbles in the tank background).

Recently the kids were playing upstairs and, as they are apt to do, roughhousing and jumping around. Unbeknownst to them, they knocked loose the air line that runs from the air pump to the air stones. Earlier in the week, they had also helped me clean out the tank and do a water change. They didn't realize it, but together these events conspired to expose a fatal flaw in our system.

I came home from work Thursday night and the kids said, "Hey Dad, you should go up and feed the fish, they look really hungry!" I thought this was an odd statement, so I went up and inspected the tank. All of the fish were hovering at the surface of the water, some on their sides, their gills pumping rapidly. As I looked at the tank, I heard the air pump running louder than normal. I looked down and saw that the air hose was no longer attached. I quickly reattached it, which let the bubbles (and air) flow back into the tank. However, this was only part of the problem. I also noticed another telltale sign that something wasn't normal: there was no splashing from the filter return. The hose had slipped down below the water line, so the returning water was just running under the surface instead of agitating it. Add to that the fact that we now have about 70 juvenile fish, each averaging an inch or a little more in length, along with the older denizens of the tank. Individually, none of these would have been a problem. Taken together, though, they resulted in a massive failure of the system.

We had to act quickly, so I hooked the water siphon up to the bathroom sink and drained out about 20 gallons of tank water. This served two purposes. First, with the water level lower, the filter return could agitate the surface again, restoring oxygen exchange. Combined with the bubbles from the air stones, this would let oxygen return to the water. Second, it made room for fresh water: I turned on the hose and briskly sprayed water all around the tank, and the motion helped oxygenate the fish's gills so they could breathe more naturally. Finally, once I saw the fish moving of their own accord and perking up considerably, I surveyed the damage. The result was 15 dead juveniles; they had suffocated from lack of oxygen.

Once everything had calmed down, I gathered my kids together and taught them a bit more about how the aquarium ecosystem works: oxygen exchange, water column movement, and overcrowding and overpopulation in a small environment. They asked me why no fish had died during our earlier power outages, when not only the air pump but also the water pump and heater had stopped working. I explained that the fish population was smaller then: there weren't as many babies, and those babies hadn't yet grown large enough to make a major difference to the tank's resources. Now, however, they were large enough to make exactly that kind of impact.

With that, we made a game plan, and part of that plan is to find a new home for about 70% of the fish as soon as they are big enough "to let go". By "let go" I mean I plan to donate them to various fish stores in the county. The remaining 30% I will let grow as long and as big as they want to get.

Our software ecosystems are similar. Many new features may seem innocent by themselves and in isolation, but all it takes is one environmental change for something catastrophic to appear. We may have little to no idea what will trigger it: one day all of our tests pass and everything looks great, then one seemingly small change takes place, and suddenly everything has gone horribly wrong. I've had that experience with my own tests, and sometimes it takes just a little nudge to send everything over the edge. So what can we do? Just as I needed to teach my kids about oxygen in the water and surface agitation, I also have to be aware of the unique aspects of what makes my software environment "breathe". I may not be able to anticipate every contingency, but knowing the big and important ones will help stave off disasters.