Just one of those things
I’ve been neglecting this blog lately, partly because I’ve been busy working on a software development process assessment for a client. One of the meetings I observed while doing this was a post-mortem of a release failure. The developers involved noted that they’d seen the webserver drop some connections on the integration environment. They didn’t trust, however, that the integration environment adequately represented the production environment. They’d seen a similar problem some months prior, and didn’t know if anyone had fixed it. As a result, they didn’t know whether these problems were caused by the code they were deploying or were, as one developer put it, “just one of those things.”
Let me say that it’s never “just one of those things.” As Jonathan Kohl and I discussed, unexplained issues need to be investigated and understood. Computers aren’t truly random. Things don’t “just happen.” Unresolved problems will come back and bite you at the worst time. Intermittent failures on an integration server make it impossible to trust your integration.
But intermittent failures can make it clear that there is a problem. Your job is to solve that problem. Jonathan has some pointers to help you do so.
(Surely: “Computers areN’T truly random.”)
I saw this today while helping test an ex-colleague’s project. There was some bizarre new-user-to-the-site behaviour – thankfully I could reproduce it by switching browser profiles … but you hit the nail on the head here: “don’t ignore the ‘intermittent failures’ or they’ll come back to bite you later”.
Oops, good catch! Typo fixed.
Yeah, it seems that the harder it is to reproduce a problem, the harder it bites you later.