Exploring Feynman

On my intention to start exploring Richard Feynman (1918-1988)

The Plan

I’m planning to do a little blog series on the late Richard Feynman to record some of my impressions and learnings while I work myself through this intriguing oeuvre of his. No, that summer heat is not getting to me, yet. I’m not exactly planning on processing his massive back catalog – I’m not really into path integral formulation or the behavior of subatomic particles, let alone the superfluidity of supercooled liquid helium. I do value the sparse free time that I have – time is on my side, yes it is. Rather, I’d like to document my exploration of his more popular works, audio and video recordings.

Exploratory learning, if you will. Dipping into it all and savoring the juicy bits, spitting out the others. And relating things to testing, of course.

Why Richard Feynman?

Feynman intrigues me, and I have nothing but deep respect and admiration for the man. He was witty, brilliant and had this perpetual curiosity to discover new things (Tuvan throat-singing, anyone?). He opposed rote learning or unthinking memorization as teaching methods – he wanted his students to start thinking, for a change. How great is that?

On occasion, he was a totally nutty professor – a flamboyant geek. But he also happened to build a truly astonishing career which eventually earned him the Nobel prize in physics in 1965. 

I’m planning to gradually learn about him and post my progress here. Stay tuned!

Collateral features

About collateral features – things that were not expected, but do provide value in the end

Last year, James Lyndsay introduced me to his “Nr1 diagram of testing” (© Workroom Productions): a deceptively simple model that tries to capture the essence of testing.

The circle on the left represents our expectations – all the things we expect. This is the area where scripted tests or checklists come into play to tell us about the value that is present. The right hand circle represents the actual deliverable – what the software really does. This never completely matches what was originally expected, so this is where we use exploratory testing to find out about risk. 

The diagram divides the world into four distinct regions:

  • The overlap. These are the things we both expected and got. 
  • The region outside the circles. That is all we didn’t want, and didn’t receive in the end. A pretty infinite set, that is.
  • The left-hand arc. This is what we expected to get, but didn’t receive. This means that the deliverable turned out less valuable than we had hoped.
  • The right-hand arc. The software system under test does all kinds of things we didn’t expect. There’s clearly some risk involved here, and it’s up to us to discover just how much risk there is.

Simplicity is of course the main strength of such a model. It can certainly help to identify or classify things in our quest to quickly grasp the essence of things.
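To make the model concrete for myself, here is a toy Python sketch of my own (not part of Lyndsay’s diagram): if we treat “expected” and “actual” as sets of feature descriptions, three of the four regions fall out of plain set operations. The feature names are made up for illustration.

```python
# Toy sketch: "expected" vs "actual" as sets of (hypothetical) features.
expected = {"login", "search", "export", "undo"}
actual   = {"login", "search", "export", "hashtag grouping"}

delivered_value  = expected & actual   # overlap: expected and got
missing_value    = expected - actual   # left arc: hoped for, not delivered
risk_or_surprise = actual - expected   # right arc: unexpected behaviour
# the fourth region (neither expected nor delivered) is effectively infinite

print(sorted(delivered_value))   # ['export', 'login', 'search']
print(sorted(missing_value))     # ['undo']
print(sorted(risk_or_surprise))  # ['hashtag grouping']
```

Scripted checks live mostly in the intersection; exploratory testing is what probes that right-hand difference.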

These four regions got me thinking. I’d like to expand on that. What about things that we expected and received, but that do not provide value, for instance? Unneeded features – not too unrealistic. Or – even better – what about things that were not expected, but are there and actually do provide value in the end? Immediately the term “Collateral features” came to mind: no matter how hard we try to create designs for certain uses, people will always utilize them in their own way. These unintended uses can be strange sometimes, but some of them are downright brilliant.

Take a look at Alec Brownstein. While most people use Google AdWords to promote their business (after all, that’s what it was designed for), Alec used it to get a job. He was trying to land a job as a copywriter with some of the top ad agencies in New York City. He assumed that the creative directors at these top agencies would “vanity google” their own names. So he bought AdWords for those names and put a special message in each of them. The rest is history. He now works for Y&R, after a total investment of $6 (story here).

Collateral features also emerged in the microblogging world. Because Twitter provided no easy way to group tweets or add extra data, the Twitter community came up with its own way: hashtags. A hashtag is similar to other web tags – it helps add tweets to a category. Hashtags weren’t an official feature, but they sure made their way into the daily Twitter lexicon of millions of people.

In an article in Forbes magazine called You Can’t Predict Who Will Change The World, Nassim Nicholas Taleb pointed out that things are all too often discovered by accident—but we don’t see that when we look at history in our rear-view mirrors. The technologies that run the world today (like the Internet, the computer and the laser) are not used in the way intended by those who invented them.

There will always be unforeseen usage of your software. Some of it proves risky, some of it contains value. Some of these collateral features even replace the intended use, becoming main features and making their way into the ‘expected’ circle. Ultimately, your customers make the final call. They decide how to use your product or service. Not you, not your marketeers.

– “But no user would ever do that!”

– “Fair enough. Wanna bet?”

Metrics – perverse incentives?

Trivia time! What do following events have in common?

  • In the American Southwest in the 1850s there was a high reward for the scalps of members of violent and dangerous Indian tribes. This led scalp hunters to slaughter thousands of peaceful agricultural Indians and Mexican citizens, women and children alike, for their valuable scalps.
  • In Vietnam, under French colonial rule, there was a program paying people for each rat pelt handed in. It was originally intended to exterminate rats, but it led to the farming of rats instead.
  • In the 19th century, palaeontologists traveling to China used to pay peasants for each fragment of dinosaur bone that they produced. The measure was an instant success! It took them a while to discover that peasants dug up the bones and then smashed them into multiple pieces to maximise their earnings.

All these are examples of perverse incentives: measures that have unintended and undesirable effects which go against the interest of the incentive makers. They become counterproductive in the end.

I’m probably suffering from an acute case of testing analogitis again, because over the years I have seen these things happen in testing as well:

  • Managers evaluating testers by the number of bugs found.
    This resulted in the submission of tons of trivial, low-priority bugs. People who used to thoroughly investigate bugs and put a lot of time into pinpointing started lowering their standards.
  • Managers evaluating testers by the number of test scripts executed.
    This resulted in testers focusing only on scripts, not allowing themselves to go off-script and investigate. This often meant going against their intuition about suspicious “smells” in the software, and it certainly did not encourage the use of exploratory testing.
  • Managers evaluating testers by the number of “rejected” bugs.
    The rationale behind this was: fewer rejections mean more valid bugs, better bug descriptions and better-researched bugs. But the result of the metric was that testers were reluctant to enter complex, difficult or intermittent bugs for fear of them being rejected. But these are the bugs we want the team to tackle, right?
  • Managers evaluating testers by the quality of the software.
    First of all, what is quality? If we use Jerry Weinberg’s definition, “value to someone (who matters)”, it becomes clear that any manager’s assessment of quality is highly subjective. If the rewards for testers depend on the quality of the software, that is highly unfair. We are no gatekeepers of quality; we cannot assure quality, because we do not control all aspects of it. The only thing such an incentive achieves is a highly regulated cover-your-ass culture with formal hand-offs, and certainly not team collaboration, continuous improvement or better software. 

These are all examples of metrics used as incentives for testers, but in most cases they just ended up creating a blame culture where quantity and pathetic compliance are valued above quality and creativity.

Dear managers, I’d say: focus on collaboration and team achievements, set goals for the team. Make the whole team responsible for the quality and the product. Then see what happens.

A Eurostar interview


A while ago, there was this little announcement on the Eurostar blog:

“As a new addition to the EuroSTAR community, we will be interviewing prominent testers from across the globe”

I thought that was pretty cool. There is lots to learn from experienced people. It’s nice to hear all these different takes on the software testing craft. They already published interviews with Isabel Evans, Mats Grindal, Tim Koomen, Michael Bolton, Martin Pol and Anne Mette Hass. Interesting stuff.

Several months later, I received an email from Kevin Byrne from the Qualtech/Eurostar team asking if I would be interested in doing an interview with them on testing (and other things as well). It took me a while to properly connect the term “prominent tester” with my own name. But I was honoured of course, so I accepted their offer.

And there it is. They even call me a ‘prominent Belgian tester’ in the introduction, which made me smile because it reminded me of the phrase “being big in Belgium” – often used interchangeably with being “big in Japan”, meaning as much as “totally unimportant”.

In the 1992 movie Singles, Matt Dillon plays in a band that claims to be “big in Belgium” – subtext: “what a bunch of forgettable losers”. Similarly, the legendary rock group Spinal Tap (the 1984 mockumentary This is Spinal Tap is hilarious, by the way) ended up being big in Japan, which basically meant “pathetically uncool and ridiculed at home”.

But I digress. I might not be all too prominent, but I am a Belgian tester all right. Here’s the interview:


On Google Insights, Indians and testing


I have been playing around with Google Insights for Search lately. It’s a nifty tool that allows you to compare search volume patterns across specific regions, categories, time frames and properties. The search results are presented in a textual manner, but there is also a map representation allowing you to drill down on regions and countries.

An explorer’s and systems thinker’s Walhalla! It didn’t take long before I found myself throwing some testing-related lingo at it:

I wonder what happens when “Software testing” is thrown into the mix?

Mmm… and what about “Agile testing”?

wOOt! “Testing conference”?

This can’t be. Let’s try ISTQB…

Same old, same old. Is this some kind of caching problem? Let’s park the testing stuff and throw in Alaska’s finest, “Sarah Palin”.

Okay, this actually makes sense. Let’s zoom in on the US. Drill baby, drill!

And sure enough, the top number of searches comes from the place she could see Russia from on a clear day.

Google Insights wasn’t messing with me. It’s real. The highest search volumes of almost all software testing-related terms seem to come out of India. Look who’s on a quest for knowledge.

Are Indian testers heavier Google users than the average Westerner? Is that because other sources of testing-related information are lacking? I’d love to hear the opinion of Indian testers on this.

Although this trend is remarkable, it’s not surprising. If the nature of many Indian testing blogs is anything to go by, a lot of Indian testers *are* inquisitive and critical. It’s the birthplace of Weekend Testing, too. And the sapient tester virus is spreading rapidly: if you take a look at the blogs of Ajay Balamurugadas, Dhanasekar S, Parimala Shankaraiah, Pradeep Soundararajan, Shrini Kulkarni, Manoj Nair, Debasis Pradhan, Santhosh Tuppad, Sharath Byregowda, Mohit Verma, Jaswinder Kaur Nagi, Santosh Shukla, Nandagopal and Madhukar Jain, their enthusiasm and sheer passion for the craft are contagious.

I like the way many of them are taking skill development into their own hands. Bhārat, the home of continuous learning and improvement! 

I wonder how long it would take to put Belgium on that Google Insights testing map. I’m afraid this won’t be happening anytime soon – but I’m pretty confident that we will get the term “governmental crisis” up there in a heartbeat.

Volcanic systems thinking

General systems thinking and the effects of a volcanic eruption

I believe testing is about looking at things from as many perspectives as you can. Testing is also about relating things to one another, seeing things in a greater context. In that sense you could say that testing is applied systems thinking.

Years ago, Michael Bolton pointed me to the amazing book “An Introduction to General Systems Thinking” by Jerry Weinberg. Actually, he pointed me to practically every publication by Jerry Weinberg – I’m still trying to prioritise my reading list. The book taught me that taking a holistic view of a system within its environment may enable us to see patterns of behavior and recognize interactions and interdependencies among its components. That way, we can better understand the system, maybe even predict how it will evolve over time.

The recent volcanic eruptions of the Eyjafjallajökull in Iceland provide a great example of (and also an exercise in) real-life general systems thinking and how several systems are possibly interconnected. I started wondering… What can the possible consequences of the recent volcanic eruption be, worldwide? There is a fair chance that other volcanoes nearby will erupt too. Scientists say history has proven that when the Eyjafjallajökull volcano erupts, the Katla volcano follows — the only question is how soon. And Katla, located under the massive Myrdalsjokull icecap, threatens disastrous flooding and explosive blasts when it blows.

If we can rely on history repeating itself, we’re onto something big. Let’s look at a true story. In June 1783, the eruption of the Icelandic Laki volcano was the start of a chain of unlikely events that affected everyone’s lives:

The immediate impact was catastrophic: around 25% of the Icelandic population died in the famine and fluorine poisoning after the eruptions ceased. Around 80% of sheep, 50% of cattle and 50% of horses died because of dental and skeletal fluorosis.

The rest of Europe soon followed:

  • A thick, poisonous smog cloud floated across the jet stream, resulting in many thousands of deaths throughout 1783 and the winter of 1784. Inhaling sulfur dioxide gas caused victims to choke. 
  • The thick fog caused boats to stay in port, unable to navigate.
  • Weather patterns started changing across western Europe:
    • The summer of 1783 was the hottest on record at that time. 
    • Severe thunderstorms with hailstones even killed cattle.
    • In 1784, a most severe winter caused 8,000 deaths in the UK.
    • During the melting that followed in spring, all of Europe reported severe flood damage.

And these were only the short term effects. The meteorological impact of the Laki eruption resonated on, contributing significantly to several years of extreme weather in Europe and the rest of the world:

  • In New Orleans, the Mississippi river froze, and ice started appearing in the Gulf of Mexico.
  • African and Indian monsoon circulations weakened, leading to precipitation anomalies over the Sahel that resulted in low flow in the river Nile.
  • In France, a surplus harvest in 1785 caused grain prices to collapse, bringing poverty to rural workers.
  • For years afterwards, severe droughts followed. There were a series of bad winters and summers, including a violent hailstorm in 1788 that destroyed crops.
  • All this contributed significantly to the build up of poverty and famine that eventually triggered the French Revolution in 1789.

Wow. Time-out. Really? Could it be that the eruption of an Icelandic volcano lies at the basis of the French revolution, six years later? Since all these seemingly independent systems (meteorological, economic, agricultural and sociological) *are* connected, that’s perfectly plausible. Apparently, the indirect and long-term consequences of the eruption have greater impact than the initial effects of the event. Even art was affected. The most beautiful sunsets started appearing in late 18th-century paintings.

And what were the effects of the French revolution again? The abolition of Feudalism, for starters. The creation of a new order based on the famous ‘Declaration of the Rights of Man’. The main theme of the French Revolution, ‘Liberty, Equality and Fraternity’, later became one of the most famous political dogmas across the world. You could say that the Revolution paved the way for democracy. It brought about a lot of economic and social reforms, not only in France, but across Europe. Culture was also affected, at least in the short term, with the revolution permeating every creative endeavour. It changed the face of Europe: national identities joined forces, everywhere. In short: the revolution helped shape the future course of the world.

If we now think about the current eruptions of that rather friendly tongue-twister called Eyjafjallajökull, we notice that there are many more systems in play than there were in 1783. There’s commercial aviation and worldwide travel now; both of them fuel our economies and came to an abrupt stop. Airline stocks dropped rapidly. All this resulted in massive economic losses and temporary unemployment.  Several events are being cancelled because people cannot get there in time. People are trapped abroad or forced to stay at home. And these are only the short-term effects. I wonder what this year’s summer will be like. I already look forward to walking or skating across the channel to visit London. Or maybe throw a snowball or two at the gendarmes of Saint-Tropez.

And that’s just Eyjafjallajökull. Not Katla, not Laki. Just sayin’.

Failure is always an option – part 2 (wartime failures)

Wartime failures

In my search for information on failed software development projects, I was frequently reminded of the fact that it’s not always software projects that fail. In many cases, I even wondered why these projects were started in the first place. Some of them seem to come straight from a Monty Python movie – downright absurd. Needless to say, their eventual cost far outweighed the benefits, if any.

I discovered* that wartime was a true breeding ground for many beautiful and poetic failures. Anything goes when there’s an enemy waiting to be crushed in the most creative ways possible:

  •  The Acoustic Kitty project:
    A CIA project in the 1960s attempting to use cats in spy missions. A battery and a microphone were implanted into a cat and an antenna into its tail. Due to problems with distraction, the cat’s sense of hunger had to be addressed in another operation. Surgical and training expenses are thought to have amounted to over $20 million. The cat’s first mission was eavesdropping on two men in a park. The cat was released nearby, but was hit and killed by a taxi almost immediately. Shortly thereafter the project was considered a failure and declared to be a total loss.
  • Operation Cornflakes:
    A World War II mission in 1944 and 1945 which involved tricking the German postal service Deutsche Reichspost into inadvertently delivering anti-Nazi propaganda to German citizens through the mail. The operation involved special planes that were instructed to airdrop bags of false, but properly addressed mail in the vicinity of bombed mail trains. When recovering the mail during clean-up of the wreck, the postal service would hopefully mistake the false mail for the real thing and deliver it to the various addresses. The content was mainly anti-Nazi propaganda. In addition, the postage stamps used were subtly designed to resemble the standard stamp with Adolf Hitler’s face, but a close examination would reveal that his face was made to look like an exposed skull or similarly unflattering imagery. The first mission of Operation Cornflakes took place in February 1945, when a mail train to Linz was bombed. Bags containing a total of about 3800 propaganda letters were then dropped at the site of the wreck, which were subsequently picked up and delivered to Germans by the postal service. Not too sure how many German families were converted by these letters.
  • The Bat Bomb project:
    Bat bombs were bomb-shaped casings with numerous compartments, each containing a Mexican bat with a small timed incendiary bomb attached. Dropped from a bomber at dawn, the casings would deploy a parachute in mid-flight and open to release the bats, which would then roost in eaves and attics. The incendiaries would start fires in inaccessible places in the largely wood and paper construction of the Japanese cities that were the weapon’s intended target. Eventually, the program was cancelled when it became clear that it wouldn’t be combat-ready until mid-1945. By that time it was estimated that $2 million had been spent on the project. It is thought that development of the bat bomb was moving too slowly, and it was overtaken in the race for a quick end to the war by the atomic bomb project.
  • Project Pigeon:
    During World War II, Project Pigeon was B. F. Skinner’s attempt to develop a pigeon-guided missile. The control system involved a lens at the front of the missile projecting an image of the target to a screen inside, while a pigeon trained to recognize the target pecked at it. As long as the pecks remained in the center of the screen, the missile would fly straight, but pecks off-center would cause the screen to tilt, which would then, via a connection to the missile’s flight controls, cause the missile to change course. Although skeptical of the idea, the National Defense Research Committee nevertheless contributed $25,000 to the research. Skinner’s plan to use pigeons in Pelican missiles was considered too eccentric and impractical; although he had some success with the training, he could not get his idea taken seriously. The program was canceled on October 8, 1944, because the military believed that “further prosecution of this project would seriously delay others which in the minds of the Division have more immediate promise of combat application.”

It’s probably no coincidence that the majority of these projects involved animals. In that case, failure is certainly an option – I heard that working with animals is highly unpredictable, hard to manage and time-consuming.

Strange, isn’t that what they say about software development too?

*source: wikipedia

Failure is always an option – part 1 (chaos)

About the Chaos report

One of the most popular reports people use to showcase failure of software development is the chaos report from The Standish Group. The Standish Group collects information on project failures in the software development industry in an attempt to assess the state of the industry. 

In 1994, they reported a shocking 16 percent project success rate, another 53 percent of the projects were challenged (not on time, over budget and with fewer functions than originally specified), and 31 percent failed outright.  Although the newer reports show better numbers, the overall results still paint a dire picture:

             1994  1996  1998  2000  2002  2004  2006  2009
Successful    16%   27%   26%   28%   34%   29%   35%   32%
Challenged    53%   33%   46%   49%   51%   53%   46%   44%
Failed        31%   40%   28%   23%   15%   18%   19%   24%

There aren’t a whole lot of other statistics out there on this topic, so obviously these numbers get big play. Guilty as charged, your honor. I have used them myself, in a presentation or two.

I won’t be doing that again.

I realized that I have some serious problems with these metrics. They measure a project’s success solely by whether it was completed on time, on budget and with the required features and functions. What they do not take into account are things like quality, risk and customer satisfaction. Could it be that an extremely unstable, unusable and frustrating piece of software that was delivered on time and on budget qualifies as a success? I beg to differ.

The Standish Group’s methods are not fully disclosed, and the bits that are disclosed are apparently deeply flawed. Their figures are misleading, one-sided and meaningless – the results are completely unreliable. They present their figures as absolute facts, but I lack clear context. The most famous sceptics of the report are Jørgensen and Moløkken. They emphasize its unreliability and question the claim of a “software crisis”:

“Even the definition of challenged projects is not easy to interpret. It is defined as “The project is completed and operational but over budget, over the time estimated, and offers fewer features and functions than originally specified.” The problem here is the use of “and” instead of “or”, combined with the following definition of successful projects: “The project is completed on-time and on-budget, with all features and functions as initially specified.” Consider a project that is on-time and on-budget, but not with all specified functionality. Is this project to be categorized as challenged or successful? Our guess is that it would be categorized as challenged, but this is not consistent with the provided definition of challenged projects.”
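The definitional gap Jørgensen and Moløkken describe can be made concrete with a small sketch of mine (my own reading of the quoted wording, not anything published by The Standish Group): a classifier that applies the two definitions literally leaves some projects in neither category.

```python
# Sketch: applying the quoted CHAOS definitions literally.
def classify(on_time: bool, on_budget: bool, full_scope: bool) -> str:
    # "successful": on-time AND on-budget AND all specified functionality
    if on_time and on_budget and full_scope:
        return "successful"
    # "challenged": over budget AND over time AND fewer features
    if (not on_time) and (not on_budget) and (not full_scope):
        return "challenged"
    # the gap: e.g. on time and on budget, but missing functionality
    return "uncategorized"

print(classify(on_time=True, on_budget=True, full_scope=False))
# prints: uncategorized
```

Replacing the “and” in the challenged definition with “or” would close the gap, which is exactly the inconsistency the critique points at.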

In the comments section of an interview with The Standish Group’s Jim Johnson, Jørgensen brought up his critique of the CHAOS report and asked Johnson two very fair questions. Johnson’s reply is pretty enlightening, to say the least. Here are a few excerpts:

…We are an advisory research firm much like a Gartner or Forrester. Neither they nor we can afford to give our opinions away for free. We have facilities, utilities, and personnel and we must, the same as you, be able to pay our bills. Just because someone asks a question, does not mean we will respond with an answer. In fact, we most likely will not…

…Our current standard answer to a CHAOS inquiry is, first: please purchase our new book, “My Life is Failure”, in our online store. If that does not satisfy you, then you need to join CHAOS University. If you do not find your answer or answers there then you need to purchase our inquiry services. Then we will work to answer your questions…

…It is strange that Jørgensen has never applied or professed interest in joining us. Some answers can be found if you join us at CHAOS University 2007 or one of the many outreach events. So you can contribute to the CHAOS research by providing funding or sweat, but short of that you will and must be ignored by design…

Don’t get me wrong. I think there *are* lots of failing software development projects, but in other numbers and for other reasons than the ones Standish brings forth: deliveries that do not bring any value to their users, software that was poorly tested or poorly designed, resulting in failures in production.

The problem I have with the Chaos Report is that it claims to be some kind of “industry standard”, projecting a false image of the dire state of the software industry, based on poor metrics. And I certainly don’t believe in the “quality is dead” mantra that resonates from their reports. Sure, there’s plenty of chaos out there, but I like what Henry Miller said about that: “Chaos is the score upon which reality is written”.

I’m with Henry on this one.

A lesson learned from James Bach

About “Secrets of a Buccaneer-Scholar” (James Bach)

I just finished James Bach’s “Secrets of a buccaneer-scholar” and it hit home in a weird way. I’m not an unschooler or a high school dropout, but I could still relate to a lot of things in his book. It was a tremendous read, giving me instant flashbacks to the days of yore.

As a young kid, I constantly skimmed through encyclopedia volumes that were lying around the house. I wasn’t “studying” from them, I was just fascinated by what I thought was all the knowledge of the universe compiled into 14 volumes. I let my mind wander while looking at the pictures, jumping randomly from subject to subject. When something looked fascinating enough to stay with it for a while, I dove in and read through the whole entry. I didn’t understand all of it, but I didn’t really mind. Most of the time it was just superficial browsing anyway – I blamed it on my short attention span. But as I was doing it more frequently, it became more systematic. Once in a while, I came across things that I had previously ignored, but that all of a sudden seemed interesting enough to investigate. Things I had previously read helped me to understand new things as well. I learned that I remembered lots of information without trying to. It just stuck because it was so damn interesting. I did the same thing with all the world maps and globes I could get my hands on. They really got my imagination running. The result is that I’m still bursting with trivia that spill out at the most inconvenient moments. It’s great in the occasional quiz, though.

I always thought that was a bit awkward. Not many kids I knew read encyclopedias and atlases in their spare time. It’s not that I didn’t enjoy school, but this kind of exploratory learning felt more natural to me. There was hardly any effort involved. It was pretty chaotic, but it was a learning style that fit me like a wet suit. 

As an adult, I am facing the same problems: I like to learn and educate myself, but in an almost impractical and inefficient way. I see interesting ideas and sources of knowledge everywhere, and this overwhelms me – so many things to learn, so little time! I purchase far more books than I can read (thanks for that, Paypal & Amazon). I start reading books but do not necessarily finish them. My reading isn’t very linear. I tend to get distracted often and feel the need to switch to something else. I procrastinate more than I would like. At this very moment, I’m trying to read nine books at the same time.

I used to feel bad about all this inefficiency. Until I finished James Bach’s book, a couple of hours ago.

It put things in perspective. It all makes a bit more sense to me now. Apparently it *is* okay and natural to let your mind wander. Allow yourself to be distracted. James calls it the “Follow your Energy”-heuristic: go with the flow of what engages your curiosity. Stick with what is fun and fits the natural rhythms of your mind. But in order to be more in control of your learning, combine it with the “Long Leash”-heuristic. Let your mind drift off, but in a controlled manner – keep it on a long leash. Remind yourself that you are on a mission and gently pull on the leash to regain focus again. 

These are just a couple of examples, but there’s more where that came from. In a way, a lot of the principles or heuristics described in the book reminded me of the young kid trying to work his chaotic way through that wealth of interesting information out there.

James Bach describes his pattern of learning with the “SACKED SCOWS” acronym:

  • Scouting Obsessively (…I discover the sources and tools I will need)
  • Authentic Problems (… engage my mind)
  • Cognitive Savvy (…means working with the rhythms of my mind) 
  • Knowledge attracts Knowledge (…the more I know, the easier I learn) 
  • Experimentation (…makes learning vivid and direct)
  • Disposable Time (…lets me try new things)
  • Stories (…are how I make sense of things)
  • Contrasting Ideas (…lead to better ideas)
  • Other Minds (…exercise my thinking and applaud my exploits)
  • Words and Pictures (…make a home for my thoughts)
  • Systems Thinking (…helps me tame complexity)

According to James Bach, a Buccaneer-Scholar is

“anyone whose love of learning is not muzzled or shackled by any institution or authority; whose mind is driven to wander and find its own place in the world”. 

So, am I a Buccaneer-Scholar? Maybe, I wouldn’t know. I wasn’t a rebel kid at war with the educational system – I actually enjoyed most of my time at school. I am not radically unschooling my kids, as James is doing. I wasn’t a whizz-kid either. I don’t think that’s the point. But I do love to learn new stuff, and preferably in ways that do not really make sense. At least, they didn’t until today.

Thank you, James.

Testing analogitis (a phantom menace)

A story about the phantom of Heilbronn and testing

Blame this post on my wandering mind. I’m suffering from a severe case of analogitis. I’m starting to see testing analogies everywhere.

On April 25, 2007, a 22-year-old female police officer was fatally shot in Heilbronn, Germany. The analysis of the DNA that was found at the crime scene revealed some astonishing information. The DNA belonged to a woman and started popping up in several seemingly unrelated cases, some of them dating back as far as fifteen years. Traces were found:

  • on a cup after the killing of a 62-year-old woman in Idar-Oberstein, Germany
  • on a kitchen drawer after the killing of a 61-year-old man in Freiburg, Germany
  • on a syringe containing heroin near Gerolstein, Germany
  • on the leftovers of a cookie in a trailer that was forcefully opened in Budenheim, Germany
  • on a toy pistol after the robbery of Vietnamese gemstone traders in Arbois, France
  • on a projectile after a fight between two brothers in Worms, Germany
  • on a stone after a burglary in Saarbrücken, Germany
  • after a burglary at an optometrist’s store in Gallneukirchen, Austria
  • after 20 burglaries and thefts of cars and motorbikes in Germany and Austria
  • on a car used to transport the bodies of three Georgians killed in Heppenheim, Germany
  • after a burglary in a disused public swimming pool in Niederstetten, Germany
  • after four cases of home invasion in Quierschied, Tholey and Riol, Germany
  • after an apartment break-in in Oberstenfeld-Gronau, Germany
  • after the robbery of a woman in a club house in Saarhölzbach, Germany
  • in the car of an auxiliary nurse who was found dead near Weinsberg, Germany

The so-called “phantom of Heilbronn” was born. An unknown woman was scattering her DNA all over the place, committing murders, breaking into houses, eating cookies, drinking beer, toting toy guns, she was even shooting up heroin. Profilers could tell that she was quite a busy lady, but she had no identity – the police had absolutely no clue. No Mentalist or CSI Heilbronn coming to the rescue – things were looking bleak.

But in March 2009, the case took a new turn. Investigators discovered the very same DNA sequence on the burned body of a male asylum-seeker in France – an anomaly, since the sequence was of a female. They eventually found out that the phantom serial killer did not actually exist and that the laboratory results were due to contamination of the cotton swabs used for DNA probing. They discovered that the swabs were already contaminated before shipping, and that they all came from the same factory. That factory, in turn, employed several Eastern European women who fit the type the DNA was assumed to match. The cotton swabs were not intended for analytical use, only for medical use – they were sterile, but not certified for human DNA collection.

At this point you are probably wondering what all this has to do with testing. Bear with me a little.

Reading the Phantom of Heilbronn story reminded me of a phenomenon I briefly described in my 2007 paper “Software testing, profession of paradoxes?”. When we investigate something, the outcome of our investigations is sometimes influenced by the observation itself. The very act of testing influences its outcome. This phenomenon is often referred to as the ‘observer effect’.

The idea is that since any form of observation is also an interaction, the act of testing itself can also affect that which is being tested. For example:

  • When log files are used in testing to record progress or events, the application under test may slow down drastically
  • The act of viewing log files while a piece of software is running can cause an I/O error, which may halt the application
  • Observing the performance of a CPU by running both the observed and observing programs on the same machine will lead to inaccurate results because the observer program itself affects the CPU performance
  • Observing (debugging) a running program by modifying its source code (e.g., adding extra output or generating log files) or by running it in a debugger may cause certain bugs to diminish or change their behavior, creating extra difficulty for the person trying to isolate the bug (also known as a ‘Heisenbug’)
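The first and third examples above can be made concrete with a minimal sketch in Python. The function, the loop size, and the numbers are mine, purely for illustration – the point is simply that the “observed” run does extra work (formatting and recording log lines), so measuring it changes what we measure:

```python
import time

def work(log=False):
    """Sum a range; optionally 'observe' each step by recording a log line."""
    total = 0
    logbuf = []
    for i in range(100_000):
        total += i
        if log:
            # The act of observing: formatting and storing a log entry
            logbuf.append(f"step {i}: total={total}")
    return total

# Time the unobserved run
t0 = time.perf_counter()
work(log=False)
plain = time.perf_counter() - t0

# Time the observed run - same computation, plus the logging overhead
t0 = time.perf_counter()
work(log=True)
observed = time.perf_counter() - t0

print(f"without logging: {plain:.4f}s, with logging: {observed:.4f}s")
```

The computed result is identical in both runs, but the measured timing is not – the instrumentation itself has become part of the system under observation.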

Paradoxically, software testing is not always considered the best way toward better quality. Just like in the Phantom of Heilbronn case, the advanced state of the technology might work against us, rather than with us. In 1990, Boris Beizer described the “complexity barrier principle”: software complexity (and therefore that of bugs) grows to the limit of our ability to manage that complexity.

Sometimes, testing is not the best option. Sometimes, testing and fixing problems does not necessarily improve the quality and reliability of the software. Oh sweet paradox, how can I embrace thee? Let me count the ways…