It’s… Thinking Thursday

The Challenge

A couple of minutes ago, Michael Bolton tweeted:

Thinking Thursday. Test this sentence: “In successful agile development teams, every team member takes responsibility for quality.”

My initial reaction was: “A Michael Bolton challenge – where’s the catch?” This is actually a sentence that shows up regularly in agile literature. Heck, I even said it myself a couple of times. What I really wanted to say at the time was probably something along the lines of “In agile development, producing quality software should be a team effort – lots of collaboration and communication. No blaming or finger-pointing at individuals.”

I tweeted some replies, but soon realised that I would hit the 140-character limit head-on.

The Test

But then I thought – why not give these kinds of agile creeds Weinberg’s “Mary had a little lamb” workout, usually reserved for demystifying ambiguous requirements. I used it earlier: stress every word in turn and see where the ambiguities are (a small code sketch of the idea follows the list below).

  • In?
    Does this mean that outside agile development teams, no team members take responsibility?
  • Successful?
    Does this imply that in unsuccessful agile development teams, no one takes responsibility for quality, or that some individuals take the blame? Successful to whom, and compared to what? What is meant by “success”, really? On time, within budget? Satisfied customers? All of these combined?
  • Agile?
    What “Agile” definition are we talking about? Capital A, small a? A mindset, a methodology? And what about successful waterfall teams? Do some individuals take responsibility there? I would like to think that in successful teams, all team members would like a share of the praise. What about those other kinds of development teams out there?
  • Development teams?
    Are we talking about developers only here? What about the tester and product owner role? Or all the other roles that played an important part in developing the product? “In agile teams, testers *are* part of the development team”, you say? I agree, as are the product owners. But in that case, we should think about another label for the team.
  • Every?
    Really? *Every* team member? Can all team members be equally responsible for quality? As Michael Bolton contends, testers do not assure quality. Do testers hire the programmers? Fix problems in the code? Design the product? Set the schedule? Set the product scope? Decide which bugs to fix, write code?
  • Team member?
    What about people who played a part in successfully delivering the product, but who are not considered core team members? Who are the people that make up the team? Is that defined up front? Aren’t those team boundaries pretty dynamic?
  • Takes Responsibility?
    Doesn’t *taking* responsibility sound a bit too negative? Isn’t “responsibility” a double-edged sword? Receiving praise when the quality is applauded, taking the blame when quality turns out to be subpar?
  • Quality?
    Quality, to whom? Qualitative, compared to what? What is quality, anyway?
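As a quick aside, the stressing heuristic itself is trivial to mechanise. Here is a minimal Python sketch (my own illustration, not anything Weinberg or Michael prescribed) that prints the sentence once per word, shouting a different word each time, as a prompt for the kind of questions above:

```python
# A minimal sketch of the "Mary had a little lamb" heuristic:
# print the sentence once per word, stressing a different word each time,
# as a prompt for questioning what each word really claims.

SENTENCE = ("In successful agile development teams, "
            "every team member takes responsibility for quality.")


def stressed_variants(sentence):
    """Yield one copy of the sentence per word, with that word upper-cased."""
    words = sentence.split()
    for i in range(len(words)):
        yield " ".join(w.upper() if j == i else w for j, w in enumerate(words))


for variant in stressed_variants(SENTENCE):
    print(variant)
```

Each shouted variant is an invitation to ask what that particular word is really claiming.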

Is there a problem here?

Well… The sentence under scrutiny sounds comfortably familiar, and in that sense it was a good thing to think it through in a little more detail. It sure leaves a lot open to interpretation. Some of the terms used in it are highly subjective, or their definitions are simply not generally agreed upon.

Back to twitter

Later on, in a response to a tweet from Shrini Kulkarni, Michael said that his purpose was “exploring what bugs me (and others) about it”.

Actually, nothing bugged me about it *before* the exercise, but now it dawned on me that the wording of that good agile practice does not do the practice justice. It is too vague; it does need rephrasing.

How about a Frustrating Friday challenge: make this sentence fresh and ambiguity-free.

You could postpone it to Semantic Saturday, if you wish. Your call.


Rebel rebel – the Danish Alliance @ Eurostar 2010

Something way cool happened at Eurostar this year. A group of like-minded people got together after the conference to do a mini-CONFERence in a more intimate setting. They called themselves the Danish Alliance (or Oprørsalliancen, when they felt like badly pronouncing Danish words). The concept was based on the Rebel Alliance, started by Matt Heusser at StarEast last year. I had been thinking about a localized version of the Alliance before, but it was the ever-energetic Shmuel Gershon who put his efforts into organizing the first Alliance on European soil. Of course, this little guerrilla conference couldn’t have happened without the generous help of the Eurostar folks, who set us up with a superb meeting room. Need I say that they ROCK?

The ingredients were simple: 

  • A handful of passionate testers
  • A safe setting
  • Drinks
  • Pizza
  • Music
  • Chocolates & cookies

Throw all these together and stir gently. Observe.

Whatever happens, happens. There was no agenda, really. In this case we mingled first, talked and drank a bit until the pizzas arrived. Major epiphany: Denmark has pizzas that come in the size of a small wallaby. After that, there were some lightning talks, timed by quality gatekeeper turned timekeeper Michael Bolton (who definitely should get into the timekeeping business whenever he gets out of the QA business). You can see (transcribed!) videos of the talks in Shmuel’s write-up of the event.

‘Talks’ don’t have to be ‘talks’, per se. James Lyndsay issued a call to action to test one of his new black-box testing machines. Andy Glover (the Cartoon Tester) got us drawing abstract concepts. Dorothy Graham even gave us a Sound of Music flashback by singing about her favorite techniques. Anything goes.

Discussions continued until the wee hours. I thought it was wonderful. This is the kind of stuff that doesn’t regularly happen during the day at conferences. Sure, the Eurostar programme was great, again (and I’ll be writing more about that later), but the real conferring often happens outside the track sessions and tutorials. It feels great to connect with other people who are all driven by the same thing: a passion for their craft.

So thank you Shmuel Gershon, Jesper L Ottosen, Joris Meerts, Dorothy Graham, James Lyndsay, Bart Knaack, Martin Jansson, Henrik Andersson, Michael Bolton, Andy Glover, John Stevenson, Rob Lambert, Carsten Feilberg, Ajay Balamurugadas, Markus Gaertner, Henrik Emilsson, Julian Harty, Rob Sabourin, Rikard Edgren, Lynn McKee and Rob Lugton. The force will be with you, always.

The importance of discussion

Feynman on the importance of discussion

While I was on holiday, I immersed myself a bit more in the Feynman universe. And I must say – the combination of simmering French sun, lazy poolside lounging and Feynman’s scientific and philosophical subjects worked surprisingly well. The result was like a tasty cocktail – the kind that gives you a light buzz in the head and leaves you wanting more.

Consuming too much of it would have probably given me a nasty headache too, but that didn’t really happen. The only lasting thing I got out of it was the desire to write some of the stuff down before I forget. So here goes…

In his 1964 lecture called “The Role of Scientific Culture in Modern Society”, Feynman states:

 “I believe that we must attack these things in which we do not believe.”

“Not attack by the method of cutting off the heads of the people, but attack in the sense of discuss. I believe that we should demand that people try in their own minds to obtain for themselves a more consistent picture of their own world; that they not permit themselves the luxury of having their brain cut in four pieces or two pieces even, and on one side they believe this and on the other side they believe that, but never try to compare the two points of view. Because we have learned that, by trying to put the points of view that we have in our head together and comparing one to the other, we make some progress in understanding and in appreciating where we are and what we are. And I believe that science has remained irrelevant because we wait until somebody asks us questions or until we are invited to give a speech on Einstein’s theory to people who don’t understand Newtonian mechanics, but we never are invited to give an attack on faith healing or astrology–on what is the scientific view of astrology today.”

“I think that we must mainly write some articles. Now what would happen? The person who believes in astrology will have to learn some astronomy. The person who believes in faith healing will have to learn some medicine, because of the arguments going back and forth; and some biology. In other words, it will be necessary that science becomes relevant. The remark which I read somewhere, that science is all right so long as it doesn’t attack religion, was the clue that I needed to understand the problem. As long as it doesn’t attack religion it need not be paid attention to and nobody has to learn anything. So it can be cut off from modern society except for its applications, and thus be isolated. And then we have this terrible struggle to explain things to people who have no reason to want to know. But if they want to defend their own points of view, they will have to learn what yours is a little bit. So I suggest, maybe incorrectly and perhaps wrongly, that we are too polite.”

It strikes me how relevant this out-of-context quote still is after almost fifty years.

We cannot overestimate the importance of a critical mindset. Testers may need that even more than anybody else. Sometimes we just need to attack common beliefs that have become axioms in a way. I think it was Mark Twain who once said “It ain’t what you don’t know that gets you into trouble. It’s what you know for sure that just ain’t so.” 

So, we need more discussions in our line of work – they’re a surefire way to advance the testing craft. True, there are plenty of discussions and controversies within testing already – the different schools of testing come to mind. But what I sometimes feel is lacking is a desire to understand where the “other side” is coming from. Why are they thinking the way they think? What are their beliefs and motives? Can we prove their beliefs to be false?

I think I’ll make this my personal mantra:

  • Attack, but don’t attack what you don’t understand
  • Be credible
  • Be reasonable

Metrics – perverse incentives?

Trivia time! What do the following events have in common?

  • In the American Southwest in the 1850s there was a high reward for the scalps of members of violent and dangerous Indian tribes. This led scalp hunters to slaughter thousands of peaceful agricultural Indians and Mexican citizens, women and children alike, for their valuable scalps.
  • In Vietnam, under French colonial rule, there was a program paying people for each rat pelt handed in. It was originally intended to exterminate rats, but it led to the farming of rats instead.
  • In the 19th century, palaeontologists traveling to China used to pay peasants for each fragment of dinosaur bone that they produced. The measure was an instant success! It took them a while to discover that peasants dug up the bones and then smashed them into multiple pieces to maximise their earnings.

All these are examples of perverse incentives: measures that have unintended and undesirable effects which go against the interests of the incentive makers. They become counterproductive in the end.

I’m probably suffering from an acute case of testing analogitis again, because over the years I have seen these things happen in testing as well:

  • Managers evaluating testers by the number of bugs found.
    This resulted in the submission of tons of trivial and low-priority bugs. People who used to thoroughly investigate bugs and put a lot of time into pinpointing started lowering their standards (a toy sketch of this effect follows below).
  • Managers evaluating testers by the number of test scripts executed.
    This resulted in testers only focusing on scripts, not allowing themselves to go off-script and investigate. This often meant going against their intuition for suspicious “smells” in the software, and it certainly did not encourage the use of exploratory testing.
  • Managers evaluating testers by the number of “rejected” bugs.
    The rationale behind this was: fewer rejections mean more valid bugs, better bug descriptions and better-researched bugs. But the result of the metric was that testers were reluctant to enter complex, difficult or intermittent bugs out of fear of them being rejected. But these are the bugs we want the team to tackle, right?
  • Managers evaluating testers by the quality of the software.
    First of all, what is quality? If we use Jerry Weinberg’s definition, “value to someone (who matters)”, it becomes clear that any manager’s assessment of quality is highly subjective. If the rewards for testers depend on the quality of the software, that is highly unfair. We are no gatekeepers of quality; we cannot assure quality, because we do not control all aspects of it. The only thing such an incentive achieves is a highly regulated cover-your-ass culture with formal hand-offs, and certainly not team collaboration, continuous improvement or better software. 

These are all examples of metrics used as incentives for testers, but in most cases they just ended up creating a blame culture where quantity and pathetic compliance are valued above quality and creativity.
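To make the first of those metrics a bit more concrete, here is a tiny, purely hypothetical Python sketch (toy numbers I made up for illustration, not real data) of how scoring testers on bug count alone rewards exactly the wrong behaviour:

```python
# Toy model (hypothetical numbers) of the "reward bugs found" incentive.
# Each bug report has some rough value to the team; the metric only counts reports.

from dataclasses import dataclass


@dataclass
class BugReport:
    title: str
    value_to_team: int  # rough, subjective usefulness of the report


# A tester chasing the metric: many shallow, low-value reports.
metric_chaser = [BugReport(f"typo #{i}", value_to_team=1) for i in range(20)]

# A tester chasing value: a few deep, well-investigated reports.
value_chaser = [
    BugReport("data loss on concurrent save", value_to_team=50),
    BugReport("intermittent crash on login", value_to_team=40),
    BugReport("security: session not invalidated", value_to_team=60),
]

for name, reports in [("metric chaser", metric_chaser),
                      ("value chaser", value_chaser)]:
    print(f"{name}: bugs found = {len(reports)}, "
          f"value delivered = {sum(r.value_to_team for r in reports)}")

# The bug-count metric ranks the metric chaser first (20 reports vs 3),
# even though the value chaser delivered far more to the team.
```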

Dear managers, I’d say: focus on collaboration and team achievements, and set goals for the team. Make the whole team responsible for quality and for the product. Then see what happens.

A Eurostar interview


A while ago, there was this little announcement on the Eurostar blog:

“As a new addition to the EuroSTAR community, we will be interviewing prominent testers from across the globe”

I thought that was pretty cool. There is lots to learn from experienced people. It’s nice to hear all these different takes on the software testing craft. They have already published interviews with Isabel Evans, Mats Grindal, Tim Koomen, Michael Bolton, Martin Pol and Anne Mette Hass. Interesting stuff.

Several months later, I received an email from Kevin Byrne from the Qualtech/Eurostar team asking if I would be interested in doing an interview with them on testing (and other things as well). It took me a while to properly connect the term “prominent tester” with my own name. But I was honoured of course, so I accepted their offer.

And there it is. They even call me a ‘prominent Belgian tester’ in the introduction, which made me smile because it reminded me of the phrase “being big in Belgium” – often used interchangeably with being “big in Japan”, meaning something like “totally unimportant”.

In the 1992 movie Singles, Matt Dillon plays in a band that claims to be “big in Belgium” – subtext: “what a bunch of forgettable losers”. Similarly, the legendary rock group Spinal Tap (the 1984 mockumentary This is Spinal Tap is hilarious, by the way) ended up being big in Japan, which basically meant “pathetically uncool and ridiculed at home”.

But I digress. I might not be all too prominent, but I am a Belgian tester all right. Here’s the interview:

http://www.eurostarconferences.com/blog/2010/5/18/an-interview-with-zeger-van-hese.aspx

Failure is always an option – part 2 (wartime failures)

Wartime failures

In my search for information on failed software development projects, I was frequently reminded of the fact that it’s not only software projects that fail. In many cases, I even wondered why these projects were started in the first place. Some of them seem to come straight from a Monty Python movie – downright absurd. Needless to say, their eventual cost far outweighed the benefits, if any.

I discovered* that wartime was a true breeding ground for many beautiful and poetic failures. Anything goes when there’s an enemy waiting to be crushed in the most creative ways possible:

  •  The Acoustic Kitty project:
    A CIA project in the 1960s attempting to use cats in spy missions. A battery and a microphone were implanted into a cat and an antenna into its tail. Due to problems with distraction, the cat’s sense of hunger had to be addressed in another operation. Surgical and training expenses are thought to have amounted to over $20 million. The cat’s first mission was eavesdropping on two men in a park. The cat was released nearby, but was hit and killed by a taxi almost immediately. Shortly thereafter the project was considered a failure and declared to be a total loss.
  • Operation Cornflakes:
    A World War II mission in 1944 and 1945 which involved tricking the German postal service Deutsche Reichspost into inadvertently delivering anti-Nazi propaganda to German citizens through the mail. The operation involved special planes that were instructed to airdrop bags of false but properly addressed mail in the vicinity of bombed mail trains. When recovering the mail during clean-up of the wreck, the postal service would hopefully mistake the false mail for the real thing and deliver it to the various addresses. The content was mainly anti-Nazi propaganda. In addition, the postage stamps used were subtly designed to resemble the standard stamp with Adolf Hitler’s face, but a close examination would reveal that his face was made to look like an exposed skull or similarly unflattering imagery. The first mission of Operation Cornflakes took place in February 1945, when a mail train to Linz was bombed. Bags containing a total of about 3,800 propaganda letters were then dropped at the site of the wreck, which were subsequently picked up and delivered to Germans by the postal service. Not too sure how many German families were converted by these letters.
  • The Bat Bomb project:
    Bat bombs were bomb-shaped casings with numerous compartments, each containing a Mexican bat with a small timed incendiary bomb attached. Dropped from a bomber at dawn, the casings would deploy a parachute in mid-flight and open to release the bats, which would then roost in eaves and attics. The incendiaries would start fires in inaccessible places in the largely wood-and-paper construction of the Japanese cities that were the weapon’s intended target. Eventually, the program was cancelled when it became clear that it wouldn’t be combat-ready until mid-1945. By that time it was estimated that $2 million had been spent on the project. It is thought that development of the bat bomb was moving too slowly, and that it was overtaken in the race for a quick end to the war by the atomic bomb project.
  • Project Pigeon:
    During World War II, Project Pigeon was B. F. Skinner’s attempt to develop a pigeon-guided missile. The control system involved a lens at the front of the missile projecting an image of the target onto a screen inside, while a pigeon trained to recognize the target pecked at it. As long as the pecks remained in the center of the screen, the missile would fly straight, but pecks off-center would cause the screen to tilt, which would then, via a connection to the missile’s flight controls, cause the missile to change course. Although skeptical of the idea, the National Defense Research Committee nevertheless contributed $25,000 to the research. Skinner’s plan to use pigeons in Pelican missiles was considered too eccentric and impractical; although he had some success with the training, he could not get his idea taken seriously. The program was canceled on October 8, 1944, because the military believed that “further prosecution of this project would seriously delay others which in the minds of the Division have more immediate promise of combat application.”

It’s probably no coincidence that the majority of these projects involved animals. In that case, failure is certainly an option – I heard that working with animals is highly unpredictable, hard to manage and time-consuming.

Strange, isn’t that what they say about software development too?

*Source: Wikipedia

Failure is always an option – part 1 (chaos)

About the Chaos report

One of the most popular reports people use to showcase the failure of software development is the CHAOS report from The Standish Group. The Standish Group collects information on project failures in the software development industry in an attempt to assess the state of the industry.

In 1994, they reported a shocking 16 percent project success rate; another 53 percent of the projects were challenged (not on time, over budget and with fewer functions than originally specified), and 31 percent failed outright. Although the newer reports show better numbers, the overall results still paint a dire picture:

             1994  1996  1998  2000  2002  2004  2006  2009
Successful    16%   27%   26%   28%   34%   29%   35%   32%
Challenged    53%   33%   46%   49%   51%   53%   46%   44%
Failed        31%   40%   28%   23%   15%   18%   19%   24%

There aren’t a whole lot of other statistics out there on this topic, so obviously these numbers get big play. Guilty as charged, your honor. I have used them myself, in a presentation or two.

I won’t be doing that again.

I realized that I have some serious problems with these metrics. They measure a project’s success solely by looking at whether the project was completed on time, on budget and with the required features and functions. But what they do not take into account are things like quality, risk and customer satisfaction. Could it be that an extremely unstable, unusable and frustrating piece of software that was delivered on time and on budget qualifies as a success? I beg to differ.

The Standish Group’s methods are not fully disclosed, and the bits that are disclosed are apparently deeply flawed. Their figures are misleading, one-sided and meaningless – the results are completely unreliable. They present their figures as absolute facts, but without any clear context. The most famous sceptics of the report are Jørgensen and Moløkken. They emphasize its unreliability and question the claim of a “software crisis”:

“Even the definition of challenged projects is not easy to interpret. It is defined as “The project is completed and operational but over budget, over the time estimated, and offers fewer features and functions than originally specified.” The problem here is the use of “and” instead of “or”, combined with the following definition of successful projects: “The project is completed on-time and on-budget, with all features and functions as initially specified.” Consider a project that is on-time and on-budget, but not with all specified functionality. Is this project to be categorized as challenged or successful? Our guess is that it would be categorized as challenged, but this is not consistent with the provided definition of challenged projects.”
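Jørgensen’s point becomes obvious when you try to turn those two quoted definitions into code. Here is a minimal Python sketch (my own illustration, using the literal wording above) showing that a project which is on time and on budget but lacks some features fits neither category:

```python
# Literal translation of the Standish definitions quoted above.
# "successful": on time AND on budget AND all features delivered.
# "challenged": completed, but over budget AND over time AND fewer features.

def classify(on_time: bool, on_budget: bool, all_features: bool) -> str:
    if on_time and on_budget and all_features:
        return "successful"
    if (not on_time) and (not on_budget) and (not all_features):
        return "challenged"
    return "???"  # the definitions simply don't cover this case


# A project that is on time and on budget, but missing some features:
print(classify(on_time=True, on_budget=True, all_features=False))  # -> "???"
```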

In the comments section of an interview with The Standish Group’s Jim Johnson, Jørgensen brought up his critique of the CHAOS report and asked Johnson two very fair questions. Johnson’s reply is pretty enlightening, to say the least. Here are a few excerpts:

…We are an advisory research firm much like a Gartner or Forrester. Neither they nor we can afford to give our opinions away for free. We have facilities, utilities, and personnel and we must, the same as you, be able to pay our bills. Just because someone asks a question, does not mean we will respond with an answer. In fact, we most likely will not…

…Our current standard answer to a CHAOS inquiry is, first: please purchase our new book, “My Life is Failure”, in our online store. If that does not satisfy you, then you need to join CHAOS University. If you do not find your answer or answers there then you need to purchase our inquiry services. Then we will work to answer your questions…

…It is strange that Jørgensen has never applied or professed interest in joining us. Some answers can be found if you join us at CHAOS University 2007 or one of the many outreach events. So you can contribute to the CHAOS research by providing funding or sweat, but short of that you will and must be ignored by design…

Don’t get me wrong. I think there *are* lots of failing software development projects, but in other numbers and for other reasons than the ones Standish brings forth: deliveries that do not bring any value to their users, software that was poorly tested or poorly designed, resulting in failures in production.

The problem I have with the CHAOS report is that it is presented as some kind of “industry standard”, projecting a false image of the dire state of the software industry, based on poor metrics. And I certainly don’t believe in the “quality is dead” mantra that resonates from their reports. Sure, there’s plenty of chaos out there, but I like what Henry Miller said about that: “Chaos is the score upon which reality is written”.

I’m with Henry on this one.