Agile Testing Days 2010 – Day 1 (Agile transitions)


After a great experience at the Agile Testing Days last year, I decided to answer their call for papers early. By the time the full program was announced (somewhere in April), I had almost forgotten that I had participated. So it was a pleasant surprise to see my name listed among all those great speakers. I decided to break out of my comfort zone for once and at the last minute I “prezi-fied” my existing presentation. Confidently stressed, I flew east to Berlin to be part of what proved to be a wonderfully memorable conference.

October 3

It was Sunday, October 3, which meant I arrived on the 20th anniversary of German reunification. The last time I had been in the city centre, Berlin was still a divided city. I was 16, and overwhelmed by the contrast between the neon-lit Ku’damm and the clean but spookily deserted East. Going through Checkpoint Charlie to the East – and happily back again, while others desperately wanted to but couldn’t – still ranks among the most awkward moments in my otherwise pretty uneventful youth. Sure, Alexanderplatz, the Ishtar Gate and the Pergamon Museum impressed me, but why a country would deliberately lock up its people was totally beyond my 16-year-old self.

So, with a few hours of daylight left, I headed to some sites that I still remembered from the days of yore. The Brandenburger Tor was now the backdrop for big festivities: music, beer, bratwurst and parachute commandos executing a perfect landing at Helmut Kohl’s feet at the Reichstag. No concrete walls to be seen. Unter den Linden completely opened up again. It felt great. Sometimes nostalgia isn’t what it used to be.

October 4


The morning of tutorial day, the Seminaris Hotel conference lobby was buzzing with coffee machines and activity. I had enrolled for Elisabeth Hendrickson’s “Agile transitions” tutorial, which turned out to be an excellent choice. Eight people took part in the WordCount experiment, of which Elisabeth recounts an earlier run here. After a round of introductions, we divided the roles within the WordCount company: tester, developer, product manager, interoffice mail courier (snail mail only), computer (yes, computers have feelings too) or observer. Strangely enough, I felt this natural urge to be a tester. I didn’t resist it – why should I? Elisabeth then proceeded to explain the rules. We would play a first round in which we had to stick to a set of fixed work agreements, like working in silos, formal handoffs and communicating only through the interoffice mail courier. The goal of the game was basically to make our customer happy by delivering features and thus earning money in the process.

We didn’t make our customer happy, that first round. On the contrary – confusion, chaos and frustration ensued. Testers belting out test cases, feeding them to the computer, getting back ambiguous results. Developers stressed out, struggling to understand the legacy code. Our product manager became hysterical because the customer kept harassing him for a demo and no-one was responding to his messages. The mail courier was bored, our computer felt pretty abandoned too. It all felt wonderfully unagile.

In round 2 we were allowed to change our work agreements any way we wanted, which sounded like music to our agile ears! We co-located immediately and fired our mail courier. We organised a big kickoff meeting in which the customer would explain the requirements and walk us through the application. We already visualised the money flowing in. In theory, theory and practice are the same. In practice – not so much. We spent a whole round discussing how we would work. We lost track of time. There were no new features, and no money. We felt pretty silly.

Round 3 was slightly better. We were able to fix some serious bugs and our first new features were developed, tested and working. But just when we thought we were on a roll, our customer coughed up some examples that she really wanted to pass too. They didn’t. 

Pressure was on in round 4, which was going to be the last one of the day. Would we make history by not delivering at all? Well, no. We actually reinvented ATDD, by letting the customer’s examples drive our development. This resulted in accepted features, and some money to go with that. We managed to develop, test and demo some additional functionality too. A not-so-epic win, but a win nonetheless. WordCount was still in business. If there had been a round 5, I’m pretty sure WordCount Inc. would have made a glorious entrance on the Nasdaq stock exchange.

Elisabeth did a great job facilitating the discussions in between rounds and playing a pretty realistic customer. All the participants made for a very enjoyable day too. The day really flew by and ended with a great speakers’ dinner on the shores of the Schlachtensee. A Canadian, an American, a German and a Belgian decided to walk back to the hotel instead of taking the bus. It sounds like the beginning of a bad joke, but that refreshing 5 km walk through the green suburbs was actually the perfect closure of a terrific day. And without a map, I might add. As the rapid Canadian pointed out later: documentation is overrated.

What a picture can tell you – an exercise

Shortly after I posted the pictorial challenge on my blog, I had a conversation with Thomas Ponnet on Twitter:

ThomasPonnet: I could srt deconstructing but 4 wht reason? So far thr’s no context so therefore thr cnt B a story IMO. Interesting though

TestSideStory: I think there are clues in the pic that might give some context away.It’s just an exercise in seeing/interpreting signs imo

ThomasPonnet: I’m difficult on purpose 😉 w/out context, no, w/out oracle I can’t infer, I can only guess, do testers do that? Yes,for fun

TestSideStory: Testers do guess. They call that making an hypothesis 🙂 Then they see where they get from there.

About the hypothesis thing: I meant that when I said it. We make guesses all the time. We make hypotheses, we assume some things and we act accordingly. We perform experiments to see whether we can confirm our hypotheses. If not, time to re-model. 

We construct models in our mind, but are these models ever correct? Even if they prove to be incorrect, having a model is often more useful than having no model at all. Remember the old adage: “when you’re lost, any old map will do”.

Back to the challenge. Quite a lot of people visited, but no-one actually rose to the challenge, which leads me to assume that people either:

  • were confused
  • were not interested
  • didn’t see the point
  • didn’t have the time
  • couldn’t care less

After this sobering insight I decided to eat my own dog food and have a go at it myself. Click on the thumbnail for a larger resolution picture.

Can we derive any context from this picture?

On the denotation side: it’s just a little boy sitting on top of a house. We only see the upper floor. On the balcony, there’s a little blue bike, a blue baby bath, a blue screen door to keep the mosquitoes out, a birdcage with two birds in it, a pot with a plant and some broken but repaired windows. On top, we see two old publicity signs. One says “Zanchetti”, the other is only partially visible and reads “La Mejor Ropa de Tr…”

Any connotations? Situational context? La mejor ropa… Ropa means clothing or clothes in Spanish. So the publicity signs seem to point us to a Spanish-speaking country. Somewhere in Mexico, maybe? Or Spain? Latin America? South America?

Let’s google the two terms on the signs.

Zanchetti mejor ropa de

Mmm… primarily hits from Argentina, some from Chile and Colombia too. Maybe we should narrow down the search. The two signs seem to belong together, so Zanchetti is probably a clothing factory. Let’s try another search and see what happens.

Zanchetti ropa

The first result from this search leads us to an “indumentaria online” (clothing online) site (thanks, Google translate!), which basically seems to be a collection of stores that sell working clothes. So we can also complete the publicity signs by now: “Zanchetti, la mejor ropa de trabajo”. The last store in the list rings a bell:

ZANCHETTI HNOS.
Vieytes 1876 (1275) Bs.As.

The Zanchetti brothers are in Argentina, all right. Buenos Aires, to be exact. Enter Google Maps, that trusted friend of the geographically challenged.

This is the result.

Note that the address shown isn’t actually Vieytes 1876, which is a smaller street in another part of the city.

Of course, we can’t just assume that Vieytes 1876 is the address where the picture was taken. Publicity signs are typically constructed on tall buildings in commercial areas, not too far from the neighborhood of the business itself.

The building looks old, and the publicity signs are weathered, seemingly decades old. Another clothing website says that the Zanchetti brothers started their business in 1962. It sure looks like the signs stem from that era. The fact that they once decided to place the signs here indicates that this probably used to be a big commercial or industrial site. The faded signs also suggest that this area is no longer mainstream, and deteriorating. The building looks like a residential building now, so its function may have changed over time. Could it be that this was once a thriving part of the city, but that the city has since evolved elsewhere, leaving this area to deteriorate?

In spite of the building, the boy on top of the roof doesn’t look poor. He is meticulously dressed in what looks like a sports outfit. This may indicate that his parents are not too wealthy, but that they take pride in giving their children the best life possible (or it indicates that the boy is a small-time drug dealer successfully supporting himself – but I give him the benefit of the doubt). He looks comfortable up there, as if this is his usual hide-out/vantage point. He’s ignoring the rooftop view, probably because he’s pretty familiar with the surroundings.

The blue bicycle stowed away on the balcony is likely the boy’s, and the fact that it’s up there and not downstairs ready for use may be an indication that the boy and his family only live in (or own) the upper floor. This probably means that there’s not too much room in the apartment. So maybe the rooftop is where he goes to have some time for himself. He sure looks a bit lonely up there.

But there’s other interesting stuff on the balcony.

The blue baby bath. It indicates that there is at least one other (younger) child in the family. We can’t say for sure if it’s still a baby. After all, the bath could be an old one, waiting to be discarded. 

The blue screen doors with what appear to be children’s stickers on the inside. The bird cage with what look like parakeets (there is a towel on top of the cage, which indicates that they are covered up at night and spend the night outside too). The broken and poorly mended glass in the doors. All these things imply a rather poor but caring and happy family.

Anything I missed?

A pictorial challenge: Deconstruction / Denotation / Connotation

I’ve been thinking about deconstruction (a term first introduced by the French philosopher Jacques Derrida) and its applicability to software testing lately. According to Wikipedia, this is

an approach used in literary analysis, philosophy or other fields to discover the meaning of something to the point of exposing the supposed contradictions and internal oppositions upon which it is founded – showing that those foundations are irreducibly complex, unstable, or impossible

Deconstruction is also an essential part of semiotics and media studies, where it is used to pick images apart through the use of fine detail. We are surrounded by images in everyday life, and we think we are able to read and understand everything we see, taking our visual literacy and cognitive abilities for granted. Deconstructing can help us understand our decoding processes and how we derive meaning from all that surrounds us.

Within deconstruction, we have denotations and connotations:

  • A ‘denotation’ is the first level of signification, a literal meaning on a basic, dictionary level. This is the ‘obvious’ or ‘commonsense’ meaning of something for someone: this thing is pink, it is a bicycle.
  • The term ‘connotation’ refers to the second level of signification: ‘personal’ associations (ideological, emotional), typically related to the interpreter’s class, age, gender, ethnicity and so on. Connotations are numerous, and vary from reader to reader. The above-mentioned bicycle is rather small, so it probably belongs to a teenager. It is pink, so perhaps it is a girl’s. But it is flashy and eye-catching too, and might therefore connote that its owner is an extrovert. If you once fell off a bicycle, you may even associate this bicycle with negativity and pain.

I think a large part of what we do as testers is deconstructing, in a way. We try to make sense of something by uncovering meaning (intended or unintended). We aim to derive meaning from different angles. We deconstruct by applying factoring (Michael Bolton defined this as “listing all the dimensions that may be relevant to testing it”) to objects, images and software – it can be useful to list as many different hypotheses as possible.

So, what about *your* deconstructing skills?

Since testers do seem to like challenges – here’s one for you all to enjoy.

Click on the thumbnail for a larger picture.

What can you find out about this picture?

What does it tell you?

What story does it tell?

Can you derive context?

What are you assuming? Why?

Which heuristics did you use?

I’m not revealing the copyright details just yet – no spoilers. Additional info will be added later. Enjoy!

Children’s own pass/fail criteria (and nursery rhymes)

One month ago, my oldest daughter (6) took up rope skipping. The last time I had seen her practising, two weeks ago, she was still having trouble getting the rope neatly over (and under) herself, but yesterday she was able to complete several jumps in one fluent go. It was the first time I had seen her do that, so I was pretty impressed.

She was clearly in learning mode. I sat down to observe her more closely. 

– “Wow, where did you learn all that?”

– “I’ve been watching older girls do that in school, daddy. Watch”.

She started jumping and counting out loud.

– “One, two, three, four, five, six, …”

She tripped on the rope.

– “Woohoo! Six!”

– “You go, girl!”

– “Again! One, two, three, four, five, nooooo…”

– “Five is good”.

– “No, daddy, five is not good. Again!”

She repeated the process a couple of times. She jumped seven (“Yes!”), four (“Nooo!”), five (“Pfff!”), six (“Yippie!”). I started noticing a pattern. It struck me that she alternated between frustration and joy, depending on the number of jumps. Time for some questioning.

– “Why are you happy with anything above or equal to six, but unhappy with anything lower?”

– “It has to be at least six, daddy”.

– “Why six?”

She seemed really annoyed that I didn’t see her point. She thought I was pulling her leg.

– “Because I’m _six_ years old, daddy. Didn’t you know? What else could it be?”.

I was totally flabbergasted. She had managed to impose some totally arbitrary pass/fail criteria on herself. Where did that come from? I thought that using pass/fail tests actually sabotages kids’ natural learning processes? But this appeared to come from within herself. No-one told her that she had to make at least six.

I wondered – maybe she just chose her age as a starting point, just to set some initial learning goals for herself? Was she planning on raising the bar later on, when reaching six had become too easy? Unfortunately I didn’t have the chance to follow up on that – lunchtime!

Flash forward to work. All this reminded me of commonly defined pass/fail criteria such as

“90% of all tests must pass”

Really? 

In “Are your lights on?”, Jerry Weinberg uses the well-known “Mary had a little lamb” nursery rhyme to show how a seemingly straightforward statement is prey to multiple interpretations, depending on which word you emphasize. An invaluable heuristic when looking at requirements. Why not try that on the familiar pass/fail criterion stated above?

“90%”? What if the tests that would have revealed some serious errors happen to be in that 10% you so confidently dismissed? Why not 89 or 91?

“All”? Do you know “all” possible tests that can be performed? Are they all documented? Some of them might still be residing in your head. What if, in the meantime, we performed some more important tests that revealed serious risks? Are those tests part of “all”?

“Tests”? Do you only count scripted tests, or do you also take exploratory ones into account? What about important usability issues some users might have found? Or acceptance test checklists? Or automated checks? 

“Must”? What if the 90% isn’t reached? Does this mean your solution is without value? The customer might value other things than you do. Is it up to you to decide how much value is in there?

“Pass”? What about behavior that is totally acceptable to your client, but that we find annoying? Pass or fail? What about tests that pass all their steps, but reveal important problems as a side effect? Sometimes a test’s pass/fail verdict is not binary.
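To make the arithmetic behind that 90% concrete, here is a minimal sketch in Python – the test names and the “critical” flag are invented purely for illustration – showing how the raw percentage can be met while every single failure points at a serious risk:

    from dataclasses import dataclass

    @dataclass
    class TestResult:
        name: str
        passed: bool
        critical: bool  # would a failure here reveal a serious risk?

    # 90 trivial checks that pass, 10 critical ones that fail.
    results = [TestResult(f"minor_check_{i}", True, False) for i in range(90)]
    results += [TestResult(f"critical_check_{i}", False, True) for i in range(10)]

    pass_rate = sum(r.passed for r in results) / len(results)
    critical_failures = [r.name for r in results if r.critical and not r.passed]

    print(f"pass rate: {pass_rate:.0%}")                   # 90% – the criterion is "met"
    print(f"critical failures: {len(critical_failures)}")  # ...and yet 10 serious problems remain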

My daughter went to school this morning and – for the first time – took her own jump rope with her. I wonder what percentage of her rope-jump cases will pass this time.

Delivering the message

We are testers. We communicate, collaborate and report. We are the headlights of a project. We light the way, break the news, deliver messages – and they’re not always pretty. How do we make sure our point is taken? It all boils down to adapting our style of delivery depending on the content and the audience. Context is indeed everything.

The delivery of a speech is mostly done through the nonverbal channel (whereas the content itself is verbal). This includes all speech elements other than the words themselves: eye contact, voice, articulation, gestures, facial expressions, body language – even appearance. Using all of these effectively requires timing and practice, lots of practice. An additional problem is that there is a wealth of delivery styles to choose from. How do we get the message across in the best possible way?

A really smart person once said: “Smart people learn from their mistakes, but REALLY smart people learn from other people’s mistakes.”

I think that’s true. In the video below, councillor Phil Davison – a candidate for the Republican nomination for treasurer of Stark County, Ohio – delivers a passionate nomination speech in front of a receptive audience. His delivery style is – well… – pretty peculiar.

Spoiler: he wasn’t elected.

The importance of discussion

Feynman on the importance of discussion

While I was on holiday, I immersed myself a bit more in the Feynman universe. And I must say – the combination of simmering French sun, lazy poolside lounging and Feynman’s scientific and philosophical subjects worked surprisingly well. The result was like a tasty cocktail – the kind that gives you a light buzz in the head and leaves you wanting more.

Consuming too much of it would have probably given me a nasty headache too, but that didn’t really happen. The only lasting thing I got out of it was the desire to write some of the stuff down before I forget. So here goes…

In his 1964 lecture called “The Role of Scientific Culture in Modern Society”, Feynman states:

 “I believe that we must attack these things in which we do not believe.”

“Not attack by the method of cutting off the heads of the people, but attack in the sense of discuss. I believe that we should demand that people try in their own minds to obtain for themselves a more consistent picture of their own world; that they not permit themselves the luxury of having their brain cut in four pieces or two pieces even, and on one side they believe this and on the other side they believe that, but never try to compare the two points of view. Because we have learned that, by trying to put the points of view that we have in our head together and comparing one to the other, we make some progress in understanding and in appreciating where we are and what we are. And I believe that science has remained irrelevant because we wait until somebody asks us questions or until we are invited to give a speech on Einstein’s theory to people who don’t understand Newtonian mechanics, but we never are invited to give an attack on faith healing or astrology–on what is the scientific view of astrology today.”

“I think that we must mainly write some articles. Now what would happen? The person who believes in astrology will have to learn some astronomy. The person who believes in faith healing will have to learn some medicine, because of the arguments going back and forth; and some biology. In other words, it will be necessary that science becomes relevant. The remark which I read somewhere, that science is all right so long as it doesn’t attack religion, was the clue that I needed to understand the problem. As long as it doesn’t attack religion it need not be paid attention to and nobody has to learn anything. So it can be cut off from modern society except for its applications, and thus be isolated. And then we have this terrible struggle to explain things to people who have no reason to want to know. But if they want to defend their own points of view, they will have to learn what yours is a little bit. So I suggest, maybe incorrectly and perhaps wrongly, that we are too polite.”

It strikes me how relevant this out-of-context quote still is after almost fifty years.

We cannot overestimate the importance of a critical mindset. Testers may need that even more than anybody else. Sometimes we just need to attack common beliefs that have become axioms in a way. I think it was Mark Twain who once said “It ain’t what you don’t know that gets you into trouble. It’s what you know for sure that just ain’t so.” 

So, we need more discussions in our line of work – they’re a surefire way to advance the testing craft. True, there are plenty of discussions and controversies within testing already – the different schools of testing come to mind. But what I sometimes feel is lacking is a desire to understand where the “other side” is coming from. Why are they thinking the way they think? What are their beliefs and motives? Can we prove their beliefs to be false?

I think I’ll make this my personal mantra:

  • Attack, but don’t attack what you don’t understand
  • Be credible
  • Be reasonable

Feynman on naming

How Feynman’s take on naming things is applicable to testing

Feynman’s father Melville played a big role in shaping little Richard’s way of thinking. He used to read him bedtime stories from the Encyclopedia Britannica.

“See this Tyrannosaurus Rex over here? It says here that this thing was 25 feet high, and the head was six feet across. That means that if it stood in our front yard, it would be high enough to put his head through the window, but not quite because the head is a little bit too wide, it would break the window as it came by”.

He always tried to translate things into some kind of reality, so little Richard would be able to figure out what it really meant, what it was really saying.

Melville would also take his kid for long walks in the Catskill mountains, telling him about nature and explaining that in order to really *know* something, you should start observing and noticing instead of merely naming (a thing most of his classmates seemed to do):

“You can know the name of a bird in all the languages of the world, but when you’re finished, you’ll know absolutely nothing whatever about the bird… So let’s look at the bird and see what it’s doing — that’s what counts. I learned very early the difference between knowing the name of something and knowing something.”

From “The pleasure of finding things out” (1981)

I think the above quote illustrates a phenomenon that occurs all too often in software testing: the nominal fallacy. Basically, this means applying a label or name to something and thinking you have explained it.

What about boundary value testing (or domain testing), for instance?

“First, we identify the boundaries, then we identify tests at each boundary. For example, one test each for >, =, <, using the first value in the > range, the value that is equal to the boundary, and the first value in the < range”.

A pretty straightforward and effective technique, right? We think we master it, until we realise that most textbooks only talk about known and visible boundaries. What about the boundaries that are not known, not even to the developers who wrote the code? Most of the time, the software’s complexity stretches far beyond our imagination.
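As a minimal sketch of that textbook recipe – the discount rule, function and numbers below are hypothetical, purely for illustration – the three checks around one known boundary might look like this:

    # Hypothetical function under test: orders of 100 items or more get a 10% discount.
    def discount(quantity: int) -> float:
        return 0.10 if quantity >= 100 else 0.0

    # One check on each side of the known boundary, plus the boundary itself.
    assert discount(99) == 0.0     # first value below the boundary
    assert discount(100) == 0.10   # the boundary value itself
    assert discount(101) == 0.10   # first value above the boundary

    # Note what the recipe silently assumes: that we know where the boundaries are.
    # It says nothing about limits hidden inside the implementation – overflow,
    # truncation, undocumented maximums – which is where the real surprises live.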

In case you need convincing: let’s revisit the ParkCalc exercise that took place a couple of months ago (here’s a good write-up of that event by Selena Delesie). Matt Heusser put out a challenge to test ParkCalc, a small app on the Grand Rapids Airport website that calculates parking fees. The site provided entry and leaving times, date pickers and five different kinds of parking to choose from. Quickly, an instant test flashmob formed via Twitter. Testers from all over the globe jumped on the bandwagon. James Bach joined in and challenged everyone to beat his highest total parking fee. Extreme testing ensued. What followed was a good exercise in factoring, investigation and on-the-fly test design. And it happened to illustrate the complexity of boundary value analysis as well.

To get an even better idea of this complexity, there’s always Cem Kaner’s online course on domain testing. Do we know boundary value analysis because we know its most basic definition?

I’m not trying to open Pandora’s box here, but these nominal fallacies also apply to a testing certification that mainly focuses on definitions. Naming things isn’t enough. As Feynman put it: knowing something requires practice and observation. The benefit? No need to memorize anything anymore when real understanding kicks in. 

Of course, all this isn’t new. Half of the star-cross’d lovers were already aware of this, way back in the late sixteenth century:

“What’s in a name? That which we call a rose
By any other name would smell as sweet.”

(Juliet Capulet)

Exploring Feynman

On my intention to start exploring Richard Feynman (1918-1988)

The Plan

I’m planning to do a little blog series on the late Richard Feynman, to record some of my impressions and learnings while I work my way through this intriguing oeuvre of his. No, the summer heat is not getting to me, yet. I’m not exactly planning on processing his massive back catalog – I’m not really into path integral formulation or the behavior of subatomic particles, let alone the superfluidity of supercooled liquid helium. I do value the scarce free time that I have – time is on my side, yes it is. Rather, I’d like to document my exploration of his more popular works, audio and video recordings.

Exploratory learning, if you wish. Dipping into it all, savoring the juicy bits and spitting out the others. And relating things to testing, of course.

Why Richard Feynman?

Feynman intrigues me, and I have nothing but deep respect and admiration for the man. He was witty, brilliant and had this perpetual curiosity to discover new things (Tuvan throat-singing, anyone?). He opposed rote learning or unthinking memorization as teaching methods – he wanted his students to start thinking, for a change. How great is that?

On occasion, he was a totally nutty professor – a flamboyant geek. But he also happened to build a truly astonishing career, which eventually earned him the Nobel Prize in Physics in 1965.

I’m planning to gradually learn about him and post my progress here. Stay tuned!

Collateral features

About collateral features – things that were not expected, but do provide value in the end

Last year, James Lyndsay introduced me to his “Nr1 diagram of testing” (© Workroom Productions): a deceptively simple model that tries to capture the essence of testing.

The circle on the left represents our expectations – all the things we expect. This is the area where scripted tests or checklists come into play to tell us about the value that is present. The right hand circle represents the actual deliverable – what the software really does. This never completely matches what was originally expected, so this is where we use exploratory testing to find out about risk. 

The diagram divides the world into four distinct regions:

  • The overlap. These are the things we both expected and got. 
  • The region outside the circles. That is all we didn’t want, and didn’t receive in the end. A pretty infinite set, that is.
  • The left-hand arc. This is what we expected to get, but didn’t receive. This means that the deliverable turned out less valuable than we had hoped.
  • The right-hand arc. The software system under test does all kinds of things we didn’t expect. There’s clearly some risk involved here, and it’s up to us to discover just how much risk there is.

Simplicity is, of course, the main strength of such a model. It can certainly help us identify or classify what we find in our quest to quickly grasp the essence of things.
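As a rough sketch (my own reading, not James Lyndsay’s notation, and with invented feature names), the diagram can be read as plain set operations on behaviours:

    # What we expected versus what the deliverable actually does.
    expected  = {"login", "search", "export_pdf", "email_report"}
    delivered = {"login", "search", "export_pdf", "auto_save_draft"}

    overlap       = expected & delivered   # expected and got: confirmed value
    missing_value = expected - delivered   # expected, didn't get: the left-hand arc
    surprises     = delivered - expected   # got, didn't expect: the right-hand arc, where risk hides
    # The fourth region – everything we neither wanted nor received – is unbounded,
    # so there is nothing useful to enumerate there.

    print(sorted(overlap), sorted(missing_value), sorted(surprises))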

These four regions got me thinking, and I’d like to expand on them. What about things that we expected and received, but that don’t provide value, for instance? Unneeded features – not too unrealistic. Or – even better – what about things that were not expected, but are there and actually do provide value in the end? Immediately the term “collateral features” came to mind: no matter how hard we try to create designs for certain uses, people will always utilize them in their own way. These unintended uses can be strange sometimes, but some of them are downright brilliant.

Take a look at Alec Brownstein. While most people use Google AdWords to promote their business (after all, that’s what it was designed for), Alec used it to get a job. He was trying to land a job as a copywriter with some of the top ad agencies in New York City. He assumed that the creative directors at these top agencies would “vanity google” their own names. So he bought AdWords for their names and put a special message in each of the ads. The rest is history. He now works for Y&R, after a total investment of $6 (story here).

Collateral features also emerged in the microblogging world. Because Twitter provided no easy way to group tweets or add extra data, the Twitter community came up with its own way: hashtags. A hashtag is similar to other web tags – it helps add a tweet to a category. Hashtags weren’t an official feature, but they sure made their way into the daily Twitter lexicon of millions of people.
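As an aside, the convention is simple enough to sketch in a few lines of Python (a simplification – real tweets allow a wider character set than \w):

    import re

    def hashtags(tweet: str) -> list[str]:
        # Anything that follows a '#' up to the next non-word character counts as a tag.
        return re.findall(r"#(\w+)", tweet)

    print(hashtags("Great tutorial on exploratory testing #agiletd #testing"))
    # -> ['agiletd', 'testing']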

In an article in Forbes magazine called You Can’t Predict Who Will Change The World, Nassim Nicholas Taleb pointed out that things are all too often discovered by accident – but we don’t see that when we look at history in our rear-view mirror. The technologies that run the world today (like the Internet, the computer and the laser) are not used in the way intended by those who invented them.

There will always be unforeseen usage of your software. Some of it proves risky, some of it contains value. Some of these collateral features even replace the intended use, become main features and make their way into the ‘expected’ circle. Ultimately, your customers make the final call. They decide how to use your product or service. Not you, not your marketeers.

– “But no user would ever do that!”

– “Fair enough. Wanna bet?”

Metrics – perverse incentives?

Trivia time! What do the following events have in common?

  • In the American Southwest in the 1850s there was a high reward for the scalps of members of violent and dangerous Indian tribes. This led scalp hunters to slaughter thousands of peaceful agricultural Indians and Mexican citizens, women and children alike, for their valuable scalps.
  • In Vietnam, under French colonial rule, there was a program paying people for each rat pelt handed in. It was originally intended to exterminate rats, but it led to the farming of rats instead.
  • In the 19th century, palaeontologists traveling to China used to pay peasants for each fragment of dinosaur bone that they produced. The measure was an instant success! It took them a while to discover that peasants dug up the bones and then smashed them into multiple pieces to maximise their earnings.

All these are examples of perverse incentives: measures that have unintended and undesirable effects which go against the interests of the incentive makers. They become counterproductive in the end.

I’m probably suffering from an acute case of testing analogitis again, because over the years I have seen these things happen in testing as well:

  • Managers evaluating testers by the number of bugs found.
    This resulted in the submission of tons of trivial and low-priority bugs. People who used to investigate bugs thoroughly and put a lot of time into pinpointing them started lowering their standards.
  • Managers evaluating testers by the number of test scripts executed.
    This resulted in testers focusing only on scripts, not allowing themselves to go off-script and investigate. This often meant going against their intuition for suspicious “smells” in the software, and it certainly did not encourage the use of exploratory testing.
  • Managers evaluating testers by the number of “rejected” bugs.
    The rationale behind this was: fewer rejections mean more valid bugs, better bug descriptions and better-researched bugs. But the result of the metric was that testers were reluctant to enter complex, difficult or intermittent bugs out of fear of them being rejected. But those are exactly the bugs we want the team to tackle, right?
  • Managers evaluating testers by the quality of the software.
    First of all, what is quality? If we use Jerry Weinberg’s definition, “value to someone (who matters)”, it becomes clear that any manager’s assessment of quality is highly subjective. If the rewards for testers depend on the quality of the software, that is highly unfair. We are not gatekeepers of quality; we cannot assure quality, because we do not control all aspects of it. The only thing such an incentive achieves is a highly regulated cover-your-ass culture with formal hand-offs – certainly not team collaboration, continuous improvement or better software.

These are all examples of metrics used as incentives for testers, but in most cases they just ended up creating a blame culture where quantity and pathetic compliance are valued above quality and creativity.

Dear managers, I’d say: focus on collaboration and team achievements, and set goals for the team. Make the whole team responsible for quality and for the product. Then see what happens.