Trivia time! What do the following events have in common?
- In the American Southwest in the 1850s there was a high reward for the scalps of members of violent and dangerous Indian tribes. This led scalp hunters to slaughter thousands of peaceful agricultural Indians and Mexican citizens, women and children alike, for their valuable scalps.
- In Vietnam, under French colonial rule, there was a program paying people for each rat pelt handed in. It was originally intended to exterminate rats, but it led to the farming of rats instead.
- In the 19th century, palaeontologists traveling to China used to pay peasants for each fragment of dinosaur bone that they produced. The measure was an instant success! It took them a while to discover that peasants dug up the bones and then smashed them into multiple pieces to maximise their earnings.
All of these are examples of perverse incentives: measures that have unintended and undesirable effects that go against the interests of the people who created them. In the end, they become counterproductive.
I’m probably suffering from an acute case of testing analogitis again, because over the years I have seen these things happen in testing as well:
- Managers evaluating testers by the number of bugs found.
This resulted in the submission of tons of trivial, low-priority bugs. People who used to investigate bugs thoroughly and put a lot of time into pinpointing started lowering their standards.
- Managers evaluating testers by the number of test scripts executed.
This resulted in testers focusing only on scripts and not allowing themselves to go off-script and investigate. It often meant going against their intuition for suspicious “smells” in the software, and it certainly did not encourage exploratory testing.
- Managers evaluating testers by the number of “rejected” bugs.
The rationale behind this was: fewer rejections mean more valid bugs, better bug descriptions and better-researched bugs. But the result was that testers became reluctant to enter complex, difficult or intermittent bugs for fear of having them rejected. Yet these are exactly the bugs we want the team to tackle, right?
- Managers evaluating testers by the quality of the software.
First of all, what is quality? If we use Jerry Weinberg’s definition, “value to someone (who matters)”, it becomes clear that any manager’s assessment of quality is highly subjective. Making testers’ rewards depend on the quality of the software is highly unfair. We are not gatekeepers of quality; we cannot assure quality, because we do not control all aspects of it. The only thing such an incentive achieves is a highly regulated cover-your-ass culture with formal hand-offs, and certainly not team collaboration, continuous improvement or better software.
These are all examples of metrics used as incentives for testers, but in most cases they just ended up creating a blame culture where quantity and pathetic compliance are valued above quality and creativity.
Dear managers, I’d say: focus on collaboration and team achievements, set goals for the team. Make the whole team responsible for the quality and the product. Then see what happens.
I agree with your view on metrics for tester performance becoming perverse incentives. Those are some great examples of incentives not working out as designed. A quick Google search for “firefighters starting fires” shows a whole page of stories where firefighters have started fires to receive bonus pay. I have been fortunate enough not to have to deal with metrics being used to evaluate my performance as a tester, but I have heard the horror stories of those who have.
Check out this Dilbert Strip:
http://bit.ly/2PZGig
-Michael
Hi Michael, thanks for your comment. That Dilbert strip says it all, doesn’t it? By the way, good to see another new tester blogger out there. Interesting stuff!
Those are some good examples of using metrics. There are also other situations where managers don’t evaluate at all or don’t set good criteria: forcing someone into a QA Lead role just like that, deciding for themselves how to test and imposing it on the testers, or simply not caring about testers at all. Metrics often appear out of the need to set goals for testers, for example when you have saboteur testers who want to earn more money and blame others who don’t use scripting or other techniques. Managers then come up with different metrics (you have a good list, BTW) to create an appearance of order. I think it is very important for a tester to have passion and be persuasive in order to get past all these unpleasant situations. Of course, that is very hard sometimes.
So bad metrics sometimes appear because managers need to report to higher management and to impose order on a larger team. But that order is only apparent: if you are ill-intentioned and plan the team structure in advance, you can justify everything with bullshit, so metrics in any situation are relative.
Sebi
http://www.testalways.com