Metrics – perverse incentives?

Trivia time! What do the following events have in common?

  • In the American Southwest in the 1850s, high rewards were offered for the scalps of members of violent and dangerous Indian tribes. This led scalp hunters to slaughter thousands of peaceful agricultural Indians and Mexican citizens, women and children alike, for their valuable scalps.
  • In Vietnam, under French colonial rule, there was a program that paid people for each rat tail handed in. It was intended to exterminate the rats, but it led to rat farming instead.
  • In the 19th century, palaeontologists travelling to China used to pay peasants for each fragment of dinosaur bone they brought in. The measure was an instant success! It took them a while to discover that the peasants dug up the bones and then smashed them into multiple pieces to maximise their earnings.

All of these are examples of perverse incentives: measures that have unintended, undesirable effects which run against the interests of the people who introduced them. In the end, they become counterproductive.

I’m probably suffering from an acute case of testing analogitis again, because over the years I have seen these things happen in testing as well:

  • Managers evaluating testers by the number of bugs found.
    This resulted in the submission of tons of trivial and low-priority bugs. People who used to investigate bugs thoroughly and spend a lot of time pinpointing them started lowering their standards.
  • Managers evaluating testers by the number of test scripts executed.
    This resulted in testers focusing only on scripts and not allowing themselves to go off-script and investigate. This often meant ignoring their intuition about suspicious “smells” in the software, and it certainly did not encourage the use of exploratory testing.
  • Managers evaluating testers by the number of “rejected” bugs.
    The rationale behind this was: fewer rejections mean more valid bugs, better bug descriptions and better researched bugs. In practice, though, the metric made testers reluctant to report complex, difficult or intermittent bugs for fear of having them rejected. But these are exactly the bugs we want the team to tackle, right?
  • Managers evaluating testers by the quality of the software.
    First of all, what is quality? If we use Jerry Weinberg’s definition, “value to someone (who matters)”, it becomes clear that any manager’s assessment of quality is highly subjective. Making testers’ rewards depend on the quality of the software is therefore highly unfair. We are not the gatekeepers of quality; we cannot assure quality, because we do not control all aspects of it. The only thing such an incentive achieves is a highly regulated cover-your-ass culture with formal hand-offs, and certainly not team collaboration, continuous improvement or better software.

These are all examples of metrics used as incentives for testers, but in most cases they just ended up creating a blame culture where quantity and pathetic compliance are valued above quality and creativity.

Dear managers, I’d say: focus on collaboration and team achievements, and set goals for the team. Make the whole team responsible for the quality of the product. Then see what happens.