At Agile-Carolinas, Laurie Williams gave a great presentation last night on how she does Planning Poker. (Suggestion: Google her. She is a professor at NC State, but that is just the start. She is very good.)
Some of the discussion made me realize that people run into many issues when they do Planning Poker.
So, here is how I recommend doing it. This is overly simplified. Later I will discuss some small differences between what I recommend and what Laurie recommends.
The Basics
Assume for now we are in Release Planning (before starting Sprints).
Assume the user stories have been created. Some are epics.
Assume business value has been addressed (somehow).
Assume the stories are on cards.
Assume 50-400 cards.
Assume our domain is software development (not an essential assumption).
Gather the team of pigs, including the PO and SM. Assume the team size is 7.
Assume the PO does not know how to do 'real work' (ie, implement a story).
The team selects the smallest story from the set of user stories.
If the story is not 'silly small' (less than 4 ideal hours of work) and not 'too big' (umm, I won't define that here), then this story becomes 'the reference story'.
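To make the selection rule concrete, here is a minimal sketch in Python. It assumes each story carries a rough gut-feel size in ideal hours (the `Story` class and its `rough_ideal_hours` field are hypothetical, for illustration only). Since I have not defined 'too big', the upper bound is a parameter you choose (`too_big_hours`, also hypothetical).

```python
from dataclasses import dataclass


@dataclass
class Story:
    title: str
    rough_ideal_hours: float  # hypothetical: the team's quick gut-feel size


SILLY_SMALL_HOURS = 4  # under 4 ideal hours counts as 'silly small'


def reference_story(stories, too_big_hours):
    """Take the smallest story; accept it only if it is neither silly small nor too big."""
    smallest = min(stories, key=lambda s: s.rough_ideal_hours)
    if SILLY_SMALL_HOURS <= smallest.rough_ideal_hours <= too_big_hours:
        return smallest
    return None  # the rule does not cover this case; the team has to decide


stories = [Story("Reset password", 6), Story("Checkout", 40), Story("Export CSV", 12)]
print(reference_story(stories, too_big_hours=24).title)  # Reset password
```

In real life, of course, 'smallest' is a team judgment, not a lookup in a field.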
Give each person on the team a set of Planning Poker cards with the modified Fibonacci numbers.
The PO explains the most important story. The whole team discusses, asks questions, etc.
The SM looks for 'enough discussion roughly equally from all'.
If the team has enough info to estimate it, then each implementor chooses the Planning Poker card (number) that represents the relative size-complexity of this story versus the reference story.
Each person is thinking of the full effort for both stories from all the skill sets, not just the effort for 'my own' skill set (eg, if he is a tester).
Each person is also thinking about the team's 'definition of done' (DOD), and about the effort to complete the story in accord with that DOD.
Each person selects a PP card privately, so people do not influence each other. We want each person's independent opinion.
The SM waits until everyone has selected a PP card (number), and then says something like "1-2-3, show". Everyone reveals his or her number at the same time.
The people who showed the highest and the lowest numbers talk. Usually each of them explains the assumptions behind the number. Typically, if they were business assumptions, everyone looks to the PO to 'answer'. If they were technology assumptions, the team agrees on which assumption is correct or chooses one (perhaps a compromise between the two opinions).
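A small Python sketch of who gets the floor after the reveal. It assumes the votes are recorded as a mapping from person to card; the names are made up for illustration. If everyone showed the same card, there is no high/low pair to hear from.

```python
def who_talks(votes):
    """Return (lowest voters, highest voters) from a {name: card} mapping."""
    low = min(votes.values())
    high = max(votes.values())
    if low == high:
        return [], []  # everyone agrees; no outliers to explain
    lowest = sorted(name for name, card in votes.items() if card == low)
    highest = sorted(name for name, card in votes.items() if card == high)
    return lowest, highest


# Hypothetical team members and votes.
print(who_talks({"Ana": 3, "Ben": 8, "Cy": 5, "Dee": 3, "Eli": 13}))
# (['Ana', 'Dee'], ['Eli'])
```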
Typically the team votes again with the PP cards.
If the votes fall within 3 consecutive cards (eg, some 3's, some 5's, some 8's), the team adds up all the cards and divides by the number of voters, rounding to an integer (no decimal places) that represents the average opinion. Note that typically the final number will not be a Fibonacci number (eg, a 6, not a 5 or an 8).
The research on, and methods around, Wideband Delphi estimation propose a fancier calculation (which I have forgotten), but averaging seems accurate enough and fast enough. So we do that.
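Here is the 'within 3 consecutive cards, then average' rule as a Python sketch. Two assumptions to flag: the deck values below are a common modified Fibonacci deck (your cards may differ, eg, with a '?' card), and the rounding is round-half-up, since I only said 'an integer'. If the votes are spread too wide, the function returns None, meaning: talk some more and vote again.

```python
import math

# Assumed deck: a common modified Fibonacci Planning Poker deck.
DECK = [0, 0.5, 1, 2, 3, 5, 8, 13, 20, 40, 100]


def within_three_cards(votes):
    """True if every vote falls within 3 consecutive cards of DECK."""
    positions = [DECK.index(v) for v in votes]
    return max(positions) - min(positions) <= 2


def story_points(votes):
    """Average converged votes into a whole number; None means discuss and re-vote."""
    if not within_three_cards(votes):
        return None
    mean = sum(votes) / len(votes)
    return math.floor(mean + 0.5)  # assumed rule: round half up


print(story_points([3, 5, 8, 5, 8]))  # 29 / 5 = 5.8, so 6
print(story_points([2, 5, 8]))        # spread too wide: None
```

Note the 6 in the first example: not a Fibonacci number, and that is fine.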
The idea is that the team has discussed the story enough to agree on what it is (the feature and the basic implementation). They do not have to fully agree on its relative size (effort).
So the team reaches a degree of consensus (ie, everyone within a range of 3 Fibonacci numbers), but complete consensus is not forced. Put another way, differences of opinion are respected: we do not require the team to converge on a single Fibonacci number (everyone showing the same PP card).
We are kind of fuzzy about whether we mean size, complexity or effort. My answer is 'yes'. (A bit over-simplified.)
In practice, the first few stories take some time to estimate, maybe 10-20 minutes each. But as the team moves along, many of the basic assumptions have been decided, and things speed up. Sometimes perhaps too much. We want discussion and learning.
Now, two important feedback loops.
1. Product Owner: We do not want a stupid person (meaning: a person who has no clue about the 'real work' of implementation) anchoring the team on an estimate. So, we typically do not recommend that the PO vote. But, technology projects are all about cost-benefit trade-offs. So, what we usually recommend is that, after a story has been estimated, the PO can say 'well, I don't want to spend that much on that feature; what can we do?' And the team with the PO might modify the feature or the nature of the implementation so that the cost-benefit ratio is more reasonable.
2. At the end: after the team has completed Planning Poker (for all the cards?), it should look back at the numbers on all the stories. By now, the team has learned a lot. Do any of the story points on any of the cards now look 'stupid'? (That's the technical phrase.) If so, the team can refactor those story points.
***
Enough today. More later.
One thing I must emphasize: the value of Planning Poker is mostly in the discussion and learning (80%), and only slightly in the final story points themselves (maybe 20%).