Points over time

We’re generally bad at estimating

Intuitively we all think we are more than capable of estimating how long something is going to take or how long that couch is. The internet, however, is full of stories proving that false time and time again.
People in general are bad at estimating things in absolute measures such as hours, meters, kilograms and so on. With regard to development planning, these estimation mistakes happen mainly because of two factors: the planning fallacy, failing to consider previous experience with similar tasks when delivering an estimate, and optimism bias, assuming that no complications will arise while completing the task.
The good news is that we can do better by following a few steps, or tricks if you will. The focus of this post is mainly tech and software development planning, but I think it translates easily to other uses too.

Rely on previous experience

We should rely on previous experience and focus on the general, high-level view (“In the past, a task similar to this took X effort to complete.”) rather than on specifics (“What are the problematic points when approaching a task like this?”). Moreover, resist the temptation to divide-et-impera the task. To some extent it can be helpful but, unfortunately for everyone, the whole is rarely the sum of its parts.
Using historical data is not a surefire way to make every estimate completely accurate either, but it does assure us that our estimates are not completely unrealistic. It provides some boundaries, and since we’re mainly aiming for consistency rather than direct accuracy, things should even out if our definitions are sound.

Estimate for someone else

Weird as this might sound, people in general are better at estimating when putting themselves in someone else’s shoes than when accounting for their own capacity. As such, pick a team member and try to estimate as if you were that person. This seems to work because when we estimate from this outer perspective we’re more likely to rely on our previous experience as an information pool, which essentially makes us more objective.

The uncertainty cone

At the beginning of any work process there are possibly dozens of unknowns that need to be baked into the estimation. As work progresses, more and more things will be clarified and the estimation range will narrow. If you were to plot this range over time you’d end up with a cone, hence the name. Since we do our estimation at the start of the task, usually before any work has been done, we need to account for this range of possibility.
Estimating three scenarios forces us to challenge our optimism bias and come up with slightly better estimations. We should therefore estimate the worst, most likely and best case scenarios and then average the three to come up with the final estimate. This is probably the most intuitive step towards a better estimation.
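
The three-scenario average can be sketched in a few lines of Python. This is a minimal illustration rather than a prescribed tool: the function name `three_point_estimate` and the sample numbers are made up for the example, and the unit can be whatever the team estimates in.

```python
def three_point_estimate(best: float, most_likely: float, worst: float) -> float:
    """Plain average of the best, most likely and worst case estimates."""
    # Sanity check (an assumption of this sketch): the scenarios must be ordered.
    if not best <= most_likely <= worst:
        raise ValueError("expected best <= most_likely <= worst")
    return (best + most_likely + worst) / 3


# Hypothetical task: 2 in the best case, 3 most likely, 7 if things go wrong.
print(three_point_estimate(best=2, most_likely=3, worst=7))  # 4.0
```

Note how the pessimistic scenario pulls the result above the most likely value, which is exactly the correction for optimism bias we are after.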

What’s wrong with estimating in hours?

Nothing. Estimating in hours is not inherently wrong, just less accurate than estimating in points.
Let’s take a look at the picture to the left. If someone were to ask “How many glasses tall is the bottle?” you’d quickly be able to say that the bottle is two glasses tall. Most importantly, regardless of how many people were asked that, all the answers would be the same.
Now imagine you were asked this instead: “How many centimeters tall is the bottle?”. It’d be hard to get the right answer and even if you did, someone else’s might be slightly or wildly different.

What is a story point?

A story point is a relative measure of size and complexity.
Some will say that my definition of a story point lacks a third component: unknowns. Whilst I guess that would be a technically correct sort of point, I also think it misses the main reason for estimating in size and complexity.
No one will ever be able to reliably account for unknowns. That’s why they’re called unknowns. Obviously everyone should bake an error margin into their estimates, but the thing is most of us do that intuitively. In my experience, asking team members to also account for unknowns on top of that just leads to estimation inflation that no one can argue for. Any reason given to justify the margin allotted to “dealing with unknowns” is purely subjective. It is not conducive to debate but rather to judgement.

What is the reference item?

The reference item is the item defined as always being worth 1 story point.
The reference item is different for each team. How it is defined is outside the scope of this post; I will cover that in the future, going over a few processes for coming up with a definition that works for everyone in the team.
What is essential for any reference item, however, is that everyone in the team that uses it has a clear view of what it looks like, what size and complexity it has, and what effort it requires. Ideally, everyone has attempted and/or done this item (or items very similar to it) in the past.
The reference item should be as clear to everyone in the team as that glass in the picture before.

How do we relate items?

It might feel natural or intuitive to take the reference item and figure out how long it would take you to complete it, then estimate how long the current item would take to complete and divide the two to get the story point count for the current item. This is a flawed approach: first because of all the reasons mentioned before, and second because you’re now essentially estimating in hours, not points.

The way to relate items is to figure out how many times more effort the current item requires than the reference one. This way time is never part of the process and everything is kept relative to the reference only.

E.g. 1: If the current item was twice as big and four times as complex as the reference, then it would require 2 * 4 = 8 times as much effort to complete, hence it would be worth 8 story points.

E.g. 2: If the current item was ten times as big and half as complex as the reference, then it would require 10 * 0.5 = 5 times as much effort to complete, hence it would be worth 5 story points.
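
The same arithmetic as a small Python sketch. The function and parameter names are hypothetical; the only thing taken from the text is that relative size and relative complexity are multiplied, with the reference item fixed at 1 story point.

```python
def story_points(size_factor: float, complexity_factor: float) -> float:
    """Effort relative to the reference item, which is worth 1 point.

    size_factor: how many times bigger than the reference the item is.
    complexity_factor: how many times more complex than the reference it is.
    """
    if size_factor <= 0 or complexity_factor <= 0:
        raise ValueError("factors must be positive")
    return size_factor * complexity_factor


print(story_points(2, 4))     # 8   (twice as big, four times as complex)
print(story_points(10, 0.5))  # 5.0 (ten times as big, half as complex)
print(story_points(1, 1))     # 1   (the reference item itself)
```

Time never appears in the function's inputs, which is the whole point of relating items this way.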

Isn’t the time to complete the item part of how many story points it has?

No. How long it takes to complete an item is not part of how many story points an item has. A story point is just a number showing how many times more effort the current item requires when compared with the reference.

Sprint velocity, team velocity

Sprint velocity is the number of story points the team delivers within a given sprint.
Team velocity is the average of sprint velocities over a set number of sprints, usually calculated per quarter (six two-week sprints).
These will vary from sprint to sprint due to multiple factors such as team members being away, events taking place, items being over- or underestimated and so on. That’s why a larger number of sprints is required to get a more accurate view of the team’s capacity.
When estimating longer term plans (quarterly or yearly road maps) the team velocity should be used. When trying to see how a dropped-in request could be dealt with, it is in my experience better to rely on the sprint velocity of a sprint similar to the upcoming one in terms of context and team member availability.
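
To make the two definitions concrete, here is a minimal Python sketch. The function names and the six sprints of delivered stories are invented for illustration; only the definitions (sum per sprint, average across sprints) come from the text.

```python
def sprint_velocity(delivered_story_points: list[float]) -> float:
    """Sum of the points of every story delivered in one sprint."""
    return sum(delivered_story_points)


def team_velocity(sprint_velocities: list[float]) -> float:
    """Average sprint velocity over the sprints provided."""
    if not sprint_velocities:
        raise ValueError("need at least one sprint")
    return sum(sprint_velocities) / len(sprint_velocities)


# Hypothetical quarter: points of each story delivered in six sprints.
quarter = [
    [8, 5, 5, 3, 1],
    [8, 8, 5, 3, 2],
    [13, 5, 3, 3],
    [8, 5, 5, 2],
    [13, 8, 3, 2, 1],
    [8, 8, 5, 3, 1],
]
velocities = [sprint_velocity(sprint) for sprint in quarter]
print(velocities)                 # [22, 26, 24, 20, 27, 25]
print(team_velocity(velocities))  # 24.0
```

The individual sprints range from 20 to 27 points, yet the average settles at 24, which is why the longer window is the better basis for road maps.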

Weren’t story points supposed to fix estimation variation?

Not really. An estimation, be it in absolute or relative units, is still an estimation and variations will happen from person to person. Story points are not meant to fix all variations, just the ones coming from using a reference that isn’t clearly visible to everyone.
Let’s go back to the previous analogy. Say there was some variation in the perceived height of the bottle (current item) due to unknowns; then obviously everyone would give slightly different answers to the question “How many glasses tall is the bottle?”. However, this would be caused by the difference in the perceived height of the bottle alone, since the glass (reference item) is very well defined and visible to everyone.
Besides that, if the bottle (current item) is really fuzzy, even defining a very clear and visible reference won’t be able to compensate for the differences in estimation, and it would still happen that, looking back once the work has been completed, we’d realise we under- or overestimated the story. However, since the reference isn’t that granular this kind of thing wouldn’t happen often.

How do we know how long an item will take to complete?

How long it takes to actually code and/or deliver a story is not important, not from a team’s perspective at least. The only thing we need to know is our team’s velocity, so that each sprint we commit to no more points than it allows once adjusted for events and other non-recurring things taking up capacity.
By doing this, everything should be deliverable at the end of the sprint. As such, everyone outside the team knows when to expect delivery of the work without needing to know how long each item would take specifically, and neither do we.
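
How the velocity gets “adjusted for events” is not spelled out above, so the sketch below makes an explicit assumption: capacity scales in proportion to the person-days actually available in the sprint. That scaling rule, the function names and the numbers are all hypothetical; treat it as one possible way to do the adjustment, not as the method.

```python
def adjusted_capacity(
    team_velocity: float,
    usual_person_days: float,
    available_person_days: float,
) -> float:
    """Scale velocity by availability (assumed proportional, see lead-in)."""
    if usual_person_days <= 0:
        raise ValueError("usual_person_days must be positive")
    return team_velocity * available_person_days / usual_person_days


def fits_in_sprint(committed_points: list[float], capacity: float) -> bool:
    """True if the committed stories do not exceed the sprint's capacity."""
    return sum(committed_points) <= capacity


# Hypothetical sprint: 5 people x 10 days is usual, but 10 person-days are
# lost to holidays and a company event.
capacity = adjusted_capacity(24, usual_person_days=50, available_person_days=40)
print(capacity)                                # 19.2
print(fits_in_sprint([8, 5, 3, 3], capacity))  # True
print(fits_in_sprint([8, 8, 5], capacity))     # False
```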

Special case: Urgent out of scope story (creepers)

Sometimes we have to deal with an urgent story regardless of our previous commitment, or on short notice. The distinction between these stories and bugs is very important here. Bugs do not fall under this special case, since they mean that something already delivered isn’t working as described in the acceptance criteria, whereas this special case is about new features.
An “urgent out of scope story” or “creeper story” is a story that we have to commit to after the planning meeting has concluded and the sprint has started, or one that needs to be completed before the end of the sprint due to outside dependencies/commitments.
For these creeper stories we might need to know how long they would take to complete. The only way to answer this is by using the team or sprint velocity metrics. If we divide the sprint’s length in hours by the velocity, the resulting value is a factor (hours per story point) that we can use to convert story points to the number of hours a story should take to complete.

E.g.: If our team velocity was 24 story points and our sprint was two weeks long (80 hours), each story point would correspond to 80 / 24 ≈ 3.3 hours. On average, we would then be able to deliver a creeper story worth 5 story points in about 16.7 working hours, so a little over 2 working days.

This doesn’t attach a fixed time interval to points. It is simply a way to convert our team’s story points to a time interval within the context of the current sprint and team velocity.
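
The conversion can be written down as a short Python sketch. The function names are hypothetical, and the 8-hour working day used to turn hours into days is an assumption implied by the 80-hour, two-week sprint in the example.

```python
def hours_per_point(sprint_hours: float, velocity: float) -> float:
    """Conversion factor valid only for this sprint length and velocity."""
    if velocity <= 0:
        raise ValueError("velocity must be positive")
    return sprint_hours / velocity


def story_hours(points: float, sprint_hours: float, velocity: float) -> float:
    """Approximate working hours for a story of the given size."""
    return points * hours_per_point(sprint_hours, velocity)


hours = story_hours(points=5, sprint_hours=80, velocity=24)
print(f"{hours:.1f} hours")     # 16.7 hours
print(f"{hours / 8:.1f} days")  # 2.1 days, assuming 8-hour working days
```

Because the velocity is an input, the same 5-point story converts to a different number of hours for another team or another sprint, which is why the factor should never be treated as a fixed value for a point.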

References

Daniel Kahneman: Putting Your Intuition on Ice ( link to podcast )
Daniel Kahneman: Thinking, Fast and Slow ( link to Amazon )
Ben Yagoda: The Cognitive Biases Tricking Your Brain ( link to the article )
Dan Radigan: Story points and estimation ( link to the article )