What does it mean to be correct?

Meets the spec
Bug free
Stable
Long uptime

E.g., for robots:

Doesn’t collide with things
Doesn’t get stuck navigating
Picks intended item & places in correct bin
Reports errors

Correctness is a bar to hit. And how much value you get from correctness is very non-linear. If your app is 50% bug-free, it is an unusable buggy piece of crap. Even 80% bug-free is probably still worthless. But somewhere you get to a point where it is correct enough to use and you jump up to close to 100% value.

It is basically impossible to get to 100% correct for any reasonably sized system. And getting closer and closer to 100% correct is very expensive. In general it is much easier to go from 0% correct to 90% correct than from 99% correct to 99.9%, because by then you’ve fixed all the easy stuff and have to start hunting down very rare and tricky issues.

There’s a Correctness Sweet Spot

If you combine these plots you get a plot of how much value you get from working on correctness vs. how much effort you put into it. And it shows that there is a definite sweet spot where putting more effort into correctness doesn’t get you much more value.

You want your sweet spot a little in from the cliff so you have some wiggle room in case you have a small regression in correctness, or you have misestimated where your cliff is.
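Here is a rough sketch of how the two curves combine. Everything in it is a made-up assumption for illustration: the shape of `correctness` (effort / (1 + effort)), the S-curve in `value`, the cliff at 95% correct, and the 99% target are not measurements of any real product.

```python
import math


def correctness(effort: float) -> float:
    """Toy assumption: correctness approaches 100% but never reaches it."""
    return effort / (1.0 + effort)


def value(correct: float, cliff: float = 0.95, steepness: float = 100.0) -> float:
    """Toy assumption: value is a steep S-curve centered on the cliff."""
    return 1.0 / (1.0 + math.exp(-steepness * (correct - cliff)))


def value_from_effort(effort: float) -> float:
    """Combine the two curves: value as a function of effort spent on correctness."""
    return value(correctness(effort))


def sweet_spot(target: float = 0.99, max_effort: int = 100_000) -> int:
    """Smallest whole unit of effort whose value reaches the target."""
    for effort in range(max_effort + 1):
        if value_from_effort(effort) >= target:
            return effort
    raise ValueError("target value not reachable within max_effort")


if __name__ == "__main__":
    spot = sweet_spot()
    for effort in (1, 9, 19, spot, 10 * spot):
        print(
            f"effort={effort:>6}  correct={correctness(effort):.4f}  "
            f"value={value_from_effort(effort):.4f}"
        )
```

In this toy model, aiming for 99% of the value rather than the cliff itself is what gives you the wiggle room: the sweet spot lands past the cliff, and spending ten times more effort beyond it adds almost nothing.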

Different industries and problems have different cliffs in different places. A recipe app has a much lower bar of correctness than a surgical robot or a rocket. Also, certain parts of a given product have different cliffs. (Hardware generally has a higher bar of correctness than software because changing it is difficult, and collision systems have a higher bar of correctness than business logic because hurting people is bad.)

It successfully packages 49 out of 50 gift boxes, and only catches on fire twice a week

How do you get correctness?

Activities that give correctness are things like:

Code review
Unit tests
QA
Manual testing
Release testing

In general, getting correctness is a defensive activity: it is about setting up technology and processes to prevent anything that is correct from going bad so you can slowly ratchet up correctness over time.

Being Good

What does it mean to be good?

Fits a need
Easy & joyful to use
Creates a ton of value
Feels like it costs way less than it’s worth
Solves a problem people care about

Being good is about meeting a market need, creating value and delighting customers. Unlike correctness, there is no upper bound on how good you can make something, and it is always more valuable to make something better.1

Like correctness, it gets harder and harder to make a good thing better, but not as steeply. Instead of hunting the 1 in 10,000 bug you get to explore the space of all the things you could change about your product that would make it more useful to people.

Which means that the value-to-effort curve ends up having diminishing returns, but not asymptotic returns, which is way better.
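To make the difference between the two kinds of returns concrete, here is a minimal sketch. The two formulas are my own stand-ins, not fitted to anything: a capped curve for correctness and a logarithm for goodness.

```python
import math


def correctness_value(effort: float) -> float:
    """Toy assumption: value creeps toward a hard cap of 1.0 (asymptotic)."""
    return effort / (1.0 + effort)


def goodness_value(effort: float) -> float:
    """Toy assumption: value grows like log(1 + effort), ever slower but unbounded."""
    return math.log1p(effort)


if __name__ == "__main__":
    for effort in (1, 10, 100, 1000):
        print(
            f"effort={effort:>4}  "
            f"correctness={correctness_value(effort):.3f}  "
            f"goodness={goodness_value(effort):.3f}"
        )
```

Under these assumptions, going from 100 to 1000 units of effort barely moves the capped curve, while the logarithmic one still gains a meaningful amount.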

How do you get good?

In order to make your product good, you follow the product development cycle.

Making a guess

To make a guess, you think about your product and the people who will use it, using all the information that you have, and you guess what the product should be like. This can be things like “there should be a button to log out”. Or things like “If the robot detects a person in the chair, it should skip vacuuming it”.

The more you understand your customer, the better your guesses will be. Some people are better guessers than others. (We usually call them product managers.) And sometimes your customers will have good guesses. (Though more often you will have to try to guess at the need that made them think of that feature and figure out a good way of meeting it.)

But the difference between great guessers and average guessers is, in my experience, pretty small. Because the average guess is wrong. Wildly wrong. The best guessers are probably right 60% of the time. This is why so much of startup advice is to ship your MVP as fast as possible. Taking any reasonable guess and trying it out is much more efficient than agonizing over the “perfect” guess.

Trying it out

Trying it out means building a version of your guess and using it. There is a spectrum of ways to try it out.

In the beginning you can easily tell if a guess is good or bad by trying it yourself.

Robots are hard and expensive to deploy so I’ve often hired a team of test users to fake-use the product. (At Robust AI we have a little fake warehouse next to our desks and hire pickers to come and pick fake e-commerce orders with our robots twice a week)2.

Early on, many guesses are so obviously bad that trying them yourself immediately gives good feedback. But later, each change is a smaller step and until you start watching customers use it you will have some uncertainty (because your testing might be different in some way from the real customer use). Eventually you get to the point where the uncertainty is larger than the size of the change and you can’t tell if you are making it better or worse. That means that there is an upper bound on how good you can make a product without watching real customers try it.

And, for the same reason that humans are pretty bad at guessing what product features will be good, we are also pretty bad at guessing how our testing will be different from reality.

All of that together means that until you are testing with customers, there is a ceiling on how good you can make your product, and that ceiling is pretty low.
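The ceiling argument can be sketched as a small simulation. This is a toy model under loud assumptions, none of which come from real data: each change is half as likely to help as to hurt, the size of change `i` shrinks as 1 / (1 + i), and in-house testing adds Gaussian error with standard deviation `noise` to what you measure.

```python
import random


def simulate(noise: float, iterations: int = 2000, seed: int = 0) -> float:
    """True quality reached when you keep every change that *measures* as a win."""
    rng = random.Random(seed)
    quality = 0.0
    for i in range(iterations):
        size = 1.0 / (1 + i)  # assumption: later changes are smaller steps
        true_effect = rng.choice((-1.0, 1.0)) * size  # assumption: guesses are coin flips
        measured = true_effect + rng.gauss(0.0, noise)  # testing differs from reality
        if measured > 0:  # keep whatever looks like an improvement
            quality += true_effect
    return quality


def average_quality(noise: float, iterations: int = 2000, runs: int = 100) -> float:
    """Average over several seeded runs so one lucky run doesn't mislead."""
    total = sum(simulate(noise, iterations, seed) for seed in range(runs))
    return total / runs


if __name__ == "__main__":
    for noise in (0.0, 0.5):
        for iterations in (2000, 4000):
            print(
                f"noise={noise}  iterations={iterations}  "
                f"quality={average_quality(noise, iterations):.2f}"
            )
```

With `noise=0.0` (standing in for watching real customers) quality keeps climbing as you add iterations. With `noise=0.5` it flattens out once the changes are smaller than the measurement error, because keeping or dropping a change becomes a coin flip.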

Iterations: more is better

Why do people tell startups to “iterate like crazy”? In each iteration you make some amount of progress up your ‘good product’ ramp. That amount of progress in each iteration can be “engineering limited” or “guess-quality limited”.

If you have a fixed release cycle, then you take the things you know from watching customers, order them by how much goodness you think they add divided by how much effort they take, and work off the top of the list until your next release.3

Example:

1. Show an arrow indicating where the pick is
2. Add happy beep after correct scan
3. Faster navigation in open corridors
4. Stop parking motion earlier
5. Give humans more clearance when passing
6. More confident motion through doors
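The ordering rule above (guessed goodness divided by guessed effort, then work off the top until the release) looks roughly like this in code. The `Idea` type, the value and effort numbers, and the 5-day capacity are all hypothetical, invented only so the example list comes out in the order shown.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Idea:
    name: str
    value: float   # guessed goodness added (made-up units)
    effort: float  # guessed engineering days


# Hypothetical numbers, chosen only to reproduce the ordering of the example list.
BACKLOG = [
    Idea("Give humans more clearance when passing", value=4, effort=4),
    Idea("Add happy beep after correct scan", value=3, effort=1),
    Idea("More confident motion through doors", value=4, effort=8),
    Idea("Show an arrow indicating where the pick is", value=8, effort=2),
    Idea("Stop parking motion earlier", value=3, effort=2),
    Idea("Faster navigation in open corridors", value=10, effort=5),
]


def prioritize(ideas):
    """Sort by guessed value per unit of effort, best first."""
    return sorted(ideas, key=lambda idea: idea.value / idea.effort, reverse=True)


def plan_release(ideas, capacity_days):
    """Work off the top of the list until the next item no longer fits."""
    planned = []
    remaining = capacity_days
    for idea in prioritize(ideas):
        if idea.effort > remaining:
            break
        planned.append(idea)
        remaining -= idea.effort
    return planned


if __name__ == "__main__":
    for idea in plan_release(BACKLOG, capacity_days=5):
        print(idea.name)
```

Note that the ranking is only as good as the guessed `value` and `effort` numbers, which is exactly why fresh customer observations tend to reshuffle the list.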

If you do this, you will always feel “engineering limited” because that list of things is much larger than you have time to do.4 But generally, iterations longer than a week are guess-quality limited. That is because if you get to watch a customer use the product after just doing items (1) and (2), you will probably immediately have a new guess for an improvement that is way more valuable than items (4) or (5) on your list.

When you have long iteration cycles you are trapped doing the low value stuff on your list because you only learn about new high-value stuff by iterating.

With slower iterations you are “guess-limited”. Or maybe more accurately “learning-limited”. A quarter-long iteration is not much better than a one-month iteration, but takes three times as long.

Somewhere around 1-2 week iterations you tend to stop being learning-limited.
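A back-of-the-envelope model of this: assume each iteration can only bank a fixed amount of worthwhile progress before you need to watch customers again. The cap of 1.5 weeks' worth of work, the engineering rate, and the 12-week window are assumptions I picked for illustration, not measurements.

```python
def progress_per_iteration(
    iteration_weeks: float,
    eng_rate: float = 1.0,      # assumption: progress per week if guesses never ran out
    learning_cap: float = 1.5,  # assumption: most progress one round of learning supports
) -> float:
    """Progress in one iteration: limited by engineering time or by learning."""
    return min(eng_rate * iteration_weeks, learning_cap)


def progress_over(total_weeks: int, iteration_weeks: int) -> float:
    """Total progress from back-to-back iterations of a fixed length."""
    iterations = total_weeks // iteration_weeks
    return iterations * progress_per_iteration(iteration_weeks)


if __name__ == "__main__":
    for weeks in (1, 2, 4, 12):
        print(f"{weeks:>2}-week iterations: {progress_over(12, weeks)} units in 12 weeks")
```

In this sketch a 12-week iteration produces exactly as much as a 4-week one but takes three times as long, and once iterations are at or below the cap you are back to being engineering limited.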

Chuck Rossi, the release manager when I worked at Facebook5, observed that there seemed to be a fixed amount of change that could fit in a single deployment. So to increase company velocity he increased deployments of the website from weekly to daily, and then to twice a day, and took deployments of the mobile apps from six- to four- to two-week cycles.6

Iterate like crazy!
