Here are the four data sources I use, along with how they contribute to my efforts to engage in the scientific method by forming hypotheses, testing them, observing the results, deciding what to do, then forming new hypotheses to continue learning:
At Chegg, we executed an NPS survey in 2010, and the NPS was 60, which is very good for a startup. Our proxy metric for textbook rental was the percent of students who returned the next semester to rent another textbook. But this required us to wait a full semester to evaluate if our product was getting better.
If you were writing essays on Substack, what data would you rely on to determine the quality of an essay? Below, I share the data for the last thirteen essays I have written, from most current to mid-February:
I'll start with an unusual request. If you have read at least two of my "Ask Gib" essays, please click the link below to provide feedback for this product newsletter series. There are only three questions, and your feedback is incredibly helpful to me:
Did you complete the survey? Please do it before you read this essay, as it provides helpful context. You'll also be able to look at the real-time results at the end of this essay, and it's more fun if your feedback is reflected in the results. It only takes one minute!
What surprised me was the emotion behind some of the comments. As I dug deeper, I realized that NPS takes on almost mythical status as a "single source of truth" within some organizations because it presents a single number. It appears to be quantitative data, but it's not. It's a numerical representation of qualitative data.
NPS is a qualitative measure that describes what customers say — not how they behave. But NPS surveys have been helpful to me, so I sometimes use them when I don't have better proxy metrics for product quality.
In this essay, I outline how NPS works, how I used it at both Netflix and Chegg, and then share the challenge of finding meaningful proxies for product quality using data from my "Ask Gib" essays.
Bain & Company created the Net Promoter Score in 2003 to measure brand loyalty and a product's potential for word-of-mouth growth. Below I share the first question in an NPS survey, using an example from a recent "Ask Gib" essay:
Respondents answer the question on a zero to ten scale. A Net Promoter Score is calculated by taking the percentage of customers who rave about the service (9s & 10s) and subtracting the percentage of "detractors" — those who give a rating from zero to six.
Twenty-six readers answered the question. Eighty-eight percent were promoters, and none were detractors, so the NPS is 88. Most consider a score of 50 to be very good/great and 70 to be "world-class." High scores suggest the potential for strong word-of-mouth growth and retention.
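The arithmetic is simple enough to sketch in a few lines of Python. The score list below is invented to match the numbers above (twenty-six responses, 88% promoters, no detractors); it is not my actual survey data:

```python
def net_promoter_score(responses):
    """Return NPS: the percentage of promoters (9s and 10s)
    minus the percentage of detractors (0 through 6)."""
    if not responses:
        raise ValueError("need at least one response")
    promoters = sum(1 for r in responses if r >= 9)
    detractors = sum(1 for r in responses if r <= 6)
    return round(100 * (promoters - detractors) / len(responses))

# 26 made-up responses: 23 promoters, 3 passives (7s and 8s), no detractors
scores = [10] * 13 + [9] * 10 + [8] * 2 + [7]
print(net_promoter_score(scores))  # → 88
```

Note that passives (7s and 8s) count toward the denominator but neither add to nor subtract from the score, which is why 23 promoters out of 26 responses rounds to an NPS of 88 rather than 100.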
So why the controversy about NPS? In this case, it reduces twenty-six responses to a single number, giving it the weight of quantitative data, which it's not. It's what folks say — it's not a measure of their behavior. And often, what folks say and what they choose to do are different. Lastly, this is just twenty-six responses from a population of 5,000 subscribers. It doesn't represent the readers who stopped reading before the end of the essay, and there are plenty of other potential biases.
For me, NPS is like a compass — it's directional. NPS doesn't present a complete story. I constantly remind myself, "It's just a number." I'm happy with high scores and bummed about low scores, but with a small number of respondents, the NPS can be very "noisy," so I don't lose sleep when I receive a low score. But I look at the comments to see how I can edit the essay to make it better. I also apply the learning to future "Ask Gib" essays.
I engage in qualitative research — focus groups, one-on-ones, usability studies, ethnography — to hear how people think and react to the work. I also use these tactics to get the "voice of the customer" in my head and to discover new ideas.
I execute surveys to capture who the customer is and how to think about them — by demographics, competitive product usage, entertainment preferences, etc. NPS is one tool in this survey toolkit.
I A/B test hypotheses formed via the inputs above to see what works or doesn't. For me, A/B tests are the "big dog." It's the only method that helps me reliably measure how different hypotheses affect customer behavior. A/B tests also help me measure trade-offs between customer delight and margin.
Unfortunately, A/B tests are not feasible for all products and organizations, so I'm occasionally forced to rely on other data sources. At both Netflix and Chegg, however, most of our consumer insights came from combining the four sources above.
At Netflix, we collected NPS data but rarely looked at it. We had a much better proxy metric for product quality: the percentage of members who canceled each month. Monthly customer retention measures what customers do, not what they say.
We needed a proxy metric that would give us faster insight, so we implemented ongoing NPS surveys. NPS improved as our selection, pricing, and delivery speed got better, and eventually, we drove NPS into the high 70s. Along the way, we shared this data with potential investors — our increasing NPS score was one of the reasons they gave us additional funding rounds.
Another fun fact: Netflix used NPS when the service launched in South America, and the score was very high. So they were surprised when they had poor retention during their first few months. It turns out that Brazilians are very generous "graders" (while folks in Australia and Germany are stingy with their 9s and 10s). This is one of many reasons you have to be careful not to compare NPS from one industry or country to another.
I chose not to include the email open rate for each essay as it's always around 50% — there's little variation. And as much as I'd like to A/B test the three possible landing pages for "Ask Gib" (required sign-up, essay-specific landing page, and a full list of essays), neither Substack nor I have the tools to do this.
NPS provides an easy-to-understand signal to me, and the "What's good?/What could be better?" verbatims are incredibly helpful. The comments help me to understand what qualities in an essay inspire a share.
Likes ("Hearts") are a straightforward feedback system. Hearts give a sense of the quality of the essay but give no insight into the "why." There's also secondary value in hearts because they help readers decide whether an essay is worth reading. (NPS scores are not nearly as well understood as hearts, and many folks find my publicizing my NPS scores annoying.)
Sign-ups one day after publishing are also a reasonable proxy. A great essay can inspire folks to sign up for the newsletter. But this proxy is noisy — I get many new sign-ups whenever I publish an essay after a long gap in my writing. There's pent-up demand.
So what's the best essay on the list above? It's tricky, but note that the "Project v. Outcome-based Roadmaps" essay tops most of the data points I listed, and the qualitative feedback from the NPS survey gives me a strong sense of "Why?" The essay finds a middle ground between the abstract and the "real world," has strong examples, and provides a framework that readers can apply to their jobs tomorrow.
In writing this essay and looking closely at all of the quantitative and qualitative data, I find shares to be the most helpful proxy metric. The NPS data reinforces my confidence in shares as a proxy — there's a reasonable correlation between shares and NPS, especially if you toss out the noisiest NPS scores. (Ignore the NPS scores with fewer than 20 responses.) The other benefit of shares is that they're not annoying and require little time or effort from you — my "Ask Gib" readers.
There's value in humility when describing my successes and failures, as subscribers (and I) learn from both. If you only talk about success, you aren't believable. If you dwell too much on failure, you lose credibility.
Does stuff go wrong with NPS? Absolutely. I've learned to be careful about comparing results from one talk or essay to another. And if there's a small sample, NPS is really noisy — both high and low. So when I get a high or a low NPS score, I have to nicely remind myself, "It's just a number," as I dig into the qualitative to form hypotheses for my next experiment.
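A toy simulation shows just how noisy small-sample NPS is. The promoter/passive/detractor mix below is assumed, not measured: a population whose "true" NPS is 50 produces wildly different scores in batches of 20 respondents, and much steadier scores in batches of 2,000:

```python
import random

random.seed(42)

# Assumed population: 60% promoters, 30% passives, 10% detractors,
# so the "true" NPS is 100 * (0.60 - 0.10) = 50.
def simulated_nps(n):
    promoters = detractors = 0
    for _ in range(n):
        u = random.random()
        if u < 0.6:
            promoters += 1
        elif u >= 0.9:
            detractors += 1
    return 100 * (promoters - detractors) / n

small = [simulated_nps(20) for _ in range(1000)]    # 1,000 tiny surveys
big = [simulated_nps(2000) for _ in range(1000)]    # 1,000 large surveys
print(min(small), max(small))  # swings wildly around 50
print(min(big), max(big))      # hugs the true score of 50
```

In runs like this, the 20-response surveys can land anywhere from roughly 0 to 100 even though nothing about the population changed, which is exactly why I shrug at any single small-sample score.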
Thanks for participating in my "Ask Gib" surveys. While I used NPS as a secondary data source at Netflix, NPS did help us to get funding at Chegg. I've found NPS to be a reasonable proxy metric for product quality for my talks and essays — though I still consider multiple data sources.
This has been a fun essay for me to write, and I learned a lot along the way. Many thanks to Duncan Schouten, a product manager at XWP in Vancouver, who sifted through all of my "Ask Gib" data. He nicely captured the spirit of this exploration: