March 14, 2018
The Future is Statistical

This post is the first chapter of our new ebook - The Complete Guide to Statistics for the SaaS Executive (follow the link to download the book).


In my time at Google, I only once spoke to Eric Schmidt. But that one conversation changed my life. Talking about the future of mathematics, Schmidt told me,

“The world will be inherited by statisticians”

As a math geek, I was heartened, but his statement was already being proved all around me. Everything in tech is based on math; we are all part of a data machine. Statistics allows us to draw insight from the wealth of data this field is generating.

Stats has been applied to medicine, politics, and finance. Now it's tech's turn. Statistics is becoming more and more integral to the companies we run and the products we make. The technologies of the future that are starting to come online now—artificial intelligence, cryptocurrencies, predictive analytics—are all, deep down, based on statistics. They are at their core the regressions, probabilities, and p-values that you learned about in high school.

This is why we are writing this guide: to help SaaS executives get a better understanding of statistics so that they can a) understand their business now, and b) prepare their business for the future. In this first chapter of The Complete Guide to Statistics for the SaaS Executive, we want to show you exactly how the world will be inherited by statisticians.

Artificial intelligence: how recommenders use math to know your mind

Amazon and Netflix seem to have windows to your soul, picking out just the right movie for a rainy day or a book about business you haven't yet heard of.

recommendations.png

They can do this because they do have a window into your soul—your actions. Every action you take on Google, Facebook, YouTube, Amazon, etc. is fed into their artificial intelligence algorithms to learn more about you and offer more of what (they think) you want.

AI isn't only the basis of recommender systems. The natural language processing of Siri or Alexa, the self-driving cars of Google or Tesla, even the game-winning abilities of IBM's Deep Blue or DeepMind's AlphaGo: AI is now embedded in a massive range of technologies we already take for granted.

AITimeline.png

Source: CBInsights

At the heart of this artificial intelligence is machine learning—the ability of programs to learn from previous data to predict future events. And machine learning, when you break it down, is just very fast, very advanced statistical modeling.

machinelearningdiagram.png

Source: CBInsights

Support vector machines, linear regression, k-means, naive Bayes: all of these are statistical techniques that you would learn in a college math class. Even the much-vaunted neural networks effectively work on probability. Amazon's machine-learning recommender is built on similar statistical concepts, albeit with multiple equations and far more variables. In particular, it is built on one of the simplest: regression analysis.

In statistics, regression analysis determines the relationships between variables in a data set. The relationship is described by an equation which can be used to predict future outcomes. The equation is graphed to display a curve emerging from a smattering of data. A basic example is the linear relationship between height and weight.

linearregressionexample.png

Source: PSU

Regression can be used for one dependent variable and one or more independent variables, growing more and more complex as more variables are added. In the height and weight example, weight (y) is dependent on height (x).
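To make the height and weight example concrete, here is a minimal sketch of a least-squares fit in plain Python. The five data points are made up for illustration, and the function names are our own, not taken from any library.

```python
# Hypothetical height (cm) and weight (kg) pairs, made up for illustration.
heights = [150.0, 160.0, 170.0, 180.0, 190.0]
weights = [52.0, 58.0, 66.0, 73.0, 81.0]


def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope is the covariance of x and y divided by the variance of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


slope, intercept = fit_line(heights, weights)


def predict_weight(height_cm):
    """Predict weight (y) from height (x) using the fitted line."""
    return slope * height_cm + intercept


print(f"weight = {slope:.2f} * height + {intercept:.2f}")
print(f"predicted weight at 175 cm: {predict_weight(175):.1f} kg")
```

The fitted line is the "equation" the text refers to: once you have the slope and intercept, predicting a new outcome is a single multiplication and addition.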

Here's an image from the patent for how Amazon's recommendation technology works:

amazonrecdiagram.png

Source: Patents

The vital component here is the “recommendation process.” This takes the inputs—user profiles, purchase histories, cart contents—and performs a regression analysis. As more and more variables are added, Amazon can draw patterns from the data and predict what items you might like from the trend curves.

In statistics, there's always a margin for error, and machine learning is all about reducing that margin. So if statistics is the way Amazon knows your purchase of a camping stove indicates you're likely to buy organic toothpaste, machine learning is the way those predictions get more accurate over time.

Cryptocurrencies: bit-heads or bit-tails?

Cryptocurrencies and the underlying technology of the blockchain have the potential to upend the way sensitive information is stored and transmitted. While blockchain as a system is new, it's built on the backs of preexisting concepts in mathematics:

Elliptic Curve Digital Signature Algorithm (ECDSA) is the signature scheme at the heart of Bitcoin's blockchain. It allows people to sign their transactions and allows other people to verify those transactions. It is a complicated concept, but at a fundamental level it comes down to algebra and geometry.

Hash functions take an input of any length and produce an output of fixed length. They can be designed to be one-way, meaning they are incredibly difficult to reverse. The RIPEMD-160 cryptographic hash function is used in Bitcoin.
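The fixed-length property is easy to see for yourself. The sketch below uses SHA-256 from Python's standard `hashlib` module rather than RIPEMD-160, because RIPEMD-160 is not guaranteed to be available in every Python installation; SHA-256 is another hash function Bitcoin relies on. The helper name `digest_hex` is our own.

```python
import hashlib


def digest_hex(message: str) -> str:
    """Return the SHA-256 digest of a string as hexadecimal text."""
    return hashlib.sha256(message.encode("utf-8")).hexdigest()


short_digest = digest_hex("a")
long_digest = digest_hex("a" * 10_000)
tweaked_digest = digest_hex("b")

# Inputs of very different lengths give digests of the same length.
print(len(short_digest), len(long_digest))

# Changing the input changes the digest; hashing is also deterministic,
# so the same input always yields the same output.
print(short_digest != tweaked_digest)
print(digest_hex("a") == short_digest)
```

A one-character input and a ten-thousand-character input both produce 64 hexadecimal characters (256 bits). Going the other way, from digest back to input, is what is designed to be impractical.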

Both of these relate to an even more fundamental component of statistics that is crucial to the success of any cryptocurrency: probability. Perhaps you've already invested in a fluctuating cryptocurrency or two, experimenting in this brave new world. Without the BoA controlling your money and The Fed backing them up, how do you know your money is safe? Because of statistics.

If you've ever bet on the likelihood of Bitcoin's success, then you've used statistical probability. What you probably didn't realize is that Bitcoin's existence hinges on probabilities (as well as improbabilities), too.

Probability is easy to grasp, and factors into business all the time, from sales forecasts to risk events. You're subconsciously using probability all the time, whenever you say “the odds are...” Probability is the likelihood that a future event will occur. All probabilities are between 0 and 1 and can be expressed as fractions, decimals, and percentages.

The equation for determining the probability of an event A is:

P(A) = number of ways A can occur / total number of equally likely outcomes

The simplest example is flipping a *physical* coin. If A is landing on “heads,” then P(A) = 1/2 or 50%.
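The same formula in a few lines of Python, as a sketch that assumes every outcome is equally likely. The function name `probability` and the die example are ours, added for illustration.

```python
from fractions import Fraction


def probability(favorable: int, total: int) -> Fraction:
    """P(A) = ways A can occur / total number of equally likely outcomes."""
    return Fraction(favorable, total)


# A fair coin: one way to land heads out of two outcomes.
coin_outcomes = ["heads", "tails"]
p_heads = probability(coin_outcomes.count("heads"), len(coin_outcomes))

# A fair six-sided die: one way to roll a six out of six outcomes.
die_faces = [1, 2, 3, 4, 5, 6]
p_six = probability(die_faces.count(6), len(die_faces))

# The same probability written as a fraction, a decimal, and a percentage.
print(p_heads, float(p_heads), f"{float(p_heads):.0%}")
print(p_six, float(p_six))
```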

Probability in Bitcoin is not much more complicated than flipping a coin or rolling dice, though the numbers are much larger. Bitcoin wallets contain private keys, which are visible only to the owner and are used to sign every transaction the owner makes from that wallet. Each private key has a matching public key, which is visible to the bitcoin miners who record and publish transactions. Private keys must be unique. Otherwise, two owners with the same key would withdraw from the same pool of funds.

bitcoinearthsand.jpg

Source: WeUseCoins

The way bitcoin creators designed it, there are approximately 2^160 private keys possible, making a collision statistically improbable, though not impossible. The odds are, even in a scenario with 1 billion users with 10 wallets each, less than 0.000000000000000000000000000000000000684%.
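Here is a rough sketch of that arithmetic, taking the 2^160 figure and the 1 billion users with 10 wallets each at face value. It computes two things: the chance that one newly generated key lands on any key already in use, and, using the standard birthday approximation, the chance that any two keys in the whole population coincide. Both calculations assume keys are picked uniformly at random, which is our simplifying assumption.

```python
KEY_SPACE = 2 ** 160                  # approximate number of possible keys
USERS = 1_000_000_000
WALLETS_PER_USER = 10
keys_in_use = USERS * WALLETS_PER_USER

# Chance that one new, randomly chosen key matches a key already in use.
p_new_key_collides = keys_in_use / KEY_SPACE

# Birthday approximation: chance that any pair among all keys collides.
# Valid here because the result is far below 1.
p_any_collision = keys_in_use * (keys_in_use - 1) / (2 * KEY_SPACE)

print(f"one new key collides: {p_new_key_collides * 100:.3e} %")
print(f"any two keys collide: {p_any_collision * 100:.3e} %")
```

The second number is larger than the first, since it counts every possible pair, but both are so small that a collision is not a practical concern.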

What if someone wanted to steal your key? Could they just guess it and take all your Bitcoin? Say they had a password-cracking machine capable of 350,000,000,000 guesses per second, the kind that will crack your Windows password in six hours. How long for your Bitcoin private key?

1.32x10^29 years. Considering the life expectancy of the earth is 7.8x10^9 years, you can probably say your Bitcoin keys are statistically safe.
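The estimate is easy to reproduce. This sketch assumes the attacker has to try every one of the roughly 2^160 keys at 350 billion guesses per second, and it uses a 365.25-day year; the variable names are ours.

```python
KEY_SPACE = 2 ** 160                      # approximate number of possible keys
GUESSES_PER_SECOND = 350_000_000_000      # the cracking machine's speed
SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60  # assumes a 365.25-day year
EARTH_LIFE_EXPECTANCY_YEARS = 7.8e9       # figure quoted in the text

seconds_to_try_all = KEY_SPACE / GUESSES_PER_SECOND
years_to_try_all = seconds_to_try_all / SECONDS_PER_YEAR
earth_lifetimes = years_to_try_all / EARTH_LIFE_EXPECTANCY_YEARS

print(f"years to try every key: {years_to_try_all:.2e}")
print(f"earth lifetimes needed: {earth_lifetimes:.2e}")
```

On average an attacker would hit the right key after searching half the space, but halving a number with 29 zeros does not change the conclusion.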

Probability also determines the verification of transactions through the order by which they are recorded.

Confirmation is the process by which blocks containing records of transactions are added to the blockchain. Once a miner has solved a block, it must be confirmed by the network. Confirmation takes about 10 minutes. That ten-minute figure is an average: mining is a game of chance, so the time until the next block is found varies from block to block. About 2/3 of transactions are confirmed within that 10-minute timeframe, and about 95% are confirmed within 30 minutes.
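Those two percentages fall out of a simple model. The sketch below assumes the waiting time for the next block follows an exponential distribution with a mean of 10 minutes, which is a common simplification and not something the figures above state explicitly. The function name is our own.

```python
import math

MEAN_BLOCK_MINUTES = 10.0  # average time between blocks


def p_confirmed_within(minutes: float, mean: float = MEAN_BLOCK_MINUTES) -> float:
    """Chance the next block arrives within `minutes`, assuming an
    exponential waiting time with the given mean."""
    return 1.0 - math.exp(-minutes / mean)


p_10 = p_confirmed_within(10)
p_30 = p_confirmed_within(30)

print(f"within 10 minutes: {p_10:.1%}")
print(f"within 30 minutes: {p_30:.1%}")
```

Under this assumption the chance of a confirmation within 10 minutes is 1 - 1/e, a little under two thirds, and within 30 minutes it is about 95%.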

bitcoin_blockchain_confirmations_security.png

Source: ImponderableThings

Bitcoin's security is perhaps too reliant on these improbabilities. On the one hand, credit card numbers are a lot easier to guess, making fraud a lot easier with traditional transactions. On the other hand, because banks are centralized, reliable verification of payments is transparent and direct in comparison to Bitcoin's system.

Bitcoin's future remains unclear, but it has uncovered a wealth (maybe) of opportunities to look at current transactional systems and apply our latest knowledge of data to come up with innovative solutions.

Predictive analytics: data-driven everything

Companies' access to deep troves of data has never been greater. Personal data is used by Starbucks to offer a deal on a customer's preferred espresso drink and by fine-dining restaurants to engage customers in discussing sports. In B2B SaaS, sophisticated analytics platforms inform employees about customer attributes and behaviors, sales performance and growth, and countless other metrics.

Here we can use an example that is closer to home than Amazon's AI or Bitcoin's crypto. MadKudu, a B2B SaaS company, specializes in predictive lead scoring. They use the signals coming in from your visitors and correlate them with those of your most successful customers. They use both behavioral data (the visitor downloaded an ebook or signed up for the freemium product) and demographic data (the visitor is a CMO or comes from a company with 100+ employees).

Using tracking and data enrichment services such as Clearbit, this lead scoring can be completely frictionless for an incoming visitor. If they score highly (i.e., their behavioral and demographic traits correlate with success as a customer), then a member of the sales team will be notified and the little Drift box in the corner of the screen will spring to life. If they don't score highly, then the sales process becomes no-touch. In this way, MadKudu helps companies minimize their CAC and improve their unit economics. All through statistics: in this case, correlation and data mining.
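To give a feel for the mechanics, here is a toy points-based lead scorer. It is not MadKudu's model: the weights, the threshold, and every name in it are hypothetical, and a real system would learn its weights from historical conversion data instead of hard-coding them.

```python
from dataclasses import dataclass


@dataclass
class Visitor:
    downloaded_ebook: bool     # behavioral signal
    signed_up_freemium: bool   # behavioral signal
    is_cmo: bool               # demographic signal
    company_size: int          # demographic signal (number of employees)


# Hypothetical point values, chosen only for illustration.
WEIGHTS = {
    "downloaded_ebook": 15,
    "signed_up_freemium": 25,
    "is_cmo": 20,
    "large_company": 30,
}
SALES_THRESHOLD = 50  # hypothetical cut-off for alerting sales


def score(visitor: Visitor) -> int:
    """Add up the points for every signal the visitor exhibits."""
    points = 0
    if visitor.downloaded_ebook:
        points += WEIGHTS["downloaded_ebook"]
    if visitor.signed_up_freemium:
        points += WEIGHTS["signed_up_freemium"]
    if visitor.is_cmo:
        points += WEIGHTS["is_cmo"]
    if visitor.company_size >= 100:
        points += WEIGHTS["large_company"]
    return points


def route(visitor: Visitor) -> str:
    """High scorers go to the sales team; everyone else stays no-touch."""
    return "notify_sales" if score(visitor) >= SALES_THRESHOLD else "no_touch"


hot_lead = Visitor(True, True, True, 250)
cold_lead = Visitor(True, False, False, 10)
print(score(hot_lead), route(hot_lead))
print(score(cold_lead), route(cold_lead))
```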

According to data mining pioneer Alex Zekulin, data mining is “the process of extracting previously unknown, valid, and actionable information from large databases and using it to make crucial business decisions.”

Data_Mining_1-1.jpg

Source: Simplilearn

In statistics, conclusions are drawn using predetermined models. In data mining, algorithms search for patterns in large, complex databases to form their models. The two very similar practices are often used in tandem. Data mining is most powerful when multiple databases are combined. SaaS companies are particularly well-positioned to band together for superior data analysis.

MadKudu and Clearbit helped Geckoboard turn its inefficient lead scoring system into an entirely automated and highly accurate prediction machine.

When Geckoboard first employed MadKudu's algorithm, they were able to identify leads, but not the most valuable ones. Geckoboard's marketing lead instructed MadKudu's algorithm to analyze leads by LTV, and the scoring model adjusted accordingly, giving them access to the specific customers they wanted to target.

behaviorial-plus-advanced-points-based.png

Source: Clearbit

Clearbit gave the sales team information on the customers so they could segment them by demographics, industry, and job title. By the end of their tinkering, Geckoboard could automatically predict 80% of their conversions from just 12% of their signups.

Predictive-Analytics-Blog.png

Source: Amadeus

In statistics, predictive analysis uses existing models to interpret limited amounts of data to predict future outcomes. For instance, a regression model can be used to predict the probability of a sale or churn. The analysis of several groupings of data can come together to form a larger picture of a day, a person, or a company, and accurately predict what might happen next.
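A regression model that outputs a probability is typically a logistic regression. The sketch below shows only the prediction step, with made-up coefficients and two made-up features (weekly logins and open support tickets); in practice the coefficients would be estimated from your own customer data.

```python
import math

# Hypothetical coefficients, chosen only for illustration.
INTERCEPT = 0.5
COEF_WEEKLY_LOGINS = -0.6   # more logins -> lower churn risk
COEF_OPEN_TICKETS = 0.3     # more open tickets -> higher churn risk


def sigmoid(z: float) -> float:
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def churn_probability(weekly_logins: float, open_tickets: float) -> float:
    """Logistic regression prediction: sigmoid of a weighted sum of features."""
    z = (INTERCEPT
         + COEF_WEEKLY_LOGINS * weekly_logins
         + COEF_OPEN_TICKETS * open_tickets)
    return sigmoid(z)


engaged = churn_probability(weekly_logins=5, open_tickets=0)
at_risk = churn_probability(weekly_logins=0, open_tickets=3)
print(f"engaged customer churn risk: {engaged:.1%}")
print(f"at-risk customer churn risk: {at_risk:.1%}")
```

The output is always between 0 and 1, so it can be read directly as the probability of churn (or of a sale, with different inputs).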

Dressed up, but still statistics

At a glance, machine learning, bitcoin, and data mining all seem futuristic and fantastically complex, but these techniques are just iterations of basic statistics for business. Statisticians will inherit the earth because of their curiosity about data. You're probably already curious about your own company's data, your competitors' data, and the greater trends in the marketplace.

With a grasp of the narratives our data is telling us, we can learn more about our customers, our performance, and our future. We can also make the most of cutting-edge analysis and artificial intelligence. First, we have to look a bit closer at the math.



