In this article:
- How does machine learning work in Natero?
- What does the model building process look like?
- How can you help us improve the models?
- How does the model decide which factors are most relevant to churn?
How does Machine Learning work in Natero?
At Natero, we use state-of-the-art machine learning to build predictive models and algorithms tailored to each SaaS vendor's unique scenarios and types of data, proactively alerting them to at-risk customers as well as customers that represent sales opportunities.
In general, we do this by looking at vast amounts of your customers' historical data and building models that identify common behaviors, patterns, or attributes (e.g. customer size, activity level, feature usage, invoice history) of customers who have churned, converted, or expanded. We then use these models to predict whether any of those events is likely to happen to a current customer, and how reliable that prediction is.
In other words, these advanced machine learning models analyze hundreds or thousands of factors to determine which are the most relevant indicators of churn, expansion, conversion, etc. Normally these factors are not apparent or easily captured by a simple business rule.
This process involves concepts such as “data preprocessing” -- converting raw data into meaningful data sets; “feature selection” -- finding the features most useful for model building; “training data” -- the set of data used to fit a model that can then predict future business outcomes; “data labeling” -- tagging historical data with outcomes so it can be used carefully and meaningfully; and “feature ranking” -- deciding which features contribute most to a model's predictive power.
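To make these concepts concrete, here is a minimal, hypothetical sketch of labeling and feature ranking. The field names and the use of Pearson correlation are illustrative assumptions, not Natero's actual pipeline (real feature ranking uses far richer techniques):

```python
# Hypothetical sketch only -- field names and the correlation-based ranking
# are illustrative, not Natero's actual method.
from statistics import mean, pstdev

# "Data labeling": each historical account is tagged churned (1) or retained (0).
accounts = [
    {"logins_per_week": 1,  "features_used": 2, "churned": 1},
    {"logins_per_week": 9,  "features_used": 7, "churned": 0},
    {"logins_per_week": 2,  "features_used": 1, "churned": 1},
    {"logins_per_week": 11, "features_used": 8, "churned": 0},
    {"logins_per_week": 3,  "features_used": 6, "churned": 0},
]

def correlation(xs, ys):
    """Pearson correlation -- a simple stand-in for real feature ranking."""
    mx, my = mean(xs), mean(ys)
    cov = mean((x - mx) * (y - my) for x, y in zip(xs, ys))
    return cov / (pstdev(xs) * pstdev(ys))

# "Feature ranking": order features by the strength of their link to churn.
labels = [a["churned"] for a in accounts]
ranking = sorted(
    ((name, abs(correlation([a[name] for a in accounts], labels)))
     for name in ("logins_per_week", "features_used")),
    key=lambda pair: pair[1],
    reverse=True,
)
print(ranking)  # features ordered from most to least churn-correlated
```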
What does the model building process look like?
Predictive analytics offers greater accuracy than intuition or guesswork when it comes to predicting future customer actions and business outcomes. However, it is useful only when enough historical customer data (especially product usage data) has been gathered and consumed, and when that data has been properly labeled for outcomes such as "churn", "upgrade", and "trial conversion".
Natero continuously monitors the data it receives from you until it believes a reasonably well-performing model can be built based on your account filtering criteria and churn definition. We will then reach back out to you to kick off the model building process, usually with initial prediction results generated from multiple models for review.
We'll then refine the models based on your feedback and hand the new prediction results over to you for further review. This review might take a few rounds until you feel that a reasonably well-performing model has been achieved.
We’ll then turn on the predictive feature within your Natero instance for you to use throughout the product: you’ll start receiving predictive churn alerts on your accounts, and you can include the predictive churn score in account health scores, use the predictive churn flag in the rule builder, and more.
As more data comes in, Natero’s predictive models automatically adjust for changes in customer behavior and become more accurate in their ability to predict customer actions - with absolutely no effort required from you.
How can you help us improve the models?
Natero does not apply one single predictive model to all customers; each predictive model is highly customized to your specific business scenarios and objectives. This is because every business is different, so the definition of an at-risk account can vary widely too.
Prior to the model building process, we usually ask how you'd like to filter your accounts for churn prediction and how you define a churn scenario. For example, you can ask us to look at only paying accounts and define the churn scenario as subscription cancellation. Or you can ask us to look at only trial accounts and define the churn scenario as failing to convert to a paying account at the end of the trial. We'll then take these criteria to build the initial models.
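These criteria are gathered in conversation rather than as code, but they amount to an account filter plus a churn definition. A hypothetical sketch, with made-up field names, of the "paying accounts, churn = cancellation" example:

```python
# Illustrative only -- account fields and statuses are assumptions,
# not Natero's data model.
accounts = [
    {"id": "a1", "plan": "paid",  "status": "cancelled"},
    {"id": "a2", "plan": "paid",  "status": "active"},
    {"id": "a3", "plan": "trial", "status": "expired"},
]

# Filter: only paying accounts.  Churn definition: subscription cancellation.
def account_filter(a):
    return a["plan"] == "paid"

def is_churn(a):
    return a["status"] == "cancelled"

# The filtered, labeled accounts become the model's training population.
training_set = [(a["id"], int(is_churn(a))) for a in accounts if account_filter(a)]
print(training_set)  # [('a1', 1), ('a2', 0)]
```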
During the model building process, your feedback is very important to help us adjust the model input and improve the prediction accuracy as much as possible.
Generally, we’ll start with a few best performing models (out of a pool of potential models) and include the prediction results for each. We’ll need you to do the following things:
- Flag the accounts whose predictions are clearly wrong (won't actually churn) or clearly right (will churn).
- Pick the model you'd rather keep among all the candidates (even if it is merely the least bad).
You might feel compelled to understand the different factors the models consider when predicting churn, and to hand-tune the model inputs. However, given how machine learning models work, we'd advise against that.
This is because we can feed the flagged mis-predicted accounts back in as negative samples and let the model building techniques do what they do best: identify the strongest correlations between the data and the outcomes.
If you hand-tune the models, you'd ultimately end up with models that look much like the rules you would build by hand (rule-based alerts), with many of the same biases.
How does the model decide which factors are most relevant to churn?
If you are absolutely curious about what’s going on under the covers, read on. :)
When Natero does model building, it actually includes *all* of the available data that is being sent to us. Depending on the available data, it converts each factor into a variety of machine learning "features".
So for example, it might take the individual interactions and look at things like:
- Number of interactions 1 week prior to churn, 2 weeks prior, 3 weeks prior.
- Days since last interaction.
- Ratio of interactions 2 weeks prior to 1 week prior, 3 weeks prior to 1 week prior.
Therefore, each factor converts into a large number of features.
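The fan-out above can be sketched as follows. This is a minimal illustration of the idea, assuming a simple list of interaction dates; the feature names and week boundaries are made up for the example:

```python
# Hypothetical sketch of how one factor (interaction events) fans out into
# several features; names and cutoffs are illustrative.
from datetime import date, timedelta

today = date(2024, 3, 1)
# One account's interaction history: events N days before today.
interactions = [today - timedelta(days=d) for d in (2, 5, 9, 10, 16, 17, 18)]

def count_in_week(events, weeks_back):
    """Interactions during the Nth week before today (week 1 = last 7 days)."""
    start = today - timedelta(days=7 * weeks_back)
    end = today - timedelta(days=7 * (weeks_back - 1))
    return sum(1 for e in events if start <= e < end)

features = {
    "interactions_week_1": count_in_week(interactions, 1),
    "interactions_week_2": count_in_week(interactions, 2),
    "interactions_week_3": count_in_week(interactions, 3),
    "days_since_last": (today - max(interactions)).days,
}
# Ratio features compare recent to earlier activity (guarding against /0).
features["ratio_week_2_to_1"] = (
    features["interactions_week_2"] / max(features["interactions_week_1"], 1)
)
print(features)
```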
Natero then feeds these features into a variety of classification modeling techniques to identify the models that are performing the best for your business, using a standard train/test cross-validation approach.
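The selection step can be sketched with two deliberately toy "classification techniques" scored by k-fold cross-validation. The models, data, and scoring here are assumptions purely to show the shape of the process, not what Natero actually runs:

```python
# Minimal sketch of model selection via cross-validation; the two toy
# models and synthetic data are illustrative assumptions only.
import random

random.seed(0)
# Synthetic labeled accounts: (activity_level, churned); low activity churns.
xs = [random.random() for _ in range(200)]
data = [(x, int(x < 0.3)) for x in xs]

def majority_model(train):
    """Baseline: always predict the most common training label."""
    majority = round(sum(y for _, y in train) / len(train))
    return lambda x: majority

def threshold_model(train):
    """Predict churn below the midpoint of the two classes' mean activity."""
    churned = [x for x, y in train if y == 1]
    retained = [x for x, y in train if y == 0]
    cut = (sum(churned) / len(churned) + sum(retained) / len(retained)) / 2
    return lambda x: int(x < cut)

def accuracy(model, test):
    return sum(model(x) == y for x, y in test) / len(test)

def cross_val(builder, data, k=5):
    """Standard k-fold: train on k-1 folds, score on the held-out fold."""
    fold = len(data) // k
    scores = []
    for i in range(k):
        test = data[i * fold:(i + 1) * fold]
        train = data[:i * fold] + data[(i + 1) * fold:]
        scores.append(accuracy(builder(train), test))
    return sum(scores) / k

scores = {"majority": cross_val(majority_model, data),
          "threshold": cross_val(threshold_model, data)}
best = max(scores, key=scores.get)
print(best, scores)  # the better-scoring technique wins
```

The point of the sketch is only the shape: several candidate techniques, each scored on data it was not trained on, with the best performer carried forward.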
So what you will see during the model building process are the outputs of the best performing models (for each review session), with a few factors selected by the models themselves as being the most correlated with churn outcomes.