I don’t think anyone doubts that first-party data is the most important type of customer data you can have. However, if you’re still not convinced, there are plenty of articles you can read: “Why you should care about first-party data” from Forbes, “First-party data is key in a new era for digital advertising” from Deloitte, and “The importance of first party data” from our friends at PTI Digital.
But have you ever stopped to think about how much first-party data you should have? Our clients often ask us this question, and we’ve never had an answer we could give “off the cuff”, at least not without deep analysis of their legacy data growth and how it correlates with their events, lead generation tactics, web traffic, etc.
So we decided it would be useful for the industry to be able to answer that question by benchmarking a number against contributing factors. We came up with a process for doing just that, and you can participate in the first release of this project. Just provide us with seven statistics that you’ll have at your fingertips, and we’ll tell you your predicted database size. You’ll then know whether you’re over- or under-performing on data collection.
Note: all information provided will be treated in confidence and any benchmarking we provide to participants will be 100% anonymised.
How did we do it?
NOTE: only read this paragraph if you like the geeky stuff!
We started by taking the seven statistics for each of our clients (as this is just the first release of the project, we’re keen to keep it simple for other rights owners to participate) and applying regression analysis to identify which factors matter most in predicting first-party database size. We tried different types of regression analysis and, in this case, multiple linear regression was the most appropriate.
The first thing we found was a handful of outliers within our sample set, identified using the 1.5 × IQR (interquartile range) method of outlier detection. These were a small number of clients whose statistics did not fit the linear model that held for the majority, so they were discarded from the exercise. (With your help we may find more outliers that allow us to create specific sub-sets for them!)
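For readers who want to see the 1.5 × IQR rule in action, here is a minimal sketch. The source does not say which quantity the rule was applied to or which quartile convention was used, so the `iqr_outliers` helper, the "inclusive" quartile method and the sample figures are all assumptions for illustration only.

```python
from statistics import quantiles


def iqr_outliers(values, k=1.5):
    """Return the values lying outside [Q1 - k*IQR, Q3 + k*IQR]."""
    # Quartile conventions differ between tools; "inclusive" is an assumption here.
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]


# Hypothetical database sizes (thousands of records), not real client data.
database_sizes = [120, 135, 150, 160, 175, 180, 210, 950]
print(iqr_outliers(database_sizes))
```

Any value flagged this way would be set aside before fitting the regression, which is the role the outliers play in the description above.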
We were looking for the key indicators of how good our regression analysis is, for example an R-squared value as close to 1 as possible. Without the outliers we did indeed achieve this: an R-squared of 0.93 and an adjusted R-squared of 0.91, with a significance (p-value) well below the 0.05 threshold (we achieved 0.00008).
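As a rough illustration of how these two goodness-of-fit figures relate, the sketch below computes R-squared from observed and predicted values, then applies the standard adjustment for the number of predictors. The function names and the sample size of 30 are hypothetical; the source does not state how many clients were in the sample.

```python
def r_squared(observed, predicted):
    """Share of the variance in the observed values explained by the model."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1 - ss_res / ss_tot


def adjusted_r_squared(r2, n_samples, n_predictors):
    """Penalise R-squared for the number of predictors used in the model."""
    return 1 - (1 - r2) * (n_samples - 1) / (n_samples - n_predictors - 1)


# Hypothetical: 30 clients and seven predictors (the sample size is an assumption).
print(round(adjusted_r_squared(0.93, n_samples=30, n_predictors=7), 3))
```

The adjusted figure is always a little lower than the raw R-squared when more than one predictor is used, which is why it is the more conservative of the two to quote.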
We used Tableau to plot this data and provide us with the regression line. This allowed us to identify the intercept (or constant) and the coefficients for the independent variables, and so create a linear regression equation.
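For reference, a multiple linear regression with seven independent variables has the general form shown below. The symbols are our own notation for illustration: the predicted database size is written as ŷ, the intercept as β₀, and each of the seven statistics xᵢ has its own coefficient βᵢ. The actual statistics and fitted values are not given here.

```latex
\hat{y} = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_7 x_7
```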
What were the outcomes of our analysis?
Applying this formula to our client statistics, we identified their predicted marketing database size and the extent to which they were over- or under-delivering on their data collection strategies. This answers the question: does the number of marketable email addresses they have reflect the target number generated by our model?
The outcome is that 44% of our clients have exceeded their predicted numbers, 35% are close to their predicted numbers, with 21% under-achieving. Knowing our clients as we do, and their data sources and data collection processes, we can indeed understand these results – and now we have empirical evidence we may be able to get more buy-in to our data growth strategies.
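To make the comparison concrete, here is a minimal sketch of applying a fitted equation and sorting a rights owner into one of the three groups. Everything in it is hypothetical: the intercept, the seven coefficients, the input statistics and the ±10% band used to define "close" are assumptions for illustration, not the values or thresholds from our model.

```python
def predict_size(stats, intercept, coefficients):
    """Predicted database size: intercept plus the weighted sum of the statistics."""
    return intercept + sum(c * x for c, x in zip(coefficients, stats))


def classify(actual, predicted, tolerance=0.10):
    """Compare actual with predicted size; the tolerance band is an assumption."""
    ratio = actual / predicted
    if ratio > 1 + tolerance:
        return "exceeding"
    if ratio < 1 - tolerance:
        return "under-achieving"
    return "close to prediction"


# Hypothetical model and inputs, for illustration only.
intercept = 5000
coefficients = [0.5, 0.1, 2.0, 0.0, 1.0, 0.25, 10.0]
stats = [10000, 50000, 1000, 0, 2000, 4000, 100]

predicted = predict_size(stats, intercept, coefficients)
print(predicted, classify(25000, predicted))
```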
By the way, we’ve heard of some organisations trying to set a target KPI for first-party data collection based on a rule as simple as 10% of your social followers, but we can find no correlation with that across any of our clients. While our use of just seven data points in this first release is also quite basic, the methodology behind it is quite deep.
How robust is our process?
As with any type of analysis, the more data we have, the more robust the process becomes: as more rights owners provide us with their stats, our assessments will become more accurate. At a certain point we’ll be able to break the analysis down by type of rights owner so that we can provide benchmarks across sectors. In the next release we’d like to progress into data depth (number of key data points), not just data quantity (number of contactable records), and will then add further variables such as lead generation activity, rights owner status, etc.
What should you do next?
If you’re a rights owner and would like to know how much first-party data you should have, please fill in this form. We’ll run your numbers through our model and let you know how you’re doing. If you’re under-achieving against your predicted numbers, you could use the outcome to secure internal support for your data collection plans. And if you’re over-achieving, you’ll have a great story for sponsors and other key stakeholders.
Once we have enough participants, we’ll update your report with a benchmark across your segment of the industry, which will make our work even more relevant.
Fill in this form now and we’ll respond within 24 hours.