Data Standards – what are they and why do we need them?

Maintaining data standards is an incredibly important part of your data strategy. It involves keeping your data clean, in the correct format, and verified before you do anything with it. Imagine how important it is to have the correct email address for a fan before sending out any campaigns, or the right data points before you try to find your actionable insights. If you ever want to merge your databases together in an SCV (single customer view) or any other type of data warehouse, the relevant data fields must match up: they must use the same pattern.

According to a 2017 article in Harvard Business Review, only 3% of companies’ data meets basic quality standards. Set that against Doug Laney’s suggestion that “Your Company’s Data May Be Worth More Than Your Company”. If you’re one of the 97% of laggards in this area, read on, as we provide you with some very basic principles to get you moving.

Data Quality and the Cost to Companies 

My colleague, Manuel Meretto, wrote a blog on this subject a few months ago and asked the question “What are the key issues with data quality?”, highlighting three areas for discussion: existence, consistency, and validity. This post touches on the consistency issue, providing you with three easy examples that demonstrate how the way we collect data will naturally differ from source to source. The more we’re aware of this, the more we can either put processes in place to prevent it from happening or work out what changes need to be made to the data points to give us the consistency we need. We call this the transformation phase of the ETL process (extract, transform, and load): the way we move data around.

Data Transformation to Maintain Standards 

Sports organisations will have different data sources, some internal and some external, so your data is being collected in different ways and in different formats. The principle of data standardisation is that a layer of processing can be added that ensures all your data ends up in a common format. This processing can be manual or automated, but an oft-used Bill Gates quote is very relevant here: “The first rule of any technology used in a business is that automation applied to an efficient operation will magnify the efficiency. The second is that automation applied to an inefficient operation will magnify the inefficiency.”

This is one of the reasons I’d like to share these basic principles: if you can work through the steps you would take to correct these data points manually, you can then envisage the automation process. You can only automate what you understand.

Easy Examples to Demonstrate the Principle 

Here are those easy examples I referred to that clearly demonstrate the principle: 

Date of birth: In Europe and many other regions of the world the format DD/MM/YYYY is used but, as you’ll be aware, in the US the day and month fields are reversed to MM/DD/YYYY. If you’re not paying attention to the input format of the date of birth field in the different data collection forms around your business, you may be collecting records in different formats. Imagine that one of your customers was born on 2 November 1966 and registers on your website to be a volunteer through a form that uses the DD/MM/YYYY format, but then enters one of your competitions via an independently hosted landing page where your developer used MM/DD/YYYY. If you’re using an SCV, or even if you just use an email campaign platform that houses all your email marketing data, you could end up sending that customer a birthday e-card on 11th February instead of 2nd November. Here’s another point to consider: if you ask for a customer’s date of birth and allow users to free type instead of using a calendar format for the input, you could end up with 2 November 1966, 2 Nov 66, 2 Nov 1966, or any one of several spellings and formats.
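
As a rough illustration of the transformation step for this field, here is a minimal Python sketch. The source names (volunteer_form, competition_page) and the list of free-text formats are assumptions made up for the example, not part of any particular platform. The idea is simply that each source declares its format once, and every date is converted to a single unambiguous format (ISO 8601, YYYY-MM-DD) before it is stored.

```python
from datetime import date, datetime

# Hypothetical source names, each mapped to the format its form is known to use.
SOURCE_DATE_FORMATS = {
    "volunteer_form": "%d/%m/%Y",    # DD/MM/YYYY
    "competition_page": "%m/%d/%Y",  # MM/DD/YYYY
}

# Assumed free-text patterns: "2 November 1966", "2 Nov 1966", "2 Nov 66".
FREE_TEXT_FORMATS = ["%d %B %Y", "%d %b %Y", "%d %b %y"]


def parse_date_of_birth(raw: str, source: str) -> date:
    """Parse a date of birth using the format declared for its source."""
    return datetime.strptime(raw.strip(), SOURCE_DATE_FORMATS[source]).date()


def parse_free_text_date_of_birth(raw: str) -> date:
    """Try each known free-text pattern in turn; raise if none of them fits."""
    for fmt in FREE_TEXT_FORMATS:
        try:
            parsed = datetime.strptime(raw.strip(), fmt).date()
        except ValueError:
            continue
        # Two-digit years such as "66" are read as 2066, so a date of birth
        # that lands in the future is moved back one century.
        if parsed > date.today():
            parsed = parsed.replace(year=parsed.year - 100)
        return parsed
    raise ValueError(f"Unrecognised date of birth: {raw!r}")


# The same customer, born 2 November 1966, entered through two different forms.
from_volunteer_form = parse_date_of_birth("02/11/1966", "volunteer_form")
from_competition_page = parse_date_of_birth("11/02/1966", "competition_page")

# Both records now agree and can be stored in one common format.
print(from_volunteer_form.isoformat(), from_competition_page.isoformat())
```

Anything the free-text parser cannot recognise raises an error rather than being guessed at, so those records can be reviewed by a person.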

An example of a digital birthday card from Liverpool FC.

The value of a correct date of birth field can’t be overstated. Firstly, there are digital birthday cards: a great tool for fan engagement that only needs to be set up once a year and will then provide you with 12 months of activity. If you include a discount coupon for your online store or a sponsor offering, these could help with your commercial objectives. Secondly, if you want to pitch to, say, Heineken for a sponsorship, they’ll inevitably want to know how many adults aged 18 to 25 you have in your database, as that information aligns with their target demographic.

Gender: Another data area that is incredibly important to your business, but prone to errors, is your gender field. Options for a customer to choose often include M or F, male or female, boy or girl, but the way in which a database stores this could also be 0 and 1, or 1 and 2. If this isn’t standardised when the individual data sources merge, you’ll end up with a ‘dirty’ gender field containing a mix of these versions when you really want completely usable information. It’s interesting to note that many organisations now use a broader gender definition than male or female. Whether you do this or not should be a decision of the entire business, not decided at an individual ‘form’ level.
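
To make the gender example concrete, here is a minimal Python sketch of a standardisation step. The source names (ticketing_system, membership_db) and the meaning given to each numeric code are assumptions for illustration only; you would need to confirm the real coding with the owner of each system. The sketch shows why numeric codes have to be interpreted per source: a 1 can mean different things in two databases.

```python
from typing import Optional

# Text labels that mean the same thing regardless of where they came from.
TEXT_LABELS = {
    "m": "male", "male": "male", "boy": "male",
    "f": "female", "female": "female", "girl": "female",
}

# Numeric codes only make sense per source. These source names and code
# meanings are assumed for the example.
NUMERIC_CODES_BY_SOURCE = {
    "ticketing_system": {"0": "male", "1": "female"},
    "membership_db": {"1": "male", "2": "female"},
}


def standardise_gender(raw: object, source: str) -> Optional[str]:
    """Return the agreed label, or None if the value is not recognised."""
    value = str(raw).strip().lower()
    if value in TEXT_LABELS:
        return TEXT_LABELS[value]
    return NUMERIC_CODES_BY_SOURCE.get(source, {}).get(value)


# The same raw value "1" resolves differently depending on the source.
print(standardise_gender(1, "ticketing_system"))
print(standardise_gender(1, "membership_db"))
```

Unrecognised values come back as None instead of being forced into a category, which leaves room for whatever broader definition the business agrees on.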

An example of FA Wales’ Newsletter data collection.

Country of residence: When you use a country of residence field in your data collection forms, I always recommend a drop-down menu that lets the customer select their specific country, or an auto-fill function that completes the field once someone starts typing. If you don’t, can you imagine how many different spellings of any individual country you might get? Consider someone who lives in London. Their self-type options could be UK or United Kingdom, GB or Great Britain, or indeed they could write England. If they mistype or have a problem with spelling you could end up with variants such as Untied Kingdom, Grate Britain, or Engand.

This principle is relevant not just for country of residence but for any instance where there is only a finite number of options to choose from: other address points, favourite player/clubs/national teams, household income range, education level, or anything else you’d like to ask your fans or customers as they’re completing a form to register, subscribe, enter, purchase, respond, etc.
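
Where free-typed values have already been collected, the clean-up can be sketched along these lines in Python. This is an illustration under assumptions: the alias list is a tiny made-up sample, mapping England to the ISO 3166-1 code GB is a business decision rather than a given, and the 0.8 similarity cutoff is an arbitrary starting point. It uses get_close_matches from Python’s standard difflib module to catch near-miss spellings.

```python
from difflib import get_close_matches
from typing import Optional

# A small illustrative sample of aliases mapped to ISO 3166-1 alpha-2 codes.
COUNTRY_ALIASES = {
    "uk": "GB", "united kingdom": "GB", "gb": "GB",
    "great britain": "GB", "england": "GB",
    "us": "US", "usa": "US", "united states": "US",
    "france": "FR",
}


def standardise_country(raw: str, cutoff: float = 0.8) -> Optional[str]:
    """Map a free-typed country to a code, tolerating small misspellings."""
    value = raw.strip().lower()
    if value in COUNTRY_ALIASES:
        return COUNTRY_ALIASES[value]
    # Fall back to the closest known alias, if any is similar enough.
    close = get_close_matches(value, list(COUNTRY_ALIASES), n=1, cutoff=cutoff)
    return COUNTRY_ALIASES[close[0]] if close else None


for entry in ["UK", "Great Britain", "Grate Britain", "Untied Kingdom"]:
    print(entry, "->", standardise_country(entry))
```

The same pattern applies to any field with a finite list of valid answers: keep one list of accepted values, map known variants to it, and return nothing for entries that cannot be matched so they can be checked by hand.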

From Simple Example to Fundamental Principle 

You can see from just these three simple examples that, if you don’t pay attention to your data standards, you could end up with a database that contains a lot of unusable information. This will skew your statistics and result in incorrect messaging being sent to your fans. Creating a data dictionary that lists each data field in each of your data sources, its format, and its purpose will help you stay on top of this.
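
A data dictionary does not need special software; a spreadsheet is enough. To show how it can also drive a simple automated check, here is a minimal Python sketch. The source names, field names, and purposes are invented for the example.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldDefinition:
    source: str   # where the data is collected
    field: str    # name of the data field
    format: str   # the format that source uses
    purpose: str  # why the field is collected


# Hypothetical entries for two data collection forms.
DATA_DICTIONARY = [
    FieldDefinition("volunteer_form", "date_of_birth", "DD/MM/YYYY", "birthday e-cards"),
    FieldDefinition("competition_page", "date_of_birth", "MM/DD/YYYY", "age segmentation"),
    FieldDefinition("volunteer_form", "country", "ISO 3166-1 alpha-2", "regional campaigns"),
    FieldDefinition("competition_page", "country", "ISO 3166-1 alpha-2", "regional campaigns"),
]


def find_inconsistent_fields(dictionary: list) -> dict:
    """Return each field that is recorded in more than one format."""
    formats_by_field = defaultdict(set)
    for entry in dictionary:
        formats_by_field[entry.field].add(entry.format)
    return {
        field: sorted(formats)
        for field, formats in formats_by_field.items()
        if len(formats) > 1
    }


# Fields listed here need a transformation step before the sources are merged.
inconsistent = find_inconsistent_fields(DATA_DICTIONARY)
print(inconsistent)
```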

Using drop-downs, tick boxes, and auto-populating from an external data source will help maintain your data hygiene. When it comes to your email addresses, most commercial email marketing platforms will auto-cleanse your list of any bounced records as well as those of customers who have unsubscribed, but if your campaigns have a high bounce rate it could affect your sending reputation and result in your platform blocking you from using it.

This is a simple example linked to three data points – in your organisation you’ll be dealing with many more than three – but while these may be quite basic, they represent the principle of data standardisation, a vital consideration for any data management strategy. 

If you want to talk about this in more detail, please get in touch. We’d love to hear from you.