1. Standardise, standardise, standardise. Train your staff to understand why it matters which field they enter the postcode into. Spend time creating standard protocols for titling, addressing, multiple occupancy, etc., and it will pay dividends in avoiding duplicates and non-matches later.
2. Design your database with its end use in mind: optimise for querying, matching, or whatever the main task will be. The choice between relational and non-relational structures, and the complexity of the joins required, governs how quickly a database can be queried. Business premises are not the same as business addresses; collect data with one in mind and it won’t easily support analysis of the other.
3. Don’t trust supplied data, no matter how prestigious the supplier. Even authoritative datasets can contain address inconsistencies.
4. You can combine data from multiple sources, but it’s not for the fainthearted. No matter how good each source is, their differences will compromise the results or require significant investment and expertise.
5. Create a new unit of standardisation if nothing else exists, e.g. geographical location, which ties together disparate datasets and allows other data to be attached to that common factor as attributes.
6. Just as different datasets need standardisation to be comparable, temporal analysis requires standardisation between time series: addresses change through time.
7. Don’t skimp on the metadata – this helps future-proof your data for unforeseen additional uses, and ensures external experts or third parties can derive value from it.
8. Watch the resources being spent – better to feel 100% confident about 80% of your enhanced or combined dataset than 80% confident about 100% – and the underlying data often limits the accuracy achievable anyway.
9. Government or sales area boundaries rarely reflect the patterns in data on the ground, which is why data landscapes are much more meaningful than zone-defined values.
10. Location can tell you a lot more than where something is. Context, correlation with other data, transport links, topography, etc. all influence phenomena on the ground, which is why mapping them takes so much skill. But to do any kind of advanced analysis, you need confidence in your underlying data.
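
To make the first tip concrete, here is a minimal sketch of how standardising fields before matching can surface duplicates. It is illustrative only: it assumes UK-style postcodes (a three-character inward code at the end), and the field names "building" and "postcode" and all three function names are hypothetical, not taken from any particular system.

```python
import re


def normalise_postcode(raw: str) -> str:
    """Upper-case, drop everything except letters and digits, then put a
    single space before the last three characters (UK-style layout assumed)."""
    compact = re.sub(r"[^A-Za-z0-9]", "", raw).upper()
    if len(compact) < 5:
        return compact  # too short to split sensibly; leave for manual review
    return f"{compact[:-3]} {compact[-3:]}"


def match_key(record: dict) -> tuple:
    """Build a comparison key from standardised fields (hypothetical schema)."""
    building = " ".join(record["building"].lower().split())  # collapse spacing
    return (building, normalise_postcode(record["postcode"]))


def find_duplicates(records: list) -> list:
    """Return (first_index, duplicate_index) pairs for records sharing a key."""
    seen = {}
    duplicates = []
    for index, record in enumerate(records):
        key = match_key(record)
        if key in seen:
            duplicates.append((seen[key], index))
        else:
            seen[key] = index
    return duplicates


# Made-up sample records, for illustration only.
records = [
    {"building": "12 High Street", "postcode": "sw1a1aa"},
    {"building": "12  high street", "postcode": "SW1A 1AA"},
    {"building": "14 High Street", "postcode": "SW1A 1AA"},
]

print(find_duplicates(records))  # → [(0, 1)]
```

The first two records differ only in case and spacing, so they collapse to the same key; the third shares the postcode but not the building, so it stays separate. Real address matching usually needs far more than this (abbreviations, flat numbers, misspellings), which is the point of agreeing the protocols up front.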