Geofutures’ Simon Lewis explains that the success of a major new information resource demands a thorough approach to cleaning up the underlying data
For the last six months the Geofutures development team have been working on a stimulating project in partnership with The Local Data Company (LDC). Town Centre Intelligence is a web-based application designed to provide insight and information on the economic health of town centres.
LDC collects a wealth of data on retail premises, gathered and kept up to date by its own team through street surveys. Until now the company has run a thriving business supplying clients such as Yell with data in the form of database extracts, but it rightly identified the opportunity to create even more value from this enormous resource.
Town Centre Intelligence (TCI) provides subscribers – town planners, town centre managers, retailers, property investors and master planners among others – with all the retail data on 1,300 town centres, together with the means to sort, search and visualise it through a user-friendly, map-based interface. Delivered over the web, TCI provides context-specific information on town centre performance at successively finer scales at the click of a button.

TCI offers instant online retail data, including the independent / multiple mix, shown here for Edinburgh
Geofutures’ part of the game has been the development of this online data platform. In building it, some of the thornier challenges we’ve faced have involved the database structures which underpin the application. Dealing with spatial data is our stock in trade, but it’s certainly not for the fainthearted, as some of the issues set out below will illustrate.
Creating locations
TCI is based on The Local Data Company’s data, but it also incorporates town centre boundary data from the Department for Communities and Local Government (CLG) and floorspace information from the Valuation Office Agency (VOA). Look at any three organisations’ data and you’ll see that there is no such thing as a universally accepted address standard in the UK, notwithstanding BS7666, PAF and Address Point. None of them is wrong; they are simply all slightly different, and this is the issue we addressed (no pun intended) in bringing these three sources together.
The key thing they have in common is that they refer to a place on the ground (with a few exceptions, such as houseboats). Instead of trying to match the datasets against each other, we allow a point on the ground to have multiple addresses and match each record to the point. Given the volumes of data involved (some 300,000 business premises records in total), we needed to build a specialised ‘data cleanser’ application to perform these matches, which also allowed us to attach non-address attributes such as floorspace to those points.
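The idea of matching every source to a point, rather than matching addresses to each other, can be illustrated with a short Python sketch. It is not the data cleanser itself: the `Premises` class, the `match_to_point` function, the 5-metre tolerance and the coordinates are all hypothetical, and a real matcher working on 300,000 records would use a spatial index rather than the linear scan shown here.

```python
from dataclasses import dataclass, field
from math import hypot


@dataclass
class Premises:
    """One point on the ground, which may carry several addresses (hypothetical model)."""
    x: float  # easting in metres (assumed British National Grid)
    y: float  # northing in metres
    addresses: dict = field(default_factory=dict)   # source name -> list of address strings
    attributes: dict = field(default_factory=dict)  # non-address data, e.g. floorspace


def match_to_point(points, source, address, x, y, tolerance=5.0, **attributes):
    """Attach an address to the nearest existing point within `tolerance` metres,
    creating a new point if none is close enough. Linear scan for clarity only."""
    best, best_distance = None, tolerance
    for point in points:
        distance = hypot(point.x - x, point.y - y)
        if distance <= best_distance:
            best, best_distance = point, distance
    if best is None:
        best = Premises(x, y)
        points.append(best)
    best.addresses.setdefault(source, []).append(address)
    best.attributes.update(attributes)  # e.g. floorspace from a VOA record
    return best


# Two sources describe the same shop slightly differently; both land on one point.
points = []
match_to_point(points, "LDC", "12 High St", 325100.0, 673400.0)
match_to_point(points, "VOA", "Unit 12, High Street", 325102.0, 673401.5, floorspace_m2=140)
match_to_point(points, "LDC", "40 High St", 325180.0, 673420.0)
print(len(points))  # 2
```

Because each address hangs off the point, no source’s address format has to be treated as the master; every source keeps its own wording.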
You can’t provide a picture of UK town centre retailing without dealing with shopping centres, of course, and in each shiny mall lurks an addressing hornets’ nest all of its own. Multi-level, multi-concession shopping centre addresses bear little relation to normal addressing, and because the centres are privately owned estates, collecting data and taking photos there is often restricted.
Beyond this, we have to deal with the granularity of different types of address. A department store and a shopping centre may both contain multiple businesses, but different databases treat them differently: the VOA may hold records at one level and the Ordnance Survey at another. TCI offers unique added-value information such as churn rates of retail premises. That calculation is deceptively complex in any case, and to keep it within acceptable bounds of accuracy TCI has to recognise the different addressing schemas and calculate churn rates accordingly.
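Why granularity matters for churn can be shown with a small sketch. Here churn is assumed to mean the share of premises, present in both of two surveys, whose occupier changed; TCI’s actual definition is not given in this article. The `key` function is a hypothetical hook that maps each unit to the level being compared, so the same surveys can be counted unit by unit or with a whole shopping centre collapsed to a single entry, as one source might record it. The unit ids and occupiers are invented.

```python
from collections import defaultdict


def churn_rate(before, after, key=lambda unit: unit):
    """Share of premises present in both surveys whose occupier(s) changed.

    before, after: dict mapping a unit id to its occupier (None if vacant).
    key: maps a unit id to the premises level being compared (hypothetical hook).
    """
    def collapse(survey):
        # Group units under their comparison key; a premises is the set of its occupiers.
        grouped = defaultdict(set)
        for unit, occupier in survey.items():
            grouped[key(unit)].add(occupier)
        return grouped

    old, new = collapse(before), collapse(after)
    common = old.keys() & new.keys()
    if not common:
        return 0.0
    changed = sum(1 for premises in common if old[premises] != new[premises])
    return changed / len(common)


def mall_as_one(unit):
    """Treat every unit inside 'MALL' as one premises, as a coarser source might."""
    return unit.split("/")[0]


# Hypothetical surveys: two units inside a mall plus two high-street shops.
before = {"MALL/1": "Shoe Co", "MALL/2": None, "5 High St": "Cafe A", "7 High St": "Bank B"}
after = {"MALL/1": "Phone Co", "MALL/2": None, "5 High St": "Cafe A", "7 High St": "Cafe C"}

print(churn_rate(before, after))  # 0.5 (2 of 4 units changed)
print(round(churn_rate(before, after, key=mall_as_one), 3))  # 0.667 (2 of 3 premises)
```

The same two surveys give different churn rates depending on the level at which premises are counted, which is why the addressing schema has to be known before the figure means anything.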
Creating ‘towns’
There’s lots more elsewhere on this site about the Geofutures project to define town centres for what’s now CLG (previously ODPM, DETR and DoE; it was a long project). The need for it arose because the definition of a town centre – precisely where it begins and ends – depends upon whom you ask, so no consistent and comparable boundaries could be drawn.
This mattered at a time when the health of town centres appeared to be under threat from out-of-town retail parks, and the success of planning changes intended to improve it had to be evaluated against standardised boundaries. The boundaries were created by tying multiple relevant datasets to town centre locations, building an Index of Town Centre Activity from economic activity, property and diversity measures, and setting a nationally consistent threshold value that would delineate every boundary.
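The general recipe of combining several measures into one index and cutting it at a single national threshold can be sketched as follows. The z-score standardisation, the equal weights, the indicator names, the grid cells and the threshold are illustrative assumptions rather than the published CLG method, and a real delineation would also have to deal with the spatial contiguity of the resulting areas, which this sketch ignores.

```python
from statistics import mean, pstdev


def activity_index(cells, weights):
    """Combine indicators into one score per cell (a hypothetical composite index).

    cells: dict cell id -> dict indicator name -> raw value.
    weights: dict indicator name -> weight.
    Each indicator is converted to a z-score so that measures on different
    scales can be added together.
    """
    scale = {}
    for name in weights:
        values = [cell[name] for cell in cells.values()]
        spread = pstdev(values)
        scale[name] = (mean(values), spread if spread > 0 else 1.0)
    return {
        cell_id: sum(weights[n] * (cell[n] - scale[n][0]) / scale[n][1] for n in weights)
        for cell_id, cell in cells.items()
    }


def cells_in_centre(index, threshold):
    """Apply one nationally consistent threshold: cells at or above it are 'in'."""
    return {cell_id for cell_id, score in index.items() if score >= threshold}


cells = {
    "a": {"economic": 10, "property": 5, "diversity": 3},
    "b": {"economic": 2, "property": 1, "diversity": 1},
    "c": {"economic": 6, "property": 3, "diversity": 2},
}
weights = {"economic": 1 / 3, "property": 1 / 3, "diversity": 1 / 3}
index = activity_index(cells, weights)
print(sorted(cells_in_centre(index, threshold=0.5)))  # ['a']
```

Using one threshold everywhere is what makes the boundaries comparable: a cell’s inclusion depends only on its score, not on who drew the line.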
The boundaries are used in TCI to allow like-for-like comparisons between town centres (London and other major cities comprise many smaller centres, for example, and only retail data within the government boundaries are included in the application). This too requires a lengthy data matching process, using what we GISers call ‘point in poly’[gon] to link locations to towns.
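For readers outside GIS, ‘point in polygon’ asks whether a location falls inside a boundary. A common way to answer it is the ray-casting (even-odd) test sketched below; it is a textbook technique rather than a description of TCI’s own code, the town names and coordinates are hypothetical, and a production system would normally rely on a spatial database or library with spatial indexing. Points lying exactly on a boundary edge are not handled consistently by this simple version.

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting test: count how many polygon edges a ray from (x, y) towards
    +x crosses; an odd count means the point is inside."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]  # wrap round to close the ring
        if (y1 > y) != (y2 > y):  # edge straddles the ray's y, so y1 != y2
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def assign_to_towns(premises, towns):
    """premises: dict id -> (x, y); towns: dict name -> list of (x, y) vertices.
    Returns dict id -> town name, or None if the point lies in no boundary."""
    return {
        pid: next((name for name, ring in towns.items() if point_in_polygon(x, y, ring)), None)
        for pid, (x, y) in premises.items()
    }


towns = {
    "Town A": [(0, 0), (10, 0), (10, 10), (0, 10)],
    "Town B": [(20, 0), (30, 0), (30, 10), (20, 10)],
}
premises = {"p1": (5, 5), "p2": (25, 5), "p3": (50, 50)}
print(assign_to_towns(premises, towns))  # {'p1': 'Town A', 'p2': 'Town B', 'p3': None}
```

Premises that fall in no boundary come back as `None`, which is how retail data outside the government boundaries would be left out of the comparisons.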
Generating statistics
Town centres are vibrant, dynamic places, and to be useful to those assessing and planning them, TCI has to allow for change over time. Towns change size, both through physical changes to their fabric and through shifts in their town centre boundaries as their index of activity changes (see above). As a boundary moves, a retail location may move into or out of the town centre.
The size of individual retail premises may also change due to extensions or merging / de-merging with adjacent premises. Independents and multiples are analysed and compared within TCI, including data on independents which become multiples the moment they open a second shop. All of these flexibility requirements can be met with the right data structure, but reaching this point has sometimes been an interesting journey.
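One way to get that flexibility is to store each fact with the period for which it holds, so that the town a premises belongs to, and the occupier trading there, can be queried for any date. The `Period` record and the helper functions below are a hypothetical sketch, not TCI’s schema; the independent/multiple rule simply counts outlets within the records held, and the sketch assumes at most one active fact per premises on any date.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Period:
    """A fact about one premises, valid from `start` (inclusive) to `end` (exclusive)."""
    premises: str
    value: str                  # a town name or an occupier name, depending on the table
    start: date
    end: Optional[date] = None  # None means the fact is still current

    def active_on(self, day: date) -> bool:
        return self.start <= day and (self.end is None or day < self.end)


def snapshot(periods, day):
    """Return {premises: value} for every fact active on `day`."""
    return {p.premises: p.value for p in periods if p.active_on(day)}


def independent_or_multiple(occupancy, day):
    """Label each occupier 'multiple' if it trades from two or more premises on `day`."""
    outlets = Counter(snapshot(occupancy, day).values())
    return {occ: ("multiple" if n >= 2 else "independent") for occ, n in outlets.items()}


# Hypothetical history: the boundary moves to take in p3, and Cafe A opens a second shop.
membership = [
    Period("p1", "Town A", date(2009, 1, 1)),
    Period("p3", "Town A", date(2010, 6, 1)),
]
occupancy = [
    Period("p1", "Cafe A", date(2009, 1, 1)),
    Period("p2", "Cafe A", date(2010, 3, 1)),
    Period("p3", "Shoe Co", date(2009, 1, 1)),
]
print(snapshot(membership, date(2010, 1, 1)))  # {'p1': 'Town A'}
print(independent_or_multiple(occupancy, date(2010, 6, 1)))  # {'Cafe A': 'multiple', 'Shoe Co': 'independent'}
```

Nothing is overwritten when a boundary moves or an independent opens a second shop; a new period is added, so earlier snapshots remain reproducible.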
In human thought processes, we move between wide helicopter views and fine-scale information all the time. For a tool to aid this process, it needs to aggregate data for us and then break it down again. The magic lies in how we expand the data out into huge arrays that lend themselves to statistical modelling, and then aggregate them back into volumes small enough to download quickly and analyse interactively through the web interface.
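The aggregate-then-drill-down pattern can be illustrated in plain Python: premises-level records are rolled up to whichever level the user is viewing, and filtered back down when they click into an area. The field names (`region`, `town`, `floorspace`) and the figures are hypothetical, and at real data volumes this work would typically be done in the database, for example as precomputed summary tables, rather than in application code as shown.

```python
from collections import defaultdict


def aggregate(records, by, measure):
    """Roll records up by one field, returning a count and a total of `measure` per group."""
    groups = defaultdict(lambda: {"count": 0, "total": 0.0})
    for record in records:
        group = groups[record[by]]
        group["count"] += 1
        group["total"] += record[measure]
    return dict(groups)


def drill_down(records, **filters):
    """Return the detailed records matching every field=value filter."""
    return [r for r in records if all(r[name] == value for name, value in filters.items())]


records = [
    {"region": "Region 1", "town": "Town A", "floorspace": 100.0},
    {"region": "Region 1", "town": "Town A", "floorspace": 50.0},
    {"region": "Region 1", "town": "Town B", "floorspace": 30.0},
]
print(aggregate(records, by="region", measure="floorspace"))  # {'Region 1': {'count': 3, 'total': 180.0}}
print(aggregate(records, by="town", measure="floorspace"))
print(len(drill_down(records, town="Town A")))  # 2
```

Only the small summary dictionaries need to travel to the browser for the overview; the detailed records are fetched when a user drills into a particular town.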