Thursday, 27 February 2014

How to pay it forward when designing your data models

With my midlife crisis almost complete, I am lucky enough to have scored an internship with Two Red Kites. Although I have pretty much kissed my previous life in Data Warehousing goodbye, I sometimes feel guilt at the power I now wield as a developer of front end systems. The decisions that I make when designing an application can have far-ranging impacts further down the line.

Attention should be paid up front in the data entry application to collecting the right data, at the right level of detail, and with input validation to ensure data quality. Without this solid foundation of data integrity, it can be difficult to construct a reliable reporting solution.

A general rule in designing a data warehouse is to store data at the finest level of granularity possible. This produces the most robust design for responding to future queries and new data elements.

However, the level of granularity is limited by the level of granularity captured in the source system. Trying to break a data element down into smaller elements is a challenging task to say the least. Data quality, inconsistent formatting, and incomplete fields all create an additional headache.

A good example is an address field. Giving customers a free-text address field can, and will, result in the same address being represented in many different ways:

River View Apartments
Apartment 4503
197 Albert Street
Brisbane
Queensland 4000

River View Apartements
4503/197 Albert Street
Brisbane
QLD 4000

4503/197 Albert St
River View Apartments
Brisbane 
Queenlsand 4000

Unit 4503
179 Albert Street
Brisbane CBD

Apartment 4503
River View Apartments
Albert Street
Brisbane Q 4000

Level 45, Apartment 3
197 Albert Road
Brisbane Q’land 4000

Customers often ask for their reports to be broken down by postcode, Statistical Local Area (SLA) or Local Government Area (LGA). The ability to assign an SLA or an LGA to an address involves identifying the exact location of the property. If we cannot extract each and every data element from that compound address field, the completeness and accuracy of such a report is questionable at best.

So here is my plea: We should design and develop our applications to collect data in a flexible and robust manner. This facilitates future reporting requirements, placing us in a position where we can make a real and positive difference to our Data Warehousing colleagues.

No comments:

Post a Comment