Info3 home page...
Services
Benefits
Retail Information
Analytics
KPI Workshop
Data Mining
Information Barriers
Contact
News
DIRTY DATA
Most Businesses Use Reports Created With Dirty Data
Most databases contain dirty data, either logical or physical. Dirty data manifests itself in multiple forms and often remains undetected. Logic errors are a result of inadequate exception handling, lack of domain definitions, poor structures and many other causes. Physical errors include overloaded values, incorrect data types and corrupted structures.

Corporations choose to ignore these issues because they do not have the technical resources to appropriately address them. We understand how dirty data issues are caused, which are important and how to resolve them, quickly and inexpensively.

Physical Errors
A database column has a data type associated with it like numeric, character, date and so on. Each data type expects to see certain characters to make up the value contained within it. A simple date, without a time component, for example is often expressed as a series of eight numbers, four for the year, two for the month and two for the day. What happens when the date value has a character like the letter ‘A’ embedded in it? What happens when the month is expressed as 25? Numeric columns usually have a range associated with them. What happens when a column contains values that don’t belong there resulting in a numeric overflow? How does your reporting solution handle it?
Logic Errors
Logic errors can be more difficult to deal with. Syntactically the database looks fine and brute force methods of data validation don’t work. Some database tables don’t have proper relationships between tables resulting in orphaned rows. This means that order line items might exist without a corresponding order or account associated with it. Values in columns might not have the proper domain associated with the value. For example, this year’s colors might consist of sapphire, navy and aquamarine. A value of blue in this column certainly leads to confusion.
Data Value Semantics
An important issue when combining data from multiple systems is conversion of data types into a single type. Some data types can be expressed in many different forms. For example a logical data type can be expressed as the following: True False; 1, 0; Yes, No, Y, N; 0, -1; T, F. Sometimes the data type doesn’t match the type of data stored within that column. Dates for example should be stored in date data types but sometimes they’re stored as characters. This single issue can be a major headache for anyone writing report that has requires a data range.
Business Processes
How companies report information affects their internal processes, and most importantly shareholder value. Internal corporate reports are often not standardized or based on consistent information, leading to problems when different groups are trying to exchange information. A sales rep might consider a sale has been made once the customer has made the commitment to buy. Shipping says the sale has been completed once the product has left the warehouse. Accounting recognizes the sale once the check has cleared the bank. Who’s right?

Business Intelligence Blog


Val Matison awarded Microsoft SQL Server MVP for 2007, 2008, 2009

© 2007 Infoinfoinfo Incorporated. All rights reserved.