Incorrect information can very easily ruin your day.
Missing the bus because of an outdated timetable, missing out on love because of an incorrect phone number, missing the point of a joke because someone told the punchline wrong – all of these day-to-day frustrations could be avoided with accurate data.
However, having accurate data is even more vital when it comes to business because the stakes are often much, much higher. Corrupt data can cost your company its reputation, customers and revenue. Gartner reported that the average financial impact of poor data quality is $15 million per year in losses.
Data Integrity Definition
In its most simplified sense, data integrity is the practice of ensuring data remains accurate, valid and consistent throughout the entire data life cycle. To understand the concept fully, you need to know that data integrity has two definitions, depending on the context in which you approach it.
First, we have logical data integrity as a process that ensures that data is kept accurate and consistent. The primary purpose of this process is to stop data from becoming compromised and, essentially, useless.
There are three basic types of logical data integrity:
- Entity integrity
When data is recorded in tables for relational databases, entity integrity ensures that every row has a primary key (for example, an id). No two rows can share the same key value, and the key can never be empty (null).
- Domain integrity
In data integrity, a domain is a predefined set of allowed values that data can take on when records are created, for example, the data type, the length, the range, limitations and constraints, the date format, etc. Domain integrity outlines these values and ensures that recorded data is restricted to these formats.
- Referential integrity
In relational databases, two or more tables can be linked together in a ‘relationship’ because they contain related data. These relationships are created using a foreign key (a column in the associated, or child, table) that points to a primary key (in the primary, or parent, table).
Referential integrity requires every foreign key value to match a valid primary key in the parent table, thus ensuring consistency between these tables.
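The three types of logical integrity above can be sketched with Python's built-in sqlite3 module. This is a minimal illustration, not a production schema: the table names, column names and the simple email check are all assumptions made up for the example.

```python
import sqlite3

# Hypothetical schema: every table and column name here is illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

conn.executescript("""
CREATE TABLE customers (
    id    INTEGER PRIMARY KEY,                    -- entity integrity: one unique key per row
    email TEXT NOT NULL CHECK (email LIKE '%@%')  -- domain integrity: restricted format
);
CREATE TABLE orders (
    id          INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customers(id),  -- referential integrity
    quantity    INTEGER NOT NULL CHECK (quantity BETWEEN 1 AND 100)  -- domain integrity: range
);
""")

def try_insert(sql, params):
    """Run an INSERT and report whether the database accepted it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(sql, params)
        return True
    except sqlite3.IntegrityError:
        return False

results = {
    "valid_customer": try_insert(
        "INSERT INTO customers (id, email) VALUES (?, ?)", (1, "a@example.com")),
    "duplicate_key": try_insert(
        "INSERT INTO customers (id, email) VALUES (?, ?)", (1, "b@example.com")),
    "bad_domain": try_insert(
        "INSERT INTO customers (id, email) VALUES (?, ?)", (2, "not-an-email")),
    "valid_order": try_insert(
        "INSERT INTO orders (customer_id, quantity) VALUES (?, ?)", (1, 3)),
    "orphan_order": try_insert(
        "INSERT INTO orders (customer_id, quantity) VALUES (?, ?)", (99, 1)),
}
print(results)
```

In this sketch the database itself rejects the duplicate key, the malformed email and the order that points to a non-existent customer, so the rules do not depend on every application remembering to check them.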
Second is the product of these processes: physical data integrity as a state, i.e., a data set that is accurate and valid. Here we are concerned with storing and fetching the data to ensure it is not corrupted by events such as power outages, natural disasters, corrosion, etc. For many businesses, the introduction of cloud storage has greatly reduced the threat posed by loss of physical data integrity.
Data integrity problems are likely to occur when…
If data integrity processes are not followed, the cost can be high, as we have already mentioned, for businesses, researchers and anyone attempting to make decisions based on that data.
Here are some scenarios and instances where data integrity can become compromised:
- Human error – mistakes in data recording or maintenance, ranging in intent from accidental to malicious. Human error occurs when data activities are recorded incorrectly, files are wrongly deleted, etc.
- Transfer errors – unintentional corruptions that occur during the movement of data between devices, applications or databases. Alterations, duplications of records, or undetected failed transfers can all threaten data integrity.
- Cyber threats – hacking and viruses/malware are examples of insidious attempts to gain unauthorised access to sensitive private data, and software bugs can give them a way in.
- Compromised hardware – device or disk crashes can cause data to be corrupted or lost entirely, meaning it is no longer usable for analysis or record keeping.
- Remote working – moving from enterprise content management systems controlled by the IT team to off-premises remote working solutions can cause a loss of data integrity. For example, if personal laptops or USB drives are used to enhance data accessibility, this can lead to leaks or misplacement of sensitive data.
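Transfer errors in particular lend themselves to an automated check. One common approach, sketched here under simple assumptions, is to compute a checksum before the data is sent and compare it with a checksum of what arrives. The function names and the sample record are hypothetical, and SHA-256 is just one reasonable choice of hash.

```python
import hashlib

def sha256_of(data: bytes) -> str:
    """Return the SHA-256 digest of the data as a hex string."""
    return hashlib.sha256(data).hexdigest()

def transfer_is_intact(received: bytes, expected_digest: str) -> bool:
    """Compare the digest of the received data with the one computed by the sender."""
    return sha256_of(received) == expected_digest

# Hypothetical record, used only for illustration.
original = b"order_id,quantity\n1001,3\n"
digest = sha256_of(original)  # computed by the sender before the transfer

# Simulate a transfer that silently altered one value.
corrupted = original.replace(b",3", b",8")

intact_ok = transfer_is_intact(original, digest)
corrupted_ok = transfer_is_intact(corrupted, digest)
print(intact_ok, corrupted_ok)
```

A check like this catches accidental alterations during a transfer. It does not, on its own, tell you who changed the data or why.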
How can you ensure that your data is accurate and consistent when it’s generated, duplicated, accessed, and moved around your enterprise at such a rapid rate?
The FDA (Food and Drug Administration) has outlined some principles for those in the pharmaceutical industry to ensure data integrity when recording on paper or electronically. Although originally intended for that industry, they have become widely circulated and accepted as the standard across all industries. The principles can be remembered by using the acronym ALCOA, which stands for:
- Attributable
This principle refers to responsibility for data and the ability to trace any action to a single user. To ensure attributable data, any person who performs a data action (recording, transforming or moving data) must be identifiable as the person who took that action.
Analysts must create data logs for every action to include the name, computer ID, date of the data action, etc.
- Legible
Simply put, this principle aims to ensure that data can be read and understood by everyone who accesses it – whether it is recorded on paper or electronically.
Ensure that data is recorded in standard terms and values so that even when the data-recorder has left an organisation, the data remains valid and usable.
- Contemporaneous
Data integrity processes should occur at the same time as the data activity or immediately afterwards. All data activities should be timestamped to ensure that analysts have a clear record of the date and time when they took place.
Back-dating or overwriting data activity logs is a threat to data integrity as it increases the likelihood of human error or data loss.