Threats to data integrity come in many forms. Most people picture malicious hackers as the main threat, but the majority of root causes are internal and unintentional: errors in data collection, inconsistencies across formats, and plain human error. You should build a culture of data integrity by:
- Educating business leaders on the risks
- Establishing a robust data governance framework
- Investing in the right tools and expertise
You can ensure physical integrity in your database system by taking steps such as:
- Having an uninterruptible power supply
- Setting up redundant hardware
- Controlling the physical environment against heat, dust or electromagnetic pulses
- Using a clustered file system
- Using error-correcting (ECC) memory and error-correcting algorithms
- Using simple check-digit algorithms such as Damm or Luhn to detect human transcription errors
- Using hash functions to detect computer-induced transcription errors
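The last two items in this list can be illustrated with a short Python sketch. It is only an illustration under stated assumptions: the function names `luhn_is_valid` and `sha256_hex` are hypothetical, and SHA-256 is just one common choice of hash function.

```python
import hashlib


def luhn_is_valid(number: str) -> bool:
    """Return True if an ASCII digit string passes the Luhn check."""
    if not (number.isascii() and number.isdigit()):
        return False
    total = 0
    # Walk the digits from right to left and double every second one.
    for position, char in enumerate(reversed(number)):
        digit = int(char)
        if position % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9  # same as adding the two digits of the product
        total += digit
    return total % 10 == 0


def sha256_hex(payload: bytes) -> str:
    """Return the SHA-256 digest of payload as a hex string."""
    return hashlib.sha256(payload).hexdigest()


def transfer_is_intact(sent: bytes, received: bytes) -> bool:
    """Compare digests taken before and after a copy or transfer."""
    return sha256_hex(sent) == sha256_hex(received)
```

For example, `luhn_is_valid("79927398713")` returns `True`, while the same number with its last two digits swapped fails the check. The hash comparison plays the same role for machine-made copies: if even one byte changes in transit, the two digests no longer match.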
You can ensure logical integrity by enforcing the four types of integrity constraints described in the previous section (domain, entity, referential, and user-defined). Often, integrity issues arise when data is replicated or transferred. The best data replication tools check for errors and validate the data to ensure it is intact and unaltered between updates.
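As a rough sketch of what enforcing these four constraint types can look like, the example below uses Python's built-in `sqlite3` module. The `customers` and `orders` tables, their columns, and the helper names are hypothetical and chosen only for illustration; other database systems express the same ideas with slightly different syntax.

```python
import sqlite3

SCHEMA = """
CREATE TABLE customers (
    id    INTEGER PRIMARY KEY,       -- entity integrity: each row is uniquely identified
    email TEXT NOT NULL UNIQUE
);
CREATE TABLE orders (
    id          INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL
                REFERENCES customers(id),          -- referential integrity
    quantity    INTEGER NOT NULL
                CHECK (quantity > 0),              -- domain integrity
    order_date  TEXT NOT NULL,
    ship_date   TEXT,
    -- user-defined rule (assumed business rule): no shipping before ordering
    CHECK (ship_date IS NULL OR ship_date >= order_date)
);
"""


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with the illustrative schema."""
    conn = sqlite3.connect(":memory:")
    # SQLite only enforces foreign keys when this pragma is switched on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn


def try_insert(conn: sqlite3.Connection, sql: str, params: tuple) -> bool:
    """Run one INSERT; return False if a constraint rejects it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(sql, params)
        return True
    except sqlite3.IntegrityError:
        return False
```

With this schema, an order that points at a nonexistent customer, has a quantity of zero, or ships before it was ordered is rejected by the database itself rather than by application code.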
In addition to these steps, here are five key actions you can take as a data custodian to maintain data integrity:
1) Use a modern data lineage tool to keep an audit trail, tracking any alterations made to the data during its complete lifecycle.
2) Use a data catalog to control access, making different kinds of data available to different kinds of users. You should also control physical access to your servers.
3) Require input validation for all data sets, whether they’re supplied by a known or unknown source.
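4) Verify that the processes that handle your data have not themselves been corrupted, for example by reviewing changes to pipelines and transformation scripts.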
5) Regularly back up all data and metadata to a secure location, and verify during internal audits that the backups can actually be retrieved.
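The backup step in particular lends itself to automation. The following is a minimal sketch, assuming file-based backups and a checksum stored beside each copy; the function names and the `.sha256` sidecar file are hypothetical conventions, not a feature of any specific backup tool.

```python
import hashlib
import shutil
from pathlib import Path


def file_sha256(path: Path) -> str:
    """Hash a file in chunks so large files do not need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def back_up(source: Path, backup_dir: Path) -> Path:
    """Copy source into backup_dir and record the source's checksum beside it."""
    backup_dir.mkdir(parents=True, exist_ok=True)
    target = backup_dir / source.name
    shutil.copy2(source, target)  # copies contents and file metadata
    checksum_file = target.with_name(target.name + ".sha256")
    checksum_file.write_text(file_sha256(source))
    return target


def verify_backup(target: Path) -> bool:
    """Re-read the backup and compare it with the checksum recorded at backup time."""
    checksum_file = target.with_name(target.name + ".sha256")
    return checksum_file.read_text().strip() == file_sha256(target)
```

Running `verify_backup` on a schedule, or as part of an internal audit, confirms that the backup can still be read and that its contents match what was originally saved.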