Site icon Bill, the Vest Guy

A discussion on data validation

Validation is a critical consideration when considering sourced data, that data over which the system designer does not have complete control.  Validation is also an essential part of interpersonal relationships, but that is out of scope for this article…  When looking at sourced data (and interpersonal relationships, actually), we can see that data can be made up of historical data, integration-related data, and\or user-entered data.  This data is called out because it was created by systems that are not necessarily making the same assumptions about data as your system.  Since there are different assumptions, there is a chance that this provided data will not work within your system.  This is where validation comes in; a way of examining data at run-time to ensure “correctness.”  This may be as simple as the length of a field to something more complicated and businessy, such as a verified email address or a post-office approved address.  This examination for “correctness” is validation. 

In the grand scheme of things, validation is simply the detection, measurement, correction, and prevention of defects caused directly, or indirectly, by weak quality data.  There are three different types of validation, internal validation, external validation, and process validation.  Internal validation methods are embedded in the production system at design time to ensure the internal validity of data by eliminating any inconsistencies. External validation, on the other hand, makes use of knowledge about the real world. This use of the real world means that, when looking at a data model, external validity of that model is determined by comparing it against the real world.  Finally, process validation focuses on actual data processing operations. This focus on operations means that all data defects are synonymous with process defects, and thus, correction of process defects effectively leads to the prevention of future data defects. Efforts around process correction are based upon the fact that once a piece of incomplete data is in a database, it will be challenging to fix.

lotsa words needs a picture of ducks!

Validation and correctness are somewhat vague words, so let us add some context around them.  The easiest way to do that would be to talk about the different types of validation that may need to be done.  The most common forms of validation are range and constraint checking, cross-reference checking, and consistency checking.

The last thing for consideration as part of validation is post-validation actions, the steps to be taken after data validation.  The easiest decision is when validation is successful, as the business process can continue.  The more significant decision is what to do when validation fails.  Three different actions can happen in the event of a validation failure, an enforcement action, and advisory action, or a verification action.  I guess there is always the option of taking no action at all, but that kind of makes the performed work to perform the validation of a big ole’ waste of time…

Validation is an important consideration when building any application that interacts with data.  There are various types of validation, such as internal validation, external validation, and process validation.  There are several different types of checking that could be performed for the validation, including range and constraint check, cross-reference check, and consistency check.  Lastly, there are three major approaches for managing an error: and enforcement action, and advisory action, and a verification action.

The next posts in the Validation series will show how to implement differing validation strategies.  Stay tuned!

Exit mobile version