News
28 May 2021

Facilitating harmonized data quality assessments. A data quality framework for observational health research data collections with software implementations in R

At present, no standards exist for the handling and reporting of data quality in health research. The paper “Facilitating harmonized data quality assessments. A data quality framework for observational health research data collections with software implementations in R”, recently published in BMC Medical Research Methodology and co-authored by researchers from Universitätsmedizin Greifswald, among others, introduces a data quality framework for observational health research data collections, together with supporting software implementations that facilitate harmonized data quality assessments.

Development was guided by the evaluation of an existing data quality framework and by literature reviews. Functions for computing data quality indicators were written in the R statistical software, using data from the Study of Health in Pomerania, a euCanSHare cohort. The R functions calculate data quality metrics from the provided study data and metadata, and guidance on the concept and tools is available through a dedicated website. The resulting data quality framework comprises 34 data quality indicators, targeting four aspects of data quality:

  • compliance with pre-specified structural and technical requirements (integrity);
  • presence of data values (completeness);
  • inadmissible or uncertain data values and contradictions (consistency);
  • unexpected distributions and associations (accuracy).
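
To make the four aspects more concrete, the following is a minimal illustrative sketch in Python. It is not the R software described in the paper: the records, the metadata layout, the function names and the choice of Tukey fences as a stand-in for "unexpected distributions" are all assumptions made for this example only.

```python
from statistics import quantiles

# Hypothetical study data: one dict per participant; None marks a missing value.
study_data = [
    {"id": 1, "age": 34, "sbp": 118},
    {"id": 2, "age": 51, "sbp": None},
    {"id": 3, "age": 47, "sbp": 125},
    {"id": 4, "age": 290, "sbp": 122},  # inadmissible age
    {"id": 5, "age": 62, "sbp": 120},
    {"id": 6, "age": 58, "sbp": 210},   # admissible but unusual blood pressure
    {"id": 7, "age": 45, "sbp": 116},
    {"id": 8, "age": 39, "sbp": 127},
]

# Hypothetical metadata: expected variables and their admissible limits.
metadata = {
    "age": {"min": 18, "max": 110},
    "sbp": {"min": 60, "max": 260},
    "bmi": {"min": 10, "max": 80},  # expected, but absent from the data
}


def integrity(data, meta):
    """Integrity: expected variables that are absent from the study data."""
    present = set().union(*(row.keys() for row in data))
    return sorted(set(meta) - present)


def completeness(data, var):
    """Completeness: fraction of records with a missing value for var."""
    return sum(row.get(var) is None for row in data) / len(data)


def consistency(data, var, meta):
    """Consistency: ids whose value lies outside the admissible limits."""
    lo, hi = meta[var]["min"], meta[var]["max"]
    return [
        row["id"]
        for row in data
        if row.get(var) is not None and not lo <= row[var] <= hi
    ]


def accuracy(data, var, meta):
    """Accuracy (assumed proxy): ids of admissible values outside Tukey fences."""
    lo, hi = meta[var]["min"], meta[var]["max"]
    admissible = [
        (row["id"], row[var])
        for row in data
        if row.get(var) is not None and lo <= row[var] <= hi
    ]
    q1, _, q3 = quantiles([value for _, value in admissible], n=4)
    iqr = q3 - q1
    return [
        rid
        for rid, value in admissible
        if value < q1 - 1.5 * iqr or value > q3 + 1.5 * iqr
    ]


print(integrity(study_data, metadata))           # ['bmi']
print(completeness(study_data, "sbp"))           # 0.125
print(consistency(study_data, "age", metadata))  # [4]
print(accuracy(study_data, "sbp", metadata))     # [6]
```

Each function takes both the study data and, where needed, the metadata, mirroring the idea that data quality metrics are derived from the two together. The framework's actual indicators are considerably more detailed than these four toy checks.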

The presented data quality framework is the first of its kind for observational health research data collections that links a formal concept to implementations in R. The framework and tools facilitate harmonized data quality assessments in pursuit of transparent and reproducible research. Application scenarios include data quality monitoring, and the developments are also relevant beyond research.