The Data Warehouse

A data warehouse is a system used for analysis and reporting of current and historical data. Data warehouses are centralized repositories of integrated data from one or more disparate sources. Ensuring that data is delivered in a timely, consistent, and efficient fashion can be difficult, often because source data arrive in different structures and formats and may use different code sets to describe the same information. Some data may be of low quality, or may have inconsistent or missing data elements when compared with other data sets. Integrating these disparate sources is a key step in producing a data warehouse with usable, actionable data for decision makers.
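To make the integration problem concrete, here is a minimal Python sketch; the field names, identifiers, and gender code sets are hypothetical, invented for this example. It reconciles two source extracts that describe the same attribute with different field names and different codes before loading them into one common structure.

```python
# Hypothetical example: two source systems describe the same entity with
# different field names and different code sets for the same attribute.
SOURCE_A = [{"member_id": "A100", "sex": "M"}]          # uses M/F
SOURCE_B = [{"patient_no": "B200", "gender_cd": "1"}]   # uses 1/2

# Crosswalks that translate each source's codes into the warehouse standard.
GENDER_XWALK = {"A": {"M": "MALE", "F": "FEMALE"},
                "B": {"1": "MALE", "2": "FEMALE"}}

def to_target(record, source):
    """Map a source record onto the common target structure."""
    if source == "A":
        return {"person_id": record["member_id"],
                "gender": GENDER_XWALK["A"][record["sex"]],
                "source_system": "A"}
    return {"person_id": record["patient_no"],
            "gender": GENDER_XWALK["B"][record["gender_cd"]],
            "source_system": "B"}

# Integrate both sources into one consistent, analyzable data set.
integrated = ([to_target(r, "A") for r in SOURCE_A] +
              [to_target(r, "B") for r in SOURCE_B])
print(integrated)
```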

Metadata

Metadata is data that provides information about other data. Because a data warehouse integrates data from multiple sources, it is important to know where each data element came from and what transformations it underwent during the Extract, Transform and Load (ETL) process. By recording source and target entity and attribute information in a metadata repository, the data warehousing team can track each data element and maintain a crosswalk from the data warehouse back to the source systems. The same metadata can be leveraged to dynamically generate data processing code, automating the build process.
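As a hedged sketch of how that works, the Python below stores source-to-target mappings as simple records and generates the SQL for a load step from them; the table, column, and transformation entries are hypothetical, not taken from any particular system.

```python
# Hypothetical metadata repository entries: each row records where a target
# column comes from and the transformation applied during ETL.
METADATA = [
    {"target_table": "dw.patient", "target_column": "patient_id",
     "source_table": "src.members", "source_column": "member_id",
     "transform": "CAST({col} AS BIGINT)"},
    {"target_table": "dw.patient", "target_column": "birth_date",
     "source_table": "src.members", "source_column": "dob",
     "transform": "CAST({col} AS DATE)"},
]

def generate_insert(target_table, metadata):
    """Build an INSERT ... SELECT statement from the metadata crosswalk."""
    rows = [m for m in metadata if m["target_table"] == target_table]
    source_table = rows[0]["source_table"]
    cols = ", ".join(m["target_column"] for m in rows)
    exprs = ", ".join(m["transform"].format(col=m["source_column"])
                      for m in rows)
    return (f"INSERT INTO {target_table} ({cols})\n"
            f"SELECT {exprs}\nFROM {source_table};")

print(generate_insert("dw.patient", METADATA))
```

The same records that generate the load code double as the lineage crosswalk: any warehouse column can be traced back to its source table, column, and transformation.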

Data Cleansing/Data Quality

Data quality and validation procedures, applied consistently across successive data imports, establish rules that measure incoming data against standards, metrics, trends, known anomalies, and other profiling results. Profiling over time reveals trends in data quality and builds a knowledge base of known valid values. Problematic records can be flagged or extracted for review and manual intervention. Rules for each data element, or for related groups of elements, specify which values are not allowed, which values are permitted, or what range a value may fall within, and each data element is validated against the rules that apply to it. Data cleansing is a critical precursor to successful Master Data Management.
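A minimal sketch of this kind of rule-based validation follows, assuming a small rule vocabulary (allowed values, disallowed values, numeric ranges) and hypothetical field names; a production implementation would also record which rule failed and route flagged records to a review queue.

```python
# Hypothetical validation rules keyed by data element. Each rule names the
# allowed values, the disallowed values, or the permitted numeric range.
RULES = {
    "gender": {"allowed": {"MALE", "FEMALE", "UNKNOWN"}},
    "age":    {"range": (0, 120)},
    "ssn":    {"disallowed": {"000-00-0000", "999-99-9999"}},
}

def validate(record):
    """Return a list of (field, message) problems for one record."""
    problems = []
    for field, rule in RULES.items():
        value = record.get(field)
        if "allowed" in rule and value not in rule["allowed"]:
            problems.append((field, f"{value!r} is not an allowed value"))
        if "disallowed" in rule and value in rule["disallowed"]:
            problems.append((field, f"{value!r} is explicitly disallowed"))
        if "range" in rule:
            low, high = rule["range"]
            if value is None or not (low <= value <= high):
                problems.append((field, f"{value!r} is outside {low}-{high}"))
    return problems

# Records with problems are flagged for review rather than loaded silently.
incoming = [{"gender": "MALE", "age": 42, "ssn": "123-45-6789"},
            {"gender": "X", "age": 150, "ssn": "000-00-0000"}]
for record in incoming:
    problems = validate(record)
    if problems:
        print("FLAGGED:", record, problems)
```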

Master Data Management

Master data management (MDM) is a comprehensive method of enabling an enterprise to link all of its critical data to one file, called a master file, that provides a common point of reference. By managing common data elements through an MDM process, the enterprise can ensure that critical data has only one version across the enterprise, commonly referred to as "one version of the truth". MDM processes can also be used to manage common code sets and their definitions. For example, under HIPAA, HHS adopted specific code sets for diagnoses and procedures to be used in all transactions. The adopted code sets for procedures, diagnoses, and drugs are ones providers and health plans are already familiar with: ICD-10-CM (International Classification of Diseases, Tenth Revision, Clinical Modification), CPT-4 (physician procedures), CDT (dental terminology), ICD-9 (diagnoses and hospital inpatient procedures), and NDC (National Drug Codes). These code sets must be accurate and should be managed through data validation and MDM business rules.
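As an illustrative sketch, the snippet below models a master code set reference (the handful of ICD-10-CM and CPT-4 entries are placeholders, not a complete or authoritative list) and checks incoming codes against it, one simple way MDM business rules can enforce consistent code usage.

```python
# Hypothetical master reference data: one managed list of valid codes and
# descriptions per code system ("one version of the truth" for codes).
MASTER_CODE_SETS = {
    "ICD-10-CM": {"E11.9": "Type 2 diabetes mellitus without complications",
                  "I10":   "Essential (primary) hypertension"},
    "CPT-4":     {"99213": "Office/outpatient visit, established patient"},
}

def check_code(system, code):
    """Validate a code against the master code set for its system."""
    code_set = MASTER_CODE_SETS.get(system)
    if code_set is None:
        return f"unknown code system: {system}"
    if code not in code_set:
        return f"{code} is not a valid {system} code in the master file"
    return None  # code is valid

# Example: transactions carrying unmanaged codes are rejected or reviewed.
for system, code in [("ICD-10-CM", "E11.9"), ("ICD-10-CM", "ZZZ.0")]:
    problem = check_code(system, code)
    print(system, code, "->", problem or "valid")
```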
