Integration of HIP & GHI Data into a Single Data-Warehouse

The Client

The client operates HDIMDW (Health Dimension Data Warehouse), which integrates data from a $10 billion health maintenance organization (HMO) and a health insurance company with 3.4 million members. The client also handles the HIX (Health Information Exchange) project, which transfers health information to the State Department of Health.


The merger of HIP and GHI resulted in the formation of an HMO, which created the need to blend data from both organizations into a single data warehouse.

Project Scope

The client wished to incorporate GHI and HIP data into a single data warehouse.

Project: Integration of HIP & GHI data into a single data warehouse

Tools and technologies:
  • InfoSphere DataStage 8.1
  • Information Analyzer
  • CSV files
  • Fixed-width files
  • COBOL files
  • Toad for Oracle 11
  • Visual Source
  • HP Quality
  • Application Life Cycle Management


Healthcare data is complicated: each patient record contains detailed information on diagnoses, reports, and related items, and the sheer volume of data can overwhelm the system. All of this complexity has to be related to the various fields of the dimensional model, so the data must be transformed to fit the dimensional schema. The clinical data also needs to be separated from the sales data.
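To illustrate the kind of reshaping involved, the sketch below maps a hypothetical flat patient record onto separate dimension and fact structures. All field names and values are assumptions for illustration, not the client's actual schema:

```python
# Hypothetical sketch: reshaping a flat source record into a simple
# star-schema layout (dimension rows plus a fact row). Field names
# are illustrative only, not the client's actual data model.

def to_dimensional(record):
    """Split a flat patient/claim record into dimension and fact parts."""
    patient_dim = {
        "patient_id": record["patient_id"],
        "name": record["name"],
        "plan": record["plan"],
    }
    diagnosis_dim = {
        "diagnosis_code": record["diagnosis_code"],
        "description": record["diagnosis_desc"],
    }
    fact = {
        "patient_id": record["patient_id"],
        "diagnosis_code": record["diagnosis_code"],
        "claim_amount": record["claim_amount"],
        "service_date": record["service_date"],
    }
    return patient_dim, diagnosis_dim, fact

# Example flat record, as it might arrive from a source extract.
source = {
    "patient_id": "P001",
    "name": "Jane Doe",
    "plan": "HIP-Gold",
    "diagnosis_code": "E11.9",
    "diagnosis_desc": "Type 2 diabetes",
    "claim_amount": 125.50,
    "service_date": "2011-03-14",
}
patient, diagnosis, fact = to_dimensional(source)
```

In the real project this reshaping was done inside DataStage transformation stages rather than hand-written code; the sketch only shows the flat-to-dimensional split itself.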


Using a constructive approach, an extensive requirements analysis of the users and the system was completed. The source systems underwent data analysis for data migration, transformation, integration, and cleansing purposes. The data warehouse was then designed and implemented.

We developed jobs for loading the data into the staging area and the warehouse using InfoSphere DataStage/QualityStage 8.1 and 8.5. The jobs were designed to handle slowly changing dimension (SCD) type 2 changes from source to staging and on to the data warehouse.
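The type 2 logic referred to above closes out the current version of a dimension row and inserts a new one whenever a tracked attribute changes, preserving history. A minimal sketch, in Python rather than DataStage and with illustrative field names only:

```python
# Minimal sketch of SCD type 2 handling: when a tracked attribute
# changes, the current row is closed out (end-dated) and a new row
# is inserted as the current version. Illustrative only -- the
# actual jobs were built in DataStage, not Python.

from datetime import date

def apply_scd2(dimension, incoming, today=None):
    """Apply one incoming record to a type 2 dimension (list of dicts)."""
    today = today or date.today()
    current = [r for r in dimension
               if r["business_key"] == incoming["business_key"]
               and r["is_current"]]
    if current:
        row = current[0]
        if row["attributes"] == incoming["attributes"]:
            return dimension  # no change: nothing to do
        # Close out the existing version.
        row["is_current"] = False
        row["end_date"] = today
    # Insert the new version as current.
    dimension.append({
        "business_key": incoming["business_key"],
        "attributes": dict(incoming["attributes"]),
        "start_date": today,
        "end_date": None,
        "is_current": True,
    })
    return dimension

dim = []
apply_scd2(dim, {"business_key": "M100", "attributes": {"plan": "GHI-Basic"}})
apply_scd2(dim, {"business_key": "M100", "attributes": {"plan": "HIP-Gold"}})
# dim now holds two versions: the first end-dated, the second current
```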

The HIP and GHI data was integrated, and a DataStage naming convention and standards document was created for the project. Next, JIL files were prepared for scheduling DataStage jobs in the AutoSys scheduler. Lastly, DataStage jobs were optimized for better performance, significantly reducing extraction, transformation, and loading time.
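For context, a JIL definition for scheduling a DataStage job in AutoSys typically looks like the following sketch. The job name, machine, owner, project name, and paths are hypothetical, not taken from the project:

```
/* Hypothetical JIL sketch for scheduling a DataStage job in AutoSys.
   Job name, machine, owner, project, and paths are illustrative only. */
insert_job: hdimdw_load_member_dim
job_type: cmd
machine: etl_server01
owner: dsadm
command: /opt/IBM/InformationServer/Server/DSEngine/bin/dsjob -run -jobstatus HDIMDW MemberDimLoad
condition: success(hdimdw_extract_member)
days_of_week: all
start_times: "02:00"
std_out_file: /var/log/autosys/hdimdw_load_member_dim.out
std_err_file: /var/log/autosys/hdimdw_load_member_dim.err
alarm_if_fail: 1
```

The `condition` attribute chains jobs so that a load runs only after its upstream extract succeeds, which is how multi-step ETL dependencies are expressed in AutoSys.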

The DataStage Director and its run-time engine were used to validate, run, and monitor the resulting executable versions. The jobs were performance-tuned again to further reduce extraction, transformation, and loading time. Our testing team then developed unit test cases and unit-tested the jobs prior to QA and production.

Value Delivered

  • Centralized Data
  • Reduced Time for Fetching Data
  • Reduction in Data Redundancy & Duplication