EDM, EDW, and Data Science with 40+ Applications

Industry:
Hi-Tech Manufacturing

Client Overview

Our customer is a rapidly growing engineering and manufacturing organization located in the Midwest, with a turnover of over $4 billion. It provides state-of-the-art digital continuity products and services, ranging from Critical Power Management to Server Rack Systems. The organization designs, builds, and services infrastructure for datacentres, communication towers, and commercial/industrial facilities.

The organization has over twenty thousand employees worldwide, twenty-five manufacturing and assembly facilities, and an established local presence across six countries.


Project Scope

The scope of the project was to:

  • Implement an Enterprise Data Management solution on a Hadoop Impala data lake as the central data warehouse.
  • Extract, transform, and load data from all legacy and cloud applications into this data lake, both as one-time loads and as incremental feeds.
  • Extract data from all new cloud applications and load it into the data lake.
  • Perform data cleansing, prior to transformation and loading, for each stream of data.
  • Harmonize data streams, where applicable, and load them into the data lake.
  • Make data available in real time from this data lake for all analytical reporting, for all business functions across the globe.

There were more than forty legacy and cloud applications.
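
The combination of one-time loads and incremental feeds can be illustrated with a minimal sketch. It assumes a timestamp watermark is used to pick up changed rows; the case study does not state which mechanism the project actually used, and the `extract` function, the `updated_at` field, and the sample rows are hypothetical.

```python
from datetime import datetime


def extract(rows, watermark=None):
    """Return (selected_rows, new_watermark).

    watermark=None means a one-time (full historical) extract.
    Otherwise only rows modified after the watermark are returned.
    The `updated_at` field is an assumed change-tracking column.
    """
    if watermark is None:
        selected = list(rows)
    else:
        selected = [r for r in rows if r["updated_at"] > watermark]
    # If nothing was selected, keep the previous watermark.
    new_watermark = max((r["updated_at"] for r in selected), default=watermark)
    return selected, new_watermark


# Hypothetical source table
source = [
    {"id": 1, "updated_at": datetime(2021, 1, 1)},
    {"id": 2, "updated_at": datetime(2021, 1, 2)},
]

# One-time load of all history
full, wm = extract(source)

# Later: one row changes and one row is added
source[0] = {"id": 1, "updated_at": datetime(2021, 1, 4)}
source.append({"id": 3, "updated_at": datetime(2021, 1, 3)})

# Incremental feed picks up only the changed and new rows
delta, wm2 = extract(source, wm)
print([r["id"] for r in delta])
```

Storing the watermark after each run lets the same routine serve both the historical load and every later feed.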


Business Situation

The business embarked on a major digital transformation program, moving from legacy systems and on-premise SAP ECC to Oracle Cloud applications. This was executed in multiple waves over a three-year horizon. The business wanted all its current analytical reporting to continue without hindrance, and wanted to invest in a technology that would cater to its future needs.

The legacy and cloud systems were located across five different time zones and had more than a thousand discrete data models and over five hundred discrete object sets.

Hadoop Impala was chosen for the EDM to store hundreds of TB of data. The data lake was used for data analytics and data science projects.


Technical Situation

The data stored in the enterprise data models was the most valuable data asset for the organization. The EDM offered a consolidated repository for data from the legacy and new Oracle Cloud systems.

Some of the major challenges were:

  • Thousands of discrete data models, with varied formats and data types, had to be profiled and, wherever applicable, combined with similar ones.
  • The volume of data across applications was close to thirty TB.
  • Data migration involved every kind of complexity: incremental migration in addition to one-time migration of historical data, real-time feeds of current transactional data, and both scheduled and on-demand data migrations.
  • EDW was to act as a single source of truth, imposing strict quality norms.
  • Master data was governed by MDM and was used to validate all the transaction data coming into the data lake.
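
The last challenge, validating transaction data against governed master data, can be sketched as a simple referential check. This is an illustration only, not the MDM tooling used in the project; the `validate_transactions` function, the `customer_id` field, and the sample records are hypothetical.

```python
def validate_transactions(transactions, master_ids):
    """Split transactions into accepted and rejected lists.

    A transaction is accepted only if its customer_id exists in the
    governed master data. Field names here are assumptions.
    """
    accepted, rejected = [], []
    for txn in transactions:
        if txn["customer_id"] in master_ids:
            accepted.append(txn)
        else:
            rejected.append(txn)
    return accepted, rejected


# Hypothetical master data and incoming transactions
master_customer_ids = {"C001", "C002"}
transactions = [
    {"txn_id": 1, "customer_id": "C001", "amount": 120.0},
    {"txn_id": 2, "customer_id": "C999", "amount": 75.5},  # unknown customer
    {"txn_id": 3, "customer_id": "C002", "amount": 10.0},
]

accepted, rejected = validate_transactions(transactions, master_customer_ids)
print(len(accepted), "accepted;", len(rejected), "rejected")
```

Keeping rejected rows in their own list, rather than dropping them, leaves them available for correction and reprocessing.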

Solutions

  • The Enterprise Data Warehouse architecture had three layers of data sets: Bronze, Silver, and Gold. Data progressed through the Silver (normalized) and Gold (denormalized) data models before going into the star schemas in Power BI for efficient analytics.
  • The data topology was as follows: Incoming was used for staging, Bronze for history, and Silver for current data. MDM maintained clean, consolidated, and harmonized master data. Gold held star schemas with denormalized facts and dimensions.
  • Pre-configured templates were used to extract and load data into the Incoming, Bronze, and Silver data models in the EDW. The Bronze and Silver data models were identical to the ChainSys standard EDM for SAP ECC, SAP R/3, Oracle EBS, Microsoft Dynamics, and Oracle Cloud Applications. For legacy mainframe applications, the data models were based on the profiling results and existing knowledge.
  • ChainSys dataZen was used to cleanse and match/merge master data and to ingest it into the EDW.
  • Data from the Silver layer and the MDM repository was transformed into Gold data models using the ChainSys dataZap application.
  • A robust data profiling and consolidation strategy was put in place for all applications. Accordingly, each stream of data was profiled and validated. Data profiling helped uncover the unknowns.
  • A robust workflow-driven governance process for data going in and out of the EDM/EDW was also established. ChainSys dataZap's governance engine ensured that workflows were up and running in real time. Data reconciliations were carried out for the data ingested from source applications into the EDM.
  • The data quality target was set at 99% clean, and quality processes were instituted to achieve it.
  • The ChainSys data quality engine ensured complete profiling and validation of all the data. The profiling process was repeated until data quality reached 99%.
  • ChainSys dataZense was used for creating all Analytical reports.
  • ChainSys dataZense was used for Data Catalog, Data Security, Compliance, and Visualization. Data Catalog provided Metadata management, Data Lineage, Business Glossary, Data Virtualization, and Global Search engine for the enterprise data. 
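
The layered flow described above (Incoming for staging, Bronze for history, Silver for current data, Gold for denormalized output) can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the ChainSys templates or the dataZap transformations: the function names, the `order_id`, `version`, and `customer_id` fields, and the sample data are all hypothetical.

```python
def to_bronze(bronze, incoming):
    """Append every staged record to Bronze, which keeps full history."""
    return bronze + incoming


def to_silver(bronze):
    """Keep only the latest version of each record (current data)."""
    latest = {}
    for row in bronze:
        key = row["order_id"]
        if key not in latest or row["version"] > latest[key]["version"]:
            latest[key] = row
    return list(latest.values())


def to_gold(silver, customers):
    """Denormalize: attach master-data attributes to each fact row."""
    return [
        {**row, "customer_name": customers[row["customer_id"]]}
        for row in silver
        if row["customer_id"] in customers
    ]


# Hypothetical staged batches and master data
batch_1 = [{"order_id": 10, "version": 1, "customer_id": "C001", "amount": 100.0}]
batch_2 = [
    {"order_id": 10, "version": 2, "customer_id": "C001", "amount": 90.0},
    {"order_id": 11, "version": 1, "customer_id": "C002", "amount": 40.0},
]
customers = {"C001": "Acme", "C002": "Globex"}

bronze = to_bronze([], batch_1)
bronze = to_bronze(bronze, batch_2)   # history: all three rows
silver = to_silver(bronze)            # current: one row per order
gold = to_gold(silver, customers)     # denormalized rows for reporting
print(len(bronze), len(silver), len(gold))
```

Separating history (Bronze) from current state (Silver) means the reporting layer can be rebuilt from Silver and master data at any time without touching the source applications.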

Benefits

A partial list of benefits:

  • The business benefited from analytical reports drawn from a single source, the EDM, for its global operations.
  • A full 360-degree view of Customers, Suppliers and Products provided all the insights for Sales, Service, and other functions. Similar insight was also available for the Financials.
  • Clean and complete data was available in centralized enterprise data models for consumption.
  • Product Profitability analysis helped to decide on the right line of products.
  • C-Suite dashboards gave C-level executives the information they needed to stay current on financials, costing, spending, HR headcounts, and profits, and to make decisions.
  • Data quality improved by over 40% in the first three weeks, reaching 99% accuracy.
  • Data Catalog improved trust in the EDM data and established a baseline for all custom development work.
  • The implementation of the EDM accelerated the implementation of new cloud solutions, equating to approximately 40% cost savings.
  • ChainSys pre-built templates accelerated the whole EDW creation and established a robust EDM.

Products and Services Used

dataZap - Pre-Configured Templates & Migration Engine to Extract, Transform, Pre-Validate, Load, Reconcile & Report.

dataZen - To 'Get Clean' and 'Stay Clean', and Introduce Master Data Governance.

dataZense - To Visualize, Analyze, Catalog and Scramble Data for Effective Decision Making & Security.
