
Data Migration: Stage Data to Add Business Value

Many large organisations face challenging strategic objectives in order to stay ahead in their respective industries: they must constantly improve service offerings, reduce costs, gain competitive advantage and attract new customers.

Companies are often held hostage by disparate legacy applications and business processes that may have contributed to their success in the past, but have since reached capacity and turned into a liability, hindering them from achieving these objectives.

To overcome these hurdles, business and IT departments are often tasked with identifying new applications and business processes that leverage new technologies, act as enablers, and help the organisation achieve its strategic objectives.

In the quest to address these issues and drive the business forward, organisations proactively seek strategies to ensure success. To that end, many adopt migration as a strategy to resolve their business and IT challenges. When migrating from a legacy application (the source) to a new application (the target), a Data Migration approach is preferred, and organisations should maximise the value of their data in the process.

There are many architectural options for a Data Migration project, such as migrating information directly from source to target, or using a staging area to enrich organisational data on its way from source to target.

Why Stage Data?

Staging lets a business maximise the value of its data before it is migrated to the target system, as there may be limited opportunity to improve the data once it is loaded. The idea is to make the data as useful as possible by the time it reaches the target system.

It makes sense to stage data before migrating to the target system, because staging lets you profile and clean the data, enrich and transform it to satisfy target system requirements, have subject matter experts verify and sign it off, and support auditing and reconciliation before the data ever reaches the target.


User Story

Data Migration of legacy systems with a poor maintenance history and a siloed existence warrants the use of a staging area.

A multinational utility company needed a Data Migration approach to consolidate hard- and soft-copy documents (along with metadata) from its legacy systems and re-constitute its design base within a specialised engineering tool. These legacy systems had been commissioned over the life of the company, with multiple data owners operating in departmental silos and a poor maintenance history. The result was data duplication and poor data quality. The target system, in contrast, had strict business rules, schemas and mandatory fields that the source systems could not satisfy.

A direct integration between source and target was not possible without data cleaning and standardisation. A staging environment was chosen to remedy these shortfalls before migrating the data to the target environment. The strategy had to be repeatable across the company's 16 regional installations.


What is Staging?

The staging area is an interim data store into which data, documents and supporting metadata from one or more source systems are extracted, profiled, cleaned and loaded. The staged data can be analysed by subject matter experts to formulate business rules, which are then applied to enrich and transform the data to satisfy target system requirements. The finalised data is verified by subject matter experts for integrity and accuracy, and these experts can approve or reject the migration to the target system. Migration auditing, configuration management and reconciliation are value-added services provided by the staging area.

Figure 1: The use of a staging area is really about gathering information from different sources and creating a single, unified view of the data for integration to a target system.

Data Migration Phases

Data Migration is performed in four primary phases:

  1. Data Migration Planning
  2. Data Quality Validation
  3. Analysis & Design
  4. Implementation & Validation

Figure 2: Iterative Data Migration Project Lifecycle

Planning for the staging environment starts with the Data Migration Planning Phase. The Staging Phase starts with Data Quality Validation and Analysis & Design, and ends with the Data Migration Implementation Phase.

1. Data Migration Planning

A good plan is like a road map: it shows the final destination and usually the best way to get there.
H. Stanley Judd

Thorough planning is the foundation for consistent success in any process, and data migration is no exception. The following guiding principles, along with risk and/or critical success factors, are used when planning a Data Migration implementation:

  1. Minimise project risk: achieved through stringent planning and execution, as well as multiple test runs, to ensure that risks are identified and mitigated prior to going live
  2. Ensure conversion data accuracy: achieved through data reconciliations and testing
  3. Minimise business interruptions: achieved through stringent planning and through the identification and mitigation of risks and issues
  4. Tie-back to objectives: ensures that all activities and/or specifications detailed in a document are linked directly to the objective of the document
  5. Stable and repeatable process: ensures that the migration approach and process are stable and repeatable for all future data migrations
  6. Approved approach: ensures that the approach is approved by the relevant stakeholders
  7. Achievable: ensures that the approach and adopted process are achievable within the given time frames

The following Data Migration Planning activities will directly influence the planning of a Staging Area:

a) Determining Data Migration Technology

The decision to choose a particular Data Migration Technology is often influenced by the following factors:

a) Ensuring scalability and throughput of the tool to support varying volumes of data
b) Realising investments by re-using licensed tools, rather than investing in new tools
c) Providing reporting, reconciliation, configuration and auditing capabilities
d) Providing an interface and data repository to configure and store the data dictionary, as well as business and transformation rules
e) Providing analysis capabilities, such as data profiling and data mining

Two common data integration tools and practices can be applied to data migration:

a) ETL (Extract, Transform and Load): In a more traditional data migration scenario, where data is to be migrated from source to target system with extraction, transformation and loading requirements, a vendor-specific ETL tool is preferred. Users favour ETL for its ability to handle the extreme requirements of data migration, including terabyte-scale datasets, multi-pass data transformations, deep data profiling, interoperability with data quality tools, and many-to-many data integration capabilities.

b) Hybrid Staging Solution: With today’s complex Data Migration requirements, best practices recommend using a hybrid solution rather than a single tool to migrate data. The hybrid solution will consist of components that are specialised to support various stages of the migration process.
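
Whichever option is chosen, the core pattern is extract, transform and load. The following is a minimal sketch of that pattern in Python, assuming SQLite source and staging databases; the table and column names (legacy_customer, stg_customer) are invented for illustration, not taken from any specific system.

    import sqlite3

    def etl(source_db: str, staging_db: str) -> None:
        """Extract legacy rows, transform them, and load a staging table."""
        src = sqlite3.connect(source_db)
        stg = sqlite3.connect(staging_db)
        stg.execute("CREATE TABLE IF NOT EXISTS stg_customer "
                    "(id INTEGER PRIMARY KEY, name TEXT, country TEXT)")
        # Extract: read rows from the legacy table.
        for id_, name, country in src.execute(
                "SELECT id, name, country FROM legacy_customer"):
            # Transform: trim whitespace and standardise country codes.
            name = (name or "").strip()
            country = (country or "").strip().upper()
            # Load: write the cleaned row into the staging table.
            stg.execute("INSERT OR REPLACE INTO stg_customer VALUES (?, ?, ?)",
                        (id_, name, country))
        stg.commit()
        src.close()
        stg.close()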


User Story

Best practices recommend the use of Hybrid Solutions for Data Migration Projects with complex requirements.

A multinational utility company required a data migration solution that could collect and track physical and electronic documents stored in various locations: a paper-based library, file servers, PCs and legacy document management systems. Document content had to be searched and analysed to extract the supporting metadata required by the target system.

Duplicate and non-technical documents or files were to be identified and excluded from loading to the target system. The finalised data set (documents and supporting metadata) was to be loaded to the target system, where it could be reconciled and audited post-migration.

The chosen vendor adopted a hybrid solution to meet this complex requirement, composed of technologies such as scanning and barcoding, OCR, hashing algorithms, full-text indexing, a document-based data model and business process automation tools.
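
To illustrate one of these components: duplicate detection with hashing algorithms can be sketched in a few lines of Python. This is a generic example, not the vendor's implementation; grouping files by a SHA-256 content hash flags byte-identical duplicates regardless of file name or location.

    import hashlib
    from pathlib import Path

    def find_duplicates(root: str) -> dict[str, list[Path]]:
        """Group files under `root` by SHA-256 content hash."""
        groups: dict[str, list[Path]] = {}
        for path in Path(root).rglob("*"):
            if path.is_file():
                digest = hashlib.sha256(path.read_bytes()).hexdigest()
                groups.setdefault(digest, []).append(path)
        # Hashes shared by more than one file indicate duplicate content.
        return {h: ps for h, ps in groups.items() if len(ps) > 1}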


b) Determining the Conversion Plan

Migration requirements may call for changes to the legacy data during the migration process; the changes may be to form, value or volume. The Project Manager should plan and schedule workshops between the staging area's business analyst and the data stewards to define the data conversion rules between the source and target systems. These rules can be stored in configurable data dictionaries within the staging area. The following sample requirements can trigger data conversions (a minimal sketch of value translation follows the list):

a. Conformance to target data requirements
b. Translation of source data values to target data values
c. Enhanced data types and/or formats in newer technology
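
As a minimal sketch of requirement (b), value translation rules drawn from a data dictionary can be applied to source records as below. The rule table and field names (status, region) are invented examples.

    # Hypothetical translation rules, as stored in a data dictionary:
    # source field -> {source value -> target value}
    TRANSLATIONS = {
        "status": {"A": "ACTIVE", "I": "INACTIVE", "D": "DELETED"},
        "region": {"JHB": "Johannesburg", "CPT": "Cape Town"},
    }

    def convert(record: dict) -> dict:
        """Translate source values to target values; leave the rest as-is."""
        out = dict(record)
        for field, mapping in TRANSLATIONS.items():
            if field in out:
                out[field] = mapping.get(out[field], out[field])
        return out

    # convert({"status": "A", "region": "JHB"})
    # -> {"status": "ACTIVE", "region": "Johannesburg"}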

c) Determining the Integration Plan

Data migration may require drawing data from more than one legacy data source. The Integration Plan describes how conflicts and duplication among multiple source data and data structures will be resolved. The plan also determines how to move the data from the source system(s) to the target system.

There are two options:

  1. Move the data directly from the source system(s) to the target system
  2. Move the data into a staging area first, resolve conflicts and duplication there, and then migrate it to the target system

d) Developing the Migration Data Quality Plan

Planning for data quality is essential for managing migration timelines within a staging environment. It is recommended to use a data profiling tool to profile the quality of, and the relationships within, the source data. A defect management process and supporting tools should be used to log data quality issues, which can then be channelled to the relevant team or person for resolution.

Weekly status meetings can be held to track the resolution progress and status of defects, and weekly reports can be sent out by the defects coordinator, giving the priorities, statuses and age analysis of the defects.

Data loss tolerance factors should be defined, as obsolete data structures and formats often don’t translate 100% into the target system. The Project Management team should consult with business stakeholders to determine the tolerance level for data loss. This will contribute to the remediation (what to do if a level of data loss is unavoidable) and the validation procedures (measuring whether the migration is successful).
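
A data loss tolerance factor can be expressed as a simple ratio check, as in this Python sketch; the 2% threshold is an invented example, since the real figure is agreed with business stakeholders.

    def within_tolerance(source_count: int, migrated_count: int,
                         tolerance: float = 0.02) -> bool:
        """Return True if the fraction of unmigrated records is acceptable."""
        lost = source_count - migrated_count
        loss_ratio = lost / source_count if source_count else 0.0
        return loss_ratio <= tolerance

    # 1,000,000 source records, 985,000 migrated: 1.5% loss, within 2%.
    assert within_tolerance(1_000_000, 985_000)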

e) Planning for Simulations

Depending on the scope, complexity and timelines of the migration, mock runs, simulations and dress rehearsals need to be included in the migration plan. These mitigate data quality, transformation and conversion risks before the Migration Go-Live.

f) Planning for Switchover

The main consideration in the scheduling of the data migration is the conversion window allowed for on the date of the migration. This should be defined up front and targeted throughout the data migration process to ensure that it is achievable. This will be tested for in the various simulations.

The date and duration of the migration should be chosen to minimise downtime (next to zero downtime) and disruption to service delivery to customers.

g) Planning for Data Reconciliation and Auditing

The success or failure of the migration can be judged by reconciling the integrity, quality, accuracy and usability of the migrated data against the staged data. Requirements for reconciliation reports can be drafted during the planning phase; the reports themselves can be developed and produced by the Technical Migration Team (staging and target) at crucial migration checkpoints. It is advisable to define checkpoints after crucial migration activities (e.g. extraction, enrichment and transformation) to verify that data integrity and quality aren't compromised prior to the final migration.

Subject Matter Experts and Business Stakeholders should review and audit the reconciliation reports so that an informed decision can be made to sign-off or roll-back the migration.
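
One way such checkpoints can be implemented is to capture a record count and an order-independent content checksum at each stage, and compare the figures across stages. The sketch below assumes SQLite staging tables; it is illustrative, not a prescribed reconciliation design.

    import hashlib
    import sqlite3

    def checkpoint(conn: sqlite3.Connection, table: str) -> tuple[int, str]:
        """Return (row count, order-independent checksum) for a table."""
        count = conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]
        digest = 0
        for row in conn.execute(f"SELECT * FROM {table}"):
            # XOR the per-row hashes so the checksum ignores row order.
            digest ^= int(hashlib.sha256(repr(row).encode()).hexdigest(), 16)
        return count, f"{digest:064x}"

    # Compare the checkpoint taken after extraction with the one taken
    # after transformation; a mismatch in counts flags lost records.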

2. Staging Analysis & Design Activities

Analysis activities are initially performed during the Pre-Migration phase. Their objective is to understand the source system data, identify and cleanse data quality issues (at source), and formulate the mapping and transformation rules that will be used during the migration. Analysis efforts will have to be revisited on mock runs as new data is received. The documented results of the analysis become a resource for the design of the staging and target architectures.

The Analysis and Design activities should aim to produce the following Data Set qualities for implementation:

  1. Usable data sets: The finalised data is in a format that is usable for the implementation
  2. Complete data sets: All the data that is necessary for the implementation is available
  3. Accurate data sets: The data represents the correct real world situation and can be reconciled and signed off
  4. Correct data sets: The correct versions of the data sets are available and mapped to the correct destination

To achieve the above objectives, a sequence of tasks and sub-tasks (repeatable and achievable across migration runs) should be modelled in a Detailed Data Migration Process. While not every Analysis and Design activity will necessarily occur for every migration, a thorough Data Migration Plan should include either all requirements for the Analysis and Design activities or justifications for not including a particular activity.

Figure 3: The Detailed Data Migration Process contains a sequence of inter-related Analysis & Design and Implementation & Validation activities to deliver a successful Migration

a) Analyse the Current Environment

The Functional and Technical Migration Team (in collaboration with the source system team) reviews all compiled artefacts related to the legacy environment to fully understand the IT environment in place and any constraints or technical limitations. It is important to determine remedies for these limitations in the new target system. This analysis can be performed during the Pre-Migration phase, and its results can feed into the Project/Migration Planning phase.

b) Extract Legacy Data

Extract the legacy source data while maintaining its integrity and relationships. The data should be extracted to a format that can be loaded into the staging environment, with zero to minimal disruption to business operations.

c) Data Profiling

Data Quality refers to the reliability and effectiveness of data. Potential data quality issues come in several types, each of which requires a different set of remediation activities.

Data Profiling is an initial assessment of the source data to understand and determine any quality challenges. Its benefits include uncovering data quality issues early, revealing the relationships within the source data, and informing the mapping and transformation rules formulated during Analysis & Design.

Data quality issues found during data profiling should be assigned to the respective data steward, using the defect management process outlined in the Data Quality Plan.

Where possible, a vendor profiling tool should be used for greater accuracy, repeatability and productivity.
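
For smaller data sets, or where no vendor tool is available, the basic checks such a tool automates can be approximated in plain Python, as in this sketch of per-column null counts, distinct counts and value-length ranges over a CSV extract (the file layout is hypothetical).

    import csv
    from collections import defaultdict

    def profile(csv_path: str) -> dict[str, dict]:
        """Per-column null counts, distinct counts and value-length ranges."""
        stats: dict[str, dict] = defaultdict(lambda: {
            "nulls": 0, "values": set(), "min_len": None, "max_len": 0})
        with open(csv_path, newline="") as f:
            for row in csv.DictReader(f):
                for col, val in row.items():
                    s = stats[col]
                    if not val:
                        s["nulls"] += 1
                        continue
                    s["values"].add(val)
                    n = len(val)
                    s["min_len"] = n if s["min_len"] is None else min(s["min_len"], n)
                    s["max_len"] = max(s["max_len"], n)
        # Report distinct counts rather than the raw value sets.
        return {col: {"nulls": s["nulls"], "distinct": len(s["values"]),
                      "min_len": s["min_len"], "max_len": s["max_len"]}
                for col, s in stats.items()}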

d) Design Staging Area

The staging data structures house the interim data during the (trial) migrations, and the procedures (staging, cleansing, conversion, migration, reconciliation and auditing) populate these structures throughout. To ensure the success and validity of the data migration procedures, it is essential that the staging environment mirrors the production environment; otherwise, the results of a trial migration might not be representative.

e) Design Data Dictionary

A Data Dictionary or metadata repository is a centralised repository of information about data, such as its relationship to other data, its format and default values.

Because the source and target data models differ, a data dictionary should be formulated to align the differences in most data migration projects. The data dictionary repository is housed within the staging area and presented to the staging team through a configurable GUI. The configurations typically found in a data dictionary include field mappings between source and target, value translations, data formats and default values, and business and transformation rules.

Formulating the Data Dictionary is a complex exercise that requires attention to detail and a comprehensive understanding of source and target data models by the staging team. The definitions in the data dictionary will be used by procedures in the Implementation Phase to enrich, transform and clean legacy data.
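
As an illustration, a data dictionary entry could be modelled along the following lines; the fields shown (mappings, types, defaults, mandatory flags and translations) mirror the configurations described above, while the names themselves are invented.

    from dataclasses import dataclass, field

    @dataclass
    class DictionaryEntry:
        source_field: str           # field name in the legacy system
        target_field: str           # field name in the target system
        data_type: str              # target type, e.g. "VARCHAR(100)"
        default: str | None = None  # used when the source value is empty
        mandatory: bool = False     # target system requires a value
        translations: dict[str, str] = field(default_factory=dict)

    DICTIONARY = [
        DictionaryEntry("cust_nm", "customer_name", "VARCHAR(100)", mandatory=True),
        DictionaryEntry("stat", "status", "VARCHAR(10)", default="UNKNOWN",
                        translations={"A": "ACTIVE", "I": "INACTIVE"}),
    ]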

3. Data Migration Implementation & Validation

Action is the foundational key to all success.
Pablo Picasso

The Implementation Phase commences with receiving profiled and cleaned data into the staging area, and ends with the validation of the migration to the target system as depicted in Figure 3.

The activities performed during the Implementation Phase should be tracked and monitored using checklist and timing sheet artefacts maintained by the Migration Manager. These artefacts serve as a reference for improving efficiency in subsequent simulations, and give the migration team and business stakeholders a close-to-realistic timeline for Migration Go-Live.

Although not every implementation and validation activity may occur for every migration, the ones that do occur generally follow the sequence below:

a) Analysis and Data Enrichment

When data is being migrated from multiple sources to a single target system, the staging area acts as an integration platform. As part of the integration effort, the metadata from the different sources must be analysed to identify the most accurate and relevant metadata to transform to the target system.

Analysis effort is also spent deriving metadata that is missing at source but mandatory in the target system, by identifying patterns and relationships in the data. Given the volumes of data involved in multi-source migrations, this effort can be made more efficient by providing the functional team or data stewards with a rules-based engine, configured to apply pattern and keyword searches that filter the data relevant to the analysis.
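
A pattern-and-keyword rules engine of this kind can be sketched with regular expressions, as below. The metadata fields and patterns (drawing numbers, revisions) are hypothetical examples of rules a functional team might configure.

    import re

    # Hypothetical rules: (metadata field, pattern whose group 1 is the value).
    RULES = [
        ("drawing_number", re.compile(r"\bDRG[- ]?(\d{4,6})\b")),
        ("revision",       re.compile(r"\bREV(?:ISION)?\s*([A-Z]\d?)\b")),
    ]

    def derive_metadata(text: str) -> dict[str, str]:
        """Apply each pattern rule and keep the first match per field."""
        found = {}
        for fld, pattern in RULES:
            m = pattern.search(text)
            if m:
                found[fld] = m.group(1)
        return found

    # derive_metadata("Pump layout DRG-10423 REV B")
    # -> {"drawing_number": "10423", "revision": "B"}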

b) Clean and Transform

The Data Cleaning and Conversion Procedures are applied in the staging area to resolve data quality issues, and transform legacy data to be compliant with target system requirements. The mappings and translations stored in the Data Dictionary are referenced for transforming data.

c) Verification

Checkpoint reports can be generated from the staging environment for the functional teams to verify the accuracy, integrity and data quality tolerance factors of the data after the enrichment, cleaning and transformation efforts. Based on the verification results, the Project Management Team (along with business stakeholders) can decide either to proceed with migrating the data to the target system or, if further analysis, cleaning and transformation are needed, to first improve the quality of the staged data.

d) Load

Data Migration procedures can be applied to migrate the finalised data from the staging environment to the target system. Two approaches can be taken to load data to the target system:

  1. Preferential Phased Loading: In data migration projects with large data sets, loading to the target system is preferentially phased to save time, prepare the target system and identify issues early in the loading process. System-critical/master data is loaded to the target system first, while operational data might still be being cleansed and transformed in the staging area.
  2. Complete Loading: In data migration projects with smaller data sets, the loading of data to the target system can be achieved in one go.
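
The difference between the two approaches can be made concrete with a small sketch: under preferential phased loading, tables are grouped into phases and loaded in priority order, while complete loading reduces to a single phase containing every table. The phase groupings and the load routine below are placeholders.

    # Hypothetical phase groupings; the caller supplies the load routine.
    PHASES = [
        ("master",      ["plant", "equipment_class", "supplier"]),
        ("operational", ["work_order", "inspection", "measurement"]),
    ]

    def load_phased(load_table) -> None:
        """Load tables phase by phase, in priority order."""
        for phase, tables in PHASES:
            print(f"Loading phase: {phase}")
            for table in tables:
                load_table(table)

    # Complete loading is the degenerate case of a single phase that
    # contains every table.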

e) Synchronise

Where peripheral systems that were not migrated have to interface with the target system, the target master data and other referential data should be synchronised from the staging area to those peripheral systems.

f) Test and Reconcile

Post-migration functional testing of the target system should be done by the Quality Assurance Team. Test reports should be produced for review by the Functional Team and Business Stakeholders.

The integrity and quality of the migrated data can be verified by the Functional Team and Business Stakeholders using reconciliation reports generated from both the staging area and the target system. The reconciliation reports should highlight unreconciled records and the reasons for each mismatch, which can be used as a reference to rectify them.

g) Switchover

It is near impossible to achieve 100% migration of data from source to target. If the errors found in functional testing and the unreconciled records are within the tolerance boundaries of the migration acceptance criteria (defined in the data quality plan), the business can decide to switch operations over to the new system, with the assurance that any defects will be fixed and migrated within an acceptable time frame. Alternatively, if the defects exceed the tolerance boundaries (for example, they would affect service delivery), the business can decide to continue operations on the source system until confidence in the migration is regained.

Conclusion

Leveraging staging for Data Migration offers many business advantages over more traditional, direct integration approaches, including:

  1. The opportunity to profile, clean and enrich data before it reaches the target system
  2. Verification and sign-off of the finalised data by subject matter experts
  3. Auditing, configuration management and reconciliation of the migration
  4. A stable, repeatable process across multiple migration runs


Renosh Mathew BSc(Hons)

Senior Systems Analyst (JHB)

