Planning for a Data Migration Project

Organizations seek the ultimate database platform: one that will hold all current and historical data and be accessible from all sources while providing the security needed to thwart unauthorized retrieval. This platform, of course, does not exist, so companies are forced to move their data from location to location. This article describes how to organize and perform a successful data migration project that moves files and databases from one platform to another. First, we will focus on the planning stage, identifying the issues that need to be addressed, the scope of the project, and the players that should be involved. Finally, we will detail the project plan itself, testing, and the budget.

Data migration involves moving data from one platform to another. A platform can be a file or database format, an operating system, or an encoding scheme (such as EBCDIC, ASCII, or DBCS). There are two forms of data migration: conversions and extracts. In a conversion, the organization plans to move all the data to a new platform and remove the old platform altogether. A VSAM to DB2 conversion, for example, will remove the VSAM file(s) when the conversion is completed. Some data conversions are “partial”. For example, some of the fields in a customer master record may be removed and placed in a different database. In a partial data migration, every program that references the fields to be moved has to be identified. The program logic then has to be modified to retrieve the data from multiple sources. This adds a level of complexity that requires extra testing.

Extracts take data from one platform and copy it to another. The extraction process can occur on a timed basis, such as batch, or in real time. Extracts are performed for a variety of reasons. The most common is to provide the data on a separate platform that is unable to access the data in its original form. Some organizations extract their data to a medium that lends itself to faster processing, for example by unloading a database table to sequential media to speed up the batch cycle. Extracts can also be used to send data to outside organizations. XML is becoming a popular standard because it provides a common, self-defining format that outside agencies can utilize.
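As an illustration of an extract to a self-defining format, the following minimal sketch unloads a small table to XML. The table, column names, and sample rows are hypothetical, and an in-memory SQLite database stands in for whatever DBMS the organization actually uses.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical source table; SQLite is only a stand-in for the real DBMS.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER, name TEXT)")
conn.executemany("INSERT INTO customer VALUES (?, ?)", [(1, "Acme"), (2, "Globex")])

# Build one <customer> element per row, with one child element per column,
# so the extract carries its own field names.
root = ET.Element("customers")
cursor = conn.execute("SELECT id, name FROM customer ORDER BY id")
columns = [description[0] for description in cursor.description]
for row in cursor:
    record = ET.SubElement(root, "customer")
    for column, value in zip(columns, row):
        ET.SubElement(record, column).text = str(value)

xml_text = ET.tostring(root, encoding="unicode")
print(xml_text)
```

Because each field is tagged with its column name, the receiving organization does not need a separate record layout to interpret the data.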

A successful data migration project involves several major phases.  The planning phase is the most important.  With proper planning, you can build an accurate budget and the development and production monitoring phases will be less prone to problems.

The first step in building a plan is to define your goals. Provide justification for the data migration project. Why does the migration need to take place? Identify the benefits to be gained, both technical and business. For example, the migration may provide the organization with better security or compliance with new laws.

Define in detail the behavior of the end product. Specifying that the presentation systems must work the way they did prior to the migration may not be practical. Changing the data’s encoding may affect the collating sequence, which will affect the order in which data is presented. Relational databases offer more functionality but tend to take longer to access the data. Generally, the more functionality the DBMS offers, the more storage space it requires. Migration of data will always result in some changes. The structure of the data (the physical way the data is stored) may change. The organization of the data may change: one table can be split into several, a sequential file might be transformed into an indexed schema, and so on. The encoding of the data may change: data stored in EBCDIC format may be changed to ASCII, and date fields may be stored differently. The accessibility of the data may change: will the data be protected in a different manner? Will the user community have greater (or less) access? How will this affect the user’s experience?
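The effect of encoding on collating sequence can be seen in a small sketch. The sample keys are hypothetical, and cp037 is assumed here as a representative EBCDIC code page; a given mainframe may use a different one.

```python
# Hypothetical sample keys; cp037 is one common EBCDIC code page (an assumption).
keys = ["apple", "Banana", "42"]

# Sort by the byte values each encoding assigns to the characters.
ascii_order = sorted(keys, key=lambda s: s.encode("ascii"))
ebcdic_order = sorted(keys, key=lambda s: s.encode("cp037"))

print(ascii_order)   # ['42', 'Banana', 'apple']
print(ebcdic_order)  # ['apple', 'Banana', '42']
```

ASCII places digits before uppercase letters and uppercase before lowercase, while EBCDIC places lowercase first and digits last, so a report sorted by key will present its rows in a different order after the migration.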

Goals need to take into account additional issues. In this day and age of greater compliance and security, protecting the organization’s data is more important than ever. In the early days of data processing, master files were kept on tape. The computer operators would mount and re-mount the tapes as often as needed. Today, master files are stored as relational databases with near-instant availability. Sequential file processing is still the fastest way to step through the data, but it is not very practical for online and web access. Ever since the first days of ISAM, a battle has raged between performance and functionality.

It is not enough to know what you are starting and ending with. The path that will be taken is crucial. Migrations do not happen overnight, and the “slam-dunk” strategy (i.e., everything is implemented at once) is both impractical and dangerous.

The strategy has to include a back-out plan. Regardless of the degree of planning beforehand, the unforeseen always happens. For each step of the migration plan, consider the implications of backing it out and redoing it. The original strategy may have to be rethought. A data migration project takes place over time. Individual parts may be implemented separately. How does implementing one part affect the original data processing (and access) methodology? For example, the nightly batch cycle may have to change repeatedly as the various portions are implemented. How difficult will it be to undo parts of the implementation? A problem in one portion of the migration may affect a prior portion. This might require revising several portions of the migration plan. What are the issues raised by performing a step-wise implementation? Does the data need to coexist on multiple platforms during the migration? Will it create a maintenance nightmare? How will the organization maintain availability when the unforeseen happens?
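One way to think about a back-out plan is to pair every migration step with the action that undoes it. The sketch below is only an illustration of that idea: the function name run_with_backout, the step names, and the simulated failure are all hypothetical.

```python
def run_with_backout(steps):
    """Run (name, apply, undo) steps in order.

    If a step fails, undo the steps already completed, most recent first.
    Returns a log of what happened.
    """
    completed = []
    log = []
    for name, apply, undo in steps:
        try:
            apply()
        except Exception as exc:
            log.append(f"failed {name}: {exc}")
            for prior_name, prior_undo in reversed(completed):
                prior_undo()
                log.append(f"backed out {prior_name}")
            break
        completed.append((name, undo))
        log.append(f"applied {name}")
    return log


def failing_load():
    # Simulated failure, for illustration only.
    raise RuntimeError("load rejected rows")


steps = [
    ("create table", lambda: None, lambda: None),
    ("load data", failing_load, lambda: None),
]
log = run_with_backout(steps)
print(log)
```

Writing down the undo action for each step while the plan is being drafted makes it easier to judge, before implementation, how difficult a given portion would be to reverse.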

The next planning step of phase I is to determine the scope of the project. Identify the databases to be migrated, the programs and utilities that access those databases, and the user community that uses them. Take a complete inventory of each database or file to be migrated. This is fairly straightforward. Data is not isolated, though. Partial migrations can fail because ancillary databases were not taken into account. Identify the related data. For example, an invoice file refers to customers, so converting the customer master can have an impact on the invoice master. Inventory the programs and jobs that use the target data and all related databases. Additionally, programs that provide input to, or receive output from, the affected programs should be inventoried.
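The inventory described above can be treated as a small graph problem. The following sketch assumes the inventory has already been captured as two dictionaries; every file and program name in it is hypothetical.

```python
# Hypothetical inventory: every name below is illustrative only.
# related_files maps a file to the files that refer to it.
related_files = {"CUSTOMER": {"INVOICE"}, "INVOICE": set()}
# program_files maps a program to the files it reads or writes.
program_files = {
    "CUSTUPD": {"CUSTOMER"},
    "INVPRINT": {"INVOICE"},
    "PAYROLL": {"EMPLOYEE"},
}


def affected_files(target):
    """Return the target file plus every file reachable through a relation."""
    seen, stack = set(), [target]
    while stack:
        name = stack.pop()
        if name not in seen:
            seen.add(name)
            stack.extend(related_files.get(name, ()))
    return seen


def affected_programs(target):
    """Return the programs that touch the target file or any related file."""
    files = affected_files(target)
    return sorted(p for p, used in program_files.items() if used & files)


print(affected_programs("CUSTOMER"))  # ['CUSTUPD', 'INVPRINT']
```

Here the invoice program appears in the result even though it never opens the customer master, which is the kind of ancillary dependency that a partial migration can miss.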

In many organizations, the user community has direct access to the information stores and may have developed its own programs. For example, SAS is a popular end-user analysis tool. These user-developed programs (generally known as ad-hoc) are difficult to identify. The original user may have been promoted or left the company, or may not even be readily identifiable. Migrating ad-hoc user extracts and programs is complicated by users who adhere to no universal standards. Additionally, obtaining user cooperation and participation can be difficult. Data management is not part of their job description.

Most data migrations involve some form of reformatting and cleanup. The old DBMS may be more lenient with regard to missing data. Changing platforms may cause data to be stored differently. For example, a 4-byte binary number on a mainframe is stored with the high-order byte first, whereas on an Intel-based platform it is stored in reverse byte order. So, redefining binary data as a string and manipulating it will corrupt its value. One choice is to first convert the data to a common format before importing it to the new format.
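The byte-order problem can be demonstrated in a few lines. This is a minimal sketch with a hypothetical field value; it uses Python's struct module, where ">" means high-order byte first and "<" means low-order byte first.

```python
import struct

value = 0x12345678  # hypothetical 4-byte binary field value

mainframe_bytes = struct.pack(">i", value)  # high-order byte first
intel_bytes = struct.pack("<i", value)      # low-order byte first

# The same number is stored as the same four bytes in opposite order.
print(mainframe_bytes.hex())  # 12345678
print(intel_bytes.hex())      # 78563412

# Reading mainframe bytes with the Intel byte order corrupts the value.
misread = struct.unpack("<i", mainframe_bytes)[0]
# Decoding with the source byte order recovers it.
correct = struct.unpack(">i", mainframe_bytes)[0]

print(hex(misread))  # 0x78563412
print(hex(correct))  # 0x12345678
```

A conversion program therefore has to know which fields in a record are binary and decode them by type, rather than copying or translating the record as one character string.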

A successful migration plan will need to account for the above. 

There are many tools in the marketplace that can assist and streamline the migration process. When evaluating tools, identify what parts of the data migration plan are addressed by the individual tools. What are the costs of the tools, and what do they save in terms of both time and money? A tool that creates the new database and migrates the data can be a real time-saver. A tool that provides transparent data access can be useful in the short term, but its cost may be prohibitive. Will the tool help the migration succeed? How? A data transparency sub-system can ease the pressure of keeping the converted data in sync with the programs that use it. Then, the programs can be converted separately. Will the tool be used strictly as a migration aid or will it have to be a permanent addition? A data conversion program will no longer be needed after the project is complete, but a performance monitoring tool will be useful on a more permanent basis. Additionally, an organization can develop its own tools. This approach creates some new dangers. The developed tool(s) may not work properly. “Fixing” the tool in the middle of a migration will add a great deal of time and cost to the project.

Who will be involved in the migration? Does the organization have the resources in-house to do a migration without impacting ongoing maintenance and business needs? If an outside agency will be brought in, identify its specific role. There are key roles that need to be filled. Some of these can be performed by the same players, but others should not.

  1. The strategic team that recommends the project.
  2. The executive team that approves the project.
  3. The management team that oversees the project.
  4. The analytical team that designs the project.
  5. The technical staff that converts the data, programs, and operational methodology.
  6. The acceptance team that verifies that the goals of the migration are being met.
  7. The documentation group that alters existing documentation and creates turnover documents.
  8. The operational staff that processes the data.
  9. The user community that is expected to use the results of the migration.
  10. The support staff that resolves post-implementation problems and training.

The acceptance team should be separate from the technical staff. Within a team, there should be some overlap in abilities.  A key player getting sick for a couple of days can hold up the entire project.

A data migration (or the more popular “modernization”) project is very repetitive. Many files, programs, and run procedures are modified in essentially the same way. The project designers can set up a general template that identifies the steps each file and its associated components must undergo. Each file or table to be migrated requires modification to, and the possible creation of, several components. For each file to be migrated, ensure that all the appropriate tasks have been identified.

  1. The file or table
    1. Modify or remove the original file and the utilities that maintain the original file or table
    2. Create the new database and the utilities that load the data and maintain the table
  2. The programs that access the file or table
    1. The source will need to be upgraded
    2. The run streams for batch
    3. The access definitions (ex. CICS definitions)
  3. Software and Hardware
    1. Purchases
      1. Temporary purchases that facilitate the migration such as extra DASD or migration tools
      2. Permanent hardware and software such as performance monitoring tools and the new DBMS itself.
    2. Retirements, such as software that will no longer be needed after the migration.
    3. The effort required to install and remove software and hardware
  4. The Security profiles of the user community
  5. Documentation
    1. Table
    2. Program
    3. Processing
    4. User

The number of individual tasks in a migration project can be daunting. A simple migration project consisting of 2 files and 10 programs can consist of over 50 individual tasks. Choosing the right Project Management System (PMS) can ease the effort. There are many project management systems on the market. A Google search of the phrase “Project Management System” will return well over 100,000 pages. This is far too many to evaluate thoroughly. Evaluate systems that offer trial downloads and stress the ability to modify the project on demand. Many project management systems stress team development and browser access. These are useful features but of secondary importance.

Include tasks and time estimates. Avoid long tasks or tasks that require multiple people. One consulting company issued an edict that no task could take more than 80 hours. The finer the “granularity”, the more visible the slippages. The repetitive nature of a migration means that individual task slippages can be multiplied many times. Each slippage needs to be analyzed for its impact on other, similar tasks. Backing out one or more tasks, then redesigning and redoing them, is much more likely in a data migration project than in a development project. Most project management systems don’t make provision for task failure within the project definition.

Identify “dead time”, that is, time when people are idle, waiting on the completion of some prerequisite task. There is a big difference between effort time and elapsed time. A task that takes one day of effort can take 2 or more days of wall clock time. This extra time is usually “dead time” and will make a project slip. Many project planners assume that individuals are productive every hour of every day. Account for non-productive time such as breaks and reporting. In an eight-hour work day, a person may only have six productive hours.
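The gap between effort time and elapsed time is simple arithmetic. The following sketch uses the six productive hours per day mentioned above; the function name elapsed_days is hypothetical, and waiting on prerequisite tasks would add further days on top of this figure.

```python
import math


def elapsed_days(effort_hours, productive_hours_per_day=6.0):
    """Working days needed to deliver the given hours of effort."""
    return math.ceil(effort_hours / productive_hours_per_day)


print(elapsed_days(8))        # 2  (one "day" of effort spans two working days)
print(elapsed_days(80))       # 14
print(elapsed_days(80, 8.0))  # 10 (the figure a planner gets by assuming 8 productive hours)
```

An 80-hour task planned at eight productive hours a day is scheduled for 10 days but needs 14 at six productive hours, a slippage of four days on a single task.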

The project should include tasks for installing and supporting temporary software and hardware needed by the migration. At the end of the project there should be tasks for the removal of software and hardware that will be retired.

The project plan has been created and approved. Everyone knows their role and is poised to begin the work. The test plans are in place. But what about the project plan itself? On large data migration projects, the plan itself should be tested. How? Select a small, isolated sub-system and perform a “pilot migration”. The pilot migration can act as a “proof-of-concept” to shake out gaps and oversights in the overall project.

The first steps in implementing the project plan are to test its viability and verify the estimates. Time estimates are usually based on the individual’s belief in their abilities and their desire to appear competent. Take a small portion of the migration and perform it as a pilot project or “proof-of-concept”. Assume there will be problems and oversights. The issues uncovered should then be resolved and the plan adjusted to account for the changes. If the pilot project uncovers many shortcomings, a second pilot project may be in order.

The first question an IT manager will ask when presented with a request for a migration is “What is this going to cost?” The budget should encompass the total impact on the IT department. Although a budget that accounts for only the out-of-pocket costs will appear attractive, the full financial impact on the organization needs to be considered.

There are three major sections to a budget.

  1. Labor costs –  The people cost.  Any good Project Management System can provide these numbers.
  2. Purchase and/or Lease price of temporary or permanent Software and Hardware
  3. Indirect costs – Overhead costs like office space and supplies

Include a section on savings. After the migration is completed, software and hardware that were in place prior to the migration may be retired. Staff that were devoted to maintaining the old file structures may be reassigned. These long-term savings can help offset the costs of the migration.

Develop a detailed budget. Many people estimate the total cost of the project and add a “fudge factor” like 10% to cover overruns. That way, if the project is completed on time, it is under budget. Avoid “fudge factors”. There are several ways to create the budget. One way is to develop step-by-step cost estimates and then add the budget for each step to arrive at an overall number. A second way is to use the total elapsed time of the project; the “bottom line” will then reflect the total cost. A third way is to determine the percentage of the total available IT resources that the project will consume and apply this percentage to the overall IT budget. Build the budget several ways. You will find that the budgets will not quite match. The difference can reflect oversights within the project and “dead” time, that is, time when resources are idle but not reassignable, waiting on some other task to complete.
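Building the budget three ways and comparing the results might look like the sketch below. Every figure in it is hypothetical and chosen only to show the arithmetic.

```python
# All figures are hypothetical and exist only to show the comparison.

# Method 1: add up step-by-step cost estimates.
step_estimates = {"plan": 12_000, "pilot": 8_000, "convert": 45_000, "test": 20_000}
bottom_up = sum(step_estimates.values())

# Method 2: total elapsed time multiplied by what the team costs per week.
elapsed_weeks = 20
weekly_team_cost = 4_500
elapsed_based = elapsed_weeks * weekly_team_cost

# Method 3: the share of the overall IT budget the project will consume.
it_budget = 1_200_000
share_percent = 8
share_based = it_budget * share_percent // 100

budgets = {
    "step-by-step": bottom_up,
    "elapsed time": elapsed_based,
    "share of IT budget": share_based,
}
gap = max(budgets.values()) - min(budgets.values())

print(budgets)
print(gap)  # 11000
```

In this example the step-by-step figure is the lowest, which is what one would expect if it omits idle time between tasks; the gap is a prompt to look for dead time and overlooked tasks rather than a number to be covered with a fudge factor.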

There are two types of costs. Hard dollars are direct costs such as salaries and software fees. Soft dollars are costs like office space and computer time. Many people ignore the soft dollar costs, figuring the organization is spending the money anyway. Realistically, these resources could be used to support other projects. The budget can be split into two parts, one for the hard dollars and one for the soft dollars. Include costs for tools, staffing, and the financial impact of backing out and redoing the steps of the project. Include the cost of support services like office space, telephone, management review, tracking the project, and reporting. User involvement in testing and design should also be included in the budget. Keep in mind that the user community will probably not provide funding for this.

A data migration project can seem like a daunting task. Modernizing even one file can have a far-reaching impact. It is important to involve all the parties, from management to the support staff to the user community. Unlike a development project, where redoing a few tasks can be performed independently of the rest of the project, a migration project is repetitive, so a redesign has to be applied to all similar tasks. This can have a major impact on the project timeline. Project slippages can be minimized by thoroughly testing the plan before committing the organization’s full resources. Choosing the right tools can provide extra benefit by standardizing tasks and automating some of the repetition. The entire IT department has a budget that determines how resources can be utilized. Develop a comprehensive budget, taking into account all the resources the project will consume and any savings after the project is completed.

Online Networking Inc.   dennis@onlinenetworking.org