1.        Definition

In this document, ‘MFGPRO ERP System Disaster’ refers to a situation in which the entire system is unavailable to users because of hardware, network, or software problems, and recovery is estimated to take more than 48 hours.

‘MFGPRO Disaster Recovery’ refers to the temporary measures or workarounds that allow the organization to continue partial delivery of the processing service.

2.        Objective

The objective of this plan is to give the organization guidelines for continuing to manage the business through MFGPRO, and for minimizing disruption to manufacturing operations, if the MFGPRO system becomes totally unavailable and cannot be recovered within 48 hours.

The service provided during disaster recovery is a survival-level service. It may not extend to every user, and end-users may need to re-enter backlog transactions both when disaster recovery begins and when normal operation restarts.

The recovery option will vary depending on the cause of the outage and may require, for example, setting up a LAN server or a secondary processing centre.

3.        Applicability

This plan applies to the Company when all modules of the MFGPRO system running on the HK server are totally unavailable to all users and recovery is estimated to take more than 48 hours.

For outages shorter than 48 hours, users should follow the manual procedures and accumulate the corresponding transactions for entry into the system as soon as it is recovered.

IT is responsible for updating and maintaining this plan, while operational management is responsible for ensuring that appropriate manual processes and procedures are developed and in place for outages both shorter and longer than 48 hours.

4.        Facilities Description

4.1 Warm Backup Server

There are two MFGPRO servers in the HK Office: one for the production system and one for system development. The two servers have different configurations, but both are accessible through the LAN.

The development server will be upgraded with more memory to support up to 24 users, so that it can operate as the backup server if the secondary processing centre is needed.

4.2 Power Supply

  • Electricity to the HK Office is currently supplied directly by HK China Light. If the power fails, a secondary supply (also from HK China Light) is activated. In the interim before the secondary supply is up, the MFGPRO server automatically switches to its attached UPS battery, and the normal MFGPRO shutdown procedure is executed automatically and completed within the UPS's 30-minute emergency supply time.
  • If the secondary supply also fails, operations will be shut down.

4.3 Cold Site Backup

HK-IT has an agreement with the IS department of the China Factory, which has agreed to provide a cold-standby site in the event of a system failure at the Hong Kong Office.

4.4 Test Drill

A separate document describes the disaster recovery test drill procedure for the warm-backup environment, including how to execute the DRP in a real disaster. IT policy requires the disaster recovery drill to be run annually in a test environment so that the IT department and other key stakeholders are prepared for a real disaster. The drill is normally scheduled for the middle of the year.

5.        Triggering Events & Recovery Options

5.1) Power Failure > 48 hours

Possible Incidents :

This refers to a failure of the primary power supply to the main computer room, or to the building housing the MFGPRO server, that will not be restored within 48 hours, while the secondary power supply is either not working or cannot reach the computer room.

Risk Assessments :

This is considered a ‘Medium’ risk. Such an outage should be rare, as should a fault that takes the power company more than 48 hours to fix, but its impact would fall on the IT applications.

Recovery Options :

In this situation, IT will relocate the production server to another HK office to provide processing services to users in HK and at the China Factory. IT will also set up a control room with a few workstations to accommodate users.

Workstations and printers must be reconfigured when the server is moved.

All of these actions should be completed within 4 hours.

5.2) LAN outages > 48 hours 

Possible Incidents :

  • This refers to a failure of the Local Area Network, including the hub or LAN server in the office building, that cannot be recovered within 48 hours (i.e. a new LAN server cannot be activated within 48 hours, which should be very rare).

Risk Assessments :

The risk is considered ‘low’ because the infrastructure concerned is readily recoverable and such problems are usually fixed within four hours.

Recovery Options :

Office LAN is down

  • Move the remote access system from the factory side to the office, and set up dial-up capability on five workstations to access MFGPRO through the remote access system in the office building.
  • Set up dial-up capability on five workstations in the office building for accessing MFGPRO.

All of these actions are required to be completed within one hour.

5.3) Hardware failure > 48 hours

Possible Incidents :

This refers to a failure of the server running the production version of MFGPRO, for example because of a disk crash, CPU error, or RAM failure.

Risk Assessments :

This is considered a ‘Medium’ risk because spare parts may not always be available within 48 hours and this type of failure is not uncommon.

Recovery Options :

IT will set up the secondary processing centre within 8 hours, providing a maximum of 24 user connections in total for users in all buildings in HK and China.

5.4) WAN failure > 48 hours

Possible Incidents :

This refers to a WAN outage at the HK office caused either by hardware or by carrier problems, for example a router failure or a fault in the physical line connection.

Risk Assessments :

The risk is rated ‘high’ because recovery depends heavily on external suppliers and, particularly in China, telecommunications providers are less responsive than in Hong Kong or elsewhere.

Recovery Options :

IT will set up dial-up capability on pre-defined workstations (as follows) in the affected locations to access the central MFGPRO server in Hong Kong.

Dial-up capability should be available within one hour, and setting it up should begin as soon as the WAN fails.

5.5)  MFGPRO and/or Progress and/or Operating system corruption > 48 hours

Possible Incidents :

This refers to software corruption that cannot be recovered from the daily system and data backups.

Risk Assessments :

The risk is ‘low’ because the chance of recovering from backup is high.

Recovery Options :

Manual procedures must be followed, and transactions accumulated for entry into the system once it returns to normal.

6.        Secondary Processing Centre – Priority and Decision Making Process

When any of the conditions described in section 5 arises, the disaster recovery team will be formed and will take action accordingly. The team will be dissolved when operations return to normal. Depending on the situation, a secondary processing centre will be set up to give a limited number of users in the organization access to the system.

During disaster recovery, Order Desk and Store/Shipping will have priority access to the system. Other functions will be included if the facility allows.

7.        Command Structure

The Disaster Recovery plan is organized around a hierarchy of teams rather than individuals. It has two levels.

Disaster Recovery Management Team :

Disaster Recovery Team:

Disaster Recovery Manager    :

Operation Representatives :

IT Representatives  :

8.         Roles & Responsibilities

8.1) Disaster Recovery Management Team

  • Act as the owner of the recovery operation.
  • Inform the organization of the disaster and the arrangement of the recovery actions.
  • Determine who receives priority in accessing the system.
  • Receive regular updates from the Disaster Recovery Manager and make decisions on resource allocation and any additional funding required.

8.2)    Disaster Recovery Manager

  • Act as the focal point in the organization and co-ordinate with the various departments and functions.
  • Manage the team's recovery operation to ensure a rapid, smooth and effective response.
  • Hold a daily control meeting with all sub-teams to review issues and assign action items.
  • Report daily to the Disaster Recovery Management Team on the key issues affecting business continuity.

8.3)    Operation Representatives

  • Co-ordinate with each user department to ensure an orderly switch to ‘disaster recovery’ mode.
  • Work with the IT team to carry out simple tests of the backup server set-up.
  • Co-ordinate with each user department to capture backlog transactions and perform tests if necessary.
  • Work with each user department to prepare a document outlining how business transactions are handled in ‘disaster recovery’ mode.

8.4)    IT Representatives

Technical Team

  • Ensure hardware (server & PC) and network recovery
  • Ensure operating system recovery
  • Provide technical support to the Disaster Recovery team
  • Ensure recovery of the system and database
  • Provide application support to the Disaster Recovery team

Application Team

 

9.        Secondary Processing Centre Set-up Procedures

  • When condition 5.3 described in section 5 ‘Triggering Events’ occurs, a secondary processing centre should be set up.
  • Within one hour, the Disaster Recovery Manager will call in each sub-team and hold the first working meeting with the sub-team leaders to determine the following:
      1) Set up a control room to manage the recovery operation and a help desk to answer user inquiries.
      2) Assess the IT environment, including the backup server, network and client PCs, to determine the location of the secondary processing centre.
      3) Confirm that the latest system and data backups, including the UNIX OS, are available.
      4) Assign the users to be connected to the backup server.
      5) Draw up a rough task plan and a tentative completion schedule.
      6) Set the daily meeting schedule, preferably at 5:00pm.
  • The IT Technical team should perform the following within 4 hours:
      1) Run diagnostics on the backup server to ensure there are no hardware errors.
      2) Remove any existing systems from the backup server.
      3) Re-install the same operating system as on the production server (if necessary) and test it.
      4) Set up the network server again if necessary, using an existing Dell P-166 PC with 64MB RAM and a 4.3GB hard disk.
      5) Re-configure the client PCs to connect to the backup server and test them.
      6) Set up the clients' printing connections again if necessary.
      7) Ensure the WAN connection is working.
      8) Inform the Disaster Recovery Manager on completing step 3 and step 6.
      9) Assign a dedicated person to the help desk to answer questions and provide support on the use of the backup facilities.
  • The IT Application team should start the following after the Technical team has run the diagnostics, and complete them within 4 hours:
      1) Restore the latest system and data backup to the backup server.
      2) Perform a preliminary test to ensure that:
         – access works, and
         – the database is complete (compare the records before and after the restore).
      3) Work with the Operation Representatives to quickly check the data before it is released to operations.
      4) On completion of the above, inform the Disaster Recovery Manager.
      5) Assign a dedicated person to the help desk to answer questions and provide support on the use of the backup facilities.
  • The Operation Representative contacts the head of each department about the emergency workstation arrangements for accessing the backup server. When notified by the Disaster Recovery Manager that the backup server is in operation, the Operation Representative of each department arranges the following rough test:
      1) Check that the last three transactions are present.
  • The Operation Representative informs the Disaster Recovery Manager whether the users' rough test was successful. If no transactions have been lost, the backup server is released to users through the User Representative.

The Disaster Recovery Manager then formally announces that the backup server is in operation.

10.     Appendix A – MFGPRO Backup Procedures

10.1 Daily Backup

  • Data is backed up every day except Saturday (Sunday included).
  • The backup job is scheduled for midnight (00:00) using the UNIX crontab scheduler.
  • Two sets of six backup tapes are used in rotation. Each set is used for one week, with one tape per backup day, so 12 tapes rotate in total.
  • The backup schedule is as follows:

      Day         Backup
      Monday      Required
      Tuesday     Required
      Wednesday   Required
      Thursday    Required
      Friday      Required
      Saturday    Not required
      Sunday      Required

  • The Sunday backup will be stored off-site at the Office Building computer room in a fire-proof safe.
  • New tapes should be used at the beginning of each year to minimize potential data loss due to media failure.
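The daily rotation above can be sketched as a small shell helper that a backup script might call to decide which tape to mount, with the matching crontab entry shown as a comment. This is a minimal sketch under stated assumptions: the tape labels (SET-A-MON and so on), the script path in the crontab comment, and the choice of alternating sets on even and odd ISO week numbers are hypothetical and are not taken from this plan.

```shell
#!/bin/sh
# Sketch only. A crontab entry matching the schedule above (00:00 every day
# except Saturday; day-of-week 0 = Sunday ... 5 = Friday) might look like:
#   0 0 * * 0-5 /usr/local/bin/mfgpro_daily_backup.sh   # hypothetical path
#
# tape_label WEEKDAY WEEK
#   WEEKDAY: ISO weekday as printed by `date +%u` (1 = Monday ... 7 = Sunday)
#   WEEK:    ISO week number as printed by `date +%V` (01 ... 53)
# Prints the hypothetical label of the tape to use, or NONE on Saturday.
tape_label() {
    weekday=$1
    week=${2#0}                      # strip a leading zero so "08" is not read as octal
    if [ "$weekday" -eq 6 ]; then
        echo NONE                    # Saturday: no backup required
        return 0
    fi
    if [ $(( week % 2 )) -eq 0 ]; then
        set_name=A                   # even ISO weeks use set A (assumption)
    else
        set_name=B                   # odd ISO weeks use set B (assumption)
    fi
    case $weekday in
        1) day=MON ;; 2) day=TUE ;; 3) day=WED ;;
        4) day=THU ;; 5) day=FRI ;; 7) day=SUN ;;
        *) echo "invalid weekday: $weekday" >&2; return 1 ;;
    esac
    echo "SET-$set_name-$day"
}
```

A backup script could call `tape_label "$(date +%u)" "$(date +%V)"` on systems whose `date` supports `%V`. ISO weeks start on Monday, so the Sunday tape falls in the same set as the preceding Monday to Friday, matching the schedule table. One caveat of this simple parity rule: in a year with ISO week 53, weeks 53 and 1 are both odd, so the same set would be used two weeks running.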

10.2) E450 O/S system backup

  • An O/S system backup should be taken every two months by the System Administrator. Two generations should be kept.
  • Tapes should be replaced with new ones after two years of use.

10.3) Month End backup

  • A copy of the data is taken before the accounting month-end process starts. Month-end backups are kept for 3 years, so 36 tapes are used in rotation.
  • The month-end and year-end backups are executed according to the Calendar, on the Friday of the following week codes:

      Week Code   Week Code   Week Code
      WW?04       WW?08       WW?13
      WW?17       WW?21       WW?26
      WW?30       WW?34       WW?39
      WW?43       WW?47       WW?52 (or WW?53 if WW?53 exists)

  • Replace tapes with new ones as necessary.
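The month-end week codes above follow a 4-4-5 week pattern within each quarter. A backup script could test whether the current week is a month-end week, and pick one of the 36 month-end tapes by month, with helpers like the ones below. The function names, the `HAS_WEEK_53` argument, and the slot numbering are hypothetical; only the week list and the 36-tape count come from this section.

```shell
#!/bin/sh
# is_month_end_week WEEK HAS_WEEK_53
#   WEEK:        calendar week number (e.g. 04 for WW?04)
#   HAS_WEEK_53: 1 if the current calendar year has a week 53, else 0
# Returns 0 (true) if WEEK is one of the month-end week codes in section 10.3.
is_month_end_week() {
    week=${1#0}                      # "04" -> "4" to avoid octal parsing
    case $week in
        4|8|13|17|21|26|30|34|39|43|47) return 0 ;;
        52) [ "$2" -eq 0 ] ;;        # week 52 closes the year only when there is no week 53
        53) [ "$2" -eq 1 ] ;;
        *) return 1 ;;
    esac
}

# month_end_slot YEAR MONTH
# Prints a tape slot 1..36; each slot is reused after 36 months (3 years).
month_end_slot() {
    month=${2#0}                     # "08" -> "8" to avoid octal parsing
    echo $(( ($1 * 12 + month - 1) % 36 + 1 ))
}
```

Numbering slots by year and month means a tape is overwritten only by the backup taken exactly three years later, which is the retention period stated above.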

10.4) Year End Backup

  • Take a copy of the data before the accounting year-end process starts.
  • Year-end backup tapes are kept for 7 years, so 7 tapes are used in rotation.
  • The year-end backup is taken according to the Calendar, i.e. in WW?52, or in WW?53 if it exists.
  • Replace tapes with new ones as necessary.
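The 7-year, 7-tape rotation can be made concrete with the arithmetic below: writing year Y to slot `Y mod 7 + 1` means the slot is next overwritten by year Y + 7, so the seven most recent year-end backups always survive. Both function names are hypothetical helpers for illustration.

```shell
#!/bin/sh
# year_end_slot YEAR
# Prints a tape slot 1..7; the slot used for YEAR is reused for YEAR + 7.
year_end_slot() {
    echo $(( $1 % 7 + 1 ))
}

# retained_years YEAR
# Prints the years whose year-end tapes should still exist after the
# year-end backup for YEAR has been taken (YEAR - 6 through YEAR).
retained_years() {
    y=$(( $1 - 6 ))
    while [ "$y" -le "$1" ]; do
        printf '%s\n' "$y"
        y=$(( y + 1 ))
    done
}
```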

10.5)    Backup Tape Storage

  • The previous set of daily backup tapes is stored in a fire-proof safe.
  • The current set of daily backup tapes is stored in the I.T. room.
  • All monthly and annual backup tapes are also stored in the fire-proof safe.

10.6)    Backup Log

Backup details should be recorded promptly on the relevant log sheet below. The Assistant Manager, Technical should review the log sheets weekly and check the latest system and data backup monthly to ensure the backup job is working properly.

  • Daily backup log sheet
  • Monthly backup log sheet
  • Yearly backup log sheet

10.7)   Backup Hardware Repair/Maintenance record

  • Repair and maintenance details should be recorded in a log book for future reference. The Assistant Manager, Technical should review this log monthly and follow up any outstanding items.

10.8) Archive Mfg/Pro History

  • Before the MFGPRO history is archived, the relevant owners/users should be consulted and their agreement obtained by email or on a paper form. One day before the archive date, the relevant departments should be reminded of the action by email.

The transaction types archived regularly are:

  • Work Order Delete/Archive
  • Operation History Delete/Archive

10.9) Re-load Testing

  • Once a month, the Assistant Manager, Technical, or a delegate loads the archived data onto the test machine using the following command:

tar xvf /dev/rmt/0m /data/vol1/*.* /data/vol2/*.*

  • Have the Assistant Manager, Systems (MFGPRO) review the restored data for completeness.
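The completeness review could be supported by comparing the archive's table of contents with what is now on disk. The sketch below assumes the archive was written with `tar`, as in the commands in this appendix, and that it is run from the directory the members were extracted relative to (`/` when the archive holds absolute paths such as `/data/vol1/...`). The function name is hypothetical, and it only checks that files exist; it does not replace the review of the data itself.

```shell
#!/bin/sh
# check_restore ARCHIVE
#   ARCHIVE: tape device or tar file, e.g. /dev/rmt/0m
# Lists every member of ARCHIVE and reports any that are missing on disk,
# resolving relative member names against the current directory.
# Prints a summary line; returns 1 if anything is missing or the listing is empty.
check_restore() {
    tar tf "$1" | {
        total=0
        missing=0
        while IFS= read -r member; do
            total=$(( total + 1 ))
            if [ ! -e "$member" ]; then
                echo "MISSING: $member"
                missing=$(( missing + 1 ))
            fi
        done
        echo "checked $total members, $missing missing"
        # An empty listing (e.g. an unreadable tape) is treated as a failure.
        [ "$total" -gt 0 ] && [ "$missing" -eq 0 ]
    }
}
```

Usage on the test machine might be `cd / && check_restore /dev/rmt/0m`. The counters live inside the braced group so that they are still in scope when the summary is printed, even though the group runs in a pipeline subshell.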

10.10) Retrieve History Data

  • To retrieve archived history data, the user should complete a Retrieve History Request form and submit it to the Assistant Manager, Technical for processing.

11.    Appendix B – MFGPRO Recovery Procedures

1)    Checking the Backup Machine Environment Setup

The backup machine is normally assigned to MFG/PRO customization program development. Before setting up the secondary MFG/PRO processing centre, check the following settings:

1.1)     The MFG/PRO programs are the same as those on the production machine.

1.2)     Activate the temporary MFG/PRO user ID

2)        Reload the MFG/PRO data using the command: tar xvf /dev/rmt/0m /data/vol1/*.* /data/vol2/*.*

3)        Start up the server using the command: sh /mfgproce/85InstallDir/start.Conv8

4)        Test the recovered MFG/PRO system as follows:

Log in using the test ID, then check the dates of the last inventory transaction and the last feedback transaction to confirm the last update date of the recovered system.

5)        Announce that the recovered MFG/PRO system is in production.

6)        When the production machine has been fixed and is back to normal, stop the recovered MFG/PRO system and take a backup as follows:

– Stop the server: sh /mfgproce/85InstallDir/stop.Conv8

– Back up the database: tar cvf /dev/rmt/0m /data/vol1/*.* /data/vol2/*.*

7)        Reload the database backed up on the backup machine onto the production machine using the command: tar xvf /dev/rmt/0m /data

8)        Start the MFG/PRO database and test the reloaded database.

9)        Announce that the production MFG/PRO system is back to normal, and re-capture the transactions received during the recovery period.
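Steps 2, 3 and 6 are plain shell commands, so they could be wrapped in a script that logs a timestamp for each step and stops at the first failure, giving the Disaster Recovery Manager a record for the daily report. This is a hypothetical sketch: the `TAPE` and `DRY_RUN` variables and the function names are assumptions. It extracts the whole tape rather than naming individual files, assumes the database files are under /data/vol1 and /data/vol2 as in the re-load test command in 10.9, and defaults to a dry run that only prints the commands.

```shell
#!/bin/sh
# Hypothetical runbook wrapper for Appendix B, steps 2, 3 and 6.
TAPE=${TAPE:-/dev/rmt/0m}        # tape device used throughout this appendix
DRY_RUN=${DRY_RUN:-1}            # 1 = print commands only; 0 = execute them
MFG_DIR=/mfgproce/85InstallDir   # install directory used by steps 3 and 6

# run_step DESCRIPTION COMMAND [ARGS...]
# Logs start and end times; returns non-zero if COMMAND fails.
run_step() {
    desc=$1
    shift
    echo "$(date '+%Y-%m-%d %H:%M:%S') START $desc"
    if [ "$DRY_RUN" = 1 ]; then
        echo "DRY-RUN: $*"
    elif ! "$@"; then
        echo "$(date '+%Y-%m-%d %H:%M:%S') FAILED $desc" >&2
        return 1
    fi
    echo "$(date '+%Y-%m-%d %H:%M:%S') DONE $desc"
}

# Steps 2 and 3: reload the data from tape, then start the server.
recover() {
    run_step "Reload MFG/PRO data from tape" tar xvf "$TAPE" &&
    run_step "Start MFG/PRO server" sh "$MFG_DIR/start.Conv8"
}

# Step 6: stop the recovered system, then back up the database files.
switch_back() {
    run_step "Stop recovered MFG/PRO server" sh "$MFG_DIR/stop.Conv8" &&
    run_step "Back up MFG/PRO database" tar cvf "$TAPE" /data/vol1/*.* /data/vol2/*.*
}
```

Running `recover` first with the default dry run lets the team confirm the exact commands before setting `DRY_RUN=0`; the `&&` chaining means the server is not started if the reload fails.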