Services are most vulnerable during change. Continuity of service needs to be ensured during change, and large portions of several ISO and BSI standards are focused on proper management of change.
However well controlled a change is, an incident can still occur during it and cause a service failure. We will discuss the IT service change planning process from the point of view of preventing unplanned downtime when problems arise.
Ensure continuity of IT Service planning process
Most major changes require some sort of planned downtime for an IT service. This downtime may be required by the change itself, or may be preventive, to reduce pressure on the parties involved during the change.
But extending this downtime is very undesirable, and the effects of such extensions can range from customer dissatisfaction to regulatory or contractual penalties. In order to ensure continuity of service as planned, the following process can be applied:
- Identify the time window available to apply the change – this is the time period of the ‘planned downtime’, or period during which the change will impact a minimal number of customers. Breaking this time window will put the service in an undesirable failed state for customers.
- Have a very detailed plan of the change – Each step of the change needs to be described with actual actions, responsible persons and tasks.
- Time every step and confirm – calculate the time required for each step of the plan. Lean towards pessimistic timing when unsure of the actual time required.
- Assess risks – Assess risks at each step, and identify mitigating measures. After that, identify which steps have remained critical and can cause significant problems and delays.
- Define corrective measures – for each critical risk, define corrective measures and steps, and calculate the time required for each corrective measure.
- Prepare a back-out plan – prepare a very detailed plan that will be applied if an incident or problem during the change prevents it from being applied successfully within the defined time window of planned downtime. The back-out plan must include any activities that need to be performed by business stakeholders (for example, data entry after restoring an older database copy). We call this the ‘if all else fails we need to keep working’ plan.
- Time the back-out plan – Calculate the time needed to implement the back-out activities
- Check whether your plan and back-out plan fit within the time window of planned downtime – With all the elements prepared and timed, add up the timing. The time window must accommodate both the plan of change and the back-out plan. The total timing window can be viewed in the diagram below
- Calculate the point of mandatory back-out – In any change, should something go wrong, you can attempt to fix the issues as long as you still have enough time to back out before the time window of planned downtime expires. In other words, the point of mandatory back-out is the latest moment at which you must start the back-out plan in order to be up and running again before the window expires.
- Start implementing
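The timing arithmetic behind the last few steps can be sketched in a few lines of Python. This is only an illustration: the function name `check_change_window`, its parameters, and all the dates and durations below are hypothetical and not taken from any standard or tool. It assumes the worst case, in which the change fails at its very last step, so the full plan time plus the full back-out time must fit in the window.

```python
from datetime import datetime, timedelta


def check_change_window(window_start, window_end, step_minutes, backout_minutes):
    """Return (fits, mandatory_backout_point, slack) for a planned change.

    step_minutes    -- pessimistic duration of each step of the change plan
    backout_minutes -- total duration of the back-out plan
    All names and values here are illustrative assumptions.
    """
    window = window_end - window_start
    plan = timedelta(minutes=sum(step_minutes))
    backout = timedelta(minutes=backout_minutes)

    # The window must hold the whole change plan plus the whole back-out plan.
    fits = plan + backout <= window

    # Latest moment at which the back-out must start so that the service
    # is running again when the window closes.
    mandatory_backout_point = window_end - backout

    # Time left over for attempting fixes; negative if the plan does not fit.
    slack = window - plan - backout
    return fits, mandatory_backout_point, slack


# Hypothetical example: a four-hour window, three steps, a one-hour back-out.
fits, backout_point, slack = check_change_window(
    window_start=datetime(2024, 1, 13, 22, 0),
    window_end=datetime(2024, 1, 14, 2, 0),
    step_minutes=[30, 45, 60],
    backout_minutes=60,
)
print(fits, backout_point, slack)
```

In this made-up example the plan fits, the back-out must start by 01:00 at the latest, and 45 minutes remain for attempting fixes. If `fits` comes out false, the options are the ones the process already implies: negotiate a longer window, shorten the plan, or shorten the back-out.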
Cross-posted from ClearMorning