We have been talking a lot about the DataCenter, but how do we manage it? Every DataCenter involves a physical infrastructure, which by definition includes:
iii. Racks and physical structure
v. Physical security and fire protection
vi. Management systems
To manage all these layers, the ITIL framework has been in use, covering all of their aspects. ITIL (Information Technology Infrastructure Library) is not a standard but a framework, so you implement the pieces that are relevant to your business. ITIL consists of two areas: the Service Support processes, which focus on the end user, and the Service Delivery processes, which focus on business owners. They include the following management processes:
- Service Support processes - Incident, Problem, Change, Release and Configuration Management
- Service Delivery processes - Service Level, IT Service Continuity, IT Financial, Capacity and Availability Management
Now, I won't be covering ITIL in depth; that's a different discussion altogether. For managing a DataCenter, this post focuses only on the following processes: Incident, Availability, Capacity and Change Management. All of these processes are interrelated through process flows.
A brief description of each process with respect to the DataCenter follows:
1. Incident Management – restoring normal business operations with minimum impact on the business
a. Why: to monitor event alarms from the physical infrastructure, such as hardware, network and power
b. Challenges: identifying the location and owner of the incident, assessing its severity, and executing an action plan to fix the problem
i. A system-level view of interrelated components gives an overview of where each component is located, along with the impact of each individual component.
ii. It is best practice to record system owners in the inventory/asset management tracker. For example, the Onboard Administrator (OA) of a blade enclosure lets you record its rack details along with point-of-contact details. UID (unit identification) lights also help remote staff identify the equipment with the help of local hands-and-feet support. Note that responsibility is often shared so as to avoid a single point of failure. An ARCI (Accountable, Responsible, Consulted, Informed) matrix should be defined and available as a reference in case of any incident. It is also helpful when you need approval to apply a break-fix solution or to drive a change.
iii. It is good practice to use the system-defined alert levels (High, Medium, Normal) together with the business SLA as reference points when prioritizing an incident.
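To make the three practices above a little more concrete, here is a minimal Python sketch, assuming a simple in-memory inventory. It shows how one set of asset records can answer the triage questions: where the failed equipment sits, who owns it, which other components depend on it (the system-level view), who holds which ARCI role, and what priority to assign by combining the alert level with an SLA tier. All asset IDs, rack names, team names, the "critical"/"standard" SLA tiers and the P1 to P4 priority matrix are hypothetical illustrations, not part of ITIL or of any vendor tool.

```python
from dataclasses import dataclass, field

# Hypothetical inventory record: where an asset sits, who owns it,
# what it depends on, and the SLA tier of the service it supports.
@dataclass
class Asset:
    asset_id: str
    rack: str
    owner: str
    sla_tier: str  # "critical" or "standard" (illustrative tiers only)
    depends_on: list = field(default_factory=list)

# Illustrative inventory; every ID, rack and team name is made up.
INVENTORY = {
    "pdu-a1": Asset("pdu-a1", "R01", "facilities-team", "standard"),
    "enc-01": Asset("enc-01", "R01", "server-team", "standard", ["pdu-a1"]),
    "blade-07": Asset("blade-07", "R01", "server-team", "critical", ["enc-01"]),
    "sw-core-1": Asset("sw-core-1", "R02", "network-team", "standard", ["pdu-a1"]),
}

# Hypothetical ARCI matrix: activity -> role letter -> party.
# "owner" is a placeholder that resolves to the asset's recorded owner.
ARCI_MATRIX = {
    "fix incident": {"A": "dc-manager", "R": "owner", "C": "vendor-support", "I": "service-desk"},
    "approve break-fix": {"A": "change-manager", "R": "dc-manager", "C": "owner", "I": "service-desk"},
}

# Hypothetical priority matrix: (system-defined alert level, SLA tier) -> priority.
PRIORITY_MATRIX = {
    ("High", "critical"): "P1", ("High", "standard"): "P2",
    ("Medium", "critical"): "P2", ("Medium", "standard"): "P3",
    ("Normal", "critical"): "P3", ("Normal", "standard"): "P4",
}


def impacted_assets(failed_id, inventory):
    """Return IDs of assets that depend on failed_id, directly or transitively."""
    impacted = set()
    frontier = [failed_id]
    while frontier:
        current = frontier.pop()
        for asset in inventory.values():
            if current in asset.depends_on and asset.asset_id not in impacted:
                impacted.add(asset.asset_id)
                frontier.append(asset.asset_id)
    return impacted


def arci_party(activity, role, asset):
    """Look up who holds an ARCI role for an activity, resolving 'owner'."""
    party = ARCI_MATRIX[activity][role]
    return asset.owner if party == "owner" else party


def triage(failed_id, alert_level, inventory=INVENTORY):
    """Locate the asset, find its owner and impact, and assign a priority."""
    asset = inventory[failed_id]
    impacted = impacted_assets(failed_id, inventory)
    affected = impacted | {failed_id}
    # The incident inherits the most demanding SLA tier among affected assets.
    tier = "critical" if any(inventory[a].sla_tier == "critical" for a in affected) else "standard"
    return {
        "rack": asset.rack,
        "responsible": arci_party("fix incident", "R", asset),
        "accountable": arci_party("fix incident", "A", asset),
        "impacted": sorted(impacted),
        "priority": PRIORITY_MATRIX[(alert_level, tier)],
    }


if __name__ == "__main__":
    # A Medium alert on the PDU still becomes P2, because a critical blade depends on it.
    print(triage("pdu-a1", "Medium"))
```

The design choice worth noting is that priority is not taken from the alert level alone: a "Medium" alert on a power distribution unit is escalated because the dependency walk finds a critical-tier blade downstream. In practice this data would come from the asset tracker or monitoring system rather than being hard-coded.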