We have been talking a lot about the
DataCenter, but how do we manage it? Every DataCenter has a physical infrastructure
behind it, which, by definition, includes:
i. Power
ii. Cooling
iii. Racks and physical structure
iv. Cabling
v. Physical security and fire protection
vi. Management systems
vii. Services
To manage all these layers, the ITIL
framework has been in use, as it covers all of these aspects. Now, ITIL (Information Technology Infrastructure
Library) is not a standard but a framework,
and you implement the pieces which are relevant to your business. ITIL consists of
two areas: one is the Service Support process, where the focus is on the end user, and the other is the Service Delivery
process, where the focus is on business owners. They include the following management
processes:
- Service Support process - Incident, Problem, Change, Release, Configuration Management
- Service Delivery process - Service Level, IT Service Continuity, IT Financial, Capacity, Availability Management
Now, I won't be covering
ITIL in depth; that's a different discussion altogether. While
managing a DataCenter, we will look at only the following processes: Incident, Availability, Capacity
and Change Management. All processes are inter-related via process flows.
A brief description of each
process with respect to the DataCenter is given below:
1. Incident Management: bringing operations
back to normal with minimum impact on the business
a. Why: Monitor events and alarms of the physical
infra such as hardware, network, power etc.
b. Challenges: identifying the location and owner
of the incident, identifying its severity, and executing an action plan to fix the problem
c. Solution:
i. A system-level view of inter-related components
will give an overview of the location along with the impact of individual components.
ii. It's best practice to mention system owners in the
inventory/asset management tracker. E.g. a blade enclosure's OA gives you the option
to record its rack details along with Point of Contact details. UID lights are
also helpful in identifying the equipment remotely with the help of local hands
and feet support. Note that responsibility is often shared so as to avoid a
single point of failure. An ARCI (Accountable, Responsible, Consulted, Informed)
matrix should be defined and available as a reference in case of any incident. It's
also helpful when you need approval to apply a break-fix solution or to drive a
change.
iii. It's a good practice to use system-defined
alerts (High, Medium, Normal) along with the business SLA as reference points
while defining the prioritization of an incident.
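
To make points ii and iii a bit more concrete, here is a minimal Python sketch of how an incident could be triaged from an asset tracker. Everything in it is an assumption for illustration only: the asset ID, rack string, contact names, the SLA tiers (gold, silver, bronze) and the priority matrix itself are hypothetical, and your own tracker and SLA definitions will look different.

```python
from dataclasses import dataclass


@dataclass
class Asset:
    """One row of a hypothetical inventory/asset tracker."""
    asset_id: str
    rack: str          # physical location, e.g. site/row/rack
    responsible: str   # "R" from the ARCI matrix
    accountable: str   # "A" from the ARCI matrix
    sla_tier: str      # assumed business SLA tier: gold, silver or bronze


# Hypothetical inventory keyed by asset ID.
INVENTORY = {
    "enc-01": Asset("enc-01", "DC1/Row3/Rack12", "ops-team", "dc-manager", "gold"),
    "sw-07": Asset("sw-07", "DC1/Row1/Rack02", "net-team", "net-manager", "bronze"),
}

# Assumed priority matrix: (system alert level, SLA tier) -> incident priority.
PRIORITY_MATRIX = {
    ("High", "gold"): "P1", ("High", "silver"): "P1", ("High", "bronze"): "P2",
    ("Medium", "gold"): "P2", ("Medium", "silver"): "P2", ("Medium", "bronze"): "P3",
    ("Normal", "gold"): "P3", ("Normal", "silver"): "P3", ("Normal", "bronze"): "P4",
}


def triage(asset_id: str, alert_level: str) -> dict:
    """Combine the alert level with the asset's SLA tier, location and owners."""
    asset = INVENTORY[asset_id]
    return {
        "priority": PRIORITY_MATRIX[(alert_level, asset.sla_tier)],
        "location": asset.rack,
        "responsible": asset.responsible,
        "accountable": asset.accountable,
    }


if __name__ == "__main__":
    print(triage("enc-01", "High"))
    print(triage("sw-07", "Medium"))
```

The idea is simply that location, owner and severity (the three challenges listed under b) can each be answered by a lookup once the tracker and the ARCI contacts are kept up to date.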