Saturday 29 November 2014

Managing Your DataCenter Physical Infrastructure - Part 3

4.       Change Management – the process of making changes to the environment using methods and procedures that carry the lowest possible impact on the business. Every change requires preparation, planning, simulation of the expected results and verification. Because the business is affected, every change also requires approval from stakeholders.
a. Why: To minimize unplanned downtime
b. Challenges: executing moves/additions/changes without impacting availability or the business; maintaining compatibility within the IT infrastructure ecosystem, including spares, while implementing upgrades or changes
c. Solution:
    i. A plan that explains the workflow for the changes to be performed
    ii. A proper change window, with business-approved downtime scheduled to avoid business impact, such as off-hours or weekends. Performing a change without downtime requires redundant infrastructure or resources, which adds to both CAPEX and OPEX
    iii. The HCL (Hardware Compatibility List) should be consulted along with the interoperability matrix to verify the compatibility of components with the rest of the environment after the upgrade, e.g. the firmware level of a Brocade or Cisco SAN switch should be checked against the host HBA card and the storage controller card before performing a firmware upgrade (see the compatibility-check sketch after this list). A separate article will give an overview of change tech plans in the datacenter.
    iv. It is a best practice to keep firmware levels consistent across the infrastructure when upgrading.
    v. Some examples of changes required in a datacenter: relocating a server, patch-panel re-wiring, firmware or software upgrades, addition of resources or components, replacement of faulty components, etc. Configuration changes are also covered under change management.
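To illustrate the kind of pre-change compatibility check described in point (iii), here is a minimal sketch in Python. The component names, firmware versions and the matrix itself are hypothetical; a real check would be driven by the vendor's published HCL/interoperability matrix.

# Minimal sketch of a pre-change compatibility check.
# The matrix and versions below are hypothetical, not vendor data; in practice
# they would come from the vendor's HCL / interoperability matrix.

COMPATIBILITY_MATRIX = {
    # (component, firmware) -> set of compatible peer (component, firmware) pairs
    ("san_switch", "8.2.1"): {("host_hba", "4.1"), ("storage_controller", "6.5")},
    ("san_switch", "9.0.0"): {("host_hba", "4.2"), ("storage_controller", "7.0")},
}

def change_is_safe(target, peers):
    """Return (ok, issues): ok is True only if every peer is listed as
    compatible with the proposed target component/firmware."""
    compatible = COMPATIBILITY_MATRIX.get(target, set())
    issues = [p for p in peers if p not in compatible]
    return (not issues, issues)

if __name__ == "__main__":
    proposed = ("san_switch", "9.0.0")              # firmware we plan to move to
    current_peers = [("host_hba", "4.1"),           # what is deployed today
                     ("storage_controller", "7.0")]
    ok, issues = change_is_safe(proposed, current_peers)
    if ok:
        print("Change can proceed to approval.")
    else:
        print("Hold the change; incompatible peers:", issues)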

Putting the Pieces Together: Most business organizations take the following initiatives as their strategy to meet the above:
·         Implementation of an Incident Management process
·         Definition and measurement of availability targets
·         Monitoring and planning of future road-maps for a long-term capacity plan
·         Implementation of a Change Management process

Special Notes:
·         It may be difficult to find a single tool to manage all layers of equipment in your datacenter, i.e. rack space, power/cooling, server, storage, network and application. Datacenters are usually hybrid and built from multi-vendor solutions, and integration or coordination between these vendors' solutions may be difficult.
·         A 20% improvement during the design phase can remove 80% of the problems that would otherwise appear later; this applies not only to the datacenter itself but also to the processes, methods and procedures around it.
·         As observed, there may be an overlap of responsibility between departments such as facilities and IT, which can cause conflicts. Management decisions are required to define roles in such scenarios. It is technology, process and people together that keep the business up and running.
·         Even though a physical infrastructure or Enterprise Management System (EMS) sounds similar to a Building Management System (BMS) because of the components being managed or monitored (space/power/cooling), the focus of the two is different: the EMS focuses on business availability, while the BMS focuses on comfort and safety. Integrating EMS and BMS can be very costly.

Any more questions? Please write back or comment here. There is more to share.

I request you to join my groups on Facebook & LinkedIn, named "DataCenterPro", to get regular updates. I am also available on Twitter as @_anubhavjain. Shortly I will be launching my own YouTube channel with free training videos on different technologies as well.

Happy Learning!!

Managing Your DataCenter Physical Infrastructure - Part 2

2.       Availability Management – identifying availability and reliability requirements, comparing them against actual performance and, if required, introducing improvements to meet and sustain the quality of service
a. Why: Once availability is defined, the SLA should be monitored to analyze potential downtime caused by the impact of any individual component or of the entire system
b. Challenges: metrics reporting, raising alarms, planned downtime and continuous infrastructure improvement
c. Solution:
    i. A tool that reports uptime/downtime of the infrastructure or service, identifying the cause of downtime and providing the time-stamp, the duration and the time it took to recover. It is a best practice to configure tools to provide proactive warnings
    ii. A tool that does not require special training or an expert to read, e.g. UPS or battery health, temperature or humidity, disk or power status, etc.
    iii. Maintenance work can otherwise show up as false unplanned-downtime alerts, so use maintenance mode on system units, e.g. VMware ESXi hosts, storage array controllers, UPS units or blade servers. A common mistake is failing to roll back all the changes made to put a system into maintenance, so this needs to be done with caution. It is suggested to use a tool that alerts if any condition is left uncorrected after maintenance.
    iv. To make improvements, the very first step is to identify the need for them, along with the risk involved in making the change. FMEA techniques may not be known to everyone, so it is suggested to use a tool that performs risk assessment (a minimal risk-scoring sketch follows this list). You can even use health reports as reference points for improvement. Corrective measures to mitigate these risks become your improvement plan. Note: improvement should be continuous. Some examples: power consumption, disk-full status, performance reports, cluster loads, etc.
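For those unfamiliar with FMEA, the core idea fits in a few lines: each failure mode is scored for severity, likelihood of occurrence and difficulty of detection (commonly on a 1-10 scale), and the product of the three, the Risk Priority Number (RPN), ranks where improvement effort should go first. A minimal sketch in Python, with made-up failure modes and scores:

# Minimal FMEA-style risk-ranking sketch.
# The failure modes and scores below are illustrative, not measured data.

failure_modes = [
    # (description, severity 1-10, occurrence 1-10, detection 1-10)
    # detection: 10 = very unlikely to be caught before it causes impact
    ("UPS battery degradation",        8, 4, 3),
    ("CRAC unit fan failure",          7, 3, 2),
    ("Datastore running out of space", 6, 6, 2),
    ("Single-path SAN cabling",        9, 2, 5),
]

def rpn(severity, occurrence, detection):
    """Risk Priority Number: higher means address sooner."""
    return severity * occurrence * detection

ranked = sorted(failure_modes, key=lambda fm: rpn(*fm[1:]), reverse=True)
for desc, sev, occ, det in ranked:
    print(f"RPN {rpn(sev, occ, det):3d}  {desc}")

The corrective measures for the highest-RPN items then become the improvement plan described above.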

3.       Capacity Management – providing IT resources as and when required, at the right cost.
a. Why: Current and future requirements keep changing and need to be monitored and addressed
b. Challenges: asset management with ongoing changes (monitoring, recording, tracking), providing capacity as and when required, optimizing capacity for better ROI and easier management, incremental scalability
c. Risk involved: unplanned downtime if resources are over-utilized, e.g. power, CPU/RAM on ESXi hosts, network bandwidth, etc.
d. Solution:
    i. A tool that performs centralized monitoring of current usage and alerts on potential resource overload, e.g. power and cooling monitoring systems for the datacenter from Emerson/APC/Schneider Electric, network bandwidth monitors, vCOps for VMware clusters, storage performance indicators, HP Insight Manager for HP blade enclosures or servers, Brocade fabric managers, etc.
    ii. Capacity requirements tend to be missed or not considered during implementation. Hence a tool is required for trend analysis, one that also alerts on threshold violations or overloads (a minimal trending sketch follows this list). This tool should be consulted before going ahead with future procurements or new deployments.
    iii. A poorly designed datacenter may require more resources (server, storage, network, space, power), i.e. high CAPEX, and will therefore cost more to operate (high OPEX). Requirements should be analyzed while designing the datacenter, and also before new deployments. It is good to apply Six Sigma DMADV techniques where possible.
    iv. Weekly/monthly/quarterly reviews of current capacity and usage trends will also forecast the scalability required in the future. Ideally, while designing a new datacenter, every component (server, storage, network, virtualization platform, space, power) is sized so that it can absorb about 30% incremental capacity (scalability) and remain operable for at least the next 3 years. Support contracts are considered in the same manner.
    v. For the datacenter itself, location, power (input, sockets), cooling, rack space and cabling are the major considerations for capacity management; for the network, ports, bandwidth, VLANs, IPs, etc.; for servers (physical or virtual), CPUs/cores, RAM, clusters, etc.; and for storage, the type of connectivity (FC, NFS, iSCSI, FCoE, DAS), the space required, IOPS, backup, and the recovery and redundancy options (RAID, snapshots, clones, replication).
    vi. Usually a lifecycle or capacity manager tracks the inventory and usage of assets, which is a best practice as well. It is also suggested to visualize the impact on capacity of every new deployment.

    vii. Note that TCO and ROI need to be considered and are deciding factors when it comes to business decisions.
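As a rough illustration of the trending idea in point (ii) above, the sketch below fits a straight line to a few monthly usage samples and estimates when usage would cross an alert threshold. The sample figures and the 80% threshold are made up for the example; a real tool would work from collected monitoring data.

# Minimal capacity-trending sketch: fit a straight line to monthly usage
# samples and estimate when usage crosses a threshold.
# The usage numbers and the 80% threshold are illustrative only.

monthly_used_tb = [10.0, 10.8, 11.5, 12.4, 13.1, 13.9]    # last 6 months
capacity_tb = 20.0
threshold_tb = 0.8 * capacity_tb                           # alert at 80% full

n = len(monthly_used_tb)
xs = list(range(n))
mean_x = sum(xs) / n
mean_y = sum(monthly_used_tb) / n
slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, monthly_used_tb)) \
        / sum((x - mean_x) ** 2 for x in xs)               # TB per month
intercept = mean_y - slope * mean_x

if slope <= 0:
    print("Usage is flat or shrinking; no threshold breach forecast.")
else:
    months_to_threshold = (threshold_tb - intercept) / slope - (n - 1)
    print(f"Growth ~{slope:.2f} TB/month; "
          f"about {months_to_threshold:.1f} months until {threshold_tb:.0f} TB (80% full).")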

Continue to Read..

Part 1: Managing Your DataCenter
Part 3: Managing Your DataCenter

Managing Your DataCenter Physical Infrastructure - Part 1

We have been talking a lot about the datacenter, but how do we manage it? Every datacenter has a physical infrastructure, which, by definition, includes:

i. Power
ii. Cooling
iii. Racks and physical structure
iv. Cabling
v. Physical security and fire protection
vi. Management systems
vii. Services


To manage all these layers, the ITIL framework has been in use, as it considers all of these aspects. Now, ITIL (Information Technology Infrastructure Library) is not a standard but a framework; you implement the pieces that are relevant to your business. ITIL consists of two aspects: the Service Support processes, where the focus is on the end user, and the Service Delivery processes, where the focus is on the business owners. They include the following management processes:

  1. Service Support processes - Incident, Problem, Change, Release and Configuration Management
  2. Service Delivery processes - Service Level, IT Service Continuity, IT Financial, Capacity and Availability Management


Now, I won't be covering an in-depth study of ITIL; that's a different discussion altogether. For managing a datacenter, only the following processes are relevant: Incident, Availability, Capacity and Change Management. All processes are inter-related via process flows.
A brief description of each process with respect to the datacenter is given below:

1.       Incident Management – bringing services back to normal with minimum impact on the business
a. Why: To monitor events and alarms of the physical infrastructure, such as hardware, network, power, etc.
b. Challenges: identifying the location and owner of the incident, identifying its severity, and executing an action plan to fix the problem
c. Solution:
    i. A system-level view of inter-related components gives an overview of the location along with the impact of individual components.
    ii. It is a best practice to record system owners in the inventory/asset management tracker. For example, a blade enclosure's OA gives you the option to record its rack details along with point-of-contact details. UID lights are also helpful for identifying equipment remotely with the help of local hands-and-feet support. Note that responsibility is often shared so as to avoid a single point of failure. An ARCI (Accountable, Responsible, Consulted, Informed) matrix should be defined and available as a reference in case of any incident. It is also helpful when you need approval to apply a break-fix solution or to drive a change.

    iii. It is good practice to use system-defined alert levels (High/Medium/Normal) along with the business SLA as reference points while defining the prioritization of an incident (a minimal prioritization sketch follows this list).
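As a rough illustration of point (iii), the sketch below combines the system-defined alert level with the SLA tier of the affected service to derive an incident priority. The tiers and the mapping are a made-up example, not a standard; each organization defines its own.

# Minimal incident-prioritization sketch: combine the system-defined alert
# level with the business SLA tier of the affected service.
# The tiers and the mapping below are illustrative, not a standard.

ALERT_LEVELS = {"high": 3, "medium": 2, "normal": 1}
SLA_TIERS = {"gold": 3, "silver": 2, "bronze": 1}      # business criticality

def incident_priority(alert_level, sla_tier):
    """Return P1 (most urgent) .. P4 from the alert level and SLA tier."""
    score = ALERT_LEVELS[alert_level] * SLA_TIERS[sla_tier]   # 1..9
    if score >= 6:
        return "P1"
    if score >= 4:
        return "P2"
    if score >= 2:
        return "P3"
    return "P4"

print(incident_priority("high", "gold"))      # P1: critical alert on a gold service
print(incident_priority("medium", "bronze"))  # P3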

Continue to Read..

Part 2: Managing Your DataCenter
Part 3: Managing Your DataCenter

What's Coming Your Way - Overview of "DataCenterPro" Contents

Many people are asking me what new things I have to offer in this blog, or what is in it for them. Well, there is plenty. I have already written many articles in my repository and am posting them at roughly 1-2 per day. Shortly I will be giving free sessions via classroom training or web/online meeting groups. I am also planning a "Meet-Up" group for this community. Soon I will be launching a YouTube channel with free training videos and recordings of the sessions I will be running. Please find below the list of articles I will be posting here, and do share your interests or the topics you would like to read about.

Social Presence: 
I request you to like my Facebook page to get automatic updates: https://www.facebook.com/DataCenterPro. I have also created a LinkedIn group called "DataCenterPro". You can also join this website and follow me on Twitter @_anubhavjain. Please join these groups and invite your friends, colleagues and mates with a similar interest in technology.

List of Articles (current list, more to be added):

Basics, Fundamentals, DataCenter, Power, Cooling, Rack, Cabling
·         An Overview of DataCenter Physical Infrastructure
·         Overview of Modular DataCenter Architectures
·         Basic 101 – Fundamentals about DataCenter Power
·         Basic 102 – DataCenter Power Distribution & Redundancy
·         Basic 101 – Fundamentals about DataCenter Cooling
·         Basic 102 – DataCenter Cooling Layouts
·         Basic 101 – Fundamentals about DataCenter Racks
·         Basic 101 – Fundamentals about DataCenter Cabling Strategies
·         Labeling Your DataCenter – Get Your Hands Dirty
·         Choosing the Right DataCenter for You
·         DataCenter Physical Infra Management (EMS) Overview
·         Understanding  preventive maintenance of DataCenter
·         Basics 101 – Overview of Fan Systems
·         Basics 101 – Introduction to HVAC Systems

DataCenter Standardization
·         Standardization in DataCenter
·         Basic 101 – Fundamentals of Physical Security – ISO 27001
·         Designing DataCenter for Telecom Network (TIA 942)
·         Overview of ISO 50001: Energy Efficiency in Datacenters
·         US Energy Codes and Standards

DataCenter Operations (Server Hardware, Virtualization, Storage SAN)
·         Basics 101: Meet Server Hardware
·         Basics 101: Concept of Storage Area Networks (SAN)
·         Basics 101: Introduction to Virtualization
·         Data Center Documentation: What do you think it should have?
·         Decommission Check-list Overview
·         Tech Plans for VMware for easy Change Management
·         PowerCLI scripts –Few Useful Ones
·         What should be your checklist while taking knowledge transitions?
·         What are your possible maintenance tasks while managing a DataCenter, storage, server and VMware platform?

Best Practices for DataCenter
·         Best Practices – VMware ESXi and vCenter Deployment
·         Best Practices – Blade server Deployment
·         Best Practices – New VM Deployment
·         Best Practices – Storage SAN Switch Deployment
·         Best Practices – Storage Array Deployment

Tier Down
·         HP 3PAR Storage – Tier Down
·         Blade Enclosures

Solution Design Considerations
·         DataCenter Project Management overview
·         Considering DataCenter Consolidation?
·         Considering DataCenter Split?
·         Considering New DataCenter Deployment?
·         Considering Capacity Increment in DataCenter?
·         DataCenter Asset Management: A different approach
·         Do More with Virtualization: Are you exploring it?
·         Data Placement Policy for DataCenter: Case Study
·         How Virtualization can help you save money during Procurement of Servers?
·         Do I need server consolidation?
·         How to do soft-savings with VMware?
·         How to optimize your virtual environment in VMware?
·         Changing Vendor - What are your Organizational aspects of Change?
·         Risk, Assumptions and Mitigations for New Deployments

DataCenter Tools
·         MS Office – What’s really required?
·         Handy Tools

Thursday 27 November 2014

Free Cloud OpenStack Certification - Does that RING a Bell :)

Well, yes!! That's true!! For those of you who don't know, a free certification is offered by Rackspace, one of the founders of the OpenStack cloud project. All it requires is for you to read 10 chapters. After each lesson you need to pass a 10-question test (and it allows you to correct yourself if you get something wrong). Once you have cleared all 10 tests, you take the final exam and hurray!

Just follow the link:
http://cloudu.rackspace.com/


You need to go through the CloudU curriculum for this.

Any more questions? Please write back or comment here. There is more to share.

I request you to join my groups on Facebook & LinkedIn, named "DataCenterPro", to get regular updates. I am also available on Twitter as @_anubhavjain. Shortly I will be launching my own YouTube channel with free training videos on different technologies as well.

Happy Learning!!

Now Available on Facebook

Please like the Page: https://www.facebook.com/datacenterpro

Also available on LinkedIn as the "DataCenterPro" group

Best Practices: Data Backup Strategy

While I was working as a pre-sales consultant with a backup software company, the most common questions I used to come across were "How should I back up my data?", "How frequently should I back up my data?" and "What strategy should I use to back up my data?"

Many companies these days define data backup policies with a retention period and the type of backup to be taken during a given period. Many backup strategies are user-defined, for example "Tower of Hanoi" or "Grandfather-Father-Son (GFS)", along with options such as a one-time full or an immediate incremental backup, which are usually manual or run on a defined schedule.
However, I feel the custom options are for those who really want to use the software as efficiently as possible and get complete ROI out of the application. There are many dependencies, or let's say a "checklist", to go through before defining a true backup plan. Space and the retention period play the main roles while defining the backup plan. While considering these parameters, we should also not forget what the software can do for us, such as its compression rate: around 55% is the maximum I have seen compared with other players in the market. Apart from these, we should consider whether we are looking for data availability or business continuity. We should also consider the RTO (Recovery Time Objective) for the backup: the more backups there are to restore from, the more time recovery will take, which is standard for any application. These days some backup/recovery products provide a speed of around 1.3 GB per minute.
To clarify, let's consider a scenario where I have 200 GB of data to be backed up into 2 TB of space across the SAN. As per company policy, I have a retention period of 6 months. I am assuming the data comes from a single server and we are doing an image backup, not a files-and-folders backup. Now, the moment I do 1 full backup per month, with 4 weekly differentials and daily incrementals, I don't think we can achieve our target of keeping backups for 6 months, as differential backups are usually about half the size of the full backup. This doesn't mean the GFS option is bad; it's just not the right choice in every situation. Backups are critical and part of every datacenter's compliance. They should be well defined, tested and implemented, considering the needs of the user. Backup software is just an application that can work wonders, but how to make it work wonders is up to us.
Coming back to the scenario, if the user had selected a custom backup, with 1 full backup and daily incrementals, it would have been enough. It also provides business continuity, since there are fewer backups to restore, and I can still extract any individual file, so data availability is covered as well.
Please note: although the speed of the backup also matters at times, I am ignoring it here since we are discussing backup strategy, and speed is already pretty good, around 1.3 GB per minute as observed. However, we should consider the speed of the backup application and the data-transfer speed of the connected storage when multiple backups run simultaneously. At that stage we also have to consider options like bandwidth allocation and de-duplication. A rough space estimate for the scenario above is sketched below.
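The back-of-the-envelope numbers behind the scenario can be sketched as follows. The differential and incremental sizes are assumptions for illustration (a differential taken as roughly half the full, a daily incremental as a few percent of the full); the real values depend on the data change rate and on compression.

# Rough backup space estimate for the scenario in the post.
# Assumptions (not measured figures): a differential is ~50% of a full backup,
# a daily incremental is ~3% of a full backup, compression is ignored.

full_gb = 200.0
diff_gb = 0.5 * full_gb       # assumed differential size
incr_gb = 0.03 * full_gb      # assumed daily incremental size
months = 6                    # retention period
capacity_gb = 2048.0          # 2 TB of backup space

# Scheme A (GFS-style): 1 full per month + 4 weekly differentials
# + ~25 daily incrementals per month, all retained for 6 months.
gfs_total = months * (full_gb + 4 * diff_gb + 25 * incr_gb)

# Scheme B (custom): 1 full kept for the whole period + daily incrementals.
custom_total = full_gb + months * 30 * incr_gb

print(f"GFS-style: ~{gfs_total:.0f} GB of {capacity_gb:.0f} GB available")
print(f"Custom:    ~{custom_total:.0f} GB of {capacity_gb:.0f} GB available")

With these assumed sizes the GFS-style scheme overshoots the 2 TB of space well before the 6-month retention target, while the single-full-plus-incrementals scheme fits, which is the point made in the scenario.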
Other best practices I can think of are as follows (please note, these are purely based on my experience):
1. Do not share the archive passwords; wrong password attempts may corrupt the backup or interrupt the backup operation when it runs on a schedule.
2. If the number of machines is above 20, it is better to create individual plans for groups of machines. You may also consider creating a different folder for each backup in the same location. In any case a backup plan waits for the previous plan to complete, so, like I said, it is up to you how you want your software to work.
3. Tapes are pre-historic :) (slow); these days backup applications are new-generation (very fast), and I don't prefer tape if the duration of backup recovery matters to you. Even though the backup speed can reach up to 2 GB per minute, since the target speed is low the data backup will be slow as well. Tapes are still useful, but as an offsite copy; they should not be used as the primary target.
4. Don't back up a single machine to a de-duplication-enabled location, since it will check for data redundancy in the same location and will take more time to complete the backup.
5. Make sure you are using the option to validate the "full archive".
6. I always suggest keeping backups in two different locations, if there are no financial constraints, to invest in data availability even under a level-5 disaster.
7. I suggest taking full partition backups rather than files or folders, since even though the backup is of the drive, we can still perform file and folder recovery. It is also much faster, as it backs up sector by sector and automatically includes any new file created in the same folder location, which does not happen with a files-and-folders backup.
8. Test-recovery drills should be done often, at least once every 3 months.
9. Bootable CDs should be kept ready with the latest version of the kernel. Problems usually don't knock on the door before arriving; we should be armed to fight them.
10. Notifications play an important role. Make sure you are receiving notifications via email or SNMP, and if it is SNMP, make sure your trap catcher is listening for the alerts. A minimal email-notification sketch follows.
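For point 10, here is a minimal sketch of an email notification using Python's standard smtplib module. The SMTP host, sender and recipient addresses are placeholders; in practice most backup products already have built-in email/SNMP notification settings that should be preferred where available.

# Minimal email-notification sketch using Python's standard library.
# Host and addresses are placeholders; prefer the backup product's own
# notification settings (email/SNMP) where they exist.

import smtplib
from email.message import EmailMessage

def send_backup_alert(subject, body,
                      smtp_host="smtp.example.com",
                      sender="backup-alerts@example.com",
                      recipient="oncall@example.com"):
    msg = EmailMessage()
    msg["Subject"] = subject
    msg["From"] = sender
    msg["To"] = recipient
    msg.set_content(body)
    with smtplib.SMTP(smtp_host) as server:   # relay reachable from the backup server
        server.send_message(msg)

if __name__ == "__main__":
    send_backup_alert("Backup plan 'nightly-incremental' failed",
                      "The job ended with errors; check the backup console for details.")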
Feedback and questions will be appreciated.
Note: This article is a re-post; it was earlier posted on the vendor's site and is still a hot thread in their community. FYI: https://forum.acronis.com/forum/17555

Any more questions? Please write back or comment here. There is more to share.

I request you to join my groups on Facebook & LinkedIn, named "DataCenterPro", to get regular updates. I am also available on Twitter as @_anubhavjain. Shortly I will be launching my own YouTube channel with free training videos on different technologies as well.

Happy Learning!!

Wednesday 26 November 2014

Journey Begins.. !!!!

After surveying people across the industry, I managed to create a flow and sequence for learning the different technologies in a datacenter. I decided NOT to re-discover or re-design the wheel but to re-use it wherever applicable. However, the material needs to be correlated with real-world scenarios, and for that I will write articles. Below is an overview of our discussion topics.

To start with, we will learn the basics of hardware; this will not cover digital electronics in depth, but will touch on it if required during the course. Going further, we will learn about server hardware, which includes the server components and a special session about blades. There will be a separate session about managing blade servers, so the first chapter will just give an overview. Moving forward, we will learn about the datacenter universe, which includes racks, cables, power/cooling and PDUs. I will emphasize the important aspects; this will not cover the design or sizing part in the initial phase. While learning about servers, we will refresh some basics of operating systems that are important for learning further technologies, like file systems, networking, etc. Later, we will start learning about storage foundations; I will try to cover most of what is mentioned in the SNIA foundation curriculum.

When it comes to storage, there are three dimensions we need to learn: the storage fabric or SAN, the storage array and the host file system. We will cover all of them in our modules. For the storage fabric, I will mostly cover Brocade and less of Cisco. Storage arrays come from multiple vendors like HP, EMC, NetApp, Hitachi, etc.; obviously I won't be able to cover them all, but I will try to involve experts for whichever technologies I cannot cover myself.

Finally, we will deep-dive into virtualization. We will learn about VMware's virtualization products. There are already a few videos and articles about the technology and how to do it, but I will show the best practices, design concepts and implementation risks, along with some self-made tools, which are what is actually required.

What’s there for ME?
What I feel is that there is something for everyone: people who want to learn new things, enhance existing knowledge, or take a step up in their respective domain. Anyone from testing/development or support/helpdesk can benefit, and it can even be useful for IT technical managers or individual contributors like solution architects. I have tried to touch on real-world scenarios and incorporate their issues and concerns in a single place.

Overall, a few of you might already know how to do these things, but you may not know when to do them and when to use them, which is what I will cover here. This blog is not meant to help you become a specialist or an admin, but a professional!!!