How to make a Metro-HA from DR?

This is indeed frequently asked question often asked in many different forms, like: Can NetApp’s DR solution automatically do site switching on DR event with a FAS2000/A200 system?

As you might guess in NetApp world, Metro-HA is called MetroCluster (or MCC) and DR called Asynchronous SnapMirror.

The question is the same sort of questions if someone would ask "Can you build a MetroCluster-like solution based on A200/FAS2000 & async SnapMirror without buying a MetroCluster?". The short answer to that question is no; you cannot do that. There are few quite good reasons for that:

  • First of all is DR & HA/Metro-HA protects from different kinds of failures, therefore designed, behave & working quite differently, though both are data protection technologies. You see MetroCluster is basically is an HA solution stretched between to sites (up to 300 km for HW MCC or up to 10km for MetroCluster SDS), it is not a DR solution
  • MetroCluster Based on another technology called SyncMirror and requires additional PCI cards & Models higher then A200/FAS2000 and there are some other requirements.

Data Protection technologies comparison

Async SnapMirror on another hand is designed to provide Disaster Recovery, not Metro-HA. When you are saying DR, it means you store point in time data (snapshots) for cases like data (logical) corruption, so you'll have the ability to choose between snapshots to restore. Moreover, the ability also meant responsibility, because you or another human must decide which one to select & restore. So, there is no "automatic" switchover to DR site with Async SnapMirror. Once you have many snapshots, it means you have many options, which means it is not easy for a program or a system to decide to which one it should switch. Also, SnapMirror provides many opportunities to backup & restore:

  • Different platforms on main & DR sites (in MCC both systems must be the same model)
  • Different number & types of drives (in MCC mirrored aggregates must be the same size & drive type)
  • Fun-Out & Cascade replicas (MCC have only two sites)
  • Replication can be done over L3, no L2 requirements (MCC only for L2)
  • You can replicate separate Volumes or entire SVM (with exclusions for some of the volumes if necessary). With MCC you replicate entire storage system config, aggregate
  • Many snapshots (though MCC can contain snapshots it switches only between Active FS on both sites).

All these options give much flexibility for async SnapMirror and mean your storage system must have a very complex logic to switch between sites automatically, long story short, it is impossible to have a single solution which gives you a logic which is going to satisfy every customer, all possible configurations & all the applications in one solution. In other words, with that flexible solution like async SnapMirror switchover in many cases done manually.

At the end of the day automatic or semi-automatic switchover is possible

At the end of the day automatic or semi-automatic switchover is possible & must be done very carefully with environment knowledge, understanding precise customer situation and customized for:

  • Different environments
  • Different protocols
  • Different applications.

MetroCluster on another hand can automatically switch over between sites in case of one site failure, but it operates only with the active file system and solves only Data Availability problem, not Data Corruption. It means if your data been (logically) corrupted by let's say a virus, in this case, MetroCluster switchover not going to help, but Snapshots & SnapMirror will. Unlike SnapMirror, MetroCluster has strict deterministic environmental requirements, and only two sites between which your system can switch plus it works only with the active file system (no snapshots) used, in this deterministic environment it is possible to determine surviving site which is to choose and switch automatically.

SVM DR

SVM DR does not replicate some of SVM’s configuration to DR site. So, you must configure it manually or prepare a script so in case of a disaster your script is going to do it for you.

Do not mix up Metro-HA (MetroCluster) & DR; those are two separate and not mutually exclusive data protection technologies: you can have both MetroCluster & DR and big companies usually have both MetroCluster & SnapMirror because they have budgets & business approval for that. The same logic applies not only to NetApp systems but for all storage vendors.

The solution

In this particular case, with FAS2000/A200 you can have only DR, so manual mount to hosts must be done on the DR site after a disaster event on primary site, though it is possible to set up & configure your own script with logic suitable for your environment which switches between sites automatically or semi-automatically. For this purpose thing like NetApp Work Flow Automation & Backup/Restore ONTAP SMB shares with PowerShell script can help to do the job. Also, you might be interested in VMware SRM + NetApp SRM plugin configuration, which can give you a relatively easy solution to switch between sites.