Sentrium Blog

How to make a Metro-HA from DR?
2018-07-11

This is a frequently asked question that comes in many different forms, for example: can NetApp's DR solution automatically switch sites on a DR event with a FAS2000/A200 system?

As you might guess, in the NetApp world Metro-HA is called MetroCluster (or MCC), and DR is delivered by asynchronous SnapMirror.

The question boils down to: "Can you build a MetroCluster-like solution based on an A200/FAS2000 and async SnapMirror without buying a MetroCluster?" The short answer is no, you cannot. There are a few good reasons for that:

  • First of all, DR and HA/Metro-HA protect from different kinds of failures, and are therefore designed to behave and work quite differently, even though both are data protection technologies. MetroCluster is basically an HA solution stretched between two sites (up to 300 km for hardware MCC or up to 10 km for MetroCluster SDS); it is not a DR solution
  • MetroCluster is based on another technology called SyncMirror and requires additional PCI cards and models higher than the A200/FAS2000, along with some other requirements.

Data Protection technologies comparison

Async SnapMirror, on the other hand, is designed to provide Disaster Recovery, not Metro-HA. DR means you store point-in-time data (snapshots) for cases like logical data corruption, so you have the ability to choose which snapshot to restore. That ability also means responsibility, because you or another human must decide which one to select and restore. So there is no "automatic" switchover to the DR site with async SnapMirror: once you have many snapshots, you have many options, and it is not easy for a program or a system to decide which one it should switch to. SnapMirror also provides many opportunities for backup and restore:

  • Different platforms on the main and DR sites (in MCC both systems must be the same model)
  • Different numbers and types of drives (in MCC mirrored aggregates must be the same size and drive type)
  • Fan-out and cascade replicas (MCC has only two sites)
  • Replication over L3, with no L2 requirements (MCC requires L2)
  • Replication of separate volumes or an entire SVM (with exclusions for some of the volumes if necessary); with MCC you replicate the entire storage system config and aggregates
  • Many snapshots (though MCC can contain snapshots, it switches only between the active file systems on the two sites).

All these options give async SnapMirror much flexibility, and they mean your storage system would need very complex logic to switch between sites automatically. Long story short, it is impossible to build a single solution whose logic satisfies every customer, every possible configuration, and every application. In other words, with a solution as flexible as async SnapMirror, switchover is in many cases done manually.


Automatic or semi-automatic switchover is possible

At the end of the day, automatic or semi-automatic switchover is possible, but it must be done very carefully, with knowledge of the environment and an understanding of the precise customer situation, and customized for:

  • Different environments
  • Different protocols
  • Different applications.

MetroCluster, on the other hand, can automatically switch over between sites in case of a site failure, but it operates only with the active file system and solves only the data availability problem, not data corruption. If your data has been (logically) corrupted, say by a virus, a MetroCluster switchover is not going to help, but snapshots and SnapMirror will. Unlike SnapMirror, MetroCluster has strict, deterministic environmental requirements, only two sites between which the system can switch, and it works only with the active file system (no snapshots); in such a deterministic environment it is possible to determine the surviving site and switch to it automatically.

SVM DR

SVM DR does not replicate some of the SVM's configuration to the DR site, so you must configure it manually or prepare a script that will do it for you in case of a disaster.

Do not mix up Metro-HA (MetroCluster) and DR; they are two separate and not mutually exclusive data protection technologies. You can have both MetroCluster and DR, and big companies usually run both MetroCluster and SnapMirror, because they have the budget and business approval for that. The same logic applies not only to NetApp systems but to all storage vendors.

The solution

In this particular case, with a FAS2000/A200 you can have only DR, so hosts must be mounted manually on the DR site after a disaster event on the primary site. It is, however, possible to set up and configure your own script, with logic suitable for your environment, which switches between sites automatically or semi-automatically. For this purpose, tools like NetApp Workflow Automation and a PowerShell script for backing up and restoring ONTAP SMB shares can help do the job. You might also be interested in a VMware SRM + NetApp SRM plugin configuration, which can give you a relatively easy way to switch between sites.
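The essence of such a "semi-automatic" script is a decision policy, not storage magic. Here is a minimal, purely illustrative sketch of the logic (the probe threshold, confirmation step, and return values are all hypothetical placeholders, not any NetApp API):

```python
# Illustrative sketch only: thresholds and actions are hypothetical,
# not part of any NetApp tool or API.

def decide_switchover(failed_probes, threshold, operator_confirmed):
    """Semi-automatic DR policy: require both sustained probe failures
    AND explicit human confirmation before acting on the DR site."""
    if failed_probes < threshold:
        return "stay"        # primary looks alive, do nothing
    if not operator_confirmed:
        return "alert"       # likely disaster: page a human, do not act alone
    return "switchover"      # human agreed: break the mirror, mount on DR

# 5 consecutive failed probes, but no human sign-off yet
print(decide_switchover(5, threshold=3, operator_confirmed=False))  # alert
```

The key point this illustrates is the one made above: with async SnapMirror there are many snapshots and many valid recovery options, so a human stays in the loop by design.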






A very quick article about a customer who has NetApp storage systems
2018-06-04

As the title says, this will be a very quick article about a customer who has NetApp FAS systems.

In 2014 this customer bought their first two FAS3220 systems with NSE encryption, at that time running 7-Mode ONTAP.

Then in 2015 they bought one AFF8040 and one FAS8040 (HDD, later made "hybrid" by adding a few SSDs taken from the AFF8040), both also with NSE encryption, running clustered ONTAP (cDOT) and formed into a single cluster.

Then they migrated all their VMware infrastructure to the new storage systems, upgraded the old systems to cDOT, joined the old but upgraded systems to the cluster with the FAS8040 and AFF8040, and moved some of the slow workloads back onto the 3220 non-disruptively, this time with NetApp's LUN move and volume move, which is much faster than VMware Storage vMotion.

And then in 2017 they bought an AFF A700 without encryption. All systems are happily working, monitored, and managed in a single cluster, and data is non-disruptively migrated across all the nodes during its life cycle, while they get at least 2:1 data reduction on the AFF systems (cross-volume deduplication is not enabled yet) and 1.5:1 on the hybrid and HDD-only systems.

Now, in 2018, four years after they got their first FAS system, they are thinking of throwing the old FAS3220 controllers away, buying new low-end FAS2700 controllers (which will probably be as fast as or faster than the 3220), and connecting the old disk shelves to them simply by using a MiniSAS HD to QSFP cable adapter. Then, as always, they will connect all FAS and AFF systems into a single cluster again, be able to upgrade to ONTAP 9.3 and further, and be able to use new functionality like FabricPool or inline aggregate deduplication.

And with a simple ONTAP upgrade on the A700, they will be able to use FC-NVMe with the existing cluster when they are ready.

I have three rhetorical questions to take away:

  1. Which storage system vendor would allow you to keep in a single cluster: nodes of different models (3220, 8040, A700, and 2700); low-end, mid-range, and high-end systems; different types of systems (all-flash, HDD, and hybrid); four different generations (from the 3220 to the 2700); and some systems with encryption and some without?
  2. Which storage system vendor would allow you to upgrade the same hardware through a huge, breakthrough major software release (the move from 7-Mode to cDOT); reconnect your old disk shelves between low-end, mid-range, and high-end systems; and reconnect old disk shelves across different generations and models?
  3. Can you name one?
Customizing Ubiquiti USG configuration with JSON just got easier
2018-03-01

Ubiquiti USG (UniFi Security Gateway) is a router and firewall appliance that is closely related to the EdgeMax product line, even though it's marketed as part of the UniFi product family and focused on a different market segment.

It is meant to be managed by a UniFi controller, which overwrites the settings, so configuring it in the traditional way will not make the changes permanent. Since its hardware is closely related to the EdgeMax routers and its software is derived from EdgeOS, using only the limited configuration options available in the UniFi controller GUI would leave its capabilities seriously underutilized.

Fortunately, the controller allows customizing the config with JSON files, and the USG OS includes a tool for exporting the config to JSON. When a customer wanted to configure QoS for VoIP traffic in a way that is not supported by the controller, we had to look deeper into it, and found that the process is not nearly as easy as we hoped it would be.

The workflow is the following:

  1. Add the required configuration directly on the USG (with set commands as in the usual EdgeOS)
  2. Export it to JSON with mca-ctrl -t dump-cfg
  3. Extract the relevant sections from that JSON and put them on the controller

The last part is the offender here. Unlike the "show" command or the cli-shell-api tool, the tool for exporting the config to JSON (mca-ctrl) does not allow exporting only a part of the configuration. Moreover, the controller is capable of merging configuration files, but not of loading a partial configuration, so the JSON dict representing the configuration should include all levels (for example, if you add just one firewall rule, the config should still have the "firewall", "name", etc. levels in it to work).

The official guideline suggests picking the relevant parts from the JSON config by hand. Obviously, this is a tedious and error-prone process, so I started looking for a way to automate it.

Luckily, the USG OS has Python installed. It's Python 2.7, and while I would prefer to see Python 3 there, it still has a JSON parser and formatter in its standard library, so it was good enough for the job.

So I wrote a script that takes a list of configuration paths as arguments and automatically exports them from the JSON generated by mca-ctrl into a single JSON object that should be ready for deployment on the controller.
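The core of the idea can be sketched in a few lines (this is a simplified illustration, not the actual script from the repo): walk down the dumped dict to each requested subtree, then rebuild every intermediate level so the controller can merge the result.

```python
import json

def extract_paths(full_cfg, paths):
    """Extract space-separated config paths from a full config dict,
    preserving all intermediate levels in the output."""
    result = {}
    for path in paths:
        keys = path.split()
        src = full_cfg
        for k in keys:                # walk down to the wanted subtree
            src = src[k]
        dst = result
        for k in keys[:-1]:           # recreate every level above it
            dst = dst.setdefault(k, {})
        dst[keys[-1]] = src
    return result

# toy stand-in for the dict produced by `mca-ctrl -t dump-cfg`
cfg = {"service": {"ssh": {"port": "22"}, "dns": {}},
       "system": {"offload": {"ipsec": "enable"}}}
print(json.dumps(extract_paths(cfg, ["service ssh", "system offload"]), indent=4))
```

Note how "service ssh" pulls in the "service" level too, which is exactly the all-levels requirement described above.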

A simple example of how to run it and what the output is like:

ubnt@ubnt# ./usg-config-export.py "service ssh" "system offload"
{
    "system": {
        "offload": {
            "ipv6": {
                "forwarding": "enable", 
                "vlan": "enable"
            }, 
            "ipv4": {
                "forwarding": "enable", 
                "pppoe": "enable", 
                "vlan": "enable"
            }, 
            "ipsec": "enable"
        }
    }, 
    "service": {
        "ssh": {
            "port": "22", 
            "protocol-version": "v2"
        }
    }
}

It supports an unlimited number of configuration paths, not just two.

The script can be trivially extended to take JSON from a file or stdin in addition to running mca-ctrl on its own; if anyone wants that, let me know and I'll add it. Let me know if you have any problems with the script as well.

Installation

You can get the script from my GitHub repo. Just copy it to your USG (with scp or otherwise), chmod +x it, and it's ready to run. I hope it saves you quite a bit of time if you also need to customize your USG configuration.

Daniil Baturin
Ethernet port aggregation and load balancing with ONTAP
2018-02-28

 

Abstract

For a small company it is quite common to have two to four servers, two switches that often support Multi-chassis EtherChannel, and a low-end storage system. It is quite important for such companies to fully utilize their infrastructure, and thus all available technologies, and this article describes one aspect of how to do this with ONTAP systems. Usually there is no need to dig too deep into LACP technology, but for those who want to, welcome to this post.

It is important to tune and optimize not just one part of your infrastructure but the whole stack to achieve the best performance. For instance, if you optimize only the network, then the storage system might become the bottleneck in your environment, and vice versa.

The majority of modern servers have on-board 1 Gbps or even 10 Gbps Ethernet ports.

Some older ONTAP storage systems like the FAS25XX, as well as the more modern FAS26XX, have 10 Gbps on-board ports. In this article I will focus on an example with a FAS26XX system with 4x 10 Gbps ports on each node, two servers with 2x 10 Gbps ports each, and a Cisco switch with 10 Gbps ports and support for Multi-chassis EtherChannel, but it applies to any small configuration.

Scope

So, we would like to fully utilize network bandwidth on the storage system and the servers and prevent any bottlenecks. One way to do this is to use the iSCSI or FC protocols, which have built-in load balancing and redundancy; in this article we will instead look at protocols that do not have such an ability, like CIFS and NFS. Why would users be interested in NAS protocols that don't have built-in load balancing and redundancy? Because NAS protocols give ONTAP file granularity and file visibility, which in many cases provides more agility than SAN protocols, while the missing network "features" of NAS protocols can easily enough be compensated for with functionality built into nearly any network switch. Of course, these technologies do not work magically, and each approach has some nuances and considerations.

In a lot of cases users would like to use both SAN and NAS on top of a single pair of Ethernet ports on their ONTAP system, and for this reason the first thing you should consider is NAS protocols with load balancing and redundancy, and only then adapt the SAN connection to it. NAS with SAN on top of the same Ethernet ports is often the case for customers with smaller ONTAP systems, where the number of Ethernet ports is limited.

Also, in this article I will avoid technologies like vVols over SAN, pNFS, dNFS, and SMB Multichannel. I would like to cover vVols in a dedicated article: it is not related to NAS or SAN protocols directly, but it can be part of a solution that provides file granularity on one hand and can use NFS or iSCSI on the other, where iSCSI can natively load-balance traffic across all available network paths. pNFS is unfortunately currently supported only with RedHat/CentOS systems in enterprise environments, is not widespread, and does not provide native load balancing because NFS trunking is currently only a draft, while SMB Multichannel is not supported in ONTAP 9.3.

This leaves us a few configurations:

  • One is to use solely NAS protocols with Ethernet port aggregation
  • Another is to use NAS protocols with Ethernet port aggregation and SAN on top of the aggregated ports, which can be divided into two subgroups:
    • Using iSCSI as the SAN protocol
    • Using FCoE as the SAN protocol
  • Native FC requires dedicated ports and cannot work over Ethernet ports

Even though FCoE on top of aggregated Ethernet ports alongside NAS is a possible network configuration with an ONTAP system, I will not discuss it in this article, because FCoE is supported only with expensive converged network switches like the Nexus 5000 or 7000, and is thus not in the scope of interest of small companies. FC and FCoE provide quite comparable performance, load balancing, and redundancy with ONTAP systems (with other vendors it could be different), so there is no reason to pay more.

NAS protocols with Ethernet port aggregation

Both variants, NAS protocols with Ethernet port aggregation alone and with iSCSI on top of the aggregated ports, have quite similar network configurations and topologies. This is the configuration I will discuss in this article.

Theoretical part

Unfortunately, Ethernet load balancing does not work in as sophisticated a way as in SAN protocols; it works in a quite simple way. I would personally even call it load distribution instead of load balancing, because Ethernet does not pay attention to the "balancing" part and does not actually try to evenly distribute load across links. It just distributes load, hoping that there will be plenty of network nodes generating read and write threads, so that simply because of probability theory the load will be more or less evenly distributed. The fewer nodes in the network, the fewer network threads, the lower the probability that each network link will be evenly loaded, and vice versa.

The simplest algorithm for Ethernet load balancing sequentially picks one of the network links for each new thread, one by one. Another algorithm uses a hash sum of the sender's and recipient's network addresses to pick one network link in the aggregate. The network address could be an IP address, a MAC address, or something else. This small nuance plays a very important role in this article and in your infrastructure: if two pairs of source and destination addresses produce the same hash sum, the algorithm will use the same link in the aggregate. In other words, it is important to understand how the load balancing algorithm works, to ensure that the combinations of network addresses are such that you not only get redundant network connectivity but also utilize all network links. This becomes especially important for small companies with few participants in their network.
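The hash collision problem is easy to demonstrate. Below is a toy model of src-dst-IP hashing; real switches and ONTAP use their own hash functions, so XOR of the last octets here is just a common illustration, not any vendor's actual algorithm:

```python
# Toy model of IP-hash link selection (illustrative only; real
# equipment uses its own, implementation-specific hash function).
def pick_link(src_ip, dst_ip, n_links):
    s = int(src_ip.rsplit(".", 1)[1])   # last octet of source IP
    d = int(dst_ip.rsplit(".", 1)[1])   # last octet of destination IP
    return (s ^ d) % n_links            # XOR hash modulo link count

# The same address pair always lands on the same link...
print(pick_link("10.0.0.11", "10.0.0.21", 2))
# ...and two different pairs can collide on one link,
# leaving the other link of the aggregate idle:
print(pick_link("10.0.0.12", "10.0.0.22", 2))
```

Both pairs above hash to the same link, which is exactly the situation a small network with few hosts has to watch out for.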

It is quite often the case that four servers cannot fully utilize 10 Gbps links, but during peak utilization it is important to evenly distribute network threads between the links.

Typical network topology and configuration for small companies

In my example we have 2 servers, 2 switches, and one storage system with two storage nodes running ONTAP 8.3 or higher, with the following configuration; also keep in mind:

  • From each storage node, one link goes to the first switch and another to the second switch
  • The switches are configured with a technology like vPC (or similar) or are stacked
  • The switches are configured with Multi-chassis EtherChannel/PortChannel, so the two links from a server connected to the two switches are aggregated into a single EtherChannel/PortChannel, and likewise the links from a storage node connected to the two switches are aggregated into a single EtherChannel/PortChannel

  • LACP with IP load balancing configured over the EtherChannel
  • 10 Gbps switch ports connected to servers and storage configured with flow control disabled
  • Storage system ports and server ports configured with flow control disabled (none)
  • 4 links on the first storage node aggregated into a single EtherChannel (ifgrp) configured with LACP (multimode_lacp), and the same on the second storage node: two ifgrps in total, one on each storage node
  • The same NFS VLAN created on top of each ifgrp, one on the first storage node and one on the second
  • 2x IP addresses created on each of the two NFS VLANs, 4 in total across the two storage nodes
  • Each storage node with at least one data aggregate created out of an equal number of disks; for example, each aggregate could be:
    • 9 data + 2 parity disks and 1 hot spare
    • 20 data + 3 parity disks and 1 hot spare
  • Volumes on top of the data aggregates configured as:
    • Either one FlexGroup spanning all aggregates
    • Or 2 volumes on each storage node, 4 in total, which is minimal and sufficient
  • Each server with two 10 Gbps ports, one connected to each switch
  • On each server, the 2x 10 Gbps links aggregated into an EtherChannel with LACP
  • Jumbo frames enabled on all components: storage system ports, server ports, and switch ports
  • Each volume mounted on each server as a file share, so each server can use all 4 volumes

The minimum number of volumes for even traffic distribution is pretty much determined by the biggest number of links on either a storage system or a server; in this example we have 4 ports on each storage node, which means we need 4 volumes in total. Even if you have only 2 network links from each server and two from each storage node, I would still suggest keeping at least 4 volumes, which is good not only for network load balancing but also for storage node CPU load balancing. With a FlexGroup it is enough to have only one such group, but keep in mind that it is currently not optimized for high-metadata workloads like virtual machines and databases.

One IP address per storage node, with two or four links on each node, in configurations with two or more hosts, each with two or four links and one IP address per host, is almost always enough to provide even network distribution. But with one IP address per storage node and one per host, even distribution is achieved only in the perfect scenario where each host accesses each IP address evenly, which in practice is hard to achieve, quite hard to predict, and can change over time. So, to increase the probability of a more even network load distribution, we need to divide the traffic into more threads, and the only way to do this with LACP is to increase the number of IP addresses. Thus, for small configurations with two to four hosts and two storage nodes, 2x IP addresses on each node instead of one will help increase the probability of a more even distribution of network traffic across all the links.

Unfortunately, conventional NAS protocols do not allow hosts to recognize a file share mounted via different IP addresses as a single entity. For example, if we mount an NFS file share on VMware ESXi via two different IP addresses, the hypervisor will see them as two different datastores, and if the user wants to re-balance network links, a VM has to be migrated to the datastore with the other IP; but to move that VM, Storage vMotion will be involved, even though it is the same network file share (volume).

Network Design

Here is a recommended and well-known network design often used with NAS protocols.

    Image #1

But simply cabling and configuring the switches with LACP doesn't guarantee that network traffic will be balanced across all the links in the most efficient way; it depends, and even if it is, this can change after a while. To make sure we get the maximum from both the network and the storage system, we need to tune them a bit, and to do so we need to understand how LACP and the storage system work. For more network designs, including wrong designs, see the slides here.

    Image #2

LACP protocol & algorithm

In the ONTAP world, the nodes in a storage system work, for NAS protocols, almost as if they were separate servers; this architecture is called share-nothing. The only difference is that if one storage node dies, the second takes over its disks, workloads, and IP addresses, so hosts can continue to work with their data as if nothing happened; this is called takeover in a High Availability pair. ONTAP can also move IP addresses and volumes online between storage nodes, but let's not focus on that. Since storage nodes act as independent servers, the LACP protocol can aggregate Ethernet ports only within a single node; it does not allow you to aggregate ports from multiple storage nodes. With switches, on the other hand, we can configure Multi-chassis EtherChannel, so LACP will aggregate ports from two switches.

The LACP algorithm selects a link only for the next hop, one step at a time, so the full path from sender to recipient is neither established nor handled by the initiator as it is in SAN. Communication between the same two network nodes could be sent through one path while the response comes back through another. The algorithm uses a hash sum of the source and destination addresses to select a path. The only way to ensure your traffic takes the expected paths with LACP is to enable load balancing by IP or MAC address hash, and then calculate the hash results or test them on your equipment. With the right combination of source and destination addresses, you can ensure the algorithm will select your preferred path.

The load balancing algorithm can be implemented differently on the server, the switch, and the storage system; that is why traffic from server to storage and from storage to server can take different paths.

There are a few additional important circumstances which will influence your storage partitioning and your source and destination IP address selection. There are applications which can share volumes, like VMware vSphere, where each ESXi host can work with multiple volumes; and there are configurations where volumes are not shared by your applications.

One volume & one IP per node

Since we have two ONTAP nodes in a share-nothing architecture, and we want to fully utilize the storage system, we need to create volumes on each node and thus at least one IP on each node on top of an aggregated Ethernet interface. Each aggregated interface consists of two Ethernet ports. In the following network designs some of the objects (such as network links and servers) are not displayed, to focus on specific aspects; all of them are based on the very first image, "LACP network design".

    Design #3A 

Let's look at the same example from the storage perspective.

    Design #3B

     

    Two volumes & one IP per node

But some configurations do not share volumes between the applications running on your servers. So, to utilize all the network links, we need to create two volumes on each storage node: one used only by host1, the second used only by host2. Volumes and connections to the second node are not displayed to keep the image simple; in reality they exist and are symmetrical to the first storage node.

    Design #4A

Let's look at the same configuration from the storage perspective. As in the previous images, the symmetrical part of the connections is not displayed to simplify the image: in this case the symmetrical connections to the blue buckets on each storage node are not shown but exist in the real configuration.

    Design #4B 

    Two volumes & two IPs per node

Now, if we increase the number of IPs, we can mount each volume over two different IP addresses. In such a scenario each mount will be perceived by the hosts as a separate volume, even though it is physically the same volume with the same data set. In this situation it often makes sense to also increase the number of volumes, so each volume is mounted via its own IP. Thus we achieve a more even network load distribution across all the links, whether for a shared or a non-shared application configuration.

    Design #5A

In a non-shared volume configuration, each volume is used by only one host. Designs 5A & 5B are quite similar and differ only in how the volumes are mounted on the hosts.

    Design #5B

     

    Four volumes & two IPs per node

Now, if we add more volumes and IP addresses to our configuration with two applications which do not share volumes, we can achieve even better network load balancing across the links with the right combination of network share mounts. The same design can be used with an application which shares volumes, similarly to the design in image 5.

    Design #6


    For more network designs, including wrong designs, see slides here.

     

Which design is better?

Whether your applications use shared volumes or not, I would recommend:

  • Design #3 for environments where you have multiple independent applications, so that with multiple apps you will have at least 4 or more volumes in total on each storage node.
  • Or Design #6 if you are running only one application, like VMware vSphere, and are not planning to add new applications and volumes. Use 4 volumes per node minimum, whether you have shared or non-shared volumes.

How to ensure network traffic goes by the expected path?

This is the more complex and geeky stuff. In the real world you can run into a situation where your switch decides to put your traffic through an additional hop, or where the hash sums of two or more source/destination address pairs overlap. To ensure your network traffic takes the expected path, you need to calculate the hash sums. Usually, in big enough environments with many volumes, file shares, and IP addresses, you do not care about this, because the more IPs you have, the higher the probability that your traffic will be distributed over your links, simply because of probability theory. But if you care and you have a small environment, you can brute-force the IP combinations for your servers and storage.
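Such a brute-force search is easy to script. The sketch below uses the same hypothetical XOR-of-last-octet hash as before (your switch's real hash function will differ, so substitute it before trusting the result); it searches for last-octet assignments for 2 hosts and 2 storage LIFs so that the four host-to-LIF flows split evenly over a 2-link aggregate:

```python
from itertools import product

# Hypothetical IP-hash model; replace with your equipment's real hash.
def pick_link(src_octet, dst_octet, n_links):
    return (src_octet ^ dst_octet) % n_links

def find_even_assignment(n_links=2):
    """Brute-force last octets for 2 hosts (h1, h2) and 2 storage LIFs
    (s1, s2) until the 4 flows split evenly over a 2-link aggregate."""
    for h1, h2, s1, s2 in product(range(1, 30), repeat=4):
        if len({h1, h2, s1, s2}) < 4:   # all addresses must be distinct
            continue
        links = [pick_link(h, s, n_links) for h in (h1, h2) for s in (s1, s2)]
        if links.count(0) == links.count(1):   # perfectly even 2/2 split
            return (h1, h2, s1, s2), links
    return None

octets, links = find_even_assignment()
print(octets, links)
```

The search space is tiny, so this runs instantly; in a larger environment you would simply let probability do the balancing instead.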

     

     

    Configuring ONTAP

    Create data aggregate

cluster1::*> aggr create -aggregate aggr -diskcount 13

    Create SVM

    cluster1::*> vserver create -vserver vsm_NAS -subtype default -rootvolume svm_root -rootvolume-security-style mixed -language C.UTF-8 -snapshot-policy default -is-repository false -foreground true -aggregate aggr -ipspace Default

    Create aggregated ports

    cluster1::*> ifgrp create -node cluster1-01 -ifgrp a0a

    cluster1::*> ifgrp create -node cluster1-02 -ifgrp a0a

    Create VLANs for each protocol-mtu

    cluster1::*> vlan create -node * -vlan-name a0a-100

    I would recommend creating dedicated broadcast domains for each combination protocol-mtu. For example:

    • Client-SMB-1500
    • Server-SMB-9000
    • NFS-9000
    • iSCSI-9000

    cluster1::*> broadcast-domain create -broadcast-domain Client-SMB-1500 -mtu 1500 -ipspace Default -ports cluster1-01:a0a-100,cluster1-02:a0a-100

    Create interfaces with IP addresses

cluster1::*> network interface create -vserver vsm_NAS -lif nfs01_1 -role data -data-protocol nfs -home-node cluster1-01 -home-port a0a-100 -address 192.168.0.51 -netmask 255.255.255.0

If you haven't created dedicated broadcast domains, then configure failover groups for each protocol and assign them to the LIF interfaces.

    cluster1::*> network interface failover-groups create -vserver vsm_NAS -failover-group FG_NFS-9000 -targets cluster1-01:a0a-100, cluster1-02:a0a-100

    cluster1::*> network interface modify -vserver vsm_NAS -lif nfs01_1 -failover-group FG_NFS-9000

    Configuring Switches

This is where 90% of human errors are made. People often forget to add the word "active" or add it in the wrong place, etc.

    Example of Switch configuration

    Cisco Catalyst 3850 in stack with 1Gb/s ports

Note that "mode active" corresponds to "multimode_lacp" in ONTAP, so each physical interface must have the configuration "channel-group X mode active", not the Port-channel. Note that the "flowcontrol receive on" configuration depends on port speed: if the storage sends flow control, then the other side must receive it. Note that it is recommended to use RSTP (in our case with VLANs, Rapid-PVST+) and to configure the switch ports connected to storage and servers with spanning-tree portfast.

    system mtu 9198
    !
    spanning-tree mode  rapid-pvst
    !
    interface Port-channel1
     description N1A-1G-e0a-e0b
     switchport trunk native vlan 1
     switchport trunk allowed vlan 53
     switchport mode trunk
     flowcontrol receive on
     spanning-tree guard loop
    !
    interface Port-channel2
     description N1B-1G-e0a-e0b
     switchport trunk native vlan 1
     switchport trunk allowed vlan 53
     switchport mode trunk
     flowcontrol receive on
     spanning-tree guard loop
    !
    interface GigabitEthernet1/0/1
     description NetApp-A-e0a
     switchport trunk native vlan 1
     switchport trunk allowed vlan 53
     switchport mode trunk
     flowcontrol receive on
     cdp enable
     channel-group 1 mode active
     spanning-tree guard loop
     spanning-tree portfast trunk feature
    !
    interface GigabitEthernet2/0/1
     description NetApp-A-e0b
     switchport trunk native vlan 1
     switchport trunk allowed vlan 53
     switchport mode trunk
     flowcontrol receive on
     cdp enable
     channel-group 1 mode active
     spanning-tree guard loop
     spanning-tree portfast trunk feature
    !
    interface GigabitEthernet1/0/2
     description NetApp-B-e0a
     switchport trunk native vlan 1
     switchport trunk allowed vlan 53
     switchport mode trunk
     flowcontrol receive on
     cdp enable
     channel-group 2 mode active
     spanning-tree guard loop
     spanning-tree portfast trunk feature
    !
    interface GigabitEthernet2/0/2
     description NetApp-B-e0b
     switchport trunk native vlan 1
     switchport trunk allowed vlan 53
     switchport mode trunk
     flowcontrol receive on
     cdp enable
     channel-group 2 mode active
     spanning-tree guard loop
     spanning-tree portfast trunk feature

     

    Cisco Catalyst 6509 in stack with 1Gb/s ports

    Note that “mode active” corresponds to “multimode_lacp” in ONTAP, so each physical interface must carry the configuration “channel-group X mode active”; it is not set on the Port-channel. Note that “flowcontrol receive on” depends on the port speed: if the storage sends flow control, the other side must receive it. It is recommended to use RSTP (in our case with VLANs, Rapid-PVST+) and to configure the switch ports connected to storage and servers with spanning-tree portfast.

    system mtu 9198
    !
    spanning-tree mode  rapid-pvst
    !
    interface Port-channel11
     description NetApp-A-e0a-e0b
     switchport trunk native vlan 1
     switchport trunk allowed vlan 53
     switchport mode trunk
     flowcontrol receive on
     spanning-tree guard loop
     spanning-tree portfast trunk feature
    !
    interface Port-channel12
     description NetApp-B-e0a-e0b
     switchport trunk native vlan 1
     switchport trunk allowed vlan 53
     switchport mode trunk
     flowcontrol receive on
     spanning-tree guard loop
     spanning-tree portfast trunk feature
    !
    interface GigabitEthernet1/0/1
     description NetApp-A-e0a
     switchport trunk encapsulation dot1q
     switchport trunk native vlan 1
     switchport trunk allowed vlan 53
     switchport mode trunk
     flowcontrol receive on
     cdp enable
     channel-group 11 mode active
     spanning-tree guard loop
     spanning-tree portfast trunk feature
    !
    interface GigabitEthernet2/0/1
     description NetApp-A-e0b
     switchport trunk encapsulation dot1q
     switchport trunk native vlan 1
     switchport trunk allowed vlan 53
     switchport mode trunk
     flowcontrol receive on
     cdp enable
     channel-group 11 mode active
     spanning-tree guard loop
     spanning-tree portfast trunk feature
    !
    interface GigabitEthernet1/0/2
     description NetApp-B-e0a
     switchport trunk encapsulation dot1q
     switchport trunk native vlan 1
     switchport trunk allowed vlan 53
     switchport mode trunk
     flowcontrol receive on
     cdp enable
     channel-group 12 mode active
     spanning-tree guard loop
     spanning-tree portfast trunk feature
    !
    interface GigabitEthernet2/0/2
     description NetApp-B-e0b
     switchport trunk encapsulation dot1q
     switchport trunk native vlan 1
     switchport trunk allowed vlan 53
     switchport mode trunk
     flowcontrol receive on
     cdp enable
     channel-group 12 mode active
     spanning-tree guard loop
     spanning-tree portfast trunk feature

     

    Cisco Small Business SG500 in stack with 10Gb/s ports

    Note that “mode active” corresponds to “multimode_lacp” in ONTAP, so each physical interface must carry the configuration “channel-group X mode active”; it is not set on the Port-channel. Note that “flowcontrol off” depends on the port speed: if the storage is not using flow control (flowcontrol none), flow control must also be disabled on the other side. It is recommended to use RSTP and to configure the switch ports connected to storage and servers with spanning-tree portfast.

    interface Port-channel1
     description N1A-10G-e1a-e1b
     spanning-tree ddportfast
     switchport trunk allowed vlan add 53
     macro description host
     !next command is internal.
     macro auto smartport dynamic_type host
     flowcontrol off
    !
    interface Port-channel2
     description N1B-10G-e1a-e1b
     spanning-tree ddportfast
     switchport trunk allowed vlan add 53
     macro description host
     !next command is internal.
     macro auto smartport dynamic_type host
     flowcontrol off
    !
    port jumbo-frame
    !
    interface tengigabitethernet1/1/1
     description NetApp-A-e1a
     channel-group 1 mode active
     flowcontrol off
    !
    interface tengigabitethernet2/1/1
     description NetApp-A-e1b
     channel-group 1 mode active
     flowcontrol off
    !
    interface tengigabitethernet1/1/2
     description NetApp-B-e1a
     channel-group 2 mode active
     flowcontrol off
    !
    interface tengigabitethernet2/1/2
     description NetApp-B-e1b
     channel-group 2 mode active
     flowcontrol off

     

    HP 6120XG switch in blade chassis HP c7000 and 10Gb/s ports

    Note that a trunk configured with LACP (for example “trunk 11-12 Trk10 LACP”) corresponds to “multimode_lacp” in ONTAP. No “flowcontrol” setting is present here, which means it is set to “auto” by default: if a node connected to the switch has flow control disabled, the switch will not use it either. Flow control depends on the port speed, so if the storage is not using flow control (flowcontrol none), flow control must also be disabled on the other side. It is recommended to use RSTP and to configure the switch ports connected to storage and servers with spanning-tree portfast.

    # HP 6120XG from HP c7000 10Gb/s
     
    trunk 11-12 Trk10 LACP
    trunk 18-19 Trk20 LACP
     
    vlan 201
       name "N1AB-10G-e1a-e1b-201"
       ip address 192.168.201.222 255.255.255.0
   tagged Trk10,Trk20
       jumbo
       exit
    vlan 202
       name "N1AB-10G-e1a-e1b-202"
   tagged Trk10,Trk20
       no ip address
       jumbo
       exit
     
    spanning-tree force-version rstp-operation

     

    Switch troubleshooting

    Let’s take a look at the switch output:

                                   Rx                           Tx
    Port      Mode    | ------------------------- | -------------------------
                      | Kbits/sec  Pkts/sec  Util | Kbits/sec Pkts/sec  Util
    ------- --------- + ---------- --------- ---- + ---------- ---------- ---
    Storage
    1/11-Trk21 1000FDx| 5000      0         00.50 | 23088     7591      02.30
    1/12-Trk20 1000FDx| 814232    12453     81.42 | 19576     3979      01.95
    2/11-Trk21 1000FDx| 810920    12276     81.09 | 20528     3938      02.05
    2/12-Trk20 1000FDx| 811232    12280     81.12 | 23024     7596      02.30
    Server
    1/17-Trk11 1000FDx| 23000     7594      02.30 | 810848    12275     81.08
    1/18-Trk10 1000FDx| 23072     7592      02.30 | 410320    6242      41.03
    2/17-Trk11 1000FDx| 19504     3982      01.95 | 408952    6235      40.89
    2/18-Trk10 1000FDx| 20544     3940      02.05 | 811184    12281     81.11

    We can clearly see that one of the links is underutilized. Why does this happen? Because the algorithm that hashes each source/destination pair sometimes generates the same value for two different pairs of addresses, so both flows land on the same physical link.
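    The collision is easy to reproduce. Here is a minimal Python sketch of the conventional balancing policy (the two-link setup and the address octets are illustrative assumptions):

```python
# Conventional L3 LACP balancing: (src XOR dst) % number_of_links.
# With only 2 links, just the low bits of the addresses matter, so
# different client/storage pairs can collide on the same physical port.

def pick_link(src_octet: int, dst_octet: int, links: int = 2) -> int:
    """Return the index of the physical link chosen for this address pair."""
    return (src_octet ^ dst_octet) % links

# Storage node at .21, clients at .30, .31 and .32 (hypothetical addresses):
print(pick_link(21, 30))  # -> 1
print(pick_link(21, 32))  # -> 1: both clients land on the same link
print(pick_link(21, 31))  # -> 0
```

    Here the pairs (.21, .30) and (.21, .32) hash to the same link, which produces exactly the kind of imbalance visible in the switch output above.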

    SuperFastHash in ONTAP

    Instead of the ordinary algorithm widely used by hosts and switches ((source_address XOR destination_address) % number_of_links), ONTAP starting with 7.3.2 uses an algorithm called SuperFastHash, which gives a more dynamic and more balanced load distribution for a large number of clients, while each TCP session remains associated with only one physical port.

    The ONTAP-LACP algorithm is available on GitHub under the BSD license. Though I did my best to make it precise and fully functional, I give no guarantees, so use it AS IS.
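    For illustration, below is a Python port of Paul Hsieh's published SuperFastHash C function, with arithmetic masked to 32 bits; the pick_port helper is purely hypothetical, since exactly which header bytes ONTAP feeds into the hash is not shown here:

```python
M32 = 0xFFFFFFFF  # mask emulating C's 32-bit unsigned arithmetic

def superfasthash(data: bytes) -> int:
    """Python port of Paul Hsieh's SuperFastHash (32-bit)."""
    length = len(data)
    if length == 0:
        return 0
    h = length
    rem = length & 3
    i = 0

    def get16(j: int) -> int:  # little-endian 16-bit read, as in the C code
        return data[j] | (data[j + 1] << 8)

    for _ in range(length >> 2):  # main loop: consume 4 bytes per round
        h = (h + get16(i)) & M32
        tmp = ((get16(i + 2) << 11) ^ h) & M32
        h = ((h << 16) ^ tmp) & M32
        i += 4
        h = (h + (h >> 11)) & M32

    signed = lambda b: b - 256 if b > 127 else b  # signed-char cast
    if rem == 3:  # handle the 1-3 trailing bytes
        h = (h + get16(i)) & M32
        h ^= (h << 16) & M32
        h ^= (signed(data[i + 2]) << 18) & M32
        h = (h + (h >> 11)) & M32
    elif rem == 2:
        h = (h + get16(i)) & M32
        h ^= (h << 11) & M32
        h = (h + (h >> 17)) & M32
    elif rem == 1:
        h = (h + signed(data[i])) & M32
        h ^= (h << 10) & M32
        h = (h + (h >> 1)) & M32

    # Final avalanche: force every input bit to affect the low output bits
    h ^= (h << 3) & M32
    h = (h + (h >> 5)) & M32
    h ^= (h << 4) & M32
    h = (h + (h >> 17)) & M32
    h ^= (h << 25) & M32
    h = (h + (h >> 6)) & M32
    return h

def pick_port(src_ip: str, dst_ip: str, ports: int = 2) -> int:
    # Hypothetical helper: the real ONTAP input format is an assumption here.
    return superfasthash((src_ip + "->" + dst_ip).encode()) % ports
```

    Because the final avalanche mixes all input bits into the low bits, `hash % ports` spreads many client/storage pairs far more evenly than a plain XOR of the last octets.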

    You can run it in an online compiler. It will show which physical port is picked for each pair of source and destination addresses; look for the storage IPs with the biggest numbers in the “SUM Totl Used” column.

    Let’s create a table for network Design #4A using the output from our simple code. Here is an output example.

    With the following variables:

        st_ports = 2;

        srv_ports = 2;

        subnet = 53;

        src_start = 21;

        src_end = 22;

        dst_start = 30;

        dst_end = 50;

    Output:

           ¦NTAP       %  ¦NTAP       %  ¦Srv        %  ¦ SUM¦
           ¦OUT      |Path¦IN       |Path¦IN&O     |Path¦Totl¦
       IP  ¦  21|  22|Used¦  21|  22|Used¦  21|  22|Used¦Used¦
     53.30 ¦   1|   0|  75|   1|   0|  75|   1|   0| 100|  83|
     53.31 ¦   1|   1|  37|   0|   1|  62|   0|   1| 100|  66|
     53.32 ¦   0|   1|  75|   1|   0|  75|   1|   0| 100|  83|
     53.33 ¦   0|   1|  75|   0|   1|  75|   0|   1| 100|  83|
     53.34 ¦   0|   1|  75|   1|   0|  75|   1|   0| 100|  83|
     53.35 ¦   0|   0|  37|   0|   1|  62|   0|   1| 100|  66|
     53.36 ¦   1|   0|  75|   1|   0|  75|   1|   0| 100|  83|
     53.37 ¦   1|   0|  75|   0|   1|  75|   0|   1| 100|  83|
     53.38 ¦   0|   0|  37|   1|   0|  62|   1|   0| 100|  66|
     53.39 ¦   0|   1|  75|   0|   1|  75|   0|   1| 100|  83|
     53.40 ¦   1|   0|  75|   1|   0|  75|   1|   0| 100|  83|
     53.41 ¦   1|   0|  75|   0|   1|  75|   0|   1| 100|  83|
     53.42 ¦   1|   0|  75|   1|   0|  75|   1|   0| 100|  83|
     53.43 ¦   0|   1|  75|   0|   1|  75|   0|   1| 100|  83|
     53.44 ¦   0|   0|  37|   1|   0|  62|   1|   0| 100|  66|
     53.45 ¦   0|   1|  75|   0|   1|  75|   0|   1| 100|  83|
     53.46 ¦   1|   1|  37|   1|   0|  62|   1|   0| 100|  66|
     53.47 ¦   0|   0|  37|   0|   1|  62|   0|   1| 100|  66|
     53.48 ¦   1|   0|  75|   1|   0|  75|   1|   0| 100|  83|
     53.49 ¦   1|   0|  75|   0|   1|  75|   0|   1| 100|  83|
     53.50 ¦   1|   0|  75|   1|   0|  75|   1|   0| 100|  83|

     

    So, for Design #4A you can use IP address XXX.XXX.53.30 for your first storage node and XXX.XXX.53.32 for your second storage node.

     

    Disadvantages in conventional NAS protocols with Ethernet LACP

    No technology works magically; each has its own advantages and disadvantages, and it is important to know and understand them.

    You cannot aggregate two network file shares into one logical space as you can with LUNs

    If a storage vendor offers some kind of aggregation of a few volumes for NAS on a storage system, data distribution is often done at file-level granularity:

    • File-based load distribution depends on file sizes and can be unequal
    • Such load distribution is not suitable for metadata-heavy or rewrite-heavy workloads
    • With Ethernet LACP, the full path between peers is neither established nor controlled by the initiators
    • Each next hop is chosen individually: the forward and return paths can be different
    • LACP does not allow you to aggregate ports from multiple storage nodes
    • No SAN ALUA-like multipathing:
    • LACP can aggregate only ports within a single server or a single storage node
    • Multi-Chassis EtherChannel requires special switches, though it is available in nearly any switch family
    • Only a few switches can be in an LACP stack, and entry-level stacked switches can be unstable, which limits scalability

     

    Because of these disadvantages, conventional NAS protocols with LACP usually cannot achieve full network link utilization and must be tuned manually to do so. Though LACP is not ideal:

    • it has been available for years in nearly any Ethernet switch
    • it is the best solution we currently have with conventional NAS protocols
    • it is definitely better than conventional NAS without it

     

    Advantages of NAS protocols over Ethernet

    LACP has its disadvantages and passes them on to conventional NAS protocols, which lack built-in multipathing and load balancing. Nevertheless, NAS protocols remain attractive with ONTAP because:

    NAS:

    • NAS gives data visibility in Snapshots
    • More space efficient than SAN in many ways
    • File-granular access in snapshots
    • Individual file copies need no FlexClone or SnapRestore licenses
    • Individual file restore or clone (FlexClone or SnapRestore licenses needed)
    • Backup data mining for cataloging
    • Data is accessed directly on the storage, no host mounting needed

    Ethernet & LACP:

    • Ethernet switches are cheaper than InfiniBand & FC
    • LACP & Multi-Chassis EtherChannel are available with nearly any switch
    • 1, 10, 25, 40, 50, 100 Gb/s available as a single pipe
    • Multi-purpose, multi-protocol, multi-tenant with VLANs
    • Cheaper multi-site: VPN, VXLAN
    • Routing on top of Ethernet is available for FCoE, iSCSI, NFS, CIFS

    Looking to the future

    NAS protocols have their disadvantages because they lack built-in multipathing and load balancing and rely on LACP instead. But they evolve, copying abilities from other protocols bit by bit.

    For example, the SMB v3 protocol with the Continuous Availability feature can survive online IP movement between ports and nodes without disruption; this is available in ONTAP and thus can be used with MS SQL & Hyper-V. SMB v3 also supports Multichannel, which provides built-in link aggregation and load balancing without relying on LACP; this is currently not supported in ONTAP.

    NFS was never a session-bound protocol, so when an IP moves to another storage node the application survives. NFS continues to evolve: version 4.1 gained a feature called pNFS, which can automatically and transparently switch between nodes and ports when data has been moved, following the data similarly to SAN ALUA; this is also available in ONTAP. NFS v4.1 also includes session trunking, which, similarly to the SMB v3 Multichannel feature, will allow links to be aggregated without relying on LACP; this is currently not supported in ONTAP. NetApp is driving the NFS v4 protocol with the IETF, SNIA, and the open-source community to get it adopted as soon as possible.

    Conclusion

    Though NAS protocols have disadvantages, mainly because of the underlying Ethernet and, more precisely, LACP, it is possible to tune LACP to utilize your network and storage efficiently. In big environments there is usually no need for tuning, but in small environments load balancing might become a bottleneck, especially if you are using 1 Gb/s ports. Though it is rare to fully utilize the network performance of 10 Gb/s ports in a small environment, tuning is better done at the very beginning than later on a production system. NAS protocols are file-granular, and since the storage system runs the underlying file system, it can work with files and provide more capabilities for thin provisioning, cloning, self-service operations, and backup, in many ways more agile than SAN. NAS protocols keep evolving and absorbing abilities from other protocols, in particular SAN protocols like FC & iSCSI, to diminish their disadvantages, and they already provide additional capabilities to environments that can use the new versions of SMB and NFS.

     

    Troubleshooting

    90% of all problems are network configuration on the switch side; the other 10% are on the host side. Human error. The problem usually lies with MTU configuration, LACP, or flow control.
