Storage Selection is Critical

Posted by Lou Person on May 01, 2011 in Cloud Journey

Selecting a storage partner was a tough decision to make. We knew that once we chose a partner, we were making a long-term commitment and locking ourselves into a platform. The majority of cloud computing costs are in the storage. We also don’t want to oversubscribe storage and incur costs for our customers and ourselves related to resources that are not yet needed. Some of the SANs we looked at came fully loaded with drives, so storage was over-provisioned. Others allowed us to add disks as we needed.

The greatest cost of deploying a cloud solution is related to storage. The goal is to achieve the lowest cost per terabyte possible, while ensuring all the goals and objectives of the storage are achieved. For our design, at the highest level, we were trying to accomplish two things. First, for our Premise-based Cloud Solutions (Hardware as a Service), we need to provide shared storage which will reside at the customer premise. We also want this storage to replicate to our datacenter for Business Continuity purposes. Second, we need a much larger SAN in our datacenter which will be used to receive replicated images from our customers' SANs, as well as provide Cloud Services for customers who are running services off of our Cloud. In both cases, we are hoping to find a solution that integrates within the same platform so we can manage everything from the same toolset.

We are looking at EqualLogic because it has all the tools built in, which makes the design and quote very easy to understand. We can do snapshots throughout the day, and we can replicate customers’ SANs at their premise to our datacenter (although the receiving SAN limits the number of downstream units that can be in the storage group). Dell was very helpful analyzing the performance data.

Dell parts are very easy to understand: they are either 4000 or 6000 series, and all the units come with dual controllers and all the software. A unit ending in E has 7,200 RPM drives, X has 10,000 RPM drives, and V has 15,000 RPM drives. S is solid state. A unit with a 10 in it, e.g. 6010X, means it uses 10 gigabit per second Ethernet (10GbE) connectivity to the LAN.

With Dell, as with any SAN decision, it comes down to Input/Output (I/O) and usable storage. The number, size and speed of drives determine the amount of I/O per SAN node. We weren’t overly concerned about the number of drives (typically Dell comes fully loaded with 16 drives), nor the size, since these variables are reflected in the available I/O per SAN. The smallest raw unit Dell has is 8TB, which seemed like overkill at first, but this does not take into account RAID sets. RAID 10 will yield better I/O, but will cut down on the usable storage. EqualLogic does allow RAID sets to be changed on the fly, so if we find out that we over-provisioned for I/O and need extra capacity, we can change the RAID level without taking the SAN down.

In the demo with Dell, they actually pulled a controller out and the SAN kept running. The controller was pretty tight: it had a CPU and memory, all on board. I was used to a controller being an HP P4000, which is a full server. The 4000 comes with two 1GbE ports, the 6000 with four 1GbE ports. They can be teamed together for greater throughput, but Dell recommends not doing this and letting the SAN control the multipathing.
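The model naming described above lends itself to a small lookup. The Python sketch below decodes a model string such as 6010X using only the rules stated in this post. The function name decode_model and the matching pattern are my own illustrative assumptions, not a Dell tool, and real model numbers may include variants this does not cover.

```python
import re

# Drive type by suffix letter, as described in this post.
DRIVE_BY_SUFFIX = {
    "E": "7,200 RPM",
    "X": "10,000 RPM",
    "V": "15,000 RPM",
    "S": "solid state",
}

# Assumed shape: series digit (4 or 6), "0", a 0/1 flag for 10GbE, "0", suffix.
# This pattern is a simplification based only on the examples in the post.
MODEL_PATTERN = re.compile(r"^([46])0([01])0([EXVS])$")


def decode_model(model: str) -> dict:
    """Decode a model string like '6010X' into series, network and drive type."""
    match = MODEL_PATTERN.match(model.strip().upper())
    if match is None:
        raise ValueError(f"Model {model!r} does not fit the naming rules in this sketch")
    series_digit, ten_flag, suffix = match.groups()
    return {
        "series": int(series_digit) * 1000,
        "network": "10GbE" if ten_flag == "1" else "1GbE",
        "drives": DRIVE_BY_SUFFIX[suffix],
    }


if __name__ == "__main__":
    print(decode_model("6010X"))
    print(decode_model("4000E"))
```

Anything that does not fit the pattern raises a ValueError rather than guessing, which seemed the safer choice for a quoting aid.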

Unfortunately, the controller cannot be upgraded to move from 1Gb/s to 10Gb/s network ports. The Dell controller seems engineered to be just a SAN controller, which fits our appliance-based mindset. However, everything is in one physical unit, so if there were a catastrophic failure, the entire SAN would be down. I don’t fully understand how the warranty works, or whether we have to replace a drive ourselves if it fails. EqualLogic is very easy to manage and maintain, which is a big plus. EqualLogic also requires that the SAN be purchased with all drive bays loaded. This could be cost prohibitive because we would need to provision storage that we may not need initially, and we would need to sacrifice drive speed for a lower cost.

We also have experience with LeftHand, now the HP P4300. LeftHand has a nice feature called “Network RAID”, such that if we wanted to write to two controllers at the same time across the network, we could. If the controllers were connected by a 1Gb/s WAN link, we could replicate across sites synchronously; otherwise, it would have to be asynchronous with snapshots scheduled. HP sells controllers and “Starter SANs”, which are really two controllers bundled together. As with Dell, HP includes all the software for snapshots, replication and management. The Network RAID feature of LeftHand is a big differentiator, as the SAN is spread across physical units. This way, two controllers can be spread out across floors, mitigating a catastrophic failure.

HP has a different LeftHand model for most drive sizes and speeds. There is a 42TB SAS SAN with 600GB 15K drives (P4800), a 7.2TB SAS SAN with 450GB 15K drives (P4300), and many other options along the way. We were looking closely at the P4300. If we implemented Network RAID, we would go from 7.2TB raw to 3.6TB raw before we configure RAID. We also had the option of doing Network RAID selectively, not on all volumes, and using the remaining space for greater I/O across the SAN. If we did not do Network RAID, LeftHand would provide greater I/O simply because there are two 3.6TB (raw) units working separately, serving different data at the same time. Network RAID would yield a higher per-TB cost on the LeftHand SAN, but there would be greater peace of mind.

LeftHand offers two or four GbE ports per controller, most upgradeable to 10GbE. This is another advantage for LeftHand. These ports can be aggregated together, both on the SAN and network switch, to provide greater throughput. LeftHand SANs are really servers running VSA (Virtual SAN Appliance) software. In my opinion, this is where the world is heading, and VMware recently announced a VSA option going into beta. Before HP acquired LeftHand, LeftHand would run VSA on Dell PowerEdge servers. To me it would appear that, now, the HP P4000 line is VSA running on HP servers, bundled together as SANs. VSA runs virtually on the server.
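To make the capacity trade-off concrete, here is a rough Python sketch of the arithmetic. It assumes Network RAID keeps two copies of every volume (halving raw capacity, as in the 7.2 to 3.6 example above) and uses textbook RAID efficiencies. The figure of eight drives per unit is my inference from 7.2TB of 450GB drives split over two units, and hot spares and formatting overhead are ignored, so treat the results as estimates only.

```python
def raid_efficiency(raid_level: str, drives_per_node: int) -> float:
    """Fraction of raw capacity left after RAID overhead (textbook values)."""
    if raid_level == "RAID10":
        return 0.5  # every block is mirrored
    if raid_level == "RAID5":
        return (drives_per_node - 1) / drives_per_node  # one drive of parity
    if raid_level == "RAID6":
        return (drives_per_node - 2) / drives_per_node  # two drives of parity
    raise ValueError(f"Unsupported RAID level: {raid_level}")


def usable_capacity_tb(raw_tb: float, drives_per_node: int,
                       raid_level: str, network_raid: bool) -> float:
    """Estimate usable TB across the SAN.

    network_raid=True assumes two copies of all data across nodes,
    which halves capacity before per-node RAID is applied.
    """
    after_network_raid = raw_tb / 2 if network_raid else raw_tb
    return after_network_raid * raid_efficiency(raid_level, drives_per_node)


if __name__ == "__main__":
    # Assumed layout: 7.2TB raw, two nodes with eight 450GB drives each.
    for level in ("RAID5", "RAID10"):
        for mirrored in (True, False):
            tb = usable_capacity_tb(7.2, 8, level, mirrored)
            print(f"{level}, Network RAID={mirrored}: {tb:.2f} TB usable")
```

Under these assumptions, RAID 5 with Network RAID leaves roughly 3.15TB usable out of 7.2TB raw, and RAID 10 with Network RAID leaves about 1.8TB, which is why the per-TB cost climbs so quickly when both layers of protection are turned on.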

We reviewed EMC’s new VNX line. The VNX line provides many of the benefits of Dell and HP, but differentiates in that we can add drives as we go. This means we don’t have to oversubscribe the storage and can purchase what we need currently, minimizing our current costs. Then, as we need more storage, we can add more drives. With EMC, snapshot and replication software is a la carte; it is not bundled as it is with Dell and HP. However, these features are simple to add as a part, and the software bundles available a la carte are comprehensive. The VNXe line includes EMC's de-duplication and compression technologies, which maximize the usable storage. We found the management interface of the VNX line one of the easiest, if not the easiest, to use, making the VNX very easy to manage.

We also have experience working with two host servers, virtual machines running on those hosts, and VSA. We used local storage of the host servers to create the SAN, so the system perceived a separate SAN, but it was really VSA running on the hosts. We loaded each host up with 15,000 RPM SAS drives and put VSA on each host. We then created an active-passive configuration, using Network RAID across the hosts. Effectively, we built our own LeftHand SAN, using the same server equipment that is used with the LeftHand controllers themselves. Since the VSA didn’t need all the capacity of the servers (primarily memory and CPU), we were able to use those servers in a dual-purpose configuration to act as hosts for the virtual machines and to run the VSA for the SAN. VSA is a great solution for taking storage from servers or slower NAS systems and adding it into the storage pool for backup or near-line storage. This leverages previous investments made in storage by integrating that storage into the pool.

In conclusion, it was not an easy decision to make. We carefully considered all of the factors, along with our long-term and short-term goals, before making a decision that is in the best interests of our customers and Brightstack.