Monday, August 27, 2012

NetApp disk reallocation - not all that scary

I recently had the experience of restructuring aggregates on a production NetApp FAS2050 cluster due to some incorrect initial tray structuring. Any time I work with production data there is always an uneasy feeling, but after some research I felt pretty confident.  Here is an overview of why I had to do this and what steps are involved.

The original design had (essentially) split a few trays down the middle and assigned half the disks to each head to try to balance a semi-random VMware load. We quickly realized that the bottleneck would not be head resource (CPU/memory/HBA) contention but rather contention for disk spindles. Six active spindles (really only four data spindles once RAID-DP takes two disks for parity) doesn't allow for much load, especially in a SATA tray.

To remedy this specific case, we decided it would make more sense to assign a tray per head instead of half a tray per head.  Controller A would get the SATA tray and controller B would get the 15K Fibre Channel tray, and any future trays would be matched up the same way so that we could build larger aggregates.  The goal was to take the 6-disk SATA aggregate plus the hot spare from controller B and reassign those disks to controller A, bring them into controller A's SATA aggregate, and be left with a 12-disk aggregate and 2 hot spares.  All without losing any data, of course.  Then perform the same steps to assign the FC disk tray to controller B.

So, there is no magic way that I know of to combine the aggregates without first evacuating the contents of one.  Luckily in this case, we had enough extra storage that we were able to perform Storage vMotions and easily get the aggregate empty.  If you do not have the extra space just lying around or you do not have Storage vMotion, then you may not be able to proceed.  Depending on the capacity in question and the I/O load, there are some pretty cheap ways to get a decently solid device like a ReadyNAS that could be used temporarily as an NFS datastore.  Maybe be resourceful and get a 30-day trial ReadyNAS and use a trial license for VMware so you get the Enterprise Plus feature set, which includes Storage vMotion... or set up Veeam Backup and Replication in trial mode so you can replicate, then fail over and fail back when you are done.  Just thinking out loud here :)

Anyway, once the LUN/export/volume/aggregate is completely evacuated, you are ready to start destroying things!  Actually, I recommend doing this in phases if you have time.  First and foremost, ensure that your backups are rock solid.  Next, if you have a LUN, take it offline.  If you have an export or a directly accessed volume, take it offline.  This helps you make sure that a) you have the right object and aren't going to ruin your life because something was labeled wrong and b) nothing breaks unexpectedly.  It is very easy to bring it back online and fix the problem.  Not so easy to "undestroy" an aggregate, although it looks like it can be done.
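
For reference, here is a rough sketch of those offline commands from the console. The volume and LUN names below (vm_vol and lun0) are made up; substitute your own:

lun offline /vol/vm_vol/lun0
vol offline vm_vol

For an NFS export, you could unexport it instead with exportfs -u /vol/vm_vol.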

Before you proceed, I recommend taking note of which disks you actually want to migrate so that when you start reassigning disk ownership you get the correct ones.  Do this by typing:

disk show

and looking for any disks owned by the original controller and in the aggregate.  Also make note of any current spares that you want to reassign.  Ensure that you get the whole disk identifier, such as 1d.29, since there may be multiple disk 29s.
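
If you want a view scoped to just the aggregate you are about to destroy, the RAID layout display should list every disk in it. A quick sketch, assuming the doomed aggregate is named aggr_old (substitute your own name):

aggr status aggr_old -r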

Once you are confident the aggregate's contents are offline, no data is missing, and nothing is broken, you can proceed with aggregate destruction.  If you have NetApp System Manager, right-click the offline aggregate and click Delete.  Otherwise, from the console of the controller that owns the target aggregate, type:

aggr status
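
If the aggregate still shows as online, it has to be taken offline before it can be destroyed, and Data ONTAP will generally refuse to do that while it still contains volumes, so the evacuated (and already offline) volume has to be destroyed first. A rough sketch, reusing the made-up names from above:

vol destroy vm_vol
aggr offline aggr_old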

Confirm you see it offline, then take a deep breath and type:

aggr destroy <your_aggr_name>

You will be prompted and forced to confirm that you are about to destroy all the data on the aggregate and that you want to proceed.  If you are comfortable with this and confident that you are deleting the correct, empty aggregate, proceed by either clicking the appropriate buttons or typing yes.  If you are not, then stop, call someone for a second opinion, and then repeat.  If you delete the wrong one somehow, get NetApp on the phone immediately and hope that your backups restore if needed.  I take NO responsibility for your choices, sorry :)

So at this point, your aggregate should be gone and the disks that were assigned to the aggregate should now be marked as spare drives.  Confirm this from the console by typing:

disk show

You should see all the drives noted earlier marked as spare and still owned by the original controller.  At this point I recommend waiting again if you have time.  Once you proceed from this point, the chances of recovering data from your destroyed aggregate are pretty much gone.  Reassigning disk ownership shouldn't wipe anything by itself, but once you add the disks to the other aggregate they will be zeroed and undestroy will no longer work.  Paranoid, yes.  Employed, yes :)
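
Another quick sanity check at this stage: listing the spares directly from the console should show the freed disks sitting in the spare pool:

aggr status -s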

To reassign the disks to the other controller, log in to the console of the controller that still owns the "new" spare disks and do the following:

Reference:  https://communities.netapp.com/docs/DOC-5030

Turn off disk auto assign: 

options disk.auto_assign off

Remove ownership from the disks you want to move by issuing the following command:

disk assign -s unowned disk.id1 [disk.id2 disk.id3 ...]
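
For example, with some made-up disk IDs, that would look something like this (some Data ONTAP versions also want a -f on the end to force the change on a disk that is already owned):

disk assign -s unowned 1d.29 1d.32 1d.41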

Now, go to the other controller (the one you want to claim ownership of the disks with) and make sure it sees the disks as not owned:

disk show -n

The disks should be listed as "Not Owned".  You can now assign the disks to the destination by typing one of the following commands (from the head you want to grant ownership to):

If you want to assign all unowned/unassigned disks to this controller:

disk assign all

If you only want to assign the ones we are working with:

disk assign disk.id1 [disk.id2 disk.id3 ...]
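
Continuing the made-up example, that would be:

disk assign 1d.29 1d.32 1d.41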

If you have made it this far and haven't run into anything unexpected, great.  However, here is the first step where data will actually become unrecoverable.  If I'm not mistaken, all the previous steps left the bits intact.  This step will actually zero the newly added disks.  Ready?  Let's go!

Side track for a moment: in this case, we were going to a 12-disk aggregate with 2 hot spares.  If you are going larger than that, there are specific guidelines set forth by NetApp about RAID group sizes and structure.  Please reference the following thread before allocating your disks if you are dealing with more than 12 disks, or think you will eventually be adding a tray with fewer than 12 that you would want to add to the same aggregate:  https://communities.netapp.com/thread/1587  There are a lot of considerations to think through, such as firmware version, SATA vs. SAS vs. FC, future plans, etc.  I wish I could go into detail, but the previously mentioned thread covers most of it.  Specifically note the mini articles posted by the NetApp employees pretty far down in the thread.  Also, here is a great write-up from the NetApp Community forum that deals with aggregate structure and the next steps: https://communities.netapp.com/people/dgshuenetapp/blog/2011/12/19/my-first-experience-adding-disks-and-reallocation

Anyway, back to building our little 12 disk aggregate:

aggr add aggr_name -g raid_group -d disk.id1 [disk.id2 disk.id3 ...]

Note that raid_group here is the RAID group that you will add these disks to.  For small aggregates (fewer than 14 disks) there is typically a single RAID group called rg0.  To find out which RAID group you have, type:

aggr status -v

This should display something that shows "/aggr_name/plex#/rg_name".  Make note of the RAID group name.
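
Putting it together with the made-up names from earlier (your aggregate name, RAID group, and disk IDs will differ), the command would look something like:

aggr add sata_aggr -g rg0 -d 1d.29 1d.32 1d.41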

Now...... you wait.  If you look at the status of the aggregate it will say growing.  If you look at the status of the disks they will say zeroing.  If you look at the CPU on the filer it will be higher than normal.  If you look at your watch, it will say wait :)  Do a couple of sanity checks to make sure things look good still and then go get some coffee.  Looking back at our case, the disks took about 4 hours to initialize according to the syslog.  Once they are done, they will show up in the aggregate as available space.
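
If you want to keep an eye on progress from the console, these read-only commands should show the RAID layout and the state of the new disks while they zero (again, sata_aggr is a made-up name):

sysconfig -r
aggr status sata_aggr -r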

Now the fun part: actually reclaiming the space and increasing your volume size.  Since our case was backend storage for a VMware environment (VMFS, not NFS), we needed to increase the volume size, increase the LUN size, and then increase the VMFS volume.  Since vSphere 5.0 and VMFS 5 now support datastores up to 64TB on a single LUN, we could have created one large volume and one large LUN.  We opted to keep things in volumes of less than 2TB, though, due to some point-of-no-return limitations with deduplication outlined here:  https://communities.netapp.com/thread/4360.  (Update:  I actually tried to create a >2TB LUN and it wouldn't let me.  I guess our FAS2050 doesn't know about VMFS5.)

To increase the sizes, I've found that NetApp System Manager makes life much simpler.  However, for command line reference, to increase the size of the volume:

vol size vol_name +###g
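
As a concrete (made-up) example, growing a volume named vm_datastore by 500GB would be:

vol size vm_datastore +500g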

As I said, this was a lot easier through System Manager so I used that.

For the LUN commands, reference http://www.wafl.co.uk/lun/. To increase the size of the LUN via the command line:

lun resize /vol/name/lun_name absolute_size
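
For a concrete (made-up) example, growing a LUN named lun0 in that volume to an absolute size of 1TB:

lun resize /vol/vm_datastore/lun0 1t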

Again, System Manager made this much easier.

Go back into the command line and re-enable the disk auto_assign:

options disk.auto_assign on

Before you put additional load on the larger aggregate, I recommend running a reallocate so that the blocks get optimized across the new disks.  See the previously mentioned article:  https://communities.netapp.com/people/dgshuenetapp/blog/2011/12/19/my-first-experience-adding-disks-and-reallocation.  If you do not perform this, your disks may gradually start to balance out, but you are not going to see the full benefit of the new spindles without it.  A couple of quick notes:  it does require free space in the volume to run (10-20% I believe), it does take a while (ours took approximately 26 hours), and it does cause high CPU and disk activity.  The performance increase was pretty significant though, so I highly recommend learning more about reallocate and how/when to use it.  I will try to write a follow-up article that talks a little more about this process and what to expect while it runs.
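
For a command line sketch, reusing the made-up volume name from above, a one-time full reallocation and a way to check on it would look something like this:

reallocate start -f /vol/vm_datastore
reallocate status -v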

So you now have a larger aggregate, larger volume, and larger LUN.  Now, in VMware, grow the datastore.  Go to the datastore that is backed by this aggregate/volume/LUN and open its properties.  You should be able to increase the capacity from there.  Here is a VMware KB with the steps involved:  http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1017662.  This should let you grow the VMFS volume without having to create any extents or splice things together.
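
One hedged note: if the host doesn't see the new LUN size right away, a storage rescan usually takes care of it. From the vSphere client this is the rescan option on the host's storage adapters; on ESXi 5.x the command line equivalent should be something like:

esxcli storage core adapter rescan --all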

That should do it!  You should now have more spindles and more capacity.  Win win!  Let me know if you find any problems with the outlined info above or if you had any success with it.
