Tuesday, December 1, 2015

Hyper-V Series - Introduction

I've had the pleasure (I can finally say that now) of migrating a decent-sized environment from VMware ESXi/vCenter to Hyper-V/SCVMM over the past few months.  It is still in progress, but the bulk of the "hard work" is done at this point and the light at the end of the tunnel is getting brighter.  The migration was much more than just switching hypervisors and management platforms.  It has consisted of redesigning our entire multi-tenant hosting platform while switching out almost every backend piece of supporting infrastructure, and doing it in a "swing migration" style to keep costs relatively low.

It has been quite a ride to say the least... and by ride I mean something along the lines of what I imagine riding a turbocharged unicycle off-road, down the side of an erupting volcano while trying not to crash or get burned would be like.  Not sure where exactly that came from as I have never been on a unicycle nor have I been chased by lava, but hopefully you get the idea.  It wasn't easy, it was full of obstacles, and time was not exactly on my side.  I ran into numerous roadblocks along the way but overall the journey has been totally worth it (so far).  I want to share my experience and also try to help others who may be encountering the same issues I had, or better yet, spare them the pain by helping them avoid my mistakes.  Someone new to Hyper-V may not have nearly the trouble I had switching my mindset from the VMware world to the Microsoft world.  But if you are like me and have lived in VMware for the past 7 years or so, "logical networks" and "logical switches" may not seem so logical at first.

My role and overall experience with virtualization and its supporting technologies put me in the unique position of being able to handle this project in its entirety, from initial design through post production.  It is still a work in progress and admittedly the initial design has changed a few times at this point, but ultimately the goal has remained constant:  design/build/implement a hosting platform that is easy to consume for tenants, easy to support/grow for our service team, easy to pitch as a vision for our account managers, and easy to sell for our sales team, and of course, decrease costs and complexity while increasing profitability, reliability, scalability, and supportability and expanding the feature set.  Simple, right?

I'm still putting together the bullet points for this and trying to keep it from being a long ramble or collection of random thoughts that no one really cares about.  Personally, I would have been lost without the help of bloggers, YouTubers and forum posters.  TechNet alone only gives part of the big picture.  Training and certification courses are another small piece.  Previous experience actually made some of it more confusing than it needed to be.  I want to give back to the community.  I hope to be able to contribute some bits of experience to help bring it all together for others.  If nothing else, it will help me reinforce the concepts for myself!

If there is anything you would like to specifically discuss, always feel free to leave a comment or find me on LinkedIn or Twitter (@johnyarbi) and send me a message.  I may not have the answer, but I try to reply to everything and may be able to point you in the right direction.  Looking forward to hearing from you!

Sunday, February 15, 2015

Acronis restore error code 0x590001

I can remember a time when every server I dealt with was physical.  Back when I physically had to get up from my desk, physically travel to a client's site, physically attach their backup storage to their crashed physical server, put a physical disc in the physical drive and use a physical keyboard and mouse to start the recovery.  Even with all that physical activity I was still overweight... but I digress...  Then, if the sun/moon/stars aligned, the driver gods were being nice, and my caffeine levels stayed steady, there was a good chance I might be able to have the client back up and functional by the time they came in to open the next day.  Ah the memories...

Thankfully, those days are long gone for me.  Now most of what I deal with is virtual.  Remote connectivity, VMware/Hyper-V, iLO/iDRAC access to the hardware console, ISO media, SAN storage, NAS backups... I don't even have to be at my physical office to do any of it.  I can't count how many times I have done work while out with friends via an RDP client on my iPhone.  Love this technology driven world when it all works!  There are still times, though, when those physical things that run the virtual world I live in need attention.

I've been working on spinning up a small Hyper-V cluster on a few older HP ProLiant DL360 G7 servers.  I took one of the servers, installed Windows Server 2012 R2, ran the gauntlet of updates to get it current, performed some tweaks, installed all the HP integrations, etc.  In the end, I had a server ready to be imaged/sysprep'ed and used as my base template for spinning up the other two servers.  Hardware was the same across all three so a simple image based backup/restore using Acronis seemed like the way to go.  Acronis is my trusty dusty for stuff like this.  P2V conversions, hardware replacement without reinstalling anything, anything to do with recovery, it is my go to tool because it just works... usually.

This case seemed pretty straightforward.  Took the image from the source server using the bootable ISO and stored it to a NAS.  Used the same ISO on the first clone to start the restore process, deleted the original volumes, selected the MBR and all the volumes (just the OS and system reserved), and started the restore.  *yawn*

At 95% I see an error... code 0x590001.  *grrr*  Actually a few errors:
  • Error code 5,832,705 (0x590001), module 89
  • Error code 1, module 89, message:  Direct R/W operation has failed
  • Error code 65521, module 0, message:  Input/output error
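
A side note on those numbers: 5,832,705 and 0x590001 are the same value in decimal and hex, and if you split it into two 16-bit halves you get 89 and 1, which happen to match the "module 89" and "Error code 1" in the list above.  The quick Python sketch below just does that arithmetic.  Reading the high half as a module number is only my inference from these three lines, not something I found in Acronis documentation, and `split_error_code` is a name I made up for illustration.

```python
def split_error_code(code: int) -> tuple:
    """Split a 32-bit value into (high 16 bits, low 16 bits)."""
    return (code >> 16) & 0xFFFF, code & 0xFFFF


code = 0x590001
high, low = split_error_code(code)

# Same number, shown in decimal and in hex.
print(f"{code:,} = {code:#x}")

# Assumption: the high word lining up with "module 89" is a pattern
# inferred from the error list, not a documented Acronis format.
print(f"high word = {high}, low word = {low}")
```

If the pattern holds, the top-level error is simply a packed form of the second line ("Error code 1, module 89"), so the more useful clue is the "Direct R/W operation has failed" message rather than the big number.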



A quick search through the world of webs revealed a lot of suggestions for running a chkdsk but no one ever came back and said "wow that fixed it!"  Those posts usually ended with "reinstalled and went from there :'(".  Pfft... this is all a fresh build, no errors in the event logs, no problems up to this point... ain't nobody got time for that!

My source image must have been bad.  Great, I guess I didn't validate the integrity of the source image.  Repeated the process making sure I enabled validation... clean backup, same error on restore.  New media, new backup target, sector by sector backup, same problem.  Hm...  Oh!  I rebuilt the RAID array, I probably didn't set the logical disk as bootable.  Fixed that, still failed.  Oh!  I found the original disk was provisioned as GPT instead of MBR.  Man I'm getting rusty... converted to MBR... still failed.  Tested restoring the individual pieces, first the MBR... fail.  Second, the data volumes... success.....?  Ok..... so it's just the MBR restore that's failing.  Hm...  I can probably fix that manually!  Rebooted with the Windows Server 2012 R2 media in and ran a repair... voilà!  Windows boots.
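
Since the GPT vs. MBR mix-up bit me above, here is a rough Python sketch of how the two layouts can be told apart from the first two sectors of a disk.  It is not what I ran during the restore, just an illustration of the on-disk markers.  It assumes 512-byte logical sectors, and `partition_style` is a hypothetical helper name.  The markers themselves are standard: a boot signature of 0x55 0xAA at the end of sector 0, a protective partition entry of type 0xEE on GPT disks, and the text "EFI PART" at the start of sector 1.

```python
SECTOR = 512  # assumption: 512-byte logical sectors


def partition_style(first_two_sectors: bytes) -> str:
    """Return 'GPT', 'MBR', or 'uninitialized' based on LBA 0 and LBA 1."""
    if len(first_two_sectors) < 2 * SECTOR:
        raise ValueError("need at least the first two sectors")

    lba0 = first_two_sectors[:SECTOR]
    lba1 = first_two_sectors[SECTOR:2 * SECTOR]

    # No 0x55AA boot signature means no valid MBR (protective or otherwise).
    if lba0[510:512] != b"\x55\xaa":
        return "uninitialized"

    # The four MBR partition entries start at offset 446, 16 bytes each;
    # the partition type byte sits at offset 4 within each entry.
    types = [lba0[446 + 16 * i + 4] for i in range(4)]

    # GPT disks carry a protective entry of type 0xEE plus a GPT header
    # beginning with "EFI PART" in the next sector.
    if 0xEE in types and lba1[:8] == b"EFI PART":
        return "GPT"
    return "MBR"
```

To try it, pass in the first 1024 bytes of a raw disk image opened in binary mode.  Reading from a live physical disk generally needs administrator rights.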

At this point I'm still not sure why Acronis is having trouble restoring the MBR.  The only thing I can think is when I built the new logical disk in the RAID array, I did not actually initialize it.  I may go back and try that and see if my results are any different.  For this lab build out, though, I'm simply going to restore the data volumes and then run a repair.  Stupid physical machines.  Moving on...