Quote:
Originally Posted by solarsystems
I'm in the same boat; support are unable to give me a time for when it will be back up and running. I have had a few issues where the 100% service has failed, but I still don't understand why the other services/servers don't take over.
Hi,
What you are referring to is hardware failover. This is a very easy thing to do and it just means a reboot will occur when hardware dies. It has happened both on the host side and storage side a few times this year. You don't notice it because it works.
This issue was related to the actual file system. Having redundant storage is somewhat useless if the physical file system in use has problems. Such problems are very rare, but they can happen. We are still liaising with our storage system vendor regarding this, and it is too early for us to give an official explanation.
The actual problem was fixed in around an hour, but having that many VMs suddenly lose their storage creates massive problems. Some VMs are fine, some need a reboot, some need someone watching the console, and some just boot up and then nothing works inside. To add to this, our main vCenter (management server) had a corrupt SQL database during this crisis, which meant navigating around impacted VMs was very difficult.
More will follow tomorrow when we have a more official explanation.