FS#6425 — FS#10325 — filerz55.240

Attached to Project— Hosting
Incident
240plan
CLOSED
100%
The server is not responding.
We are rebooting it.
Date:  Thursday, 27 March 2014, 15:51PM
Reason for closing:  Done
Comment by OVH - Wednesday, 26 February 2014, 07:47AM

The server is back.


Comment by OVH - Wednesday, 26 February 2014, 07:48AM

We have detected a failure on the server.
We are performing a hardware check.


Comment by OVH - Wednesday, 26 February 2014, 07:49AM

The server is back to normal.


Comment by OVH - Wednesday, 26 February 2014, 07:49AM

We are replacing the server with a spare.


Comment by OVH - Wednesday, 26 February 2014, 07:50AM

The whole cluster is affected by the filer.


Comment by OVH - Wednesday, 26 February 2014, 07:50AM

We are transferring the data disks to the new system.


Comment by OVH - Wednesday, 26 February 2014, 10:12AM

The server is up again.


Comment by OVH - Wednesday, 26 February 2014, 10:13AM

We continue to monitor the server to check that the problem does not recur.


Comment by OVH - Wednesday, 26 February 2014, 10:14AM

The system is unstable.
We are changing the pool configuration.


Comment by OVH - Wednesday, 26 February 2014, 10:16AM

We have doubled the log disk redundancy and launched a verification of the whole data pool.

The service is up but remains disrupted by the operation in progress, which should take over 6 hours.


Comment by OVH - Wednesday, 26 February 2014, 12:29PM

The service is still unstable for this filer, so we have to disable it.
We are enabling a cluster which will be dedicated to filerz55.


Comment by OVH - Thursday, 27 February 2014, 07:36AM

The filer is unstable again. We are intervening.


Comment by OVH - Thursday, 27 February 2014, 15:01PM

We had a series of hardware problems on the server which
corrupted the ZFS filesystem. The data is readable but the
server is unstable (the system is crashing every 30 minutes).
We are trying to stabilise the system and to begin recovering
the data onto a new filer. To do that, we need to find a way to
block all automatic ZFS operations and make the pool read-only,
so that it doesn't make everything crash again.

We are also retrieving the latest backup, stored in Roubaix.
The transfer would take 24 hours, so to speed things up we
collected the backup disks in Roubaix and are taking them
directly to Paris. It will be faster.

So in 3-4 hours' time, the filer and the backup data should be up.
This will bring the 1209 websites affected by the failure back online.
We then hope to refresh this backup with the data from the unstable
filer, which we expect to recover in approx. 24 hours.
We need to look into, or even patch, the ZFS code to make the filer
stable at least in read-only mode.

We apologise for the trouble. The total failure of a filer is very
rare. In this instance the backup exists and we have it, so there is
nothing to worry about, and our engineers are working on recovering
the most recent data from the filer.


Comment by OVH - Monday, 03 March 2014, 17:06PM

80% of the websites have been moved to a new filer in read-write mode with the backup data. The migrations should be completed tonight.

We are also still working to recover the data from the original filer.


Comment by OVH - Wednesday, 05 March 2014, 09:28AM

99% of the accounts have been migrated.
We are checking the remaining accounts.


Comment by OVH - Thursday, 27 March 2014, 15:51PM

All accounts have been migrated.