OVHcloud Web Hosting Status

FS#10915 — Class 4
Scheduled Maintenance Report for Web Cloud
Completed
We are currently experiencing difficulties in connecting incoming calls on one particular device.

The device in question has been rebooted.

We are monitoring the activity.

Update(s):

Date: 2014-06-16 08:09:24 UTC
The task will be followed up here
http://status.ovh.co.uk/?do=details&id=7068

Date: 2014-06-16 08:08:53 UTC
The reallocation plan has been defined.
We will reorganise the circuits one by one in order to isolate the inputs/outputs.
The circuits are blocked before migration, so there is no impact on calls in progress.

Date: 2014-06-14 00:18:53 UTC
The current status:
- The VAC infrastructure is filtering out some packets that are not supposed to be there.
- This weekend we will switch all circuits one by one to reallocate them to dedicated equipment. We have 3 input/output links with FT and 3 input/output links with SFR. We will isolate them from one another.
- Cirpack is continuing to search the dumps for the packet that causes the problem.
- We are working with another manufacturer to fix a bug in its equipment. They should deliver patches within a few hours. If so, we will test switching one interconnection onto it and check that it is stable. If it is, we will ask them to bring up 2 more boxes tomorrow or Sunday, then switch the 3 SFR interconnections onto the 3 boxes.

Date: 2014-06-13 14:00:46 UTC
The voice stream is passing correctly via the VAC.

Date: 2014-06-13 14:00:39 UTC
We are passing C5B through the VAC.

Date: 2014-06-13 14:00:29 UTC
We will route the voice relay IP through the VAC in order to monitor the traffic.

Date: 2014-06-13 07:33:01 UTC
We have withdrawn the latest patch from the class 5 infrastructures; it was tested several months ago on C5C and deployed this weekend on C5A and C5B.
It fixes the call loop issue, but it seems it does not free up resources properly and was causing an overload.
Since its removal yesterday at around 16:30, we have not detected any congestion and therefore no lost calls. This theory needs to be confirmed tomorrow.

We are also setting up a new class 4 infrastructure for the interconnections, in order to reduce the load and spread the impact if a problem were to arise.
All the details of the intervention will be followed up in a new status update.

Date: 2014-06-12 14:12:22 UTC
Further congestion on the interconnection cards has been detected.
We will set up another class 4 tonight and separate the incoming and outgoing interconnections.

Date: 2014-06-12 14:10:56 UTC
The situation seems to be stabilising. The call logs and tickets are clean
and no longer show the issues from this morning and yesterday.
The cards are communicating correctly with the controller.

We will continue to monitor the infrastructure.

Date: 2014-06-12 12:26:47 UTC
The new card has been installed.
We are redeploying the service.

Date: 2014-06-12 12:13:23 UTC
The card is being replaced.

Date: 2014-06-12 11:25:10 UTC
A card crashed and is not restarting.
The 2 new cards ordered this morning will arrive within a few minutes.

We will then move from 2 cards (yesterday morning) to 8 in order to manage the inbound and outbound calls of our telephone networks.


Date: 2014-06-12 10:41:20 UTC
All circuits are back up; the first came back up in under 2 minutes.

Date: 2014-06-12 10:41:13 UTC
Before the switchover, we are rebooting the backup devices as a precaution.
Once this is done, we will check that they are working properly and then launch the switchover operation.

Date: 2014-06-12 10:38:06 UTC
We will start the switchover to the backup devices.
The operation should take 3 minutes.


Date: 2014-06-12 09:59:26 UTC
We're also checking all VoIP infrastructure connections in the datacentre.

Date: 2014-06-12 09:59:20 UTC
The card causing the issue has been blocked, but the other cards are saturated as they were yesterday.
Resources are being consumed that do not correspond to any calls in progress.
Various lines of enquiry are being studied by the Cirpack/OVH team.
The following actions are in progress:
- cards are currently in transit to replace the one that is randomly rebooting
- switchover of the infrastructure onto the planned redundant devices around midday, when call volume eases off: the aim is to rule out a potential fault on the main devices
- analysis of the IP traffic

Date: 2014-06-12 09:07:21 UTC
One of the new cards installed overnight is unstable.
We are blocking the card and preparing a replacement with Cirpack as soon as possible; estimated time: 3 hours max.

Date: 2014-06-12 07:42:38 UTC
All circuits have been migrated.
The resources of each card will be used in a uniform manner.

Date: 2014-06-12 07:40:39 UTC
Half of the outbound circuits have been moved.
We are finishing the remaining half.

Date: 2014-06-12 07:40:33 UTC
The incoming call circuits are balanced on all cards.

We're moving on to the outgoing circuits.

Date: 2014-06-12 07:39:20 UTC
The migration process is taking a long time.
The E1 forwarding needs to be completely redone and the circuits checked one by one.
20% of the migration has been completed.

Date: 2014-06-12 07:38:10 UTC
The cards have been started and configured.
They are ready to receive the E1s.


Date: 2014-06-12 07:37:21 UTC
The optical fibres have been run in the datacentres.
The new cards are powered on. We are launching the configuration and moving the links.

Date: 2014-06-12 07:36:01 UTC
We're launching the migration operations.

Date: 2014-06-11 14:44:19 UTC
Replacing the card has not resolved the issue.
We are actively working with the manufacturer to find the cause of the problem: over-consumption of resources that does not correspond to the number of simultaneous calls in progress.
We are also preparing 2 new cards to reduce the load on this card. These cards are already being connected in the datacentre.
They will be installed this evening.

Date: 2014-06-11 13:11:00 UTC
We have received the new cards.

The main card that was causing problems this morning is causing problems again.
We will replace it.

Date: 2014-06-11 13:10:37 UTC
The indicators set up by us and our provider on the new interconnection are not reporting any faults on the new links.
We are still monitoring how calls are forwarded.

Date: 2014-06-11 10:29:30 UTC
The circuit tests have been completed.
We have opened the traffic onto the new interconnection.

We will monitor how the situation evolves.

Date: 2014-06-11 10:24:55 UTC
For an unknown reason, our conversion gateways between the IP network and the Telecom network are not forwarding all call requests.

The new interconnection is currently being brought up and should be finished by 14:00.


Date: 2014-06-11 09:45:22 UTC
The card is in transit.
It will be installed as soon as possible once it has been received.


Date: 2014-06-11 09:15:42 UTC
This incident is due to an issue with an output card. The side effect detected is a saturated conversion card.
It is unable to use the conversion resources and, as a result, some calls outside the OVH telephone network are not going through.

Following yesterday's issue, a new card was rebooted this morning. We are actively working to resolve the problem.

Until today, we have been waiting for feedback from SFR on setting up the new interconnection (setup began in January).
Due to the current issue, it has become a priority on their side; the aim is to relieve the output cards.

We should have some news from them by late morning/early afternoon.

Date: 2014-06-11 08:33:00 UTC
The conversion card has an issue. We will replace it ASAP.

Date: 2014-06-11 08:32:54 UTC
The card is stable again.


Date: 2014-06-11 08:32:43 UTC
We are going to fully reboot the card.
Relaunching the applications was not enough to fix the issue.


Date: 2014-06-11 08:31:58 UTC
One of our conversion gateways between the IP network and the Telecom network
is overloaded. We will investigate.


Date: 2014-06-10 15:10:36 UTC
One of our manufacturers has reset some circuits of our interconnections in an unusual manner.

This caused one of our cards to malfunction.

The problem is not systematic. We will investigate to contain the issue so that we can reboot the card.


Date: 2014-06-10 15:05:32 UTC
We have detected some congestion which is causing errors on some calls (inbound and outbound).

We are looking into the cause with the manufacturer.

Connected calls are not affected.

Date: 2014-06-05 14:29:50 UTC
It is running normally.

The monitoring system detected a limited number of calls in error, which is higher than usual for the number of calls in progress on this device.
The device has been isolated as a precaution and other devices have taken over. Errors are no longer detectable.
The traces are being analysed to determine whether this was an isolated issue affecting certain calls, which seems to be the case.
Posted Jun 05, 2014 - 14:25 UTC