OVHcloud Network Status

FS#16628 — bhs4-16a/b-n56
Scheduled Maintenance Report for Network & Infrastructure
Completed
We will upgrade this pair of Nexus switches to release 7.1.3.N1.2.
The intervention is planned for February 18, 2016, starting at 8:00 am CET (2:00 am EST).

The primary goal is to fix bugs affecting the FEX 2348:
- better interoperability between the 2348 and the Intel 10GBASE-T x555/557 cards
- a counter bug on the FEX 2348
- other bug fixes, notably the vPC speed mismatch, as well as crashes in pfstat, ptplc, port-channel and port-sec.

The update will take place in two stages:
- NX-OS upgrade on the Nexus switches and the FEXes (non-disruptive)
- Broadcom PHY update on the FEX 2348 (the low-level firmware that drives the Ethernet ports) (disruptive)


Running show install all impact confirms that the upgrade will be hitless via ISSU:
bhs4-16a-n56# show install all impact system n6000-uk9.7.1.3.N1.2.bin kickstart n6000-uk9-kickstart.7.1.3.N1.2.bin

Verifying image bootflash:/n6000-uk9-kickstart.7.1.3.N1.2.bin for boot variable "kickstart".
[####################] 100% -- SUCCESS

Verifying image bootflash:/n6000-uk9.7.1.3.N1.2.bin for boot variable "system".
[####################] 100% -- SUCCESS

Verifying image type.
[####################] 100% -- SUCCESS

Extracting "system" version from image bootflash:/n6000-uk9.7.1.3.N1.2.bin.
[####################] 100% -- SUCCESS

Extracting "kickstart" version from image bootflash:/n6000-uk9-kickstart.7.1.3.N1.2.bin.
[####################] 100% -- SUCCESS

Extracting "bios" version from image bootflash:/n6000-uk9.7.1.3.N1.2.bin.
[####################] 100% -- SUCCESS

Extracting "fexth" version from image bootflash:/n6000-uk9.7.1.3.N1.2.bin.
[####################] 100% -- SUCCESS

Extracting "fex4" version from image bootflash:/n6000-uk9.7.1.3.N1.2.bin.
[####################] 100% -- SUCCESS

Performing module support checks.
[####################] 100% -- SUCCESS

Notifying services about system upgrade.
[####################] 100% -- SUCCESS



Compatibility check is done:
Module  bootable          Impact  Install-type  Reason
------  --------  --------------  ------------  ------
     1       yes  non-disruptive         reset
     2       yes  non-disruptive       rolling
   100       yes  non-disruptive       rolling
   101       yes  non-disruptive       rolling
   102       yes  non-disruptive       rolling
   103       yes  non-disruptive       rolling
   104       yes  non-disruptive       rolling
   105       yes  non-disruptive       rolling
   106       yes  non-disruptive       rolling
   107       yes  non-disruptive       rolling
   108       yes  non-disruptive       rolling
   109       yes  non-disruptive       rolling
   110       yes  non-disruptive       rolling
   111       yes  non-disruptive       rolling
   112       yes  non-disruptive       rolling
   113       yes  non-disruptive       rolling
   114       yes  non-disruptive       rolling
   115       yes  non-disruptive       rolling
   116       yes  non-disruptive       rolling
   117       yes  non-disruptive       rolling
   118       yes  non-disruptive       rolling
   119       yes  non-disruptive       rolling
   120       yes  non-disruptive       rolling



Images will be upgraded according to following table:
Module  Image             Running-Version         New-Version             Upg-Required
------  ----------------  ----------------------  ----------------------  ------------
     1  system            7.1(2)N1(1)             7.1(3)N1(2)             yes
     1  kickstart         7.1(2)N1(1)             7.1(3)N1(2)             yes
     1  bios              v2.1.2(07/16/2014)      v2.1.2(07/16/2014)      no
     1  power-seq         v4.0                    v4.0                    no
     1  fabric-power-seq  v4.0                    v4.0                    no
     2  power-seq         v4.0                    v4.0                    no
   100  fexth             7.1(2)N1(1)             7.1(3)N1(2)             yes
   101  fexth             7.1(2)N1(1)             7.1(3)N1(2)             yes
   102  fexth             7.1(2)N1(1)             7.1(3)N1(2)             yes
   103  fex4              7.1(2)N1(1)             7.1(3)N1(2)             yes
   104  fex4              7.1(2)N1(1)             7.1(3)N1(2)             yes
   105  fex4              7.1(2)N1(1)             7.1(3)N1(2)             yes
   106  fex4              7.1(2)N1(1)             7.1(3)N1(2)             yes
   107  fex4              7.1(2)N1(1)             7.1(3)N1(2)             yes
   108  fex4              7.1(2)N1(1)             7.1(3)N1(2)             yes
   109  fex4              7.1(2)N1(1)             7.1(3)N1(2)             yes
   110  fex4              7.1(2)N1(1)             7.1(3)N1(2)             yes
   111  fexth             7.1(2)N1(1)             7.1(3)N1(2)             yes
   112  fex4              7.1(2)N1(1)             7.1(3)N1(2)             yes
   113  fex4              7.1(2)N1(1)             7.1(3)N1(2)             yes
   114  fex4              7.1(2)N1(1)             7.1(3)N1(2)             yes
   115  fex4              7.1(2)N1(1)             7.1(3)N1(2)             yes
   116  fex4              7.1(2)N1(1)             7.1(3)N1(2)             yes
   117  fex4              7.1(2)N1(1)             7.1(3)N1(2)             yes
   118  fex4              7.1(2)N1(1)             7.1(3)N1(2)             yes
   119  fex4              7.1(2)N1(1)             7.1(3)N1(2)             yes
   120  fex4              7.1(2)N1(1)             7.1(3)N1(2)             yes
     1  microcontroller   v0.0.0.15               v0.0.0.15               no
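Reading the compatibility table by eye works for one pair of switches; across many pairs it may help to check it mechanically. The sketch below is a hypothetical Python helper (parse_compat and not_hitless are illustrative names, not part of NX-OS). It assumes the whitespace-separated column layout shown above and flags any module that is not bootable or whose impact is not non-disruptive. How the output is collected from the switch is deliberately left out.

```python
# Hypothetical helper: parses the "Compatibility check is done" table printed
# by "show install all impact". Assumes whitespace-separated columns
# (Module, bootable, Impact, Install-type, Reason) as in the capture above.

SAMPLE = """\
Module  bootable          Impact  Install-type  Reason
------  --------  --------------  ------------  ------
     1       yes  non-disruptive         reset
     2       yes  non-disruptive       rolling
   100       yes  non-disruptive       rolling
"""


def parse_compat(text: str) -> list[dict]:
    """Return one dict per module row; header and dash lines are skipped."""
    rows = []
    for line in text.splitlines():
        parts = line.split()
        # Data rows start with a numeric module id.
        if len(parts) < 4 or not parts[0].isdigit():
            continue
        rows.append({
            "module": int(parts[0]),
            "bootable": parts[1] == "yes",
            "impact": parts[2],
            "install_type": parts[3],
            "reason": " ".join(parts[4:]),  # usually empty
        })
    return rows


def not_hitless(rows: list[dict]) -> list[int]:
    """Module ids that are not bootable or not marked non-disruptive."""
    return [r["module"] for r in rows
            if not r["bootable"] or r["impact"] != "non-disruptive"]


if __name__ == "__main__":
    parsed = parse_compat(SAMPLE)
    print(len(parsed), "modules parsed; not hitless:", not_hitless(parsed))
```

An empty result only reflects what the impact check predicted; as this report shows, the install itself can still fail.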


However, to apply the PHY upgrade, each FEX must be reloaded (the PHY is a low-level component that cannot be upgraded while running).
We will reload the FEXes one by one after the NX-OS upgrade, which means 2-3 minutes of downtime per FEX.


Update(s):

Date: 2016-02-19 20:25:06 UTC
There was an error on one of the ports of the 16b:
bhs4-16b-n56# sh int status | i Fail
Eth114/1/38 server-SP-MG intFailEr trunk full auto 1G/10G
bhs4-16b-n56#

To launch the ISSU under the right conditions, we need to reload the 16b so that it is stable again.
We will do the reload at 8:00 am CET (2:00 am EST) and then proceed with the ND-ISSU.

*****************

Comment by OVH - Thursday, 18 February 2016, 08:11AM

We will reload the B

*****************


Comment by OVH - Thursday, 18 February 2016, 08:28AM

The FEXes are coming back up.


*****************

Comment by OVH - Thursday, 18 February 2016, 08:34AM

ethpm is saturated and the CPU is struggling somewhat.
Monitoring is not showing any failures.

bhs4-16b-n56# sh system internal processes cpu
top - 08:29:09 up 16 min, 6 users, load average: 2.84, 1.82, 0.99
Tasks: 246 total, 5 running, 240 sleeping, 0 stopped, 1 zombie
Cpu(s): 10.0%us, 4.9%sy, 0.0%ni, 84.1%id, 0.2%wa, 0.0%hi, 0.8%si, 0.0%st
Mem: 8243352k total, 3476900k used, 4766452k free, 172k buffers
Swap: 0k total, 0k used, 0k free, 1331076k cached

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
4634 root 20 0 319m 64m 15m R 93.7 0.8 4:03.39 ethpm
4091 root 20 0 281m 20m 9012 S 46.9 0.2 0:43.28 aclmgr
4630 root 20 0 317m 58m 13m S 46.9 0.7 2:31.32 ipqosmgr
4055 root -2 0 828m 98m 15m R 9.4 1.2 1:42.50 bigsurusd
4106 root 20 0 348m 26m 12m R 9.4 0.3 1:02.12 snmpd
15690 nico 20 0 3620 1536 1140 R 7.5 0.0 0:00.08 top
4616 root 20 0 394m 120m 17m R 3.7 1.5 0:47.85 afm
27 root 15 -5 0 0 0 S 1.9 0.0 0:00.77 events/0
4102 root 20 0 613m 320m 30m S 1.9 4.0 0:58.58 fwm
4260 root 20 0 828m 56m 19m S 1.9 0.7 0:13.39 netstack
4383 root 20 0 296m 34m 26m S 1.9 0.4 0:06.09 cfs
4629 root 20 0 298m 29m 12m S 1.9 0.4 0:05.98 vlan_mgr
4636 root 20 0 378m 33m 14m S 1.9 0.4 0:14.48 mcecm
bhs4-16b-n56#
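To spot the process behind the load without scanning every row, the top-style output above can be ranked by %CPU. This is a hypothetical Python sketch (top_cpu is an illustrative name); it assumes the standard top column order shown in the header, PID through COMMAND, and uses a few rows copied from the capture.

```python
# Hypothetical helper: ranks processes from the top-style output of
# "show system internal processes cpu" by %CPU. Assumes the column order
# PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND.

SAMPLE = """\
  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 4634 root      20   0  319m  64m  15m R 93.7  0.8   4:03.39 ethpm
 4091 root      20   0  281m  20m 9012 S 46.9  0.2   0:43.28 aclmgr
 4630 root      20   0  317m  58m  13m S 46.9  0.7   2:31.32 ipqosmgr
 4055 root      -2   0  828m  98m  15m R  9.4  1.2   1:42.50 bigsurusd
"""


def top_cpu(text: str, n: int = 3) -> list[tuple[str, float]]:
    """Return the n busiest (command, %CPU) pairs, highest first."""
    procs = []
    for line in text.splitlines():
        parts = line.split()
        # Process rows have at least 12 fields and a numeric PID first;
        # this skips the header and the summary lines above it.
        if len(parts) < 12 or not parts[0].isdigit():
            continue
        procs.append((" ".join(parts[11:]), float(parts[8])))
    # sorted() stays stable with reverse=True, so ties keep input order.
    return sorted(procs, key=lambda p: p[1], reverse=True)[:n]


if __name__ == "__main__":
    for command, cpu in top_cpu(SAMPLE):
        print(f"{command:<12} {cpu:5.1f}%")
```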

SAP 175 is tied to ethpm; its buffers are filling up.

bhs4-16b-n56# sh system internal mts buffers summary
node sapno recv_q pers_q npers_q log_q
sup 2534 7 0 0 0
sup 175 1 1082 0 0
sup 377 0 0 0 44
sup 480 0 1 0 0
sup 284 0 5 0 0
sup 980 5 0 0 0
sup 27 0 0 1 0
sup 351 0 0 0 1
sup 647 1 0 0 0
sup 2078 0 0 1 0
bhs4-16b-n56# sh system internal mts buffers summary
node sapno recv_q pers_q npers_q log_q
sup 175 0 1018 0 0
sup 284 0 9 0 0
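The backlog can also be attributed to a SAP mechanically. The following hypothetical sketch (sap_backlog and busiest_sap are illustrative names) assumes the six-column layout of the summary above and simply adds up the four queue columns per SAP. It makes no claim about what each queue type means internally; mapping a SAP number to a process name, as done above for SAP 175 and ethpm, is still a manual step.

```python
# Hypothetical helper: totals the queue columns of
# "show system internal mts buffers summary" per SAP number.
# Assumed layout: node sapno recv_q pers_q npers_q log_q.

SAMPLE = """\
node  sapno  recv_q  pers_q  npers_q  log_q
sup    2534       7       0        0      0
sup     175       1    1082        0      0
sup     377       0       0        0     44
"""


def sap_backlog(text: str) -> dict[int, int]:
    """Map each SAP number to the sum of its four queue counters."""
    totals: dict[int, int] = {}
    for line in text.splitlines():
        parts = line.split()
        # Data rows have exactly six fields with a numeric SAP number.
        if len(parts) != 6 or not parts[1].isdigit():
            continue
        sap = int(parts[1])
        totals[sap] = totals.get(sap, 0) + sum(int(x) for x in parts[2:])
    return totals


def busiest_sap(text: str) -> int:
    """SAP with the largest total backlog (raises ValueError if no rows)."""
    totals = sap_backlog(text)
    return max(totals, key=totals.get)


if __name__ == "__main__":
    print("busiest SAP:", busiest_sap(SAMPLE))
```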


*****************

Comment by OVH - Thursday, 18 February 2016, 08:41AM

bhs4-16b-n56# sh system internal mts buffers
MTS buffers in use = 941
bhs4-16b-n56#

We need to wait for the buffer count to drop (typically below 100) before launching the ND-ISSU.
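The wait-for-drain step lends itself to a small polling loop. In this hypothetical sketch, fetch is any caller-supplied function that returns the raw text of show system internal mts buffers (for example over SSH; that part is not shown). The default threshold of 100 mirrors the rule of thumb above rather than a documented Cisco limit, and the poll interval and max_polls values are arbitrary assumptions.

```python
import re
import time
from typing import Callable

# Hypothetical helpers around "show system internal mts buffers".
# Assumes the output contains a line like "MTS buffers in use = 941".
MTS_RE = re.compile(r"MTS buffers in use\s*=\s*(\d+)")


def mts_in_use(output: str) -> int:
    """Extract the buffer count from the command output."""
    m = MTS_RE.search(output)
    if m is None:
        raise ValueError("no 'MTS buffers in use' line found")
    return int(m.group(1))


def ready_for_issu(output: str, threshold: int = 100) -> bool:
    """True once the count is below the (assumed) safe threshold."""
    return mts_in_use(output) < threshold


def wait_until_drained(fetch: Callable[[], str], threshold: int = 100,
                       interval_s: float = 30.0, max_polls: int = 120) -> int:
    """Poll fetch() until the count drops below threshold.

    Returns the first count seen below the threshold, or raises
    TimeoutError after max_polls attempts.
    """
    for _ in range(max_polls):
        count = mts_in_use(fetch())
        if count < threshold:
            return count
        time.sleep(interval_s)
    raise TimeoutError(f"MTS buffers still >= {threshold} after {max_polls} polls")
```

Failing closed (raising instead of proceeding) keeps the decision to launch the ND-ISSU with the operator if the buffers never drain.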


*****************

Comment by OVH - Thursday, 18 February 2016, 09:30AM

We're launching the ND-ISSU

bhs4-16b-n56# install all system n6000-uk9.7.1.3.N1.2.bin kickstart n6000-uk9-kickstart.7.1.3.N1.2.bin

Verifying image bootflash:/n6000-uk9-kickstart.7.1.3.N1.2.bin for boot variable "kickstart".
[####################] 100% -- SUCCESS

Verifying image bootflash:/n6000-uk9.7.1.3.N1.2.bin for boot variable "system".
[# ] 0%


*****************

Comment by OVH - Thursday, 18 February 2016, 09:54AM

The install crashed on the 16B, taking down the FEXes and the 16A with it...

There is still no trace of the crash on the 16B; on the 16A, however, we know the cause: an AFM HAP reset.

Full outage on the entire pair.

The 16B is up on the new version, but not yet back in the vPC.
The FEXes need to upgrade to 7.1.3 by downloading the new images from the 16B; this can take 20-30 min for all the FEXes.

We are working with the BU and the Nexus engineers on the analysis.


*****************

Comment by OVH - Thursday, 18 February 2016, 09:54AM

bhs4-16b-n56# sh fex
FEX     FEX          FEX             FEX                 Fex
Number  Description  State           Model               Serial
------------------------------------------------------------------------
100     fex100       Connected       N2K-C2232TM-E-10GE  SSI191604KM
101     fex101       Connected       N2K-C2232TM-E-10GE  SSI191802N4
102     fex102       Connected       N2K-C2232TM-E-10GE  SSI1916019B
103     fex103       Connected       N2K-C2348TQ-10GE    FOC1930R15D
104     fex104       Connected       N2K-C2348TQ-10GE    FOC1930R3TU
105     fex105       Connected       N2K-C2348TQ-10GE    FOC1930R4ZF
106     fex106       Connected       N2K-C2348TQ-10GE    FOC1930R42R
107     fex107       Connected       N2K-C2348TQ-10GE    FOC1930R14G
108     fex108       Connected       N2K-C2348TQ-10GE    FOC1930R12B
109     fex109       Connected       N2K-C2348TQ-10GE    FOC1930R439
110     fex110       Connected       N2K-C2348TQ-10GE    FOC1930R3ZA
111     fex111       Connected       N2K-C2248TP-E-1GE   FOX1851GJ1Y
112     fex112       Connected       N2K-C2348TQ-10GE    FOC1922R0TE
113     fex113       Connected       N2K-C2348TQ-10GE    FOC1930R1KV
114     fex114       Connected       N2K-C2348TQ-10GE    FOC1930R3X2
115     fex115       Connected       N2K-C2348TQ-10GE    FOC1930R41A
116     fex116       Connected       N2K-C2348TQ-10GE    FOC1903R163
117     fex117       Connected       N2K-C2348TQ-10GE    FOC1930R3ZU
118     fex118       Connected       N2K-C2348TQ-10GE    FOC1930R12N
119     fex119       Image Download  N2K-C2348TQ-10GE    FOC1930R3YJ


*****************

Comment by OVH - Thursday, 18 February 2016, 10:02AM

They are starting to come online.

bhs4-16b-n56# sh fex
FEX     FEX          FEX             FEX                 Fex
Number  Description  State           Model               Serial
------------------------------------------------------------------------
100     fex100       Image Download  N2K-C2232TM-E-10GE  SSI191604KM
101     fex101       Image Download  N2K-C2232TM-E-10GE  SSI191802N4
102     fex102       Online          N2K-C2232TM-E-10GE  SSI1916019B
103     fex103       Offline         N2K-C2348TQ-10GE    FOC1930R15D
104     fex104       Offline         N2K-C2348TQ-10GE    FOC1930R3TU
105     fex105       Image Download  N2K-C2348TQ-10GE    FOC1930R4ZF
106     fex106       Offline         N2K-C2348TQ-10GE    FOC1930R42R
107     fex107       Image Download  N2K-C2348TQ-10GE    FOC1930R14G
108     fex108       Image Download  N2K-C2348TQ-10GE    FOC1930R12B
109     fex109       Image Download  N2K-C2348TQ-10GE    FOC1930R439
110     fex110       Image Download  N2K-C2348TQ-10GE    FOC1930R3ZA
111     fex111       Online          N2K-C2248TP-E-1GE   FOX1851GJ1Y
112     fex112       Image Download  N2K-C2348TQ-10GE    FOC1922R0TE
113     fex113       Offline         N2K-C2348TQ-10GE    FOC1930R1KV
114     fex114       Image Download  N2K-C2348TQ-10GE    FOC1930R3X2
115     fex115       Image Download  N2K-C2348TQ-10GE    FOC1930R41A
116     fex116       Image Download  N2K-C2348TQ-10GE    FOC1903R163
117     fex117       Image Download  N2K-C2348TQ-10GE    FOC1930R3ZU
118     fex118       Offline         N2K-C2348TQ-10GE    FOC1930R12N
119     fex119       Offline         N2K-C2348TQ-10GE    FOC1930R3YJ
120     fex120       Offline         N2K-C2348TQ-10GE    FOC1930R1B1
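Tracking FEX recovery means re-reading sh fex until every entry is Online. A hypothetical parsing sketch follows (parse_fex, state_counts and not_online are illustrative names). It assumes descriptions contain no spaces and that the model and serial are always the last two columns, which is what lets multi-word states such as "Image Download" be recovered.

```python
from collections import Counter

# Hypothetical helper for "sh fex" output. Assumed columns:
# Number, Description (no spaces), State (may be several words), Model, Serial.

SAMPLE = """\
FEX     FEX          FEX             FEX                 Fex
Number  Description  State           Model               Serial
------------------------------------------------------------------------
100     fex100       Image Download  N2K-C2232TM-E-10GE  SSI191604KM
102     fex102       Online          N2K-C2232TM-E-10GE  SSI1916019B
103     fex103       Offline         N2K-C2348TQ-10GE    FOC1930R15D
"""


def parse_fex(text: str) -> dict[int, dict]:
    """Map FEX number to its description, state, model and serial."""
    fexes = {}
    for line in text.splitlines():
        parts = line.split()
        # Data rows start with a numeric FEX id; headers and dashes do not.
        if len(parts) < 5 or not parts[0].isdigit():
            continue
        # Take model and serial from the end and join what is in between,
        # so states like "Image Download" survive the whitespace split.
        fexes[int(parts[0])] = {
            "description": parts[1],
            "state": " ".join(parts[2:-2]),
            "model": parts[-2],
            "serial": parts[-1],
        }
    return fexes


def state_counts(fexes: dict[int, dict]) -> Counter:
    """How many FEXes are in each state."""
    return Counter(f["state"] for f in fexes.values())


def not_online(fexes: dict[int, dict]) -> list[int]:
    """Sorted FEX numbers that are not yet Online."""
    return sorted(n for n, f in fexes.items() if f["state"] != "Online")


if __name__ == "__main__":
    parsed = parse_fex(SAMPLE)
    print(dict(state_counts(parsed)), "pending:", not_online(parsed))
```

Applied to the full 10:02 capture above, state_counts would report 12 FEXes in Image Download, 7 Offline and 2 Online.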


*****************

Comment by OVH - Thursday, 18 February 2016, 10:26AM

All FEXes are online, but ping has only come back on about 800 IPs instead of the 4000+ expected.
That is an odd state.

We're launching the disruptive upgrade on the A (which is currently isolated)



Comment by OVH - Thursday, 18 February 2016, 10:49AM

The explanation for the IPs that were not pinging is simple: ethpm was overloaded.
The switch refused to learn the MAC addresses.

While the A was being upgraded, the ethpm load on the B went down, and all the IPs are now pinging.

The A is upgraded; we are bringing the FEXes back up:

bhs4-16a-n56# sh fex
FEX     FEX          FEX             FEX                 Fex
Number  Description  State           Model               Serial
------------------------------------------------------------------------
100     fex100       Online          N2K-C2232TM-E-10GE  SSI191604KM
101     fex101       Online          N2K-C2232TM-E-10GE  SSI191802N4
102     fex102       Online          N2K-C2232TM-E-10GE  SSI1916019B
103     fex103       Online          N2K-C2348TQ-10GE    FOC1930R15D
104     fex104       Online          N2K-C2348TQ-10GE    FOC1930R3TU
bhs4-16a-n56# sh system internal mts buffers
MTS buffers in use = 6
bhs4-16a-n56#
bhs4-16a-n56#
bhs4-16a-n56#
bhs4-16a-n56# conf te
Enter configuration commands, one per line. End with CNTL/Z.
bhs4-16a-n56(config)# int po105-109
bhs4-16a-n56(config-if-range)# no shut


*****************

Comment by OVH - Thursday, 18 February 2016, 02:35PM

Everything is UP

We're discussing the issue with Cisco.
Posted Feb 19, 2016 - 20:11 UTC