Troubleshooting VSS Failure

Posted by: admin  :  Category: CCIE, CCNP, Switching, Troubleshooting




 

This article discusses how a Cisco Virtual Switching System (VSS) pair reacts to a failure. There are three possible failure scenarios:

  1. Link failure in the multi-chassis port channel link.
  2. Active supervisor engine failure.
  3. VSL failure.

Link failure in the multi-chassis port channel link:

Availability is not affected for data flows that do not use the failed link. For traffic flows that do use the failed link, the disruption lasts only as long as it takes to detect the link failure and reprogram the indices within the system.

VSS-Troubleshooting-1

When all the links connected to one Cisco 6500 chassis fail (in this case there is only one link to each 6500), the port bundle is converted from a multi-chassis Cisco EtherChannel link to a standard Cisco EtherChannel link and is treated as a single-homed port.

VSS-Troubleshooting-2

Remember: The supervisor engine on the active virtual switch programs the hardware forwarding information onto all the distributed forwarding cards (DFCs) across the entire Cisco Virtual Switching System. It also programs the Policy Feature Card (PFC) on the standby virtual switch supervisor engine. Consequently, the PFCs on both the active and the hot-standby supervisor engines are active and perform the centralized packet lookups on their own chassis.
As a result, a packet that reaches the standby virtual switch is handled in one of two ways:

  • If a packet is software switched, the packet is sent to the active virtual switch through the VSL.
  • If a packet is hardware switched, the packet is managed by the standby virtual switch.

Active supervisor engine failure:

The standby supervisor engine can detect the failure of the active supervisor engine using one of the following methods:

  • VSL Protocol (VSLP)
  • Cisco Generic Online Diagnostics (GOLD) failure event
  • Full VSL link down

VSS-Troubleshooting-3

Upon detecting the failure of the active supervisor, the hot-standby supervisor engine performs an SSO switchover and assumes the role of the active supervisor.

VSS-Troubleshooting-4

During the transition, traffic that must move away from the failed chassis is disrupted. The duration of the disruption is determined by two factors: the time the hot-standby supervisor engine needs to assume the active role, and the time the neighboring devices need to shift their path selection to the newly active chassis.

VSL failure:

The failure of a single VSL link is discovered by the active supervisor engine, either through a link-down event or through the failure of periodic VSLP messages sent across the link to check the VSL link state. Availability is not affected for those data flows that do not use the VSL.

VSS-Troubleshooting-6

The active supervisor engine discovers the failure of the “entire” VSL either through a link-down event or through the failure of the periodic VSLP messages sent across the member links to check the VSL link status. From the perspective of the active virtual switch chassis, the standby virtual switch is lost. The standby virtual switch chassis likewise views the active virtual switch chassis as failed and transitions to the active virtual switch state through an SSO switchover. This is known as a dual-active scenario: both chassis are active with the same configuration, which can adversely affect the network topology and traffic.

To detect this disruptive scenario and recover from it, configure one of these dual-active detection methods:

  • Enhanced PAgP
  • Layer 3 BFD
  • Fast Hello

In this example, the fast hello method is used.

VSS-Troubleshooting-5

Upon detecting the dual-active condition, the original active chassis enters recovery mode and shuts down all of its interfaces except the VSL interfaces and any interfaces explicitly excluded from shutdown (typically management interfaces). This effectively removes the device from the network.
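The shutdown rule described above can be modeled in a few lines of Python. This is only an illustrative sketch, not switch code: the function name recovery_mode_plan and every interface name other than Te1/5 are made up for the example.

```python
def recovery_mode_plan(all_interfaces, vsl_members, excluded):
    """Split a chassis's interfaces into (kept_up, shut_down) for recovery mode.

    Only VSL members and explicitly excluded interfaces stay up.
    """
    protected = set(vsl_members) | set(excluded)
    kept_up = [name for name in all_interfaces if name in protected]
    shut_down = [name for name in all_interfaces if name not in protected]
    return kept_up, shut_down


# Hypothetical inventory for switch 1; only Te1/5 (the VSL member) comes
# from this article, the other names are assumptions.
interfaces = ["Te1/5", "Gi1/1/1", "Gi1/2/1", "Gi1/2/2"]
kept_up, shut_down = recovery_mode_plan(
    interfaces,
    vsl_members=["Te1/5"],
    excluded=["Gi1/1/1"],  # assumed management port excluded from shutdown
)
print("kept up:  ", kept_up)
print("shut down:", shut_down)
```

The point of the model is that an interface left off the exclusion list is unreachable while the chassis sits in recovery mode, so any port needed for out-of-band access has to be excluded ahead of time.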

VSS-Troubleshooting-7

You will see the following messages on the active virtual switch to indicate that a dual-active scenario has occurred:

VSS-PAIR#
Jan 23 11:57:37.647: %VSLP-SW1_SP-3-VSLP_LMP_FAIL_REASON: Te1/5: Link down
Jan 23 11:57:37.647: %VSLP-SW1_SP-2-VSL_DOWN: Last VSL interface Te1/5 went down
Jan 23 11:57:37.735: %VSLP-SW1_SP-2-VSL_DOWN: All VSL links went down while switch is in ACTIVE role
Jan 23 11:57:37.799: %LINEPROTO-SW1_SP-5-UPDOWN: Line protocol on Interface TenGigabitEthernet1/5, changed state to down
Jan 23 11:57:37.803: %LINEPROTO-SW1_SP-5-UPDOWN: Line protocol on Interface Port-channel1, changed state to down
Jan 23 11:57:37.803: %LINK-SW1_SP-3-UPDOWN: Interface Port-channel1, changed state to down
Jan 23 11:57:37.807: %LINK-SW1_SP-3-UPDOWN: Interface TenGigabitEthernet1/5, changed state to down
Jan 23 11:57:37.875: %DUAL_ACTIVE-SW1_SP-1-DETECTION: Fast-hello running on Gi1/3/24 detected dual-active condition
Jan 23 11:57:37.875: %DUAL_ACTIVE-SW1_SP-1-RECOVERY: Dual-active condition detected: Starting recovery-mode, all non-VSL and non-excluded interfaces have been shut down
VSS-PAIR(recovery-mode)#
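
When these messages are collected on a syslog server, a small script can pick out the dual-active events. The Python sketch below assumes the message layout seen in the output above (timestamp, facility, optional switch tag such as SW1_SP, severity, mnemonic); the regular expression and the helper names are illustrative and may need adjusting for other platforms or timestamp formats.

```python
import re
from typing import NamedTuple, Optional

# Layout assumed from the console output in this article:
#   <timestamp>: %<FACILITY>[-<SWn_xx>]-<severity>-<MNEMONIC>: <text>
SYSLOG_RE = re.compile(
    r"^(?P<timestamp>\w{3}\s+\d+\s+[\d:.]+): "
    r"%(?P<facility>[A-Z_]+)"
    r"(?:-(?P<origin>SW\d+_\w+))?"
    r"-(?P<severity>\d)-(?P<mnemonic>[A-Z_]+): "
    r"(?P<text>.*)$"
)


class Event(NamedTuple):
    timestamp: str
    facility: str
    origin: Optional[str]  # e.g. "SW1_SP"; None when the tag is absent
    severity: int
    mnemonic: str
    text: str


def parse_line(line):
    """Return an Event for a syslog line, or None for prompts and other text."""
    m = SYSLOG_RE.match(line.strip())
    if m is None:
        return None
    return Event(m["timestamp"], m["facility"], m["origin"],
                 int(m["severity"]), m["mnemonic"], m["text"])


def dual_active_events(lines):
    """Keep only the events logged by the DUAL_ACTIVE facility."""
    events = (parse_line(line) for line in lines)
    return [e for e in events if e is not None and e.facility == "DUAL_ACTIVE"]


SAMPLE = """\
VSS-PAIR#
Jan 23 11:57:37.647: %VSLP-SW1_SP-3-VSLP_LMP_FAIL_REASON: Te1/5: Link down
Jan 23 11:57:37.735: %VSLP-SW1_SP-2-VSL_DOWN: All VSL links went down while switch is in ACTIVE role
Jan 23 11:57:37.875: %DUAL_ACTIVE-SW1_SP-1-DETECTION: Fast-hello running on Gi1/3/24 detected dual-active condition
Jan 23 11:57:37.875: %DUAL_ACTIVE-SW1_SP-1-RECOVERY: Dual-active condition detected: Starting recovery-mode, all non-VSL and non-excluded interfaces have been shut down
"""

for event in dual_active_events(SAMPLE.splitlines()):
    print(event.timestamp, event.origin, event.mnemonic)
```

Filtering on the facility rather than on the message text keeps the check independent of which detection method (fast hello, enhanced PAgP or BFD) raised the alert.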

The following messages on the standby virtual switch console indicate that a dual-active scenario has occurred:

VSS-PAIR-sdby#
Jan 23 11:57:37.647: %VSLP-SW2_SPSTBY-3-VSLP_LMP_FAIL_REASON: Te2/5: Link down
Jan 23 11:57:37.647: %VSLP-SW2_SPSTBY-2-VSL_DOWN:   Last VSL interface Te2/5 went down
Jan 23 11:57:37.651: %VSLP-SW2_SPSTBY-2-VSL_DOWN:   All VSL links went down while switch is in Standby role
Jan 23 11:57:37.651: %DUAL_ACTIVE-SW2_SPSTBY-1-VSL_DOWN: VSL is down - switchover, or possible dual-active situation has occurred
Jan 23 11:57:37.651: %PFREDUN-SW2_SPSTBY-6-ACTIVE: Initializing as Virtual Switch ACTIVE processor
Jan 23 11:57:39.559: %LINK-3-UPDOWN: Interface TenGigabitEthernet2/5, changed state to down
Jan 23 11:57:39.559: %LINEPROTO-SW2_SP-5-UPDOWN: Line protocol on Interface TenGigabitEthernet2/5, changed state to down
Jan 23 11:57:40.899: %OIR-SW2_SP-6-INSREM: Switch 1 Physical Slot 2 - Module Type LINE_CARD  removed
Jan 23 11:57:40.991: %OIR-SW2_SP-6-INSREM: Switch 1 Physical Slot 3 - Module Type LINE_CARD  removed
Jan 23 11:57:41.107: %OIR-SW2_SP-6-INSREM: Switch 1 Physical Slot 4 - Module Type LINE_CARD  removed
Jan 23 11:58:00.335: %VSLP-SW2_SP-2-VSL_DOWN:   All VSL links went down while switch is in ACTIVE role
VSS-PAIR#
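
The timestamps in this output show how quickly the standby chassis reacted. The Python sketch below computes the time between two log lines; the helper names are illustrative, and because the messages carry no year, a placeholder year is prepended purely so that the timestamps can be parsed. It also assumes an English locale for the month abbreviation.

```python
from datetime import datetime

ASSUMED_YEAR = 2000  # placeholder only: the log lines do not include a year


def parse_timestamp(line):
    """Parse the leading 'Mon DD HH:MM:SS.mmm' stamp of a syslog line."""
    stamp = line.split(": %", 1)[0]
    return datetime.strptime(f"{ASSUMED_YEAR} {stamp}", "%Y %b %d %H:%M:%S.%f")


def elapsed_seconds(first_line, second_line):
    """Seconds between two log lines (valid when both fall in the same year)."""
    delta = parse_timestamp(second_line) - parse_timestamp(first_line)
    return delta.total_seconds()


vsl_down = "Jan 23 11:57:37.647: %VSLP-SW2_SPSTBY-3-VSLP_LMP_FAIL_REASON: Te2/5: Link down"
now_active = "Jan 23 11:57:37.651: %PFREDUN-SW2_SPSTBY-6-ACTIVE: Initializing as Virtual Switch ACTIVE processor"

print(f"{elapsed_seconds(vsl_down, now_active):.3f} s")
```

For the two lines above, the gap between the VSL link going down and the standby starting to initialize as active is 4 ms.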

The show switch virtual redundancy command on the new active switch confirms this:

VSS-Pair#show switch virtual redundancy
My Switch Id = 2
Peer Switch Id = 1
Last switchover reason = active unit removed
Configured Redundancy Mode = sso
Operating Redundancy Mode = sso

Switch 2 Slot 1 Processor Information :
-----------------------------------------------
Current Software state = ACTIVE
Uptime in current state = 0 minutes
Image Version = Cisco IOS Software, s72033_rp Software (s72033_rp-ADVENTERPRISEK9-M), Version 15.1(2)SY, RELEASE SOFTWARE (fc4)
Technical Support: http://www.cisco.com/techsupport
Copyright (c) 1986-2013 by Cisco Systems, Inc.
Compiled Wed 04-Sep-13 13:05 by prod_rel_team
BOOT = bootdisk:s72033-adventerprisek9-mz.151-2.SY.bin,12;
Configuration register = 0x2102
Fabric State = ACTIVE
Control Plane State = ACTIVE

Peer information is not available because
it is in 'DISABLED' state
VSS-PAIR#
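
For scripted health checks, the "name = value" lines of this output are easy to turn into a dictionary. The Python sketch below is a minimal example that works on captured text only; the function name is illustrative and it does not connect to a switch.

```python
def parse_key_values(output):
    """Collect 'name = value' lines from captured show-command text.

    The first occurrence of a name wins, so the local switch's fields are
    kept if a peer section later repeats the same names.
    """
    fields = {}
    for line in output.splitlines():
        key, sep, value = line.partition(" = ")
        if sep:  # lines without ' = ' (headers, prompts) are skipped
            fields.setdefault(key.strip(), value.strip())
    return fields


SAMPLE = """\
VSS-Pair#show switch virtual redundancy
My Switch Id = 2
Peer Switch Id = 1
Last switchover reason = active unit removed
Configured Redundancy Mode = sso
Operating Redundancy Mode = sso

Switch 2 Slot 1 Processor Information :
Current Software state = ACTIVE
Uptime in current state = 0 minutes
Fabric State = ACTIVE
Control Plane State = ACTIVE
"""

fields = parse_key_values(SAMPLE)
print("Switch", fields["My Switch Id"], "is", fields["Current Software state"])
print("Last switchover reason:", fields["Last switchover reason"])
```

A check built on this could, for example, raise an alert whenever the last switchover reason changes or the peer information is missing.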

When the VSL is restored, the following messages are displayed on the console and the switch in recovery mode (previous active virtual switch) reloads:

Jan 26 13:23:34.877: %DUALACTIVE-1-VSL_RECOVERED: VSL has recovered during dual-active situation: Reloading switch 1
Jan 26 13:23:34.909: %SYS-5-RELOAD: Reload requested Reload Reason: Reload Command.

VSS-Troubleshooting

After the reload, the VSS is recovered; the control plane remains active on the former standby virtual switch. To return the active role to the original switch, force a switchover with the command:

VSS-Pair#redundancy force-switchover
