Tag Archives: Fault 1197

EPG learning disabled

If you are seeing fault 1197 in your fabric, then the ACI fabric has disabled learning on one or more EPGs.

In my case it was caused by MAC flapping from VMware. With the DVS health check enabled (which it is by default), the DVS spams the fabric on each VLAN with the same source MAC address. The fabric sees that MAC move between VLANs and disables learning to protect itself.

The VMware KB on it is:
https://kb.vmware.com/selfservice/search.do?cmd=displayKC&docType=kc&docTypeID=DT_KB_1_1&externalId=2034795

In my case the trace had the following characteristics:

All of the non-broadcast protocol 0x8922 packets from Src: Vmware_d0:c3:a0 to Dst: Vmware_d0:07:f8 came in on encap vlan-1510.
The VMware broadcast 0x8922 packets were sent untagged from Src: Vmware_d0:c3:a0 to Dst: Vmware_d0:07:f8.
Then there were random VMware MAC addresses trying to reach two specific pairs of VMware MAC addresses using protocol 0x8922: 00:50:56:d0:44:40 and the address 8 higher, 00:50:56:d0:44:48; and 00:50:56:d0:83:f0 and the address 8 higher, 00:50:56:d0:83:f8.
10.7.20.11 and 10.7.20.12 were multicasting to 224.0.0.222 [unassigned multicast] and were running Distributed Interactive Simulation (DIS). DIS is an IEEE standard for conducting real-time platform-level wargaming across multiple host computers. It is used worldwide, especially by military organizations but also by other agencies such as those involved in space exploration and medicine.
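The spacing between the paired destination MAC addresses in the trace can be checked with a little arithmetic. This is only a sketch; the helper name mac_to_int is my own, and the addresses are the ones quoted above.

```python
def mac_to_int(mac: str) -> int:
    """Convert a colon-separated MAC address to its 48-bit integer value."""
    return int(mac.replace(":", ""), 16)


# The two address pairs quoted in the trace notes.
pairs = [
    ("00:50:56:d0:44:40", "00:50:56:d0:44:48"),
    ("00:50:56:d0:83:f0", "00:50:56:d0:83:f8"),
]

# Difference between the second and first address of each pair.
offsets = [mac_to_int(second) - mac_to_int(first) for first, second in pairs]

for (first, second), offset in zip(pairs, offsets):
    print(f"{first} -> {second}: offset {offset}")
```

Both pairs differ by 8 in the last octet, so the second address in each pair is the eighth address after the first.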

1. A bunch of 0x8922 packets were broadcast from Source: Vmware_d0:27:e8 across VLANs 49, 55, 57, 59, 61, 62, 98, 107, 131, 132, 133, 138. This would cause MAC flapping across the VLANs.
2. The same source MAC address also broadcast without VLAN tags.
3. A lot of VMs were responding to the source MAC address in item 1 using VLANs 450, 451, 1402, 1209, 1212, 1213, 1223, 1230, 1402, 1424.
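The pattern described above (one source MAC showing up on many VLANs) is easy to check for once the trace is reduced to (source MAC, VLAN) pairs. The following is a minimal sketch, not a tool I actually ran. The function names are made up, the second host is hypothetical, and Vmware_d0:27:e8 is expanded to 00:50:56:d0:27:e8 on the assumption that Wireshark resolved the VMware OUI 00:50:56.

```python
from collections import defaultdict


def vlans_per_source(frames):
    """Map each source MAC to the set of VLANs it was seen on.

    `frames` is an iterable of (src_mac, vlan) tuples; vlan is None
    for an untagged frame.
    """
    seen = defaultdict(set)
    for src_mac, vlan in frames:
        seen[src_mac.lower()].add(vlan)
    return seen


def flapping_sources(frames, min_vlans=2):
    """Return only the MACs seen on at least `min_vlans` distinct VLANs."""
    return {
        mac: vlans
        for mac, vlans in vlans_per_source(frames).items()
        if len(vlans) >= min_vlans
    }


# Observations modelled on the trace: one source MAC on twelve VLANs.
flap_vlans = (49, 55, 57, 59, 61, 62, 98, 107, 131, 132, 133, 138)
sample = [("00:50:56:d0:27:e8", vlan) for vlan in flap_vlans]
sample.append(("00:50:56:d0:27:e8", None))  # the untagged broadcast
sample.append(("00:50:56:aa:bb:cc", 49))    # hypothetical well-behaved host

result = flapping_sources(sample)
for mac, vlans in result.items():
    print(f"{mac} seen on {len(vlans)} VLANs (untagged counted as one)")
```

A MAC that appears in the result is a candidate for the flapping that makes the fabric disable learning. A host that stays on a single VLAN is filtered out.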

I picked one to see if it looped. The filter eth.addr == 00:50:56:d0:c3:a0 showed that it was present across the VLANs.

On the ERSPAN session itself, it looks like a specific source IP address was used instead of letting each switch use its node ID as the last octet of the address.
The ERSPAN source can be either a specific IP or a subnet prefix. If a specific source IP is configured, all leaf switches in the vPC will use the same IP address as the source IP address in the ERSPAN packet headers.
If a subnet prefix is configured, leaf switches will try to use their own node ID, if possible, as the last octet in the address. This lets you tell which leaf switch sent a given packet to the destination IP address.
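To illustrate the subnet prefix behaviour, here is a rough sketch of how a per-leaf source address could be derived from a prefix and a node ID. It is not how ACI implements it. The function name, the 192.0.2.0/24 prefix and the node IDs 101 and 102 are all assumptions for the example. What the switch does when the node ID does not fit is not covered above, so the sketch simply raises an error in that case.

```python
import ipaddress


def erspan_source_ip(prefix: str, node_id: int) -> str:
    """Illustrative only: put the node ID in the last octet of the prefix."""
    network = ipaddress.ip_network(prefix, strict=False)
    if not 1 <= node_id <= 254:
        raise ValueError("node ID does not fit in the last octet")
    # Clear the last octet of the network address, then set it to the node ID.
    base = int(network.network_address) & 0xFFFFFF00
    candidate = ipaddress.ip_address(base | node_id)
    if candidate not in network:
        raise ValueError("resulting address falls outside the prefix")
    return str(candidate)


# Hypothetical vPC pair: leaf 101 and leaf 102 sharing one ERSPAN prefix.
for node in (101, 102):
    print(node, erspan_source_ip("192.0.2.0/24", node))
```

With distinct source addresses per leaf, a capture filter on the ERSPAN source IP is enough to separate the traffic mirrored by each switch.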

The long and the short of it: disable the VMware health check in vCenter for the DVS that is causing the problems.

Update: 24-May-16
VMware released a document about this specific issue after we pointed it out to them.

When you have VC tunnel mode connecting into Cisco ACI, there are some scenarios you need to pay attention to in order to get the right connectivity.

We conducted some testing in DCA-Lab, and here is some information to help you understand the nature of the issue.

https://hongjunma.wordpress.com/2016/05/19/cisco-aci-integration-with-virtual-connect-tunnel-mode/

The nature of this problem is very similar to this VC advisory about working with layer 2 load balancer/bridging devices.

http://h20564.www2.hpe.com/hpsc/doc/public/display?docId=emr_na-c02684783&sp4ts.oid=3794423

The same also applies to vCloud Director Network Isolation (vCDNI), which uses MAC-in-MAC encapsulation.

http://www.wooditwork.com/2013/03/21/vcloud-director-network-isolation-vcdni-doesnt-work-with-hp-virtual-connect-in-tunnel-mode/
