EPG learnng disabled

If you are getting the 1197 errors in your fabric then the ACI fabric has disabled learning on 1 or more EPGs.

In my case it was caused by MAC flapping from VMware. With the DVS health check enable (which it is by default) The DVS spams the fabric on each VLAN but with the same MAC address. This causes the fabric to disable learning to protect itself.

The VMware KB on it is:

In my case the trace had the following characteristics:

All of the non-broadcast protocol 0x8922 packets from Src: Vmware_d0:c3:a0 to Dst: Vmware_d0:07:f8 came in on encapsulating vlan-1510
The VMware broadcast 0x8922 packets were sent in untagged from Src: Vmware_d0:c3:a0 to Dst: Vmware_d0:07:f8
Then there were random vmware mac addresses trying to reach 2 specific vmware mac addresses (00:50:56:d0:44:40 and 8 bits later 00:50:56:d0:44:48) (00:50:56:d0:83:f0 and 8 bits later 00:50:56:d0:83:f8) using protocol 0x8922 and were multicasting to [unassigned multicast] and were playing Distributed Interactive Simulation (DIS) which is an IEEE standard for conducting real-time platform-level wargaming across multiple host computers and is used worldwide, especially by military organizations but also by other agencies such as those involved in space exploration and medicine.

A Bunch of 0x8922 packets being broadcast from Source: Vmware_d0:27:e8 across vlan 49, 55, 57, 59, 61, 62, 98, 107, 131, 132, 133, 138. This would cause mac flapping across the vlans.

The same source mac address broadcast without vlan tags.
There were a lot of vms responding to the source mac address in 1 using vlan 450, 451, 1402, 1209, 1212, 1213, 1223, 1230, 1402, 1424

I picked one to see if it looped. eth.addr == 00:50:56:d0:c3:a0 showed it was across the vlans. It looks like you used a specific source ip address instead of letting the switch use its node id as the last octet of the address.
The ERSPAN source can be either a specific IP or subnet prefix. If a specific source IP is configured, all leaf switches in the vPC will use the same IP address as the source IP address in the ERSPAN packet headers.
If a subnet prefix is configured, leaf switches will try to use their own node ID if possible as the last octet in the address. This allows you to differentiate between which leaf switch sent the packet to the destination ip address.

Long and the short of it is disable VMware health checks in Vcenter for the DVS that is causing the problems.

Update: 24-May-16
VMware released a document about this specific issue after we pointed it out to them.

When you have VC tunnel mode connecting into Cisco ACI, there are some scenarios you need to pay attention in order to have the right connectivity.

We conducted some testing in DCA-Lab and this is some information to help you with understanding the nature of the issue.


This problem nature is very similar to this VC advisory when working with layer 2 load balancer/bridging device.


Also same applies to vCloud Director Network Isolation (vCDNI) which is MAC-in-MAC encapsulation.




Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )


Connecting to %s