DHCP issue in tenants

Ran into an issue where the EPGs in my common tenant had no trouble getting DHCP addresses, but none of the other tenants could get one.

DHCP server is Windows 2012 R2 which supports option 82.

After working with TAC, it appears that option 82 suboption 5 needs to be configured for my ACI tenants to get addresses.

I’ll let you all know when I get it working.
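For reference, option 82 (relay agent information) is a sequence of code/length/value suboptions, and suboption 5 is the link selection suboption from RFC 3527, which carries an IPv4 subnet address. Here is a minimal Python sketch for walking that payload when checking a capture. The function name and the sample bytes are made up for illustration and are not taken from my fabric.

```python
import ipaddress

def parse_option82(payload: bytes) -> dict:
    """Split a DHCP option 82 payload into {suboption code: raw value}."""
    subopts = {}
    i = 0
    while i + 2 <= len(payload):
        code, length = payload[i], payload[i + 1]
        subopts[code] = payload[i + 2:i + 2 + length]
        i += 2 + length
    return subopts

# Hypothetical payload: suboption 1 (circuit ID) followed by suboption 5 (link selection).
sample = (bytes([1, 3]) + b"po7"
          + bytes([5, 4]) + ipaddress.IPv4Address("10.7.38.0").packed)

subopts = parse_option82(sample)
link_selection = ipaddress.IPv4Address(subopts[5])
print(link_selection)
```

If suboption 5 is missing from the relayed packet, `subopts[5]` raises a `KeyError`, which is a quick way to spot a relay that is not sending it.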

vMotioned VMs dropping off the network

When a server is vMotioned to another blade chassis, it can still connect to other devices within its EPG but not to anything outside the EPG.

This was occurring for both Linux and Windows servers.

The quick and easy fix is to bounce the network interface on the Linux servers.  On Windows servers this did not always fix the problem.

What is really happening is that the endpoint location is not being updated correctly in the COOP table on the spines.  And get this: it’s a known bug with no fix at the moment.  https://bst.cloudapps.cisco.com/bugsearch/bug/CSCva72341/?reffering_site=dumpcr

So how do you fix it inside the fabric?

On your border leaves, run the following command on both of them as close to the same time as possible.

leaf1# bash

leaf1# clear system internal epm endpoint key vrf YOURVRFHER:VRFNAME ip IPADDRESS
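Since the clear has to hit both border leaves at nearly the same moment, it can help to script it. This is a rough Python sketch only: `run_command` is a hypothetical stand-in for however you reach the switch CLI (an SSH library, a console server, and so on), and the leaf names, VRF string and IP are placeholders.

```python
from concurrent.futures import ThreadPoolExecutor

def build_clear_cmd(vrf: str, ip: str) -> str:
    # Same command as above; vrf uses the same form as the placeholder there.
    return f"clear system internal epm endpoint key vrf {vrf} ip {ip}"

def run_on_leaves(leaves, cmd, run_command):
    """Submit cmd to every leaf in parallel and return {leaf: result}."""
    with ThreadPoolExecutor(max_workers=len(leaves)) as pool:
        futures = {leaf: pool.submit(run_command, leaf, cmd) for leaf in leaves}
        return {leaf: future.result() for leaf, future in futures.items()}

def fake_run(leaf, cmd):
    # Placeholder transport: replace with a real CLI session.
    return f"{leaf}# {cmd}"

results = run_on_leaves(
    ["leaf1", "leaf2"],
    build_clear_cmd("TENANT:VRFNAME", "192.0.2.10"),
    fake_run,
)
for line in results.values():
    print(line)
```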

To verify that the vPC leaf is actually passing the traffic correctly, use the following steps:

Run the following ELAM on the two leaves that the device is connected to, to see whether ARP packets are coming in and whether the “status” shows Triggered. You have to do it on both leaves at the same time because the device is in a vPC.

1. vsh_lc

2. debug platform internal ns elam asic 0

3. trigger reset

4. trigger init ingress in-select 3 out-select 0

5. set outer l2 dst_mac ffff.ffff.ffff src_mac YOUR DEVICE MAC ADDRESS HERE

6. start

7. status <<< Shows whether it triggered or is still Armed (Armed means no traffic has matched what was defined in step 5)

8. report | egrep "ce_|ar_"
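The ELAM filter in step 5 wants the MAC in Cisco dotted notation, while vCenter and Wireshark usually show it with colons. A small Python helper (my own naming, nothing ACI-specific) to convert it and build the step 5 line:

```python
import re

def to_cisco_mac(mac: str) -> str:
    """Convert any common MAC notation to Cisco dotted form (xxxx.xxxx.xxxx)."""
    digits = re.sub(r"[^0-9a-fA-F]", "", mac).lower()
    if len(digits) != 12:
        raise ValueError(f"not a MAC address: {mac!r}")
    return ".".join(digits[i:i + 4] for i in range(0, 12, 4))

def elam_arp_filter(mac: str) -> str:
    # Mirrors step 5 above: broadcast destination, your device as the source.
    return f"set outer l2 dst_mac ffff.ffff.ffff src_mac {to_cisco_mac(mac)}"

print(elam_arp_filter("00:50:56:BF:30:D7"))
# set outer l2 dst_mac ffff.ffff.ffff src_mac 0050.56bf.30d7
```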

Watch out for Docker hosts

Had an issue with endpoint learning that was perplexing.  I traced the MAC address to a VM that was running Docker.

Interestingly enough, the IP address that I ran show endpoint for does not exist in the fabric.  I masked the IP addresses, so they are not the actual IPs, but you’ll see the results.

Leaf_105# show endpoint ip 10.299.66.16
Legend:
O - peer-attached    H - vtep         a - locally-aged    S - static
V - vpc-attached     p - peer-aged    L - local           M - span
s - static-arp       B - bounce
+-----------------------------------+---------------+-----------------+--------------+-------------+
VLAN/    Encap    MAC Address    MAC Info/    Interface
Domain   VLAN     IP Address     IP Info
+-----------------------------------+---------------+-----------------+--------------+-------------+
105 vlan-1615 0050.56bf.30d7 LV po7
common:CM_Primary_PN vlan-1615 10.299.38.20 LV po7
common:CM_Primary_PN vlan-1615 172.299.221.37 LV po7
common:CM_Primary_PN vlan-1615 172.299.221.38 LV po7
common:CM_Primary_PN vlan-1615 172.299.49.19 LV po7
common:CM_Primary_PN vlan-1615 10.300.112.19 LV po7
common:CM_Primary_PN vlan-1615 10.300.88.40 LV po7
common:CM_Primary_PN vlan-1615 10.300.88.33 LV po7
common:CM_Primary_PN vlan-1615 10.299.38.24 LV po7
common:CM_Primary_PN vlan-1615 10.299.66.110 LV po7
common:CM_Primary_PN vlan-1615 172.299.213.70 LV po7
common:CM_Primary_PN vlan-1615 172.299.223.71 LV po7
common:CM_Primary_PN vlan-1615 172.299.213.96 LV po7
common:CM_Primary_PN vlan-1615 10.300.156.71 LV po7
common:CM_Primary_PN vlan-1615 10.300.88.20 LV po7
common:CM_Primary_PN vlan-1615 10.300.88.35 LV po7
common:CM_Primary_PN vlan-1615 172.299.222.116 LV po7
common:CM_Primary_PN vlan-1615 10.400.120.116 LV po7
common:CM_Primary_PN vlan-1615 10.300.112.32 LV po7
common:CM_Primary_PN vlan-1615 10.400.120.42 LV po7
common:CM_Primary_PN vlan-1615 10.300.9.163.106

<80 more lines of the same stuff>

The solution was to check the “enforce subnet check for IP learning” check box on the bridge domain’s L3 configuration tab.
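To get a feel for what that check box filters, here is a small Python sketch that flags learned IPs falling outside the bridge domain subnets. This is only my illustration of the idea, not how the leaf implements it. The addresses are invented, and 172.17.0.0/16 is used because it is Docker's default bridge range.

```python
import ipaddress

def outside_bd_subnets(learned_ips, bd_subnets):
    """Return the learned IPs that fall outside every bridge domain subnet."""
    # strict=False lets us pass gateway-style entries such as 10.7.38.1/24.
    nets = [ipaddress.ip_network(s, strict=False) for s in bd_subnets]
    return [ip for ip in learned_ips
            if not any(ipaddress.ip_address(ip) in net for net in nets)]

bd_subnets = ["10.7.38.1/24"]            # hypothetical BD subnet
learned = ["10.7.38.20", "10.7.38.24",   # legitimate endpoints
           "172.17.0.2", "172.17.0.3"]   # container addresses behind the same MAC

strays = outside_bd_subnets(learned, bd_subnets)
print(strays)  # ['172.17.0.2', '172.17.0.3']
```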

[Screenshot: BD-Setting]

You can read up on Docker fun-ness here: https://docs.docker.com/v1.6/articles/networking/

This does not occur in “traditional” networks. In ACI the endpoint learning is done in hardware, and the fabric learns IPs in many different ways.

Another ACI bug

Love being the 1st to find these 🙂
The main issue is with the new code version 1.3(1g). Binding vCenter to an EPG brings up the expected screen, but there is now a 2nd required field (Primary VLAN) that was not required previously.

[Screenshot: NewVMMBug]

Workaround options for now:
1. Create the association as dynamic.
2. Include junk info, then modify it.
3. Use the REST API.

Bug ID CSCuz47137

EPG learning disabled

If you are getting the 1197 errors in your fabric then the ACI fabric has disabled learning on 1 or more EPGs.

In my case it was caused by MAC flapping from VMware. With the DVS health check enabled (which it is by default), the DVS spams the fabric on each VLAN with the same MAC address. This causes the fabric to disable learning to protect itself.

The VMware KB on it is:
https://kb.vmware.com/selfservice/search.do?cmd=displayKC&docType=kc&docTypeID=DT_KB_1_1&externalId=2034795

In my case the trace had the following characteristics:

All of the non-broadcast protocol 0x8922 packets from Src: Vmware_d0:c3:a0 to Dst: Vmware_d0:07:f8 came in on encapsulation vlan-1510.
The VMware broadcast 0x8922 packets were sent untagged from Src: Vmware_d0:c3:a0 to Dst: Vmware_d0:07:f8.
Then there were random VMware MAC addresses trying to reach 2 specific pairs of VMware MAC addresses using protocol 0x8922: 00:50:56:d0:44:40 and the address 8 higher, 00:50:56:d0:44:48, plus 00:50:56:d0:83:f0 and the address 8 higher, 00:50:56:d0:83:f8.
10.7.20.11 and 10.7.20.12 were multicasting to 224.0.0.222 [unassigned multicast] and were playing Distributed Interactive Simulation (DIS). DIS is an IEEE standard for conducting real-time platform-level wargaming across multiple host computers. It is used worldwide, especially by military organizations but also by other agencies such as those involved in space exploration and medicine.

A bunch of 0x8922 packets were being broadcast from Source: Vmware_d0:27:e8 across VLANs 49, 55, 57, 59, 61, 62, 98, 107, 131, 132, 133, 138. This would cause MAC flapping across the VLANs.

The same source MAC address was also broadcasting without VLAN tags.
There were a lot of VMs responding to that source MAC address using VLANs 450, 451, 1402, 1209, 1212, 1213, 1223, 1230, 1402, 1424.
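If you export the capture to (source MAC, VLAN) pairs, spotting this pattern takes only a few lines of Python. This is a sketch: the input format is an assumption about how you dump the trace, and the sample rows are illustrative rather than the real capture.

```python
from collections import defaultdict

def macs_on_multiple_vlans(frames):
    """frames: iterable of (src_mac, vlan) pairs; vlan is None for untagged frames."""
    seen = defaultdict(set)
    for mac, vlan in frames:
        seen[mac.lower()].add(vlan)
    # A MAC seen on more than one VLAN (or both tagged and untagged) is a flap candidate.
    return {mac: vlans for mac, vlans in seen.items() if len(vlans) > 1}

frames = [
    ("00:50:56:d0:27:e8", 49),
    ("00:50:56:D0:27:E8", 55),
    ("00:50:56:d0:27:e8", None),   # same source, untagged
    ("00:50:56:d0:44:40", 1510),   # only ever seen on one VLAN
]

flapping = macs_on_multiple_vlans(frames)
for mac, vlans in flapping.items():
    print(mac, "seen on", len(vlans), "different encapsulations")
```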

I picked one to see if it looped. Filtering on eth.addr == 00:50:56:d0:c3:a0 showed it was across the VLANs. It looks like you used a specific source IP address instead of letting the switch use its node ID as the last octet of the address.
The ERSPAN source can be either a specific IP or a subnet prefix. If a specific source IP is configured, all leaf switches in the vPC will use the same IP address as the source IP address in the ERSPAN packet headers.
If a subnet prefix is configured, leaf switches will try to use their own node ID, if possible, as the last octet in the address. This lets you tell which leaf switch sent the packet to the destination IP address.
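My reading of that description, as a Python sketch: take the configured prefix and drop the node ID into the last octet. What the switch does when the node ID does not fit is not covered above, so the sketch simply returns None in that case. The prefix and node IDs are examples only.

```python
import ipaddress

def erspan_source_ip(prefix: str, node_id: int):
    """Guess the ERSPAN source IP a leaf would use for a given prefix.

    Assumption: the node ID replaces the last octet of the prefix's
    network address. Returns None when that is not possible.
    """
    net = ipaddress.ip_network(prefix, strict=False)
    if net.version != 4 or net.prefixlen > 24 or not 0 < node_id < 255:
        return None
    base = int(net.network_address) & 0xFFFFFF00
    return ipaddress.IPv4Address(base | node_id)

for node in (103, 105, 1001):
    print(node, erspan_source_ip("192.0.2.0/24", node))
```

With a prefix, leaf 103 and leaf 105 end up with different source addresses, which is exactly what makes the two vPC peers distinguishable at the ERSPAN destination.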

The long and the short of it: disable the VMware health checks in vCenter for the DVS that is causing the problems.

Update: 24-May-16
VMware released a document about this specific issue after we pointed it out to them.

When you have VC tunnel mode connecting into Cisco ACI, there are some scenarios you need to pay attention to in order to have the right connectivity.

We conducted some testing in DCA-Lab and this is some information to help you with understanding the nature of the issue.

https://hongjunma.wordpress.com/2016/05/19/cisco-aci-integration-with-virtual-connect-tunnel-mode/

The nature of this problem is very similar to this VC advisory on working with a layer 2 load balancer/bridging device.

http://h20564.www2.hpe.com/hpsc/doc/public/display?docId=emr_na-c02684783&sp4ts.oid=3794423

The same also applies to vCloud Director Network Isolation (vCDNI), which is MAC-in-MAC encapsulation.

http://www.wooditwork.com/2013/03/21/vcloud-director-network-isolation-vcdni-doesnt-work-with-hp-virtual-connect-in-tunnel-mode/

How to do a packet capture on a leaf

When I say packet capture, I literally mean 1 packet that matches specific criteria.  Why would you only want to capture 1 packet?  To show others (mainly server folks) that the fabric is sending the packet to the correct port.

Leaf_103# vsh_lc
module-1# debug platform internal ns elam a 0

module-1(NS-elam)# trigger init ingress in-select <<< Which header to look at
3 4 5 6 7

3 Outerl2-outerl3-outerl4
4 Innerl2-innerl3-innerl4
5 Outerl2-innerl2
6 Outerl3-innerl3
7 Outerl4-innerl4

module-1(NS-elam)# trigger init ingress in-select 3 out-select 0

module-1(NS-elam-insel3)# set outer
arp ipv4 ipv6 l2 l4

module-1(NS-elam-insel3)# set outer ipv4 src_ip 10.7.38.21 ds
dscp dst_ip
module-1(NS-elam-insel3)# set outer ipv4 src_ip 10.7.38.21 dst_ip 10.7.39.50

module-1(NS-elam-insel3)# start

module-1(NS-elam-insel3)# stat
Status: Triggered
module-1(NS-elam-insel3)# report

NOTE:  Output has been greatly condensed to show only the proof that the packet is coming from the right place to the right place

GBL_C++: [INFO] ip_da: 0000000000000000000000000A072732 <<< Destination IP address in HEX
GBL_C++: [INFO] ip_sa: 0000000000000000000000000A072615 <<< Source IP address in HEX
GBL_C++: [INFO] ip_v6_hbh: 0
GBL_C++: [INFO] ce_da: 0022BDF819FF <<< Destination MAC address
GBL_C++: [INFO] ce_sa: 000C29083F09 <<< Source MAC address
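The report prints addresses as raw hex, so here is a quick Python helper to turn them back into something the server folks will recognize. It assumes, as in the output above, that an IPv4 address sits in the low 32 bits of the ip_sa/ip_da field. The function names are mine.

```python
import ipaddress

def elam_hex_to_ip(hex_field: str) -> str:
    """Decode an ELAM ip_sa/ip_da hex field into a dotted-quad IPv4 address."""
    value = int(hex_field, 16)
    if value >= 2 ** 32:
        raise ValueError("field does not hold a plain IPv4 address")
    return str(ipaddress.IPv4Address(value))

def elam_hex_to_mac(hex_field: str) -> str:
    """Decode an ELAM ce_sa/ce_da hex field into Cisco dotted MAC notation."""
    digits = hex_field.lower()
    if len(digits) != 12:
        raise ValueError("expected 12 hex digits")
    return ".".join(digits[i:i + 4] for i in range(0, 12, 4))

print(elam_hex_to_ip("0000000000000000000000000A072732"))  # 10.7.39.50
print(elam_hex_to_ip("0000000000000000000000000A072615"))  # 10.7.38.21
print(elam_hex_to_mac("000C29083F09"))                     # 000c.2908.3f09
```

Those two IPs match the src_ip and dst_ip used in the trigger, which is the proof you are after.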

Another ACI bug initiated by me :)

This is a good one: I delete a network and its bridge domain, and the route still lives in the routing table.

https://tools.cisco.com/bugsearch/bug/CSCux76657

Workaround:
Wipe the leaves that have the stale routes using the leaf-specific portion of the instructions found here: http://www.cisco.com/c/en/us/td/docs/switches/datacenter/aci/apic/sw/1-x/troubleshooting/b_APIC_Troubleshooting/b_APIC_Troubleshooting_chapter_01001.html

Just to clarify, a full wipe of the fabric is not required. Just wipe the leaves that contain the “stale” route.

ACI hell part 1

Connecting access ports with static paths within an EPG that also has trunking: what a pain.

So basically, if you have a static path binding using 802.1p and then try to add an access port with 802.1p Access Untagged, things may not work.

The reason is that the 802.1p Access Untagged setting sets the VLAN to 0 in the header, but there is still a VLAN tag in there.  Some access devices don’t accept it because they are not expecting a tag, period.  This matters especially with appliances.
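To make the "VLAN 0 but still tagged" point concrete, here is a short Python sketch that builds the 4-byte 802.1Q tag. The layout (TPID 0x8100, then 3 bits of priority, 1 DEI bit and a 12-bit VLAN ID) is standard 802.1Q. The helper name is mine, and VLAN 1615 is just borrowed from the endpoint output earlier in this post.

```python
import struct

TPID_8021Q = 0x8100

def dot1q_tag(vid: int, pcp: int = 0, dei: int = 0) -> bytes:
    """Build the 4-byte 802.1Q tag: TPID, then PCP (3 bits) | DEI (1 bit) | VID (12 bits)."""
    if not (0 <= vid <= 0xFFF and 0 <= pcp <= 7 and dei in (0, 1)):
        raise ValueError("field out of range")
    tci = (pcp << 13) | (dei << 12) | vid
    return struct.pack("!HH", TPID_8021Q, tci)

priority_tag = dot1q_tag(vid=0)     # what an 802.1p priority-tagged frame carries
trunk_tag = dot1q_tag(vid=1615)     # a normal trunk tag for comparison

print(priority_tag.hex())  # 81000000
print(trunk_tag.hex())     # 8100064f
```

Either way there are 4 extra bytes after the source MAC. A device that expects the EtherType right there sees 0x8100 instead and may drop the frame.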

If you set your mode to 802.1p Access Untagged and use the same encapsulation VLAN as the trunked ports, it will not work.  ACI will give you an error saying that you can’t have tagged and untagged in the same EPG.  Yet if you change the encapsulation VLAN ID to a different number, it will work.

Remember that a VLAN in ACI is just bogus because ACI uses VXLAN, but endpoint devices care about that VLAN number.  Below is an example of 1 EPG with multiple endpoints in the same bridge domain with different VLAN encapsulations.

[Screenshot: ACI8021P]