
Massive Cisco SD-WAN Bug CSCvu29389

Early this week our SD-WAN “Powered by Cisco” started having strange issues that manifested as VRRP being stuck in the INIT/INIT state and random routers becoming unavailable over the normal data plane.

The routers still had control connections, so they were accessible by hopping through vManage because the control plane was up. The command to run from vManage is: request execute ssh username@vedgeIP

My setup has two routers at each remote site, with TLOC extensions between them.

Under the VRRP configuration we had track-omp configured:
vrrp 2
priority 110
track-omp

Even with OMP routes present, VRRP would not fail over to the backup router.
Output of show omp routes:
10 0.0.0.0/0 omp – – – – 256.257.258.259 biz-internet ipsec –
10 0.0.0.0/0 omp – – – – 256.257.258.260 public-internet ipsec


The command show bfd sessions returned no output.

Thinking the problem was related to VRRP being stuck in the INIT/INIT state, we removed that configuration from all our routers. After that, VRRP simply stayed active on the master router, and we still had random sites drop the next day.

You can regain access by running clear control connections, but that is purely reactive. After another day of devices disconnecting at random we decided to reboot all our routers. Rebooting fixed the issue for now: we had no random reboots the day after rebooting our entire environment.

As of this writing, bug CSCvu29389 has no fix or patch available. You can't even read about the bug, so it must be internal only.

Cisco ve1000 CLI software upgrade error

I ran into an issue when I pulled out a ve1000 router that had been sitting on the shelf for a couple of years: of course it would not connect to vManage. The reason is that the old router was on 16.2 code and my vManage instance is on 18.3 code. In this case you need to get the ve1000 to at least version 17.2.
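The version gap described above can be checked with a few lines of Python before you start. This is only a sketch: the function names are made up for illustration, and the 17.2 minimum is the value from this post for an 18.3 vManage, not a general rule. Cisco's compatibility matrix is the authority for other releases.

```python
from typing import Tuple


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn a dotted release string such as '17.2.5' into (17, 2, 5)."""
    return tuple(int(part) for part in text.strip().split("."))


def needs_upgrade(device_version: str, minimum_version: str) -> bool:
    """Return True when the device release is older than the required minimum."""
    return parse_version(device_version) < parse_version(minimum_version)


if __name__ == "__main__":
    # 16.2 is the shelf router's release; 17.2 is the minimum named in this post.
    print(needs_upgrade("16.2", "17.2"))    # True: must be upgraded first
    print(needs_upgrade("17.2.5", "17.2"))  # False: already new enough
```

Comparing integer tuples rather than raw strings avoids the trap where "17.10" sorts before "17.2" as text.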

To upgrade the software I had to use the USB port to get the upgrade file onto the ve1000 router.

These are the steps I followed:

STEP 1:  Enable the usb slot on the ve1000.

vedge# conf t
Entering configuration mode terminal
vedge(config)# system
vedge(config-system)# usb-controller
vedge(config-system)# commit
The following warnings were generated:
'system usb-controller': For this configuration to take effect, this command will cause an immediate device reboot
Proceed? [yes,no] yes

STEP 2: Verify that USB controller is enabled:
vedge# show running-config system usb-controller
system
 usb-controller

vedge# show hardware environment | tab

HW CLASS             HW ITEM                  HW DEV INDEX  STATUS  MEASUREMENT
----------------------------------------------------------------------------------------------------
Temperature Sensors  DRAM                     0             OK      39 degrees C/102 degrees F
Temperature Sensors  Board                    0             OK      35 degrees C/95 degrees F
Temperature Sensors  Board                    1             OK      36 degrees C/97 degrees F
Temperature Sensors  Board                    2             OK      34 degrees C/93 degrees F
Temperature Sensors  Board                    3             OK      34 degrees C/93 degrees F
Temperature Sensors  CPU junction             0             OK      47 degrees C/117 degrees F
Fans                 Tray 0 fan               0             OK      Spinning at 5040 RPM
Fans                 Tray 0 fan               1             OK      Spinning at 4980 RPM
PEM                  Power supply             0             OK      Powered On: yes; Fault: no
PEM                  Power supply             1             Down    Powered On: no; Fault: no
PIM                  Interface module         0             OK      Present: yes; Powered On: yes; Fault: no
USB                  External USB controller  0             OK      2 USB Ports

STEP 3: Copy the vEdge MIPS image to a FAT-formatted USB stick. [NOTE: A 2 GB USB stick works. A 100 GB stick does not.]

 

STEP 4: Insert the USB stick into the vEdge

If you tail /var/log/kern.log you should see messages like the ones below, and the stick will be auto-mounted. If it is not, remove the USB stick, reboot the node, and insert the stick again once the device is back up.

tail -100 /var/log/kern.log

kern.info: Jun 21 16:05:33 vedge kernel: usb-storage 3-1:1.0: USB Mass Storage device detected
kern.info: Jun 21 16:05:33 vedge kernel: scsi3 : usb-storage 3-1:1.0
kern.notice: Jun 21 16:05:34 vedge kernel: scsi 3:0:0:0: Direct-Access     Kingston DataTraveler 2.0 1.00 PQ: 0 ANSI: 2
kern.notice: Jun 21 16:05:34 vedge kernel: sd 3:0:0:0: [sdb] 3913664 512-byte logical blocks: (2.00 GB/1.86 GiB)
kern.notice: Jun 21 16:05:34 vedge kernel: sd 3:0:0:0: [sdb] Write Protect is off
kern.debug: Jun 21 16:05:34 vedge kernel: sd 3:0:0:0: [sdb] Mode Sense: 03 00 00 00
kern.err: Jun 21 16:05:34 vedge kernel: sd 3:0:0:0: [sdb] No Caching mode page found
kern.err: Jun 21 16:05:34 vedge kernel: sd 3:0:0:0: [sdb] Assuming drive cache: write through
kern.err: Jun 21 16:05:34 vedge kernel: sd 3:0:0:0: [sdb] No Caching mode page found
kern.err: Jun 21 16:05:34 vedge kernel: sd 3:0:0:0: [sdb] Assuming drive cache: write through
kern.info: Jun 21 16:05:34 vedge kernel:  sdb: sdb1
kern.err: Jun 21 16:05:34 vedge kernel: sd 3:0:0:0: [sdb] No Caching mode page found
kern.err: Jun 21 16:05:34 vedge kernel: sd 3:0:0:0: [sdb] Assuming drive cache: write through
kern.notice: Jun 21 16:05:34 vedge kernel: sd 3:0:0:0: [sdb] Attached SCSI removable disk
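If you have saved a copy of kern.log and want to confirm which device the stick came up as without reading every line, something like the following Python sketch can pull out the device name and size. It assumes the kernel message wording shown in the log above; the function name and the sample text are just for illustration, and other kernel versions may word these messages differently.

```python
import re
from typing import Optional

# Size line printed for a newly attached disk, for example:
# "sd 3:0:0:0: [sdb] 3913664 512-byte logical blocks: (2.00 GB/1.86 GiB)"
BLOCKS_RE = re.compile(r"\[(sd[a-z]+)\] (\d+) (\d+)-byte logical blocks")
# Final line printed once the removable disk is usable.
ATTACHED_RE = re.compile(r"\[(sd[a-z]+)\] Attached SCSI removable disk")


def find_usb_disk(log_text: str) -> Optional[dict]:
    """Return the name and size in bytes of the last removable disk attached, or None."""
    sizes = {}
    attached = None
    for line in log_text.splitlines():
        match = BLOCKS_RE.search(line)
        if match:
            sizes[match.group(1)] = int(match.group(2)) * int(match.group(3))
        match = ATTACHED_RE.search(line)
        if match:
            attached = match.group(1)
    if attached is None:
        return None
    return {"device": attached, "size_bytes": sizes.get(attached)}


SAMPLE = """\
kern.notice: Jun 21 16:05:34 vedge kernel: sd 3:0:0:0: [sdb] 3913664 512-byte logical blocks: (2.00 GB/1.86 GiB)
kern.notice: Jun 21 16:05:34 vedge kernel: sd 3:0:0:0: [sdb] Attached SCSI removable disk
"""

if __name__ == "__main__":
    print(find_usb_disk(SAMPLE))  # {'device': 'sdb', 'size_bytes': 2003795968}
```

A result of None means no "Attached SCSI removable disk" line was found, which matches the case where the stick did not mount and the reboot-and-reinsert step is needed.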

vedge# vshell
vedge:~$ df -h
Filesystem      Size  Used Avail Use% Mounted on
rootfs          5.9G   35M  5.6G   1% /
none             64K     0   64K   0% /dev
/dev/sda1      1013M  116M  847M  12% /boot
/dev/loop0       64M   64M     0 100% /rootfs.ro
/dev/sda2       5.9G   35M  5.6G   1% /rootfs.rw
aufs            5.9G   35M  5.6G   1% /
tmpfs            64K     0   64K   0% /dev
shm             1.5G   24K  1.5G   1% /dev/shm
tmp             1.5G  488K  1.5G   1% /tmp
tmpfs           1.5G  120K  1.5G   1% /run
tmpfs           1.5G  120K  1.5G   1% /run/netns
/dev/sdb1       1.9G  100M  1.8G   6% /media/sdb1

Verify that the image is on the USB stick and visible to the vEdge:

vedge:~$ cd /media/sdb1
vedge:/media/sdb1$ dir
System\ Volume\ Information  viptela-17.2.5-mips64.tar.gz

STEP 5: Copy the image to /home/admin

vedge:~$ cd /media/sdb1/
vedge:/media/sdb1$ ls
System Volume Information  viptela-17.2.5-mips64.tar.gz
vedge:/media/sdb1$ cp viptela-17.2.5-mips64.tar.gz /home/admin
vedge:/media/sdb1$ exit

STEP 6: Install and activate the new image

vedge# request software install /home/admin/viptela-17.2.5-mips64.tar.gz reboot

 

What do you do if you get the following error?

vedge# request software install /home/admin/viptela-17.2.10-mips64.tar.gz
gzip: invalid magic
tar: Child returned status 1
tar: Error is not recoverable:

This is very frustrating because the error is extremely vague. The short answer is that you should re-download the image file and try again. Also wait until the copy from the USB stick to the /home/admin directory has finished, and verify that it is complete, before running the install. Then it will work (or at least it did for me).
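The "invalid magic" message means gzip did not find the two bytes (1f 8b) that every gzip file starts with, which is what you would expect from a corrupt download or a bad copy. Here is a rough Python sketch for checking an image before installing it. Run it wherever Python is available, for example on the workstation that downloaded the file (I am not assuming Python exists in vshell). The function names are made up for illustration.

```python
import hashlib

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream (RFC 1952)


def looks_like_gzip(path: str) -> bool:
    """Return True when the file starts with the gzip magic bytes."""
    with open(path, "rb") as handle:
        return handle.read(2) == GZIP_MAGIC


def sha256_of(path: str, chunk_size: int = 1024 * 1024) -> str:
    """Hash the file in chunks so a large image never has to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_is_intact(source: str, copy: str) -> bool:
    """Return True when the copy is a gzip file with the same hash as the source."""
    return looks_like_gzip(copy) and sha256_of(source) == sha256_of(copy)
```

For example, copy_is_intact("/media/sdb1/viptela-17.2.10-mips64.tar.gz", "/home/admin/viptela-17.2.10-mips64.tar.gz") compares the USB copy with the one in /home/admin. A False result points to a partial copy or a damaged download.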

Cisco Viptela vManage stuck processes

I’ve been using the Viptela product for over a year now.  It is a really good product.  It actually does what the sales people say it will do.

I recently ran into an issue where, when applying template changes to devices (sometimes over 100 and sometimes as few as 5), a process gets stuck on vManage.

When a process is stuck you cannot make any changes to existing applied templates, or even bring a device online. Until today the only option I had was to call support and have them kill the process. Depending on who takes the ticket, that could be a wait of 5 minutes or less, or a couple of days.

If you need to kill a stuck process on your vManage, here is the procedure:

To see which process is running on vManage, go to the following URL:

https://<vmanage-ip>/dataservice/device/action/status/tasks

You will see something similar to the following:
{"runningTasks":[{"userSessionUserName":"xxxxx","detailsURL":"/dataservice/device/action/status","tenantName":"Provider","processId":"push_feature_template_configuration-3af60b4b-3947-4ab5-b7db-cdd9dc73c88c","name":"Push Feature Template Configuration","tenantId":"default","userSessionIP":"1.2.3.4","action":"push_feature_template_configuration","startTime":1522439470351,"endTime":0,"status":"in_progress"}]}

Look for the "processId" field. The value in quotes after "processId": is the ID of the process that is running. In the example above the process ID is:
push_feature_template_configuration-3af60b4b-3947-4ab5-b7db-cdd9dc73c88c

Take the process ID and append it to the following URL:

https://<vmanage-IP>/dataservice/device/action/status/tasks/clean?processId=

After the equals sign, paste in the ID of the process that is stuck. From the example above it is: push_feature_template_configuration-3af60b4b-3947-4ab5-b7db-cdd9dc73c88c

So the complete URL in this instance is:
https://<vmanage-IP>/dataservice/device/action/status/tasks/clean?processId=push_feature_template_configuration-3af60b4b-3947-4ab5-b7db-cdd9dc73c88c

You will then get the following once the process has been terminated:
{"Success":true}
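If you would rather not pick the process ID out of the JSON by hand, the lookup and URL building above can be sketched in Python. This only parses the text returned by the tasks URL and builds the clean URL. It does not log in to vManage or send any request, since that needs an authenticated session. The function names and the vmanage.example.com host are made up for illustration.

```python
import json
from typing import List
from urllib.parse import quote

CLEAN_PATH = "/dataservice/device/action/status/tasks/clean?processId="


def running_process_ids(tasks_json: str) -> List[str]:
    """Return the processId of every entry under runningTasks."""
    payload = json.loads(tasks_json)
    return [task["processId"] for task in payload.get("runningTasks", [])]


def clean_url(vmanage_host: str, process_id: str) -> str:
    """Build the URL that asks vManage to clean up the given process."""
    return "https://" + vmanage_host + CLEAN_PATH + quote(process_id, safe="")


if __name__ == "__main__":
    # Trimmed copy of the example response shown earlier in this post.
    sample = (
        '{"runningTasks":[{"processId":'
        '"push_feature_template_configuration-3af60b4b-3947-4ab5-b7db-cdd9dc73c88c",'
        '"status":"in_progress"}]}'
    )
    for process_id in running_process_ids(sample):
        # vmanage.example.com is a placeholder; use your own vManage address.
        print(clean_url("vmanage.example.com", process_id))
```

Open the printed URL in the same browser session you are logged in to vManage with, exactly as in the manual steps.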