IOS-XR / ASR9K Concepts

Introduction

One of the first things that caught my attention when I first logged into an IOS-XR device was its characteristic prompt, something like this: RP/0/RSP0/CPU0:XRv-1#

This tells you that you’re working on the CPU of Route Switch Processor 0 (RSP0) in rack 0. The RSP cards are the brain of the system in the sense that they communicate with the configuration processes (the CLI, for instance), request access to the fabric and transmit traffic across it.
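To make the prompt anatomy concrete, here is a minimal Python sketch that splits it into its parts. It assumes the `type/rack/slot/module:hostname#` layout seen above. The class and function names (`NodePrompt`, `parse_prompt`) are my own hypothetical labels for illustration, not anything IOS-XR exposes.

```python
import re
from typing import NamedTuple


class NodePrompt(NamedTuple):
    node_type: str  # leading token, "RP" in the example
    rack: int       # rack number
    slot: str       # slot name, e.g. "RSP0"
    module: str     # module on that slot, e.g. "CPU0"
    hostname: str


# Assumed layout: <type>/<rack>/<slot>/<module>:<hostname>#
PROMPT_RE = re.compile(
    r"^(?P<type>[A-Z]+)/(?P<rack>\d+)/(?P<slot>[^/]+)/(?P<module>[^:]+):(?P<host>[^#]+)#$"
)


def parse_prompt(prompt: str) -> NodePrompt:
    """Split an IOS-XR style prompt into its node location and hostname."""
    match = PROMPT_RE.match(prompt.strip())
    if match is None:
        raise ValueError(f"unrecognized prompt: {prompt!r}")
    return NodePrompt(
        match["type"], int(match["rack"]), match["slot"], match["module"], match["host"]
    )


node = parse_prompt("RP/0/RSP0/CPU0:XRv-1#")
print(node.rack, node.slot, node.module, node.hostname)  # 0 RSP0 CPU0 XRv-1
```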

This post is a brief summary of some operational aspects that I found useful when working with ASR9Ks. If you are looking for more information on the hardware architecture and how everything ties together, I encourage you to look for the Cisco Live sessions from 2013 through 2017. Most of them go deep into the architecture, while others focus on deployment models such as NFV and vRR. The Cisco support posts from Xander Thuijs are also extremely useful when troubleshooting.

Knowing how the system works matters because the hardware architecture is much more exposed to the end user than in previous Cisco products. The CLI allows (and sometimes forces) the administrator to have a basic understanding of the existing nodes or Line Cards (LCs); sometimes you need to tell the system explicitly which node to look at. This is mainly because the LCs have their own CPU that handles ARP, ICMP, BFD, NetFlow, etc. Newer LCs also have their own switch fabric, which talks to the RSP’s fabric via what Cisco calls crossbar bundles.

The picture below shows an ASR9006 (left) and an ASR9010 (right) chassis.

Architecture

Here’s some summarized architecture information:

RSP – Route Switch Processor

Basically an x86 PC.

Ethernet Out-of-Band Channel (EOBC) inside the RSP.

A Broadcom chipset sits at the back of the card and talks to the RPs and the LCs.

Most common deployment is dual RSP440-SE in Active/Standby.

CPU

Handles routing, MPLS, multicast, HSRP/VRRP, etc.

Here are the main components of an LC:

PHY

Optical ASICs, SFPs

CPU

Programs the forwarding plane; all management software runs here.

Handles ARP, ICMP, BFD, NetFlow, OAM, etc.

NP – Network Processor

General-purpose processor optimized for routing and switching functions.

Bandwidth is scaled by stacking more NPs on the LC.

FIA – Fabric Interface Asic

Connects the NP to the switch fabric.

Arbitrates “requests going to the same place at the same time”.

Maintains its own queuing for traffic heading to the fabric.
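The post doesn’t say which algorithm the ASR9K actually uses to arbitrate, so the following is only a toy round-robin arbiter in Python to illustrate the idea of several sources contending for one destination at the same time. Each cycle exactly one source is granted, and sources take turns so that none starves. The source names (`NP0`, `NP1`, ...) and the cell counts are hypothetical.

```python
from collections import deque
from typing import Dict, List


def arbitrate(pending: Dict[str, int]) -> List[str]:
    """Toy round-robin arbiter for ONE contended destination.

    `pending` maps each source to the number of cells it wants to send there.
    Each cycle exactly one source is granted, and sources take turns so none starves.
    Returns the source granted in each cycle.
    """
    remaining = dict(pending)
    turn_order = deque(src for src, cells in pending.items() if cells > 0)
    grants: List[str] = []
    while turn_order:
        src = turn_order.popleft()
        grants.append(src)
        remaining[src] -= 1
        if remaining[src] > 0:
            turn_order.append(src)  # back of the line until its next turn
    return grants


print(arbitrate({"NP0": 2, "NP1": 1, "NP2": 2}))
# ['NP0', 'NP1', 'NP2', 'NP0', 'NP2']
```

Round-robin is used here only because it is the simplest fair scheme to show; real fabric arbiters also weigh priorities and queue state.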

IOS-XR Specific

Some IOS-XR specific pointers:

  • It is basically a set of databases that talk to each other to exchange information.
  • Modular operating system
  • Processes are completely independent of each other; we will see how to work with them later.
  • LPTS (Local Packet Transport Services) is the system’s native, tunable control-plane policing.
  • Software is distributed as PIE (Package Install Envelope) files and installed as needed (i.e., only install the packages that you need).
  • Software patches are distributed as SMU files (Software Maintenance Update).
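LPTS polices traffic punted to the control plane. Its internals aren’t covered here, so as an assumption-labeled sketch, this is a generic single-rate token-bucket policer, a common way to implement this kind of policing. The class name, rate and burst values are made up for illustration and are not LPTS defaults.

```python
class TokenBucketPolicer:
    """Toy single-rate policer: a packet conforms if a token is available, else it is dropped."""

    def __init__(self, rate_pps: float, burst: int) -> None:
        self.rate_pps = rate_pps
        self.burst = burst
        self.tokens = float(burst)  # start with a full bucket
        self.last = 0.0

    def allow(self, now: float) -> bool:
        # Refill for the time elapsed since the last packet, capped at the burst size.
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate_pps)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


# Hypothetical flow type policed at 100 pps with a burst of 5 (not real LPTS defaults).
policer = TokenBucketPolicer(rate_pps=100, burst=5)
print([policer.allow(now=0.0) for _ in range(8)])
# [True, True, True, True, True, False, False, False]
print(policer.allow(now=0.02))  # two tokens refilled after 20 ms -> True
```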

LC Operations

Let’s look at some LCs with the #show platform command. To date, three LC families have been released: Trident, Typhoon and Tomahawk:

Trident:

A9K-40GE, A9K-2T20G, A9K-4T, A9K-8T/4, A9K-8T, A9K-16T/8

Typhoon:

A9K-24x10, A9K-36x10, A9K-MOD80/160, A9K-2x100

Tomahawk:

A9K-8x100G, A9K-4x100, A9K-MOD400/200, A9K-24x10G, A9K-48x10G

In the #show platform output below, we can see that this ASR9010 has dual RSP cards. The #show redundancy command gives more detail, including brief logs of the latest redundancy events. The same output also shows Trident cards in slots 0, 1, 6 and 7, while slot 2 has a MOD160 Typhoon card with two 4x10GE modules.

The node locations in the output read Rack / Slot / Module.

Sometimes you may have LC failures, evidenced by platform error syslog messages; this is a great reference for such messages. The following are a few useful commands for LC reload operations. A reload is often a valid troubleshooting step when dealing with LC failures.

This is what a Typhoon LC looks like (my rework of an original Cisco Live illustration):

Notice there is a switch fabric chip inside the LC; this was not the case in earlier LCs such as Trident. Traffic that ingresses and egresses ports on the same LC therefore never leaves the LC. In all other cases, the switch fabric inside the LC talks to the one on the RSP cards.

The Route Processor (RP) programs the two Forwarding Engines (FEs) based on the routing protocols. Both FEs reside on the LC itself, one for ingress and one for egress:

Ingress FE: holds the destination networks and the destination LC for each of them.

Egress FE: maintains the Adjacency Information Base (AIB) for the networks attached to it, identifies the next hop and performs the MAC rewrite.
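The two-stage lookup above can be sketched as a toy Python model with hypothetical tables. The ingress stage maps a destination to the egress LC, and the egress stage looks up the next hop and rewrite MAC in that LC’s AIB. Traffic whose ingress and egress LC match stays on the LC, as on Typhoon cards with their own fabric. All prefixes, interface names and MAC addresses are illustrative (documentation ranges), and real FEs do far more than this.

```python
import ipaddress
from typing import Dict, Optional, Tuple

Net = ipaddress.IPv4Network

# HYPOTHETICAL tables, for illustration only.
# Ingress FE: destination network -> egress LC slot.
INGRESS_FE: Dict[Net, int] = {
    ipaddress.ip_network("10.1.0.0/16"): 1,
    ipaddress.ip_network("10.2.0.0/16"): 2,
}
# Egress FE, per LC: network -> (next hop, egress interface, rewrite MAC).
EGRESS_AIB: Dict[int, Dict[Net, Tuple[str, str, str]]] = {
    1: {ipaddress.ip_network("10.1.0.0/16"): ("192.0.2.1", "Te0/1/0/0", "00:00:5e:00:53:01")},
    2: {ipaddress.ip_network("10.2.0.0/16"): ("198.51.100.1", "Te0/2/0/0", "00:00:5e:00:53:02")},
}


def longest_match(table, addr):
    """Return the most specific network in `table` containing `addr`, or None."""
    hits = [net for net in table if addr in net]
    return max(hits, key=lambda net: net.prefixlen) if hits else None


def forward(dst: str, ingress_lc: int) -> Optional[tuple]:
    addr = ipaddress.ip_address(dst)
    # Stage 1 (ingress FE): which LC owns the destination?
    net = longest_match(INGRESS_FE, addr)
    if net is None:
        return None  # no route: drop
    egress_lc = INGRESS_FE[net]
    # On LCs with an on-board fabric, same-LC traffic never reaches the RSP fabric.
    path = "local to LC" if egress_lc == ingress_lc else "via RSP fabric"
    # Stage 2 (egress FE): next hop and MAC rewrite from that LC's AIB.
    aib = EGRESS_AIB[egress_lc]
    adj = longest_match(aib, addr)
    if adj is None:
        return None
    next_hop, interface, rewrite_mac = aib[adj]
    return egress_lc, path, next_hop, interface, rewrite_mac


print(forward("10.2.3.4", ingress_lc=1))
# (2, 'via RSP fabric', '198.51.100.1', 'Te0/2/0/0', '00:00:5e:00:53:02')
```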

The most common packet drops on the platform are NP drops, since the NP processes both ingress and egress packets. The first step is to identify the NP that we need to troubleshoot. There is an interesting relationship between the NP number and the PHY (port) number, and we can use a command to find the NP number, bridge and FIA associated with a given port. For instance, to find the NPs for the ports in LC1:

Notice that the port order does not line up with the NP numbers. The (oversimplified) reason lies in the physical distance between each port and the NP chips on the LC: there needs to be a given distance between them to achieve timing synchronization when forwarding packets. There is a good explanation of this in Xander’s 2014 Cisco Live video.
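Since the port-to-NP mapping has to be looked up rather than derived, a small helper that groups ports by NP can be handy when chasing NP drops. This Python sketch uses a hypothetical mapping for an 8-port LC; the real mapping must come from the router itself.

```python
from collections import defaultdict
from typing import Dict, List

# HYPOTHETICAL port -> NP mapping for an 8-port LC, for illustration only.
# Read the real mapping from the router; do not assume it is sequential.
PORT_TO_NP: Dict[int, int] = {0: 0, 1: 1, 2: 0, 3: 1, 4: 2, 5: 3, 6: 2, 7: 3}


def ports_by_np(port_to_np: Dict[int, int]) -> Dict[int, List[int]]:
    """Group ports by the NP serving them, e.g. to see which ports share a congested NP."""
    groups: Dict[int, List[int]] = defaultdict(list)
    for port, np_id in sorted(port_to_np.items()):
        groups[np_id].append(port)
    return dict(groups)


print(ports_by_np(PORT_TO_NP))  # {0: [0, 2], 1: [1, 3], 2: [4, 6], 3: [5, 7]}

# A naive "two consecutive ports per NP" guess disagrees with the mapping above.
naive = {port: port // 2 for port in PORT_TO_NP}
print(naive == PORT_TO_NP)  # False
```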

Then, once you know where to look, you can use these commands to troubleshoot packet drops:

CEF / FIB / RIB

On ASR9Ks, the FIB is used by both the LCs and the RSP to forward packets. RIB processes do not build the FIBs; instead, the RIB downloads a set of selected best routes to each LC’s FIB. This is why we may see different CEF table outputs when specifying the node location. If we don’t specify a node, the system consults the active RSP by default. Also, review this interesting post about CSCse46790.
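The RIB/FIB split can be sketched in Python under simplifying assumptions: the RIB picks one best route per prefix (lowest administrative distance, then lowest metric) and pushes a copy to each node’s FIB. The `Route` class, the node names and the addresses are illustrative only.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class Route:
    prefix: str
    protocol: str
    admin_distance: int
    metric: int
    next_hop: str


def select_best(candidates: Iterable[Route]) -> Dict[str, Route]:
    """Per prefix, keep the route with the lowest admin distance, then lowest metric."""
    best: Dict[str, Route] = {}
    for route in candidates:
        current = best.get(route.prefix)
        if current is None or (route.admin_distance, route.metric) < (
            current.admin_distance,
            current.metric,
        ):
            best[route.prefix] = route
    return best


def download(best: Dict[str, Route], nodes: List[str]) -> Dict[str, Dict[str, str]]:
    """Give every node its own FIB copy (prefix -> next hop); nodes do not run the RIB."""
    return {node: {p: r.next_hop for p, r in best.items()} for node in nodes}


rib_input = [
    Route("10.0.0.0/24", "ospf", 110, 20, "192.0.2.1"),
    Route("10.0.0.0/24", "static", 1, 0, "192.0.2.9"),
    Route("10.9.0.0/16", "ospf", 110, 30, "192.0.2.1"),
]
fibs = download(select_best(rib_input), ["0/RSP0/CPU0", "0/1/CPU0"])
print(fibs["0/1/CPU0"])  # {'10.0.0.0/24': '192.0.2.9', '10.9.0.0/16': '192.0.2.1'}
```

In this model every node receives an identical copy. On real hardware each node programs its own FIB, which is why checking the CEF table for a specific node location can be worth it.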

The following output shows a CEF drop adjacency. This happens because the RIB has no entry for the route and there is no default route either:

For a directly connected subnet, CEF shows a local adjacency if an L2 adjacency (ARP or IPv6 neighbor) already exists for the destination. Otherwise, CEF shows a glean adjacency to indicate that the packet will be punted for L2 adjacency resolution, i.e., an ARP request is needed:
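The three outcomes described above (drop, local and glean) can be illustrated with a toy longest-prefix-match lookup in Python. The FIB format (a prefix mapped to either `"connected"` or a next hop) and the function name are assumptions for this sketch, and recursive next-hop resolution is left out.

```python
import ipaddress
from typing import Dict, Set


def cef_adjacency(dst: str, fib: Dict[str, str], l2_adjacencies: Set[str]) -> str:
    """Classify which adjacency a lookup for `dst` would hit in this toy model.

    fib maps prefixes to "connected" or to a next-hop address (illustrative format).
    l2_adjacencies holds addresses with a resolved ARP / IPv6 ND entry.
    """
    addr = ipaddress.ip_address(dst)
    best_len, best_target = -1, None
    for prefix, target in fib.items():  # longest-prefix match
        net = ipaddress.ip_network(prefix)
        if addr in net and net.prefixlen > best_len:
            best_len, best_target = net.prefixlen, target
    if best_target is None:
        return "drop"  # no route and no default route
    if best_target == "connected":
        # Known L2 adjacency -> local; otherwise punt for ARP/ND resolution -> glean.
        return "local" if str(addr) in l2_adjacencies else "glean"
    return f"via {best_target}"


fib = {"10.0.0.0/24": "connected", "192.168.0.0/16": "10.0.0.1"}
arp = {"10.0.0.5"}
for dst in ("10.0.0.5", "10.0.0.9", "192.168.1.1", "172.16.0.1"):
    print(dst, cef_adjacency(dst, fib, arp))
# 10.0.0.5 local
# 10.0.0.9 glean
# 192.168.1.1 via 10.0.0.1
# 172.16.0.1 drop
```

Adding a default route (`"0.0.0.0/0"`) to the toy FIB would turn the last lookup from a drop into a forwarded packet, which matches the drop case described earlier.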

Config Management (and others)

Configuration management in IOS-XR is vastly better than in regular IOS, especially when multiple users are working on the same chassis.

These are some of the commands I found useful in day-to-day operations:

configure exclusive  –  enter config mode and lock the configuration so that no other user can commit while your configuration session is active. This is handy when multiple users are working on the unit at the same time.

commit confirmed [secs | mins]  –  commit the configuration for the given period of time. The configuration has to be confirmed again before the timer expires; otherwise, the changes are rolled back. This is useful when you are not entirely sure how the changes will affect operations.

commit label  –  it is good practice to label significant configuration changes; the label makes it quick to identify the purpose of a change when looking at rollbacks and commit history.

root  –  brings you from inner configuration hierarchies back to the root (global) config level, just to move around faster.

abort  –  exit configuration mode without committing changes.

rollback configuration “commit ID”  –  rolls the configuration back to the specified commit ID.

show configuration rollback changes “commit ID”  –  show differences between current configuration and specified rollback target.

show configuration commit list  –  shows the last 100 configuration commits with their timestamp and user. This is one of my favorites.

show configuration commit changes “commit ID”  –  show changes made under specified commit ID.
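To show how labels, the commit list, rollback and `commit confirmed` fit together, here is a toy Python model of a commit history. Everything in it is hypothetical and only mirrors the behaviour described above: `ConfigStore`, the key/value config format, the commit ID numbering and passing time explicitly as `now`. In particular, treating a rollback as restoring the state right after the target commit is this model’s choice.

```python
import itertools
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Commit:
    commit_id: int
    user: str
    label: Optional[str]
    snapshot: Dict[str, str]  # full configuration after this commit


class ConfigStore:
    """Toy model of commit history, labels, rollback and 'commit confirmed'."""

    def __init__(self) -> None:
        self.running: Dict[str, str] = {}
        self.history: List[Commit] = []
        self._ids = itertools.count(1000000001)  # hypothetical ID numbering
        self.lock_owner: Optional[str] = None
        self._deadline: Optional[float] = None
        self._pre_confirm: Dict[str, str] = {}

    def _check_lock(self, user: str) -> None:
        if self.lock_owner not in (None, user):
            raise PermissionError(f"configuration locked by {self.lock_owner}")

    def lock(self, user: str) -> None:
        """Rough analogue of 'configure exclusive'."""
        self._check_lock(user)
        self.lock_owner = user

    def _record(self, user: str, label: Optional[str]) -> int:
        commit_id = next(self._ids)
        self.history.append(Commit(commit_id, user, label, dict(self.running)))
        return commit_id

    def commit(self, user: str, changes: Dict[str, str], label: Optional[str] = None,
               confirmed_secs: Optional[float] = None, now: float = 0.0) -> int:
        self._check_lock(user)
        if confirmed_secs is not None:
            self._pre_confirm = dict(self.running)  # what to restore on timeout
            self._deadline = now + confirmed_secs
        self.running.update(changes)
        return self._record(user, label)

    def confirm(self) -> None:
        """Confirming before the deadline keeps the changes."""
        self._deadline = None

    def tick(self, now: float) -> bool:
        """Auto-rollback if a confirmed commit was not confirmed in time."""
        if self._deadline is not None and now >= self._deadline:
            self.running = dict(self._pre_confirm)
            self._deadline = None
            self._record("system", "auto-rollback")
            return True
        return False

    def rollback_to(self, commit_id: int, user: str) -> None:
        """Restore the configuration as it was right after `commit_id`."""
        self._check_lock(user)
        target = next((c for c in self.history if c.commit_id == commit_id), None)
        if target is None:
            raise KeyError(commit_id)
        self.running = dict(target.snapshot)
        self._record(user, f"rollback to {commit_id}")

    def commit_list(self) -> List[Tuple[int, str, Optional[str]]]:
        """Newest first (this model's choice), capped at the last 100 commits."""
        return [(c.commit_id, c.user, c.label) for c in reversed(self.history[-100:])]


store = ConfigStore()
base = store.commit("alice", {"hostname": "XRv-1"}, label="baseline")
store.commit("alice", {"interface Gi0/0/0/0": "shutdown"}, confirmed_secs=60, now=0.0)
print(store.tick(now=61.0), store.running)  # True {'hostname': 'XRv-1'}
```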


These are some useful troubleshooting commands too:

show “protocol” trace [reverse]  –  displays the protocol-specific log file, for instance show pim trace, show ospf trace or show l2vpn trace. Additional options are available depending on the protocol or section you are looking at.

monitor interface “intf”  –  gives you input/output/error packet statistics along with the current rate in Kbps. It works for physical and main interfaces as well as for bundle ports.
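As a rough illustration of where a rate like the one shown by monitor interface comes from, this Python sketch computes an average rate from two byte-counter samples. It assumes 1 kbit = 1000 bits and a counter (64-bit by default) that wraps at most once between samples. The function name is hypothetical, and this is not how IOS-XR itself computes the value.

```python
def rate_kbps(bytes_t0: int, bytes_t1: int, interval_s: float, counter_bits: int = 64) -> float:
    """Average rate between two byte-counter samples, in kbit/s (1 kbit = 1000 bits, assumed).

    The modulo handles a single counter wrap; counter_bits is an assumed counter width.
    """
    if interval_s <= 0:
        raise ValueError("interval must be positive")
    delta_bytes = (bytes_t1 - bytes_t0) % (1 << counter_bits)
    return delta_bytes * 8 / interval_s / 1000


print(rate_kbps(1_000_000, 1_250_000, 2.0))  # 1000.0
```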

As for process management:

show processes cpu  –  very similar to the regular IOS command.

monitor processes  –  a somewhat “top”-like command.

show processes “job id”  –  shows detailed information about a specific process.

process shutdown|restart “job id”  –  shuts down or restarts a process by its name or job ID.