The purpose of this post is to provide a basic description and implementation of Quality of Service techniques, specifically in a Cisco deployment. It will also serve as a QoS study guide for the CCIE R&S exam. Please refer to vendor documentation and third-party literature for more in-depth concepts.
QoS is typically broken out into different aspects, which makes it a little hard to understand at first. The most basic definition is that QoS is a set of techniques for treating different types of traffic (for any kind of application) in different ways, by assigning traffic to classes or priorities. In other words, we manipulate traffic in a determined manner depending on the classification or marking we have previously given to each traffic flow.
Tools that we need to know
Classification
Marking
Congestion Avoidance
Congestion Management
NBAR
Policing
Shaping
We then apply these tools in a hierarchical fashion
Hierarchical Queueing Framework (HQF) and the Modular QoS CLI (MQC)
Consider this: every bit of information we attempt to send out an interface needs to be transformed into electrical or optical signals and placed on the interface hardware in a queue called the TxRing, or hardware queue. The process of doing this is called serialization. This hardware queue always behaves as FIFO (first in, first out).
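To put serialization in perspective, here is a quick back-of-the-envelope sketch (a hypothetical Python helper, not anything from IOS) of how long it takes to clock one frame's bits onto the wire at different access rates:

```python
def serialization_delay_ms(frame_bytes: int, link_bps: int) -> float:
    """Time to serialize one frame out of the TxRing onto the wire."""
    return frame_bytes * 8 / link_bps * 1000

# A 1500-byte packet takes very different times depending on the access rate:
print(serialization_delay_ms(1500, 64_000))         # 187.5 ms on a 64 kbps link
print(serialization_delay_ms(1500, 1_000_000_000))  # 0.012 ms on 1 Gbps
```

This is why queueing decisions matter so much more on slow links: a single large packet can monopolize the wire for a long time.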
This is the topology that we will use for the examples throughout this post:
Classification and Marking
Let’s look at the definitions found in the End-to-End QoS Network Design book:
Classification tools sort packets into different traffic types, to which different policies can then be applied. Classifiers inspect one or more fields in a packet to identify the type of traffic the packet is carrying; once identified, the traffic can be passed on to re-marking, queuing, policing, shaping, etc.
Marking (or re-marking) typically establishes a trust boundary on which scheduling tools later depend. Markers write a field within the packet, frame, cell or label to preserve the classification decision reached at the trust boundary, so that subsequent nodes do not have to perform the same in-depth classification.
What we should take away from this is that classification organizes traffic into classes or categories on the basis of whether the traffic matches specific criteria. For instance, we can use an ACL to match traffic originating from a certain subnet and put all of that traffic in a class named CLASS_SOURCING_A; such a configuration would look like this:
CSR1(config)#ip access-list extended SOURCING_A
CSR1(config-ext-nacl)#permit ip 192.168.10.0 0.0.0.255 any
CSR1(config-ext-nacl)#exit
CSR1(config)#class-map match-any CLASS_SOURCING_A
CSR1(config-cmap)#match access-group name SOURCING_A
This is how we would normally implement classification: we create a class-map and match on a given criterion. These are all the options available on the IOS-XE 15.4 code:
CSR1(config-cmap)#match ?
  access-group         Access group
  any                  Any packets
  application          Application to match
  class-map            Class map
  cos                  IEEE 802.1Q/ISL class of service/user priority values
  destination-address  Destination address
  discard-class        Discard behavior identifier
  dscp                 Match DSCP in IPv4 and IPv6 packets
  group-object         Match object-group
  input-interface      Select an input interface to match
  ip                   IP specific values
  metadata             Metadata to match
  mpls                 Multi Protocol Label Switching specific values
  not                  Negate this match result
  packet               Layer 3 Packet length
  precedence           Match Precedence in IPv4 and IPv6 packets
  protocol             Protocol
  qos-group            Qos-group
  source-address       Source address
  vlan                 VLANs to match
On the other hand, marking allows tuning the attributes of traffic on the network, and those attributes determine how the traffic will be treated. When marking, we manipulate the Differentiated Services Code Point (DSCP) field, Class Selector (CS) code points, Class of Service (CoS), IP Precedence (IPP, in the ToS field) and Traffic Identifier (TID); these are all different terms for designated fields in a L2 or L3 header.
For instance, we can modify the ToS field in the IP header to give different priorities to certain traffic flows. RFC-791 defines the different priorities and values for the first three bits of the ToS field:
Most of the time, we will mark or re-mark with either the IPP or the DSCP codes (using decimal, binary or Per Hop Behavior (PHB) notation). To make it clear: DSCP is a more granular version of IPP. DSCP is made of 6 bits and IPP of 3 bits, and IPP is contained in the high-order bits of DSCP. DSCP also has a PHB nomenclature comprising BE, AF and EF, with EF being the most critical; its decimal value is 46, binary 101110. The following is an image from the End-to-End QoS book, whose Classification and Marking section is highly recommended reading for understanding the DSCP and IPP concepts:
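The DSCP/IPP/PHB relationships above can be captured in a few lines. This is a hypothetical helper of my own (the table and function names are not from any Cisco tool), just to make the bit arithmetic concrete:

```python
# PHB name -> DSCP decimal value (the 6-bit field)
PHB_TO_DSCP = {
    "default": 0,                                 # best effort (CS0)
    **{f"cs{i}": i << 3 for i in range(1, 8)},    # class selectors: IPP in the top 3 bits
    "af11": 10, "af12": 12, "af13": 14,
    "af21": 18, "af22": 20, "af23": 22,
    "af31": 26, "af32": 28, "af33": 30,
    "af41": 34, "af42": 36, "af43": 38,
    "ef": 46,                                     # expedited forwarding
}

def ipp_from_dscp(dscp: int) -> int:
    """IP Precedence is simply the top 3 bits of the 6-bit DSCP."""
    return dscp >> 3

print(PHB_TO_DSCP["ef"], format(PHB_TO_DSCP["ef"], "06b"))  # 46 101110
print(ipp_from_dscp(PHB_TO_DSCP["ef"]))                     # EF maps to IPP 5
print(ipp_from_dscp(PHB_TO_DSCP["cs6"]))                    # CS6 maps to IPP 6
```

Notice that every class selector (CSx) is just IPP x shifted into the DSCP field, which is why the two schemes are backward compatible.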
Also, RFC-4594 describes the service classes configured with DiffServ and recommends how they can be used and how to construct them using DSCPs. Cisco's QoS marking recommendations follow this RFC with one single exception, a swap between CS5 (DSCP decimal 40, IPP 5) and AF31 (DSCP decimal 26, IPP 3). The following image describes these recommendations; another reference for DSCP/PHB/IPP conversion lies here.
Look at the difference in the default DSCP values between an ICMP packet and an EIGRP Hello packet:
Sensibly, ICMP by default receives DSCP CS0, or “best-effort” treatment, while EIGRP Hellos, being the keepalive necessary for EIGRP adjacencies to remain established, receive DSCP CS6.
Let’s generate ICMP traffic from OS1 going to CSR2’s loopback (2.2.2.2); the traffic will go from OS1 to CSR1 and then to CSR2 over the Gi2 link. We will apply a policy-map to CSR1’s Gi3 interface matching the class-default class (this class always exists and always matches all traffic) and re-mark the traffic with DSCP 46 (Expedited Forwarding). Here’s how:
CSR1(config)#policy-map REMARK_ALL
CSR1(config-pmap)#class class-default
CSR1(config-pmap-c)#set ip dscp ?
  <0-63>   Differentiated services codepoint value
  af11     Match packets with AF11 dscp (001010)
  af12     Match packets with AF12 dscp (001100)
  af13     Match packets with AF13 dscp (001110)
  af21     Match packets with AF21 dscp (010010)
  af22     Match packets with AF22 dscp (010100)
  af23     Match packets with AF23 dscp (010110)
  af31     Match packets with AF31 dscp (011010)
  af32     Match packets with AF32 dscp (011100)
  af33     Match packets with AF33 dscp (011110)
  af41     Match packets with AF41 dscp (100010)
  af42     Match packets with AF42 dscp (100100)
  af43     Match packets with AF43 dscp (100110)
  cs1      Match packets with CS1(precedence 1) dscp (001000)
  cs2      Match packets with CS2(precedence 2) dscp (010000)
  cs3      Match packets with CS3(precedence 3) dscp (011000)
  cs4      Match packets with CS4(precedence 4) dscp (100000)
  cs5      Match packets with CS5(precedence 5) dscp (101000)
  cs6      Match packets with CS6(precedence 6) dscp (110000)
  cs7      Match packets with CS7(precedence 7) dscp (111000)
  default  Match packets with default dscp (000000)
  ef       Match packets with EF dscp (101110)
  tunnel   set tunnel packet dscp
CSR1(config-pmap-c)#set ip dscp ef
CSR1(config-pmap-c)#interface Gi3
CSR1(config-if)#service-policy in REMARK_ALL
CSR1(config-if)#do show run | sec policy-map
policy-map REMARK_ALL
 class class-default
  set ip dscp ef
CSR1(config-if)#do show run interface Gi3
interface GigabitEthernet3
 ip address 10.0.15.1 255.255.255.0
 ip ospf 1 area 0
 negotiation auto
 service-policy input REMARK_ALL
end
And if we now capture on CSR1’s Gi2 interface, we can see the re-marked ICMP packets with a DSCP value of EF (PHB notation). We can also see that our policy-map has registered some hits, based on the show policy-map interface Gi3 output:
CSR1(config)#do show policy-map interface Gi3
 GigabitEthernet3

  Service-policy input: REMARK_ALL

    Class-map: class-default (match-any)
      4 packets, 316 bytes
      5 minute offered rate 0000 bps, drop rate 0000 bps
      Match: any
      QoS Set
        ip dscp ef
          Marker statistics: Disabled
Congestion Avoidance
There are a few things we can do to avoid congesting a link. As opposed to congestion management techniques, here we deal with traffic patterns and spikes in order to keep the link from collapsing. Congestion avoidance is achieved through packet dropping; some common dropping approaches are Tail Drop, RED and WRED. Let’s examine each.
Tail Drop
TCP synchronization is the behavior of TCP traffic traversing an interface when the flows’ TCP window sizes are cut in half at the same time and slow starts begin, usually because the interface is dropping large numbers of packets across all flows. This is the so-called sawtooth behavior of TCP traffic; synchronization results in poor bandwidth utilization on the link.
Tail drop can result in global TCP synchronization, which we need to avoid. Tail drop treats all traffic equally and does not differentiate between classes: when an output queue is full and we are tail dropping, arriving packets are dropped until the congestion ceases and the queue is no longer full. Tail drop is the default behavior.
RED and WRED
Random Early Detection works by letting the end hosts know when they should temporarily slow down their traffic flows. Because most traffic across networks is TCP-based, RED takes advantage of this by randomly dropping packets from the queue before the buffer is 100% full, in order to avoid congesting the link. This results in more even traffic patterns (less sawtooth).
In IOS-XE, we can enable RED by adding the random-detect keyword to a policy-map. For instance, let’s enable RED on CSR2’s Gi1 interface by applying a random-detect-enabled policy-map called “RED” to the interface. We will also lower the bandwidth of Gi1 to 10 Kbps, generate a 100 Kbps TCP flow from OS1 directed to CSR3’s Loopback, and on top of that have CSR1 ping CSR3’s Loopback with a 1500-byte packet size. Here’s how:
CSR2#show run | sec policy-map
policy-map RED
 class class-default
  random-detect
CSR2#show run interface Gi1
interface GigabitEthernet1
 bandwidth 10
 ip address 10.0.23.2 255.255.255.0
 negotiation auto
 service-policy output RED
So I’ve let the traffic flow for a while, and now, if we check the output of show policy-map interface Gi1 on CSR2, we can see that we have randomly dropped some packets, both IPP 0 and IPP 5, and we have also tail dropped some of them:
CSR2#show policy-map interface Gi1
 GigabitEthernet1

  Service-policy output: RED

    Class-map: class-default (match-any)
      662879 packets, 49950020 bytes
      5 minute offered rate 82000 bps, drop rate 7000 bps
      Match: any
      queue limit 64 packets
      (queue depth/total drops/no-buffer drops) 0/37500/0
      (pkts output/bytes output) 624288/45119843
      Exp-weight-constant: 4 (1/16)
      Mean queue depth: 0 packets
      class    Transmitted        Random drop     Tail drop        Minimum  Maximum  Mark
               pkts/bytes         pkts/bytes      pkts/bytes       thresh   thresh   prob
      0        5293/7980143       4/6056          1716/2596587     16       32       1/10
      1        0/0                0/0             0/0              18       32       1/10
      2        0/0                0/0             0/0              20       32       1/10
      3        0/0                0/0             0/0              22       32       1/10
      4        0/0                0/0             0/0              24       32       1/10
      5        618995/37139700    1795/107700     33985/2039100    26       32       1/10
      6        0/0                0/0             0/0              28       32       1/10
      7        0/0                0/0             0/0              30       32       1/10
WRED works very similarly, but leverages the DSCP for more granularity: instead of randomly dropping across all flows equally, the dropping is based on the priority of each specific traffic flow. More technically, the drop rate is based on the “mark probability denominator” and increases as the queue depth increases; again, if the queue exceeds the maximum threshold, tail drop begins. WRED is configured in the same fashion as RED, but we add the dscp-based keyword to the random-detect line:
CSR2#show run | sec policy-map
policy-map RED
 class class-default
  random-detect dscp-based
Congestion Management
Congestion Management typically deals with the following scenario: “The link is congested, we have 100% utilization on the output queue, what are we going to do now?” This applies outbound only: once the software queue is full, we need to figure out what to do with the traffic. Do we re-order the frames? Do we sacrifice one for another (drop)? This is what congestion management tools deal with. By default all queues have FIFO behavior, but we can also leverage Weighted Fair Queueing to prioritize traffic that is delay sensitive, for instance VoIP flows.
WFQ / CBWFQ / LLQ
When using Weighted Fair Queuing, the system automatically allocates an equal share of bandwidth to each traffic flow; packets with the same source/destination IPs and TCP/UDP ports belong to the same flow. WFQ is enabled simply by typing the fair-queue keyword per class under the policy-map:
CSR1(config)#policy-map OUTQUEUE
CSR1(config-pmap)#class class-default
CSR1(config-pmap-c)#fair-queue
We will focus on Class-Based Weighted Fair Queuing, because it is the one we would implement using the HQF. CBWFQ simply designates a weighted queue per user-defined class, meaning every time we specify a class under a policy-map, we are enabling WFQ on that class, and the weight for the class is defined by the bandwidth command.
Let’s look at an example. We will create two class-maps matching HTTP and ICMP respectively, and then a policy-map doing bandwidth reservation and LLQ that we will apply outbound on Gi2 (I have deleted all previous class-maps and policy-maps):
CSR1(config)#do show run class-map
class-map match-all MATCH_HTTP
 match protocol http
class-map match-all MATCH_ICMP
 match protocol icmp
CSR1(config)#do show run policy-map
policy-map OUTQUEUE
 class MATCH_HTTP
  bandwidth percent 50
 class MATCH_ICMP
  priority percent 5
CSR1(config-if)#do show run int Gi2
interface GigabitEthernet2
 ip address 10.0.12.1 255.255.255.0
 ip ospf 1 area 0
 negotiation auto
 service-policy output OUTQUEUE
So in the output above, we have defined two classes matching the desired types of traffic based on NBAR (try running show ip nbar port-map), and then created a policy-map calling both classes.
For the HTTP traffic, we have given a bandwidth reservation of 50% (this is based on the software-defined bandwidth of the link the policy-map is applied to). What this means is that 50% is the minimum bandwidth reserved for this type of traffic whenever the link is congested. We could also have used a fixed value like 50 Mbps, but I’d rather use a percentage. The default bandwidth reservation for the class-default class in this code is 1%, and the system will let you allocate up to 100% of the link to user-defined classes; keep in mind that if the link becomes congested while 100% of the bandwidth is allocated to user-defined classes, we could start dropping packets belonging to unclassified traffic.
As for the ICMP traffic, we have given it a priority of 5%. Priority (LLQ) means a guaranteed maximum: in this case, up to 5% of the link’s bandwidth is guaranteed for ICMP traffic. If this traffic goes above 5% while the link is congested, a built-in policer starts dropping the excess. If, on the other hand, the ICMP traffic goes over 5% while the link is not congested, it simply receives FIFO treatment.
So if we start generating some HTTP and ICMP traffic from OS1 going to CSR2’s Loopback, we should start seeing some counters incrementing when we do a show policy-map interface Gi2:
CSR1#show policy-map interface Gi2
 GigabitEthernet2

  Service-policy output: OUTQUEUE

    queue stats for all priority classes:
      Queueing
      queue limit 512 packets
      (queue depth/total drops/no-buffer drops) 0/0/0
      (pkts output/bytes output) 2005/228570

    Class-map: MATCH_HTTP (match-all)
      213 packets, 17804 bytes
      5 minute offered rate 0000 bps, drop rate 0000 bps
      Match: protocol http
      Queueing
      queue limit 64 packets
      (queue depth/total drops/no-buffer drops) 0/0/0
      (pkts output/bytes output) 213/17804
      bandwidth 50% (500000 kbps)

    Class-map: MATCH_ICMP (match-all)
      2005 packets, 228570 bytes
      5 minute offered rate 0000 bps, drop rate 0000 bps
      Match: protocol icmp
      Priority: 5% (50000 kbps), burst bytes 1250000, b/w exceed drops: 0

    Class-map: class-default (match-any)
      3976 packets, 313933 bytes
      5 minute offered rate 0000 bps, drop rate 0000 bps
      Match: any
      queue limit 64 packets
      (queue depth/total drops/no-buffer drops) 0/0/0
      (pkts output/bytes output) 320/19447
Policing
Policers are typically used to rate-limit traffic as it enters an interface. They can also be applied outbound, but in most cases we would use them to, for instance, limit the rate at which traffic coming from a customer enters a Provider Edge (PE) router interface. Policers work on the token bucket model, a formal definition of a rate of transfer composed of three components: the committed burst (bc), the committed information rate (cir) and the time interval (tc).
The cir is, in most cases, the value we consider when limiting the traffic; the bc specifies, in bits or bytes per burst, how much traffic can be sent within a given unit of time; and the tc specifies the amount of time between bursts. Traffic arriving at a given rate is said to conform if it falls within the cir; otherwise, it is said to exceed.
cir = bc / tc
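As a sanity check on this formula, here is a small sketch of my own (the function name is hypothetical) computing the interval for a bc of 1500 bytes at a cir of 24 Kbps:

```python
def tc_seconds(bc_bits: int, cir_bps: int) -> float:
    """Time interval between bursts: since cir = bc / tc, tc = bc / cir."""
    return bc_bits / cir_bps

# A bc of 1500 bytes (12000 bits) at a cir of 24 kbps:
tc = tc_seconds(1500 * 8, 24000)
print(tc * 1000, "ms")           # 500.0 ms per interval
print(1 / tc, "intervals/sec")   # the policer refills twice per second
```

Lowering bc shrinks tc proportionally, which is exactly the lever used later to make a policer stricter.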
Let’s look at an example. We will police on CSR3’s Gi3 interface to rate-limit incoming traffic matching everything (class-default) to 24 Kbps (the minimum allowed on the code I’m using is 8 Kbps). By default, conforming traffic gets marked to be transmitted and exceeding traffic gets marked to be dropped. Dropping is a marking action; we could instead mark the packets to receive a lower priority past a certain threshold.
CSR3#show run policy-map
policy-map POLICER
 class class-default
  police 24000
CSR3#show run interface Gi3
interface GigabitEthernet3
 ip address 10.0.36.3 255.255.255.0
 negotiation auto
 service-policy input POLICER
And if we look at the show policy-map interface command, we see that the bc value has been automatically calculated for us:
 GigabitEthernet3

  Service-policy input: POLICER

    Class-map: class-default (match-any)
      25458 packets, 1527480 bytes
      5 minute offered rate 0000 bps, drop rate 0000 bps
      Match: any
      police:
          cir 24000 bps, bc 1500 bytes
        conformed 1726 packets, 103560 bytes; actions:
          transmit
        exceeded 23732 packets, 1423920 bytes; actions:
          drop
        conformed 0000 bps, exceeded 0000 bps
This is what’s called a two-color marker; the two colors are conform and exceed. A bc of 1500 bytes means we are allowed to receive 1500 bytes per interval in order to achieve the target cir of 24 Kbps. If we convert 1500 bytes to bits we get 12000 bits, and because tc = bc / cir, we get tc = 0.5 s, or 500 ms, meaning the policer runs twice per second (1000 ms / 500 ms = 2).
So if we wanted to make this policer stricter, we would lower the bc in order to get a lower tc. In any case, a user-specified tc should ideally fall between 1 and 125 ms; otherwise, the router will automatically determine an internal tc value that it believes will be more stable.
In the output above, we can see that 1726 packets conformed and were transmitted, while 23732 packets did not. We also could have said: if the cir is exceeded, do not drop the packets, but instead re-mark them into a scavenger class or give them best-effort treatment, like this:
policy-map POLICER
 class class-default
  police cir 24000 conform-action transmit exceed-action set-prec-transmit 1

or ...

policy-map POLICER
 class class-default
  police cir 24000 conform-action transmit exceed-action set-dscp-transmit 0
Shaping
A shaper typically delays excess traffic by using a buffer or queueing mechanism to hold packets whenever the transmission rate exceeds a certain user-defined threshold. A good shaper will in most cases match the settings of the policer at the other end; for instance, we could use a shaper outbound to match the cir and bc at which the service provider is policing us on the gateway interface. Shapers work by sending bc worth of data every tc interval at the physical port speed (serialization). The default queue type on a shaper is FIFO, but we can enable WFQ inside the shaper.
So the goals of a shaper could be summarized as: smooth out traffic bursts, prepare traffic for ingress policing and delay/queue up exceeding traffic.
Let’s go into CSR2 and create a shaper on Gi3, we will assume we want to match a 5 Mbps policer on our GW to the internet:
policy-map SHAPER
 class class-default
  fair-queue
  shape average 5000000

interface GigabitEthernet3
 ip address 192.168.0.252 255.255.255.0
 negotiation auto
 service-policy output SHAPER
Then if we look at the show policy-map interface Gi3 output:
 GigabitEthernet3

  Service-policy output: SHAPER

    Class-map: class-default (match-any)
      705 packets, 51137 bytes
      5 minute offered rate 0000 bps, drop rate 0000 bps
      Match: any
      Queueing
      queue limit 64 packets
      (queue depth/total drops/no-buffer drops/flowdrops) 0/0/0/0
      (pkts output/bytes output) 427/30531
      Fair-queue: per-flow queue limit 16 packets
      shape (average) cir 5000000, bc 20000, be 20000
      target shape rate 5000000
We can see that the bc and be values have been automatically calculated for us, and notice that be = bc; they will always be equal unless be is manually specified. be is the excess burst: going back to our token bucket model, it is the amount of bits allowed to be transmitted if the bc bucket did not get emptied completely during a previous tc interval. Also notice the queue limit is 64 packets, which we could easily change with the queue-limit command under the class.
So now Gi3 will be allowed to send bursts of 20000 bits every tc interval; remember that these bursts are always serialized at the access rate (the actual speed of the interface).
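The auto-calculated bc can be reproduced with the same token bucket arithmetic. This sketch assumes the platform derives bc from a 4 ms default interval, which is consistent with the bc of 20000 bits shown in the output above for a 5 Mbps cir (the function is mine, not an IOS formula quoted from documentation):

```python
def shaper_bc_bits(cir_bps: int, tc_seconds: float) -> int:
    """Bits released per interval: bc = cir * tc."""
    return round(cir_bps * tc_seconds)

# cir of 5 Mbps with a 4 ms interval yields the bc seen in the show output:
print(shaper_bc_bits(5_000_000, 0.004))      # 20000 bits per tc

# Each 20000-bit burst is serialized at the access rate (say, 1 Gbps),
# then the interface sits idle until the next tc:
print(20000 / 1_000_000_000 * 1000, "ms")    # 0.02 ms of actual wire time
```

This idle-then-burst pattern is exactly why shaping "smooths" traffic: the average over each tc is the cir, even though every individual burst goes out at line rate.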
HQF
If you have read all the way to this point: HQF is what we have been doing all along. HQF basically refers to the “new way”, or the new syntax, for configuring QoS on Cisco devices: a nested structure using class-maps to match and classify the traffic, policy-maps to manipulate those classes, and the service-policy command to apply the policy-maps.
This comes in handy when, for instance, we have logical interfaces like subinterfaces and tunnel interfaces: we can implement a traffic-limiting feature at the parent level and queueing at the lower levels. Let’s look at an example.
We will create nested policies: the parent policy will be used to shape the cir on an interface to whatever we want, and then we will allocate different bandwidth reservations to different services, like VoIP or multicast PIM traffic:
CSR2#show run policy-map
policy-map SERVICES
 class MATCH_VOIP
  bandwidth percent 50
 class MATCH_MC
  bandwidth percent 30
policy-map SHAPE_100M
 class class-default
  shape average 100000000
  service-policy SERVICES
So we have defined a parent policy-map called SHAPE_100M, which shapes all traffic to 100 Mbps (because it matches class-default), and from this parent policy we have called the child policy SERVICES, which matches on different protocols and allocates a different percentage of bandwidth to each. This is how you create nested policies using the HQF; the bandwidth allocations for the different services are based on the parent policy, which is shaping at 100 Mbps:
 GigabitEthernet3

  Service-policy output: SHAPE_100M

    Class-map: class-default (match-any)
      0 packets, 0 bytes
      5 minute offered rate 0000 bps, drop rate 0000 bps
      Match: any
      Queueing
      queue limit 64 packets
      (queue depth/total drops/no-buffer drops) 0/0/0
      (pkts output/bytes output) 0/0
      shape (average) cir 100000000, bc 400000, be 400000
      target shape rate 100000000

      Service-policy : SERVICES

        Class-map: MATCH_VOIP (match-all)
          0 packets, 0 bytes
          5 minute offered rate 0000 bps, drop rate 0000 bps
          Match: protocol rtp
          Queueing
          queue limit 64 packets
          (queue depth/total drops/no-buffer drops) 0/0/0
          (pkts output/bytes output) 0/0
          bandwidth 50% (50000 kbps)

        Class-map: MATCH_MC (match-all)
          0 packets, 0 bytes
          5 minute offered rate 0000 bps, drop rate 0000 bps
          Match: protocol pim
          Queueing
          queue limit 64 packets
          (queue depth/total drops/no-buffer drops) 0/0/0
          (pkts output/bytes output) 0/0
          bandwidth 30% (30000 kbps)

        Class-map: class-default (match-any)
          0 packets, 0 bytes
          5 minute offered rate 0000 bps, drop rate 0000 bps
          Match: any
          queue limit 64 packets
          (queue depth/total drops/no-buffer drops) 0/0/0
          (pkts output/bytes output) 0/0
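The per-class rates in that output can be derived directly: with nested policies, a child class's bandwidth percent is computed against the parent shaper's rate, not the physical interface speed. A tiny sketch of that arithmetic (the helper name is mine):

```python
def child_reservation_kbps(parent_shape_bps: int, percent: int) -> int:
    """A child class's 'bandwidth percent' is taken from the parent shaper's
    rate, not from the physical interface speed."""
    return parent_shape_bps * percent // 100 // 1000

# Parent shaping at 100 Mbps:
print(child_reservation_kbps(100_000_000, 50))  # 50000 kbps for MATCH_VOIP
print(child_reservation_kbps(100_000_000, 30))  # 30000 kbps for MATCH_MC
```

This matches the "bandwidth 50% (50000 kbps)" and "bandwidth 30% (30000 kbps)" lines in the show output, even though the underlying GigabitEthernet interface is much faster than 100 Mbps.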