Intro to QoS // The HQF – FIRE PROTOCOL

The purpose of this post is to provide a basic description and implementation of Quality of Service techniques, specifically in a Cisco deployment, additionally, this will serve as a QoS study guide for the CCIE R&S exam. Please refer to vendor documentation and third party literature for more in-depth concepts.

QoS is typically broken out into different aspects, which makes it a little hard to understand at the beginning, the most basic definition would be, that QoS is a way or technique for treating multiple types of traffic (for any kind of application) in different ways, providing different types of classes or traffic priorities, meaning we will manipulate the traffic in a determined manner depending on the classification or marking that we have previously given to each traffic flow.

Tools that we need to know

Classification

Marking

Congestion Avoidance

Congestion Management

NBAR

Policing

Shaping

We then apply these tools in a hierarchical fashion

Hierarchical Queueing Framework or Modular Queueing CLI

Consider this, every bit of information that we attempt to send out an interface, needs to be transformed into electric or optical signals and be put on the interface hardware in a queue called the TxRing or hardware queue, the process of doing this is called serialization, also, this hardware queue is always going to have a FIFO behavior or first in first out.

This is the topology that we will use for the examples throughout this post:

Classification and Marking

Let’s look at the definitions found on End-to-End QoS Network Design book:

Classification tools sort packets into different traffic types, to which different policies then, can be applied. Classifiers inspect one or more fields in a packet to identify the type of traffic that the packet is carrying, after being identified the traffic can then be passed to re-marking, queuing, policing, shaping, etc.

Marking (or re-marking) typically establishes a trust boundary on which scheduling tools later depend. Markers write a field within the packet, frame, cell or label to preserve the classification decision that was reached at the trust boundary, subsequent nodes do not have to perform the same in-depth classification.

What we should take away from here, is that classification allows organizing traffic into traffic classes or categories on the basis of whether the traffic matches specific criteria or not, for instance, we can use an ACL to match traffic originating from a certain subnet and put all this traffic in a class named CLASS_SOURCING_A , such configuration would look like this:

CSR1(config)#ip access-list extended SOURCING_A

CSR1(config-ext-nacl)#permit ip 192.168.10.0 0.0.0.255 any

CSR1(config-ext-nacl)#exit

CSR1(config)#class-map match-any CLASS_SOURCING_A

CSR1(config-cmap)#match access-group name SOURCING_A

This is how we would normally implement classification, we create a class-map and match on a given criteria, these are all the options that we have available on IOS-XE 15.4 code:

CSR1(config-cmap)#match ?

access-group Access group

any Any packets

application Application to match

class-map Class map

cos IEEE 802.1Q/ISL class of service/user priority values

destination-address Destination address

discard-class Discard behavior identifier

dscp Match DSCP in IPv4 and IPv6 packets

group-object Match object-group

input-interface Select an input interface to match

ip IP specific values

metadata Metadata to match

mpls Multi Protocol Label Switching specific values

not Negate this match result

packet Layer 3 Packet length

precedence Match Precedence in IPv4 and IPv6 packets

protocol Protocol

qos-group Qos-group

source-address Source address

vlan VLANs to match

On the other hand, marking allows tuning the attributes for the traffic on the network, which determines how the traffic will be treated, based on how the attributes for the network traffic are set. When marking, we will manipulate the Differentiated Services Code Point (DSCP) field, Class Selector (CS) code point, Class of Service (CoS), IP Precedence (IPP in ToS field) and Traffic Identifier (TID), these are all different terms used to indicate a designated field in a L2 or L3 header.

For instance, we can modify the ToS field in the IP header to give different priorities to certain traffic flows, RFC-791 indicates the different priorities and values on the ToS field first three bits:

Most of the times, we will mark or re-mark with either the IPP or the DSCP codes (using either decimal, binary or Per Hop Behavior (PHB) notation). To make it clear, DSCP is a more granular version of IPP, DSCP is made of 6 bits and IPP 3 bits, IPP is contained in DSCP, and at the same time, DSCP has a PHB nomenclature comprised of BE, AF and EF, being EF the most critical and its decimal value is 46, binary is 101110. The following is an image from End-to-End QoS book, who’s Classification and Marking section is a highly recommended reading in order to understand the DSCP and IPP concepts:

Also, RFC-4594 describes the service classes configured with DiffServ and recommends how they can be used and how to construct them using DSCPs. Cisco QoS marking recommendations follow this RFC with one single exception between CS5 (DSCP decimal 40 or IPP 5) and AF31 (DSCP decimal 26 or IPP 3), the following is an image describing these recommendations, also another reference for DSCP/PHB/IPP conversion lies here.

Look at the difference in the default DSCP values between an ICMP packet and an EIGRP Hello packet:

Intelligently, ICMP by default receives DSCP CS0 or “Best-Effort” behavior, while EIGRP Hellos being a necessary keepalive for EIGRP adjacencies to remain established, receive DSCP CS6.

Let’s generate ICMP traffic from OS1 going to CSR2’s loopback (2.2.2.2), the traffic would go from OS1 to CSR1 and then to CSR2 through G21 link, we will apply a policy-map to CSR1’s Gi3 interface matching a class-default class (this class is always there and is always matching all traffic) and remark the traffic with DSCP 46 (Expedited Forwarding), here’s how:

CSR1(config)#policy-map REMARK_ALL

CSR1(config-pmap)#class class-default

CSR1(config-pmap-c)#set ip dscp ?

<0-63> Differentiated services codepoint value

af11 Match packets with AF11 dscp (001010)

af12 Match packets with AF12 dscp (001100)

af13 Match packets with AF13 dscp (001110)

af21 Match packets with AF21 dscp (010010)

af22 Match packets with AF22 dscp (010100)

af23 Match packets with AF23 dscp (010110)

af31 Match packets with AF31 dscp (011010)

af32 Match packets with AF32 dscp (011100)

af33 Match packets with AF33 dscp (011110)

af41 Match packets with AF41 dscp (100010)

af42 Match packets with AF42 dscp (100100)

af43 Match packets with AF43 dscp (100110)

cs1 Match packets with CS1(precedence 1) dscp (001000)

cs2 Match packets with CS2(precedence 2) dscp (010000)

cs3 Match packets with CS3(precedence 3) dscp (011000)

cs4 Match packets with CS4(precedence 4) dscp (100000)

cs5 Match packets with CS5(precedence 5) dscp (101000)

cs6 Match packets with CS6(precedence 6) dscp (110000)

cs7 Match packets with CS7(precedence 7) dscp (111000)

default Match packets with default dscp (000000)

ef Match packets with EF dscp (101110)

tunnel set tunnel packet dscp

CSR1(config-pmap-c)#set ip dscp ef

0 1	CSR1(config-pmap-c)#interface Gi3 CSR1(config-if)#service-policy in REMARK_ALL

CSR1(config-if)#do show run | sec policy-map

policy-map REMARK_ALL

class class-default

set ip dscp ef

CSR1(config-if)#do show run interface Gi3

interface GigabitEthernet3

ip address 10.0.15.1 255.255.255.0

ip ospf 1 area 0

negotiation auto

service-policy input REMARK_ALL

end

And if we now do a capture on CSR1’s Gi2 interface, we would be able to see the re-marked ICMP packets with a DSCP value of EF (PHB notation), we can also see that our policy-map has gotten some hits, based on the show policy-map interface Gi3 output:

CSR1(config)#do show policy-map interface Gi3

GigabitEthernet3

Service-policy input: REMARK_ALL

Class-map: class-default (match-any)

4 packets, 316 bytes

5 minute offered rate 0000 bps, drop rate 0000 bps

Match: any

QoS Set

ip dscp ef

Marker statistics: Disabled

Congestion Avoidance

There are a few things we can do to avoid having a congested link, as opposed to congestion management techniques, here, we are going to deal with traffic patterns and spikes to avoid collapsing the link. Congestion avoidance is achieved through packet dropping, some of the common ways of dropping packets are Tail Drop, RED and WRED, let’s examine each.

Tail Drop

TCP Synchronization is the behavior of the TCP traffic traversing an interface when such traffic’s TCP Window size starts cutting in half all at the same time and slow starts begin, usually the interface will drop tons of packets per flow, this is the so-called sawtooth behavior of TCP traffic, synchronization incurs poor bandwidth performance for the link.

Tail drop can result in global TCP synchronization, which we need to avoid, tail drop treats all traffic equally and does not differentiate between classes, when an output queue is full and we are doing tail dropping, packets are dropped until the congestion ceases or the queue is no longer full. Tail drop is enabled by default.

RED and WRED

Random Early Detection works by letting the end hosts know when they should temporarily slow down on the traffic flow, because most of the traffic across networks is TCP based, RED takes advantage of this by randomly dropping packets from the queue before the buffer is 100% full, in order to avoid the congestion of a link, this results in more even traffic patterns (less sawtooth).

In IOS-XE, we can enable RED by adding random-detect keyword to a policy-map, for instance, let’s enable RED on CSR2’s Gi1 interface by applying a random-detect enabled policy-map called “RED” on the interface, we will also lower the bandwidth of Gi1 to 10 Kbps and then let’s generate a 100 Kbps TCP flow of traffic from OS1 directed to CSR3’s Loopback, on top of that CSR1 will ping CSR3’s Loopack with a 1500 bytes size, here’s how:

CSR2#show run | sec policy-map

policy-map RED

class class-default

random-detect

CSR2#show run interface Gi1

interface GigabitEthernet1

bandwidth 10

ip address 10.0.23.2 255.255.255.0

negotiation auto

service-policy output RED

So I’ve let the traffic flow for a while and now, if we check the output of show policy-map interface Gi1 on CSR2, we can see that we have randomly dropped some packets, both IPP 0 and IPP 5, and we have also tail dropped some of them:

CSR2#show policy-map interface Gi1

GigabitEthernet1

Service-policy output: RED

Class-map: class-default (match-any)

662879 packets, 49950020 bytes

5 minute offered rate 82000 bps, drop rate 7000 bps

Match: any

queue limit 64 packets

(queue depth/total drops/no-buffer drops) 0/37500/0

(pkts output/bytes output) 624288/45119843

Exp-weight-constant: 4 (1/16)

Mean queue depth: 0 packets

class Transmitted Random drop Tail drop Minimum Maximum Mark

pkts/bytes pkts/bytes pkts/bytes thresh thresh prob

0 5293/7980143 4/6056 1716/2596587 16 32 1/10

1 0/0 0/0 0/0 18 32 1/10

2 0/0 0/0 0/0 20 32 1/10

3 0/0 0/0 0/0 22 32 1/10

4 0/0 0/0 0/0 24 32 1/10

5 618995/37139700 1795/107700 33985/2039100 26 32 1/10

6 0/0 0/0 0/0 28 32 1/10

7 0/0 0/0 0/0 30 32 1/10

WRED works very similar, but instead we leverage the DSCP for more granularity and instead of randomly dropping traffic flows, the dropping is going to be based on the priority of each specific traffic flow, more technically, the drop rate is based on the “Mark Probability Denominator”, which increases as queue depth increases, again, if the queue exceeds the maximum, tail drop starts occurring. WRED is configured in the same fashion as RED but we are going to add dscp-based keyword to the random-detect line:

CSR2#show run | sec policy-map

policy-map RED

class class-default

random-detect dscp-based

Congestion Management

Congestion Management typically deals with the following scenario: “The link is congested, we have 100% utilization on the output queue, what are we going to do now ?”. This applies outbound only, once the software queue is full, we need to figure out what to do with the traffic, do we re-order the frames ? do we sacrifice one for another (drop) ? This is what congestion management tools deal with, by default all queues have a FIFO behavior, but we can also leverage Weighted Fair Queueing to prioritize some traffic that is delay sensitive, for instance, the VoIP flows.

WFQ / CBWFQ / LLQ

When using Weighted Fair Queuing, the system will automatically allocate an equal share of bandwidth to each traffic flow, packets with the same source/destination IP and TCP/UDP ports belong to the same flow. WFQ is simply enabled by typing the fair-queue keyword per class under the policy-map:

CSR1(config)#policy-map OUTQUEUE

CSR1(config-pmap)#class class-default

CSR1(config-pmap-c)#fair-queue

We will focus on Class Based Weighted Fair Queuing, because it is the one we would implement using the HQF, CBWFQ simply designates a weighted queue per user-defined class, meaning every time we specify a class under a policy-map, we are enabling WFQ on that class, and the weight for this class will be defined by the bandwidth command.

Let’s look at an example, we will create two class-maps matching HTTP and ICMP respectively, and then a policy-map doing bandwidth reservation and LLQ that we will apply to Gi2 outbound (I have deleted every previous class-map and/or policy-maps):

CSR1(config)#do show run class-map

class-map match-all MATCH_HTTP

match protocol http

class-map match-all MATCH_ICMP

match protocol icmp

CSR1(config)#do show run policy-map

policy-map OUTQUEUE

class MATCH_HTTP

bandwidth percent 50

class MATCH_ICMP

priority percent 5

CSR1(config-if)#do show run int Gi2

interface GigabitEthernet2

ip address 10.0.12.1 255.255.255.0

ip ospf 1 area 0

negotiation auto

service-policy output OUTQUEUE

So in the output above, we have defined two classes matching the desired type of traffic, based on NBAR (try doing a show ip nbar port-map) and then created a policy-map calling both classes.

For the HTTP traffic, we have given it a bandwidth reservation of 50% (this is based on the software defined bandwidth of the link that we will apply this policy-map to), what this means is that the minimum bandwidth that will be reserved for this type of traffic in times when the link is congested, will be 50%. We could have also used a fixed value like 50 Mbps, but I rather use percentage. The default bandwidth reservation for the class-default class in this code is 1%, but the system will let you allocate up to 100% of the link for user-defined classes, keep in mind that should the link become congested and 100% of the BW is class-based defined, we could start dropping packets belonging to unclassified traffic.

As far as the ICMP traffic, we have given it a priority of 5%, priority (LLQ) means the maximum percentage guaranteed, so in this case, up 5% of the bandwidth of the link is guaranteed for ICMP traffic, if this type of traffic goes above 5% and the link is congested, a built-in policer could start dropping traffic based on whether the TxRing is full at that moment or not. If on the other hand, the ICMP traffic goes over 5% of the link and the link is not congested, it will start receiving a FIFO treatment.

So if we start generating some HTTP and ICMP traffic from OS1 going to CSR2’s Loopback, we should start seeing some counters clocking when we do a show policy-map interface Gi2:

CSR1#show policy-map interface Gi2

GigabitEthernet2

Service-policy output: OUTQUEUE

queue stats for all priority classes:

Queueing

queue limit 512 packets

(queue depth/total drops/no-buffer drops) 0/0/0

(pkts output/bytes output) 2005/228570

Class-map: MATCH_HTTP (match-all)

213 packets, 17804 bytes

5 minute offered rate 0000 bps, drop rate 0000 bps

Match: protocol http

Queueing

queue limit 64 packets

(queue depth/total drops/no-buffer drops) 0/0/0

(pkts output/bytes output) 213/17804

bandwidth 50% (500000 kbps)

Class-map: MATCH_ICMP (match-all)

2005 packets, 228570 bytes

5 minute offered rate 0000 bps, drop rate 0000 bps

Match: protocol icmp

Priority: 5% (50000 kbps), burst bytes 1250000, b/w exceed drops: 0

Class-map: class-default (match-any)

3976 packets, 313933 bytes

5 minute offered rate 0000 bps, drop rate 0000 bps

Match: any

queue limit 64 packets

(queue depth/total drops/no-buffer drops) 0/0/0

(pkts output/bytes output) 320/19447

Policing

Policers are typically used to rate-limit the traffic as it enters an interface, they can also be applied outbound but in most cases, we would use them to, for instance, limit the rate at which traffic coming from a customer enters a Provider Edge (PE) router interface. The way policers work is such as Token Bucket, which is a formal definition of a rate of transfer, this is composed of three components, the burst committed (bc), the committed information rate (cir) and the time interval (tc).

The cir will be in most cases the one we will consider when limiting the traffic, the bc specifies in bytes/bits per burst how much traffic can be sent within a given unit of time, and the tc specifies the amount of time between each burst. Traffic flow at a given rate per second is said to be conformed if it falls within the cir rate, otherwise, it is said to be exceeded.

cir = bc / tc

Let’s look at an example, we will police on CSR3’s Gi3 interface to rate-limit traffic coming in matching everything (class-default) to 8 Kbps, which is the minimum allowed in this code that I’m using, by default conformed traffic gets marked to transmit and exceeded traffic gets marked to be dropped, dropping is a marking action, we could also mark it so that the packet receives a low priority after a certain threshold instead.

CSR3#show run policy-map

policy-map POLICER

class class-default

police 24000

CSR3#show run interface Gi3

interface GigabitEthernet3

ip address 10.0.36.3 255.255.255.0

negotiation auto

service-policy input POLICER

And if we look at the show policy-map interface command, we see that the bc value has been automatically calculated for us:

GigabitEthernet3

Service-policy input: POLICER

Class-map: class-default (match-any)

25458 packets, 1527480 bytes

5 minute offered rate 0000 bps, drop rate 0000 bps

Match: any

police:

cir 24000 bps, bc 1500 bytes

conformed 1726 packets, 103560 bytes; actions:

transmit

exceeded 23732 packets, 1423920 bytes; actions:

drop

conformed 0000 bps, exceeded 0000 bps

This is what’s called a 2 color marker, the two markers are conformed and exceeded. The bc being 1500 bytes means we are allowed to receive 1500 bytes on a per interval basis in order to achieve the target cir of 24 Kbps. So if we transform 1500 bytes to bits, we get 12000 bits, and because tc = bc/cir, we can conclude that tc = 0.5 which is the same as 500ms, meaning the policer will run 2 times per second because 1000 ms / 2 = 500 ms.

So if we wanted to make this policer more strict, we would lower the bc in order to get a lower tc, but in any case, the user-manipulated tc ideally would have to be within 1 and 125ms, otherwise the router will automatically determine an internal tc value that it believes will be more stable with.

In the output above, we can see that 1726 packets have been conformed and transmitted, and 23732 packets have not. We also could have said something like, if the cir is exceeded, we will not drop the packets but instead will remark them to be assigned to a scavenger class or give them a best-effort behavior, like this:

policy-map POLICER

class class-default

police cir 24000

conform-action transmit

exceed-action set-prec-transmit 1

or ...

policy-map POLICER

class class-default

police cir 24000

conform-action transmit

exceed-action set-dscp-transmit 0

Shaping

A shaper typically delays excess of traffic by using a buffer or queueing mechanism to hold down the packets whenever the rate of transmission exceeds a certain user-defined threshold. A good shaper in most cases will match the settings of the counter policer, for instance, we could use a shaper outbound to match the cir and bc at which the service provider is policing us on the gateway interface. The shapers work by sending the bc amount of data every tc interval at the physical port speed (serialization). The default queue type on a shaper is FIFO, but we can enable WFQ inside the shaper.

So the goals of a shaper could be summarized as: smooth out traffic bursts, prepare traffic for ingress policing and delay/queue up exceeding traffic.

Let’s go into CSR2 and create a shaper on Gi3, we will assume we want to match a 5 Mbps policer on our GW to the internet:

policy-map SHAPER

class class-default

fair-queue

shape average 5000000

interface GigabitEthernet3

ip address 192.168.0.252 255.255.255.0

negotiation auto

service-policy output SHAPER

Then if we look at the show policy-map interface Gi3 output:

GigabitEthernet3

Service-policy output: SHAPER

Class-map: class-default (match-any)

705 packets, 51137 bytes

5 minute offered rate 0000 bps, drop rate 0000 bps

Match: any

Queueing

queue limit 64 packets

(queue depth/total drops/no-buffer drops/flowdrops) 0/0/0/0

(pkts output/bytes output) 427/30531

Fair-queue: per-flow queue limit 16 packets

shape (average) cir 5000000, bc 20000, be 20000

target shape rate 5000000

We can see that the bc and be values have been automatically calculated for us, and notice be = bc, they are always going to be the same unless be is manually specified, be is the burst excess, back to our Token Bucket model, be is the amount of bits allowed to be transmitted if the bc bucket did not get emptied completely after a tc interval. Also notice the queue limit is 64 packets but we could have changed this easily by adding the queue-limit keyword to the fair-queue command under the policer.

So now Gig3 will be allowed to send out bursts of 20 Kbits every tc interval, remember that these bursts are always sent at the Access Rate (the actual speed of the interface).

HQF

If you have read all the way to this point, HQF is what we have been doing all this time, HQF basically refers to the “new way” or the new syntax when configuring QoS in Cisco devices, which is a nested structure using class-maps to match and classify the traffic, policy-maps to manipulate these classes and then applying these policy-maps using the service-policy command.

This comes handy when for instance, we have logical interfaces like subinterfaces and tunnel interfaces, we can implement a traffic-limiting feature at the parent level and queueing at the lower levels, let’s look at an example.

We will create nested policies, the parent policy will be used to shape the cir on an interface to whatever we want, and then we will allocate different bandwidth reservations to different services, like VoIP or MC PIM traffic:

CSR2#show run policy-map

policy-map SERVICES

class MATCH_VOIP

bandwidth percent 50

class MATCH_MC

bandwidth percent 30

policy-map SHAPE_100M

class class-default

shape average 100000000

service-policy SERVICES

So we have defined a parent policy-map called SHAPE_100M, which shapes all traffic (because it is matching class-default) to 100 Mbps and from this parent policy we have called the child policy called SERVICES, this policy-map called SERVICES matches on different protocols and allocates a different percentage of bandwidth to each. This is how you would create nested policies using the HQF, the bandwidth allocations for the different services will be based on the parent policy, which is shaping at 100 Mbps:

GigabitEthernet3

Service-policy output: SHAPE_100M

Class-map: class-default (match-any)

0 packets, 0 bytes

5 minute offered rate 0000 bps, drop rate 0000 bps

Match: any

Queueing

queue limit 64 packets

(queue depth/total drops/no-buffer drops) 0/0/0

(pkts output/bytes output) 0/0

shape (average) cir 100000000, bc 400000, be 400000

target shape rate 100000000

Service-policy : SERVICES

Class-map: MATCH_VOIP (match-all)

0 packets, 0 bytes

5 minute offered rate 0000 bps, drop rate 0000 bps

Match: protocol rtp

Queueing

queue limit 64 packets

(queue depth/total drops/no-buffer drops) 0/0/0

(pkts output/bytes output) 0/0

bandwidth 50% (50000 kbps)

Class-map: MATCH_MC (match-all)

0 packets, 0 bytes

5 minute offered rate 0000 bps, drop rate 0000 bps

Match: protocol pim

Queueing

queue limit 64 packets

(queue depth/total drops/no-buffer drops) 0/0/0

(pkts output/bytes output) 0/0

bandwidth 30% (30000 kbps)

Class-map: class-default (match-any)

0 packets, 0 bytes

5 minute offered rate 0000 bps, drop rate 0000 bps

Match: any

queue limit 64 packets

(queue depth/total drops/no-buffer drops) 0/0/0

(pkts output/bytes output) 0/0