Wednesday, December 23, 2009

MPLS Ping and Traceroute

Intro
In this post we discuss how the “traceroute” and “ping” commands operate in an MPLS environment. The reader is expected to have a solid understanding of MPLS VPN technologies before reading this document. Note that the terms “MPLS ping/traceroute” and “LSP ping/traceroute” are used interchangeably.
The following is the testbed topology we are going to use for simulations. All PE/P routers are 7206s running IOS version 12.0(33)S. Unfortunately, MPLS ping and traceroute commands are just a recent addition to IOS code, and thus you only see them in later 12.4T versions and recent 12.0S images. The IOS versions currently used in the CCIE SP lab do not support the MPLS ping/trace features.

Classic Ping and Traceroute

Two classic network utilities – ping and traceroute – form the basis of almost any layer 3 troubleshooting process. The first operation allows validating end-to-end connectivity and the second allows detecting the point where the network path breaks. The ping operation is relatively simple, so first we are going to quickly recap how the traceroute operation works.
1) The originating router emits UDP packets (or ICMP echo packets, for Windows traceroute) with [destination IP] = [target IP] (where we are tracing to) and [destination port] in the range 33434-33464. This range might be different; the only requirement is that it should cover ports not used by any application on the target machine.
2) The first packet has IP TTL=1 and UDP port=33434 (or any other first port in the selected range). Each subsequent probe has the TTL value and the port number incremented by one (the core part is incrementing the TTL by one every time the host sends a new probe).
3) Every transit node on the path to the destination forwards the probe using regular IP routing rules. If the TTL in the packet expires, the respective router drops the packet and returns an ICMP “TTL expired” error message. It also includes the header of the original packet in the response.
Based on this response, the originating node may determine the following:
a) IP address of the transit node.
b) The hop count to the transit node, based on the port number in the encapsulated payload or the internal probe counter.
4) If the packet finally reaches the ultimate destination, the destination host will try to deliver the UDP datagram to the (hopefully) unused UDP port. Since the port is unused, the host will return an ICMP port-unreachable message, encapsulating part of the original packet in the response. Based on the response, the originating node may determine the following:
a) The node returning the message is the ultimate destination, since it returned the ICMP port-unreachable.
b) The hop count to reach the final node.
The core procedure is incrementing TTL by one every turn and sending out probe packets. Commonly, traceroute utilities send 3 probes for every TTL value, to get more detailed delay statistics for every node on the path and compensate for potential packet loss.
Note that every node in the path makes its routing decision based on the [destination IP] address of the probe. The generated ICMP replies are routed using the source IP address found in the original UDP packet.
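The probing loop described above can be sketched as a small simulation. This is only a model under stated assumptions: no packets are sent, the path is a plain list of addresses with the target host last, and the names IcmpReply, send_probe and traceroute are made up for illustration.

```python
from dataclasses import dataclass

BASE_PORT = 33434  # first destination port, as in the description above


@dataclass
class IcmpReply:
    kind: str       # "ttl-expired" or "port-unreachable"
    source: str     # node that generated the ICMP message
    orig_port: int  # UDP destination port echoed back from the probe


def send_probe(path, ttl, port):
    """Walk one probe along `path`; the last entry is the target host."""
    for position, node in enumerate(path, start=1):
        if position == len(path):
            # The target host finds no listener on the port.
            return IcmpReply("port-unreachable", node, port)
        ttl -= 1  # every transit router decrements the TTL
        if ttl == 0:
            return IcmpReply("ttl-expired", node, port)
    return None  # only reached for an empty path


def traceroute(path, max_ttl=30):
    """Send probes with TTL 1, 2, 3, ... until the target answers."""
    hops = []
    for ttl in range(1, max_ttl + 1):
        reply = send_probe(path, ttl, BASE_PORT + ttl - 1)
        if reply is None:
            break
        # The hop count is recovered from the port echoed in the reply.
        hop_count = reply.orig_port - BASE_PORT + 1
        hops.append((hop_count, reply.source, reply.kind))
        if reply.kind == "port-unreachable":
            break
    return hops
```

Calling traceroute(["10.0.12.1", "10.0.23.3", "172.16.28.8"]) yields two "ttl-expired" entries followed by one "port-unreachable" entry from the target. The first two addresses are placeholders; a real utility would also send several probes per TTL value.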
Classic Ping in MPLS environment
Nothing special here. ICMP messages are labeled using LFIB lookups for the destination prefix. Depending on the “level” of the ping operation, the encapsulating label stack could be a single label (e.g. when you ping between two P routers), two labels (LDP and VPNv4 labels, when you ping between two CE routers), or even deeper when you use advanced scenarios like TE tunnels or CsC. The biggest disadvantage of the classic ping operation in MPLS networks is that it cannot detect breaks in MPLS LSPs.
Consider the following examples.
a) You ping between two P routers, or from a PE router to a P router. At some point, the label switching path breaks. However, routers continue forwarding ICMP packets using the regular IP routing procedure. It looks like end-to-end connectivity is preserved, while the LSP is actually broken. This situation results in traffic between CE devices being dropped, even though connectivity looks fine in the core.
b) You ping between two CE/C devices across the MPLS core. If the ping operation fails, you cannot tell whether it is due to a break in the LSP in the core or a break in the classic layer 3 path within the customer site.
While the second situation is not really important, the first one poses a serious issue. The classic ping operation should be modified so that it can explicitly detect breaks in MPLS LSPs.
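Example a) can be illustrated with a toy model. It is a sketch under simplifying assumptions, not a description of real forwarding code: the core is a chain of links that are either labeled or not, a packet loses its whole label stack at the first unlabeled link, and every core router knows only the addresses in GLOBAL_ROUTES. The function name and the address 10.2.2.2 are hypothetical.

```python
# Core loopbacks known to every P/PE router (illustrative values).
GLOBAL_ROUTES = {"10.1.1.1", "10.2.2.2"}


def cross_core(dest_ip, labeled_links):
    """Return True if a packet for `dest_ip` reaches the far-end PE.

    labeled_links[i] tells whether core link i carries labeled traffic.
    """
    stack_intact = True  # the ingress PE imposes the label stack
    for link_is_labeled in labeled_links:
        if not stack_intact and dest_ip not in GLOBAL_ROUTES:
            # A core router received plain IP for an unknown customer address.
            return False
        if stack_intact and not link_is_labeled:
            # LSP break: the stack is stripped, the packet goes on as plain IP.
            stack_intact = False
    # The far-end PE accepts labeled traffic or packets for core addresses.
    return stack_intact or dest_ip in GLOBAL_ROUTES
```

With labeled_links set to [True, False, True], a ping to the core address 10.2.2.2 still gets through, while a customer packet to 172.16.28.8 is dropped. The core ping therefore says nothing about the health of the LSP.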
Classic Traceroute in MPLS environment
The MPLS environment replaces the routing lookup with label-based switching. This allows creating virtual “tunnels” across Layer 3 clouds and efficient switching of IP packets based on the label values, not destination IP addresses. One of the most common uses for MPLS is MPLS BGP VPNs. With VPNs, an IP packet may have source/destination IP addresses (i.e. customer IP addresses) that have no meaning to a particular MPLS core router, which simply switches packets based on the topmost MPLS label (the “tunnel” header). This makes it impossible for a transit router (e.g. a core router) to return an ICMP error message to the proper source, since the respective IP address is not in its routing table/LFIB. Additionally, in order to better understand the routing path and aid in troubleshooting, it is desirable to see the MPLS labels used for packet forwarding in the traceroute command output.
To resolve these issues, the classic traceroute implementation has been extended as detailed below (this is not the true MPLS traceroute yet!):
a) The traceroute utility acts as if it has no idea about MPLS LSPs. It simply passes the IP/UDP datagrams to the routing layer. The router performs the destination prefix lookup in the LFIB, adding the MPLS labels as required. Note that the router copies the TTL value found in the packet into the topmost MPLS label inserted in the packet.
b) Every router in the MPLS cloud switches the packet based on the topmost label. As soon as the TTL found in the label expires, the router punts the packet to the RP (route processor) and tries to generate an ICMP response message (“TTL expired”).
c) If the packet is tagged (“tunneled”), the router may have no idea about the original source IP. The router does not attempt to look up the source in the LFIB. Instead, the newly generated ICMP message has [destination IP] = [source IP found in the original probe] and [source IP] = [IP address of the router interface] where the probe was received. Using the MPLS extensions for ICMP, the router adds the label stack found in the original probe to the ICMP response message. After this, the router uses the label stack found in the original packet to perform the label-switching operation on the new packet. The topmost label is swapped or popped, and the resulting label stack is prepended to the ICMP response.
d) The above procedure switches the ICMP response downstream along the original LSP, down to the PE router closest to the actual destination. This means that (contrary to regular traceroute operation) the ICMP response first travels away from the originating node. The reason is that only the PE closest to the ultimate destination has a complete understanding of the address space for the source IP found in the original probe packet.
e) As the ICMP message reaches the final PE, the latter performs a layer 3 lookup (“aggregate” lookup) on its destination IP address. At this point, it forwards the ICMP response “back”, using the LSP targeted at the PE closest to the originating node.
f) If the probe had a TTL large enough to reach the PE closest to the ultimate destination, then there is no need to take the extra “path”, and the response flows back immediately, since all routers behind that PE understand the destination IP address found in the ICMP response.
Note that the above description is somewhat simplified, and is specifically tailored for use of traceroute in MPLS VPN scenarios.
Look at Fig. 2 below. Here SW1 tries to trace the route to SW2 (172.16.28.8). R1 is the PE for SW1, and it has VPNv4 label 20 for the prefix 172.16.28.0/24 and LDP label 10 for the Loopback0 IP address of R2 (which is the PE for SW2). At the same time, R2 has VPNv4 label 40 for the prefix 172.16.17.0/24 and LDP label 30 for R1’s Loopback0.

a) When SW1 emits the first probe with TTL=1, R1 responds back with ICMP TTL expired.
b) When SW1 emits the second probe with TTL=2, R1 will encapsulate the packet with the label stack [10:20] and switch it to R3, with TTL=1 in the topmost label.
c) R3 will decrement TTL and drop the packet, generating a new ICMP TTL expired message, destined to the source IP found in the probe packet. It also includes the original label stack in the ICMP response. R3 will then look up the outgoing label for the LDP label 10, and find that it is implicit-null (PHP). After this, R3 encapsulates the ICMP message using the label 20 and switches it to R2. Note that this response ICMP message has “normal” TTL value.
d) R2 receives the packet with tag 20, and finds that it should perform aggregate lookup in the VRF associated with the respective VPN. The lookup tells the router to forward the packet back to R1 using the respective VPNv4 tag. The packet now has the new label stack [30:40] corresponding to the LDP label of R1 Loopback0 interface and VPNv4 label of 172.16.17.0/24.
e) R3 switches the “returning” packet back to R1, and R1 passes it down to SW1. The originator finally receives the response to the probe.
f) The originator of the traceroute operation (SW1) increments the TTL again and sends another probe out. This time, the probe reaches R2 before the TTL expires, and R2 turns the response back immediately, since it has the VRF table in which to look up the source IP address of the probe packet.
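The detour taken by R3's ICMP message in the example above can be replayed as a short script. This is a sketch only: the label values are the ones used in the walkthrough, while the table layout, the function names, and the assumption that R3 pops both LDP labels (PHP in both directions) are made up for illustration.

```python
# Incoming LDP label on R3 -> next hop; the outgoing label is implicit-null.
R3_LFIB = {10: "R2", 30: "R1"}
R2_AGGREGATE_LABEL = 20     # VPN label that triggers a VRF lookup on R2
R2_RETURN_STACK = [30, 40]  # [LDP label for R1 Loopback0, VPN label for 172.16.17.0/24]
R1_VPN_LABEL = 40           # VPN label that R1 maps to the VRF facing SW1


def r3_switch(stack):
    """Pop the top LDP label (PHP) and return (next_hop, remaining_stack)."""
    return R3_LFIB[stack[0]], stack[1:]


def icmp_return_path(probe_stack):
    """Trace the TTL-expired message R3 generates for a probe with `probe_stack`."""
    trace = []

    # R3 reuses the probe's label stack, so the ICMP message first moves downstream.
    next_hop, stack = r3_switch(list(probe_stack))
    trace.append(("R3", next_hop, stack))

    # R2 sees the aggregate VPN label, looks up SW1 in the VRF, and turns around.
    if stack != [R2_AGGREGATE_LABEL]:
        raise ValueError("R2 expected the aggregate VPN label")
    stack = list(R2_RETURN_STACK)
    trace.append(("R2", "R3", stack))

    # R3 label-switches the returning packet towards R1.
    next_hop, stack = r3_switch(stack)
    trace.append(("R3", next_hop, stack))

    # R1 strips the VPN label and routes the plain IP packet to SW1.
    if stack != [R1_VPN_LABEL]:
        raise ValueError("R1 expected its VPN label")
    trace.append(("R1", "SW1", []))
    return trace
```

Each tuple in the result reads (sender, receiver, label stack on the wire). For the probe stack [10, 20] the message travels R3 to R2 with [20], R2 to R3 with [30, 40], R3 to R1 with [40], and finally unlabeled from R1 to SW1.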
Using the above-described traceroute generalization, we can safely run the traceroute command from any host or router, be it MPLS-aware or not. However, the price we pay for this is high. The following is the list of the “features” associated with this MPLS implementation.
a) The same issue as with the ping utility. If, for some reason, the LSP breaks (e.g. an unlabeled outgoing interface), the router will attempt to forward the packet using regular IP routing. If this happens within the “same” level of the MPLS hierarchy (i.e. routers can interpret the source/destination IP addresses in the packet), the packet will reach the final destination, and the response will flow back to the source. We may still notice the break in the LSP, since at some point the response packets will not carry MPLS information.
b) If we trace a route from one C box to another C box across the MPLS core, we cannot detect the LSP breaking point in the MPLS core. This is because the ICMP responses must first be forwarded down the original LSP before being returned to the originator. If there is a break anywhere on the path, the traceroute within a particular VPN will stop at the nearest PE.
c) We cannot collect any detailed information about the LSP break, since the router simply returns the ICMP “time-exceeded” message and does not perform any additional verification.
MPLS Ping & Traceroute
This operation does not use any ICMP echo/echo-reply packets and may only be initiated at a label switching router. You cannot run this operation off a regular host or router. The reason is that it tests connectivity downstream to a particular FEC, not a destination IP address.
When you specify a destination FEC for MPLS ping, it could be a VPNv4 prefix or a global routing prefix that has a label learned via LDP (right now, Cisco IOS supports only FECs learned via LDP or MPLS TE FECs). The router looks up the FEC in the local LFIB and builds the label stack corresponding to the proper LSP. After this, it creates a special IP/UDP packet, with the source IP address set to the router’s own IP address, the TTL set to 1, a destination IP address within the reserved loopback IP range 127/8, and the reserved destination UDP port 3503. The router also adds the special Router Alert IP Option to the IP packet header, which signals to “punt” the IP packet up to the RP. The TTL in the topmost label is set to 255, to ensure end-to-end connectivity testing. The new packet also carries rich payload information, allowing the originator to directly request additional information from the router at the end of the LSP (e.g. downstream mappings).
The special “xmas tree” combination in the IP packet header ensures that no router on the path will attempt to route the packet using regular IP routing once the label stack is stripped. No matter what, the packet cannot be routed using the classic IP routing procedure, because of the combination of the destination IP address, the TTL value and the Router Alert option. As the packet reaches the end of the LSP, the terminal router will process it and generate a response.
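The header fields listed above can be written down as a short sketch. It does not build the full echo request payload; it only encodes the MPLS label stack in the standard 32-bit shim format (20-bit label, 3-bit EXP, bottom-of-stack bit, 8-bit TTL) and collects the IP/UDP settings in a dictionary. The function names and the choice of 127.0.0.1 out of the 127/8 range are assumptions made for illustration.

```python
import struct

LSP_PING_UDP_PORT = 3503
# IP Router Alert option: type 148 (0x94), length 4, value 0.
ROUTER_ALERT_OPTION = bytes([0x94, 0x04, 0x00, 0x00])


def encode_label_stack(labels, ttl=255, exp=0):
    """Encode labels (topmost first) as 4-byte MPLS shim headers."""
    entries = []
    for index, label in enumerate(labels):
        if not 0 <= label < 2 ** 20:
            raise ValueError("MPLS labels are 20 bits wide")
        bottom = 1 if index == len(labels) - 1 else 0  # S bit on the last entry
        word = (label << 12) | (exp << 9) | (bottom << 8) | ttl
        entries.append(struct.pack("!I", word))
    return b"".join(entries)


def describe_echo_request(source_ip, labels):
    """Collect the header settings of an MPLS echo request probe."""
    return {
        "ip_src": source_ip,
        "ip_dst": "127.0.0.1",  # any address in 127/8 would do
        "ip_ttl": 1,
        "ip_options": ROUTER_ALERT_OPTION,
        "udp_dst_port": LSP_PING_UDP_PORT,
        "mpls_header": encode_label_stack(labels),
    }
```

For example, encode_label_stack([32]) produces the four bytes 00 02 01 ff: label 32, EXP 0, the bottom-of-stack bit set, and TTL 255.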
Every router on the path switches the MPLS ping probe using the label-switching procedure. If, for some reason, the packet cannot be label-switched (abnormal or normal LSP termination), the respective router will intercept the packet and return an error response message. The response message will be destined to the originator of the MPLS ping operation, using the source IP address found in the original probe. The response message contains detailed information on the reasons for the LSP break.
The MPLS traceroute operation builds upon MPLS pings with incrementing TTL values. Essentially, it probes every router on the LSP, and looks like a sequence of MPLS ping operations.
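The relationship between the two operations can be sketched as a toy model. The LSP is a list of (router, output interface is labeled) pairs with the egress router last, and the one-letter codes follow the legend that IOS prints. Which code a real router returns in each situation is an assumption of this sketch, as are the function names.

```python
def mpls_ping(lsp, ttl=255):
    """Send one MPLS echo with the given label TTL; return (code, router)."""
    for hop, (router, output_labeled) in enumerate(lsp, start=1):
        if hop == len(lsp):
            return "!", router  # the egress router for the FEC answers
        if not output_labeled:
            # The packet cannot be label-switched any further; the router
            # intercepts it and reports the unlabeled output interface.
            return "B", router
        if hop == ttl:
            return "L", router  # label TTL expired at a healthy transit hop
    return ".", None  # nothing answered (empty LSP in this model)


def mpls_traceroute(lsp, max_ttl=30):
    """Repeat the MPLS ping with TTL 1, 2, 3, ... until the LSP ends or breaks."""
    results = []
    for ttl in range(1, max_ttl + 1):
        code, router = mpls_ping(lsp, ttl)
        results.append((ttl, code, router))
        if code != "L":
            break
    return results
```

With lsp set to [("R3", True), ("R1", None)], the trace returns "L" from R3 and "!" from R1. If R3's output interface is marked unlabeled, the trace stops at R3 with "B", which pinpoints the break.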
Some Examples
Using our testbed topology, run the following command:
R2#traceroute mpls ipv4 10.1.1.1/32
Tracing MPLS Label Switched Path to 10.1.1.1/32, timeout is 2 seconds

Codes: '!' - success, 'Q' - request not sent, '.' - timeout,
  'L' - labeled output interface, 'B' - unlabeled output interface,
  'D' - DS Map mismatch, 'F' - no FEC mapping, 'f' - FEC mismatch,
  'M' - malformed request, 'm' - unsupported tlvs, 'N' - no label entry,
  'P' - no rx intf label prot, 'p' - premature termination of LSP,
  'R' - transit router, 'I' - unknown upstream index,
  'X' - unknown return code, 'x' - return code 0

Type escape sequence to abort.
  0 10.0.23.2 MRU 1500 [Labels: 32 Exp: 0]
I 1 10.0.23.3 MRU 1504 [Labels: implicit-null Exp: 0] 8 ms
! 2 10.0.13.1 20 ms
We can see the command reporting the MRU (maximum receive unit) on every outgoing interface. The MRU is the maximum size of a packet, including the label stack, that can be forwarded out of the particular interface. The penultimate hop shows an MRU of 1504, since the packets are essentially sent out untagged.
Now run the same trace with the force-explicit-null keyword:
R2#traceroute mpls ipv4 10.1.1.1/32 force-explicit-null
Tracing MPLS Label Switched Path to 10.1.1.1/32, timeout is 2 seconds

Codes: '!' - success, 'Q' - request not sent, '.' - timeout,
  'L' - labeled output interface, 'B' - unlabeled output interface,
  'D' - DS Map mismatch, 'F' - no FEC mapping, 'f' - FEC mismatch,
  'M' - malformed request, 'm' - unsupported tlvs, 'N' - no label entry,
  'P' - no rx intf label prot, 'p' - premature termination of LSP,
  'R' - transit router, 'I' - unknown upstream index,
  'X' - unknown return code, 'x' - return code 0

Type escape sequence to abort.
  0 10.0.23.2 MRU 1500 [Labels: 32/explicit-null Exp: 0/0]
I 1 10.0.23.3 MRU 1504 [Labels: explicit-null Exp: 0] 8 ms
! 2 10.0.13.1 8 ms
The force-explicit-null keyword forces the originating router to insert the explicit-null label at the bottom of the original stack. This can be really helpful for detecting unlabeled interfaces on the penultimate hop, which may otherwise send packets out unlabeled. By forcing the explicit-null label we ensure that the packet is only forwarded out of a labeled interface, even if PHP strips all other labels.
Now look at the particular ping command below:
R2#ping mpls ipv4 10.1.1.1/32 ttl 1 dsmap repeat 2
Sending 2, 100-byte MPLS Echos to 10.1.1.1/32,
     timeout is 2 seconds, send interval is 0 msec:

Codes: '!' - success, 'Q' - request not sent, '.' - timeout,
  'L' - labeled output interface, 'B' - unlabeled output interface,
  'D' - DS Map mismatch, 'F' - no FEC mapping, 'f' - FEC mismatch,
  'M' - malformed request, 'm' - unsupported tlvs, 'N' - no label entry,
  'P' - no rx intf label prot, 'p' - premature termination of LSP,
  'R' - transit router, 'I' - unknown upstream index,
  'X' - unknown return code, 'x' - return code 0

Type escape sequence to abort.
L
Echo Reply received from 10.0.23.3
  DSMAP 0, DS Router Addr 127.0.0.1, DS Intf Addr 0
    Depth Limit 0, MRU 1504 [Labels: implicit-null Exp: 0]
    Multipath Addresses:

L
Echo Reply received from 10.0.23.3
  DSMAP 0, DS Router Addr 127.0.0.1, DS Intf Addr 0
    Depth Limit 0, MRU 1504 [Labels: implicit-null Exp: 0]
    Multipath Addresses:

Success rate is 0 percent (0/2)
We see two “tricks” used with the ping command. First, we forcefully limit the TTL to 1. This means that in our example the ping packet will only reach R3. Second, we request downstream mapping information from the terminal node. This means that the router will return the downstream labels used for the target FEC. This procedure allows you to selectively interrogate a particular router about its mapping table and find the potential point where the LSP breaks.
Summary & Further Reading
In this post we demonstrated the limitations of the classic ping/traceroute operations in an MPLS network and described how the MPLS ping and traceroute operations detect LSP breaks. Due to the blog format, we did not demonstrate the use of MPLS ping/trace for L2 VPN testing or advanced scenarios like MPLS TE. You can find these under the links provided below. We also did not discuss the numerous compatibility issues for MPLS ping/traceroute that arise due to mismatching implementations (different RFC draft revisions, etc).
The RFC for LSP Ping/Traceroute
The RFC for Router Alert Option
The tech-note on the classic traceroute used in MPLS environment
The MPLS Ping/Traceroute documentation at Cisco

By Petr Lapukhov, CCIE #16379
