Troubleshooting Common Networking Problems with Wireshark, Pt. 3: MTU Problems


NOTE: You can now take a course by the author on Wireshark, with video and example traces. Check this post for more details.

Author’s Note: This is the third part in a six-part series about finding and solving many networking anomalies using the Wireshark network protocol analyzer. If you are new to the series, you can find part 1 here, and the whole series here.
Maximum Transmission Unit (MTU) problems are still among the most common calls received in enterprise networking, and they come in several varieties. Luckily, if you have simultaneous traces, MTU problems are very easy to discover, regardless of the specific issue. The most common MTU-related problems involve Path MTU (PMTU) Discovery failure (MS05-019, KB 898060) and black hole routers.
For PMTU Discovery failures, the problem system (which may be the client, the server, or both) stops responding to ICMP Type 3 Code 4 (Destination Unreachable, Fragmentation Needed) messages after a certain period of time. As a result, despite being told the MTU supported by the router, the system continues to send packets that exceed that limit. In traces, you can see this easily by using the following filter:
icmp.type == 3 and ip.len >= 1400
Once applied, you should see something similar to the following if the system is experiencing this problem:

Frame 25718 (1514 bytes on wire, 1514 bytes captured)
Ethernet II, Src: 167.118.50.100 (02:bf:a7:76:32:64), Dst: 167.118.50.1 (00:00:0c:07:ac:0a)
    Destination: 167.118.50.1 (00:00:0c:07:ac:0a)
    Source: 167.118.50.100 (02:bf:a7:76:32:64)
    Type: IP (0x0800)
Internet Protocol, Src: 167.118.50.101 (167.118.50.101), Dst: 167.118.117.26 (167.118.117.26)
    Version: 4
    Header length: 20 bytes
    Differentiated Services Field: 0x00 (DSCP 0x00: Default; ECN: 0x00)
    Total Length: 1500
    Identification: 0x95af (38319)
    Flags: 0x04 (Don't Fragment)
    Fragment offset: 0
    Time to live: 128
    Protocol: TCP (0x06)
    Header checksum: 0x6900 [correct]
    Source: 167.118.50.101 (167.118.50.101)
    Destination: 167.118.117.26 (167.118.117.26)
Transmission Control Protocol, Src Port: 1390 (1390), Dst Port: microsoft-ds (445), Seq: 138, Ack: 186, Len: 1460
NetBIOS Session Service
SMB (Server Message Block Protocol)
[Unreassembled Packet: SMB]


Frame 25720 (70 bytes on wire, 70 bytes captured)
Ethernet II, Src: Cisco_14:c3:3c (00:11:5d:14:c3:3c), Dst: 167.118.50.100 (02:bf:a7:76:32:64)
    Destination: 167.118.50.100 (02:bf:a7:76:32:64)
    Source: Cisco_14:c3:3c (00:11:5d:14:c3:3c)
    Type: IP (0x0800)
Internet Protocol, Src: 167.118.218.98 (167.118.218.98), Dst: 167.118.50.101 (167.118.50.101)
    Version: 4
    Header length: 20 bytes
    Differentiated Services Field: 0x00 (DSCP 0x00: Default; ECN: 0x00)
    Total Length: 56
    Identification: 0xd530 (54576)
    Flags: 0x00
    Fragment offset: 0
    Time to live: 254
    Protocol: ICMP (0x01)
    Header checksum: 0x8bdf [correct]
    Source: 167.118.218.98 (167.118.218.98)
    Destination: 167.118.50.101 (167.118.50.101)
Internet Control Message Protocol
    Type: 3 (Destination unreachable)
    Code: 4 (Fragmentation needed)
    Checksum: 0x96fd [correct]
    MTU of next hop: 1427
    Internet Protocol, Src: 167.118.50.101 (167.118.50.101), Dst: 167.118.117.26 (167.118.117.26)
        Version: 4
        Header length: 20 bytes
        Differentiated Services Field: 0x00 (DSCP 0x00: Default; ECN: 0x00)
        Total Length: 1500
        Identification: 0x95af (38319)
        Flags: 0x04 (Don't Fragment)
        Fragment offset: 0
        Time to live: 126
        Protocol: TCP (0x06)
        Header checksum: 0x6b00 [correct]
        Source: 167.118.50.101 (167.118.50.101)
        Destination: 167.118.117.26 (167.118.117.26)
    Transmission Control Protocol, Src Port: 1390 (1390), Dst Port: microsoft-ds (445)

Frame 25850 (1514 bytes on wire, 1514 bytes captured)
Ethernet II, Src: 167.118.50.100 (02:bf:a7:76:32:64), Dst: 167.118.50.1 (00:00:0c:07:ac:0a)
    Destination: 167.118.50.1 (00:00:0c:07:ac:0a)
    Source: 167.118.50.100 (02:bf:a7:76:32:64)
    Type: IP (0x0800)
Internet Protocol, Src: 167.118.50.101 (167.118.50.101), Dst: 167.118.117.26 (167.118.117.26)
    Version: 4
    Header length: 20 bytes
    Differentiated Services Field: 0x00 (DSCP 0x00: Default; ECN: 0x00)
    Total Length: 1500
    Identification: 0x95f0 (38384)
    Flags: 0x04 (Don't Fragment)
    Fragment offset: 0
    Time to live: 128
    Protocol: TCP (0x06)
    Header checksum: 0x68bf [correct]
    Source: 167.118.50.101 (167.118.50.101)
    Destination: 167.118.117.26 (167.118.117.26)
Transmission Control Protocol, Src Port: 1390 (1390), Dst Port: microsoft-ds (445), Seq: 138, Ack: 186, Len: 1460
NetBIOS Session Service
SMB (Server Message Block Protocol)
[Unreassembled Packet: SMB]

In the first frame, you can see the server sending a 1500-byte packet at the Network layer, with the DF flag set, to port 445 on the remote system. Next, in frame 25720, the router responds with an ICMP Type 3 Code 4 message, listing the MTU of the next hop as 1427. The router also definitively identifies the packet to which it is referring by quoting the original packet's IP Identification field (0x95af), along with its source and destination addresses and ports. Finally, in frame 25850, the server sends another 1500-byte packet on the same session, proving that it is not responding correctly to the ICMP messages. For this situation, you will need to apply the latest tcpip.sys update (currently KB 913446).
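The check described above can also be expressed as a small script. The following is a minimal sketch in Python, not part of Wireshark: the `Packet` record and the `find_pmtud_violations` function are hypothetical names, and the sample data is typed in by hand from the three frames shown earlier. It flags any DF packet that exceeds a next-hop MTU already advertised for the same source and destination.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Packet:
    """Hypothetical, simplified view of one captured frame."""
    frame: int
    src: str
    dst: str
    ip_len: int                         # IP Total Length
    df: bool                            # Don't Fragment flag
    icmp_type: Optional[int] = None
    icmp_code: Optional[int] = None
    next_hop_mtu: Optional[int] = None
    orig_src: Optional[str] = None      # source in the IP header quoted by the ICMP error
    orig_dst: Optional[str] = None      # destination in the quoted IP header


def find_pmtud_violations(packets):
    """Return (frame, ip_len, mtu) for DF packets larger than an MTU
    that an earlier ICMP Type 3 Code 4 message advertised for that flow."""
    advertised = {}  # (src, dst) -> smallest next-hop MTU reported so far
    violations = []
    for p in packets:
        if p.icmp_type == 3 and p.icmp_code == 4 and p.next_hop_mtu:
            key = (p.orig_src, p.orig_dst)
            advertised[key] = min(p.next_hop_mtu, advertised.get(key, p.next_hop_mtu))
            continue
        mtu = advertised.get((p.src, p.dst))
        if mtu is not None and p.df and p.ip_len > mtu:
            violations.append((p.frame, p.ip_len, mtu))
    return violations


# Values copied from frames 25718, 25720 and 25850 above.
trace = [
    Packet(25718, "167.118.50.101", "167.118.117.26", 1500, True),
    Packet(25720, "167.118.218.98", "167.118.50.101", 56, False,
           icmp_type=3, icmp_code=4, next_hop_mtu=1427,
           orig_src="167.118.50.101", orig_dst="167.118.117.26"),
    Packet(25850, "167.118.50.101", "167.118.117.26", 1500, True),
]

print(find_pmtud_violations(trace))  # [(25850, 1500, 1427)]
```

A packet or two already in flight when the ICMP message arrives may be flagged on a healthy system, so treat a single hit as a hint and a steady stream of hits as the real symptom.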
For black hole routers, the problem can be slightly more difficult to find. Black hole routers, by definition, do not respond with ICMP Type 3 Code 4 messages to alert the sender to the MTU of the next hop; they silently discard any packets that have the DF bit set and exceed the MTU. For this reason, what you are typically looking for with PMTU failures are retransmissions on the sending side and missing packets on the receiving side. Perhaps the easiest way to search for the black hole router in network traces is to use the following filter on both the sending and receiving traces:
ip.addr == <client IP> and ip.addr == <server IP> and ip.len >= 1472
Replace <client IP> with the client’s IP address and <server IP> with the server’s IP address.
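As an illustration, here is a minimal Python sketch that fills in that filter for a concrete pair of hosts. The helper name `black_hole_filter` is hypothetical, and the addresses are simply the two endpoints from the trace shown earlier; only the standard-library `ipaddress` module is used, to reject malformed addresses before they end up in a filter string.

```python
import ipaddress


def black_hole_filter(client_ip, server_ip, min_ip_len=1472):
    """Build the display filter string for one client/server pair.

    IPv4Address raises ValueError on malformed input.
    """
    client = ipaddress.IPv4Address(client_ip)
    server = ipaddress.IPv4Address(server_ip)
    return f"ip.addr == {client} and ip.addr == {server} and ip.len >= {min_ip_len}"


print(black_hole_filter("167.118.50.101", "167.118.117.26"))
# ip.addr == 167.118.50.101 and ip.addr == 167.118.117.26 and ip.len >= 1472
```

The resulting string can be pasted into the display filter bar for both the sending and the receiving trace.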
When examining traces with this filter, make sure the traces were taken during the problem occurrence, and then apply the filter to both traces and compare them. If you have frames that appear in one trace and those same frames are missing from the other, first make sure that those frames are not missing due to slight timing differences in when the traces were started or stopped. For instance, if Frame 1 in your server trace is missing from the client trace, there is a good chance the server trace was started before the client trace. However, if you have frames before and after the missing frames that you can verify on both traces, then you have identified a very good black hole router candidate. Follow the steps in KB 159211 or 314825 to further diagnose and resolve the problem.
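The comparison logic can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `find_black_hole_candidates` is hypothetical, and it assumes you have already exported an ordered list of per-packet identifiers (for example, the IP Identification values of the filtered frames) from each trace. Frames missing at the start or end of the overlap are ignored, since they are most likely explained by the traces starting or stopping at different times.

```python
def find_black_hole_candidates(sent_ids, received_ids):
    """Return identifiers present in the sender trace, absent from the
    receiver trace, and bracketed by frames that appear in both traces."""
    received = set(received_ids)
    seen = [pkt_id in received for pkt_id in sent_ids]
    if True not in seen:
        return []  # no overlap at all, so nothing can be verified
    first = seen.index(True)                        # first frame found in both traces
    last = len(seen) - 1 - seen[::-1].index(True)   # last frame found in both traces
    return [pkt_id
            for pkt_id, ok in zip(sent_ids[first:last + 1], seen[first:last + 1])
            if not ok]


# Made-up identifiers: 100 is missing only because the receiver trace
# started late; 103 and 104 vanished between frames that both sides saw.
sender_trace = [100, 101, 102, 103, 104, 105]
receiver_trace = [101, 102, 105]
print(find_black_hole_candidates(sender_trace, receiver_trace))  # [103, 104]
```

In a real capture the IP Identification field is only 16 bits wide and wraps around, so a more robust key would combine it with the addresses, ports, and TCP sequence number.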
