So, this is a problem that I’ve spent a day or so working on at work, and it goes like this…
A router (Linux, running ZeroShell) is configured with three interfaces, as shown in the following diagram:
In this case, the network has a configuration problem in that PC1 is configured with a default gateway of 192.168.3.1, which is obviously on the wrong network. Since PC1 and 192.168.3.1 are separated by a router, an ARP broadcast from PC1 can’t reach that address. So, in order for PC1 to reach the server, logic (and good mental health of all involved) dictates that PC1 needs a default gateway address on the same subnet, meaning it must be in the range of 192.168.1.1 to 192.168.1.254. And this was the problem: in our test environment, PC1 could ping the server just peachy, despite having the wrong gateway.
You see, this configuration is just plain wrong. It violates one of the cardinal rules of TCP/IP networking – ‘Thou shalt not require a default gateway to get to thine default gateway.’
It should never work…but it did.
So, to figure out what exactly was going on here, I turned to my trusty friend, Wireshark. Looking in the traces, I saw something interesting and a little confusing. I saw the PC sending out an ARP request, it being answered by the router, and then the PC sending the unicast to the server. On first glance, this looked an awful lot like Proxy ARP, which is a huge problem in some networks due to Cisco’s insistence on enabling it by default. So without looking into it further, I ran with that.
Digging into it, I found the following article, which explains how to set Proxy ARP on a Linux box. Unfortunately, ‘cat /proc/sys/net/ipv4/conf/all/proxy_arp’ showed that Proxy ARP was disabled. So, before I dug further into why the system was proxying ARP requests, I took another close look at the trace and found something that I’ve never seen anywhere else: the router was answering the ARP request for the IP bound to a remote interface, using the MAC of the local interface!
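As an aside, the ‘all’ entry is not the only place that flag lives; each interface has its own copy, and as I read the kernel documentation, proxy ARP is active on an interface if either the ‘all’ flag or that interface’s flag is set. Here is a minimal Python sketch for checking every one of them in one go. The function name `read_proxy_arp` and the `conf_root` parameter are my own inventions for illustration; only the /proc path comes from the command above.

```python
from pathlib import Path

def read_proxy_arp(conf_root="/proc/sys/net/ipv4/conf"):
    """Return {interface_name: bool} for every proxy_arp flag under conf_root.

    conf_root is a parameter only so the function can be pointed at a
    test directory; on a real Linux box the default is what you want.
    """
    flags = {}
    for entry in sorted(Path(conf_root).glob("*/proxy_arp")):
        # Each file holds "0" or "1" followed by a newline.
        flags[entry.parent.name] = entry.read_text().strip() == "1"
    return flags

# Example use on a Linux host (output depends on the machine):
#   for name, enabled in read_proxy_arp().items():
#       print(name, "on" if enabled else "off")
```

If every entry comes back False, as it effectively did for me, proxy ARP is not your culprit.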
Let me expand on this a little further. Here’s the basic sequence of what happened:
1. PC1 sends an ARP request for 192.168.3.1
2. The router hears the request on the 192.168.2.0 interface, and instead of dropping it like every other TCP/IP implementation on Earth, decides to process it.
3. The router responds to the ARP request for 192.168.3.1 on the 192.168.2.0 network. Keep in mind that the 192.168.3.1 address is on a different interface. When responding, the router uses the MAC address of the interface on the 192.168.2.0 network.
4. The PC hears the ARP reply, caches the ARP entry, and sends the unicast to its default gateway (which lives on another network, mind you) without batting an eye.
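The sequence above can be sketched as a toy model in Python. This is not kernel code, just my reading of the trace: the interface names, the MAC addresses, and the 192.168.2.1 address are made up for illustration, and the `strict` switch is a hypothetical stand-in for the interface-bound behavior I expected to see.

```python
from dataclasses import dataclass

@dataclass
class Interface:
    name: str
    ip: str
    mac: str

def arp_reply(interfaces, ingress_name, target_ip, strict=False):
    """Return the MAC placed in the ARP reply, or None if the request is ignored.

    strict=False models what the trace showed: answer for any address the
    box owns, whichever interface it is bound to.
    strict=True models interface-bound behavior: answer only if the target
    address is configured on the interface that heard the request.
    """
    ingress = next(i for i in interfaces if i.name == ingress_name)
    for iface in interfaces:
        if iface.ip != target_ip:
            continue
        if strict and iface.name != ingress_name:
            return None
        # The reply always carries the MAC of the interface that heard the request.
        return ingress.mac
    return None  # not one of our addresses at all

# Hypothetical router: names, MACs and 192.168.2.1 are assumptions.
router = [
    Interface("eth0", "192.168.2.1", "02:00:00:00:00:02"),
    Interface("eth1", "192.168.3.1", "02:00:00:00:00:03"),
]

print(arp_reply(router, "eth0", "192.168.3.1"))               # 02:00:00:00:00:02
print(arp_reply(router, "eth0", "192.168.3.1", strict=True))  # None
```

The first call is what I saw on the wire; the second is what I expected to see.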
Now, this still looks a whole hell of a lot like proxy ARP, but the use case is different. With proxy ARP, the PC would usually have a mask that encompassed all three networks (such as 192.168.0.0/16), and would have simply ARPed for the endpoint (the server). The router would have received the ARP request, realized it was connected to the destination network, and responded back to the PC with its own MAC address. The PC would have sent to the router, and the router would have sent to the server.
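To make the difference concrete, here is a small sketch of the textbook next-hop decision a host makes, using Python’s standard `ipaddress` module. The host and server addresses are made up (the post only gives the subnets and 192.168.3.1), and `arp_target` is a hypothetical helper, not anything out of a real stack.

```python
import ipaddress

def arp_target(host_ip_with_mask, destination, gateway):
    """Return the address a host would ARP for when sending to destination.

    Textbook rule: if the destination falls inside the host's own subnet,
    ARP for the destination itself; otherwise ARP for the default gateway.
    """
    iface = ipaddress.ip_interface(host_ip_with_mask)
    if ipaddress.ip_address(destination) in iface.network:
        return destination
    return gateway

# Classic proxy ARP setup: an over-wide /16 mask, so the PC ARPs for the server itself.
print(arp_target("192.168.2.10/16", "192.168.1.50", "192.168.3.1"))  # 192.168.1.50

# Correct /24 mask, wrong gateway: the PC ARPs for the (off-subnet) gateway.
print(arp_target("192.168.2.10/24", "192.168.1.50", "192.168.3.1"))  # 192.168.3.1
```

The second case is the one from my trace: the mask is right, so the PC goes looking for its gateway rather than the server.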
In this case, though, the PC had the correct mask; it just had an incorrect default gateway. So there are actually two questions that need answering here:
1. Why in the hell did the PC ARP for the router’s IP when it knew the router was on a different subnet?
2. Why in the hell is the router performing roughly the same thing as a proxy ARP when proxy ARP is disabled?
For number 1, I don’t really have a solid answer because I have no insight into the stack code in Windows, but my guess is it’s a function of trying to make sure Proxy ARP can be used in common misconfiguration scenarios. After all, the only time you really need Proxy ARP is when you don’t really know what you are doing and you want your NOS to automagically fix your ignorant config.
NOTE: Yes, I know, not everyone understands how to set up a network. I’m not passing judgement, just stating fact. If you configure your network correctly, Proxy ARP is not only unnecessary, it’s also a huge pain in the ass.
As for number 2, well, it’s a feature. No, really. And I quote:
“Mr. Bloemsaat included a patch which restricts ARP responses to the interface actually implementing the requested address. But, over almost a month of discussion, the networking hackers have made it clear that they do not intend to change the way Linux behaves. Their reasoning follows, more or less, these lines:
- Blocking ARP responses in this way is putting filtering decisions at the wrong layer of the networking code. This sort of action belongs at the netfilter level, rather than down at the device level.
- Linux’s approach to ARP responses is fully compliant with all applicable RFCs.
- In some situations, responding out of all interfaces is the only way to successfully get communication established.”
My God, how I hate Linux. This is so arrogant; everyone in the world (except you) treats IP addresses as interface-bound. In fact, there are very good technical networking reasons for doing this. But, because it isn’t explicitly stated in the RFC, the high and mighty Linux Gods say that everyone else is wrong, and can go get bent. Here’s a tip guys: Networking OSes are supposed to inter-operate with stuff. That’s what they do. And interoperability generally means conformance to how everyone else handles the situation.
Sigh.
Anyhow, the crux of the situation is, in Linux, if you want the OS to operate like every other device with an IP in this situation, you need to enable ARP filtering on all of your interfaces. And for help with that, ladies and gentlemen, I present to you the gem of an article below:
Two Subnetworks on One LAN, and Linux arp_filter | Robert LaThanh
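For the impatient, here is a rough Python sketch of flipping one of these ARP flags through /proc. Treat it as an illustration under assumptions, not a recipe: `set_arp_flag` and its parameters are names I made up, it needs root on a real box, and anything written to /proc is gone after a reboot unless you also persist it in your sysctl configuration.

```python
from pathlib import Path

def set_arp_flag(flag, value, interfaces=("all",),
                 conf_root="/proc/sys/net/ipv4/conf"):
    """Write value to the given ARP flag for each interface; return paths written.

    flag is the file name under each interface directory, e.g. "arp_filter".
    conf_root is a parameter only so this can be tried against a scratch
    directory instead of the live /proc tree.
    """
    written = []
    for name in interfaces:
        path = Path(conf_root) / name / flag
        path.write_text(f"{value}\n")
        written.append(str(path))
    return written

# Example (as root, on the router):
#   set_arp_flag("arp_filter", 1, interfaces=("all",))
```

One caveat from my reading of the kernel’s ip-sysctl documentation: arp_filter makes its decision based on routing, while a sibling flag, arp_ignore, when set to 1, replies only if the target address is configured on the interface that received the request. That second description sounds closer to the behavior I was after, so it is worth testing both in your own environment before trusting either.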
Happy networking.