Here’s a common situation: you manage an application; you haven’t gone through a release; no patches have been applied (to your knowledge); and the infrastructure team says they haven’t touched the network in ages. But a user calls, and to your surprise, reports that a certain piece of functionality within your application stopped working this morning. What do you do? Where do you start? What makes application management such a challenge is that there are so many moving parts, and if any one piece malfunctions, you could have a major issue on your hands.
Despite the pressure, you keep your cool and immediately put your problem-solving hat on. You run through your basic application sanity checks, and they all pass. What next? Well, in my arsenal of troubleshooting tools, I find that the most useful of them all is tcpdump. After all, most applications follow the client/server model, and being able to visualize this communication often leads to the answer you’re looking for. If not, it can at least help you rule out certain suspects.
Before I started using tcpdump on a regular basis, I was pretty overwhelmed by all of its options and unsure how I could put the tool to use. If you’re a little tentative as well, read on, and I’ll break down its basic usage for you. I’ll also explain some common situations where it comes in handy.
In its most basic form, you run tcpdump on your server, and you simply specify the interface that you want to listen to:
tcpdump -ni eth0 not port 22
The -i eth0 selects the interface to listen on, and the -n tells tcpdump not to convert addresses and port numbers to names, so we don’t wait on DNS lookups for every address. The “not port 22” is quite helpful as well, since we don’t want to be flooded with our own ssh session traffic. What you’ll immediately see is all of the traffic to and from your server. You’ll see others attempting to ping you, traffic to your database server, DNS lookups, and quite possibly more traffic than you can read. The key is to isolate the traffic we’re interested in, and then make sure that traffic is as we expect it to be.
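If the stream scrolls by too quickly to read, one option is to cap the capture. Here is a minimal sketch, assuming the interface is eth0 as in the command above. The function names are my own, made up for illustration; the -c flag is a standard tcpdump option that makes it exit after the given number of packets.

```shell
# Build the filter that hides traffic on one port
# (22, our own ssh session, by default).
exclude_port_filter() {
    printf 'not port %s' "${1:-22}"
}

# Capture a bounded sample from an interface (needs root).
#   $1 = interface (e.g. eth0), $2 = number of packets (default 50)
# -n skips name lookups, -i picks the interface, -c stops after N packets.
sample_capture() {
    tcpdump -n -i "$1" -c "${2:-50}" "$(exclude_port_filter 22)"
}
```

Running `sample_capture eth0 100` as root should print 100 packets and then return you to the prompt.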
In order for us to isolate the traffic we want to see, we need to create filters. Similar to the example “not port 22” that we used above, we can build expressions that help us sift through all of the extraneous packets. Here is a quick cheat sheet for your reference:
- port 22 (show all packets where the source or destination port is 22)
- host 192.168.1.1 (show all packets where 192.168.1.1 is either the source or the destination)
- src port 22 / src host 192.168.1.1 (show only packets where the source port is 22, or where the source host is 192.168.1.1; use dst instead of src to match on the destination)
- multicast (show all packets that are multicast packets)
- udp / tcp / icmp / arp (show only packets of the given protocol)
We can also build more complex expressions if needed:
- not src port 22 and multicast and udp
- host 192.168.1.1 and port 22
- '(host 192.168.1.1 and not dst port 22) and (not src port 55555)'
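Parentheses mean something to the shell, so a compound expression has to be quoted before tcpdump ever sees it. One way to keep that manageable is to assemble the expression from smaller clauses. This is only a sketch: and_filter is a helper name I made up, and the addresses and ports are the same sample values used in the list above.

```shell
# Join any number of filter clauses with "and",
# wrapping each clause in parentheses.
and_filter() {
    out=""
    for clause in "$@"; do
        if [ -z "$out" ]; then
            out="($clause)"
        else
            out="$out and ($clause)"
        fi
    done
    printf '%s' "$out"
}

# Run tcpdump with the combined filter (needs root).
#   $1 = interface, remaining arguments = filter clauses
capture_all_of() {
    iface="$1"
    shift
    # The double quotes pass the whole expression as one argument,
    # so the shell never interprets the parentheses.
    tcpdump -ni "$iface" "$(and_filter "$@")"
}
```

For example, `capture_all_of eth0 'host 192.168.1.1' 'not dst port 22' 'not src port 55555'` hands tcpdump the expression `(host 192.168.1.1) and (not dst port 22) and (not src port 55555)`.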
Once we’re able to build our expressions, tcpdump lets us verify facts in our troubleshooting process that we otherwise couldn’t know for sure. For instance, is our database server really communicating with our application server, or is one side sending a request and not getting a response? Likewise, is our HTTP server getting a request from a specific client but not sending a response? Is my server getting flooded by traffic from a specific host?
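To make the request-without-response question concrete, here is a rough sketch that watches one conversation in both directions and then only the packets coming back from the peer. The function names are hypothetical, and so are the values in the usage note below; substitute your own interface, host and port.

```shell
# Both directions of a TCP conversation with one peer on one port.
peer_port_filter() {
    printf 'tcp and host %s and port %s' "$1" "$2"
}

# Only the packets the peer sends back from that port.
from_peer_filter() {
    printf 'tcp and src host %s and src port %s' "$1" "$2"
}

# Runners (need root). $1 = interface, $2 = peer address, $3 = peer port
watch_conversation() {
    tcpdump -ni "$1" "$(peer_port_filter "$2" "$3")"
}
watch_replies() {
    tcpdump -ni "$1" "$(from_peer_filter "$2" "$3")"
}
```

Assuming a database at 192.168.1.20 listening on port 3306, `watch_conversation eth0 192.168.1.20 3306` shows the whole exchange. If requests go out there but `watch_replies eth0 192.168.1.20 3306` stays silent, the database isn’t answering, or its answers are going somewhere else.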
Tcpdump can also be used to troubleshoot more advanced issues:
- Duplicate IP / ARP issue? A new server that comes online with the same IP address as your webserver would wreak immediate havoc. With tcpdump we can look for ARP replies for that IP, note which MAC addresses they come from, and then trace the unexpected MAC address to a switch port to find the culprit.
- Packets tagged when they shouldn’t be? Is your network interface misconfigured with a VLAN tag when it shouldn’t be? Find out for sure by using tcpdump to look at the Ethernet header (the -e flag prints it).
- Asymmetrically routed? In a multi-homed server, this is quite a common issue. Do we see responses coming back by way of a router that isn’t our default gateway, or on an interface that isn’t the intended one?
- Rogue DHCP server? Do we see a DHCP response from more than one server?
- Application performance / timing: Using tcpdump, we can calculate round trip times using packet timestamps. This can be useful for benchmarking hardware or application latency.
- Extract payload messages: Using tcpdump, we can correlate actions that occur in our applications with what actually gets sent to the client across the network. For instance, when I click the “submit” button in my application, why is it sending the client X when it really should be sending Y?
- Compliance or forensics: tcpdump can be used on a machine hanging off of a switch SPAN port for logging purposes.
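A few of the checks above can be sketched as small shell helpers. Treat these as starting points rather than recipes: the function names are my own invention, the capture size and file count are arbitrary, and the flags (-e, -ttt, -A, -s, -w, -C, -W) are standard tcpdump options, so check the man page for your version.

```shell
# Filter builders.
arp_filter()  { printf 'arp and host %s' "$1"; }         # duplicate IP hunt
dhcp_filter() { printf 'udp and (port 67 or port 68)'; } # rogue DHCP hunt
vlan_filter() { printf 'vlan'; }                         # 802.1Q-tagged frames

# Runners (need root). $1 is always the interface.
# -e prints the Ethernet header, so MAC addresses and VLAN tags show up.
show_arp()  { tcpdump -eni "$1" "$(arp_filter "$2")"; }
show_dhcp() { tcpdump -eni "$1" "$(dhcp_filter)"; }
show_vlan() { tcpdump -eni "$1" "$(vlan_filter)"; }

# -ttt prints the time elapsed since the previous packet, which helps
# when estimating round trip times. $2 = host, $3 = port
show_timing() { tcpdump -ttt -ni "$1" "host $2 and port $3"; }

# -A prints each payload as ASCII; -s 0 asks for full-size packets.
show_payload() { tcpdump -A -s 0 -ni "$1" "host $2 and port $3"; }

# Log to rotating files: -C is the file size in millions of bytes,
# -W limits how many files are kept. $2 = output file name
log_capture() { tcpdump -ni "$1" -w "$2" -C 100 -W 10; }
```

For instance, `show_arp eth0 192.168.1.1` lists ARP traffic that mentions 192.168.1.1; replies coming from two different MAC addresses point to a duplicate IP.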
I’ve only scratched the surface of what tcpdump is capable of. It’s a powerful tool, but ultimately, you do need to understand what you’re looking for in the stream of packets. What tcpdump provides is clear visibility into what otherwise might be a black box.
Do you have an interesting situation where tcpdump helped you get out of a rut? Let me know!
Sr. Systems Integrator