Interpreting the Tool: What Does Poor Network Performance Look Like?
A hammer is a fantastic tool with a million uses. A skilled builder can use it to frame a house, put up a roof, or repair a wall. But the hammer itself doesn’t have skill – that depends on the person using it.
In a similar way, network analysis tools have a million uses. They can help get to the root cause of nagging application and network performance problems. But despite expert systems, performance alarms, and problem flags, even the most user-friendly tool is only as good as the person interpreting it.
For this reason, we are going to have a series of articles that focus on how to interpret what the tool is saying when it displays data. We will dig into questions like these:
- What are these charts or statistics telling me?
- How can I tell if the network really is the problem?
- What is the next step in resolving the issue?
In this article, we will look at how a network performance problem is displayed on the TruView and what can be done as the next step.
What Does Poor Network Performance Look Like?
Let’s assume that users have been complaining about poor performance for a business application. We know that users are going to blame the network regardless of what the problem is, but in this case, let’s assume that the network really was the root cause. Common network-level causes of poor performance are packet drops, congestion, and path problems. These reveal themselves as TCP retransmissions and out-of-order packets.
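To make the symptom concrete: an analyzer flags a retransmission when a TCP segment re-covers bytes the capture has already seen. The sketch below is a simplified, hypothetical version of that logic (real tools such as Wireshark apply additional heuristics); the input format is an assumption for illustration.

```python
def count_retransmissions(segments):
    """Count suspected retransmissions in one direction of a TCP stream.

    `segments` is a list of (sequence_number, payload_length) tuples in
    capture order. A segment whose data does not advance past the highest
    byte already seen is counted as a retransmission. This is a
    simplification: it ignores SACK, keep-alives, and reordering nuances.
    """
    highest_seen = 0
    retransmissions = 0
    for seq, length in segments:
        end = seq + length
        if length > 0 and end <= highest_seen:
            retransmissions += 1  # re-sends bytes we already saw
        else:
            highest_seen = max(highest_seen, end)
    return retransmissions
```

For example, a stream `[(0, 100), (100, 100), (100, 100), (200, 100)]` contains one repeated segment, so the function counts one retransmission.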
In the TruView, first select the search bar on the top right and enter the name of the application in question. Next, select the site where the users are complaining from the bottom of the page. This will set a filter for the application and site.
In the End-User Response Time Breakdown, you will see a stacked graph that shows Application Response Time (ART), Data Transfer Time (DTT), and Network Round-trip Time (NRT). With most problems that are truly rooted in the network, the DTT or NRT will be shown as the largest contributor to the total time. In this screen, we see that DTT is the largest.
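Reading a stacked graph like this amounts to comparing the three components and asking which dominates the total. A minimal sketch of that interpretation step (the component values here are hypothetical, not taken from the screenshot):

```python
def largest_contributor(breakdown):
    """Return the component contributing most to end-user response time.

    `breakdown` maps component name to seconds, e.g. the three parts of
    the End-User Response Time Breakdown: ART (server processing),
    DTT (data transfer), and NRT (network round-trip).
    """
    return max(breakdown, key=breakdown.get)

# Hypothetical transaction where data transfer dominates:
largest_contributor({"ART": 0.12, "DTT": 0.85, "NRT": 0.05})
```

If ART dominates instead, the investigation shifts toward the server rather than the network.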
This means that sending a transaction from the server to the client is taking longer during this spike. This can be due to packet loss, congestion, or drops on network interfaces. It can also be due to TCP window problems, but we will address those in depth in another article. To determine which is to blame, select Application Performance | Trends | TCP from the dropdown menu. This screen will display any retransmissions, zero windows, or out-of-order events that occurred in the transaction flow.
In this case, we see that TCP retransmissions directly coincide with the application delay. When we see retransmissions but the application continues to work, this means that there is packet loss on the network. This is often due to congestion and Ethernet errors, and can also be caused by misconfigured MTU or MSS settings on the infrastructure.
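The MTU/MSS relationship mentioned above is simple arithmetic: for IPv4 with no IP or TCP options, the MSS a host should advertise is the interface MTU minus 40 bytes of headers. A hedged sketch of how a mismatch against the path MTU can be checked (the function names are illustrative, not from any tool):

```python
IPV4_HEADER = 20  # bytes, assuming no IP options
TCP_HEADER = 20   # bytes, assuming no TCP options

def expected_mss(mtu):
    """MSS a host should advertise for a given interface MTU (IPv4)."""
    return mtu - IPV4_HEADER - TCP_HEADER

def mss_mismatch(advertised_mss, path_mtu):
    """True if full-size segments would exceed the path MTU, risking
    fragmentation or silent drops, and therefore retransmissions."""
    return advertised_mss > expected_mss(path_mtu)

expected_mss(1500)        # standard Ethernet MTU -> 1460
mss_mismatch(1460, 1400)  # e.g. a tunnel shrinks the path MTU -> True
```

This is why a VPN or tunnel inserted into the path, without a corresponding MSS clamp, can surface as the retransmission pattern shown in the TCP trend view.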
To validate where the loss is occurring, the OptiView XG can be used to run a Graphical Path Analysis, which will pinpoint the exact link that is reporting errors. This feature can also be run directly from the TruView, which will pull the data from the XG that is installed on the network.
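Conceptually, localizing loss along a path means walking the hops in order and finding where errors first appear. A minimal sketch of that idea, assuming per-hop loss percentages have already been gathered (this is an illustration of the reasoning, not how the XG's Graphical Path Analysis is implemented):

```python
def first_lossy_hop(hop_loss_percent):
    """Given loss percentages for each hop along a path, in order,
    return the 1-based index of the first hop reporting loss,
    or None if the path is clean.
    """
    for hop, loss in enumerate(hop_loss_percent, start=1):
        if loss > 0:
            return hop  # loss begins here; inspect this link
    return None

# Hypothetical path where loss starts at the third hop:
first_lossy_hop([0.0, 0.0, 3.5, 3.2])
```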
This display will clearly show which link is causing the packet loss, which can then be addressed directly on the server or switch.
When you see high DTT on the TruView, watch for TCP retransmissions and capacity issues, which can reveal the underlying problem.
The clear response time metrics on the TruView are easy to read and interpret, guiding users to the root cause of the performance issue.
For more information on OptiView XG and TruView, click here.