O365 Network Performance #3

This blog post follows on from the one I wrote a week ago on TCP best practices for Office 365. It focuses on capturing and analysing a TCP connection to Office 365, and on comparing the information available in the packet capture with the best practices discussed in that post.

 

Summary of TCP Best Practices

Best Practice | Value
Target Bandwidth Per User | Min 10Mbps (ignoring concurrency etc., which I’ll cover another time)
Target RTT Per User | Max 50ms
TCP Window Size | 64kB
TCP Window Scaling Factor | 4 or 8
TCP MSS | 1460 unless there is a legitimate reason for it being lower (e.g. Cisco CAPWAP in use)
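As a rough illustration of how the window size and RTT targets relate to the bandwidth target, a single TCP connection cannot send more than one receive window per round trip. The sketch below assumes that "64kB" means the 65,535-byte maximum of an unscaled window and ignores loss and congestion control; the function name is my own.

```python
def max_throughput_mbps(window_bytes: int, rtt_seconds: float) -> float:
    """Upper bound on single-connection throughput: one window per round trip."""
    bits_per_second = window_bytes * 8 / rtt_seconds
    return bits_per_second / 1_000_000


# Assumption: 64kB is taken as the largest unscaled window (65,535 bytes),
# and the RTT is the 50ms target from the summary above.
unscaled_ceiling = max_throughput_mbps(65_535, 0.050)
print(f"Ceiling with an unscaled window at 50ms RTT: {unscaled_ceiling:.1f} Mbps")
```

With window scaling in use the advertised window can grow well beyond 64kB, so the ceiling rises accordingly.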

Analysing a packet flow – MSS, Initial RTT, Initial Window Size & Scaling Factor

For my example I started a packet capture in Wireshark and then copied a file from my desktop to Office 365 OneDrive. This caused the OneDrive client to upload the file to the cloud, which allowed me to see what happened on the wire.

During the TCP connection my IP address was 192.168.1.142, whilst Office 365 OneDrive was using 13.107.136.9. I filtered the capture to show only traffic between these endpoints using the Wireshark filter “ip.addr == 13.107.136.9”. This leaves me with 4557 packets in total.

As you would expect the first three packets in my connection are a SYN, a SYN-ACK, and an ACK:

Wireshark Trace of Onedrive Upload - First Few Packets

Looking at the first of these packets we can identify the proposed MSS, window size, and scaling factor from 192.168.1.142:

Wireshark Trace of Onedrive Upload - Detail on first packet
  • Window Size: 64240

  • MSS: 1460 bytes

  • Window Scale: 8 (multiply by 256)

Looking at the second of these packets we can identify the proposed MSS, window size, and scaling factor from 13.107.136.9:

Wireshark Trace of Onedrive Upload - Second Packet
  • Window Size: 65535

  • MSS: 1440 bytes

  • Window Scale: 8 (multiply by 256)
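To make the "multiply by 256" concrete, here is a minimal sketch of how the window scale option turns the 16-bit window field into a real window size. The values are the ones read from the two handshake packets above; the dictionary layout and function names are my own. Note that the window field in the SYN and SYN-ACK themselves is never scaled; the shift only applies to later segments.

```python
MAX_WINDOW_FIELD = 65_535  # largest value the 16-bit TCP window field can hold


def scale_multiplier(shift_count: int) -> int:
    """The window scale option carries a shift count; the multiplier is 2**shift."""
    return 1 << shift_count


def scaled_window(window_field: int, shift_count: int) -> int:
    """Real window size advertised by a non-SYN segment."""
    return window_field << shift_count


# Values taken from the SYN (client) and SYN-ACK (server) in the capture.
client = {"window": 64_240, "mss": 1_460, "shift": 8}
server = {"window": 65_535, "mss": 1_440, "shift": 8}

multiplier = scale_multiplier(client["shift"])
largest_window = scaled_window(MAX_WINDOW_FIELD, server["shift"])

# Segments sent towards the server are limited by the MSS the server announced.
upload_mss = server["mss"]

print(f"Scale multiplier: {multiplier}")
print(f"Largest window the server could advertise: {largest_window} bytes")
print(f"MSS for the upload direction: {upload_mss} bytes")
```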

We should also note that both ends set the “SACK permitted” option in their TCP options, which means both ends support TCP selective acknowledgements for dealing with packet loss.

We can also see here the initial Round Trip Time (iRTT) for the first SYN and SYN-ACK exchange: it’s 0.0388 seconds, or about 38.8ms.

So far we have identified that the Window Size, Window Scaling and MSS are acceptable in both directions. We have also identified that the initial Round Trip Time is acceptable.

What about Packet Loss?

You can easily see how many Duplicate ACK packets Wireshark believes there to be by using the “Analyse -> Expert Information” option which provides this screen:

Wireshark Trace of Onedrive Upload - Expert Information

This shows there are 469 duplicate ACK packets. Taken at face value, that would suggest a packet loss figure from the following calculation:

469 Duplicate ACK Packets * 100 / 4468 packets in total connection stream = 10.50%

But do they really represent 469 lost packets? By selecting the “>” it’s possible to see the packet summary details:

Wireshark Trace of Onedrive Upload - Expert Information - Expand Duplicate ACKs

This shows that the first 63 TCP Duplicate ACK packets are all duplicates of the ACK in packet 449, which suggests that they might actually be selective ACK packets. You can see this in the packet detail in Wireshark, for example in packet 450:

Wireshark Trace of Onedrive Upload - A Duplicate ACK for Packet 449

The TCP SACK option can specify multiple received sections, with the “left edge” and “right edge” of each showing what has been received. So the next expected sequence number was 212160, but the data received had a starting offset of 233760 and an ending offset of 235200, so some packets have been lost.

Looking at the original ACK (packet 449), you can see it is an ACK to packet 264, which has sequence number 210720, meaning the next sequence number expected is 212160 (i.e. 210720 plus the MSS of 1440).

But there are 63 TCP Duplicate ACKs for packet 449 – so how many packets have actually been lost?

The MSS is 1440 bytes, so we can calculate how many packets were lost between the packet acknowledged and this acknowledgement, based on the sequence number and the SACK left edge.

  • 233760 – 210720 = 23040 bytes

  • 23040 / 1440 = 16 packets
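The same arithmetic can be sketched in a few lines. This follows the calculation exactly as written above (counting from the sequence number of the acknowledged packet) and assumes every segment carries a full MSS of payload; the variable names are my own.

```python
MSS = 1440  # bytes, as announced by the server in the SYN-ACK

acked_segment_seq = 210_720  # sequence number of packet 264
sack_left_edge = 233_760     # left edge of the SACK block in packet 450

gap_bytes = sack_left_edge - acked_segment_seq
segments, leftover = divmod(gap_bytes, MSS)

print(f"Gap: {gap_bytes} bytes = {segments} full-size segments")
if leftover:
    # A non-zero remainder would mean at least one segment was not full-size.
    print(f"{leftover} bytes left over")
```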

These packets are resent as packets 461 through 477. But this doesn’t account for the other lost packets reported through selective ACKs. The same selective ACK information can appear in multiple duplicate ACK packets with different ACK sequence numbers, meaning that counting duplicate ACKs doesn’t help: each one could refer to a single lost packet or to many, and could even refer to the same lost packets.

In fact, counting the number of packets containing a Selective Acknowledgement doesn’t help either. For example, consider the SACK information in packet 456 below:

Wireshark Trace of Onedrive Upload - SACK information in Packet 456

This has duplicate ACK number 212160, the same as our earlier packet.

Lost packets here then are:

  • (233760 – 212160) / 1440 + 1 = 16 packets lost

  • (251040 – 235200) / 1440 = 11 packets lost

  • (259680 – 252480) / 1440 = 5 packets lost

Now consider SACK information in packet 457 below:

Wireshark Trace of Onedrive Upload - SACK information in packet 457 showing that same information repeats in multiple packets

Again this has duplicate ACK number 212160 as before.

Lost packets here then are:

  • (233760 – 212160) / 1440 + 1 = 16 packets lost

  • (251040 – 235200) / 1440 = 11 packets lost

  • (259680 – 252480) / 1440 = 5 packets lost

All that has changed is that the right edge of the first SACK has incremented by 1440 bytes, which simply tells us an additional packet has been received.
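To illustrate why counting SACK-bearing packets over-counts, here is a small sketch that treats each missing byte range as a (start, end) pair and removes the repeats. The ranges are the ones used in the gap calculations for packets 456 and 457 above; the function name and the exact-match comparison are simplifying assumptions of mine, since in a real capture the gaps shrink as retransmissions arrive.

```python
# Missing byte ranges (start, end) taken from the gap calculations above.
holes_456 = [(212_160, 233_760), (235_200, 251_040), (252_480, 259_680)]
holes_457 = [(212_160, 233_760), (235_200, 251_040), (252_480, 259_680)]


def unique_holes(*per_packet_holes):
    """Collapse the gaps reported by several duplicate ACKs into distinct ranges."""
    seen = set()
    for holes in per_packet_holes:
        seen.update(holes)
    return sorted(seen)


naive_count = len(holes_456) + len(holes_457)
distinct = unique_holes(holes_456, holes_457)

print(f"Gaps counted per packet: {naive_count}")
print(f"Distinct gaps: {len(distinct)}")
```

Two duplicate ACKs report six gaps between them, but only three distinct ones, and the full capture repeats this many times over.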

There has to be a better way to determine packet loss then…

When a TCP connection has lost packets and duplicate ACKs / selective ACKs are in use, the sender responds by resending the packets. Wireshark shows these as “TCP Fast Retransmission” followed by “TCP Retransmission”, or as “TCP Fast Retransmission” followed by a number of “TCP Out-Of-Order” packets, for example:

Wireshark Trace of Onedrive Upload - Some TCP Fast Retransmission and TCP Retransmission Packets

and

Wireshark Trace of Onedrive Upload - Some TCP Out-Of-Order Packets

So, a better method would be to inspect how many retransmissions, fast retransmissions and out of order packets exist in the life of the connection – looking at the expert information there are:

  • Out of Order Packets: 98

  • Retransmissions: 37

  • Fast Retransmissions: 6

TOTAL: 141

Potential Packet Loss Rate: 141 packets * 100 / 4468 packets = 3.16% packet loss
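For comparison, the two estimates can be reproduced side by side. This is just the arithmetic from this post using the counts Wireshark reported in Expert Information; the variable names are my own.

```python
def percent(part: int, total: int) -> float:
    """Express part as a percentage of total."""
    return part * 100 / total


total_packets = 4468
duplicate_acks = 469

# Counts read from Wireshark's Expert Information for this capture.
resent = {
    "out_of_order": 98,
    "retransmissions": 37,
    "fast_retransmissions": 6,
}

resent_total = sum(resent.values())
dup_ack_estimate = percent(duplicate_acks, total_packets)
resend_estimate = percent(resent_total, total_packets)

print(f"Estimate from duplicate ACKs: {dup_ack_estimate:.2f}%")
print(f"Estimate from resent packets ({resent_total}): {resend_estimate:.2f}%")
```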

TCP Round Trip Time

Thankfully Wireshark has a stream graph for TCP Round Trip Time that makes it much easier to see how this varies over time – this is found in the “Statistics -> TCP Stream Graphs -> Round Trip Time” and shows the statistics for both directions – since my capture is for an upload I have chosen from 192.168.1.142 to 13.107.136.9:

Wireshark Trace of Onedrive Upload - Round Trip Time Graph

This shows that for most of the connection the round trip time stayed well below 100ms, but peaked at 700ms a few times.

That is over my proposed target of 50ms, but I’m working from home on ordinary broadband, so I would expect it to be somewhat slower.

This graph doesn’t allow me to see percentile results though, which would most likely remove the few data points over 100ms. If I wanted to verify this I could extract the RTT values to Excel and do the calculation. I haven’t bothered to do this just now.
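If you did want the percentile figures, the calculation is short enough to do outside Excel. The sketch below uses the nearest-rank method; the RTT values are made-up placeholders, not values from my capture, and the function name is my own.

```python
def nearest_rank_percentile(values, p: int):
    """Return the p-th percentile (1-100) using the nearest-rank method."""
    ordered = sorted(values)
    # Integer ceiling of p * n / 100, clamped so the rank is at least 1.
    rank = max(1, (p * len(ordered) + 99) // 100)
    return ordered[rank - 1]


# Placeholder RTT samples in milliseconds (hypothetical, for illustration only).
sample_rtts_ms = [38, 41, 40, 45, 39, 700, 42, 44, 43, 95]

median_rtt = nearest_rank_percentile(sample_rtts_ms, 50)
p90_rtt = nearest_rank_percentile(sample_rtts_ms, 90)

print(f"Median RTT: {median_rtt}ms")
print(f"90th percentile RTT: {p90_rtt}ms")
print(f"Maximum RTT: {max(sample_rtts_ms)}ms")
```

A single large spike barely moves the median or the 90th percentile, which is why percentiles are a fairer way to compare against a target than the peak.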

TCP Throughput

Again, Wireshark has a stream graph for TCP Throughput that makes it much easier to see how this varies over time – this is found in the “Statistics -> TCP Stream Graphs -> Throughput” and shows the statistics for both directions – since my capture is for an upload I have chosen from 192.168.1.142 to 13.107.136.9:

Wireshark Trace of Onedrive Upload - Throughput Graph

Final Thoughts

I did this once for a single file uploaded to OneDrive. To test properly you would need to repeat this several times for uploads and several times for downloads, not just for OneDrive but also for SharePoint and Teams. You would also need to repeat it at each of a range of representative sample sites on your WAN.

I also suspect that there may be other ways of achieving the same using tooling. For example, the Microsoft tool here is quite good. It’s only a PoC tool, but it does allow quite extensive testing of your O365 configuration and highlights things that need fixing. If you open it in Edge and perform the advanced tests (you have to run a download) then it performs a lot of relevant tests, some 470 in fact.
