O365 Network Performance #1

What in general needs to be considered?

When deploying Office 365 it’s important to make sure that the network between the user and Office 365 is up to the job. To do this there are a number of questions that need answering.

  • Is the latency between the user and Office 365 low enough?

    • Latency from user to internet egress

    • Latency from internet egress to Office 365

  • Is the jitter between the user and Skype for Business low enough?

    • Jitter between user and internet egress

    • Jitter between internet egress and O365

  • Is there enough bandwidth between the user and O365 to cater for all users?

    • Bandwidth between user and internet egress

    • Bandwidth between internet egress and O365

  • Is TCP optimally configured on the network: are the MSS, window size, scaling factor, negotiation etc. all set optimally? How is latency likely to affect TCP?

  • Does the network correctly prioritize voice traffic over video traffic, and video traffic over (say) SharePoint traffic?

  • Does the client device correctly prioritize and mark the traffic it sends?

  • Does DNS resolve rapidly enough and to the correct Office 365 access point (or service front door, as Microsoft call it)?

  • Is there a forward proxy or internet filter between the user and O365, and is it correctly configured?

  • Is the firewall configured correctly to allow access to O365?

Picture of a network from the on-premises user through to Office 365 on the internet.

There are a number of test tools that can be run to answer the above questions – and there are a number of design principles that should be followed when preparing for Office 365. I’m going to discuss some of these over the next few blog posts.

Domain Name Service – Best Practices

In this first post we will start by considering the Domain Name Service (DNS).

DNS Query Returns Local IPs

When a service such as Office 365 is delivered by a content delivery network, it’s possible for the service to have multiple routes to the content via a range of different locations across the world. So, for example, I could access my Office 365 tenancy via the Microsoft “Front Door” in South America rather than my local one within the UK.

The DNS resolution for the O365 Exchange service, SharePoint, Skype for Business etc. defines where geographically you access that particular O365 service. If that location is not near you, the service will suffer a higher latency than if it were accessed locally.

But what could cause sub-optimal responses? When a client queries its DNS server and that server does not have a cached answer, it passes the query on to its own configured upstream DNS server, which sends a reply. The answer is typically chosen according to where the query appears to come from, so an upstream server in another region can lead to a distant front door. If the reply points to the local front door for the O365 service then all is well. If the upstream DNS server is itself poorly configured and returns an answer that is not tied to your location, then you might want to consider changing your DNS configuration.

DNS Response Times

Microsoft don’t publish any guidance online on what they would expect a good DNS response time to be. However, the GRC DNS Benchmark tool can be used to compare the query response times of the configured DNS servers with those of other open DNS servers on the internet. A local caching server should be able to answer cached queries faster than a query sent out to the internet. For cached results, where the server is at the same site as the user, 2-5ms should be achievable. For results over a WAN, where the server is centralized rather than local, a result of less than 25ms should be possible, although this depends on the latency across the WAN. For this reason MS would rather you had a local caching DNS server co-located with the user.

Thus best practice for optimising DNS response times in a WAN environment is to avoid centralized DNS servers and to use DNS servers with local caching on each site on the WAN, as this minimizes the effect of WAN latency.
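As a rough illustration of those targets, the sketch below rates a single measured response time. The function name Get-DnsResponseRating and its parameters are made up for this example, and the 5ms and 25ms limits are simply the rule-of-thumb figures quoted above, not official Microsoft thresholds.

```powershell
# Hypothetical helper: rates one DNS response time against the rough
# targets quoted in the text (about 5 ms on site, about 25 ms over a WAN).
function Get-DnsResponseRating {
    param(
        [Parameter(Mandatory)] [double] $ResponseTimeMs,
        [switch] $AcrossWan   # set when the DNS server sits across the WAN
    )
    # Pick the target that matches where the DNS server is located
    $targetMs = if ($AcrossWan) { 25 } else { 5 }
    if ($ResponseTimeMs -le $targetMs) { 'within target' } else { 'slower than target' }
}

# Example calls
Get-DnsResponseRating -ResponseTimeMs 3.2
Get-DnsResponseRating -ResponseTimeMs 40 -AcrossWan
```

The first call reports 'within target' and the second 'slower than target'. Adjust the two limits if your own WAN latency makes 25ms unrealistic.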

DNS Packet Loss

DNS uses UDP as a first choice and switches to TCP when a response is too large for UDP and comes back truncated. If a UDP query or its reply is lost, the client has to wait for a timeout before it retries, so packet loss slows DNS resolution down and introduces significant delay to the delivery of the service to the user. Consequently, if the path to the client’s configured DNS server is experiencing loss, or the DNS server is overloaded and dropping some queries, the user experience will be seriously affected. Packet loss of even 1% is significant for DNS: if 1000 users each make 1000 DNS queries a day, then out of the million requests to the DNS server ten thousand will be dropped, and all users will be affected and complain…
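The arithmetic behind that example can be checked with a few lines of PowerShell. This is only a sketch: the variable names are made up, and the last figure assumes each query is lost independently of the others, which real networks will not strictly follow.

```powershell
# Figures taken from the example in the text
$users          = 1000
$queriesPerUser = 1000
$lossRate       = 0.01   # 1% packet loss

$totalQueries   = $users * $queriesPerUser
$droppedQueries = $totalQueries * $lossRate

# Chance that one user sees at least one dropped query in a day,
# assuming every query is lost independently (a simplification)
$chanceUserAffected = 1 - [math]::Pow(1 - $lossRate, $queriesPerUser)

"Total queries per day: $totalQueries"
"Dropped queries      : $droppedQueries"
"Chance a user is hit : {0:P3}" -f $chanceUserAffected
```

Under that assumption the chance of any given user hitting at least one dropped query in a day comes out above 99.99%, which is why "all users will be affected" is not an exaggeration.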

How to test DNS performance

Test 1: Geo DNS Localization

To test whether a geographically “local” front door to Office 365 is returned for a DNS query, run nslookup against one of the Office 365 DNS records and see what the result is:

C:\Users\Andrew Horler>nslookup outlook.office365.com
Server:  UnKnown

Non-authoritative answer:
Name:    LHR-efz.ms-acdc.office.com
Addresses:  2603:1026:500:3e::2

Aliases:  outlook.office365.com
C:\Users\Andrew Horler>

I can’t find any authoritative definition from MS of what LHR-efz.ms-acdc.office.com means or why it’s called this, but an internet search for the IP addresses returned places them in the UK and thus local to me.

You will need to do this for Outlook, SharePoint, Skype etc., i.e. all the services within O365 that you are interested in.
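Repeating the lookup by hand for every service gets tedious, so here is a minimal sketch of how it could be looped. The helper name Get-FrontDoorSummary is made up for this example. The two host names are the ones used elsewhere in this post; the SharePoint and Skype for Business names depend on your tenant, so they are left for you to add. You still need to look up where the returned addresses are located.

```powershell
# Hypothetical helper: condenses the records from one lookup into the name
# that finally answered (the front door) and the addresses it returned.
function Get-FrontDoorSummary {
    param(
        [string]   $QueryName,
        [object[]] $Records
    )
    # Only A and AAAA records carry addresses; CNAME records are the alias chain
    $addressRecords = @($Records | Where-Object { $_.Type -in 'A', 'AAAA' })
    [pscustomobject]@{
        QueryName = $QueryName
        FrontDoor = ($addressRecords | Select-Object -First 1).Name
        Addresses = ($addressRecords | ForEach-Object { $_.IPAddress }) -join ', '
    }
}

# Both names appear elsewhere in this post; add the SharePoint and
# Skype for Business names that your own tenant uses.
$names = 'outlook.office365.com', 'portal.office.com'

# Resolve-DnsName is only available on Windows (DnsClient module), hence the check
if (Get-Command Resolve-DnsName -ErrorAction SilentlyContinue) {
    foreach ($name in $names) {
        $records = Resolve-DnsName -Name $name -DnsOnly -ErrorAction SilentlyContinue
        Get-FrontDoorSummary -QueryName $name -Records $records
    }
}
```

A name such as LHR-… in the FrontDoor column suggests a UK front door, but as noted above that naming is not documented, so treat it as a hint rather than proof.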

Test 2: Response Times & Packet Loss

In general I would recommend using GRC’s DNS Benchmark Tool, written by Steve Gibson of Security Now! This tool shows how well your configured DNS servers handle queries and gives you a reliability rating which indicates whether the DNS server is dropping packets. For example, when run on my home network it shows:

Picture of the GRC DNS Benchmark Tool showing results from a test.

Test 3: Response Times & Packet Loss for Specific DNS Records

DNS Benchmark is great, but it doesn’t allow you to query the same specific DNS record repeatedly and record the response time and packet loss. A script like the one below is required for that:

#DNS server to query - replace with the address of the server under test
$dnsserver = 'REPLACE_WITH_DNS_SERVER_IP'

#Initialize ArrayList to hold one entry per query
$dnsresponsearray = New-Object Collections.ArrayList

#Defining Variable Initial Values
$countiterations = 0
$date = $(Get-date -format dd-MM-yyyy_HH-mm)
$startdatetime = $(Get-date -format dd-MM-yyyy_HH-mm-ss)
$numberofqueries = 20

#Output file - the C:\JSON-Scripts folder must already exist
$outputfile = "C:\JSON-Scripts\dns-queries_" + $date + ".txt"

#Loop To Collect all data

do {

#Time a single query; $dnsresponse stays empty if the query fails or returns no A record
$timetaken = measure-command {$dnsresponse = resolve-dnsname portal.office.com -server $dnsserver -type A -dnsonly -ErrorAction SilentlyContinue | Select-object -ExpandProperty IP4Address -ErrorAction SilentlyContinue} | select-object -ExpandProperty TotalMilliseconds

#Initialize Array Item
$arrayitem = New-object PSobject

#Specify Array Item Values (several addresses are joined into one CSV field)
$arrayitem | Add-Member -type NoteProperty -Name 'DNSResponse' -value ($dnsresponse -join ';')
$arrayitem | Add-member -type NoteProperty -name 'DNSResponseTime_in_ms' -value $timetaken

#Add() keeps the ArrayList type; [void] hides the index it returns
[void]$dnsresponsearray.Add($arrayitem)

#Count this query so the loop stops after $numberofqueries queries
$countiterations++

#wait 30 seconds between queries
#start-sleep -s 30

} until($countiterations -eq $numberofqueries)

#Only queries that returned an address count as successes
$successes = @($dnsresponsearray | Where-Object { $_.DNSResponse }).Count

$enddatetime = $(Get-date -format dd-MM-yyyy_HH-mm-ss)
$dnsresponsearray | Export-Csv -NoTypeInformation -Encoding Unicode $outputfile
Add-Content $outputfile $("_______________________________________________________")
Add-Content $outputfile $("File started at " + $startdatetime)
Add-Content $outputfile $("Number of queries: " + $numberofqueries)
Add-Content $outputfile $("Number of successes: " + $successes)
Add-Content $outputfile $("File ended at " + $enddatetime)

Thanks to Kieran Jacobsen’s blog post on how to get powershell to highlight properly on Squarespace…

The script isn’t perfect: it works for short-form DNS responses containing a single item called “IP4Address”, but not all DNS servers respond in this manner. The command below is the main element, where $dnsserver holds the address of the DNS server being tested:

$timetaken = measure-command {$dnsresponse = resolve-dnsname portal.office.com -server $dnsserver -type A -dnsonly -ErrorAction SilentlyContinue | Select-object -ExpandProperty IP4Address -ErrorAction SilentlyContinue} | select-object -ExpandProperty TotalMilliseconds

“resolve-dnsname” is the PowerShell equivalent of the command-line “nslookup” command, and “measure-command” records how long it takes to execute. Note that the “-dnsonly” switch restricts resolution to the DNS protocol, so LLMNR and NetBIOS are not used; to test the local cache only, use “-cacheonly”. The filtered results from these two commands are stored in a PowerShell array list. At the end of the script, after the loop has run, the “export-csv” and “add-content” commands write the results to a file.
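Once the results have been collected it helps to boil them down to a loss percentage and a minimum, average and maximum response time. The sketch below is one way to do that. The function name Get-DnsQuerySummary is made up for this example, it relies on the DNSResponse and DNSResponseTime_in_ms property names used in the script, and it assumes that an entry with an empty DNSResponse was a lost query, although it could equally be a reply that carried no IP4Address.

```powershell
# Hypothetical helper: summarises the entries collected by the script above.
function Get-DnsQuerySummary {
    param([object[]] $Results)

    # Assumption: an entry with an empty DNSResponse counts as a lost query
    $answered = @($Results | Where-Object { $_.DNSResponse })
    $timing   = $answered | Measure-Object -Property DNSResponseTime_in_ms -Minimum -Maximum -Average

    $total = if ($null -eq $Results) { 0 } else { $Results.Count }
    $lossPercent = if ($total -gt 0) {
        [math]::Round(100 * ($total - $answered.Count) / $total, 1)
    } else { 0 }

    [pscustomobject]@{
        Queries     = $total
        Answered    = $answered.Count
        LossPercent = $lossPercent
        MinMs       = $timing.Minimum
        AvgMs       = $timing.Average
        MaxMs       = $timing.Maximum
    }
}

# Example, run after the loop in the script has finished:
# Get-DnsQuerySummary -Results $dnsresponsearray
```

The timing figures only cover the queries that were answered, so a high loss percentage alongside good response times still points to a problem.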

Next Time

In the next blog post I think I will probably cover TCP, latency and jitter by examining a packet capture from uploading / downloading a file to OneDrive.
