Networking teams are finding that traditional hub-and-spoke, best-effort network designs do not provide optimal performance for Office 365. This is especially true for the real-time traffic of Skype and Teams, which stream large quantities of audio/video traffic. Secondarily, large file transfers to and especially from SharePoint Online can be adversely affected when the service moves to the cloud.
Problems Big and Small
The network egress between the intranet and the Internet is the point of bandwidth congestion. Real-time and bulk data transfers are often throttled at the network edge, where internal gigabit speed connections are pinched down to skinnier Internet connections. Planning for this shift in bandwidth and ensuring ISP connections are adequate for the additional traffic is required. Microsoft provides a couple of app-specific calculators to assist in estimating the bandwidth needed.
In rare cases, non-optimal peering between ISPs has caused major quality problems, as traffic is routed through multiple hops before reaching Microsoft’s network. Running performance tools can reveal such issues before production traffic is cut over.
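One simple way to spot such problems early is to time TCP connections from each site to an Office 365 endpoint and compare the results across sites. The following is a minimal sketch, not a replacement for a proper network assessment; the host name `outlook.office365.com`, the port, the sample count, and the function names are assumptions chosen for illustration.

```python
import socket
import statistics
import time


def tcp_connect_ms(host, port=443, timeout=3.0):
    """Return the TCP connect time in milliseconds, or None on failure."""
    start = time.perf_counter()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return (time.perf_counter() - start) * 1000.0
    except OSError:
        # Covers DNS failures, refused connections, and timeouts.
        return None


def summarize(samples):
    """Summarize connect times (ms); None entries count as failures."""
    ok = [s for s in samples if s is not None]
    return {
        "attempts": len(samples),
        "failures": len(samples) - len(ok),
        "median_ms": statistics.median(ok) if ok else None,
        "spread_ms": (max(ok) - min(ok)) if ok else None,
    }


def probe(host, count=5):
    """Take `count` connect-time samples and summarize them."""
    return summarize([tcp_connect_ms(host) for _ in range(count)])


if __name__ == "__main__":
    # Illustrative endpoint only; substitute the hosts your tenant uses.
    print(probe("outlook.office365.com"))
```

A site whose median or spread is far higher than its peers is a candidate for a closer look at its ISP path.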
The proper implementation of Quality of Service (QoS) is more important than ever. Enabling recently assisted a client who was having issues with Skype for Business Online meeting quality. Participants would be randomly dropped, or would occasionally be unable to view shared screens. In a deep Layer 2-7 analysis, our engineer determined that real-time media traffic wasn’t being trusted end-to-end. Voice/video packets were entering the network core without QoS tags and, during congestion, were dropped often enough to disrupt reliable performance.
This example reinforces the argument that the status quo won’t work for optimal cloud performance. The cloud shifts the source and destination of traffic, and exposes gaps in networks that are optimized only for premises-based, client/server traffic.
This article identifies two main options facing network operators: 1) redesign or 2) make incremental improvements, and the pros/cons of each.
Option 1: Network Redesign
The move to Office 365 and to SaaS in general has prompted network re-architecture in some cases. SD-WANs or Direct Internet Access technology provide remote sites with localized egress to the Internet, reducing the number of hops to access cloud resources. The performance gains come at the cost of capital and time in what is really an edge network reconstruction.
Microsoft complicated the discussion by introducing, then more or less retracting, ExpressRoute as an option. Most network engineers like ExpressRoute as an option because it’s conceptually familiar: a dedicated connection into Microsoft’s high-speed network. But while ExpressRoute is still available for Azure, it’s only available for Office 365 under special circumstances. The performance of the customer’s public Internet connection must be irreparably poor, and an exception must be approved by Microsoft engineering.
Microsoft engineering states that the complexity of ExpressRoute isn’t worth the nominal performance gains, especially when connectivity directly over the public Internet performs as well as or better than a dedicated ExpressRoute MPLS pipe.
How could that possibly be, you might ask? At Ignite 2017, Microsoft explained why. Because there are so many more Office 365 peering points to the Internet than there are peering points to MPLS ExpressRoute providers, statistics showed that traffic is better off traveling over the Internet to a nearby Office 365 peering point than over a longer haul to an ExpressRoute peering point. This is especially true when the MPLS provider or Office 365 data centers have issues: it’s much simpler to reroute traffic to another Office 365 data center using Internet routing than to reroute to alternate ExpressRoute circuits. The goal is to reach the Microsoft backbone as soon as possible, after which traffic flows over a managed, high-speed, low-loss network.
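The distance argument can be made concrete with a back-of-the-envelope calculation. The sketch below assumes light travels through fiber at roughly 200 km per millisecond and ignores queuing and routing delay, so it gives only a lower bound; the two distances are hypothetical, not measurements from any real site.

```python
FIBER_KM_PER_MS = 200.0  # approximate speed of light in fiber (assumption)


def min_rtt_ms(distance_km):
    """Lower bound on round-trip time over a fiber path of this length."""
    return 2 * distance_km / FIBER_KM_PER_MS


# Hypothetical distances from one branch office.
paths = {
    "nearby Internet peering point": 150,
    "ExpressRoute peering location": 1200,
}

for name, km in paths.items():
    print(f"{name}: at least {min_rtt_ms(km):.1f} ms round trip")
```

Real paths add per-hop delay on top of this floor, which is why reaching the Microsoft backbone at the closest possible point matters.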
You can find information about Microsoft's peering points at https://www.peeringdb.com/asn/8075
You can read a somewhat dated but still relevant article about their Network at
and glance at some highlights in the graphic below.
Option 2: Network Optimization
Much of Microsoft’s guidance assumes organizations can access the Internet directly from each remote site via software-defined networks, especially from small remote offices. If that model is not in the near-term strategy, additional steps must be taken to optimize performance, as outlined in the “Incremental Optimization” section at the bottom of this document: https://docs.microsoft.com/en-us/office365/enterprise/office-365-network-connectivity-principles#BKMK_IncOpt
The fundamentals of Office 365 networking are at https://docs.microsoft.com/en-us/office365/enterprise/office-365-network-connectivity-principles?redirectSourcePath=%252fen-us%252farticle%252foffice-365-network-connectivity-principles-76e7f232-917a-4b13-8fe2-4f8dbccfe041#bkmk_principles
Microsoft and Enabling highly recommend managing the real-time traffic of Skype and Teams as tightly as possible within the intranet by employing Quality of Service. The recommendations for setting DiffServ code points (DSCP) or other options for Teams traffic are outlined at https://docs.microsoft.com/en-us/microsoftteams/qos-in-teams.
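To illustrate what a DSCP marking looks like at the packet level, the sketch below requests a marking on a UDP socket. It is only an illustration under stated assumptions: the values 46 (audio), 34 (video), and 18 (application sharing) are the ones commonly cited for Teams and should be checked against the linked guidance, and the helper names are hypothetical. In practice markings are usually applied by policy on endpoints and switches, not by application code, and some operating systems may ignore the `IP_TOS` socket option.

```python
import socket

# DSCP values commonly cited for Teams media (assumption; verify against the docs).
DSCP = {"audio": 46, "video": 34, "sharing": 18}


def dscp_to_tos(dscp):
    """DSCP occupies the upper six bits of the IP TOS / Traffic Class byte."""
    if not 0 <= dscp <= 63:
        raise ValueError("DSCP must be in the range 0-63")
    return dscp << 2


def marked_udp_socket(kind):
    """Create an IPv4 UDP socket that requests the marking for `kind`."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, dscp_to_tos(DSCP[kind]))
    return sock


if __name__ == "__main__":
    for kind, value in DSCP.items():
        print(kind, "DSCP", value, "-> TOS byte", dscp_to_tos(value))
```

A marking only helps if every hop trusts and honors it, which is exactly what was missing in the client example earlier in this article.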
Finally, Microsoft Premier and Enabling both offer network assessment services to help optimize and validate traffic performance prior to a cutover to the cloud. Past assessments have pointed out optimizations (if not problems) with end-to-end and core QoS and with new traffic patterns, which are especially noticeable with the real-time voice/video traffic of Teams/Skype.
Watch https://channel9.msdn.com/Events/Ignite/Microsoft-Ignite-Orlando-2017/BRK3051 from at least 59:45, and https://myignite.techcommunity.microsoft.com/sessions/64276 from 53:00.