External Technical Root Cause Analysis: Microsoft Azure Front Door Outage Impact on KnowBe4 Products
October 29, 2025
This report details the findings and mitigations related to a global Microsoft Azure Front Door (AFD) outage that occurred between 15:45 UTC on October 29 and 00:05 UTC on October 30, 2025.
The incident disrupted multiple Microsoft services, including Microsoft Entra ID (formerly Azure Active Directory), the Azure Portal, and Office 365 add-in infrastructure. This in turn directly affected several KnowBe4 products that depend on these services for routing, authentication, and Office add-in initialization.
While user interfaces and administrative consoles experienced intermittent unavailability, all backend mail-processing and delivery systems within KnowBe4 remained operational and continued to function as designed.
The outage originated entirely within Microsoft’s Azure Front Door global network, following an inadvertent tenant-configuration deployment. The faulty configuration propagated inconsistently across AFD nodes worldwide, causing widespread routing errors and service timeouts until Microsoft rolled back to a last-known-good state.
Azure Front Door (AFD): Microsoft’s global, scalable content delivery network (CDN) and web application firewall (WAF) service used to route internet traffic securely and efficiently to Azure-hosted applications.
Azure Front Door Configuration: A set of routing and security rules that define how traffic flows through Azure Front Door to customer applications. A configuration error or invalid deployment can cause global routing disruptions.
Azure Active Directory / Microsoft Entra ID: Microsoft’s identity and access management service that provides user authentication, single sign-on, and authorization across Azure and Microsoft 365 applications.
Office Web Add-in: A lightweight web application that extends Microsoft Outlook, Excel, Word, and other Office products. KnowBe4’s add-ins use this framework to provide in-app phishing-reporting and email-classification functionality.
appsforoffice.microsoft.com: A Microsoft-controlled endpoint that serves the Office.js library, which is required to initialize all Office web add-ins. If unavailable, no add-in code can execute.
Office.js: The client-side JavaScript framework that provides APIs for Office web add-ins. It must load successfully from Microsoft’s servers before an add-in can function.
Last Known Good Configuration: The most recent stable configuration version used by Microsoft to restore service after an invalid deployment or system failure.
Web Application Firewall (WAF): A network security layer that filters and monitors HTTP(S) traffic between a web application and the Internet, protecting against common web exploits and attacks.
Mailflow: The process by which emails are received, analyzed, classified, and delivered within KnowBe4’s mail-security products. This component was unaffected during the Azure outage.
Magic Link: A secure, single-use authentication link that allows KnowBe4 administrators or new customers to access a portal or begin deployment without entering credentials.
Telemetry: Automated data collection used for monitoring system performance and availability, enabling detection of latency, errors, or external dependency issues.
Fail-Open: A system design approach in which, during an outage or dependency failure, limited functionality remains available instead of the service fully blocking operations.
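To make the Fail-Open definition concrete, here is a minimal Python sketch of the pattern: a call to an external dependency is wrapped so that a reduced-functionality result is returned when the dependency fails. The names call_with_fail_open, load_full_ui, and load_basic_ui are hypothetical and are not taken from any KnowBe4 or Microsoft product.

```python
from typing import Callable, Tuple, TypeVar

T = TypeVar("T")


def call_with_fail_open(primary: Callable[[], T], fallback: Callable[[], T]) -> Tuple[T, bool]:
    """Run `primary`; if it raises, return the fallback result instead.

    Returns (result, degraded), where `degraded` is True when the
    fallback path was used.
    """
    try:
        return primary(), False
    except Exception:
        # Dependency failed: keep limited functionality available
        # rather than blocking the operation entirely.
        return fallback(), True


def load_full_ui() -> str:
    # Hypothetical: stands in for a call that needs an external service.
    raise TimeoutError("external dependency timed out")


def load_basic_ui() -> str:
    # Hypothetical: reduced functionality with no external dependency.
    return "basic-ui"


result, degraded = call_with_fail_open(load_full_ui, load_basic_ui)
print(result, degraded)
```

Returning the degraded flag alongside the result lets the caller tell the user that functionality is limited, which is usually preferable to failing silently.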
At 15:45 UTC on October 29, 2025, Microsoft Azure Front Door (AFD) experienced a global service disruption caused by an inadvertent tenant configuration deployment that introduced an invalid state across the AFD network.
The invalid configuration caused a significant number of nodes to fail to load properly, resulting in widespread latencies, timeouts, and connection errors for both Microsoft services and customer applications.
As unhealthy nodes were dropped from service, traffic was automatically rerouted to the remaining healthy nodes, creating further imbalances and intermittent regional outages.
Microsoft responded by blocking all configuration changes, deploying a “last known good” configuration, and manually recovering the affected nodes.
Full mitigation was confirmed by Microsoft at 00:05 UTC on October 30, 2025, following the gradual rebalancing of global traffic.
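The rerouting effect described above can be illustrated with simple arithmetic. The sketch below assumes that traffic from failed nodes is spread evenly across the surviving nodes, which is a simplification. The node counts and utilization figures are purely illustrative and are not Microsoft data.

```python
def load_per_healthy_node(total_nodes: int, failed_nodes: int, utilization: float) -> float:
    """Utilization of each surviving node after the failed nodes' traffic
    is spread evenly across the survivors (a simplifying assumption)."""
    healthy = total_nodes - failed_nodes
    if healthy <= 0:
        raise ValueError("no healthy nodes remain")
    return utilization * total_nodes / healthy


# Illustrative numbers only: 100 nodes, each starting at 50% utilization.
for failed in (0, 20, 40, 60):
    load = load_per_healthy_node(total_nodes=100, failed_nodes=failed, utilization=0.5)
    print(f"{failed} failed -> {load:.2f} utilization per healthy node")
```

In this simplified model, per-node load grows faster than linearly as more nodes drop out, which is consistent with rerouting creating further imbalances rather than absorbing the failure.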
During the outage window, dependent services such as Microsoft Entra ID, Office 365 add-in frameworks (appsforoffice.microsoft.com), and the Azure Portal were also impacted. These dependencies are critical to KnowBe4’s authentication, routing, and Office-based integrations, and their unavailability directly contributed to product-level interruptions.
The Azure Front Door outage was caused by an inadvertent tenant configuration change within AFD that introduced an invalid state to AFD nodes globally.
A software defect in Microsoft’s deployment validation process allowed the faulty configuration to bypass safety checks and propagate to production, causing nodes to fail to load properly and drop from the network.
As nodes failed, traffic was rerouted to remaining healthy regions, overloading them and amplifying global latency and timeout rates. Microsoft blocked further configuration changes, rolled back to the previous known-good state, and manually rebalanced traffic to restore stability.
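The general pattern behind a validation gate, a change freeze, and a rollback to a last-known-good configuration can be sketched as follows. This is a hypothetical illustration in Python, not a description of Microsoft's deployment system. ConfigStore, has_route, and the "route" key are invented for the example.

```python
from typing import Callable, Dict

Config = Dict[str, str]


class ConfigStore:
    """Hypothetical store that only promotes validated configs and
    remembers the previous one for rollback."""

    def __init__(self, validate: Callable[[Config], bool], initial: Config) -> None:
        if not validate(initial):
            raise ValueError("initial config is invalid")
        self._validate = validate
        self.active: Config = initial
        self.last_known_good: Config = initial
        self.frozen = False

    def deploy(self, candidate: Config) -> bool:
        """Promote `candidate` unless changes are frozen or validation fails."""
        if self.frozen or not self._validate(candidate):
            return False
        self.last_known_good = self.active
        self.active = candidate
        return True

    def rollback(self) -> None:
        """Freeze further changes and restore the last known good config."""
        self.frozen = True
        self.active = self.last_known_good


def has_route(cfg: Config) -> bool:
    # Hypothetical validation rule: a config must define a route.
    return bool(cfg.get("route"))


store = ConfigStore(has_route, {"route": "origin-a"})
rejected = store.deploy({})                     # fails validation
accepted = store.deploy({"route": "origin-b"})  # passes validation
store.rollback()                                # restore the previous config
print(rejected, accepted, store.active, store.frozen)
```

The sketch also shows the limit of such a gate: a configuration that passes validation but misbehaves in production is still promoted, which is why a retained previous version and a change freeze are needed as a second line of defense.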
Because some of KnowBe4’s services depend on Azure Front Door for routing and on Microsoft’s Office 365 endpoints (e.g., appsforoffice.microsoft.com) for add-in initialization, no failover mechanism could circumvent these Microsoft-controlled dependencies during the event.
Impact was global for systems using these foundational services and followed Microsoft’s global impact window (15:45 UTC, Oct 29 to 00:05 UTC, Oct 30):
Prevent & Protect: Outlook web add-ins failed to load; email send operations hung with error messages. Gateway Moderation unaffected.
Defend: Admin console login unavailable due to Front Door authentication dependency; email processing unaffected.
PAB – Hybrid & MSR: Add-in failed to initialize via appsforoffice.microsoft.com; Gmail extension unaffected.
KSC: Reporting UI and login unavailable; no mailflow impact.
Webforms: External forms temporarily inaccessible as certificates and routing were managed via Front Door.
KnowBe4 Deployment Center: Unable to request new magic links or access onboarding UI; some in-progress deployments failed but were recoverable after service restoration.
Workspace: No confirmed customer impact; active sessions remained stable.
No customer data loss occurred, and normal operation resumed once Microsoft completed its rollback.
Oct 29 15:45 - Microsoft Azure Front Door Impact Begins: Global service disruption begins after an inadvertent tenant configuration change within Azure Front Door (AFD). Customers and Microsoft services begin to experience latency, timeouts, and connectivity errors.
16:04 - Microsoft Investigation Initiated: Azure monitoring systems trigger alerts. Microsoft engineers begin reviewing recent AFD configuration changes.
16:15 - Root-Cause Isolation Begins: Microsoft identifies the likely source as a tenant configuration deployment that entered an invalid state across global nodes.
16:18 - First Public Communication Posted by Microsoft: Initial Microsoft status update published to Azure Status page. Internal KnowBe4 monitoring detects elevated timeouts across Office web add-ins and administrative consoles.
16:20 - Targeted Notifications Issued: Microsoft sends targeted impact notifications through Azure Service Health. KnowBe4 teams begin correlation of customer reports with AFD dependency failures.
16:22 - KnowBe4 Publishes StatusPage Alerts Acknowledging Outage: “We are investigating elevated error rates and loading issues across multiple products. Initial investigations point to a widespread Azure incident. More info will be added after our engineering teams are able to assess the scope of the issues.” (Status Page)
17:26 - Azure Portal Fails Away from Front Door: Microsoft routes Azure Portal traffic off AFD as part of mitigation. KnowBe4 observes continuing errors across products using AFD endpoints.
17:30 - Configuration Changes Blocked Globally: Microsoft freezes all customer and internal configuration updates to stop further propagation of the faulty state.
17:40 - ‘Last Known Good’ Configuration Deployed: Microsoft initiates rollout of the most recent validated configuration across AFD infrastructure. KnowBe4 confirms stabilization in limited regions.
18:30 - Global Configuration Push Begins: Fixed configuration distributed worldwide. Gradual traffic recovery observed. KnowBe4 services begin to auto-recover region by region.
18:45 - Manual Node Recovery and Rebalancing: Microsoft engineers start manually recovering affected AFD nodes and gradually restore routing to healthy nodes.
19:00 - KnowBe4 Internal Status Update: Engineering publishes initial internal assessment: backend mailflow unaffected, user-facing add-ins and consoles unavailable.
20:30 - Partial Recovery Confirmed: Microsoft reports substantial improvement in latency and error rates. KnowBe4 verifies restored access for portions of US and EU customer traffic.
23:15 - Downstream Dependencies Stabilize: Microsoft confirms mitigation for PowerApps and related services. KnowBe4 validates near-normal operation across most regions.
Oct 30 00:05 - Global Mitigation Complete: Microsoft confirms full restoration of Azure Front Door and Office 365 add-in endpoints. KnowBe4 services fully recovered with no remaining customer impact.
Finding: All impacted KnowBe4 applications use Azure Front Door for secure global routing, TLS termination, and authentication relay. Mitigations:
Finding: Office web add-ins are hard-coded to initialize through appsforoffice.microsoft.com before executing customer code. Mitigations:
Finding: The outage was initiated within Microsoft’s control plane and communicated via Azure Status and Service Health dashboards. Mitigations:
Between 15:45 UTC on October 29 and 00:05 UTC on October 30, 2025, customers experienced periodic errors, timeouts, and login failures across Microsoft-integrated KnowBe4 interfaces.
Administrative consoles, add-ins, and forms were unavailable for segments of this window, depending on regional propagation of the faulty AFD configuration.
Backend systems—including Prevent, Protect, Defend, and PhishER pipelines—continued to deliver mail security and training events without interruption.
Following Microsoft’s rollback and global node rebalancing, service availability gradually returned to normal with no customer action required.
To accelerate detection and correlation of third-party dependency failures, new synthetic monitoring tests have been added to continuously validate access to appsforoffice.microsoft.com, Azure Front Door endpoints, and related Microsoft authentication paths.
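A synthetic check of this kind can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not KnowBe4's actual monitoring code. The probe and http_status functions and the ProbeResult type are hypothetical. The Office.js URL shown is the commonly documented CDN path and is an assumption here, since the report names only the host. The demo uses a stub fetcher so that it runs without network access.

```python
import time
import urllib.request
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class ProbeResult:
    url: str
    ok: bool
    status: Optional[int]
    elapsed_s: float
    error: Optional[str] = None


def http_status(url: str, timeout_s: float) -> int:
    """Fetch `url` and return the HTTP status code (raises on failure)."""
    with urllib.request.urlopen(url, timeout=timeout_s) as resp:
        return resp.status


def probe(
    url: str,
    timeout_s: float = 5.0,
    fetch: Callable[[str, float], int] = http_status,
) -> ProbeResult:
    """Run one synthetic check and record status, latency, and any error."""
    start = time.monotonic()
    try:
        status = fetch(url, timeout_s)
        ok = 200 <= status < 300
        return ProbeResult(url, ok, status, time.monotonic() - start)
    except Exception as exc:
        # Timeouts, DNS failures, and HTTP errors all count as a failed probe.
        return ProbeResult(url, False, None, time.monotonic() - start, str(exc))


def always_timeout(url: str, timeout_s: float) -> int:
    # Stub fetcher that simulates the dependency being unreachable.
    raise TimeoutError("simulated timeout")


# Assumed URL: the report names only the host appsforoffice.microsoft.com.
OFFICE_JS_URL = "https://appsforoffice.microsoft.com/lib/1/hosted/office.js"
result = probe(OFFICE_JS_URL, fetch=always_timeout)
print(result.ok, result.error)
```

Injecting the fetch function keeps the probe logic testable offline. In a real deployment the default fetcher would be used and the probe would run on a schedule from several regions, so that a regional failure can be told apart from a global one.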
Although Office web add-ins cannot currently bypass Microsoft’s initialization endpoint, KnowBe4 is:
KnowBe4 has updated its incident-response and communication templates to provide faster, clearer updates when external infrastructure failures occur.
This incident was caused by a Microsoft-initiated tenant configuration error within Azure Front Door, which propagated globally and temporarily disrupted routing and authentication services for Microsoft and dependent applications.
KnowBe4’s platform maintained operational integrity and security throughout the event; however, user-facing Microsoft-dependent interfaces were interrupted until Azure Front Door was fully restored.
KnowBe4 continues to work with Microsoft to understand root-cause remediation and to evaluate resilience enhancements for critical dependencies. We remain committed to transparency, reliability, and continuous improvement of our service availability for all customers.