Cyber Security
Microsoft Suffers Global Outage Again, Was Down For 8 Hours
By TechDogs Bureau
Updated on Thu, Aug 1, 2024
On July 19, 2024, Microsoft found itself accused of bringing the world to a standstill, in what was soon dubbed “the largest IT outage of all time”.
However, it was soon revealed that the primary culprit was a problematic file in a faulty update from cybersecurity firm CrowdStrike.
Even so, the outage hit over 8.5 million devices with Microsoft’s “blue screen of death”, and estimates put the total direct financial loss for US Fortune 500 companies alone at $5.4 billion.
The outage disrupted businesses, airports, train stations, banks, telecom providers, media outlets, healthcare and more across the world.
With over 2,200 flights canceled and around 7,000 delayed, travelers were left stranded, emergency services were halted and a large number of business operations were affected.
Separately, on the same day, Microsoft suffered a major Microsoft 365 outage that affected customers in the Central US region and was caused by an Azure configuration change.
Now, Microsoft finds itself at the center of another global IT outage.
So, what caused this global outage and how did Microsoft respond? Let’s explore!
What Caused Microsoft’s Global Outage?
- Microsoft offered clarity on its latest service outage in a Post Incident Review (PIR) report published on the Microsoft Azure website.
- In its Mitigation Statement for the incident, Microsoft revealed that some of its customers may have had issues connecting to its services globally.
- Impacted services included Azure App Services, Application Insights, Azure IoT Central, Azure Log Search Alerts and Azure Policy, along with the Azure portal itself and some Microsoft 365 apps and Microsoft Purview services.
- The outage was initially triggered by a DDoS (Distributed Denial-of-Service) attack.
- DDoS attacks aim to overwhelm a business’s network servers by flooding them with excessive requests that consume their capacity, to the point where they can’t respond to legitimate user requests.
- While Microsoft’s DDoS protection mechanisms were successfully activated, an error in the implementation and configuration of its defenses amplified the attack’s impact rather than mitigating it.
- “An unexpected usage spike resulted in Azure Front Door (AFD) and Azure Content Delivery Network (CDN) components performing below acceptable thresholds, leading to intermittent errors, timeout and latency spikes,” read an excerpt from its Mitigation Statement (Tracking ID: KTY1-HW8).
- This resulted in an outage of roughly 8 hours, from around 11:45 UTC until 19:43 UTC, when failure rates returned to pre-incident levels.
- Microsoft declared the incident mitigated at 20:48 UTC, although some downstream services took longer to recover, depending on their AFD and/or CDN configurations.
- Microsoft also acknowledged the incident and shared its response through its X handle, saying, “Thank you for your patience. We are currently investigating an issue impacting Azure Services. Our teams are actively working to resolve this as early as possible.”
- Outage reports came in from around the globe, with numerous organizations and individuals affected, including the UK government’s HM Courts and Tribunals Service, Minecraft players and more.
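To see why a flood of requests can starve legitimate users, here is a minimal toy model in Python. It is only a sketch under stated assumptions: the function name `served_legitimate`, the capacity figure and the traffic numbers are all hypothetical, and the "serve each group in proportion to its share of traffic" rule is a simplification for illustration, not a description of how Azure Front Door or any real DDoS defense behaves.

```python
def served_legitimate(capacity, legit_requests, attack_requests):
    """Return how many legitimate requests get answered in one interval.

    Toy assumption: the server can answer at most `capacity` requests per
    interval and cannot tell legitimate traffic from attack traffic, so when
    it is overloaded each group is served in proportion to its share.
    """
    total = legit_requests + attack_requests
    if total <= capacity:
        return legit_requests  # no overload, every legitimate request is served
    return capacity * legit_requests // total


if __name__ == "__main__":
    capacity = 1_000  # hypothetical requests per second the server can handle
    legit = 800       # hypothetical legitimate requests per second
    for attack in (0, 1_000, 10_000, 100_000):
        ok = served_legitimate(capacity, legit, attack)
        print(f"attack={attack:>7}/s -> legitimate served: {ok}/{legit}")
```

With these made-up numbers, the legitimate requests served per second fall from 800 to 444, then 74, then 7 as the attack volume grows, which is the "can’t respond to legitimate user requests" effect in miniature.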
What Did Microsoft Say About The Incident?
- Through the Mitigation Statement, Microsoft said, “Customer impact began at 11:45 UTC and we started investigating. Once the nature of the usage spike was understood, we implemented networking configuration changes to support our DDoS protection efforts, and performed failovers to alternate networking paths to provide relief.”
- “Our initial network configuration changes successfully mitigated majority of the impact by 14:10 UTC. Some customers reported less than 100% availability, which we began mitigating at around 18:00 UTC.”
- “We proceeded with an updated mitigation approach, first rolling this out across regions in Asia Pacific and Europe. After validating that this revised approach successfully eliminated the side effect impacts of the initial mitigation, we rolled it out to regions in the Americas.”
- “Our team will be completing an internal retrospective to understand the incident in more detail. We will publish a Preliminary Post Incident Review (PIR) within approximately 72 hours, to share more details on what happened and how we responded.”
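The timestamps quoted in the statement can be lined up to see how long each phase lasted. The short Python sketch below does that arithmetic. The helper names `minutes` and `elapsed` are made up for illustration, the event labels are paraphrases of the article, and the code assumes all times fall on the same UTC day.

```python
def minutes(hhmm):
    """Convert an 'HH:MM' string to minutes since midnight."""
    hours, mins = hhmm.split(":")
    return int(hours) * 60 + int(mins)


def elapsed(start, end):
    """Return (hours, minutes) between two same-day 'HH:MM' times."""
    return divmod(minutes(end) - minutes(start), 60)


# UTC times as reported in the article; labels are paraphrased
TIMELINE = [
    ("11:45", "customer impact began"),
    ("14:10", "majority of the impact mitigated"),
    ("18:00", "mitigation of remaining availability issues began"),
    ("19:43", "failure rates back to pre-incident levels"),
    ("20:48", "incident declared mitigated"),
]

if __name__ == "__main__":
    start = TIMELINE[0][0]
    for time, event in TIMELINE[1:]:
        h, m = elapsed(start, time)
        print(f"{time} UTC (+{h}h {m:02d}m): {event}")
```

Run as a script, this shows that failure rates normalized 7 hours 58 minutes after impact began (the "8-hour" figure), while the formal mitigation declaration came 9 hours 3 minutes in.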
In a previous news article, we raised a question: “A major concern that comes up [in relation to the CrowdStrike incident] is how a simple software application file could lead to worldwide standstill, what happens if bad actors intentionally set loose malicious software?”
While the DDoS attack on Microsoft was ultimately thwarted by the company, misconfigurations in its own defense systems amplified the impact, disrupting Microsoft’s services and causing problems for businesses and individuals.
Keeping this in mind, do you think tech companies should employ stricter and more robust testing of their security measures to protect against cyber threats?
Let us know in the comments below!
First published on Thu, Aug 1, 2024