Google Workspace Status Dashboard
Incident affecting Google Calendar
Incident began at 2021-11-12 08:30 and ended at 2021-11-12 10:26 (times are in Coordinated Universal Time (UTC)).
| Date | Time | Description |
|---|---|---|
| Dec 2, 2021 | 11:36 PM UTC | INCIDENT REPORT. DATE/TIME OF THE ISSUE (US/Pacific time): Friday, 12 November 2021 00:30 - Friday, 12 November 2021 02:26. Duration: 1 hour, 56 minutes. Summary: On November 12, 2021, the Google Cloud Load Balancing (GCLB) service experienced failures resulting in impact to several downstream Google Cloud services in Europe for a duration of 1 hour, 56 minutes. We understand that this issue has impacted our valued customers and users, and we apologize to those who were affected. Background: Google Cloud Load Balancing is a collection of software and services that load balance traffic across Google properties. There are two main components: a control plane and a data plane. The control plane programs the data plane with instructions on how to handle requests. A key component of the data plane is the Google Front End (GFE). The GFE is an HTTP/TCP reverse proxy, which is used to serve requests to Google properties including Search, Ads, Workspace (Gmail, Chat, Meet, Docs, Drive, etc.), Cloud External HTTP(S) Load Balancing, Proxy/SSL Load Balancing, and many Cloud APIs. Updates are regularly rolled out to GFEs, typically via configuration flags, starting with canary GFEs and gradually expanding to production globally. GFEs support and terminate QUIC [1] connections before connecting to downstream backend services. QUIC is a general-purpose transport layer network protocol. Upon first connection, QUIC servers supply a source address token, which the client presents when resuming a future connection to prove that it has previously used a given address. Root Cause: On Friday, 12 November at 00:27, a configuration change modifying the format of the source address token provided to QUIC clients was rolled out to a small set of GFEs. This change resulted in a misconfigured token that could crash GFEs that had not yet received this update. 
Shortly thereafter, the monitoring service automatically detected a problem with GFEs using this flag and rolled back the change within four minutes. However, clients that had connected to a GFE with the updated configuration during that period received a misconfigured token, which they subsequently presented to other GFEs during reconnection. So despite the rollback, impact remained until additional mitigations were put in place. Remediation and Prevention: Google engineers were alerted to the issue via automated alerting on Friday, 12 November 2021, at 00:30 US/Pacific and immediately started an investigation. At 00:31, the configuration change was automatically rolled back. However, by 00:42, it was clear the impact remained widespread, and our engineering team continued further investigation. Mitigation began at 01:38, when traffic was redirected away from the impacted GFEs. At 02:12, a flag change was pushed to temporarily disable QUIC support on GFEs, which mitigated all impact by 02:26. In order to prevent this type of outage from happening again, we are pursuing the following: We want to apologize for the length and severity of this incident. We are taking immediate steps to prevent recurrence and improve reliability in the future. If your service or application was affected, we apologize. This is not the level of quality and reliability we strive to offer you, and we are taking immediate steps to improve the platform’s performance and availability. Detailed Description of Impact: On Friday, 12 November 2021, at 00:30 US/Pacific, the GCLB service experienced failures resulting in impact to several downstream Google Cloud services for 1 hour, 56 minutes. Some customers in Europe were unable to access web and mobile clients for services including Gmail, Groups, Calendar, Tasks, and Chat. Google Gmail: Affected customers were unable to access web and mobile clients. 
This resulted in a ~2% traffic drop for Gmail services and mostly affected customers in Europe. The period of impact was between 00:30 and 02:53. Google Groups: Affected customers in Europe were unable to access web and mobile clients. The period of impact was between 01:28 and 03:06, during which time affected customers in Europe had issues loading the Groups UI. Google Tasks: Google Tasks experienced error rates of up to ~0.2% in Europe. Affected customers were unable to access web and mobile clients. The period of impact was between 00:30 and 02:10. Google Calendar: Google Calendar experienced error rates of up to ~0.5% in Europe. Affected customers were unable to access web and mobile clients. The period of impact was between 00:30 and 02:10. Google Chat: 14.5% of Chat users could not connect, which impaired functionality in their clients. This mostly affected European users, on both web and mobile. The period of impact was between 00:30 and 02:20. Appendix: [1] - https://cloud.google.com/blog/products/gcp/introducing-quic-support-https-load-balancing |
| Nov 12, 2021 | 11:05 AM UTC | The problem with Google Calendar has been resolved. We apologize for the inconvenience and thank you for your patience and continued support. |
| Nov 12, 2021 | 10:40 AM UTC | Our team is continuing to investigate this issue. We will provide an update by Nov 12, 2021, 11:00 AM UTC with more information about this problem. Thank you for your patience. The affected users are unable to access Google Calendar. Some users in Europe may be experiencing issues when attempting to access services. |
| Nov 12, 2021 | 9:35 AM UTC | We're investigating reports of an issue with Google Calendar. We will provide more information shortly. The affected users are unable to access Google Calendar. We are investigating an issue that is affecting the ability of some users in Europe to access some services. |
- Times are listed in Coordinated Universal Time (UTC)
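
The incident report gives its timeline in US/Pacific time, while the dashboard entries use UTC. The following is a minimal sketch of the conversion for readers correlating the two with their own logs. It assumes US/Pacific was on standard time (PST, UTC-8) on 12 November 2021; the names `PST` and `to_utc` are illustrative and do not come from the report.

```python
from datetime import datetime, timedelta, timezone

# Assumption: US/Pacific was on standard time (PST, UTC-8) on 12 November 2021.
PST = timezone(timedelta(hours=-8), "PST")


def to_utc(local: datetime) -> datetime:
    """Convert a timezone-aware datetime to UTC."""
    return local.astimezone(timezone.utc)


# Start and end of the incident as stated in the report (US/Pacific).
start_pacific = datetime(2021, 11, 12, 0, 30, tzinfo=PST)
end_pacific = datetime(2021, 11, 12, 2, 26, tzinfo=PST)

start_utc = to_utc(start_pacific)
end_utc = to_utc(end_pacific)
duration = end_utc - start_utc

print(start_utc.strftime("%Y-%m-%d %H:%M"))  # 2021-11-12 08:30
print(end_utc.strftime("%Y-%m-%d %H:%M"))    # 2021-11-12 10:26
print(duration)                              # 1:56:00
```

The results match the UTC window stated at the top of this page (08:30 to 10:26) and the reported duration of 1 hour, 56 minutes.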