The worst outage I never caused
Blemings Labs | Sat 23 Jan 5:05 p.m.–5:25 p.m.
Presented by
-
Julien is a Senior Site Reliability Engineer at Google Sydney. From 2011 to 2018 he worked on Google's production networks, focusing on Internet routing & interconnection. When not at work he does things like designing custom embedded Linux machines & modernising frequency distribution systems.
He was also the 2019 & 2020 Secretary of Linux Australia, the parent organisation for linux.conf.au, and was part of the LCA 2008 team.
Abstract
In 2017 I came within one keypress of causing Google's main backbone to largely fall off the Internet. This is the story of how we used that incident as a learning opportunity, how a lack of buy-in hindered further improvements, and how an existing toolkit of Python libraries allowed testing and validation tools to be quickly built, preventing any chance of a recurrence.