Presented by

  • Julien Goodwin

    @LapTop006

    Julien is a Senior Site Reliability Engineer at Google Sydney. From 2011 to 2018 he worked on Google's production networks, focusing on Internet routing & interconnection. When not at work he does things like designing custom embedded Linux machines & modernising frequency distribution systems. He was also the 2019 & 2020 Secretary of Linux Australia, the parent organisation for linux.conf.au, and was part of the LCA 2008 team.

Abstract

In 2017 I came within one keypress of causing Google's main backbone to largely fall off the Internet. This is the story of how we used that incident as a learning opportunity, how a lack of buy-in hindered further improvements, and how an existing toolkit of Python libraries allowed testing and validation tools to be built quickly, preventing any chance of a recurrence.