It’s been a while since I’ve written anything… I’ve recently noticed Simon Willison’s weeknotes summarising what’s happened in the last week. I’ve done this internally since starting at Google, and I now have a large backlog of snippets (as they’re known there). I really value this for a few reasons.
- It’s great to write each week and have a fixed point of review, while all the context is still in your head. More often than not, I’m able to look back and realise that I’ve achieved more than I thought I had!
- It’s a great historical record. Being able to go back in time is invaluable at perf time, but also in the longer term to see how my career has developed.
- Reading other people’s snippets is an incredibly pleasurable start to the week. Invariably, I find a few interesting strands of work that I would not otherwise have known about (I love reading the Go team’s snippets to see how generics is progressing!).
While I can’t write exactly the same snippets externally, I’m going to see if I can pick out something useful or interesting I’ve done each week.
2021W34
A large part of last week was playing catch-up with discussions from the previous week, as I had a few days out at Beautiful Days, which was amazing (The Levellers! Hawkwind! Dreadzone!). However, two things stood out:
- I’d made a small change to how one of our pipelines reads data from Bigtable before leaving. A single flag-flip. Unfortunately, I’d forgotten the ACL change that needed to go alongside it, causing my colleagues to waste time debugging (it was missed in code review too … but while code review is nice, it’s never perfect).
  And even when the ACL was fixed, the change caused the pipeline to slow from minutes to hours, so the change had to be rolled back anyway. 🤦‍♀️
- In another pipeline, I’d rolled out a change to the output format. I’d enabled it in QA a couple of weeks back, and all seemed to be working fine, so I enabled it in production before I left. Unfortunately, some downstream pipelines started reporting “no data” a day or so later while I was out. Thankfully, my change was quickly identified and rolled back. But again, I caused my colleagues to spend too long debugging my problems.
So, in conclusion: insufficient care on my part. Both of these issues could have been avoided, or at least handled better, if I’d taken a bit longer to sit down and think through the consequences of what I was working on.
Other things I touched on:
- Centralised some RPC clients for services which our team runs, so that we can make changes in a single place instead of contacting affected teams.
- Investigated memory spike protection in some of our services. This was fun, and led me to conclude that I didn’t understand the tooling well enough to apply it. I need to write a reproducer first before coming back to this.
Stats: 15 changes submitted, 42 reviewed.