Categories
Uncategorized

2020W41

Yet another week of doing lots … but not necessarily the things I was tasked with. Plus it was my turn to be fire-fighter, which always means a lot of ad-hoc distractions.

While I did some coding, a great deal of time was taken up investigating data issues form our customers. The amazing ability of Dremel to query enormous datasets in realtime is a real bonus. But understanding the meaning behind the results can take a lot of time. One customer was complaining about seeing a huge jump in the number of changes to their project at a certain point earlier this year. I was able to track that down to a misconfiguration that they corrected at that time. For a moment, I was worried that I’d lost data for them!

Another customer was investigating how far back in time our dataset went. Looking at some strange results there revealed a bug in how our pipeline considered some jobs to be “in production”. This was simple to fix, but it required a sequence of three chained changes, each depending on deployment of the predecessor.

Probably the most important part of the week was a meeting with our new director to go over the metrics work we do. Lots of probing questions were asked (which is good!) that made us realise we needed to better understand who’s using the metrics we produce, and why they’re helpful. This is something that it’s easy to lose sight of when you’ve been building for a very long time.

Stats: 30 changes submitted, 53 reviewed

Categories
Uncategorized

2021W40

It was quite a busy week, but with not much focus on any one thing.

On the technical front, I deleted two (seemingly) unused fields from a very common configuration protobuf schema. Naturally, I didn’t catch all the field accesses, so there was a round of rollback / rollforwards.

I spent a large amount of time with the rest of my team reviewing old bugs in our backlog. This is something we do periodically, and I find it useful (if tiring). We get review all bugs over six months old in the backlog and see if they’re still applicable. Many times the issue has been superceded and is now obsolete. Sometimes it’s even been fixed!

Stats: 19 changes submitted, 39 reviewed.

Categories
Uncategorized

2021W39

Much of my time this week was taken up with a migration. We have a library internally at Google which does exactly what the new UnixMicro() and UnixMilli() in the time package does (but was available many years prior to 1.17). Initially, I started writing a “tip of the week” article about how they can be used. This quickly morphed into a “migrate everyone to the new functions” project. Thankfully, having everyone in a large monorepo helps enormously. I’ve now submitted 96% of the migration changes a couple of days after starting (using the large-scale-change aka LSC process).

Making the initial change was surprisingly simple, using a combination of codesearch and gofmt -r. I was lucky that I didn’t need anything more involved; however, I see that Eli Bendersky has an excellent introduction, Rewriting Go source code with AST tooling which would cover more advanced needs.

codesearch -l 'unixtime\.(To|From)(Micros|Millis)'
while read filename
do
  gofmt -r 'unixtime.ToMicros(a) -> a.UnixMicro()' "$filename" |
  gofmt -r 'unixtime.FromMicros(a) -> time.UnixMicro(a)' |
  gofmt -r 'unixtime.ToMilli(a) -> a.UnixMilli()' |
  gofmt -r 'unixtime.FromMillis(a) -> time.UnixMilli(a)' |
  goimports > "$filename.tmp" &&
  mv "$filename.tmp" "$filename"
done

I ended up with a change touching around 4100 files. The LSC tools split this up into around 250 changes, and I worked with an approver to get these reviewed and submitted over a couple of days. I can’t thank my approver enough for the diligent review of such tedious minor changes!

There were a few problems…

  • The main one that cropped up was the result of the goimports call above. In some cases, there were multiple files in the package, but the goimports above was only running on a single file, so I saw some unexpected imports for names that were pulled in from a different file in the same package. These were quickly identified and fixed, as they caused test failures!
  • One problem happened twice: a local variable named time shadowed the newly added import of the time package. This was trivially fixable with a local rename. Twice out of 4100 files is OK to do by hand.
  • In a few cases I discovered areas of code that had remained on Go 1.16 (temporarily). These broke very quickly when the tests were executed. I simply reverted these changes and will revisit them in a month or two.

Overall this is a pretty positive experience. I was able to upgrade a large swathe of Google’s code with fairly little fuss. Nobody asked me to do it — and I didn’t need to ask much in the way of permission. The end result is that code inside Google is more similar to code outside.

The main downside is that I should have been working on migrating a metrics pipeline, and instead I got a bit distracted by an interesting problem. Ah well.

Stats: 267 submitted (244 were the LSC described above), 30 reviewed

Categories
Uncategorized

2021W38

I was on vacation. I caught cold, and it was not that great.

I peeked at work on Friday and caught an interesting C++ code review. Changing from this:

template <typename T>
class SomeClass {
  …
  template <typename U>
  Add(U&& item) {
    add(SomeClass(std::forward(item)));
  }
  …
}

To this:

template <typename T>
class SomeClass {
  …
  template <typename U>
  Add(U&& item) {
    add(SomeClass(std::forward<U>(item)));
  }
  …
}

It’s quite subtle, and easy to miss by accident. In this case T=std::string and U=std::string_view. The initial version has a compilation error in this case, “no matching function for call to ‘forward’”. Thankfully, clang also points out that it can’t infer the template, which is a big clue.

My read is that SomeClass(std::string) can be constructed from a std::string_view. But the compiler doesn’t know it has a std::string_view unless we correctly parameterize the call to std::forward.

Categories
Uncategorized

2021W37

Something of a mixed week. I was the oncall person, so got to deal with a bunch of interesting failing things. Most interesting was an alert that one of our databases was approaching 70% of its disk quota. We weren’t in a position to get more, so I ended up designing a solution which would migrate the blobs out of the database. We don’t have time to implement this right now. But we have a reviewed design, and enough time to implement it before it’s a crisis.

The main thing I was supposed to be working on was designing a migration for the metrics system I work on. As happens so often, I ended up going down a rabbit hole trying to understand the source data. Bugs were filed … but not a lot of progress was made.

I filed my first bug against Chrome! The right click menu on a mac has grown a “Exit Full Screen” menu item right where “Back” used to sit. This led to me being frustrated two-to-three times per hour as I spend of my time in full screen mode.

Stats: 14 changes submitted, 39 reviewed.

Categories
Uncategorized

2021W36

Most of this week was follow-on from last week’s attempt to make a data pipeline read from a Spanner database. The configuration turned out to be quite the tricky thing to make work. But at least I solved it for everyone. And the improvements are worthwhile: the pipeline runtime has halved, and we’re now able to read much more recent data, as we’re not relying on exports of database backups. The key part to making this work is that the pipeline is using a new feature to access the underlying files that the database uses, instead of having to make an online query.

Aside from that, one highlight was a pair programming exercise with a colleague in the USA who was interested in learning Go. We managed to contribute an improvement in about an hour, which is great, and he’s now looking at making changes on his own.

I spent a great deal of time with my manager writing up a summary of what I’d done over the last six months. I find it hard to present a coherent, data driven picture of this (despite the effort of writing snippets each week, so I actually have something to base this on…)

Stats: 18 changes submitted, 47 reviewed.

Categories
Uncategorized

2021W35

This week was fairly quiet: Monday was a bank holiday, and it’s perf season inside Google, so everyone was busy writing about themselves and others instead of writing code…

Unfortunately for me personally it was a week of going down rabbit holes & yak shaving.

I spent far, far too long visiting the innards of cmp and protocmp in an effort to modernise an older test. The premise was simple: convert all int64 values in a particular message to 1, so that we don’t care about the values, just the presence or absence of the value. I was trying to use FilterMessage(), but couldn’t make it work. Eventually I ended up with a more brute-force approach: alter the entire result, not just the message I was concerned with.

cmp.Transform("normalize", func(int64) int64 {
  return 1
})

The other yak shaving relates to a data pipeline. I was switching it from reading some files to reading from a Spanner database. The code change was verbose, but fairly simple. Then I tried to run it and discovered the configuration change I needed was anything but. I ended up with some configuration of command line flags … in a proto message … wrapped in another command flag … embedded in some GCL … in turn embedded in another shell script. Not only was the quoting horrific, but I couldn’t make the GCL do what I needed. I ended up hacking the interpreter to emit errors from an eval() statement which are normally thrown away. This is one of the benefits of having everything in a monorepo, but I really didn’t need to spend hours on this.

Stats: 9 changes submitted, 47 reviewed.

Categories
Uncategorized

Weeknotes

It’s been a while since I’ve written anything… I’ve recently noticed Simon Willison’s weeknotes summarising what’s happened in the last week. I’ve done this internally since starting Google, and I now have a large backlog of snippets (as they’re known in there). I really value this for a few reasons.

  • It’s great to write each week and have a fixed point of review, while the all the context is still in your head. More often than not, I’m able to look back and realise that I’ve achieved more than I thought I had!
  • It’s a great historical record. Being able to go back in time is invaluable at perf time, but also in the longer term to see how my career has developed.
  • Reading other people’s snippets is an incredibly pleasurable start to the week. Invariably, I find a few interesting strands of work that I would not otherwise have known about (I love reading the Go team’s snippets to see how generics is progressing!)

While I can’t write exactly the same snippets externally, I’m going to see if I can pick out something useful or interesting I’ve done each week.

2021W34

A large part of last week was playing catchup with discussions from the previous week, as I had a few days out at Beautiful Days, which was amazing (The Levellers! Hawkwind! Dreadzone!). However, two things stood out:

  1. I’d made a small change to how one of our pipelines reads data from Bigtable before leaving. A single flag-flip. Unfortunately, I’d neglected to remember the ACL change that went alongside this, causing my colleagues to waste time debugging (it was missed in code review too … but while code review is nice it’s never perfect).

    And even when the ACL was fixed, the change caused the pipeline to slow from minutes to hours, so the change had to be rolled back anyway. 🤦‍♀️
  2. In another pipeline, I’d rolled out a change to the output format. I enabled in qa a couple of weeks back, and all seemed to be working fine. So I’d enabled it in production before I left. Unfortunately some downstream pipelines started reporting “no data” a day or so later while I was out. Thankfully, my change was quickly identified and rolled back. But again, I caused my colleagues to spend too long debugging my problems.

So: in conclusion, insufficient carefulness on my behalf. Both of these issues could have been resolved better if I’d taken a bit longer to sit down and think through the consequences of what I was working on.

Other things I touched on:

  • Centralised some RPC clients for services which our team runs so that we can changes to a single place instead of contacting affected teams
  • Investigated memory spike protection in some of our services. This was fun, and led me to conclude that I didn’t understand the tooling well enough to apply it. I need to write a reproducer first before coming back to this.

Stats: 15 changes submitted, 42 reviewed.