
2020W41

Yet another week of doing lots … but not necessarily the things I was tasked with. Plus it was my turn to be fire-fighter, which always means a lot of ad-hoc distractions.

While I did some coding, a great deal of time was taken up investigating data issues from our customers. The amazing ability of Dremel to query enormous datasets in real time is a huge help, but understanding the meaning behind the results can take a lot of time. One customer was complaining about a huge jump in the number of changes to their project at a certain point earlier this year. I was able to track that down to a misconfiguration that they corrected at that time. For a moment, I was worried that I’d lost data for them!

Another customer was investigating how far back in time our dataset went. Looking at some strange results there revealed a bug in how our pipeline decided that some jobs were “in production”. This was simple to fix, but it required a sequence of three chained changes, each depending on the deployment of its predecessor.

Probably the most important part of the week was a meeting with our new director to go over the metrics work we do. Lots of probing questions were asked (which is good!) that made us realise we needed to better understand who’s using the metrics we produce, and why they’re helpful. This is something that it’s easy to lose sight of when you’ve been building for a very long time.

Stats: 30 changes submitted, 53 reviewed


2021W40

It was quite a busy week, but with not much focus on any one thing.

On the technical front, I deleted two (seemingly) unused fields from a very common configuration protobuf schema. Naturally, I didn’t catch all the field accesses, so there was a round of rollback / rollforwards.
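
For context, the usual safety net when deleting proto fields is to reserve the freed numbers and names so a later change can’t silently reuse them. A minimal sketch, with an invented message and field names rather than the real schema:

syntax = "proto3";

package example;

// Hypothetical stand-in for the configuration message. The two deleted
// fields used numbers 2 and 3; reserving them (and their names) means a
// later change can't quietly reuse them with a different meaning.
message JobConfig {
  reserved 2, 3;
  reserved "legacy_priority", "legacy_dry_run";

  string name = 1;
}

Reserving doesn’t help with code that still reads the fields, of course, which is where the rollbacks came from.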

I spent a large amount of time with the rest of my team reviewing old bugs in our backlog. This is something we do periodically, and I find it useful (if tiring). We review every bug in the backlog that’s more than six months old and check whether it still applies. Often the issue has been superseded and is now obsolete. Sometimes it’s even been fixed!

Stats: 19 changes submitted, 39 reviewed.


2021W39

Much of my time this week was taken up with a migration. We have a library internally at Google which does exactly what the new UnixMicro() and UnixMilli() functions in the time package do (but it was available many years before Go 1.17). Initially, I started writing a “tip of the week” article about how they can be used. This quickly morphed into a “migrate everyone to the new functions” project. Thankfully, having everyone in a large monorepo helps enormously. I’ve now submitted 96% of the migration changes, a couple of days after starting (using the large-scale-change aka LSC process).
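
For anyone who hasn’t used them yet, the new standard-library functions look like this (a minimal sketch; unixtime is the internal helper package being replaced):

package main

import (
	"fmt"
	"time"
)

func main() {
	t := time.Date(2021, time.October, 4, 12, 0, 0, 0, time.UTC)

	// Old style (internal helpers): unixtime.ToMicros(t) / unixtime.FromMicros(us)
	// New style (standard library since Go 1.17):
	us := t.UnixMicro()        // time.Time -> microseconds since the Unix epoch
	back := time.UnixMicro(us) // microseconds -> time.Time

	fmt.Println(us, back.Equal(t)) // prints the microsecond count and "true"
}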

Making the initial change was surprisingly simple, using a combination of codesearch and gofmt -r. I was lucky that I didn’t need anything more involved; however, I see that Eli Bendersky has an excellent introduction, Rewriting Go source code with AST tooling, which would cover more advanced needs.

# List every file that mentions the old helpers, then rewrite each in place.
codesearch -l 'unixtime\.(To|From)(Micros|Millis)' |
while read filename
do
  # Chain the rewrites, then let goimports add "time" / drop "unixtime".
  gofmt -r 'unixtime.ToMicros(a) -> a.UnixMicro()' "$filename" |
  gofmt -r 'unixtime.FromMicros(a) -> time.UnixMicro(a)' |
  gofmt -r 'unixtime.ToMillis(a) -> a.UnixMilli()' |
  gofmt -r 'unixtime.FromMillis(a) -> time.UnixMilli(a)' |
  goimports > "$filename.tmp" &&
  mv "$filename.tmp" "$filename"
done

I ended up with a change touching around 4100 files. The LSC tools split this up into around 250 changes, and I worked with an approver to get these reviewed and submitted over a couple of days. I can’t thank my approver enough for the diligent review of such tedious minor changes!

There were a few problems…

  • The main one was the result of the goimports call above. In some cases the package had multiple files, but goimports was only seeing the single file being rewritten, so it occasionally added unexpected imports for names that were actually defined in another file of the same package. These were quickly identified and fixed, as they caused test failures!
  • One problem happened twice: a local variable named time shadowed the newly added import of the time package (see the sketch after this list). This was trivially fixable with a local rename. Twice out of 4100 files is OK to do by hand.
  • In a few cases I discovered areas of code that had remained on Go 1.16 (temporarily). These broke very quickly when the tests were executed. I simply reverted these changes and will revisit them in a month or two.
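
For anyone who hasn’t hit it before, the shadowing problem looks roughly like this (a contrived sketch, not the actual code from the migration):

package main

import (
	"fmt"
	"time"
)

func main() {
	now := time.Now()       // the time package is in scope here...
	time := now.UnixMicro() // ...but this local variable shadows it from now on

	// The rewritten call no longer compiles, because `time` is an int64:
	//   time.UnixMicro(time) // error: time.UnixMicro undefined
	// Renaming the local variable (e.g. to micros) fixes it.
	fmt.Println(time)
}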

Overall this was a pretty positive experience. I was able to upgrade a large swathe of Google’s code with fairly little fuss. Nobody asked me to do it, and I didn’t need to ask for much in the way of permission. The end result is that code inside Google is more similar to code outside.

The main downside is that I should have been working on migrating a metrics pipeline, and instead I got a bit distracted by an interesting problem. Ah well.

Stats: 267 submitted (244 were the LSC described above), 30 reviewed


2021W38

I was on vacation. I caught a cold, and it was not that great.

I peeked at work on Friday and caught an interesting C++ code review. Changing from this:

template <typename T>
class SomeClass {
  …
  template <typename U>
  void Add(U&& item) {
    add(SomeClass(std::forward(item)));
  }
  …
};

To this:

template <typename T>
class SomeClass {
  …
  template <typename U>
  void Add(U&& item) {
    add(SomeClass(std::forward<U>(item)));
  }
  …
};

It’s quite subtle and easy to miss. In this case T=std::string and U=std::string_view, and the initial version fails to compile with “no matching function for call to ‘forward’”. Thankfully, clang also points out that it can’t deduce the template argument, which is a big clue.

My read is that a SomeClass<std::string> can be constructed from a std::string_view, but std::forward can’t deduce its template argument from item alone, so we have to parameterize the call as std::forward<U> ourselves.


2021W37

Something of a mixed week. I was the oncall person, so got to deal with a bunch of interesting failing things. Most interesting was an alert that one of our databases was approaching 70% of its disk quota. We weren’t in a position to get more, so I ended up designing a solution which would migrate the blobs out of the database. We don’t have time to implement this right now. But we have a reviewed design, and enough time to implement it before it’s a crisis.

The main thing I was supposed to be working on was designing a migration for the metrics system I work on. As happens so often, I ended up going down a rabbit hole trying to understand the source data. Bugs were filed … but not a lot of progress was made.

I filed my first bug against Chrome! The right-click menu on a Mac has grown an “Exit Full Screen” menu item right where “Back” used to sit. This leads to me being frustrated two or three times an hour, as I spend most of my time in full screen mode.

Stats: 14 changes submitted, 39 reviewed.


Code Review

One of the things I enjoy most at Google is code review. I don’t think anything else has improved me so much as an engineer. What I really value is that it’s an open discussion about how to make the system better, without personal prejudice. Being exposed to other people’s ideas is one of the clearest benefits of diversity: it forces you to think about another’s views and opinions.

Google’s setup for code review works well. All code must be reviewed before it’s submitted to the codebase: at least one other person has to read through the code, and if there’s something that makes them go “huh?” it’ll come out. This is not (usually) a rubber-stamping exercise. This really matters, as it means that knowledge is distributed instead of siloed.

One thing which I think is peculiar to Google is the notion of readability. This is the idea that our code should look fairly similar, no matter which project you’re on. It makes it much easier to move between teams, or to jump into another team’s codebase. It’s enforced through readability approvals: an incremental process where each code review must be approved by an expert in that language, in addition to the regular review from your colleague.

I’m one of the Go readability approvers, and it’s been a great experience. I see code from all over Google, and I help mentor folks not only in Go, but in how we write Go inside Google. Sometimes it’s really trivial stuff: use a common library for creating a time.Time from microseconds. Other times it’s more structural: if you’ve designed a public API around channels, I might start a discussion about whether there’s a less error-prone way. Often I learn quite a lot through these discussions!
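
To make the channel point concrete, here’s the flavour of that discussion (illustrative only; the types and functions are invented):

package main

import "fmt"

type Result struct{ Name string }

// streamResults exposes a channel: callers need to know who closes it, and
// must drain it fully or the producing goroutine leaks.
func streamResults() <-chan Result {
	ch := make(chan Result)
	go func() {
		defer close(ch)
		for _, r := range []Result{{"a"}, {"b"}} {
			ch <- r
		}
	}()
	return ch
}

// visitResults hands each result to a callback instead: termination and
// error handling are explicit, which tends to be less error prone.
func visitResults(visit func(Result) error) error {
	for _, r := range []Result{{"a"}, {"b"}} {
		if err := visit(r); err != nil {
			return err
		}
	}
	return nil
}

func main() {
	for r := range streamResults() {
		fmt.Println("channel:", r.Name)
	}
	_ = visitResults(func(r Result) error {
		fmt.Println("callback:", r.Name)
		return nil
	})
}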

Another benefit of code review is that it provides a common point for tooling. When a review is started, all kinds of automated analyses are triggered. By the time I get to the review, I can immediately see that an API is being misused, or that some configuration is incorrect. It’s far cheaper to correct these issues before the code is submitted.

Statistics: I’ve reviewed over 15,000 changes at Google.

Anyway, reviews are one of the most enjoyable parts of my job. I get to learn, as do the people I’m working with. And it’s one of the quickest ways to help unblock my colleagues.


2021W36

Most of this week was follow-on from last week’s attempt to make a data pipeline read from a Spanner database. The configuration turned out to be quite the tricky thing to make work, but at least I solved it for everyone. And the improvements are worthwhile: the pipeline runtime has halved, and we’re now able to read much more recent data, as we’re no longer relying on exports of database backups. The key to making this work is a new feature that lets the pipeline read the underlying files the database uses, instead of having to make an online query.

Aside from that, one highlight was a pair programming exercise with a colleague in the USA who was interested in learning Go. We managed to contribute an improvement in about an hour, which is great, and he’s now looking at making changes on his own.

I spent a great deal of time with my manager writing up a summary of what I’d done over the last six months. I find it hard to present a coherent, data-driven picture of this (despite writing snippets each week precisely so that I have something to base it on…).

Stats: 18 changes submitted, 47 reviewed.


2021W35

This week was fairly quiet: Monday was a bank holiday, and it’s perf season inside Google, so everyone was busy writing about themselves and others instead of writing code…

Unfortunately for me personally it was a week of going down rabbit holes & yak shaving.

I spent far, far too long visiting the innards of cmp and protocmp in an effort to modernise an older test. The premise was simple: convert all int64 values in a particular message to 1, so that the comparison only cares about the presence or absence of a value, not what it is. I was trying to use FilterMessage(), but couldn’t make it work. Eventually I ended up with a more brute-force approach: alter the entire result, not just the message I was concerned with.

// Normalise every int64 encountered during the comparison to 1.
cmp.Transformer("normalize", func(int64) int64 {
  return 1
})
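
For context, here’s roughly how that transformer slots into a test, assuming the values under comparison are protos handled via protocmp.Transform() (the package, message, fields and helper here are invented for illustration):

package pipeline_test

import (
	"testing"

	"github.com/google/go-cmp/cmp"
	"google.golang.org/protobuf/testing/protocmp"

	pb "example.com/pipeline/proto" // hypothetical generated proto package
)

func TestPipelineOutput(t *testing.T) {
	got := runPipeline(t) // hypothetical helper returning a *pb.Report

	// Every int64 is normalised to 1, so only presence/absence is compared.
	want := &pb.Report{RowCount: 1, ByteCount: 1}

	diff := cmp.Diff(want, got,
		protocmp.Transform(), // compare the protos structurally
		cmp.Transformer("normalize", func(int64) int64 { return 1 }),
	)
	if diff != "" {
		t.Errorf("pipeline output mismatch (-want +got):\n%s", diff)
	}
}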

The other yak shaving relates to a data pipeline. I was switching it from reading some files to reading from a Spanner database. The code change was verbose but fairly simple. Then I tried to run it and discovered that the configuration change I needed was anything but simple. I ended up with some configuration of command-line flags … in a proto message … wrapped in another command-line flag … embedded in some GCL … in turn embedded in another shell script. Not only was the quoting horrific, but I couldn’t make the GCL do what I needed. I ended up hacking the interpreter to emit the errors from an eval() statement that are normally thrown away. Being able to do that is one of the benefits of having everything in a monorepo, but I really didn’t need to spend hours on this.

Stats: 9 changes submitted, 47 reviewed.


Weeknotes

It’s been a while since I’ve written anything… I recently noticed Simon Willison’s weeknotes summarising what’s happened in the last week. I’ve done this internally since starting at Google, and I now have a large backlog of snippets (as they’re known there). I really value this for a few reasons.

  • It’s great to write each week and have a fixed point of review, while all the context is still in your head. More often than not, I’m able to look back and realise that I’ve achieved more than I thought I had!
  • It’s a great historical record. Being able to go back in time is invaluable at perf time, but also in the longer term to see how my career has developed.
  • Reading other people’s snippets is an incredibly pleasurable start to the week. Invariably, I find a few interesting strands of work that I would not otherwise have known about (I love reading the Go team’s snippets to see how generics is progressing!)

While I can’t write exactly the same snippets externally, I’m going to see if I can pick out something useful or interesting I’ve done each week.

2021W34

A large part of last week was playing catchup with discussions from the previous week, as I had a few days out at Beautiful Days, which was amazing (The Levellers! Hawkwind! Dreadzone!). However, two things stood out:

  1. Before leaving, I’d made a small change to how one of our pipelines reads data from Bigtable. A single flag-flip. Unfortunately, I’d forgotten the ACL change that needed to go alongside it, causing my colleagues to waste time debugging (it was missed in code review too … but while code review is nice, it’s never perfect).

    And even when the ACL was fixed, the change caused the pipeline to slow from minutes to hours, so it had to be rolled back anyway. 🤦‍♀️
  2. In another pipeline, I’d rolled out a change to the output format. I’d enabled it in QA a couple of weeks back, and all seemed to be working fine, so I enabled it in production before I left. Unfortunately, some downstream pipelines started reporting “no data” a day or so later while I was out. Thankfully, my change was quickly identified and rolled back. But again, I caused my colleagues to spend too long debugging my problems.

So, in conclusion: insufficient care on my part. Both of these issues could have been avoided if I’d taken a bit longer to sit down and think through the consequences of what I was working on.

Other things I touched on:

  • Centralised some RPC clients for services our team runs, so that we can make changes in a single place instead of contacting every affected team
  • Investigated memory spike protection in some of our services. This was fun, and led me to conclude that I didn’t understand the tooling well enough to apply it. I need to write a reproducer before coming back to this.

Stats: 15 changes submitted, 42 reviewed.


Maven Irritation

I’m looking at Maven-related stuff for the first time in … perhaps 4 years. That’s long enough that I’ve forgotten how irritating it is. My task is simple: redirect the writes from the project’s directory to another location on local disk. This is because the project lives on a filesystem where writes are expensive (something like an NFS filer, but worse). I have to do this for all projects, not just my own, so saying “edit the POM to do X” is not an option. Given how disparate the projects are, there isn’t even a company-wide POM I can alter.

Of course, the Maven documentation tells you everything you don’t want to know. So, off to Stack Overflow.

First is the helpful description of ${project.build.directory}. This is exactly what I want: a way to affect all settings related to the output directory (target by default). See pom-4.0.0.xml for how this works.

Except… it doesn’t work. You can set -Dproject.build.directory=/tmp on the command line, and Maven will happily ignore you. I think this is because AbstractCompilerMojo marks it as read-only.
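
For example, something like this builds happily but still writes to target/ next to the pom:

mvn -Dproject.build.directory=/tmp package   # the property is silently ignored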

It turns out there’s another way to do this, via profiles.

<profile>
  <id>move-output-dir</id>
  <activation>
    <activeByDefault>true</activeByDefault>
  </activation>
  <build>
    <directory>${java.io.tmpdir}/maven-builds/${project.groupId}/${project.artifactId}/target</directory>
  </build>
</profile>

This works perfectly! But it has to be done on a pom-by-pom basis. What I want is to use Maven’s settings.xml to do this. So I pasted it into the profile section there and got:

[WARNING] Some problems were encountered while building the effective settings
[WARNING] Unrecognised tag: 'build' (position: START_TAG seen …n … @263:14) @ /…/conf/settings.xml, line 263, column 14

It turns out that profiles in settings.xml are different to profiles in pom.xml (“The profile element in the settings.xml is a truncated version of the pom.xml profile element.”)
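
For reference, a profile in settings.xml only accepts activation, properties, repositories and pluginRepositories. The closest you can get there is a property that every pom would then have to reference itself, which defeats the point. A sketch:

<!-- settings.xml: <build> isn't allowed here, only properties and repositories. -->
<profile>
  <id>move-output-dir</id>
  <activation>
    <activeByDefault>true</activeByDefault>
  </activation>
  <properties>
    <!-- Only useful if every pom set <directory>${build.dir}</directory> itself. -->
    <build.dir>${java.io.tmpdir}/maven-builds</build.dir>
  </properties>
</profile>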

At this point, I’m kind of stuck. Maven has provided the illusion of configurability, but every attempt to use it has failed.