Categories
Uncategorized

Go Strings

I’ve been looking at Go recently. It’s a pleasant language, with few surprises. However, I wondered (as always) what the encoding of a string is supposed to be. For example:

  • Python 2 has two types: str, and unicode. Python 3 has sensibly renamed these to bytes and str, respectively.
  • Perl has a magic bit which gets set to state that the string contains characters as opposed to bytes (it’s called the UTF-8 bit, but it means characters).

So how does Go deal with characters in strings? Given that the authors of Go also invented UTF-8, we can hope it’s been thought about.

There are three types to think about.

byte[]

A slice of bytes.

string

A (possibly empty) sequence of bytes. Strings are immutable.

rune

A single unicode code point. Produced by characters in single quotes.

There’s no explicit encoding in the above. Nonetheless, there’s an implicit preference for UTF-8:

But this doesn’t help the common case:

package main

import "fmt"

func main() {
  s := "café"
  fmt.Printf("%q has length %dn", s, len(s))
}

// "café" has length 5

The unicode/utf8 package can do what’s needed though. This provides functions for, amongst other things, picking runes out of strings.

package main

import (
  "fmt"
  "unicode/utf8"
)

func main() {
  s := "café"
  fmt.Printf("%q has length %dn", s, utf8.RuneCountInString((s)))
}

// "café" has length 4

This is very Go-like. The default is somewhat low-level, but the types and libraries build on top of it. For example, text/scanner provides a nice way of iterating over runes in a UTF-8 input stream.

On a whim, I took a look at the internals of utf8.RuneCountInString(). It’s deceptively simple.

func RuneCountInString(s string) (n int) {
  for _ = range s {
    n++
  }
  return
}

This relies on the spec defining how a string interacts with a for loop: it’s defined as iterating over the UTF-8 codepoints (or runes).

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s