Go Strings

I’ve been looking at Go recently. It’s a pleasant language, with few surprises. However, I wondered (as always) what the encoding of a string is supposed to be. For example:

  • Python 2 has two types: str and unicode. Python 3 sensibly renamed these to bytes and str, respectively.
  • Perl has a magic bit which gets set to state that the string contains characters as opposed to bytes (it’s called the UTF-8 bit, but it means characters).

So how does Go deal with characters in strings? Given that two of Go’s authors also invented UTF-8, we can hope it’s been thought about.

There are three types to think about.


  • []byte: a slice of bytes.
  • string: a (possibly empty) sequence of bytes. Strings are immutable.
  • rune: a single Unicode code point (an alias for int32). Produced by characters in single quotes.
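
To make the distinction concrete, here is a minimal sketch of converting one value between the three types. The helper names toBytes and toRunes are made up for this example; the conversions themselves are built into the language.

```go
package main

import "fmt"

// toBytes and toRunes are hypothetical helper names used only in this sketch.

// toBytes copies the string's raw bytes into a mutable slice.
func toBytes(s string) []byte { return []byte(s) }

// toRunes decodes the string's UTF-8 bytes into Unicode code points.
func toRunes(s string) []rune { return []rune(s) }

func main() {
	s := "café"      // string: immutable sequence of bytes
	b := toBytes(s)  // []byte: one element per byte
	r := toRunes(s)  // []rune: one element per code point
	var e rune = 'é' // rune literal in single quotes

	fmt.Println(len(s), len(b), len(r)) // 5 5 4
	fmt.Printf("% x\n", b)              // 63 61 66 c3 a9
	fmt.Println(r[3] == e, string(e))   // true é
}
```

Both conversions copy the data, so changing b or r afterwards leaves s untouched.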

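There’s no explicit encoding in the above. Nonetheless, there’s an implicit preference for UTF-8: Go source files are defined to be UTF-8, so a string literal holds the UTF-8 bytes of the text between its quotes.

But this doesn’t help the common case, because len() counts bytes rather than characters: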

package main

import "fmt"

func main() {
  s := "café"
  // len counts bytes, and "é" occupies two bytes in UTF-8.
  fmt.Printf("%q has length %d\n", s, len(s))
}

// "café" has length 5

The unicode/utf8 package can do what’s needed though. This provides functions for, amongst other things, picking runes out of strings.

package main

import (
  "fmt"
  "unicode/utf8"
)

func main() {
  s := "café"
  // RuneCountInString counts code points rather than bytes.
  fmt.Printf("%q has length %d\n", s, utf8.RuneCountInString(s))
}

// "café" has length 4
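
Counting is only one of the things the package offers. As a rough sketch of "picking runes out of strings", the following uses utf8.DecodeRuneInString and utf8.DecodeLastRuneInString, each of which returns a rune along with its width in bytes. The wrapper lastRune is a hypothetical name added for illustration.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// lastRune is a hypothetical helper for this sketch: it returns the
// final rune of s and the number of bytes that rune occupies.
func lastRune(s string) (rune, int) {
	return utf8.DecodeLastRuneInString(s)
}

func main() {
	s := "café"

	r, size := utf8.DecodeRuneInString(s) // decode the first rune
	fmt.Printf("%c %d\n", r, size)        // c 1

	r, size = lastRune(s)          // decode the last rune
	fmt.Printf("%c %d\n", r, size) // é 2

	fmt.Println(utf8.ValidString(s)) // true
}
```

The returned width is what lets a caller step through a string one rune at a time by slicing off size bytes after each decode.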

This is very Go-like. The default is somewhat low-level, but the types and libraries build on top of it. For example, text/scanner provides a nice way of iterating over runes in a UTF-8 input stream.
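
As an illustration of the text/scanner approach, here is a small sketch that reads runes one at a time from an io.Reader using Scanner.Next. The helper name runesFrom is an assumption of this example, and a strings.Reader stands in for a real input stream.

```go
package main

import (
	"fmt"
	"strings"
	"text/scanner"
)

// runesFrom is a hypothetical helper for this sketch: it reads every
// rune from UTF-8 encoded input until the scanner reports EOF.
func runesFrom(input string) []rune {
	var sc scanner.Scanner
	sc.Init(strings.NewReader(input))

	var out []rune
	for r := sc.Next(); r != scanner.EOF; r = sc.Next() {
		out = append(out, r)
	}
	return out
}

func main() {
	fmt.Println(len(runesFrom("café"))) // 4
}
```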

On a whim, I took a look at the internals of utf8.RuneCountInString(). It’s deceptively simple.

func RuneCountInString(s string) (n int) {
  for _ = range s {
    n++
  }
  return
}

This relies on the spec defining how a string interacts with a for loop: ranging over a string decodes its UTF-8 bytes and yields one Unicode code point (rune) per iteration.
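
The range behaviour is easy to see by printing the index alongside each rune. In this sketch the index is the byte offset at which the rune starts, so it skips ahead after a multi-byte character. The helper runeOffsets is a hypothetical name used only here.

```go
package main

import "fmt"

// runeOffsets is a hypothetical helper for this sketch: it returns the
// byte offset at which each rune of s begins.
func runeOffsets(s string) []int {
	var offs []int
	for i := range s {
		offs = append(offs, i)
	}
	return offs
}

func main() {
	for i, r := range "café!" {
		fmt.Printf("%d:%c ", i, r)
	}
	fmt.Println()
	// 0:c 1:a 2:f 3:é 5:!

	// An invalid UTF-8 byte is reported as the replacement character.
	for _, r := range "\xff" {
		fmt.Printf("%U\n", r) // U+FFFD
	}
}
```

The jump from 3 to 5 is the two-byte "é" again, which is the same reason len() reported 5 earlier.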
