I need to migrate the old web site over to WordPress. There’s a lot to change, though, and I need to make sure that it goes (relatively) smoothly. I don’t want to end up with too many 404s.

Stuff that needs thinking about:

  • favicon. I should probably import it into the blog bits, maybe under wp-content?
  • Staged. This blog should be accessible at until all content is migrated.
  • /dist needs to be separated out; I don’t want to make it part of this blog.
  • /books, /comics and /scripts all need a damned great makeover. Dunno how yet.

I think it’s important to get the old content migrated before I put this in place. On the other hand, looking at the logs makes me realise that there really aren’t that many hits for a lot of it. My most popular article, How To Copy Files Across A Network, has had only 5560 hits in the last year. Still, migrating the articles isn’t hard and I should probably give that a go first.


How To Copy Files Across A Network

There are several ways to copy files across a network using Unix machines. This document aims to outline how to use some of the more common methods, and how to use them effectively.

I assume the use of ssh(1) in all situations. Anything else just isn’t of much use these days. Hopefully, all these methods will be more secure than current practices involving ftp(1) or rcp(1).


scp is probably the easiest of the three methods. It was designed as a replacement for rcp(1), which was a one-night hack to be a networked version of cp(1). So it uses a fairly easy, familiar syntax:

scp [-Cr] /some/file [ more ... ] host:/destination

Before it does any copying, scp will use ssh authentication to connect. Normally, this consists of asking you for your password or passphrase. If you are having trouble getting scp working, try connecting with ssh -v first. If that works, scp should, too.

With the -r flag, the source can be a directory, which scp then copies recursively. This means you can copy large trees to another computer easily.

scp will encrypt your file as it gets copied across, but by specifying the -C option, you can ask scp to compress your data automatically as it travels. This is especially good with things like large text files (including XML and HTML), as they compress very well. Doing this can save a significant amount of time on a large copy.
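As a rough sketch of how the two flags combine, the snippet below only prints the command line it would run, so it can be checked before anything is copied. The host name and both paths are made up for illustration and are not from the original article.

```shell
#!/bin/sh
# Hypothetical host and paths, used only for illustration.
remote_host="backup.example.com"
source_dir="./public_html"
dest_dir="/srv/www"

# Build an scp command line: -r recurses into directories, -C compresses.
# Arguments: source path, remote host, destination path on that host.
scp_command() {
    printf 'scp -Cr %s %s:%s\n' "$1" "$2" "$3"
}

# Print the command rather than running it, so no network access is needed.
scp_command "$source_dir" "$remote_host" "$dest_dir"
# prints: scp -Cr ./public_html backup.example.com:/srv/www
```

Once the printed line looks right, it can be run by hand from a shell that can reach the host.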

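One final tip. Older versions of scp used the 3DES encryption algorithm by default, which is slow, and you could get a slight speed-up by specifying a different cipher, for example by adding "-c blowfish" to your command line. Recent OpenSSH releases have dropped Blowfish and already pick a faster cipher by default; ssh -Q cipher lists the ones your version supports.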

As useful as it is, there are some things that you shouldn’t use scp for.

  1. Mostly, this is where you are copying more than a few files at a time. scp sends files one at a time and waits for the remote end to acknowledge each one, so it can be quite slow.
  2. When using the -r flag, you must be careful. It does not understand symbolic links. If you have a symbolic link, scp will blindly follow it, even if it points to a directory that’s already been copied. This can lead to scp copying an infinite amount of data, or at the very least, one full disk’s worth. Be careful!
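One way to guard against the symlink problem is to look for links before reaching for scp -r. The helper below is a minimal sketch; the function name is my own invention, and it uses nothing beyond find(1).

```shell
#!/bin/sh
# Succeeds (exit status 0) if the tree rooted at $1 contains a symbolic link.
tree_has_symlinks() {
    [ -n "$(find "$1" -type l -print 2>/dev/null | head -n 1)" ]
}

# Demonstration on a throwaway tree with a link that points back at its parent,
# which is exactly the kind of loop that scp -r would keep following.
demo_dir=$(mktemp -d)
mkdir "$demo_dir/tree"
ln -s .. "$demo_dir/tree/loop"

if tree_has_symlinks "$demo_dir/tree"; then
    echo "symlinks found: prefer rsync -a or tar over scp -r"
else
    echo "no symlinks: scp -r is safe here"
fi
rm -rf "$demo_dir"
```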


rsync again has a very similar syntax to rcp:

rsync -e ssh [-avz] /some/file [ more ... ] host:/destination

rsync’s speciality is copying large files or collections of files that have small changes made to them. It calculates the differences between files and only transfers the parts that have changed. This can lead to enormous improvements in speed when copying a directory tree for a second time.
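The effect of a second run can be seen without a network at all. The sketch below syncs a scratch directory twice on the local machine and counts what rsync reports as changed each time. It is an illustration under assumptions: the helper name is hypothetical, and because local copies send whole files, it shows only that unchanged files are skipped on the second pass, not the block-level differencing.

```shell
#!/bin/sh
# Count the items rsync reports as changed when syncing $1/ into $2/.
# -a is archive mode; -i (--itemize-changes) prints one line per changed item.
count_changes() {
    rsync -ai "$1/" "$2/" | wc -l | tr -d ' '
}

if command -v rsync >/dev/null 2>&1; then
    work=$(mktemp -d)          # throwaway directory, removed at the end
    mkdir "$work/src" "$work/dst"
    echo "one" > "$work/src/a.txt"
    echo "two" > "$work/src/b.txt"

    first=$(count_changes "$work/src" "$work/dst")
    second=$(count_changes "$work/src" "$work/dst")
    echo "first run changed $first items, second run changed $second"
    rm -rf "$work"
else
    echo "rsync is not installed here"
fi
```

The second count should be zero, since nothing in the source changed between the two runs.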

The flags in that command are:

  • -a – Archive mode. This should probably always be on. It asks rsync to attempt to preserve permissions, timestamps, ownerships and so on. It also copies symlinks as symlinks rather than following them.
  • -v – Verbose mode. List the files that are being transferred.
  • -z – Enable compression. This will compress each file as it gets sent over the pipe. Depending upon the data you are copying, this can be a big win.
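  • -e ssh – Use ssh as the transport. You should always specify this on older versions of rsync; releases from 2.6.0 onwards already default to ssh, so there it is redundant but harmless. If you get tired of typing it in each time, you can type in this command to set the default for rsync: export RSYNC_RSH=ssh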

What disadvantages does rsync have? Not that many…

  1. Its syntax is picky. In particular, a trailing slash on a source directory changes how that directory is copied, which can be confusing.
  2. You have to remember to specify that you’re using ssh.
  3. rsync isn’t installed everywhere.
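The trailing-slash rule from the first point is easiest to see side by side. This sketch runs entirely on the local machine in a scratch directory; the function and directory names are made up for the demonstration.

```shell
#!/bin/sh
# Copy a source directory twice, with and without a trailing slash.
# $1 is an empty scratch directory that the demo may fill.
slash_demo() {
    mkdir -p "$1/src" "$1/with_slash" "$1/without_slash"
    echo "hello" > "$1/src/file.txt"

    # Trailing slash: the contents of src land directly in the destination.
    rsync -a "$1/src/" "$1/with_slash/"
    # No trailing slash: the directory src itself is created in the destination.
    rsync -a "$1/src" "$1/without_slash/"
}

if command -v rsync >/dev/null 2>&1; then
    scratch=$(mktemp -d)
    slash_demo "$scratch"
    ls "$scratch/with_slash"       # lists file.txt
    ls "$scratch/without_slash"    # lists src
    rm -rf "$scratch"
fi
```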


tar is normally an archiving program for backups. But with the use of ssh, it can be coerced into copying large directory trees with ease. It has the advantage that it copies things correctly, even ACLs on those Unixes which have them, as well as being perfectly comfortable with symlinks.

The syntax, however, is slightly baroque:

tar -cf - /some/file | ssh host tar -xf - -C /destination

Whilst it looks complex, at heart it’s quite simple: create a tar archive to stdout, send it across the network to another tar on the remote machine for unpacking.

The arguments to the first tar are -c to create a new archive and -f -, which tells tar to send the newly created archive to stdout.

The arguments to the second tar command are the reverse: -x to extract the archive and -f - to take the archive from stdin. The final -C /destination tells tar to change into the /destination directory before extracting anything.

Why should you use this method when the other two are available? For initial copying of large directory trees, this method can be very quick, because it streams. The first tar will send its output as soon as it has found the first file in the source directory tree, and that will be extracted almost immediately afterwards by the second tar. Meanwhile, the first tar is still finding and transmitting files. This pipeline works very well.
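The pipeline can be tried out locally before pointing it at another machine. In this sketch the ssh hop is simply left out, so both tars run on the same host; that is a simplification of mine, and the function name and scratch paths are hypothetical. It also checks that a symlink comes through as a symlink.

```shell
#!/bin/sh
# Copy the tree at $1 into the existing directory $2 through a tar pipe.
# For a network copy, the second tar would be run on the far side via ssh.
tar_copy() {
    tar -cf - -C "$1" . | tar -xf - -C "$2"
}

work=$(mktemp -d)
mkdir -p "$work/src/sub" "$work/dst"
echo "data" > "$work/src/sub/file.txt"
ln -s sub/file.txt "$work/src/link"     # a symlink that should survive the copy

tar_copy "$work/src" "$work/dst"

[ -L "$work/dst/link" ] && echo "symlink preserved"
cat "$work/dst/link"                    # prints: data
rm -rf "$work"
```

Using -C on the sending side keeps the archived paths relative, so the tree is recreated directly under the destination.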

As with the other two methods, you can ask for compression of the data stream if your source data is amenable to it. Here, you have to add a -z flag to each tar:

tar -czf - /some/file | ssh host tar -xzf - -C /destination

In a similar fashion, you can enable verbose mode by passing a -v flag to the second tar. Don’t pass it to the first one as well, or you’ll get doubled output!

Why shouldn’t you use this method?

  1. The syntax is a pain to remember.
  2. It’s not as quick to type as the scp command, for a small number of files.
  3. rsync will beat it hands down for a tree of files that already exists on the destination.

Old Content

I have lots of content from my old site which needs to be brought into the new blog. I’ll start adding it as new posts and add redirects on the old locations so that they still work.
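If the old site turns out to be served by Apache with mod_alias (an assumption; nothing above says which server is in use), the redirects could be generated from a simple list of old and new paths. The paths in this sketch are invented for illustration and are not the site's real URLs.

```shell
#!/bin/sh
# Read "old-path new-path" pairs on stdin and emit one Apache mod_alias
# "Redirect 301" directive per pair. Incomplete lines are skipped.
make_redirects() {
    while read -r old new; do
        [ -n "$old" ] && [ -n "$new" ] || continue
        printf 'Redirect 301 %s %s\n' "$old" "$new"
    done
}

# Hypothetical mapping, purely for illustration.
make_redirects <<'EOF'
/articles/copying.html /how-to-copy-files-across-a-network/
EOF
# prints: Redirect 301 /articles/copying.html /how-to-copy-files-across-a-network/
```

The output could then be pasted into the server configuration or an .htaccess file once each new post exists.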

Thankfully, this should be fairly easy as it’s all XHTML content to begin with. The only difficulty will be the lists of books and comix. I’ll worry about those later.