I need to migrate the old web site over to wordpress. There’s a lot to change though and I need to make sure that it goes (relatively) smoothly. I don’t want to end up with too many 404s.

Stuff that needs thinking about:

  • favicon. I probably should import into the blog bits, maybe under wp-content?
  • Staged. This blog should be accessible at until all content is migrated.
  • /dist needs to be separated out; I don’t want to make it part of this blog.
  • /books, /comics and /scripts all need a damned great makeover. Dunno how yet.

I think it’s important to get the old content migrated before I put this in place. On the other hand, looking at the logs makes me realise that there really aren’t that many hits for a lot of it. My most popular article, How To Copy Files Across A Network, has had only 5560 hits in the last year. Still, migrating the articles isn’t hard and I should probably give that a go first.
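For what it's worth, a per-article hit count like that can be pulled out of an access log with a short pipeline. This is only a sketch: it assumes the log is in Common Log Format, where the request path is the seventh whitespace-separated field, and the sample log lines and the top_paths name are made up for illustration.

```shell
#!/bin/sh
# Count hits per requested path in a Common Log Format access log.
# Assumption: field 7 is the request path, as in Apache's "common" format.
top_paths() {
    # $1 = path to an access log
    awk '{ print $7 }' "$1" | sort | uniq -c | sort -rn
}

# Made-up sample log, standing in for the real one.
log=$(mktemp)
cat > "$log" <<'EOF'
192.0.2.1 - - [10/Oct/2004:13:55:36 +0000] "GET /articles/copying.html HTTP/1.1" 200 2326
192.0.2.2 - - [10/Oct/2004:13:56:01 +0000] "GET /books/ HTTP/1.1" 200 512
192.0.2.3 - - [10/Oct/2004:13:57:12 +0000] "GET /articles/copying.html HTTP/1.1" 200 2326
EOF

# Most requested paths first, each prefixed by its hit count.
top_paths "$log"
rm -f "$log"
```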


How To Copy Files Across A Network

There are several ways to copy files across a network using Unix machines. This document aims to outline how to use some of the more common methods, and how to use them effectively.

I assume the use of ssh(1) in all situations. Anything else just isn’t of much use these days. Hopefully, all these methods will be more secure than current practices involving ftp(1) or rcp(1).


scp is probably the easiest of all of the three methods. It was designed as a replacement for rcp(1), which was a one night hack to be a networked version of cp(1). So it uses a fairly easy, familiar syntax:

scp [-Cr] /some/file [ more ... ]

Before it does any copying, scp will use the ssh authentication to connect. Normally, this consists of asking you for your password or passphrase. If you are having trouble getting scp working, try connecting with ssh -v first. If that works, scp should, too.

By specifying a -r flag, you can specify that the source file can be a directory, and if so to copy it recursively. This means you can copy large trees to another computer easily.

scp will encrypt your file as it gets copied across, but by specifying the -C option, you can ask scp to compress your data automatically as it travels. This is especially good with things like large text files (including XML and HTML), as they compress very well. Doing this can save a significant amount of time on a large copy.
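To get a feel for how well markup compresses, here is a rough local experiment. It uses gzip as a stand-in for the compression that -C applies in transit, and the generated HTML-like file is purely made up, so treat the numbers as indicative rather than as a benchmark.

```shell
#!/bin/sh
# Compare the size of a repetitive HTML-like file before and after gzip.
# gzip is only a local stand-in here for what scp -C does on the wire.
sample=$(mktemp)
i=0
while [ "$i" -lt 2000 ]; do
    echo "<li><a href=\"/articles/item-$i.html\">Item number $i</a></li>"
    i=$((i + 1))
done > "$sample"

# $(( ... )) strips the padding some wc implementations add.
raw_size=$(( $(wc -c < "$sample") ))
gz_size=$(( $(gzip -c "$sample" | wc -c) ))

echo "uncompressed: $raw_size bytes"
echo "compressed:   $gz_size bytes"
rm -f "$sample"
```

Already-compressed data (JPEGs, tarballs that have been gzipped) would show little or no gain in the same test.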

One final tip. By default, scp uses the 3DES encryption algorithm. All encryption algorithms are slow, but you can get a slight speed-up by specifying a different one. Try adding "-c blowfish" to your command line to see if it speeds things up.
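Putting the flags together, a small wrapper might look like the sketch below. The function name scp_cmd, the CIPHER and RUN variables, and the user@example.org destination are all my own inventions for illustration. By default it only prints the command it would run, so it is safe to try without a remote host.

```shell
#!/bin/sh
# Assemble an scp command from the options discussed above.
# Prints the command by default; set RUN=1 to actually execute it.
# Hypothetical helper: scp_cmd DESTINATION SOURCE...
scp_cmd() {
    dest=$1
    shift
    # -r: recurse into directories, -C: compress in transit
    set -- -r -C "$@" "$dest"
    # Optional cipher, taken from the CIPHER environment variable
    if [ -n "${CIPHER:-}" ]; then
        set -- -c "$CIPHER" "$@"
    fi
    if [ "${RUN:-0}" = "1" ]; then
        scp "$@"
    else
        echo "scp $*"
    fi
}

# Dry runs against a placeholder host.
scp_cmd user@example.org:/srv/www ./site notes.txt
( CIPHER=blowfish; scp_cmd user@example.org:/srv/www ./site )
```

Which ciphers are accepted depends on the ssh version at both ends. On OpenSSH versions that support the query, ssh -Q cipher lists them.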

As useful as it is, there are some things that you shouldn’t use scp for.

  1. Mostly, this is where you are copying more than a few files at a time. scp spawns a new process at the remote end for each file you are copying, so it can be quite slow.
  2. When using the -r flag, you must be careful. It does not understand symbolic links. If you have a symbolic link, scp will blindly follow it, even if it points to a directory that’s already been copied. This can lead to scp copying an infinite amount of data, or at the very least, one full disk’s worth. Be careful!
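One way to guard against the symlink problem is to look for links before copying. The sketch below lists every symbolic link under a tree using find, which does not follow links by default. The list_symlinks name and the throwaway directory layout are just for illustration.

```shell
#!/bin/sh
# List symbolic links under a tree, so loops can be spotted before scp -r.
list_symlinks() {
    # $1 = directory to inspect; find does not follow symlinks by default
    find "$1" -type l
}

# Demonstration on a throwaway tree containing a link back to its parent.
tree=$(mktemp -d)
mkdir -p "$tree/project/src"
ln -s .. "$tree/project/src/loop"    # src/loop points back at project/

# Prints the path of the one link in the tree.
list_symlinks "$tree/project"
rm -rf "$tree"
```

If this prints anything pointing at a directory, rsync or tar is probably the safer choice for that tree.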


rsync again has a very similar syntax to rcp:

rsync -e ssh [-avz] /some/file [ more ... ]

rsync’s speciality is copying large files or collections of files that have small changes made to them. It calculates the differences between files and only transfers the parts that have changed. This can lead to enormous improvements in speed when copying a directory tree for a second time.

The flags in that command are:

  • -a – Archive mode. This should probably always be on. It asks rsync to attempt to preserve permissions, timestamps, ownership and so on. It also doesn’t follow symlinks.
  • -v – Verbose mode. List the files that are being transferred.
  • -z – Enable compression. This will compress each file as it gets sent over the pipe. Depending upon the data you are copying, this can be a big win.
  • -e ssh – Use ssh as the transport. You should always specify this. If you get tired of typing it in each time, you can type in this command to set the default for rsync: export RSYNC_RSH=ssh
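To see the "second time is cheap" behaviour without a remote machine, the following sketch runs rsync twice between two local directories. The sync_tree name and the file names are made up. I've added -i (--itemize-changes) so that each run lists what it actually touched. Note that this only demonstrates unchanged files being skipped; a local copy won't show the within-file delta transfer.

```shell
#!/bin/sh
# Run rsync twice over the same tree and show what each run transfers.
# Local directories stand in for the remote side.
sync_tree() {
    # $1 = source dir, $2 = destination dir
    # -a: archive mode, -i: itemize every change made
    rsync -ai "$1/" "$2/"
}

if command -v rsync >/dev/null 2>&1; then
    work=$(mktemp -d)
    mkdir -p "$work/src" "$work/dest"
    echo "first file"  > "$work/src/a.txt"
    echo "second file" > "$work/src/b.txt"

    echo "== first run: everything is new =="
    sync_tree "$work/src" "$work/dest"

    # Change one file (and its size, so the change is certain to be noticed).
    echo "a longer replacement line" > "$work/src/a.txt"
    echo "== second run: only a.txt has changed =="
    sync_tree "$work/src" "$work/dest"

    rm -rf "$work"
else
    echo "rsync is not installed here" >&2
fi
```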

What disadvantages does rsync have? Not that many…

  1. It does have picky syntax though. In particular, the use of trailing slashes on source directories can imply different meanings as to how that directory is copied, which can be confusing.
  2. You have to remember to specify that you’re using ssh.
  3. rsync isn’t installed everywhere.
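The trailing slash behaviour is easiest to understand by trying it. This sketch copies the same source directory twice, once with and once without a trailing slash, and lists what ends up where. Everything happens in a temporary directory, and the demo_trailing_slash name and directory names are only illustrative.

```shell
#!/bin/sh
# Show how a trailing slash on the source changes what rsync copies.
demo_trailing_slash() {
    work=$(mktemp -d) || return 1
    mkdir -p "$work/src/sub" "$work/with_slash" "$work/without_slash"
    echo hello > "$work/src/sub/file.txt"

    # Trailing slash: copy the *contents* of src into the destination.
    rsync -a "$work/src/" "$work/with_slash/"
    # No trailing slash: copy the directory src itself into the destination.
    rsync -a "$work/src" "$work/without_slash/"

    # List the resulting files relative to the work directory.
    (cd "$work" && find with_slash without_slash -type f | LC_ALL=C sort)
    rm -rf "$work"
}

if command -v rsync >/dev/null 2>&1; then
    demo_trailing_slash
else
    echo "rsync is not installed here" >&2
fi
```

With the slash the file lands at with_slash/sub/file.txt; without it, an extra src level appears, giving without_slash/src/sub/file.txt.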


tar is normally an archiving program for backups. But with the use of ssh, it can be coerced into copying large directory trees with ease. It has the advantage that it copies things correctly, even ACLs on those Unixes which have them, as well as being perfectly comfortable with symlinks.

The syntax, however, is slightly baroque:

tar -cf - /some/file | ssh tar -xf - -C /destination

Whilst it looks complex, at heart it’s quite simple: create a tar archive to stdout, send it across the network to another tar on the remote machine for unpacking.

The arguments to the first tar are -c to create a new archive and -f -, which tells tar to send the newly created archive to stdout.

The arguments to the second tar command are the reverse: -x to extract the archive and -f - to take the archive from stdin. The final -C /destination tells tar to change into the /destination directory before extracting anything.
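The same create-then-extract pipeline can be tried locally by leaving ssh out of the middle, which is a handy way to get used to the flags. In this sketch the copy_tree name and the sample files are made up, and I've used -C on the creating side as well, so that the archive holds paths relative to the source directory.

```shell
#!/bin/sh
# The tar | tar pipeline, run locally: one tar writes an archive to stdout,
# the other reads it from stdin and unpacks it elsewhere.
copy_tree() {
    # $1 = source directory, $2 = existing destination directory
    tar -cf - -C "$1" . | tar -xf - -C "$2"
}

work=$(mktemp -d)
mkdir -p "$work/src/docs" "$work/dest"
echo "hello" > "$work/src/docs/readme.txt"
ln -s docs/readme.txt "$work/src/latest"   # a symlink, to show it survives

copy_tree "$work/src" "$work/dest"

# The listing shows "latest" is still a symlink rather than a copy of the file.
ls -l "$work/dest"
rm -rf "$work"
```

For the networked version, the second tar simply runs on the far side of an ssh connection instead.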

Why should you use this method when the other two are available? For initial copying of large directory trees, this method can be very quick, because it streams. The first tar will send its output as soon as it has found the first file in the source directory tree, and that will be extracted almost immediately afterwards by the second tar. Meanwhile, the first tar is still finding and transmitting files. This pipeline works very well.

As with the other two methods, you can ask for compression of the data stream if your source data is amenable to it. Here, you have to add a -z flag to each tar:

tar -czf - /some/file | ssh tar -xzf - -C /destination

In a similar fashion, you can enable verbose mode by passing a -v flag to the second tar. Don’t pass it to the first one as well, or you’ll get doubled output!
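Here is the compressed, verbose variant as a local sketch, again with ssh left out so it can be run anywhere. The copy_tree_z name and the sample files are illustrative only.

```shell
#!/bin/sh
# Compressed, verbose variant of the tar | tar pipeline, run locally.
# -z goes on both tars; -v goes only on the extracting side.
copy_tree_z() {
    # $1 = source directory, $2 = existing destination directory
    tar -czf - -C "$1" . | tar -xzvf - -C "$2"
}

work=$(mktemp -d)
mkdir -p "$work/src" "$work/dest"
echo "some text" > "$work/src/a.txt"
echo "more text" > "$work/src/b.txt"

# Each file is listed once as it is unpacked. Depending on the tar
# implementation, the listing may arrive on stdout or on stderr.
copy_tree_z "$work/src" "$work/dest"
rm -rf "$work"
```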

Why shouldn’t you use this method?

  1. The syntax is a pain to remember.
  2. It’s not as quick to type as the scp command, for a small number of files.
  3. rsync will beat it hands down for a tree of files that already exists on the destination.

Old Content

I have lots of content from my old site which needs to be brought in to the new blog. I’ll start adding it as new posts and add redirects on the old locations so that they still work.
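Since there will be a fair few of these, the redirect lines could be generated rather than typed by hand. The sketch below assumes the old site is served by Apache with mod_alias available, and turns "old new" pairs into Redirect permanent directives. The make_redirects name, the paths and the example.org URLs are hypothetical placeholders, not the real layout.

```shell
#!/bin/sh
# Turn "old-path new-URL" pairs into Apache mod_alias redirect directives.
# The mappings below are hypothetical examples, not the site's real layout.
make_redirects() {
    # Reads pairs from stdin, one per line; skips blank lines and # comments.
    while read -r old new; do
        case "$old" in
            ''|'#'*) continue ;;
        esac
        printf 'Redirect permanent %s %s\n' "$old" "$new"
    done
}

make_redirects <<'EOF'
# old location            new location
/articles/copying.html    http://www.example.org/blog/copying-files/
/articles/java.html       http://www.example.org/blog/learning-java/
EOF
```

The output can then be pasted into the old site's configuration or an .htaccess file.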

Thankfully, this should be fairly easy as it’s all XHTML content to begin with. The only difficulty will be the lists of books and comix. I’ll worry about those later.


Learning Java, Again

After a period of nearly 8 years, I’m coming back to Java. I looked at it in 1996 when I was in college. I bought Java 1.0 in a Nutshell. Times have definitely changed. I picked up Java 1.4 in a Nutshell and, yikes, it’s at least 3 times the size.

But anyway, the employer is directing me at Java, after 4 years of Perl. It’s time to learn about servlets, which thankfully appear to be quite similar to mod_perl. It’s not totally surprising given that they cover much of the same ground. But, as with everything in Java, it seems far more wordy, verbose and formal. I fully understand those who call it the COBOL of the future.

Of course, it’s not just Java to learn. It’s also the environment that goes with it. I’ve been using Unix for a long time and know my way around compilers, linkers, makefiles and so on. But now I have to contend with eclipse, netbeans, ant, tomcat and a host of other things. Lots to learn, but hopefully rewarding.


subversion crash

Normally I run subversion under apache as I like being able to get at my stuff from anywhere. But recently, some upgrade has broken it. I’ve now started seeing broken checkouts. This is most disconcerting. For now, I’ve switched to accessing it over the filesystem, which seems to work ok. But an update on that mailing list post…


I’ve managed to get a stack trace by switching to gdb 5.3 instead of the system default (6):

#0  0x0807a015 in core_output_filter ()
#1  0x285c3c0d in logio_out_filter () from /usr/local/libexec/apache2/
#2  0x0805d253 in chunk_filter ()
#3  0x080744c0 in ap_content_length_filter ()
#4  0x08061757 in ap_byterange_filter ()
#5  0x285d130e in expires_filter () from /usr/local/libexec/apache2/
#6  0x2820d35d in apr_brigade_write () from /usr/local/lib/apache2/
#7  0x2820d9f2 in apr_brigade_vprintf () from /usr/local/lib/apache2/
#8  0x289c42b7 in send_xml () from /usr/local/libexec/apache2/
#9  0x289c5049 in upd_change_xxx_prop () from /usr/local/libexec/apache2/
#10 0x289da9da in change_file_prop () from /usr/local/lib/
#11 0x289dacae in delta_proplists () from /usr/local/lib/
#12 0x289db73a in update_entry () from /usr/local/lib/
#13 0x289db19c in delta_dirs () from /usr/local/lib/
#14 0x289db872 in update_entry () from /usr/local/lib/
#15 0x289db19c in delta_dirs () from /usr/local/lib/
#16 0x289db872 in update_entry () from /usr/local/lib/
#17 0x289db19c in delta_dirs () from /usr/local/lib/
#18 0x289db872 in update_entry () from /usr/local/lib/
#19 0x289db19c in delta_dirs () from /usr/local/lib/
#20 0x289db872 in update_entry () from /usr/local/lib/
#21 0x289db19c in delta_dirs () from /usr/local/lib/
#22 0x289dc31e in svn_repos_finish_report () from /usr/local/lib/
#23 0x289c5e25 in dav_svn__update_report () from /usr/local/libexec/apache2/
#24 0x289c79b9 in dav_svn_deliver_report () from /usr/local/libexec/apache2/
#25 0x28615b37 in dav_method_report () from /usr/local/libexec/apache2/
#26 0x2861719d in dav_handler () from /usr/local/libexec/apache2/
#27 0x08065275 in ap_run_handler ()
#28 0x080656cb in ap_invoke_handler ()
#29 0x08062679 in ap_process_request ()
#30 0x0805d468 in ap_process_http_connection ()
#31 0x0806f3b5 in ap_run_process_connection ()
#32 0x0806357a in child_main ()
#33 0x08063778 in make_child ()
#34 0x08063880 in startup_children ()
#35 0x08064032 in ap_mpm_run ()
#36 0x0806a835 in main ()
#37 0x0805cef6 in _start ()

Now I need to start building debug versions of apache and subversion in order to start making some sense of that. It’s my suspicion that subversion is sending bad buckets to apache somehow.


Hello World

It’s time to start using this site for something useful. I’ve used wordpress at work for a while, and it seems like it could be useful here too. It’s a lot easier than the previous setup of cvs+xslt+make.

I’ve decided on wordpress-pg since I don’t have MySQL installed and didn’t understand it when I tried. PostgreSQL works well and feels a lot simpler to me. Plus it doesn’t involve threads, which is always a bonus. Disturbingly, the images in the tarball appeared corrupted; I’ll have to let them know. I’m still keeping the wordpress-pg code in subversion, but the data is safely stored in a database where it belongs.

What sort of content goes here? Well, it’s likely to be mostly technical since that’s a large part of me. But hopefully I’ll get some cycling in too.

Anyway, on with the writing…