My Sysadmin Toolbox

After seeing lots of these recently, I thought I’d try to come up with my own list. I used to be a sysadmin (I’m now a programmer), and I’ve long felt that you really need a good set of tools (and to know how to use them) in order to be most productive.

  • zsh

    I spend the vast majority of my time at a command line. Zsh ensures I make best use of my time. If you’ve used bash, you might think you know what completion is—press the tab key and it fills out file names for you. But zsh takes it to a whole new level. Not only does it complete file names, but also users, hostnames, option flags, environment variables, PIDs and more. On top of that, it does it in a context-sensitive manner. So if you type in “chown ” and press TAB, it starts completing usernames. Type in a space and another TAB and it starts completing file names again.

    On top of that, it allows partial completion. If I type in /u/l/e/r/ and press TAB, it gets expanded to /usr/local/etc/rc.d/. This is phenomenally useful.

    But it’s not just completion that zsh is good at. It’s also good at globbing. That’s turning wildcards into filenames. In addition to the usual forms of globbing, zsh can glob recursively. So if I want to look for “foobar” in all my files (but not directories), I can do:

      % grep foobar **/*(.)

    The “**/*” is the recursive glob, and the “(.)” limits it to files and not directories. You can also limit by user, by timestamp and a few other things.
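A few more qualifiers, as a rough sketch. These are zsh-only and will not work in bash; the user name, the 7-day window and the 100 KB size are arbitrary values picked for illustration.

```shell
# zsh only: glob qualifiers go in parentheses after the pattern.

# Regular files owned by root (any user name works between the colons).
grep foobar **/*(.u:root:)

# Regular files modified within the last 7 days.
grep foobar **/*(.m-7)

# Directories only.
ls -ld **/*(/)

# Regular files larger than 100 KB.
ls -l **/*(.Lk+100)
```

If a pattern matches nothing, zsh reports "no matches found" rather than running the command.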

    This only scratches the surface of zsh. Suffice it to say that if you make heavy use of the command line, investing some time in learning zsh will make you vastly more productive.

  • screen

    This was mentioned on a few of the other lists as well. GNU Screen on the face of it doesn’t do anything. You just end up with another command line when you first run it. But the beauty of it is that if you get disconnected, you can just log back in, run screen -d -r and pick up exactly where you left off. For me, this is ideal, given the flakiness of my home wireless network. But you might want to use it so you can shut down your PC at night and pick up where you left off in the morning.

    On top of that, screen lets you run multiple command lines at once inside it, log the output and cut’n’paste between them. Think of it as a safety harness for your work.

  • rsync

    Rsync is one of the closer things to magic that’s around. It’s a simple file copying utility. But the clever bit is that it only copies the things that have changed. This doesn’t sound like much until you’ve edited several files in a 200MB collection that needs to be on another box. When rsync tells you it’s finished and only transferred 10KB instead of 200MB, you’ll really come to appreciate it.

    If you’re still using tar/gzip or zip to create an archive to ship to another computer, stop wasting your time and disk space. Learn to use rsync and your life will be far more pleasant.
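A minimal sketch of a typical invocation, wrapped in a helper so it is easy to reuse. sync_tree is a made-up name, and --delete is my assumption about what "mirroring" should mean (it removes files at the destination that no longer exist at the source), so drop it if that is not what you want.

```shell
# sync_tree SRC DEST: mirror the contents of SRC into DEST,
# sending only what has changed. DEST may be local or host:path.
# Hypothetical helper for illustration.
#   -a  archive mode (recurse, keep permissions, times, symlinks)
#   -z  compress data in transit
#   --delete  remove destination files that are gone from SRC
sync_tree() {
  rsync -az --delete "$1"/ "$2"/
}

# Example (otherbox is a placeholder host name):
#   sync_tree ~/projects/site otherbox:/srv/site
```

Adding -v --stats to the rsync options prints what was sent and how much data actually went over the wire.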

  • OpenSSH

    Thankfully, ssh is pretty ubiquitous these days. It seems to have mostly succeeded in its mission to eliminate telnet. But it has a few tricks that are worth knowing about.

    First, the agent. One of the nice things about ssh is that it doesn’t have to rely on sending passwords around. Instead, you can use public key authentication. However, typing in your passphrase for every connection gets pretty tedious after a while. Enter the ssh-agent. Just stick eval `ssh-agent` in your startup scripts and then run ssh-add once. After that, you don’t get asked for your passphrase any more. The only caveat is that now you really need to lock your screen when you walk away from it.
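A slightly more defensive version of that startup line, as a sketch. The check on SSH_AUTH_SOCK is my addition, on the assumption that you don’t want a second agent when the session has already inherited one.

```shell
# Goes in a shell startup file such as ~/.zshrc.
# Start an agent only if this session has not inherited one already.
if [ -z "${SSH_AUTH_SOCK:-}" ]; then
  # ssh-agent -s prints Bourne-style commands that export
  # SSH_AUTH_SOCK and SSH_AGENT_PID; eval applies them to this shell.
  eval "$(ssh-agent -s)" >/dev/null
fi

# Then, once per login session, load the key (prompts for the passphrase):
#   ssh-add
```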

    Next are the tunnels. Ssh is able to create network “tunnels” in and out of otherwise secure locations. This is very handy for creating ad-hoc networks. For example, I’m allowed to ssh into my work, but not to anything else. Yet, I can use RDP to connect to my workstation by running this command:

      % ssh -L3389:myworkstation:3389
      % rdesktop localhost

    That says: listen on local port 3389, and forward any connections that come in to port 3389 on myworkstation, from the other side of the ssh connection.
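Spelled out in full, the tunnel might look something like this. gateway.example.com is a placeholder for whichever work host you are allowed to ssh into; it is not a name from my setup.

```shell
# gateway.example.com is a placeholder for the one host you may ssh into.
#   -L 3389:myworkstation:3389  listen locally on 3389, forward to
#                               myworkstation:3389 as seen from the gateway
#   -N  forward ports only, run no remote command
#   -f  drop into the background after authenticating
ssh -f -N -L 3389:myworkstation:3389 gateway.example.com

# The RDP client now talks to the local end of the tunnel.
rdesktop localhost
```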

    If you’re on Windows, check out PuTTY. It’s got all the features, but wrapped up in a nice GUI.

  • lsof / pfiles / sockstat

    Lsof (List Open Files) is one of the first diagnostic tools that I reach for when I need to understand something. The purpose is simple: it tells you what files (and network connections) a process has open. If you’re wondering where a process is logging to, this might be able to tell you. Conversely, it can also tell you which processes have a particular file open (usually a lock file).
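The usual three ways in, as a sketch. The PID, the lock file path and the port number are placeholders.

```shell
# 1234, /var/run/example.lock and port 80 are placeholders.

# Everything process 1234 has open (files, sockets, pipes).
lsof -p 1234

# Which processes have this particular file open.
lsof /var/run/example.lock

# Who is using TCP port 80; -n and -P skip DNS and port-name lookups.
lsof -nP -i TCP:80
```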

    On Solaris, the pfiles command is similar.

    On a related note, FreeBSD also has the very, very useful sockstat command, which lists all open sockets and what processes hold them open. The useful bit is that it does this without needing to be root (unlike Linux’s netstat -anp).

  • strace / truss / ktrace

    These are the second set of diagnostic tools that I reach for when something’s not right. Unix operating systems have a very clear distinction between userland and kernel, and these tools show all the points where a program crosses between the two (makes a system call). If you really want to know how a program is interacting with its environment, these tools will tell you. They’re good for answering questions like:

    • What files is this process opening and closing?
    • What connections to the network are being made?
    • What’s been read in by this program?
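Roughly how each of those questions maps onto a command, using Linux strace as the example. The PID is a placeholder, and truss and ktrace spell their options differently.

```shell
# 1234 is a placeholder PID; these are strace (Linux) options.

# Files being opened and closed (-f also follows child processes).
strace -f -e trace=open,openat,close -p 1234

# Network activity: socket, connect, accept, send, recv and friends.
strace -f -e trace=network -p 1234

# Data being read in; -s 200 shows up to 200 bytes of each buffer.
strace -e trace=read -s 200 -p 1234
```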

  • multitail

    A recent addition to my toolbox. It’s like tail -f, except that it looks at more than one file at once. It also does highlighting of search terms. Dead handy.

  • curl

    Most of the stuff I do these days involves the web. Curl is a fantastic little tool for inspecting the web from the command line. It covers all the protocols you need, and can dump out any information about the transaction. Want to issue a PUT request to an SSL server, verifying the certificate and specifying basic auth? It’s got you covered.
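That PUT request might look something like the sketch below. The host, path, credentials and file names are all placeholders.

```shell
# Host, path, credentials and file names are placeholders.
#   -v        show the request and response headers and the TLS handshake
#   --cacert  verify the server certificate against this CA file
#   -u        send HTTP basic auth credentials
#   -T        upload the file, which curl does with PUT over HTTP(S)
curl -v \
     --cacert ca.pem \
     -u alice:secret \
     -T payload.json \
     https://api.example.com/things/42
```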

  • vim

    Everybody needs a good editor. Vim isn’t the only choice, but it’s pretty likely to be available wherever you go. And once you’ve started learning how to use it properly, you won’t go back. In particular, I can’t live without ^P and ^N for doing completion inside the file you’re editing.

  • sudo

    If you’re still using su, then you need help. Sudo allows you to dole out root access on a much more granular level and you get proper logging of who did what. If you haven’t looked at the manual recently, then check out sudo -e for editing a file as another user. It ensures that you get your regular editor (vim or Emacs) instead of the incredibly unhelpful ed that root probably defaults to.

  • subversion

    Everybody needs version control. If you’re editing files, stick them in subversion. You won’t regret it. Particularly when you need to see what those files looked like 6 months ago.

  • mutt

    Every now and again, you need to deal with mailbox files. Mutt is a great choice for that, thanks to its mini language for filtering mail. Need to delete all mail over 10 days old sent by cron@somehost? Not a problem. Even if you don’t use it on a regular basis, it’s worth getting familiar with.

  • gdb

    Yes, this is a programmer’s tool. But it’s worth knowing a tiny amount about if you’re a sysadmin as well. What for? It lets you see why something dumped core. If you find a core file, then do file core to see what program left it behind and then gdb /path/to/program core. When you’re inside gdb, type in where and it will (most of the time) give you a stack trace, showing what it was doing at the time of the crash. This is normally a big help in trying to figure out what went wrong.

    You can also use gdb to find out what a running program is doing by specifying a PID instead of a corefile.
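The same steps can be run without an interactive session, which is handy in scripts. A sketch, with the program path, core file name and PID as placeholders:

```shell
# /path/to/program, core and 1234 are placeholders.

# Which program left the core file behind.
file core

# Print the stack trace from the core file and exit.
gdb -batch -ex where /path/to/program core

# Same for a running process: attach, print the stack, detach.
gdb -batch -ex where -p 1234
```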

  • perl / python / ruby

    If you perform the same series of actions more than a couple of times, you should consider investing some time in automating the process. Shell scripts are handy, but can only go so far. Learning one of these languages will give you a really powerful ability to write your own sysadmin toolbox.

  • mediawiki

    Documentation. Everybody hates doing it. Why not make it as easy as possible? A wiki is the answer to that, and mediawiki is one of the better pieces of wiki software out there. It’s pretty simple to get going (although it does depend on MySQL).

    Remember: getting it documented is the first priority. Once the information is in the wiki, it can be restructured later. So long as the information is there, it will be searchable (and hence useful).

Hmmm, that’s a bit more than the 10 they wanted. But it’s a large portion of my regular toolkit. Hopefully there’s something useful for other people in there as well…



Well, I’m both annoyed and happy at Apple.

Good news first: 10.4.6 fixes my problem with zsh. I would be curious to find out what the fix actually was…

On the bad news, my replacement mouse still hasn’t turned up. My wireless mouse stopped working in February. Apple promised they’d ship me a new one (after spending an hour on the phone to India). I watched the tracker on their web site. The blasted courier company failed to deliver to my workplace, claiming that they’d tried three times. I never saw them, nor did anybody else in my office. So I phoned Apple again, only spending half an hour on the phone to India this time. They again promised that I’d be shipped a new mouse. That was several weeks ago.

Seriously, Apple make some nice kit, but their phone support is atrocious.


zsh globbing

I love using zsh—it’s full of completely insane features that you wouldn’t even think of using. Until you come across a situation and think: That’s exactly what I need!

Today’s example is recursive globbing. How to pick out all the non-image files under the current directory? I needed to run dos2unix over them. You could come up with some evil find command, but in zsh it looks like this:

  % dos2unix **/*~*.(gif|png|jpg)(.)

  1. **/* picks out all files and directories, recursively.
  2. ~*.(gif|png|jpg) excludes images from that list (the ~ operator needs setopt extendedglob).
  3. (.) ensures that only files and not directories are chosen.

Easy, peasy. :-)

Update: Even better, case insensitive globbing!

  % dos2unix (#i)**/*~*.(gif|png|jpg)(.)
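
For comparison, the find version might look roughly like this. It is only a sketch: non_image_files is a made-up helper name, and -iname is a GNU/BSD find extension rather than POSIX.

```shell
# non_image_files [DIR]: print regular files under DIR (default: .)
# whose names do not end in .gif, .png or .jpg, ignoring case.
# Hypothetical helper for illustration.
non_image_files() {
  find "${1:-.}" -type f \
    ! -iname '*.gif' ! -iname '*.png' ! -iname '*.jpg'
}

# To convert line endings on everything it would list:
#   find . -type f ! -iname '*.gif' ! -iname '*.png' ! -iname '*.jpg' \
#        -exec dos2unix {} +
```

It works, but the zsh glob says the same thing in about a third of the typing.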

Terminal, zsh & [Process Completed]

This is mostly a “me too!” post, but it’s been bugging me. Every other time I open a new Terminal window (under OS X Tiger, anyway), I get a [Process Completed] message instead of my shell. According to other people, this happens more if you use zsh, and especially if you close the window using ⌘-W. Interestingly, the same problem occurs in iTerm.

Well, I spent some time with ktrace, zsh, bash and Terminal. Sadly, the results aren’t terribly informative (so far).

First of all, I traced both bash and zsh exiting when ⌘-W was pressed. Neither was particularly interesting. There were no cleanups that one performed that the other did not.

Next, I traced Terminal (and its child processes) when starting up zsh. Twice. The first time everything worked, the second time everything broke. Now the trace makes the point of breakage plainly clear. The broken one gets stuck in a loop reading EOF from file descriptor 17, whereas the working one does not.

Looking back through the trace, file descriptor 17 is opened to /dev/ptyp4. According to pty(4), that’s a master pseudo terminal. It’s opened inside the Terminal process itself, which then forks, dup2’s the master pty to fd 0 (stdin) and then exec’s /usr/libexec/pt_chown (pt_chown source in FreeBSD should be similar, judging by strings output).

We don’t see any output for pt_chown as it’s setuid root. But Terminal waits for it to finish, after it’s presumably corrected the ownership of the slave tty. Next, fd 18 is opened by Terminal as the slave tty4, Terminal forks and it’s then dup’d to stdin, stdout and stderr. fd 17 is then closed in that process and login is called.

At this point, Terminal’s got pty4 open, and login’s got tty4 and it’s basically a pipe between the two processes. Except that the kernel is making it look like a genuine 70’s era serial connection to the slave.

Anyway, it’s here that the two traces diverge (coming back to the original problem). The good session reads from the master pty, and then calls stat(2) on tty4. But it’s weird as the trace shows no return value for the read call. OTOH, the broken trace just shows a return value of EOF (0 bytes read) and loops around doing that for a few thousand times.

Funnily enough, the trace looks pretty much identical to that which would be produced by program 19.3 in Rich Stevens’s APUE (best book I ever bought).

Sadly, it still hasn’t gotten me to the bottom of the problem. But I have refreshed my memory about how pseudo-terminals work. 🙂 I suspect that to make further progress, I’d have to run login under ktrace as well. I imagine that something is going wrong in there causing it to exit early, which is why the read in the parent returns EOF. Even then, I’m not sure if running it all as root would affect the outcome. Probably is my guess…

And no, I will not stop using zsh!

Update: A useful tip for debugging what your shell is doing at startup under OS X. Add these lines to the top of ~/.zshrc (or ~/.bashrc if you’re a luddite 😉 ):

  lsof -p $$ > ~/lsof.out.$$
  ktrace -p $$ -f ~/ktrace.out.$$

This still doesn’t capture what’s going on in the parent process (login), but it does give you an idea of what the shell is getting up to.


OpenSSH & zsh misfeature

For a while, I’ve been using a little trick that I found on the zsh wiki (CompletionExamples) to automatically turn my known_hosts file into a set of host names. Unfortunately, the latest Ubuntu upgrade has turned on a new OpenSSH feature, HashKnownHosts (detailed in ssh_config(5)). This breaks the parsing because the hostnames are no longer stored in plain text in the known_hosts file.
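
To make the breakage concrete, here is a rough sketch of the sort of parsing involved. It is not the wiki’s completion code; known_hosts_names is a made-up helper, and it simply skips comments, @-marker lines and hashed entries (which start with |1|).

```shell
# known_hosts_names FILE: print the host names and addresses stored in
# plain text in a known_hosts file, one per line, sorted and de-duplicated.
# Hypothetical helper for illustration.
known_hosts_names() {
  # Field 1 is a comma-separated list of names. Hashed entries begin
  # with "|1|" and cannot be turned back into names, so they are skipped
  # along with comment (#) and marker (@) lines.
  awk '$1 !~ /^[|#@]/ && NF >= 3 {
    n = split($1, names, ",")
    for (i = 1; i <= n; i++) print names[i]
  }' "$1" | LC_ALL=C sort -u
}

# Example:
#   known_hosts_names ~/.ssh/known_hosts
```

With HashKnownHosts on, every line is of the |1| form, so a parser like this comes back with nothing to complete.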

The simple workaround, in my case, was to stick HashKnownHosts no into /etc/ssh/ssh_config. And now everything’s back to normal.

Update: As Aristotle points out below, this is definitely a trade-off of security vs convenience. Don’t do it if you’re not happy with the consequences.


zsh editor integration

A top tip from the zsh book:

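# Load the edit-command-line function that ships with zsh.
autoload -U edit-command-line
# Register it as a line editor (zle) widget.
zle -N edit-command-line
# Bind it to ESC e, i.e. M-e.
bindkey '\ee' edit-command-line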

Binds M-e to “stick this command line in your text editor.”


mutt completion in zsh

After a little while spent battling with zsh’s completion system today, I have made it complete my aliases file. My setup is slightly non-standard (although not that unusual, I think) in that the aliases aren’t defined in my ~/.muttrc file. Instead, they’re in ~/.mail_aliases. To get zsh to know about this requires a small function to go in your ~/.zshrc:

_email-mutt() {

I managed to get that by looking at the source for the _email_addresses completion function. I don’t know how I could have got by without it. The zsh book should turn up in a few days; maybe that will enlighten me more.
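
The parsing half of such a function can be sketched in plain shell. This is an illustration only, not the _email-mutt function above: mutt_alias_pairs is a made-up helper, and it assumes one address per alias line, written either bare or as Full Name <address>.

```shell
# mutt_alias_pairs FILE: print "nickname:address" for each mutt alias
# line in FILE. Hypothetical helper for illustration; group aliases with
# several comma-separated addresses are not handled.
mutt_alias_pairs() {
  awk '$1 == "alias" && NF >= 3 {
    addr = $NF               # the address is the last field
    gsub(/[<>]/, "", addr)   # strip the angle brackets, if any
    print $2 ":" addr
  }' "$1"
}

# Example:
#   mutt_alias_pairs ~/.mail_aliases
```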