Terminal, zsh & [Process Completed]

This is mostly a “me too!” post, but it’s been bugging me. Every other time I open a new Terminal window (under OS X Tiger, anyway), I get a [Process Completed] message instead of my shell. According to other people, this happens more if you use zsh, and especially if you close the window using ⌘-W. Interestingly, the same problem occurs in iTerm.

Well, I spent some time with ktrace, zsh, bash and Terminal. Sadly, the results aren’t terribly informative (so far).

First of all, I traced both bash and zsh exiting when ⌘-W was pressed. Neither was particularly interesting. There were no cleanups that one performed that the other did not.

Next, I traced Terminal (and its child processes) when starting up zsh. Twice. The first time everything worked; the second time everything broke. The traces make the point of breakage plain: the broken session gets stuck in a loop reading EOF from file descriptor 17, whereas the working one does not.
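Wading through thousands of identical lines by eye gets old fast, so a small awk filter over the kdump output could count the EOF reads instead. This is a sketch rather than a tested tool. It assumes kdump lines shaped like `1234 Terminal CALL read(0x11,...)` followed by `1234 Terminal RET read 0`, with syscall arguments printed in hex (so fd 17 shows up as `0x11`). The function name `count_eof_reads` is made up for this example; if your kdump output looks different, the field numbers will need adjusting.

```shell
# count_eof_reads FD_HEX: read kdump text on stdin and count read(2) calls
# on file descriptor FD_HEX (e.g. 0x11 for fd 17) that returned 0 (EOF).
# Hypothetical helper; assumes kdump-style lines such as:
#   1234 Terminal CALL  read(0x11,0xbfffe5b0,0x400)
#   1234 Terminal RET   read 0
count_eof_reads() {
    awk -v fd="$1" '
        # Remember that this pid has a read() outstanding on the fd of interest.
        $3 == "CALL" && index($4, "read(" fd ",") == 1 { pending[$1] = 1; next }
        # The matching RET: a value of exactly 0 means read() hit end-of-file.
        $3 == "RET" && $4 == "read" && ($1 in pending) {
            if ($5 == "0") eof++
            delete pending[$1]
        }
        END { print eof + 0 }
    '
}
```

Something like `kdump -f ktrace.out | count_eof_reads 0x11` should then give a far larger number for the broken session than for the working one.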

Looking back through the trace, file descriptor 17 is opened to /dev/ptyp4. According to pty(4), that’s a master pseudo-terminal. It’s opened inside the Terminal process itself, which then forks, dup2’s the master pty to fd 0 (stdin) and exec’s /usr/libexec/pt_chown (the pt_chown source in FreeBSD should be similar, judging by the output of strings).
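The old BSD-style ptys come in pairs whose names differ only in their prefix: the master /dev/ptyp4 goes with the slave /dev/ttyp4 (the tty4 mentioned below). To check which descriptors a process holds on either end, `lsof -p PID` output could be piped through helpers like these. Both function names are hypothetical, and the awk filter assumes lsof’s usual column layout (COMMAND PID USER FD ... NAME) with a command name that contains no spaces.

```shell
# slave_of MASTER: map a BSD-style master pty name to its slave,
# e.g. /dev/ptyp4 -> /dev/ttyp4. Only the "pty" prefix changes to "tty";
# the input is assumed to start with /dev/pty.
slave_of() {
    printf '/dev/tty%s\n' "${1#/dev/pty}"
}

# pty_fds: read `lsof -p PID` output on stdin and print the FD column
# (e.g. 17u) and device name for every BSD-style pty master or slave.
pty_fds() {
    awk '$NF ~ /^\/dev\/(pty|tty)[p-w][0-9a-f]$/ { print $4, $NF }'
}
```

For example, `lsof -p 212 | pty_fds`, with 212 standing in for Terminal’s PID, would list lines such as `17u /dev/ptyp4`.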

We don’t see any trace output for pt_chown because it’s setuid root. Terminal waits for it to finish, by which point it has presumably corrected the ownership of the slave tty. Next, Terminal opens the slave, tty4, as fd 18 and forks; the child dup’s fd 18 onto stdin, stdout and stderr, closes fd 17, and calls login.

At this point, Terminal has pty4 open and login has tty4, so it’s basically a pipe between the two processes, except that the kernel makes it look to the slave side like a genuine ’70s-era serial connection.

Anyway, it’s here that the two traces diverge (coming back to the original problem). The good session reads from the master pty and then calls stat(2) on tty4, although, oddly, the trace shows no return value for that read call. The broken trace, on the other hand, just shows the read returning EOF (0 bytes read), and it loops around doing that a few thousand times.
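Finding where the good and broken traces part company can be automated too. The sketch below (a hypothetical `first_divergence` helper) compares two kdump text dumps line by line after dropping the leading PID column, which differs between runs, and prints the number of the first line that differs. Buffer addresses or timestamps, if present, could also vary between runs and would need stripping as well; this version doesn’t attempt that.

```shell
# first_divergence A B: print the 1-based line number at which two kdump
# text dumps first differ, ignoring the PID column. Prints nothing if the
# files match; if B is a strict prefix of A, prints the first missing line.
first_divergence() {
    awk '
        { $1 = "" }                          # drop the PID column before comparing
        FILENAME == ARGV[1] { a[FNR] = $0; n = FNR; next }
        !((FNR in a) && a[FNR] == $0) { print FNR; found = 1; exit }
        { m = FNR }                          # lines of B matched so far
        END { if (!found && m < n) print m + 1 }
    ' "$1" "$2"
}
```

Usage would be along the lines of `kdump -f good.out > good.txt`, `kdump -f bad.out > bad.txt`, then `first_divergence good.txt bad.txt`.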

Funnily enough, the trace looks pretty much identical to the one that program 19.3 in Rich Stevens’ APUE (best book I ever bought) would produce.

Sadly, it still hasn’t gotten me to the bottom of the problem. But I have refreshed my memory about how pseudo-terminals work. 🙂 I suspect that to make further progress, I’d have to run login under ktrace as well. I imagine something is going wrong in there that causes it to exit early, which is why the read in the parent returns EOF. Even then, I’m not sure whether running it all as root would affect the outcome. My guess is that it probably would…

And no, I will not stop using zsh!

Update: A useful tip for debugging what your shell is doing at startup under OS X. Add these lines to the top of ~/.zshrc (or ~/.bashrc if you’re a luddite 😉):

  lsof -p $$ > ~/lsof.out.$$
  ktrace -p $$ -f ~/ktrace.out.$$

This still doesn’t capture what’s going on in the parent process (login), but it does give you an idea of what the shell is getting up to.
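Leaving those lines in ~/.zshrc permanently would trace every new shell. One way to keep them around but dormant is to guard them with a flag file. This is a sketch: the name ~/.trace-startup is made up, so pick whatever you like.

```shell
# Only trace when the (hypothetical) flag file ~/.trace-startup exists:
# `touch ~/.trace-startup` to enable, `rm ~/.trace-startup` to disable.
if [ -e ~/.trace-startup ]; then
    lsof -p $$ > ~/lsof.out.$$        # snapshot of this shell's open files
    ktrace -p $$ -f ~/ktrace.out.$$   # trace this shell's syscalls from here on
fi
```

The resulting trace can then be read back with `kdump -f ~/ktrace.out.PID`, where PID is the number in the file name.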

2 Comments to Terminal, zsh & [Process Completed]

  1. Daniel Jalkut says:

    I’m glad to see somebody else taking up the task of investigating this. It’s still on my “find more info someday” list because, unfortunately, my “Victory Over!” post turned out to be only 95% true. I still get the bug every once in a while, despite my mostly successful workaround!

    I’m looking forward to seeing more analysis if you get renewed inspiration for possible sources of the problem. It’s a tricky sucker, but boy will I be relieved the day it’s fixed for good!

  2. Dominic Mitchell says:

    I will take it up again, but like all these things, it’s just a question of finding the time. I’ve got so much other stuff I’m meant to be working on right now. 🙁

    Thanks for the kind words.