This is mostly a “me too!” post, but it’s been bugging me. Every other time I open a new Terminal window (under OS X Tiger, anyway), I get a [Process Completed]
message instead of my shell. According to other people, this happens more if you use zsh, and especially if you close the window using ⌘-W. Interestingly, the same problem occurs in iTerm.
- Victory over ‘[Process Completed]’! reports that it’s something to do with ttys. He also offers a workaround, but it’s just that, a workaround.
- Tigers and broken Terminals and xterms also reports that it’s pty related.
Well, I spent some time with ktrace, zsh, bash and Terminal. Sadly, the results aren’t terribly informative (so far).
First of all, I traced both bash and zsh exiting when ⌘-W was pressed. Neither was particularly interesting. There were no cleanups that one performed that the other did not.
Next, I traced Terminal (and its child processes) when starting up zsh. Twice. The first time everything worked, the second time everything broke. Now the trace makes the point of breakage plainly clear. The broken one gets stuck in a loop reading EOF from file descriptor 17, whereas the working one does not.
Looking back through the trace, file descriptor 17 is opened to /dev/ptyp4
. According to pty, that’s a master pseudo terminal. It’s opened inside the Terminal process itself, which then forks, dup2’s the master pty to fd 0 (stdin) and then exec’s /usr/libexec/pt_chown
(pt_chown source in FreeBSD should be similiar, judging by strings output).
We don’t see any output for pt_chown as it’s setuid root. But Terminal waits for it to finish, after it’s presumably corrected the ownership of the slave tty. Next, fd 18 is opened by Terminal as the slave tty4, Terminal forks and it’s then dup’d to stdin, stdout and stderr. fd 17 is then closed in that process and login is called.
At this point, Terminal’s got pty4 open, and login’s got tty4 and it’s basically a pipe between the two processes. Except that the kernel is making it look like a genuine 70’s era serial connection to the slave.
Anyway, it’s here that the two traces diverge (coming back to the original problem). The good session reads from the master pty, and then calls stat(2) on tty4. But it’s weird as the trace shows no return value for the read call. OTOH, the broken trace just shows a return value of EOF (0 bytes read) and loops around doing that for a few thousand times.
Funnily enough, the trace looks pretty much identical to that which would be produced by program 19.3 in Rich Steven’s APUE (best book I ever bought).
Sadly, it still hasn’t gotten me to the bottom of the problem. But I have refreshed my memory about how pseudo-terminals work. 🙂 I suspect that to make further progress, I’d have to run login under ktrace as well. I imagine that something is going wrong in there causing it to exit early, which is why the read in the parent returns EOF. Even then, I’m not sure if running it all as root would affect the outcome. Probably is my guess…
And no, I will not stop using zsh!
Update: A useful tip for debugging what your shell is doing at startup under osx. Add these lines to the top of ~/.zshrc
(or ~/.bashrc
if you’re a luddite 😉 ):
lsof -p $$ > ~/lsof.out.$$ ktrace -p $$ -f ~/ktrace.out.$$
This still doesn’t capture what’s going on in the parent process (login), but it does give you an idea of what the shell is getting up to.