FreeBSD Crash Dump Analysis

Unfortunately, I had a crash earlier on my FreeBSD server. It’s probably due to bad hardware, but I decided to take a look at the dumpfile anyway. The best resource for this is the FreeBSD handbook section on Debugging a Kernel Crash Dump with kgdb, which is invaluable.

Sadly, the dump didn’t tell me much:

Fatal trap 12: page fault while in kernel mode
fault virtual address   = 0x8
fault code              = supervisor write, page not present
instruction pointer     = 0x20:0xc04c8f79
stack pointer           = 0x28:0xd13ddb44
frame pointer           = 0x28:0xd13ddb54
code segment            = base 0x0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, def32 1, gran 1
processor eflags        = resume, IOPL = 0
current process         = 864 (squid)
trap number             = 12
panic: page fault

So at the time, squid was sitting around waiting for things to happen (the stack trace showed it inside poll()) and the machine blew up. Darn. Ah well, the standard response is to just make world and try again.
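When reading a trap message like this one, the fault virtual address is a useful first clue. A tiny address such as 0x8 is often consistent with a NULL pointer being dereferenced at a small structure offset, although it doesn’t rule out flaky hardware. Here is a rough Ruby sketch of that check. The parse_trap and near_null? helpers and the 4 KiB threshold are my own assumptions for illustration, not part of any FreeBSD tool:

```ruby
# Sketch: pull the key fields out of a FreeBSD "Fatal trap" message.
# Helper names and the threshold below are hypothetical, for illustration.

TRAP_TEXT = <<~'TRAP'
  Fatal trap 12: page fault while in kernel mode
  fault virtual address   = 0x8
  fault code              = supervisor write, page not present
  instruction pointer     = 0x20:0xc04c8f79
  current process         = 864 (squid)
TRAP

# Assumption: addresses below one 4 KiB page count as "NULL plus offset".
NULL_PAGE_LIMIT = 0x1000

# Turn "key = value" lines into a hash of stripped strings.
# Lines without "=" (such as the "Fatal trap" header) are skipped.
def parse_trap(text)
  text.each_line.with_object({}) do |line, fields|
    key, value = line.split("=", 2)
    next if value.nil?
    fields[key.strip] = value.strip
  end
end

# True when the faulting address is small enough to suggest a NULL
# pointer dereference at a small structure offset.
def near_null?(fields)
  fields.fetch("fault virtual address").hex < NULL_PAGE_LIMIT
end

fields = parse_trap(TRAP_TEXT)
puts "fault address: #{fields['fault virtual address']}"
puts "faulting PC:   #{fields['instruction pointer'].split(':').last}"
puts "looks like a NULL pointer plus offset" if near_null?(fields)
```

With a kernel built with debug symbols, feeding the faulting PC to kgdb (for example `list *0xc04c8f79`) should then show which source line was doing the write.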


Recursive Rails

I’ve been ploughing through the examples in the Rails book, at least until I got to the testing section and, in particular, the test_checkout method. There I started seeing core dumps. Inspecting the core dump with gdb showed that something was obviously recursing out of control.

I had a look in the test.log file and it showed that display_cart was being called continuously in a loop, with an empty set of options. Tracking down a nearby log message, I got looking at components.rb. It was obvious that component_logging was being called with an empty set of options. But at this point I was more or less out of my depth (I’ve only just started learning Ruby). So I turned to the Rails development site for help.

There, a search for render_component quickly led me to ticket#2695. Glad to see I’m not the only one having this problem. The ticket also revealed that it had been fixed in change#2829. I manually patched my installation to do the same thing and everything works fine now.

But what I find really odd is the fact that Ruby will core dump (SIGILL, illegal instruction) when it recurses too deeply. It’s a bit weird that it doesn’t protect against that. Perhaps it’s just my installation on FreeBSD? I’ll have to test Linux as well. Hmm, there it raises a SystemStackError instead, which is more reasonable. I’ll have to point this out on the freebsd-ports mailing list.
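For anyone wanting to check their own installation, here is a minimal sketch of the test I have in mind. The render_forever method is a made-up stand-in for the runaway render_component call, not the actual Rails code:

```ruby
# Sketch: unbounded recursion, similar in shape to a render call that
# keeps invoking itself. The method name is hypothetical.
def render_forever(depth = 0)
  render_forever(depth + 1)
end

result =
  begin
    render_forever
    "returned normally" # never reached: the recursion has no base case
  rescue SystemStackError => e
    # A well-behaved interpreter raises here instead of crashing.
    "caught #{e.class}: #{e.message}"
  end

puts result
```

If the interpreter is behaving, this prints a "caught SystemStackError" line. On a build with the problem I hit, I would expect the process to die with a signal before the rescue clause ever runs.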


subversion crash, redux

After much fiddling, upgrading, dumping and restoring, I’ve realised that I don’t have a clue what the problem in subversion is. What’s really irritating is that when I compile it with debugging, the problem goes away. So I’ve accepted that as a solution for the moment, disconcerting as it may be. At least I’ll be able to check out without crashing.


subversion crash

Normally I run subversion under apache, as I like being able to get at my stuff from anywhere. But recently, some upgrade has broken it, and I’ve started seeing broken checkouts. This is most disconcerting. For now, I’ve switched to accessing it over the filesystem, which seems to work OK. But an update on that mailing list post…


I’ve managed to get a stack trace. I switched to gdb 5.3 instead of the system default (6), and that got me this:

#0  0x0807a015 in core_output_filter ()
#1  0x285c3c0d in logio_out_filter () from /usr/local/libexec/apache2/
#2  0x0805d253 in chunk_filter ()
#3  0x080744c0 in ap_content_length_filter ()
#4  0x08061757 in ap_byterange_filter ()
#5  0x285d130e in expires_filter () from /usr/local/libexec/apache2/
#6  0x2820d35d in apr_brigade_write () from /usr/local/lib/apache2/
#7  0x2820d9f2 in apr_brigade_vprintf () from /usr/local/lib/apache2/
#8  0x289c42b7 in send_xml () from /usr/local/libexec/apache2/
#9  0x289c5049 in upd_change_xxx_prop () from /usr/local/libexec/apache2/
#10 0x289da9da in change_file_prop () from /usr/local/lib/
#11 0x289dacae in delta_proplists () from /usr/local/lib/
#12 0x289db73a in update_entry () from /usr/local/lib/
#13 0x289db19c in delta_dirs () from /usr/local/lib/
#14 0x289db872 in update_entry () from /usr/local/lib/
#15 0x289db19c in delta_dirs () from /usr/local/lib/
#16 0x289db872 in update_entry () from /usr/local/lib/
#17 0x289db19c in delta_dirs () from /usr/local/lib/
#18 0x289db872 in update_entry () from /usr/local/lib/
#19 0x289db19c in delta_dirs () from /usr/local/lib/
#20 0x289db872 in update_entry () from /usr/local/lib/
#21 0x289db19c in delta_dirs () from /usr/local/lib/
#22 0x289dc31e in svn_repos_finish_report () from /usr/local/lib/
#23 0x289c5e25 in dav_svn__update_report () from /usr/local/libexec/apache2/
#24 0x289c79b9 in dav_svn_deliver_report () from /usr/local/libexec/apache2/
#25 0x28615b37 in dav_method_report () from /usr/local/libexec/apache2/
#26 0x2861719d in dav_handler () from /usr/local/libexec/apache2/
#27 0x08065275 in ap_run_handler ()
#28 0x080656cb in ap_invoke_handler ()
#29 0x08062679 in ap_process_request ()
#30 0x0805d468 in ap_process_http_connection ()
#31 0x0806f3b5 in ap_run_process_connection ()
#32 0x0806357a in child_main ()
#33 0x08063778 in make_child ()
#34 0x08063880 in startup_children ()
#35 0x08064032 in ap_mpm_run ()
#36 0x0806a835 in main ()
#37 0x0805cef6 in _start ()

Now I need to start building debug versions of apache and subversion in order to make some sense of that. My suspicion is that subversion is sending bad buckets to apache somehow.
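Until the debug builds are ready, one way to get a feel for a long trace like this is to tally the frames by function. This is only a sketch: the regular expression assumes gdb’s "#N 0xADDR in func () from path" layout, parse_frames is a name I made up, and only a handful of frames from the trace above are included:

```ruby
# Sketch: count how often each function appears in a gdb backtrace.
# Paths are copied as they appear in the trace (gdb output was cut short).
BACKTRACE = <<~'BT'
  #8  0x289c42b7 in send_xml () from /usr/local/libexec/apache2/
  #12 0x289db73a in update_entry () from /usr/local/lib/
  #13 0x289db19c in delta_dirs () from /usr/local/lib/
  #14 0x289db872 in update_entry () from /usr/local/lib/
  #15 0x289db19c in delta_dirs () from /usr/local/lib/
  #27 0x08065275 in ap_run_handler ()
BT

# Assumed frame layout; the trailing "from <path>" part is optional.
FRAME = /\A#(?<index>\d+)\s+0x(?<addr>\h+) in (?<func>\w+) \(\)(?: from (?<lib>\S+))?/

# Parse each line into a small hash; lines that do not match are dropped.
def parse_frames(text)
  text.each_line.map do |line|
    m = FRAME.match(line)
    next unless m
    { index: m[:index].to_i, func: m[:func], lib: m[:lib] }
  end.compact
end

frames = parse_frames(BACKTRACE)

# Tally frames per function name.
counts = frames.each_with_object(Hash.new(0)) { |f, h| h[f[:func]] += 1 }

counts.each { |func, n| puts "#{func}: #{n}" if n > 1 }
```

On this sample it singles out update_entry and delta_dirs, the pair that keeps calling each other through the middle of the trace. Frames without a "from" path are the ones gdb attributes to the httpd binary itself.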