First Aid for Panic Dump Analysis on Solaris 10 / 11

1. Create a “safe” crash dump of the live system (no panic involved):
# savecore -L

2. Decompress it (this extracts unix.0 and vmcore.0 from vmdump.0):
# savecore -vf /var/crash/vmdump.0

3. Load the panic dump in the kernel debugger:
/var/crash# mdb -k /var/crash/unix.0 vmcore.0
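
Before starting mdb it can help to confirm that savecore really produced both files. The following is only a minimal sketch, assuming the /var/crash layout used above; the function name dump_pair_ready is made up for illustration and is not part of Solaris.

```shell
# Hypothetical helper: succeed only if unix.N and vmcore.N both exist
# and are non-empty in the given crash directory.
dump_pair_ready() {
    dir=$1
    n=$2
    [ -s "$dir/unix.$n" ] && [ -s "$dir/vmcore.$n" ]
}

# Example (on the Solaris host): only start the debugger when the
# pair is complete.
# if dump_pair_ready /var/crash 0; then
#     cd /var/crash && mdb -k unix.0 vmcore.0
# fi
```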

4. Analyse the dump.
List the processes that were running when the dump was taken:
> ::ps

Check the message buffer (normally seen with ‘dmesg’):
> ::msgbuf
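
The same dcmds can also be run without an interactive session by piping them into mdb on standard input. The wrapper below is a rough sketch, not a standard tool: run_dcmds and the MDB_CMD variable are names invented here, and the default dump paths are assumed to match the ones used above.

```shell
# Hypothetical wrapper: feed one dcmd per argument to mdb on stdin.
# MDB_CMD is an assumption of this sketch; it lets the debugger command
# line be replaced (for example by "cat") for a dry run.
run_dcmds() {
    for dcmd in "$@"; do
        printf '%s\n' "$dcmd"
    done | ${MDB_CMD:-mdb -k /var/crash/unix.0 /var/crash/vmcore.0}
}

# Example (on the Solaris host): collect a small report in one go.
# run_dcmds '::status' '::ps' '::msgbuf' > /var/tmp/dump-report.txt
```

Setting MDB_CMD=cat simply echoes the dcmds that would be sent, which is a cheap way to check the batch before pointing it at a real dump.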

5. On a TEST system, trigger a TRUE panic by corrupting the kernel variable
“rootdir”. This variable holds the memory address of the root directory’s
vnode, so the kernel takes a page fault the next time it follows the pointer.

!!!!!!!!DO THIS AT YOUR OWN RISK!!!!!!!

# sync
# echo "rootdir/W 0t0" | mdb -w -k /dev/ksyms /dev/kmem
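
Since this command deliberately crashes the machine, it may be worth wrapping it in a guard that refuses to run anywhere except on known test hosts. This is only a sketch under that assumption; is_test_host and the host names in the example are placeholders, not anything Solaris provides.

```shell
# Hypothetical guard: succeed only if the host name given as the first
# argument matches one of the remaining arguments (the allowed test hosts).
is_test_host() {
    host=$1
    shift
    for allowed in "$@"; do
        [ "$host" = "$allowed" ] && return 0
    done
    return 1
}

# Example (host names are placeholders):
# if is_test_host "$(uname -n)" testbox1 testbox2; then
#     sync
#     echo "rootdir/W 0t0" | mdb -w -k /dev/ksyms /dev/kmem
# fi
```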

6. After the reboot, decompress the new panic dump:
# savecore -vf /var/crash/vmdump.1

7. Load the panic dump in mdb:
/var/crash# mdb -k /var/crash/unix.1 vmcore.1

8. First check the obvious:
> ::ps

Get some general system information:
> $<utsname

Most importantly:
> ::status
debugging crash dump vmcore.1 (64-bit) from sol11-1
operating system: 5.11 11.0 (i86pc)
image uuid: 4a5723a9-8692-ebba-e648-a43cf58dc1eb
panic message: BAD TRAP: type=e (#pf Page fault) rp=ffffff0007b66a90 addr=ffffff0100000000
dump content: kernel pages only
WARNING: /dev/kmem written to, use ::kmemstatus to view log

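Here the cause of the panic can already be identified: a page fault. The
faulting address, ffffff0100000000, looks like the “rootdir” pointer with its
lower 32 bits zeroed, which is what the 4-byte /W write in step 5 would produce.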
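
When ::status output has been saved to a file, the trap type and faulting address can be pulled out of the “panic message” line with sed. This is a minimal sketch that assumes the BAD TRAP line format shown above; panic_summary is a made-up name, and other kinds of panic message will simply produce no output.

```shell
# Hypothetical helper: read ::status output on stdin and print the
# trap type and faulting address from a "panic message: BAD TRAP" line.
panic_summary() {
    sed -n 's/^panic message: BAD TRAP: type=\([0-9a-f]*\) .* addr=\([0-9a-f]*\).*$/type=\1 addr=\2/p'
}

# Example, using the panic message line from the ::status output above:
# printf '%s\n' 'panic message: BAD TRAP: type=e (#pf Page fault) rp=ffffff0007b66a90 addr=ffffff0100000000' | panic_summary
# prints: type=e addr=ffffff0100000000
```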

9. What was the OS doing at the time of the panic that might have provoked it?
> ::msgbuf
BAD TRAP: type=e (#pf Page fault) rp=ffffff0007b66a90 addr=ffffff0100000000

Now find the thread that was on CPU at the time of the panic (ffffff01b426e880 in this dump):

> ::cpuinfo -v
4 fffffffffbc48d30  1b    0    0   1   no    no t-110  ffffff01b426e880 bash
“bash” doesn’t say much here, but then the memory corruption is not attributable
to a specific process.
At least we can identify that, at the time of the panic, this thread was running on CPU ID 4. See ‘psrinfo’ for more information on that CPU.

A shortcut to this information is:
> ::panicinfo
cpu                4
thread ffffff01b426e880
message BAD TRAP: type=e (#pf Page fault) rp=ffffff0007b66a90 addr=ffffff0100000000
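
If the ::panicinfo output is captured to a file, the thread address can be extracted for use in later commands. A small sketch, assuming the “thread <address>” line format shown above; panic_thread_addr is a name invented for this example.

```shell
# Hypothetical helper: read ::panicinfo output on stdin and print the
# address from the first line whose first field is "thread".
panic_thread_addr() {
    awk '$1 == "thread" { print $2; exit }'
}

# Example, using the ::panicinfo lines shown above:
# printf '%s\n' 'cpu                4' 'thread ffffff01b426e880' | panic_thread_addr
# prints: ffffff01b426e880
```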

Other information that might prove of interest, but will probably only work
on a “real” panic:

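10. Get the hexadecimal memory address of the panic string:

> panicstr/X
panicstr:       fb958850

Note that /X prints only 32 bits, so on a 64-bit kernel the pointer shown
here is truncated. Use panicstr/J to see the full 64-bit value.

11. Print the data at that address as a string (“/s”):
> fb958850/s

On a 64-bit kernel, use the full address from /J instead, or let mdb
dereference the pointer directly with *panicstr/s.

12. Get the time of the panic:
> time/Y
time:           2013 Jun 12 23:24:14

13. Review the stack trace:
> $C
> ::stack

14. Examine the panic thread:
> panic_thread/X
panic_thread:   b426e880
> b426e880$<thread

As with panicstr, /X shows only the low 32 bits here; the full address,
ffffff01b426e880, is the one reported by ::panicinfo above.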
