% CVSId: $Id: kernel_oopsing.tex,v 1.4 2003/03/03 08:23:10 mulix Exp $
\documentclass[final, total, pdf, colorBG, slideColor, azure]{prosper}

\title{Linux Kernel Debugging}
\subtitle{Your kernel just oopsed - What do you do, hotshot?}
\author{Muli Ben-Yehuda}
\email{mulix@mulix.org}
\institution{IBM Haifa Research Lab}

\slideCaption{Kernel Debugging, IBM HRL LKDSG 2003}

\begin{document}

\maketitle

\begin{slide}{Kernel Debugging - Why?}

\begin{itemize}

\item Why would we want to debug the kernel? After all, it's the one
part of the system that we never have to worry about, because it
always works.
\item Well, no. 

\end{itemize}

\end{slide}

\begin{slide}{Kernel Debugging - Why? (cont)}

\begin{itemize}

\item Because a driver is not working as well as it should, or is not
working at all. 

\item Because we have a school or work project. 

\item Because the kernel is crashing, and we don't know why. 

\item Because we want to learn how the kernel works. 

\item Because it's fun! Real men hack kernels ;-) 

\end{itemize}

\end{slide}

\begin{slide}{Broad Overview of the Kernel}

\begin{itemize}

\item Over a million lines of code. 

\begin{itemize}
\item Documentation/
\item drivers/
\item kernel/
\item arch/
\item fs/
\item lib/
\item mm/
\item net/
\item Others: security/ include/ sound/ init/ usr/ crypto/ ipc/
\end{itemize}

\end{itemize}

\end{slide}

\begin{slide}{Broad Overview of the Kernel (cont)}

\begin{itemize}

\item Supports runtime loading and unloading of additional code
(kernel modules). 
\item Configured using Kconfig, a domain-specific configuration
language. 
\item Built using kbuild, a collection of complex Makefiles. 
\item Heavily dependent on gcc and gccisms. Does {\em not} use or link
with user space libraries, although it supplies its own versions of
many of their functions - sprintf, memcpy, strlen, printk (not printf!). 

\end{itemize} 

\end{slide}

\begin{slide}{Read the Source, Luke}

\begin{itemize}

\item The source is there - use it to figure out what's going on. 

\item Linux kernel developers frown upon binary only modules, because
they don't have the source and thus cannot debug them. 

\item Later kernels include facilities to mark when a binary only
module has been loaded (``tainted kernels''). Kernel developers will
kindly refuse to help debug a problem when a kernel has been tainted. 

\end{itemize}

\end{slide} 

\begin{slide}{Read the Source, Luke (cont)}

Use the right tools for the job. Tools to navigate the source include:

\begin{itemize}

\item lxr - 
\href{http://www.iglu.org.il/lxr/}{http://www.iglu.org.il/lxr/}

\item find and grep
\item ctags, etags, gtags and their ilk. 

\end{itemize} 

Use a good IDE

\begin{itemize}

\item emacs
\item vi
\item One brave soul I heard about used MS Visual Studio!

\end{itemize} 

\end{slide}

\begin{slide}{Use the Source}

The two oldest and most useful debugging aids are

\begin{itemize}

\item Your brain. 
\item printf. 

\end{itemize} 

Use them! The kernel gives you printk, which 

\begin{itemize}

\item Can be called from interrupt context.
\item Behaves mostly like printf, except that it doesn't support
floating point. 

\end{itemize}

\end{slide}

\begin{slide}{Use the Source (cont)}

Use something like this snippet to turn printks on and off depending
on whether you're building a debug or release build. 

\begin{tiny}
\begin{verbatim}

#ifdef DEBUG_FOO 

#define CDBG(msg, args...) do {                             \
        printk(KERN_DEBUG "[%s] " msg , __func__ , ##args );\
} while (0)

#else /* !defined(DEBUG_FOO) */ 

#define CDBG(msg, args...) do {} while (0)

#endif /* !defined(DEBUG_FOO) */ 

\end{verbatim}
\end{tiny}

\end{slide}
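
\begin{slide}{Use the Source (cont)}

A minimal user-space sketch of how the macro might be used. It assumes
printf in place of printk (which only exists inside the kernel), and
foo\_open is a hypothetical driver function.

\begin{tiny}
\begin{verbatim}
#include <stdio.h>

#define DEBUG_FOO 1     /* remove this line to compile the messages away */

#ifdef DEBUG_FOO
/* printf stands in for printk; KERN_DEBUG is dropped for the same reason */
#define CDBG(msg, args...) do {                             \
        printf("[%s] " msg , __func__ , ##args );           \
} while (0)
#else /* !defined(DEBUG_FOO) */
#define CDBG(msg, args...) do {} while (0)
#endif /* !defined(DEBUG_FOO) */

/* hypothetical driver entry point: pretend only minors 0-3 exist */
static int foo_open(int minor)
{
        CDBG("opening minor %d\n", minor);
        return minor < 4 ? 0 : -1;
}

int main(void)
{
        CDBG("starting\n");     /* no arguments: ##args drops the comma */
        return foo_open(2);
}
\end{verbatim}
\end{tiny}

With DEBUG\_FOO undefined the arguments are never evaluated, so keep
side effects out of CDBG calls.

\end{slide}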

\begin{slide}{Use the Source (cont)}

\begin{itemize}

\item For really tough bugs, write code to solve bugs. Don't be afraid to
insert new kernel modules to monitor or affect your primary 
development focus. 

\item {\bf Code defensively}. Whenever you suspect memory overwrites or
use after free, use memory poisoning. 

\item Enable all of the kernel debug options - they will find your
bugs for you!

\item
\begin{tiny}
\begin{verbatim}
#define assert(x) do { if (!(x)) BUG(); } while (0)
\end{verbatim}
\end{tiny}

\item Linux 2.5 has BUG\_ON(). 

\end{itemize}

\end{slide}
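
\begin{slide}{Use the Source (cont)}

A user-space sketch of the idea behind memory poisoning, not kernel
code. The static object, foo\_free and poison\_intact are hypothetical
names, and 0x6b is chosen here only as a recognizable fill pattern.

\begin{tiny}
\begin{verbatim}
#include <stdio.h>
#include <string.h>

#define POISON_FREE 0x6b                /* assumed fill pattern */

static unsigned char obj[16];           /* stands in for a freed object */

/* "Free" the object by filling it with the poison pattern. */
static void foo_free(void)
{
        memset(obj, POISON_FREE, sizeof(obj));
}

/* Return 1 if the freed object still holds only poison bytes. */
static int poison_intact(void)
{
        size_t i;

        for (i = 0; i < sizeof(obj); i++)
                if (obj[i] != POISON_FREE)
                        return 0;
        return 1;
}

int main(void)
{
        foo_free();
        printf("after free: %s\n", poison_intact() ? "intact" : "corrupted");
        obj[3] = 0;                     /* bug: write after free */
        printf("after stale write: %s\n",
               poison_intact() ? "intact" : "corrupted");
        return 0;
}
\end{verbatim}
\end{tiny}

Checking the pattern when the object is next handed out turns a silent
overwrite into a failure you can see.

\end{slide}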

\begin{slide}{Kernel Debuggers}

Linux has several kernel debuggers, none of which are in the main tree
(for the time being). The two most common are

\begin{itemize}

\item kdb -
\href{http://oss.sgi.com/projects/kdb}{http://oss.sgi.com/projects/kdb}

\item kgdb -
\href{http://kgdb.sourceforge.net/}{http://kgdb.sourceforge.net/}

\end{itemize}

\end{slide}

\begin{slide}{KGDB}

\begin{itemize}

\item Requires two machines, a slave and a master. 

\item gdb runs on the master, controlling a gdb stub in the slave
kernel via the serial port.

\item When an OOPS or a panic occurs, you drop into the debugger. 

\item Very useful when the kernel crashes in an interrupt handler and
no oops data makes it to disk - you drop into the debugger with the
correct backtrace. 

\end{itemize}

\end{slide}

\begin{slide}{ksymoops}

\begin{itemize}

\item Read Documentation/oops-tracing.txt

\item Install ksymoops, available from
\href{ftp://ftp.il.kernel.org}{ftp://ftp.il.kernel.org} 

\item Run it on the oops (get it from the logs, serial console, or
copy from the screen). 

\item ksymoops gives you a human readable back trace. 

\item Sometimes the oops data can be trusted (``easy'' bugs like a NULL
pointer dereference) and sometimes it's no more than a general hint to
what is going wrong (memory corruption that overwrites EIP).

\end{itemize}

\end{slide}

\begin{slide}{ksymoops (cont)}

\begin{itemize} 

\item Linux 2.5 includes an ``in kernel'' oops tracer, called
kksymoops. Don't forget to enable it when compiling your new 2.5
kernel!

\item It can be found under Kernel Hacking $\rightarrow$ Load all
symbols for debugging/kksymoops (CONFIG\_KALLSYMS). 

\end{itemize} 

\end{slide}

\begin{slide}{ksymoops (cont)}


\begin{tiny}
\begin{verbatim}
Unable to handle kernel NULL pointer dereference at virtual address 00000000
 printing eip:
c014a9cc
*pde = 00000000
Oops: 0002
CPU:    0
EIP:    0060:[<c014a9cc>]    Not tainted
EFLAGS: 00010202
EIP is at sys_open+0x2c/0x90
eax: 00000001   ebx: 00000001   ecx: ffffffff   edx: 00000000
esi: bffffaec   edi: ce07e000   ebp: cdbcffbc   esp: cdbcffb0
ds: 007b   es: 007b   ss: 0068
Process cat (pid: 862, threadinfo=cdbce000 task=cdcf7380)
Stack: bffffaec 40013020 bffff9b4 cdbce000 c010adc7 bffffaec 00008000 00000000 
       40013020 bffff9b4 bffff868 00000005 0000007b 0000007b 00000005 420dabd4 
       00000073 00000246 bffff848 0000007b 
Call Trace:
 [<c010adc7>] syscall_call+0x7/0xb

Code: 89 1d 00 00 00 00 e8 59 fc ff ff 89 c6 85 f6 78 2f 8b 4d 10 
\end{verbatim}
\end{tiny}

\end{slide}
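
\begin{slide}{ksymoops (cont)}

The number after ``Oops:'' is the i386 page fault error code. A sketch
of decoding its low three bits follows; the macro and function names
are this example's own, not the kernel's.

\begin{tiny}
\begin{verbatim}
#include <stdio.h>

/* Low three bits of the i386 page fault error code. */
#define PF_PROT  0x1    /* 0: page not present, 1: protection fault */
#define PF_WRITE 0x2    /* 0: read access,      1: write access     */
#define PF_USER  0x4    /* 0: kernel mode,      1: user mode        */

/* Write a human readable description of the error code into buf. */
static void decode_oops_code(unsigned int code, char *buf, size_t len)
{
        snprintf(buf, len, "%s, %s, %s mode",
                 (code & PF_PROT)  ? "protection fault" : "page not present",
                 (code & PF_WRITE) ? "write" : "read",
                 (code & PF_USER)  ? "user" : "kernel");
}

int main(void)
{
        char buf[64];

        decode_oops_code(0x0002, buf, sizeof(buf));
        printf("Oops: 0002 -> %s\n", buf);
        return 0;
}
\end{verbatim}
\end{tiny}

For the oops on the previous slide this reports a kernel mode write to
a page that is not present, which agrees with the first bytes of the
Code line (89 1d 00 00 00 00, a store of ebx to address 0).

\end{slide}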

\begin{slide}{LKCD}

\begin{itemize}

\item LKCD - Linux Kernel Crash Dump 

\item \href{http://lkcd.sf.net}{http://lkcd.sf.net}

\item Saves a dump of the system's state at the moment the dump is
triggered. 

\item A dump occurs when the kernel panics or oopses, or when
requested by the administrator.

\item Must be configured before the crash occurs! 

\end{itemize}

\end{slide} 

\begin{slide}{Making sense of kernel data}

\begin{itemize}

\item System.map - kernel symbol addresses (functions and variables)
\item /proc/kcore - image of system memory
\item vmlinux - the uncompressed kernel, can be disassembled using
objdump(1).

\end{itemize}

\end{slide}
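
\begin{slide}{Making sense of kernel data (cont)}

A sketch of what a symbol lookup against System.map amounts to: find
the last symbol at or below the address and print the offset. The
table is a stand-in, not a real System.map: the sys\_open and
syscall\_call addresses are derived from the oops shown earlier, and
the other two entries are made up.

\begin{tiny}
\begin{verbatim}
#include <stdio.h>

struct ksym {
        unsigned long addr;
        const char *name;
};

/* Sorted by address, like System.map. */
static const struct ksym map[] = {
        { 0xc010adc0UL, "syscall_call" },       /* c010adc7 - 0x7  */
        { 0xc010adcbUL, "hypothetical_next" },  /* made-up entry   */
        { 0xc014a9a0UL, "sys_open" },           /* c014a9cc - 0x2c */
        { 0xc014aa30UL, "hypothetical_end" },   /* made-up entry   */
};

/* Return the last symbol whose address is <= addr, or NULL if none. */
static const struct ksym *lookup(unsigned long addr)
{
        const struct ksym *best = NULL;
        size_t i;

        for (i = 0; i < sizeof(map) / sizeof(map[0]); i++)
                if (map[i].addr <= addr)
                        best = &map[i];
        return best;
}

int main(void)
{
        unsigned long eip = 0xc014a9ccUL;       /* EIP from the oops */
        const struct ksym *s = lookup(eip);

        if (s)
                printf("%s+0x%lx\n", s->name, eip - s->addr);
        return 0;
}
\end{verbatim}
\end{tiny}

\end{slide}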

\begin{slide}{User Mode Linux}

\begin{itemize}

\item For some kinds of kernel development (architecture independent,
file systems, memory management), using UML is a life saver. 

\item Allows you to run the Linux kernel in user space, and debug it
with gdb. 

\item Work is underway to make valgrind work on UML, which is
expected to find many bugs. 

\end{itemize}

\end{slide}

\begin{slide}{Magic SysRq}

\begin{itemize} 

\item More info at Documentation/sysrq.txt. 

\item A ``magical'' key combo you can hit which the kernel will respond
to regardless of whatever else it is doing, unless it is completely
locked up.

\item Build with CONFIG\_MAGIC\_SYSRQ, then enable at runtime with
{\tt echo 1 > /proc/sys/kernel/sysrq}

\item On x86, press {\tt ALT-SysRq-<command key>}. The SysRq key is also
known as the ``Print Screen'' key.

\end{itemize} 

\end{slide}

\begin{slide}{Magic SysRq (cont)}

\begin{itemize}

\item `b' - Will immediately reboot the system without syncing or
unmounting your disks.

\item `o' - Will shut your system off (if configured and supported).

\item `s' - Will attempt to sync all mounted filesystems.

\item `p' - Will dump the current registers and flags to your console.

\item `t' - Will dump a list of current tasks and their information to
your console.

\item `m' - Will dump current memory info to your console.

\item `h' - The most important key - will display help ;-) 

\end{itemize} 

\end{slide} 

\begin{slide}{Happy Hacking!}

\center{Questions? Comments?}
\center{Happy Oopsing!}

\end{slide}

\end{document}

%%% Local Variables: 
%%% mode: latex
%%% TeX-master: t
%%% End: 
