OKL4 Debugging Guide

Common Build Errors

Problems with Python Version

Errors such as:

> An error occurred: sort() takes no keyword arguments

or

> An error occurred: unpack requires a string argument of length 36

are almost always the result of an incompatible Python version. Version 2.4 is the recommended and supported version. Check your version with:

$ python -V

Build Command Case Sensitivity

Boolean arguments are case sensitive. In Python, any non-empty string evaluates as true, so only the exact string "False" is treated as false; a value such as "false" counts as true. Ensure you pass either "True" or "False" for parameters that require truth values (e.g. debug=False).
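The pitfall can be illustrated with a short Python sketch. The parse_flag helper is hypothetical and is not taken from the OKL4 build system; it only mimics the behaviour described above, where the exact string "False" is the only value treated as false. The sketch assumes a current Python 3 interpreter rather than the Python 2.4 used for builds.

```python
def parse_flag(value):
    """Hypothetical helper: only the exact string "False" counts as false.

    This mirrors the behaviour described in the text; the real build
    system's option parsing may differ.
    """
    return value != "False"


# Any non-empty string is truthy in Python, including "False" itself,
# which is why an option parser has to compare against the literal string.
print(bool("False"))        # True
print(parse_flag("False"))  # False
print(parse_flag("false"))  # True: wrong case, so the option stays enabled
```

A misspelt debug=false therefore leaves the option switched on without any warning.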

Incorrect Toolchain Configuration

Errors such as:

> sh: i686-unknown-linux-gnu-gcc: command not found

indicate that your toolchain cannot be found. If using one of the standard toolchains, ensure the bin directory of the toolchain is in your PATH. For example, if using the NICTA IA32 toolchain, ensure /opt/nicta/gcc-3.3.4-glibc-2.3.3/i686-unknown-linux-gnu/bin is in your PATH.
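A quick way to confirm that the compiler is visible before starting a build is sketched below. The missing_tools helper is a hypothetical name; it relies only on shutil.which from the Python 3 standard library (not available in Python 2.4), and the compiler name is the one from the error message above.

```python
import shutil


def missing_tools(tools, path=None):
    """Return the entries of `tools` that cannot be found on `path`.

    `path` is a PATH-style string; None means the PATH of the current
    environment.
    """
    return [tool for tool in tools if shutil.which(tool, path=path) is None]


if __name__ == "__main__":
    # Compiler name taken from the error message above.
    for tool in missing_tools(["i686-unknown-linux-gnu-gcc"]):
        print("not found in PATH:", tool)
```

If the compiler is reported as missing, add the toolchain's bin directory to PATH and run the check again.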

No output from serial port

When running OKL4 on hardware, all debug output is sent over the serial port. If you can't see any output in Minicom (or PuTTY, etc.), the most common cause is incorrectly configured serial parameters. The correct parameters are:

  • Baud rate (Speed): 115200
  • Data bits: 8
  • Parity bit: none
  • Stop bits: 1
  • Flow control: none

Build time options to assist with debugging

  • Append debug_trace=5 to your build command to enable the highest level of debug output. Other levels (from 1 to 5) can also be specified. The default is 0, which provides no debug output.

Run-time Debugging with KDB

The OKL4 Kernel Debugger (KDB) is the most efficient method of run-time debugging.

Accessing or Entering KDB

  • To access KDB while simulating or running on hardware, press [Esc].

  • To enter KDB as soon as the kernel boots, append enter_kdb=True to your build line.

  • To enter KDB programmatically:
    • In userland, #include <l4/kdebug.h> and call L4_KDB_Enter("some text here")

    • In the kernel, #include <debug.h> and call enter_kdebug("some text here")

Using KDB

KDB provides a command-driven menu interface. To display the current menu of commands, key [?]. Each entry in the listing shows the key that invokes the command, followed by a short description. For example:

> ?
  BS - back up to previous menu
  ?  - this help message
 ESC - back to previous menu
  a  - architecture specifics
  c  - KDB configuration
 SPC - show current user exception frame
  F  - show exception frame
  p  - dump page table
  g  - continue execution
  L  - list all capability lists
  S  - list all address spaces
  d  - dump memory
  P  - dump physical memory
  D  - dump memory in other space
  6  - Reset system
  l  - dump clist contents
  G  - show system sync-point dependency graph
  m  - show created mutexes
  q  - show scheduling queue
  s  - show space info
  t  - show thread control block
  T  - shows thread control block (extended)
  #  - statistics
  b  - tracebuffer menu
  r  - enable/disable/list tracepoints

Some commands (such as b - tracebuffer menu) open sub-menus. Inside a sub-menu, key [?] again to view the available commands. To exit a sub-menu, key [Esc], [Backspace], or [CmdUp].

To exit KDB and continue execution of the system, key [g]. To perform a system reset, key [6].

Menu commands do not require [Enter]; they are executed as soon as the key is pressed. Some commands, however, request further input (for example, a thread or address space id). When input is requested, a default response is usually displayed in square brackets: either key in the required information followed by [Enter], or press [Enter] immediately to accept the default.

Common KDB Operations

Common KDB operations that assist with debugging are as follows:

Display the scheduling queue

Key [q] to display a listing of all threads in the system, along with their priority, thread id, and current status. For example:

> q
[255]: (roottask)
[240]: (vtimer) (vrtc)
[200]: (event)
[110]: (vserial)
[ 98]: <posix_ex>
idle : idle_thread

The left-most column is the thread priority. Each row displays the thread identifier of all threads that exist at that priority. The thread identifier is wrapped in markup which signifies the thread status according to the following legend:

  • (some_thread): thread with id some_thread is blocked (usually on an IPC or mutex)
  • <some_thread>: thread with id some_thread is executing on the CPU
  • {some_thread}: thread with id some_thread is halted
  • !some_thread!: thread with id some_thread is aborted
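When the queue listing is captured from a serial log, the legend can be applied mechanically. The following is a minimal Python 3 sketch, not part of OKL4: parse_queue is a hypothetical helper, and names that carry no markup (such as idle_thread in the sample) are labelled "unmarked" because the legend does not cover them.

```python
import re

# Opening character of the markup mapped to the status from the legend.
STATUS = {"(": "blocked", "<": "executing", "{": "halted", "!": "aborted"}
ENTRY = re.compile(r"\([^)]+\)|<[^>]+>|\{[^}]+\}|![^!]+!|\S+")


def parse_queue(text):
    """Return (priority, thread, status) tuples from KDB `q` output."""
    threads = []
    for line in text.splitlines():
        label, sep, rest = line.partition(":")
        if not sep:
            continue  # not a queue row (for example the "> q" line)
        priority = label.strip(" []")
        for entry in ENTRY.findall(rest):
            status = STATUS.get(entry[0])
            name = entry[1:-1] if status else entry
            threads.append((priority, name, status or "unmarked"))
    return threads


SAMPLE = """\
> q
[255]: (roottask)
[240]: (vtimer) (vrtc)
[200]: (event)
[110]: (vserial)
[ 98]: <posix_ex>
idle : idle_thread
"""

for priority, name, status in parse_queue(SAMPLE):
    print(priority, name, status)
```

For the sample listing this reports posix_ex as the only executing thread and every other marked thread as blocked.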

Inspect the state of a thread in more detail

To view the thread control block (TCB) of a thread, key [t]. You will be prompted for the thread name, id, or TCB address of the thread that you wish to inspect. The thread name is usually easiest to input, as it can be copied directly from the scheduling queue output.

Debugging Unhandled Page Faults

To debug unhandled page faults, build with a debug trace level of at least 1 (i.e. ensure your build line contains debug_trace=1 or greater). With debug_trace=0 (the default), the faulting thread is simply killed and you are not alerted.

When an unhandled page fault occurs, the following output will be displayed:

Unhandled page fault:
  addr=0x4/??? priv=R
  ip=0x80500054, sp=80063f6c
  pd=0x8002ca80 thread=0x80000005

This indicates the following:

  • addr: the address that the thread is attempting to access

  • priv: the privileges with which the thread is attempting to access the address: Read (R), Write (W) or Execute (X)

  • ip: the address of the instruction at which the access occurred

  • sp: the address of the current stack pointer

  • pd: the protection domain in which the thread is executing

  • thread: the identifier of the thread

The faulting address gives the first hint at the cause of the unhandled page fault: a very low address (such as 0x6 or 0x18) generally indicates that the thread holds a null pointer to a struct and has attempted to access one of its fields.
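The report is regular enough to be picked apart by a script, for example when scanning a long serial log. The sketch below is a minimal Python 3 illustration, not part of OKL4. All function names are hypothetical, and the 0x1000 threshold for a "very low" address is an arbitrary assumption rather than an OKL4 constant.

```python
import re

FIELD = re.compile(r"(\w+)=([^\s,]+)")


def parse_page_fault(report):
    """Collect the key=value fields of an unhandled page fault report."""
    return dict(FIELD.findall(report))


def fault_address(fields):
    """Return the faulting address as an integer.

    The addr field looks like "0x4/???" in the sample; only the part
    before the slash is used here.
    """
    return int(fields["addr"].split("/")[0], 16)


def looks_like_null_dereference(fields, limit=0x1000):
    """Heuristic: a very low address suggests a null struct pointer.

    The default limit is an assumption made for this sketch.
    """
    return fault_address(fields) < limit


SAMPLE = """\
Unhandled page fault:
  addr=0x4/??? priv=R
  ip=0x80500054, sp=80063f6c
  pd=0x8002ca80 thread=0x80000005
"""

fields = parse_page_fault(SAMPLE)
print(fields["ip"], looks_like_null_dereference(fields))
```

For the sample report the heuristic flags the fault as a likely null pointer access, since the address is 0x4.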

To determine the instruction that caused the page fault, run objdump on the output image ELF file, located at build/images/image.elf:

$ objdump -dl ./build/images/image.elf | less

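and search the listing for the instruction address (the ip value in the output above). objdump prints addresses without the 0x prefix, so search for 80500054 rather than 0x80500054.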

Note that the objdump binary must come from the same toolchain as the one used for the build. For example, if building for gumstix, use:

$ arm-linux-objdump -dl ./build/images/image.elf | less
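Searching the listing can also be scripted once the objdump output has been saved to a file or captured as text. The following is a minimal Python 3 sketch under that assumption. find_instruction is a hypothetical helper, and the disassembly excerpt is made up for illustration; it does not come from a real OKL4 image.

```python
import re


def find_instruction(disassembly, ip):
    """Return the disassembly lines whose address column equals `ip`.

    `ip` is a hex string with or without a 0x prefix, as printed in the
    page fault report.
    """
    address = int(ip, 16)
    matches = []
    for line in disassembly.splitlines():
        # Instruction lines start with a bare hex address and a colon.
        column = re.match(r"\s*([0-9a-fA-F]+):\s", line)
        if column and int(column.group(1), 16) == address:
            matches.append(line.strip())
    return matches


# Made-up excerpt in the general shape of `objdump -d` output.
SAMPLE = """\
80500050 <example_function>:
80500050:   e3a00000    mov r0, #0
80500054:   e5903004    ldr r3, [r0, #4]
80500058:   e12fff1e    bx  lr
"""

print(find_instruction(SAMPLE, "0x80500054"))
```

In this invented excerpt the matching line is a load from offset 4 of a register that holds zero, which is the kind of instruction that would produce a fault at address 0x4.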

DebuggingGuide (last edited 2008-08-11 02:34:28 by localhost)