Function Call Tracing in the Kernel and Applications
The fctrace (function call tracing) project uses Linux kernel probes (kprobes) to trace the kernel-mode (and, later, user-mode) function calls made by an application or any other thread of execution. It is designed to affect system performance as little as possible, and it sets up and removes probes dynamically as needed.
fctrace has various uses. The main ones I had in mind are:
- as a RAS (reliability, availability, serviceability) tool: a user with a serious but difficult-to-reproduce problem in an application or the kernel (spurious crashes, data loss or corruption, etc.) can use fctrace to get the call history and cut down on the code that needs to be reviewed
- as a helper for improved bug reports: a user who can easily reproduce a problem on a large installation, special hardware or a special setup that the developer cannot access can use fctrace to give the developer more information. The developer can then fix the bug nonetheless, or home in quickly on the problematic code behaviour and create simple test code snippets for the user to try out
- as a security team tool: understanding what happens in the code is essential for finding security problems proactively, and the tool can speed up understanding the general structure of a program. When security bugs are found and fixed reactively, the tool can quickly show the functions involved and speed up the fix process
- as a performance analysis tool (this is a future extension): the performance analysis needs no special setup and gathers data from the actual execution environment
- as a learning tool (helping the community to thrive): what happens (e.g. in the kernel, in Mozilla, in Apache, in glibc etc.) when the user clicks here, calls this kernel or library function, etc.
- as a documentation tool: much code is written without sufficient documentation (design docs, feature specs, implementation docs, ...). Documenters can run tests on the code and use fctrace to find out when the functions are used, and how they interact with each other. Later it may be helpful to generate data usage graphs from traced data references (except for global data), which can serve as rough templates for the documentation.
The main benefits over other tracing tools (strace, ltrace, LTT and related tools like systemtap) are one or more of the following. fctrace:
- is easy to use: it is as simple as strace and needs no setup, so nearly anyone can use it
- shows all function calls, not just system or library calls
- is supported on all main platforms (basically every platform that has kprobes)
- is simple to port
- has very little need for maintenance
- has a module that can be compiled and used with any kernel that supports kprobes, not just SUSE kernels
- has no performance impact when unused and only a small performance impact when in use
- has little or no loss of events
- works with any application or kernel thread
- has lots of possibilities for extension (e.g. in the security area)
- planned extension: filtering of calls
- planned extension: display function arguments (requires debuginfo or type info in executable)
- planned extension: call graphs (with separate evaluator)
- planned extension: performance analysis (with separate frontend)
- planned extension: … more ideas (and designs) exist, need to add them here
The main idea is that nowadays applications and the kernel are far too big to watch and understand from the behaviour at the external interfaces only. Watching the function calls gives additional information which is often sufficient to understand a problem. At the very least, it can usually help to quickly identify the function or a set of functions that need to be checked.
Without fctrace, this information can in principle be gathered from the source code (if available). But with the abundance of function registrations for plugins and modules that many applications (Apache, Mozilla, Gimp, …) and the kernel use nowadays, tracing through the source becomes much more tedious. Also, in some situations a fix is really wanted (e.g. for data corruption), but the person who can reproduce the problem is not going to read through the source code. And even for a developer, especially one looking at unfamiliar code, this task is better left to a tool, which can work much more quickly.
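To make the idea of a call history more concrete, here is a small user-space analogy in Python. It is not fctrace and does not use kprobes: it relies on Python's own `sys.setprofile` hook, and `trace_calls`, `top`, `middle` and `leaf` are hypothetical names chosen for illustration. The sketch only mirrors two points made above: the hook is installed just for the traced call and removed again afterwards, and every function call is recorded, not only calls at an external interface.

```python
import sys


def trace_calls(func, *args, **kwargs):
    """Run func and return (result, events); events are (depth, name) pairs."""
    events = []
    depth = 0

    def hook(frame, event, arg):
        # Only Python-level entries and exits are handled; C calls are ignored.
        nonlocal depth
        if event == "call":
            events.append((depth, frame.f_code.co_name))
            depth += 1
        elif event == "return":
            depth -= 1

    previous = sys.getprofile()
    sys.setprofile(hook)          # install the hook just before the traced call
    try:
        result = func(*args, **kwargs)
    finally:
        sys.setprofile(previous)  # remove it again once the call has finished
    return result, events


# Hypothetical functions to trace.
def leaf():
    return 1


def middle():
    return leaf() + leaf()


def top():
    return middle()


if __name__ == "__main__":
    _, events = trace_calls(top)
    for depth, name in events:
        print("  " * depth + name)
```

Run as a script, this prints the called functions indented by call depth. A reader who does not know the code can see from that nesting which functions are involved without reading the source first.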
Olaf Dabrunz originated this idea. Patrick Kirsch programmed the initial prototype. Olaf and Patrick are improving the prototype. Help is wanted, see ...
The SourceForge project page is at https://sourceforge.net/projects/call-tracing. The main Subversion branch can be found at https://call-tracing.svn.sourceforge.net/svnroot/call-tracing/trunk/fctrace.