Debug GraalVM with GDB - Moritz Arena's Blog

What is GDB?#

GDB, the GNU Debugger, is an open-source debugging tool that lets developers analyze and control program execution: identify bugs, inspect memory, modify variables at run time, and trace stack frames. It supports languages such as C, C++, and Fortran, and provides breakpoints, step-through execution, remote debugging, and core dump analysis.

LLDB#

LLDB is a modern, high-performance debugger developed as part of the LLVM project. It is designed for debugging programs written in compiled languages such as C, C++, Objective-C, Rust, and Swift. LLDB leverages the LLVM compiler infrastructure to provide features like:

Fast symbol parsing and memory inspection.
Support for modern architectures (e.g., ARM, x86).
Advanced debugging of multithreaded applications.
Integration with IDEs such as Xcode and CLion.

LLDB is the default debugger on macOS and works seamlessly with Apple’s development ecosystem. On other platforms, it offers an alternative to GDB with more efficient handling of large projects and better support for newer technologies.

Java Debugger#

The Java debugger (JVM debugger), typically implemented via the Java Debug Wire Protocol (JDWP), is specifically designed for debugging Java applications running on the Java Virtual Machine (JVM). It supports features unique to the JVM environment, including:

Inspecting Java objects and variables at runtime.
Setting breakpoints in dynamically loaded classes.
Debugging code during runtime changes, such as hot-swapping method bodies.
Tracking exceptions and stack traces in a high-level, language-specific manner.

Tools like jdb (Java Debugger) and IDE-based debuggers (e.g., IntelliJ IDEA, Eclipse) utilize JDWP for communication with the JVM, providing seamless support for Java, Kotlin, and other JVM-based languages.

Differences from LLDB#

Although GDB and LLDB are both powerful debugging tools for native programs, there are situations where one cannot be used as a substitute for the other:

Platform-Specific Debugging:
- GDB is more commonly used on Linux and Windows, with extensive support and integration in these environments.
- LLDB is the default choice on macOS, with better compatibility and performance for Darwin-based systems.
Debugging Environment Constraints:
- Some binaries may only include debug information compatible with one tool. For example, a binary compiled with certain versions of the LLVM toolchain may work better with LLDB, while a GCC-compiled binary typically aligns with GDB.
Toolchain Integration:
- LLDB tightly integrates with the LLVM toolchain, making it the preferred debugger when using Clang or developing for Apple platforms.
- GDB is often the default debugger for projects built with GCC or other GNU tools, especially on Linux distributions.
GraalVM Native Images:
- GraalVM native binaries are specifically designed to work with GDB due to how their debugging symbols and runtime structures are built. LLDB is not officially supported in this context.
Feature Support:
- LLDB may lack support for certain legacy debugging features provided by GDB, especially for older architectures or embedded systems.
- GDB, on the other hand, is not as optimized for handling modern multi-threading or large, complex binaries as LLDB.

While GDB and LLDB serve similar purposes for debugging native applications, they are not interchangeable in all situations due to platform preferences, toolchain dependencies, and compatibility with the binary’s debug information. For JVM-based applications, however, GDB and LLDB are irrelevant, as debugging relies on the JVM’s internal debugging protocols (e.g., JDWP), emphasizing the specialized nature of debugging tools for different runtime environments.

GraalVM Native Image#

GraalVM Native Image is a tool provided by GraalVM that compiles Java applications ahead of time into native executables. It can be invoked through a Maven plugin or directly through GraalVM’s command-line tools. Once a Java program is packaged as a native binary, it runs as an ordinary native process without a JVM, and dynamic features such as dynamic class loading, runtime bytecode generation (for example with ASM), and reflection are either unavailable or work only when configured at build time. As a result, when migrating large projects to GraalVM and packaging them as native images, developers must test and debug thoroughly to catch problems that only appear at run time in the native binary. Because the usual JVM-based IDE debugging workflow (attaching to a JVM and setting breakpoints) does not apply to a native binary, GDB can be used to debug GraalVM Native Image applications instead.

Identifying source code location#

One goal of the implementation is to make it simple to configure the debugger so that it can identify the relevant source file when it stops during program execution. The native-image tool tries to achieve this by accumulating the relevant sources in a suitably structured file cache.

The native-image tool uses different strategies to locate source files for JDK runtime classes, GraalVM classes, and application source classes for inclusion in the local sources cache. It identifies which strategy to use based on the package name of the class. So, for example, packages starting with java.* or jdk.* are JDK classes; packages starting with org.graal.* or com.oracle.svm.* are GraalVM classes; any other packages are regarded as application classes.
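
The package-based choice of strategy described above can be sketched as a simple prefix check. This is only an illustration of the rule as stated here, not the native-image implementation; the function name and the returned labels are invented for the example.

```python
# Hypothetical helper: mirrors the package-prefix rule described in the text.
JDK_PREFIXES = ("java.", "jdk.")
GRAALVM_PREFIXES = ("org.graal.", "com.oracle.svm.")


def classify_package(package: str) -> str:
    """Return which source lookup strategy applies to a package name."""
    name = package + "."  # so that "java" itself also matches the "java." prefix
    if name.startswith(JDK_PREFIXES):
        return "jdk"
    if name.startswith(GRAALVM_PREFIXES):
        return "graalvm"
    return "application"


print(classify_package("java.util"))            # jdk
print(classify_package("com.oracle.svm.core"))  # graalvm
print(classify_package("org.my.foo"))           # application
```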

Sources for JDK runtime classes are retrieved from the src.zip found in the JDK release used to run the native image generation process. Retrieved files are cached under subdirectory sources, using the module name (for JDK 11) and package name of the associated class to define the directory hierarchy in which the source is located.

For example, on Linux the source for class java.util.HashMap will be cached in file sources/java.base/java/util/HashMap.java. Debug info records for this class and its methods will identify this source file using the relative directory path java.base/java/util and file name HashMap.java. On Windows things will be the same modulo use of \ rather than / as the file separator.
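
As a rough sketch of this naming scheme, the cache location can be derived from the class name and, for JDK classes, the module name. The helper below is hypothetical and uses Linux-style separators; treating a nested class such as Outer$Inner as living in Outer.java is an assumption of the sketch.

```python
import posixpath
from typing import Optional


def cached_source_path(class_name: str, module: Optional[str] = None) -> str:
    """Map a fully qualified class name to its location in the sources cache."""
    # Assumption: a nested class such as Outer$Inner lives in Outer.java.
    outer = class_name.split("$", 1)[0]
    *packages, simple_name = outer.split(".")
    parts = ["sources"]
    if module is not None:  # JDK runtime classes are grouped by module name
        parts.append(module)
    return posixpath.join(*parts, *packages, simple_name + ".java")


print(cached_source_path("java.util.HashMap", module="java.base"))
# sources/java.base/java/util/HashMap.java
```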

Sources for GraalVM classes are retrieved from ZIP files or source directories derived from entries in the classpath. Retrieved files are cached under subdirectory sources, using the package name of the associated class to define the directory hierarchy in which the source is located (e.g., class com.oracle.svm.core.VM has its source file cached at sources/com/oracle/svm/core/VM.java).

The lookup scheme for cached GraalVM sources varies depending upon what is found in each classpath entry. Given a JAR file entry like /path/to/foo.jar, the corresponding file /path/to/foo.src.zip is considered as a candidate ZIP file system from which source files may be extracted. When the entry specifies a directory like /path/to/bar, then directories /path/to/bar/src and /path/to/bar/src_gen are considered as candidates. Candidates are skipped when the ZIP file or source directory does not exist, or it does not contain at least one subdirectory hierarchy that matches one of the expected GraalVM package hierarchies.

Sources for application classes are retrieved from source JAR files or source directories derived from entries in the classpath. Retrieved files are cached under subdirectory sources, using the package name of the associated class to define the directory hierarchy in which the source is located (e.g., class org.my.foo.Foo has its source file cached as sources/org/my/foo/Foo.java).

The lookup scheme for cached application sources varies depending upon what is found in each classpath entry. Given a JAR file entry like /path/to/foo.jar, the corresponding JAR /path/to/foo-sources.jar is considered as a candidate ZIP file system from which source files may be extracted. When the entry specifies a dir like /path/to/bar/classes or /path/to/bar/target/classes then one of the directories /path/to/bar/src/main/java, /path/to/bar/src/java or /path/to/bar/src is selected as a candidate (in that order of preference). Finally, the current directory in which the native executable is being run is also considered as a candidate.
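
The candidate selection for application sources can be sketched as follows. This is a minimal illustration of the rules in the paragraph above, not the actual native-image code: the function name is hypothetical, it only lists candidates without checking that they exist, and it leaves out the current working directory.

```python
from pathlib import PurePosixPath
from typing import List


def application_source_candidates(entry: str) -> List[str]:
    """List candidate source locations for one classpath entry."""
    path = PurePosixPath(entry)
    if path.suffix == ".jar":
        # /path/to/foo.jar -> /path/to/foo-sources.jar
        return [str(path.with_name(path.stem + "-sources.jar"))]
    if path.name == "classes":
        base = path.parent
        if base.name == "target":  # handle /path/to/bar/target/classes
            base = base.parent
        # Listed in order of preference.
        return [
            str(base / "src" / "main" / "java"),
            str(base / "src" / "java"),
            str(base / "src"),
        ]
    return []


print(application_source_candidates("/path/to/foo.jar"))
print(application_source_candidates("/path/to/bar/target/classes"))
```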

These lookup strategies are only provisional and may need extending in the future. However, it is possible to make missing sources available by other means. One option is to unzip extra app source JAR files, or copy extra app source trees into the cache. Another is to configure extra source search paths.

Debug GraalVM with GDB#

Although GraalVM’s official documentation provides some guidance on using GDB to debug native executables, it is still worth describing how to use it in the context of an actual project.

Use the command apt install -y gdb to install GDB on a Debian-based operating system.

Which GDB to Use?#

Please use GDB 10.2 or later. The debug info is tested via mx debuginfotest against 10.2.
Note that later versions might have slightly different formatting of debugger output (which, for example, may cause CI/CD gate checks to fail).
GDB bundled in recent Linux releases works just fine for debugging sessions.
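
To check an installed GDB against the 10.2 minimum, one option is to parse the first line printed by gdb --version. The sketch below assumes that line ends with a version such as "GNU gdb (GDB) 10.2"; distributions decorate the banner differently, so treat the parsing as a best-effort guess. The function names are hypothetical.

```python
import re
from typing import Tuple

MINIMUM = (10, 2)


def parse_gdb_version(banner: str) -> Tuple[int, int]:
    """Extract (major, minor) from the first line of `gdb --version` output."""
    lines = banner.splitlines()
    first_line = lines[0] if lines else ""
    matches = re.findall(r"(\d+)\.(\d+)", first_line)
    if not matches:
        raise ValueError("no version number found in: " + repr(first_line))
    major, minor = matches[-1]  # assume the version is the last number on the line
    return int(major), int(minor)


def is_supported(banner: str) -> bool:
    """Return True when the reported version is at least 10.2."""
    return parse_gdb_version(banner) >= MINIMUM


print(is_supported("GNU gdb (GDB) 10.2"))  # True
print(is_supported("GNU gdb (GDB) 9.2"))   # False
```

The banner text itself could be obtained with subprocess.run(["gdb", "--version"], capture_output=True, text=True).stdout.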

Supported Features#

The currently supported features include:

breakpoints configured by file and line, or by method name
single stepping by line including both into and over function calls
stack backtraces (not including frames detailing inlined code)
printing of primitive values
structured (field by field) printing of Java objects
casting/printing objects at different levels of generality
access through object networks via path expressions
reference by name to methods and static field data
reference by name to values bound to parameter and local vars
reference by name to class constants

Note that single stepping within a compiled method includes file and line number info for inlined code, including inlined GraalVM methods. So, GDB may switch files even though you are still in the same compiled method.

Debug Information#

Debug Information is used to ensure that during the debugging process, the debugger can accurately point to the addresses of Java classes, specific methods, and member properties. To build a native executable with debug information, provide the -g command-line option for javac when compiling the application, and then to the native-image builder. This enables source-level debugging, and the debugger (GDB) then correlates machine instructions with specific source lines in Java files.

If you are using the org.graalvm.buildtools:native-maven-plugin plugin to build the binary, set <debug>true</debug> in the <configuration> section so that GraalVM generates the debug information file.

GDB automatically loads the <executable_name>.debug file for a given native executable <executable_name>. (There is a link between the native executable and its *.debug file.)

The *.debug file contains additional information about the build. It gives a list of all class path entries that were used to build the native executable like below:

String dump of section '.debug.svm.imagebuild.classpath':
  [     0]  /home/user/.mx/cache/HAMCREST_e237ae735aac4fa5a7253ec693191f42ef7ddce384c11d29fbf605981c0be077d086757409acad53cb5b9e53d86a07cc428d459ff0f5b00d32a8cbbca390be49/hamcrest.jar
  [    b0]  /home/user/.mx/cache/JUNIT_5974670c3d178a12da5929ba5dd9b4f5ff461bdc1b92618c2c36d53e88650df7adbf3c1684017bb082b477cb8f40f15dcf7526f06f06183f93118ba9ebeaccce/junit.jar
  [   15a]  /home/user/mx/mxbuild/jdk20/dists/jdk9/junit-tool.jar
  [   1a9]  /home/user/graal/substratevm/mxbuild/jdk20/com.oracle.svm.test/bin

The following sections are available:

.debug.svm.imagebuild.classpath
.debug.svm.imagebuild.modulepath
.debug.svm.imagebuild.arguments
.debug.svm.imagebuild.java.properties

Start GDB debugging program#

To debug a binary and pass it startup parameters, start gdb with the --args option: the first argument after --args is the path to the binary, and the remaining arguments are passed to the program. For example, to start a Nacos binary under gdb with parameters such as nacos.home, the startup command would be:

# /home/nacos/nacos-server is the path to the Nacos native binary;
# the arguments that follow it are examples of startup parameters.
gdb --args \
    /home/nacos/nacos-server \
    -Dnacos.home=/home/nacos \
    --logging.config=/home/nacos/conf/nacos-logback.xml \
    --spring.config.additional-location=file:/home/nacos/conf/ \
    -Dnacos.preferHostnameOverIp=true

Each time a program is launched for debugging through GDB, it automatically loads a .debug file with the same name from the same directory as the binary program as its debugging information. This process takes some time as GDB needs to build an index for it.
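
When the launch is scripted, the same command line can be assembled programmatically. The sketch below is a hypothetical helper, not part of any tool: it builds the argument list with --args and optionally adds gdb's -ex option to run commands at startup. The breakpoint location used here is purely illustrative.

```python
import shlex
from typing import List, Sequence


def gdb_command(binary: str, program_args: Sequence[str],
                startup_commands: Sequence[str] = ()) -> List[str]:
    """Build an argv list that starts gdb on a binary with program arguments."""
    argv = ["gdb"]
    for command in startup_commands:
        argv += ["-ex", command]  # each -ex runs one gdb command at startup
    # Everything after --args is the program followed by its own arguments.
    argv += ["--args", binary, *program_args]
    return argv


argv = gdb_command(
    "/home/nacos/nacos-server",
    ["-Dnacos.home=/home/nacos", "-Dnacos.preferHostnameOverIp=true"],
    startup_commands=["break io/grpc/stub/ClientCalls.java:518"],
)
print(shlex.join(argv))  # a shell-quoted form that can be pasted into a terminal
```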

Check types#

WARNING
GDB does not currently include support for Java debugging. In consequence, debug capability has been implemented by generating debug info that models the Java program as an equivalent C++ program. Java class, array and interface references are actually pointers to records that contain the relevant field/array data. In the corresponding C++ model the Java name is used to label the underlying C++ (class/struct) layout types and Java references appear as pointers.
So, for example in the DWARF debug info model java.lang.String identifies a C++ class. This class layout type declares the expected fields like hash of type int and value of type byte[] and methods like String(byte[]), charAt(int), etc. However, the copy constructor which appears in Java as String(String) appears in gdb with the signature String(java.lang.String *).
The C++ layout class inherits fields and methods from class (layout) type java.lang.Object using C++ public inheritance. The latter in turn inherits standard oop (ordinary object pointer) header fields from a special struct class named _objhdr which includes two fields. The first field is called hub and its type is java.lang.Class * i.e. it is a pointer to the object’s class. The second field is called idHash and has type int. It stores an identity hashcode for the object.
The ptype command can be used to print details of a specific type. Note that the Java type name must be specified in quotes in order to escape the embedded . characters.

Checking the class type first helps you specify breakpoints accurately later. Use the info types command in GDB to verify that a class has been compiled into the binary, and the info functions command to find the methods you are interested in.

# Find all methods named main
(gdb) info functions ::main
All functions matching regular expression "::main":

File hello/Hello.java:
76:	void hello.Hello::main(java.lang.String[]*);

File java/util/Timer.java:
534:	void java.util.TimerThread::mainLoop();

# List the types whose names match ArrayList
(gdb) info types ArrayList
All types matching regular expression "ArrayList":

...
File java/util/ArrayList.java:
	java.util.ArrayList;
	java.util.ArrayList$ArrayListSpliterator;
	java.util.ArrayList$Itr;
	java.util.ArrayList$ListItr;
...

Set Breakpoint#

To set a breakpoint, first identify the class that contains the method you want to stop in. For example, to set a breakpoint at line 518 in the io.grpc.stub.ClientCalls class, use the following GDB command (note that the location is given as the source file path, with / separators, rather than the dotted class name):

(gdb) b io/grpc/stub/ClientCalls.java:518

If you prefer not to specify the exact line number in a Java file and are certain about the method you want to debug, you can set a breakpoint directly on the method like below:

(gdb) b 'io.grpc.stub.ClientCalls::futureUnaryCall'
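
Both breakpoint forms follow mechanically from the Java class name, as the following sketch shows. The helper names are hypothetical, and the sketch assumes that a nested class such as Outer$Inner is declared in Outer.java.

```python
def line_breakpoint(class_name: str, line: int) -> str:
    """Build a `break` command for a source line of the given class."""
    # Assumption: a nested class such as Outer$Inner is declared in Outer.java.
    outer = class_name.split("$", 1)[0]
    return "break {}.java:{}".format(outer.replace(".", "/"), line)


def method_breakpoint(class_name: str, method: str) -> str:
    """Build a `break` command for a method, quoting the dotted Java name."""
    return "break '{}::{}'".format(class_name, method)


print(line_breakpoint("io.grpc.stub.ClientCalls", 518))
# break io/grpc/stub/ClientCalls.java:518
print(method_breakpoint("io.grpc.stub.ClientCalls", "futureUnaryCall"))
# break 'io.grpc.stub.ClientCalls::futureUnaryCall'
```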

Inspect Breakpoint#

In GDB, use the run command to start the program. If the breakpoint is valid, the program pauses when it is reached; the continue command resumes execution until the next breakpoint is hit. While the program is paused, you can print the variables visible at the breakpoint; object references are shown as addresses. Use the p command with * to dereference such an address and display the object’s fields. A cast lets you interpret the address as a specific type. The commands look like this:

# run or continue until the address value of the method argument 'value' is output
(gdb) p value
# Dereference the memory address of value as the 'Payload' type
(gdb) p *('com.alibaba.nacos.api.grpc.auto.Payload' *) <address>
# Dereference the 'Metadata' inside Payload
(gdb) p *('com.alibaba.nacos.api.grpc.auto.Metadata' *) <address>
# Dereference the 'Body' inside Payload
(gdb) p *('com.google.protobuf.Any' *) <address>

If we take the io/grpc/stub/ClientCalls.java:518 breakpoint as an example, the following debugging information will be displayed:

Thread 148 "or-172.18.0.1-5" hit Breakpoint 1, io.grpc.stub.ClientCalls$UnaryStreamToFuture::onMessage(java.lang.Object*) (this=0x7ff8001637d0, value=0x7ff801c01248) at io/grpc/stub/ClientCalls.java:519
519     io/grpc/stub/ClientCalls.java: No such file or directory.

# Output the address value of the method parameter 'value'
(gdb) p value
$1 = (java.lang.Object *) 0x7ff801c01248

# Output the data structure of Payload to get more detailed information
(gdb) p *('com.alibaba.nacos.api.grpc.auto.Payload' *)0x7ff801c01248
$2 = 
{
	<com.google.protobuf.GeneratedMessageV3> = {
		<com.google.protobuf.AbstractMessage> = {
			<com.google.protobuf.AbstractMessageLite> = {
				<java.lang.Object> = {
					<_objhdr> = {
            			hub = 0x6ac4ea0 <sun.security.util.DerInputStream::getUnalignedBitString()+480>
					}, 
					<No data fields> 
				}, 
				memoizedHashCode = 0
			}, 
			memoizedSize = -1
		},
    	unknownFields = 0x12418d9 <com.caucho.hessian.util.IdentityIntMap::toString()+2265>
	},
  	metadata_ = 0x13e0255 <com.fasterxml.jackson.databind.deser.impl.CreatorCollector::_reportDuplicateCreator(int, bool, com.fasterxml.jackson.databind.introspect.AnnotatedWithParams*, com.fasterxml.jackson.databind.introspect.AnnotatedWithParams*)+1141>, 
  	body_ = 0x13e0280 <com.fasterxml.jackson.databind.deser.impl.CreatorCollector::_reportDuplicateCreator(int, bool, com.fasterxml.jackson.databind.introspect.AnnotatedWithParams*, com.fasterxml.jackson.databind.introspect.AnnotatedWithParams*)+1184>, 
	memoizedIsInitialized = 1 '\001', 
  	static DEFAULT_INSTANCE = 0x108c638 <com.alipay.sofa.jraft.rpc.RpcRequests$PingRequest::parseFrom(com.google.protobuf.CodedInputStream*, com.google.protobuf.ExtensionRegistryLite*)+1432>,
  	static PARSER = 0x108c63c <com.alipay.sofa.jraft.rpc.RpcRequests$PingRequest::parseFrom(com.google.protobuf.CodedInputStream*, com.google.protobuf.ExtensionRegistryLite*)+1436>, 
	static serialVersionUID = 0,
  	static METADATA_FIELD_NUMBER = 2, static BODY_FIELD_NUMBER = 3
}

# Dereference the 'Metadata' inside Payload
(gdb) p *('com.alibaba.nacos.api.grpc.auto.Metadata' *)0x13e0255
$3 = 
{
	<com.google.protobuf.GeneratedMessageV3> = {
		<com.google.protobuf.AbstractMessage> = {
			<com.google.protobuf.AbstractMessageLite> = {
				<java.lang.Object> = {
					<_objhdr> = {
						hub = 0x303e7c80
					}, 
					<No data fields>
				},
        		memoizedHashCode = -1920725248
			}, 
			memoizedSize = -956301315
		}, 
		unknownFields = 0x303e44
	}, 
	type_ = 0xfffd83e9, 
	clientIp_ = 0xcf8b48ff, 
	headers_ = 0x48d18b48, 
	memoizedIsInitialized = -127 '\201',
	static DEFAULT_INSTANCE = 0x108a961 <com.alipay.sofa.jraft.rpc.RpcRequests$PingRequest::parseFrom(com.google.protobuf.CodedInputStream*)+577>,
	static PARSER = 0x108a966 <com.alipay.sofa.jraft.rpc.RpcRequests$PingRequest::parseFrom(com.google.protobuf.CodedInputStream*)+582>, 
	static serialVersionUID = 0, 
	static TYPE_FIELD_NUMBER = 3,
	static CLIENTIP_FIELD_NUMBER = 8, static HEADERS_FIELD_NUMBER = 7
}

# Dereference the 'Any' inside Payload
(gdb) p *('com.google.protobuf.Any' *)0x13e0280
$4 = 
{
	<com.google.protobuf.GeneratedMessageV3> = {
		<com.google.protobuf.AbstractMessage> = {
			<com.google.protobuf.AbstractMessageLite> = {
				<java.lang.Object> = {
					<_objhdr> = {
						hub = 0xadc17a0,
            			idHash = 0
					}, 
					<No data fields>
				}, 
				memoizedHashCode = 0
			}, 
			memoizedSize = -1
		}, 
		unknownFields = 0x7fffdc214c38
	}, 
	memoizedIsInitialized = -1 '\377', 
	cachedUnpackValue = 0x0, 
	typeUrl_ = 0xc441f10,
  	value_ = 0x7fff744015f0, 
	static DEFAULT_INSTANCE = 0x7fff74ac5740, 
	static PARSER = 0x7fff74ac5780, 
	static serialVersionUID = 0, 
	static TYPE_URL_FIELD_NUMBER = 1, 
	static VALUE_FIELD_NUMBER = 2
}

Debugging with Isolates#

With the default build settings, trying to read the content of a String-typed field from the address stored in an object often fails with an error:

(gdb) p/x $rdx
$5 = 0x2
(gdb) hubname $rdx
Cannot access memory at address 0x8779c8

This typically happens because GraalVM Native Image enables isolates by default. With isolates enabled, object references stored in fields are offsets relative to the heap base rather than absolute addresses, so a value read this way cannot be dereferenced directly during debugging. To work around this, add the -H:-SpawnIsolates parameter to the <buildArgs> of the corresponding Maven plugin to disable isolates, then rebuild and debug again.

Other information#

hub#

In the context of GraalVM Native Image, hub is a field defined in the _objhdr struct, which is the base of the C++ class layout that the debug info uses to model every Java object. The primary purpose of hub is to maintain a reference to the runtime type information of the object. Specifically, its type is java.lang.Class*, meaning it is a pointer to the object’s class metadata.

Role of hub

Type Identification: Every Java object at runtime needs to know its type. This type information is encapsulated in an instance of java.lang.Class. The hub field stores a pointer to this class metadata, indicating the object’s type.
Metadata Access: Through the hub field, the program can access the metadata of the object, enabling operations such as:
- Retrieving the type of the object (e.g., via getClass()).
- Performing runtime type checks (e.g., instanceof).
- Supporting reflection-based operations.
Optimized Runtime Support: The native image runtime keeps the essential features of the Java object model. Because every object carries a hub pointer, the runtime can perform type lookup and dynamic dispatch efficiently, much as a Java Virtual Machine (JVM) does.

After compilation into a native image, the memory layout of a typical object might look like this:

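// C++-style view of the layout as presented by the debug info. This is not
// compilable C++: dotted Java names such as java.lang.Class act as type names.
struct _objhdr {
    java.lang.Class *hub;  // Pointer to the class metadata
    int idHash;            // Identity hash code for the object
};

// Simplified: in the debug info a class inherits from java.lang.Object,
// which in turn inherits from _objhdr.
class MyClass : public _objhdr {
    // Fields and methods specific to MyClass
};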

hub Field: Points to the instance of java.lang.Class corresponding to the object, such as java.util.HashMap.class.
idHash Field: Stores the identity hash code of the object, equivalent to the value returned by System.identityHashCode() in Java.

The hub field acts as a critical bridge between the C++ object representation and the Java runtime type system, enabling runtime operations that rely on class metadata. This design ensures that key Java Virtual Machine features like type information and dynamic type checks remain functional in a native image context, while also supporting efficient debugging and execution.

`this`#

Just like in Java or C++ code, in instance-methods, prefixing with this. is not needed.

(gdb) bt
#0  hello.Hello$NamedGreeter::greet() (this=0x7ff7f9101208) at hello/Hello.java:71
#1  0x000000000083c060 in hello.Hello::main(java.lang.String[]*) (args=<optimized out>) at hello/Hello.java:77
#2  0x0000000000413355 in com.oracle.svm.core.JavaMainWrapper::runCore0() () at com/oracle/svm/core/JavaMainWrapper.java:178
#3  0x00000000004432e5 in com.oracle.svm.core.JavaMainWrapper::runCore() () at com/oracle/svm/core/JavaMainWrapper.java:136
#4  com.oracle.svm.core.JavaMainWrapper::doRun(int, org.graalvm.nativeimage.c.type.CCharPointerPointer*) (argc=<optimized out>, argv=<optimized out>) at com/oracle/svm/core/JavaMainWrapper.java:233
#5  com.oracle.svm.core.JavaMainWrapper::run(int, org.graalvm.nativeimage.c.type.CCharPointerPointer*) (argc=<optimized out>, argv=<optimized out>) at com/oracle/svm/core/JavaMainWrapper.java:219
#6  com.oracle.svm.core.code.IsolateEnterStub::JavaMainWrapper_run_e6899342f5939c89e6e2f78e2c71f5f4926b786d(int, org.graalvm.nativeimage.c.type.CCharPointerPointer*) (__0=<optimized out>, __1=<optimized out>)
at com/oracle/svm/core/code/IsolateEnterStub.java:1
(gdb) p this
$1 = (hello.Hello$NamedGreeter *) 0x7ff7f9001218
(gdb) p *this
$2 = {
  <hello.Hello$Greeter> = {
    <java.lang.Object> = {
      <_objhdr> = {
        hub = 0x1de2260
      }, <No data fields>}, <No data fields>}, 
  members of hello.Hello$NamedGreeter:
  name = 0x25011b
}

# Using the `this` variable in instance methods
(gdb) p this.name
$3 = (_z_.java.lang.String *) 0x270119

(gdb) p name
$7 = (_z_.java.lang.String *) 0x270119
(gdb) p name.value.data
$8 = 0x7ff7f91008c0 "FooBar\376\376\200G\273\001\027\001'"

Downcasting#

NOTE
If the static type that you want to downcast from is a compressed reference then the type used in the downcast also needs to be that of a compressed reference.

The GNU debugger sees every Java object reference as a pointer type. For an array reference, the pointer points to a structure, actually a C++ class, that models the layout of the Java array using an integer length field (len) and a data field whose type is a C++ array embedded into the block of memory that models the array object.

For an array of objects, such as the String[] args parameter of main, elements of the array data field are references to the base type, in this case pointers to java.lang.String. The data array has a nominal length of 0. However, the block of memory allocated for the String[] object actually includes enough space to hold the number of pointers determined by the value of field len.

Suppose your source uses a variable of static type Greeter and you want to inspect its data.

public static void main(String[] args) {
    Greeter greeter = Greeter.greeter(args);
    greeter.greet(); // Here you might have a NamedGreeter

As you can see, currently GDB only knows about the static type of greeter in line 3:

Thread 1 "hello_image" hit Breakpoint 2, hello.Hello::main(java.lang.String[]*) (args=<optimized out>) at hello/Hello.java:3
3	        greeter.greet();
(gdb) p greeter
$17 = (hello.Hello$Greeter *) 0x7ff7f9101208

Also, you are not able to see fields that only exist for the NamedGreeter subclass.

(gdb) p *greeter
$18 = {
  <java.lang.Object> = {
    <_objhdr> = {
      hub = 0x1d1cae0
    }, <No data fields>}, <No data fields>}

But you do have the hub field, which points to the class-object of an object. Therefore, it allows you to determine the runtime-type of the Greeter object at address 0x7ff7f9101208:

(gdb) p greeter.hub
$19 = (_z_.java.lang.Class *) 0x1d1cae0
(gdb) p *greeter.hub
$20 = {
  <java.lang.Class> = {
    <java.lang.Object> = {
      <_objhdr> = {
        hub = 0x1bec910
      }, <No data fields>}, 
    members of java.lang.Class:
    typeCheckStart = 1188,
    name = 0xb94a2, <<<< WE ARE INTERESTED IN THIS FIELD
    superHub = 0x90202,
    ...
    monitorOffset = 8,
    optionalIdentityHashOffset = 12,
    flags = 0,
    instantiationFlags = 3 '\003'
  }, <No data fields>}
(gdb) p greeter.hub.name
$21 = (_z_.java.lang.String *) 0xb94a2
(gdb) p greeter.hub.name.value.data
$22 = 0x7ff7f80705b8 "hello.Hello$NamedGreeter\351\001~*"

So you learned that the actual type of that object is hello.Hello$NamedGreeter. Now cast to that type and store the result in the convenience variable $rt_greeter, which you can then inspect:

(gdb) set $rt_greeter = ('hello.Hello$NamedGreeter' *) greeter

(gdb) p $rt_greeter
$23 = (hello.Hello$NamedGreeter *) 0x7ff7f9101208
(gdb) p *$rt_greeter
$24 = {
  <hello.Hello$Greeter> = {
    <java.lang.Object> = {
      <_objhdr> = {
        hub = 0x1d1cae0
      }, <No data fields>}, <No data fields>}, 
  members of hello.Hello$NamedGreeter:
  name = 0x270119
}

# Also you can see the name field that only exists in the NamedGreeter subtype.
(gdb) p $rt_greeter.name
$25 = (_z_.java.lang.String *) 0x270119
(gdb) p $rt_greeter.name.value.data
$26 = 0x7ff7f91008c0 "FooBar\376\376\200G\273\001\027\001'"

Isolates#

Enabling the use of isolates (the default, which can also be requested explicitly by passing command line option -H:+SpawnIsolates to the native-image builder) affects the way ordinary object pointers (oops) are encoded. In turn, that means the debug info generator has to provide gdb with information about how to translate an encoded oop to the address in memory, where the object data is stored. This sometimes requires care when asking gdb to process encoded oops vs decoded raw addresses.

When isolates are disabled, oops are essentially raw addresses pointing directly at the object contents. This is generally the same whether the oop is embedded in a static/instance field or is referenced from a local or parameter variable located in a register or saved to the stack. It is not quite that simple because the bottom 3 bits of some oops may be used to hold “tags” that record certain transient properties of an object. However, the debug info provided to gdb means that it will remove these tag bits before dereferencing the oop as an address.

By contrast, when isolates are enabled, oop references stored in static or instance fields are actually relative addresses, offsets from a dedicated heap base register (r14 on x86_64, r29 on AArch64), rather than direct addresses (in a few special cases the offset may also have some low tag bits set). When an “indirect” oop of this kind gets loaded during execution, it is almost always immediately converted to a “raw” address by adding the offset to the heap base register value. So, oops which occur as the value of local or parameter vars are actually raw addresses.
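
The arithmetic behind the two encodings can be sketched in a few lines. This is a simplified model of the description above, not the translation gdb actually performs from the DWARF info: it only masks the bottom three tag bits and, for indirect oops, adds the heap base. The function name and the heap base value are hypothetical.

```python
TAG_MASK = 0b111  # the bottom 3 bits may carry transient tags


def oop_to_address(oop: int, heap_base: int = 0, indirect: bool = False) -> int:
    """Convert an oop value to the raw address of the object data."""
    untagged = oop & ~TAG_MASK
    if indirect:
        # With isolates enabled, oops stored in fields are heap base offsets.
        return heap_base + untagged
    return untagged


# Hypothetical heap base register value, chosen only for illustration.
heap_base = 0x7FFFF7800000
print(hex(oop_to_address(0x286C08, heap_base, indirect=True)))  # 0x7ffff7a86c08
print(hex(oop_to_address(0x7FF7F9101208)))                      # 0x7ff7f9101208
```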

NOTE
Note that on some operating systems enabling isolates causes problems with printing of objects when using a gdb release version 10 or earlier. It is currently recommended to disable use of isolates, by passing command line option -H:-SpawnIsolates, when generating debug info if your operating system includes one of these earlier releases. Alternatively, you may be able to upgrade your debugger to a later version.

The DWARF info encoded into the image, when isolates are enabled, tells gdb to rebase indirect oops whenever it tries to dereference them to access underlying object data. This is normally automatic and transparent, but it is visible in the underlying type model that gdb displays when you ask for the type of objects.

For example, consider the static field powerCache of class java.math.BigInteger. Printing its type in an image that uses isolates shows that this static field has a different type to the expected one:

(gdb) ptype 'java.math.BigInteger'::powerCache
type = class _z_.java.math.BigInteger[][] : public java.math.BigInteger[][] {
} *

The field is typed as _z_.java.math.BigInteger[][] which is an empty wrapper class that inherits from the expected type java.math.BigInteger[][]. This wrapper type is essentially the same as the original but the DWARF info record that defines it includes information that tells gdb how to convert pointers to this type.

When gdb is asked to print the oop stored in this field it is clear that it is an offset rather than a raw address.

(gdb) p/x 'java.math.BigInteger'::powerCache
$1 = 0x286c08
(gdb) x/x 0x286c08
0x286c08:	Cannot access memory at address 0x286c08

However, when gdb is asked to dereference through the field, it applies the necessary address conversion to the oop and fetches the correct data.

(gdb) p/x *'java.math.BigInteger'::powerCache
$2 = {
  <java.math.BigInteger[][]> = {
    <java.lang.Object> = {
      <_objhdr> = {
        hub = 0x1ec0e2,
        idHash = 0x2f462321
      }, <No data fields>},
    members of java.math.BigInteger[][]:
    len = 0x25,
    data = 0x7ffff7a86c18
  }, <No data fields>}
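
The conversion gdb applies here can be checked by hand. The sketch below is only an illustration in Python, not something gdb or GraalVM provides: `HEAP_BASE` is an assumed value, inferred from the fact that subtracting the offset 0x286c08 from the printed data address 0x7ffff7a86c18 leaves 0x7ffff7800010, which is consistent with a heap base of 0x7ffff7800000 followed by a 16-byte array header. The `tag_mask` parameter is likewise hypothetical, since the text does not say how many low bits may carry tags.

```python
# Sketch: how an indirect oop (a heap-relative offset) maps to a raw address.
# HEAP_BASE is an inferred, illustrative value; in a live session it would be
# the content of the heap base register (r14 on x86_64).

HEAP_BASE = 0x7FFFF7800000  # assumption, not a value printed by gdb above


def to_raw_address(indirect_oop: int, heap_base: int = HEAP_BASE, tag_mask: int = 0) -> int:
    """Clear any low tag bits selected by tag_mask, then add the heap base."""
    return heap_base + (indirect_oop & ~tag_mask)


if __name__ == "__main__":
    power_cache = to_raw_address(0x286C08)
    print(hex(power_cache))  # 0x7ffff7a86c08

    # Distance from the object start to the array data address gdb printed.
    print(hex(0x7FFFF7A86C18 - power_cache))  # 0x10
```

In a live session on x86_64 the same sum could presumably be evaluated directly in gdb, for example with `x/x $r14 + 0x286c08`, since r14 holds the heap base.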

Printing the type of the hub field or the data array shows that they are also modelled using indirect types:

(gdb) ptype $1->hub
type = class _z_.java.lang.Class : public java.lang.Class {
} *
(gdb) ptype $2->data
type = class _z_.java.math.BigInteger[] : public java.math.BigInteger[] {
} *[0]

The debugger still knows how to dereference these oops:

(gdb) p $1->hub
$3 = (_z_.java.lang.Class *) 0x1ec0e2
(gdb) x/x $1->hub
0x1ec0e2:	Cannot access memory at address 0x1ec0e2
(gdb) p *$1->hub
$4 = {
  <java.lang.Class> = {
    <java.lang.Object> = {
      <_objhdr> = {
        hub = 0x1dc860,
        idHash = 1530752816
      }, <No data fields>},
    members of java.lang.Class:
    name = 0x171af8,
    . . .
  }, <No data fields>}

Since the indirect types inherit from the corresponding raw types, an expression that identifies an indirect type pointer can be used in almost all cases where an expression identifying a raw type pointer would work. The only case where care might be needed is when casting a displayed numeric field value or displayed register value.

For example, if the indirect hub oop printed above is passed to hubname_raw, the cast to type Object inside that command does not trigger the required indirect oop translation, so the resulting memory access fails:

(gdb) hubname_raw 0x1dc860
Cannot access memory at address 0x1dc860

In this case it is necessary to use a slightly different command that casts its argument to an indirect pointer type:

(gdb) define hubname_indirect
 x/s (('_z_.java.lang.Object' *)($arg0))->hub->name->value->data
end
(gdb) hubname_indirect 0x1dc860
0x7ffff78a52f0:	"java.lang.Class"

Run GDB in Docker#

GDB was originally designed for Unix-like systems, and running it on Windows typically requires using environments like MinGW or Cygwin, or leveraging WSL to simulate a Unix-like environment. Windows uses the Portable Executable (PE) format for binaries, but GDB’s support for this format is not as mature or efficient as it is for the ELF (Executable and Linkable Format) used on Unix-like systems. Furthermore, programs compiled with MSVC instead of GCC/MinGW may not be directly debuggable with GDB.

The prototype is currently implemented only for the GNU Debugger on Linux:

Linux/x86_64 support has been tested and should work correctly
Linux/AArch64 support is present but has not yet been fully verified (breakpoints should work, but stack backtraces may be incorrect)

Windows support is still under development.

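# Based on Debian Linux, which provides GNU libc
ARG BASE_IMAGE=debian:latest

FROM ${BASE_IMAGE}

# BASE_DIR is the install directory for the binaries (/app is an assumed location; adjust as needed)
ENV BASE_DIR="/app" \
    TOMCAT_ACCESSLOG_ENABLED="false" \
    TZ="Asia/Shanghai"

# Install gdb and drop the apt package lists to keep the image small
RUN apt-get update \
    && apt-get install -y gdb \
    && rm -rf /var/lib/apt/lists/*

WORKDIR $BASE_DIR

# Copy the binary files into the Docker image
COPY output/* $BASE_DIR/

# Keep the container alive so gdb can be started later with docker exec
ENTRYPOINT ["tail", "-f", "/dev/null"]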

Since the debug support currently targets GDB on Linux, Docker is a simple way to get a working GDB environment on other hosts. The Dockerfile above builds a lightweight Debian image with GDB installed and copies the binary into it. Running the binary inside the container also isolates the debugging session from the host system.
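
As a rough illustration of the workflow, the following Python sketch assembles the docker command lines for building the image, starting the container, and opening gdb inside it. The image tag `native-gdb`, the container name `native-gdb-dev`, and the binary path `./myapp` are hypothetical placeholders. The `--cap-add=SYS_PTRACE` and `--security-opt seccomp=unconfined` flags are an assumption about the host: containers often restrict ptrace by default, and gdb may need these flags to control the process. The sketch only prints the commands; it does not run Docker.

```python
# Sketch: build the docker command lines for a gdb session inside a container.
# IMAGE, CONTAINER and the binary path are hypothetical names.
import shlex

IMAGE = "native-gdb"          # hypothetical image tag
CONTAINER = "native-gdb-dev"  # hypothetical container name


def build_cmd(image=IMAGE, context="."):
    """Command that builds the image from the Dockerfile in `context`."""
    return ["docker", "build", "-t", image, context]


def run_cmd(image=IMAGE, name=CONTAINER):
    """Command that starts the container in the background with ptrace allowed."""
    return ["docker", "run", "-d", "--name", name,
            "--cap-add=SYS_PTRACE", "--security-opt", "seccomp=unconfined",
            image]


def gdb_cmd(binary, name=CONTAINER):
    """Command that opens an interactive gdb session on `binary` in the container."""
    return ["docker", "exec", "-it", name, "gdb", binary]


if __name__ == "__main__":
    for cmd in (build_cmd(), run_cmd(), gdb_cmd("./myapp")):
        print(shlex.join(cmd))
```

Each list can be passed to `subprocess.run(cmd, check=True)` or copied into a terminal. A relative binary path works because `docker exec` starts in the working directory configured for the container.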

Reference#

[1] Oracle. (n.d.). Debug Info. GraalVM Documentation. Retrieved October 15, 2024, from https://www.graalvm.org/jdk17/reference-manual/native-image/debugging-and-diagnostics/DebugInfo/

[2] Oracle. (n.d.). Guide: Debug Native Image Process. GraalVM Documentation. Retrieved October 13, 2024, from https://www.graalvm.org/latest/reference-manual/native-image/guides/debug-native-image-process/
