# build-id

Read your own `.note.gnu.build-id`

## Background

The `.note.gnu.build-id` section in an [ELF](https://en.wikipedia.org/wiki/Executable_and_Linkable_Format) binary contains a "strongly unique embedded identifier".

[binutils](https://www.gnu.org/software/binutils/)' `ld` has supported the `--build-id=...` option since version 2.18 (released 2007). When used with a `sha1` or `md5` argument, it directs `ld` to insert an ELF section named `.note.gnu.build-id` into the binary. The section contains a hash of the normative parts of the output, that is, an identifier that uniquely identifies the output file.

Its originally intended purpose (described [here](https://fedoraproject.org/wiki/Releases/FeatureBuildId)) is to simplify and improve debugging tools, but it is occasionally useful for a program to be able to read its own build-id. [Mesa](https://www.mesa3d.org/) uses the build-id of the running OpenGL or Vulkan driver as a way of identifying its on-disk cache of pre-compiled shader programs.

I spent a good amount of time researching possible ways of uniquely identifying the running OpenGL or Vulkan driver, and I saw that [others had similar questions](https://stackoverflow.com/questions/17637745/can-a-program-read-its-own-elf-section).

## The Problem

[Mesa](https://mesa3d.org/), the software project providing OpenGL and Vulkan on Linux, needs to identify its on-disk cache of compiled shader programs. Shader programs compiled by one version of Mesa may not work (or worse: cause GPU hangs) with another version, so how can we know whether the running version of Mesa generated those cached files?

I found the `--build-id=...` flag but struggled to find a way to access the identifier it generates from within a running process.

## The Solution
The `dl_iterate_phdr` function is the critical piece of the puzzle, allowing the application to inspect the shared objects it has loaded. A callback function searches the program headers of each loaded object for the `PT_NOTE` segment that holds the `.note.gnu.build-id` note. Mesa's build-id is included in the data that's hashed to provide the key to look up a shader in the on-disk cache. This ensures that Mesa will only load shader programs that were produced by an identical build.
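
To make the walk over program headers concrete, here is a minimal sketch of a `dl_iterate_phdr` callback. It is not the code from this repository: the `found_id` struct and the function names `find_first_build_id` and `first_build_id` are hypothetical, and the sketch assumes notes are padded to 4-byte boundaries, which is the usual layout for the build-id note. It stops at the first loaded object that carries a build-id, which is normally the main program when it was linked with `--build-id`.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <link.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#ifndef NT_GNU_BUILD_ID
#define NT_GNU_BUILD_ID 3
#endif

/* Round a note field size up to the next 4-byte boundary (assumed padding). */
#define ALIGN4(x) (((size_t)(x) + 3u) & ~(size_t)3u)

/* Hypothetical result holder; not part of this repository's API. */
struct found_id {
    const uint8_t *data;
    size_t len;
};

/* Callback: scan one loaded object's PT_NOTE segments for a GNU build-id. */
static int
find_first_build_id(struct dl_phdr_info *info, size_t size, void *arg)
{
    struct found_id *out = arg;
    (void)size;

    for (unsigned i = 0; i < info->dlpi_phnum; i++) {
        const ElfW(Phdr) *ph = &info->dlpi_phdr[i];
        if (ph->p_type != PT_NOTE)
            continue;

        /* Segment address in memory = load bias + virtual address. */
        const uint8_t *p = (const uint8_t *)(uintptr_t)(info->dlpi_addr + ph->p_vaddr);
        const uint8_t *end = p + ph->p_memsz;

        while (p < end && (size_t)(end - p) >= sizeof(ElfW(Nhdr))) {
            const ElfW(Nhdr) *nh = (const ElfW(Nhdr) *)p;
            const uint8_t *name = p + sizeof(*nh);
            const uint8_t *desc = name + ALIGN4(nh->n_namesz);

            if (nh->n_type == NT_GNU_BUILD_ID && nh->n_namesz == 4 &&
                memcmp(name, "GNU", 4) == 0) {
                out->data = desc;
                out->len = nh->n_descsz;
                return 1; /* non-zero stops the iteration */
            }
            p = desc + ALIGN4(nh->n_descsz);
        }
    }
    return 0; /* keep iterating over the remaining objects */
}

/* Returns non-zero if some loaded object has a build-id; fills *out. */
static int
first_build_id(struct found_id *out)
{
    assert(out != NULL);
    out->data = NULL;
    out->len = 0;
    return dl_iterate_phdr(find_first_build_id, out);
}
```

A real implementation would also filter on the object of interest, for example by comparing `info->dlpi_name` against a filename, rather than accepting the first match.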

With the problem now solved and the code in successful use in Mesa since 2017, my hope is to make the technique more widely known and in doing so to save others time. The code is very small and MIT licensed, so feel free to include the two source files in your project.
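
The cache-key idea described in this section can be illustrated with a short sketch. This is not Mesa's actual hashing: it swaps in 64-bit FNV-1a purely to keep the example self-contained, and the names `fnv1a64` and `cache_key` are hypothetical. The point is only that the build-id bytes are fed into the hash ahead of the shader source, so a different build yields a different key.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Fold `len` bytes into a running 64-bit FNV-1a hash. */
static uint64_t
fnv1a64(uint64_t h, const void *buf, size_t len)
{
    const uint8_t *p = buf;
    for (size_t i = 0; i < len; i++) {
        h ^= p[i];
        h *= 0x100000001b3ULL; /* FNV prime */
    }
    return h;
}

/* Illustrative key: hash the build-id first, then the shader source. */
static uint64_t
cache_key(const uint8_t *build_id, size_t id_len,
          const char *source, size_t source_len)
{
    assert(build_id != NULL && source != NULL);
    uint64_t h = 0xcbf29ce484222325ULL; /* FNV offset basis */
    h = fnv1a64(h, build_id, id_len);
    h = fnv1a64(h, source, source_len);
    return h;
}
```

With this arrangement, a cache entry written by one build is simply never found by another build, because the lookup key differs.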

## API
### Usage
Retrieve an opaque pointer to the `.note.gnu.build-id` ELF segment in the process's address space using either the filename of a loaded ELF binary or a symbol address. From the returned pointer, access the build-id and its length in bytes.

The API consists of only four functions and an opaque struct data type.
```c
struct build_id_note;
```

#### Find the `.note.gnu.build-id` section given the filename of the ELF binary
```c
const struct build_id_note *
build_id_find_nhdr_by_name(const char *name);
```

Returns `NULL` on failure.

#### Find the `.note.gnu.build-id` section given a symbol in the ELF binary
```c
const struct build_id_note *
build_id_find_nhdr_by_symbol(const void *symbol);
```

Returns `NULL` on failure.

#### Return the length (in bytes) of the build-id
```c
ElfW(Word)
build_id_length(const struct build_id_note *note);
```

#### Return a pointer to the build-id
```c
const uint8_t *
build_id_data(const struct build_id_note *note);
```
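
The two accessors return raw bytes, so printing a build-id in the familiar hexadecimal form takes one more step. The helper below is a sketch of that step; `build_id_to_hex` is a hypothetical name and is not part of this API. It takes the pointer and length you would obtain from `build_id_data` and `build_id_length`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Write `len` bytes as lowercase hex into `out`, NUL-terminated.
 * Returns 0 on success, -1 if `out_size` is too small.
 */
static int
build_id_to_hex(const uint8_t *data, size_t len, char *out, size_t out_size)
{
    static const char digits[] = "0123456789abcdef";

    assert(data != NULL && out != NULL);
    if (out_size < 2 * len + 1)
        return -1;

    for (size_t i = 0; i < len; i++) {
        out[2 * i] = digits[data[i] >> 4];       /* high nibble */
        out[2 * i + 1] = digits[data[i] & 0x0f]; /* low nibble */
    }
    out[2 * len] = '\0';
    return 0;
}
```

A SHA-1 build-id is 20 bytes, so a 41-byte buffer is enough for the common case.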

## Examples
Some demonstrations of the API are provided:
  * [test.c](test.c) - Retrieves its own build-id
  * [so-test.c](so-test.c) - Retrieves the build-id of a linked shared object
  * [dlopen-test.c](dlopen-test.c) - Retrieves the build-id of a `dlopen`'d shared object

```sh
$ ./build-id
Build ID: 5a9f352b656d36bd95b0cec8a31679dac872f5be
$ LD_LIBRARY_PATH=. ./so-build-id
Build ID: 79588ab64fe9fe95bce4243e26aee4449517434e
$ ./dlopen-build-id
Build ID: 79588ab64fe9fe95bce4243e26aee4449517434e
```

Separately, the `file` command can retrieve the build-ids:
```sh
$ file build-id
build-id: ELF 64-bit LSB pie executable, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, BuildID[sha1]=5a9f352b656d36bd95b0cec8a31679dac872f5be, for GNU/Linux 3.2.0, with debug_info, not stripped
$ file libbuild-id.so 
libbuild-id.so: ELF 64-bit LSB shared object, x86-64, version 1 (SYSV), dynamically linked, BuildID[sha1]=79588ab64fe9fe95bce4243e26aee4449517434e, not stripped
```

### Building
A simple `Makefile` builds the example programs with `-Wl,--build-id=sha1` (and `-fPIC`; see [Caveats](#caveats)).
```sh
$ make
```

### Testing
`make check` runs the example programs and verifies that they output the same build-id as reported by `file`.
```sh
$ make check
```

## Caveats
### -fPIC
Looking up a build-id given a symbol's address (with `build_id_find_nhdr_by_symbol`) requires a call to the `dladdr` function. Quoting from the `dladdr(3)` man page:

> Sometimes, the function pointers you pass to dladdr() may surprise you.  On some architectures (notably i386 and x86-64), dli_fname and dli_fbase may end up pointing back at the  object  from  which you called dladdr(), even if the function used as an argument should come from a dynamically linked library.
>
> The problem is that the function pointer will still be resolved at compile time, but merely point to the plt (Procedure Linkage Table) section of the original object (which dispatches the call after asking the dynamic linker to resolve the symbol).  To work around this, you can try to compile the code to be position-independent: then, the compiler cannot prepare the pointer at compile time  any more and gcc(1) will generate code that just loads the final symbol address from the got (Global Offset Table) at run time before passing it to dladdr().

As a result, build code with `-fPIC` to ensure that `build_id_find_nhdr_by_symbol` works as expected. In practice, I found that compiling the `so-build-id` program without `-fPIC` with clang caused the program to fail.
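
When diagnosing this caveat, it can help to check which object `dladdr` actually attributes an address to. The following is a small sketch for that purpose; `object_containing` is a hypothetical helper, not part of this repository. On older glibc versions, linking with `-ldl` may be required.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <dlfcn.h>
#include <stddef.h>

/*
 * Return the pathname of the loaded object that contains `addr`,
 * or NULL if dladdr() cannot match the address to any object.
 */
static const char *
object_containing(const void *addr)
{
    Dl_info info;

    assert(addr != NULL);
    if (dladdr(addr, &info) == 0)
        return NULL;
    return info.dli_fname;
}
```

If the path reported for a library function's address is the executable itself rather than the shared object, the pointer was most likely resolved to a PLT entry, which is the situation the man page describes.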

### build-id dependent on compiler flags
Any change that affects the code or data of the ELF binary will also result in a different build-id. It's obvious but important to note that debug and release builds will have different build-ids.

For Mesa's usage this is entirely acceptable because we expect that the vast majority of users are using distribution-provided builds of Mesa.

## Other proposed (and failed) solutions

Other proposed solutions that I tried failed for a variety of reasons:

  * "Just hash all of the source code"
  * Use a linker script to insert `start`/`end` symbols around the `.note.gnu.build-id`. Failed when reading the build-id of a shared object for unknown reasons. Incompatible with [gold](https://en.wikipedia.org/wiki/Gold_(linker)), which does not use linker scripts.
  * Use `dladdr()` to find the path and name of the binary. Open and read ELF sections. Effectively the same as what the code in this repository does, but without the guarantee that the binary you read from the disk is the same one that is executing in the current process.