I pondered the question of whether there is a simple way to have a static
(as in static lifetime), read-only associative array in C, with strings as keys,
mapping to any kind of data, ideally without having to manage its lifetime
manually. So, basically something similar to what one gets with standard static
const arrays, like the following:
extern const int month_days[];
const int month_days[] = { 31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31 };
Turns out it's not too hard, actually, at least on systems with dynamic linkers. We need a
dynamic symbol for each entry, which allows lookups via dlsym(3) that return a pointer to our data.
Staying with the example above, let's say we want to look up the same data by name:
const int month_day_jan = 31;
const int month_day_feb = 28;
const int month_day_mar = 31;
const int month_day_apr = 30;
const int month_day_may = 31;
const int month_day_jun = 30;
const int month_day_jul = 31;
const int month_day_aug = 31;
const int month_day_sep = 30;
const int month_day_oct = 31;
const int month_day_nov = 30;
const int month_day_dec = 31;
When linking, we then need to declare those symbols as dynamic. This can be done, for example, with
the --dynamic-list linker flag (-Wl,--dynamic-list=<file> when linking through the compiler driver) and a file like the following:
{
month_day_jan;
month_day_feb;
month_day_mar;
month_day_apr;
month_day_may;
month_day_jun;
month_day_jul;
month_day_aug;
month_day_sep;
month_day_oct;
month_day_nov;
month_day_dec;
};
Now we can do lookups like the following (with glibc, RTLD_DEFAULT requires defining _GNU_SOURCE; error checking omitted):
int x = *(const int*)dlsym(RTLD_DEFAULT, "month_day_oct");
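The one-liner above dereferences whatever dlsym(3) returns, which crashes if the symbol was not exported. A slightly more defensive lookup might look like the sketch below. It assumes glibc or musl; month_days_by_name and its -1 "not found" convention are hypothetical, and it uses RTLD_DEFAULT (which glibc happens to define as a null pointer) to search the default symbol scope.

```c
#define _GNU_SOURCE /* exposes RTLD_DEFAULT in glibc's <dlfcn.h> */
#include <assert.h>
#include <dlfcn.h>
#include <stdio.h>

/* A few entries from the example above; the rest work the same way. */
const int month_day_jan = 31;
const int month_day_feb = 28;
const int month_day_oct = 31;

/* Hypothetical helper: returns the value stored under "month_day_<abbr>",
 * or -1 if no such dynamic symbol is visible (for instance when the
 * executable was linked without --dynamic-list or -rdynamic). */
int month_days_by_name(const char *abbr)
{
    char name[64];
    const int *value;
    int n;

    assert(abbr != NULL);

    /* Build the symbol name from the key, refusing keys that do not fit. */
    n = snprintf(name, sizeof name, "month_day_%s", abbr);
    if (n < 0 || (size_t)n >= sizeof name)
        return -1;

    /* Search the default scope: the executable and the libraries loaded with it. */
    value = dlsym(RTLD_DEFAULT, name);
    if (value == NULL)
        return -1;
    return *value;
}
```

Linked with something like `cc demo.c -Wl,--dynamic-list=months.symlst` (both file names are placeholders; glibc versions before 2.34 also need `-ldl`), `month_days_by_name("oct")` should yield 31. Without the dynamic list the symbol is simply not found and the helper returns -1 instead of crashing.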
This is simple and straightforward, and doesn't come as a surprise: after all,
this is how object files are structured and how linkers work. However, I had never
considered (ab)using the dynamic linker as a kind of associative
array lookup.
The upsides are that the data is embedded, that its lifetime
is static, that strings can be used as keys, that no
memory management is needed (like filling a map at startup and freeing it at the end), and that
dlsym(3) lookups are most likely efficiently implemented (ELF dynamic linkers, for instance, resolve symbols through per-object hash tables).
There are downsides, however:
- the keys, being symbol names written as C identifiers, are limited to alphanumeric characters and underscores, and cannot start with a digit (some platforms might allow other characters)
- they are potentially name-mangled by the compiler (use objdump -t to check)
- the names must be globally unique and are subject to clashes with other, unrelated symbols
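Given the first and last of these limitations, code that builds symbol names from arbitrary keys should probably reject anything that cannot appear in a C identifier, and a fixed prefix (like month_day_ above) narrows the room for clashes. A minimal sketch, where make_symbol_name is a hypothetical helper and only ASCII identifiers are assumed:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Characters allowed in a C identifier (ASCII only, locale-independent). */
static const char ident_chars[] =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789_";

/* Hypothetical helper: writes "<prefix><key>" into buf if the result is a
 * valid C identifier that fits. Returns 0 on success, -1 otherwise. */
int make_symbol_name(const char *prefix, const char *key, char *buf, size_t size)
{
    int n;

    assert(prefix != NULL && key != NULL && buf != NULL);

    /* Reject prefixes or keys containing characters outside [A-Za-z0-9_]. */
    if (prefix[strspn(prefix, ident_chars)] != '\0')
        return -1;
    if (key[strspn(key, ident_chars)] != '\0')
        return -1;

    n = snprintf(buf, size, "%s%s", prefix, key);
    if (n <= 0 || (size_t)n >= size)
        return -1; /* empty name, encoding error, or buf too small */

    /* Identifiers must not start with a digit. */
    if (buf[0] >= '0' && buf[0] <= '9')
        return -1;
    return 0;
}
```

Rejecting invalid keys, rather than replacing offending characters with underscores, avoids two different keys (say "a.b" and "a_b") silently mapping to the same symbol.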
Let's look at another example: embedding binary data directly, without writing any C code,
while making sure that the data ends up in the .rodata section, and generating the
dynamic symbol list from the resulting object file:
ld -r -b binary -o bins.o file1.png folder_x/otherfile.txt
objcopy --rename-section .data=.rodata,contents,alloc,load,readonly bins.o
nm bins.o | awk 'BEGIN{print "{"}{print $3";"}END{print "};"}' > bins.symlst
This actually creates three symbols per file, named after the file path (with every non-alphanumeric
character replaced by an underscore), plus a _binary_ prefix and a _start, _end or _size suffix, all pretty self-explanatory:
_binary_file1_png_end
_binary_file1_png_size
_binary_file1_png_start
_binary_folder_x_otherfile_txt_end
_binary_folder_x_otherfile_txt_size
_binary_folder_x_otherfile_txt_start