Previously, we always determined the number of CPU dies by counting unique values in the `/sys/devices/system/cpu/cpu*/topology/die_id` files. This method works on x64 systems but is ineffective on ARM64: the `topology_die_id` function is defined for x64 in `arch/x86/include/asm/topology.h` but is not implemented for ARM64 in `arch/arm64/include/asm/topology.h`.

In Linux kernel 5.15 (used in Ubuntu 22.04), the `die_id` attribute is exposed with a value of -1 if `topology_die_id` is not defined for the architecture. The `die_id` file on ARM64 therefore always contains -1, so counting unique values always returns 1 on Ubuntu 22.04, regardless of the actual number of dies. This can cause issues when the number of sockets is greater than 1, since our code assumes `total_dies` is a multiple of `total_sockets`.

Starting with Linux kernel 5.17, a change was introduced [1] to expose the `die_id` file only if `topology_die_id` is defined for the architecture. Consequently, in Linux kernel 6.8 (used in Ubuntu 24.04), this file is absent on ARM64 systems, and counting unique values now produces 0 on Ubuntu 24.04.

Given the lack of a straightforward way to determine the number of dies on ARM64 systems, this patch sets `total_dies` equal to `total_sockets`, assuming one die per socket.

[1] https://github.com/torvalds/linux/commit/2c4dcd7
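For illustration only, the fallback described above could look roughly like the sketch below. It is not the code from this patch: the function name `count_dies` and the `cpu_root` parameter are hypothetical, and treating a missing file and a value of -1 the same way is an assumption drawn from the two kernel behaviours described in this message.

```python
import glob
import os


def count_dies(total_sockets, cpu_root="/sys/devices/system/cpu"):
    """Count unique die_id values, falling back to total_sockets.

    Hypothetical helper: `count_dies` and `cpu_root` are illustrative
    names, not identifiers taken from the patched code.
    """
    # cpu[0-9]* matches cpu0, cpu1, ... but not cpufreq or cpuidle.
    pattern = os.path.join(cpu_root, "cpu[0-9]*", "topology", "die_id")
    die_ids = set()
    for path in glob.glob(pattern):
        try:
            with open(path) as f:
                die_ids.add(int(f.read().strip()))
        except (OSError, ValueError):
            # Unreadable or malformed file: skip it.
            continue

    # Kernel 5.15 on ARM64 reports -1 when topology_die_id is undefined.
    die_ids.discard(-1)

    # No usable values (file absent, or only -1): assume one die per socket.
    if not die_ids:
        return total_sockets
    return len(die_ids)
```

With this shape, both the Ubuntu 22.04 case (every file contains -1) and the Ubuntu 24.04 case (no `die_id` files at all) take the same fallback path, while systems that report real die IDs keep the unique-value count.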