We previously determined the number of CPU dies by counting the unique values in the `/sys/devices/system/cpu/cpu*/topology/die_id` files. This works on x64 systems but not on ARM64: `topology_die_id` is defined for x64 in `arch/x86/include/asm/topology.h`, but it is not defined for ARM64 in `arch/arm64/include/asm/topology.h`.

In Linux kernel 5.15 (used in Ubuntu 22.04), the `die_id` attribute is exposed with a value of -1 when the architecture does not define `topology_die_id`. Every `die_id` file on ARM64 therefore contains -1, so counting unique values always returns 1 on Ubuntu 22.04, regardless of the actual number of dies. This causes problems when there is more than one socket, because our code assumes `total_dies` is a multiple of `total_sockets`.

Linux kernel 5.17 introduced a change [1] that exposes the `die_id` file only if the architecture defines `topology_die_id`. Consequently, in Linux kernel 6.8 (used in Ubuntu 24.04), the file is absent on ARM64 systems, and counting unique values now produces 0 on Ubuntu 24.04.

Because there is no straightforward way to determine the number of dies on ARM64 systems, this patch sets `total_dies` equal to `total_sockets` on ARM64, assuming one die per socket.

[1] https://github.com/torvalds/linux/commit/2c4dcd7
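To make the three cases described above concrete, here is a minimal Ruby sketch comparing the old unique-value count with the patched ARM64 fallback. The method names `count_dies_old` and `count_dies_new` are hypothetical, and the `die_id` samples are made up for a 4-CPU, 2-socket machine rather than captured from real hardware; the kernel 6.8 case models the absent files as an empty list of values.

```ruby
# Hypothetical name for the pre-patch logic: count distinct die_id values.
# Each element of `die_id_lines` models one line printed by
# `cat /sys/devices/system/cpu/cpu*/topology/die_id`.
def count_dies_old(die_id_lines)
  die_id_lines.uniq.count
end

# Hypothetical name for the patched logic: on arm64, assume one die per socket.
def count_dies_new(arch:, total_sockets:, die_id_lines:)
  return total_sockets if arch == "arm64"

  die_id_lines.uniq.count
end

# Made-up samples for a 4-CPU, 2-socket machine (not from real hardware).
SAMPLES = [
  ["x64", "x64", %w[0 0 1 1]],                 # assumed: one distinct ID per die
  ["arm64 on 5.15", "arm64", %w[-1 -1 -1 -1]], # die_id exposed, always -1
  ["arm64 on 6.8", "arm64", []]                # die_id files absent: no values
].freeze

TOTAL_SOCKETS = 2

SAMPLES.each do |label, arch, lines|
  before = count_dies_old(lines)
  after = count_dies_new(arch: arch, total_sockets: TOTAL_SOCKETS, die_id_lines: lines)
  puts "#{label}: before=#{before}, after=#{after}"
end
```

Running it prints `before=1` for the 5.15 sample and `before=0` for the 6.8 sample, while `after=2` in every case. With two sockets, the old count of 1 is not a multiple of `total_sockets`, which is the assumption that breaks.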
# frozen_string_literal: true

class Prog::LearnCpu < Prog::Base
  subject_is :sshable

  CpuTopology = Struct.new(:total_cpus, :total_cores, :total_dies, :total_sockets, keyword_init: true)

  def get_arch
    arch = sshable.cmd("common/bin/arch").strip
    fail "BUG: unexpected CPU architecture" unless ["arm64", "x64"].include?(arch)
    arch
  end

  def get_topology
    # lscpu flags: -J JSON output, -y physical IDs, -e extended (one row per CPU).
    s = sshable.cmd("/usr/bin/lscpu -Jye")
    parsed = JSON.parse(s).fetch("cpus").map { |cpu|
      [cpu.fetch("socket"), cpu.fetch("core")]
    }
    cpus = parsed.count
    sockets = parsed.map { |socket, _| socket }.uniq.count
    # A core is identified by its (socket, core) pair.
    cores = parsed.uniq.count

    CpuTopology.new(total_cpus: cpus, total_cores: cores, total_dies: 0,
      total_sockets: sockets)
  end

  def count_dies(arch:, total_sockets:)
    # The Linux kernel provides no usable die_id for arm64 (-1 on 5.15, absent
    # from 5.17 on), so assume one die per socket.
    return total_sockets if arch == "arm64"

    die_ids = sshable.cmd("cat /sys/devices/system/cpu/cpu*/topology/die_id").split("\n")
    die_ids.uniq.count
  end

  label def start
    arch = get_arch
    topo = get_topology
    topo.total_dies = count_dies(total_sockets: topo.total_sockets, arch: arch)
    pop(arch: arch, **topo.to_h)
  end
end
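The counting in `get_topology` can be exercised without SSH by feeding it a canned `lscpu -Jye` document. The sketch below is a minimal, hedged illustration: `parse_topology` is a hypothetical helper that copies the method's counting logic, and the JSON is a made-up, abbreviated sample (real output carries more columns) for 2 sockets with 2 cores each and 2 threads per core. The sample assumes core IDs restart on each socket, which shows why the code counts `(socket, core)` pairs rather than bare core IDs.

```ruby
require "json"

# Made-up, abbreviated `lscpu -Jye` output: 2 sockets x 2 cores x 2 threads.
# Only "socket" and "core" are read; core IDs restart per socket (assumption).
SAMPLE_LSCPU_JSON = <<~JSON
  {"cpus": [
    {"cpu": 0, "socket": 0, "core": 0},
    {"cpu": 1, "socket": 0, "core": 0},
    {"cpu": 2, "socket": 0, "core": 1},
    {"cpu": 3, "socket": 0, "core": 1},
    {"cpu": 4, "socket": 1, "core": 0},
    {"cpu": 5, "socket": 1, "core": 0},
    {"cpu": 6, "socket": 1, "core": 1},
    {"cpu": 7, "socket": 1, "core": 1}
  ]}
JSON

# Hypothetical helper: the counting logic of get_topology without the SSH call.
def parse_topology(json)
  parsed = JSON.parse(json).fetch("cpus").map { |cpu| [cpu.fetch("socket"), cpu.fetch("core")] }
  {
    total_cpus: parsed.count,                                   # one row per logical CPU
    total_cores: parsed.uniq.count,                             # distinct (socket, core) pairs
    total_sockets: parsed.map { |socket, _| socket }.uniq.count # distinct socket IDs
  }
end

topo = parse_topology(SAMPLE_LSCPU_JSON)
puts "cpus=#{topo[:total_cpus]} cores=#{topo[:total_cores]} sockets=#{topo[:total_sockets]}"
```

This yields 8 CPUs, 4 cores, and 2 sockets; counting bare core IDs in the same sample would give only 2 cores. The sample uses integer IDs, but the counting only relies on the values being comparable, so it behaves the same if they arrive as strings.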