Hello!
v0.1? Yeah, this is v0 from gitlab, rebased onto v6.0-rc3.
This series has only been lightly tested....
---
Hello!
This series adds what looks like cpuhotplug support to arm64 for use in virtual machines. It does this by moving the cpu_register() calls for architectures that support ACPI out of the arch code by using GENERIC_CPU_DEVICES, then into the ACPI processor driver.
The kubernetes folk really want to be able to add CPUs to an existing VM, in exactly the same way they do on x86. The use-case is pre-booting guests with one CPU, then adding the number that were actually needed when the workload is provisioned.
Wait? Doesn't arm64 support cpuhotplug already!? In the arm world, cpuhotplug gets used to mean removing the power from a CPU. The CPU is offline, and remains present. For x86, and ACPI, cpuhotplug has the additional step of physically removing the CPU, so that it isn't present anymore.
Arm64 doesn't support this, and can't support it: CPUs are really a slice of the SoC, and there is not enough information in the existing ACPI tables to describe which bits of the slice also got removed. Without a reference machine adding this support to the spec is a wild goose chase.
Critically: everything described in the firmware tables must remain present.
For a virtual machine this is easy as all the other bits of 'virtual SoC' are emulated, so they can (and do) remain present when a vCPU is 'removed'.
On a system that supports cpuhotplug the MADT has to describe every possible CPU at boot. Under KVM, the vGIC needs to know about every possible vCPU before the guest is started. With these constraints, virtual-cpuhotplug is really just a hypervisor/firmware policy about which CPUs can be brought online.
This series adds support for virtual-cpuhotplug as exactly that: firmware policy. This may even work on a physical machine too; for a guest the part of firmware is played by the VMM. (typically Qemu).
PSCI support is modified to return 'DENIED' if the CPU can't be brought online/enabled yet. The CPU object's _STA method's enabled bit is used to indicate firmware's current disposition. If the CPU has its enabled bit clear, it will not be registered with sysfs, and attempts to bring it online will fail. The notifications that _STA has changed its value then work in the same way, and firmware can cause the CPU to be registered some time later, allowing it to be brought online.
This creates something that looks like cpuhotplug to user-space, as the sysfs files appear and disappear, and the udev notifications look the same.
One notable difference is the CPU present mask, which is exposed via sysfs. Because the CPUs remain present throughout, they can still be seen in that mask. This value does get used by webbrowsers to estimate the number of CPUs as the CPU online mask is constantly changed on mobile phones.
Linux is tolerant of PSCI returning errors, as its always been allowed to do that. To avoid confusing OS that can't tolerate this, we'd need an additional bit in the MADT GICC flags. This series copies ACPI_MADT_ONLINE_CAPABLE, which appears to be for this purpose, but calls it ACPI_MADT_GICC_CPU_CAPABLE as it has a different bit position in the GICC.
I assume all x86 firmware vendors set the ENABLED bit in the CPU object's _STA method. This code is unconditionally enabled for all ACPI architectures. If there are problems with firmware tables on some devices, the CPUs will already be online by the time the acpi_processor_make_enabled() is called. A mismatch here causes a firmware-bug message and kernel taint. This should only affect people with broken firmware who also boot with maxcpus=1, and bring CPUs online later.
I had a go at switching the remaining architectures over to GENERIC_CPU_DEVICES, so that the Kconfig symbol can be removed, but I got stuck with powerpc and s390.
Thanks,
James Morse (22): ACPI: Move ACPI_HOTPLUG_CPU to be enabled per architecture drivers: base: Use present CPUs in GENERIC_CPU_DEVICES drivers: base: Allow parts of GENERIC_CPU_DEVICES to be overridden drivers: base: Move node_dev_init() before cpu_dev_init() arm64: setup: Switch over to GENERIC_CPU_DEVICES using arch_register_cpu() ia64/topology: Switch over to GENERIC_CPU_DEVICES x86/topology: Switch over to GENERIC_CPU_DEVICES LoongArch: Switch over to GENERIC_CPU_DEVICES ACPI: processor: Register all CPUs from acpi_processor_get_info() ACPI: Rename ACPI_HOTPLUG_CPU to include 'present' ACPI: Rename acpi_processor_hotadd_init and remove pre-processor guards ACPI: Check _STA present bit before making CPUs not present ACPI: Warn when the present bit changes but the feature is not enabled drivers: base: Implement weak arch_unregister_cpu() LoongArch: Use the __weak version of arch_unregister_cpu() arm64: acpi: Move get_cpu_for_acpi_id() to a header ACPICA: Add new MADT GICC flags fields [code first?] arm64, irqchip/gic-v3, ACPI: Move MADT GICC enabled check into a helper irqchip/gic-v3: Don't return errors from gic_acpi_match_gicc() irqchip/gic-v3: Add support for ACPI's disabled but 'online capable' CPUs ACPI: add support to register CPUs based on the _STA enabled bit arm64: document virtual CPU hotplug's expectations
Jean-Philippe Brucker (3): arm64: psci: Ignore DENIED CPUs KVM: arm64: Pass hypercalls to userspace KVM: arm64: Pass PSCI calls to userspace
Documentation/arm64/cpu-hotplug.rst | 79 ++++++++++++++++++ Documentation/arm64/index.rst | 1 + Documentation/virt/kvm/api.rst | 31 ++++++- Documentation/virt/kvm/arm/hypercalls.rst | 1 + arch/arm64/Kconfig | 1 + arch/arm64/include/asm/acpi.h | 11 +++ arch/arm64/include/asm/cpu.h | 1 - arch/arm64/include/asm/kvm_host.h | 2 + arch/arm64/kernel/acpi_numa.c | 11 --- arch/arm64/kernel/psci.c | 2 +- arch/arm64/kernel/setup.c | 13 +-- arch/arm64/kernel/smp.c | 5 +- arch/arm64/kvm/arm.c | 15 +++- arch/arm64/kvm/hypercalls.c | 28 ++++++- arch/arm64/kvm/psci.c | 13 +++ arch/ia64/Kconfig | 2 + arch/ia64/include/asm/acpi.h | 2 +- arch/ia64/include/asm/cpu.h | 11 --- arch/ia64/kernel/acpi.c | 6 +- arch/ia64/kernel/setup.c | 2 +- arch/ia64/kernel/topology.c | 35 ++------ arch/loongarch/Kconfig | 2 + arch/loongarch/kernel/topology.c | 31 +------ arch/x86/Kconfig | 2 + arch/x86/include/asm/cpu.h | 6 -- arch/x86/kernel/acpi/boot.c | 4 +- arch/x86/kernel/topology.c | 19 +---- drivers/acpi/Kconfig | 5 +- drivers/acpi/acpi_processor.c | 99 ++++++++++++++++------- drivers/acpi/processor_core.c | 2 +- drivers/base/cpu.c | 21 +++-- drivers/base/init.c | 2 +- drivers/firmware/psci/psci.c | 2 + drivers/irqchip/irq-gic-v3.c | 38 +++++---- include/acpi/actbl2.h | 1 + include/kvm/arm_hypercalls.h | 1 + include/kvm/arm_psci.h | 4 + include/linux/acpi.h | 10 ++- include/linux/cpu.h | 6 ++ include/uapi/linux/kvm.h | 2 + 40 files changed, 339 insertions(+), 190 deletions(-) create mode 100644 Documentation/arm64/cpu-hotplug.rst
arm64 doesn't support physical hotadd of CPUs that were not present at boot. Much of the platform description is in static tables which do not have update methods. arm64 does support HOTPLUG_CPU, which is backed by a firmware interface to turn CPUs on and off.
acpi_processor_hotadd_init() and acpi_processor_remove() are for adding and removing CPUs that were not present at boot. arm64 systems that do this are not supported as there is currently insufficient information in the platform description. (e.g. did the GICR get removed too?)
arm64 currently relies on the MADT enabled flag check in map_gicc_mpidr() to prevent CPUs that were not described as present at boot from being added to the system. Adding support for virtual CPU hotplug (where the vCPUs have been present the whole time) would require this check to be removed, possibly allowing physical CPUs to be added.
Instead, disable ACPI_HOTPLUG_CPU for arm64 by removing 'default y' and selecting it on the other three ACPI architectures. This allows the weak definitions of some symbols to be removed once loongarch has header defines.
Signed-off-by: James Morse james.morse@arm.com --- arch/ia64/Kconfig | 1 + arch/loongarch/Kconfig | 1 + arch/loongarch/include/asm/cpu.h | 7 +++++++ arch/x86/Kconfig | 1 + drivers/acpi/Kconfig | 1 - drivers/acpi/acpi_processor.c | 18 ------------------ 6 files changed, 10 insertions(+), 19 deletions(-)
diff --git a/arch/ia64/Kconfig b/arch/ia64/Kconfig index 26ac8ea15a9e..7a0ae73d81c8 100644 --- a/arch/ia64/Kconfig +++ b/arch/ia64/Kconfig @@ -15,6 +15,7 @@ config IA64 select ARCH_MIGHT_HAVE_PC_PARPORT select ARCH_MIGHT_HAVE_PC_SERIO select ACPI + select ACPI_HOTPLUG_CPU if ACPI select ACPI_NUMA if NUMA select ARCH_ENABLE_MEMORY_HOTPLUG select ARCH_ENABLE_MEMORY_HOTREMOVE diff --git a/arch/loongarch/Kconfig b/arch/loongarch/Kconfig index 26aeb1408e56..9788233d4fab 100644 --- a/arch/loongarch/Kconfig +++ b/arch/loongarch/Kconfig @@ -4,6 +4,7 @@ config LOONGARCH default y select ACPI select ACPI_GENERIC_GSI if ACPI + select ACPI_HOTPLUG_CPU if ACPI select ACPI_MCFG if ACPI select ACPI_SYSTEM_POWER_STATES_SUPPORT if ACPI select ARCH_BINFMT_ELF_STATE diff --git a/arch/loongarch/include/asm/cpu.h b/arch/loongarch/include/asm/cpu.h index 754f28506791..96d006c59149 100644 --- a/arch/loongarch/include/asm/cpu.h +++ b/arch/loongarch/include/asm/cpu.h @@ -124,4 +124,11 @@ enum cpu_type_enum { #define LOONGARCH_CPU_GUESTID BIT_ULL(CPU_FEATURE_GUESTID) #define LOONGARCH_CPU_HYPERVISOR BIT_ULL(CPU_FEATURE_HYPERVISOR)
+#if !defined(__ASSEMBLY__) +#ifdef CONFIG_HOTPLUG_CPU +int arch_register_cpu(int num); +void arch_unregister_cpu(int num); +#endif +#endif /* ! __ASSEMBLY__ */ + #endif /* _ASM_CPU_H */ diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig index f9920f1341c8..6cd4ff7b2cab 100644 --- a/arch/x86/Kconfig +++ b/arch/x86/Kconfig @@ -59,6 +59,7 @@ config X86 # select ACPI_LEGACY_TABLES_LOOKUP if ACPI select ACPI_SYSTEM_POWER_STATES_SUPPORT if ACPI + select ACPI_HOTPLUG_CPU if ACPI select ARCH_32BIT_OFF_T if X86_32 select ARCH_CLOCKSOURCE_INIT select ARCH_CORRECT_STACKTRACE_ON_KRETPROBE diff --git a/drivers/acpi/Kconfig b/drivers/acpi/Kconfig index 7802d8846a8d..411601f34ffd 100644 --- a/drivers/acpi/Kconfig +++ b/drivers/acpi/Kconfig @@ -311,7 +311,6 @@ config ACPI_HOTPLUG_CPU bool depends on ACPI_PROCESSOR && HOTPLUG_CPU select ACPI_CONTAINER - default y
config ACPI_PROCESSOR_AGGREGATOR tristate "Processor Aggregator" diff --git a/drivers/acpi/acpi_processor.c b/drivers/acpi/acpi_processor.c index 6737b1cbf6d6..16b314340e68 100644 --- a/drivers/acpi/acpi_processor.c +++ b/drivers/acpi/acpi_processor.c @@ -150,24 +150,6 @@ static int acpi_processor_errata(void)
/* Initialization */ #ifdef CONFIG_ACPI_HOTPLUG_CPU -int __weak acpi_map_cpu(acpi_handle handle, - phys_cpuid_t physid, u32 acpi_id, int *pcpu) -{ - return -ENODEV; -} - -int __weak acpi_unmap_cpu(int cpu) -{ - return -ENODEV; -} - -int __weak arch_register_cpu(int cpu) -{ - return -ENODEV; -} - -void __weak arch_unregister_cpu(int cpu) {} - static int acpi_processor_hotadd_init(struct acpi_processor *pr) { unsigned long long sta;
The four ACPI architectures only create sysfs entries using register_cpu() for present CPUs, whereas GENERIC_CPU_DEVICES does this for possible CPUs.
Only two of the eight architectures that use GENERIC_CPU_DEVICES have a distinction between present and possible CPUs.
To allow all four ACPI architectures to use GENERIC_CPU_DEVICES, change it to use for_each_present_cpu().
The following architectures use GENERIC_CPU_DEVICES but are not SMP, so possible == present: * m68k * microblaze * nios2
The following architectures use GENERIC_CPU_DEVICES and consider possible == present: * csky: setup_smp() * hexagon: compare smp_start_cpus() and smp_prepare_cpus() * parisc: smp_prepare_boot_cpu() marks the boot cpu as present, processor_probe() sets possible for all CPUs and present for all CPUs except the boot cpu.
um appears to be a subarchitecture of x86.
The remaining architecture using GENERIC_CPU_DEVICES is openrisc, where smp_init_cpus() makes all CPUs < NR_CPUS possible, whereas smp_prepare_cpus() only makes CPUs < setup_max_cpus present. After this change, openrisc systems that boot with max_cpus=1 would not see other CPUs present in sysfs. This should not be a problem as these CPUs can't be brought online as _cpu_up() checks cpu_present().
Signed-off-by: James Morse james.morse@arm.com --- drivers/base/cpu.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/base/cpu.c b/drivers/base/cpu.c index 4c98849577d4..cf6407c34ede 100644 --- a/drivers/base/cpu.c +++ b/drivers/base/cpu.c @@ -500,7 +500,7 @@ static void __init cpu_dev_register_generic(void) #ifdef CONFIG_GENERIC_CPU_DEVICES int i;
- for_each_possible_cpu(i) { + for_each_present_cpu(i) { if (register_cpu(&per_cpu(cpu_devices, i), i)) panic("Failed to register CPU device"); }
architectures often have extra per-cpu work that needs doing before a CPU is registered, often to determine if a CPU is hotpluggable.
To allow more architectures to use GENERIC_CPU_DEVICES, wrap the call as a __weak arch_register_cpu(). This aligns with the way x86, ia64 and loongarch register hotplug CPUs when they become present.
ACPI's acpi_processor.c also has a __weak version of this symbol because arm64 doesn't define one. The duplicate __weak definitions are only a problem if arm64 selects GENERIC_CPU_DEVICES without defining one. This gets fixed up in later patches.
Signed-off-by: James Morse james.morse@arm.com --- arch/ia64/include/asm/cpu.h | 1 - arch/loongarch/include/asm/cpu.h | 1 - arch/x86/include/asm/cpu.h | 1 - drivers/base/cpu.c | 14 ++++++++++---- include/linux/cpu.h | 5 +++++ 5 files changed, 15 insertions(+), 7 deletions(-)
diff --git a/arch/ia64/include/asm/cpu.h b/arch/ia64/include/asm/cpu.h index db125df9e088..a3e690e685e5 100644 --- a/arch/ia64/include/asm/cpu.h +++ b/arch/ia64/include/asm/cpu.h @@ -16,7 +16,6 @@ DECLARE_PER_CPU(struct ia64_cpu, cpu_devices); DECLARE_PER_CPU(int, cpu_state);
#ifdef CONFIG_HOTPLUG_CPU -extern int arch_register_cpu(int num); extern void arch_unregister_cpu(int); #endif
diff --git a/arch/loongarch/include/asm/cpu.h b/arch/loongarch/include/asm/cpu.h index 96d006c59149..ee5703fa4121 100644 --- a/arch/loongarch/include/asm/cpu.h +++ b/arch/loongarch/include/asm/cpu.h @@ -126,7 +126,6 @@ enum cpu_type_enum {
#if !defined(__ASSEMBLY__) #ifdef CONFIG_HOTPLUG_CPU -int arch_register_cpu(int num); void arch_unregister_cpu(int num); #endif #endif /* ! __ASSEMBLY__ */ diff --git a/arch/x86/include/asm/cpu.h b/arch/x86/include/asm/cpu.h index 8cbf623f0ecf..f06b05843c45 100644 --- a/arch/x86/include/asm/cpu.h +++ b/arch/x86/include/asm/cpu.h @@ -28,7 +28,6 @@ struct x86_cpu { };
#ifdef CONFIG_HOTPLUG_CPU -extern int arch_register_cpu(int num); extern void arch_unregister_cpu(int); extern void start_cpu0(void); #ifdef CONFIG_DEBUG_HOTPLUG_CPU0 diff --git a/drivers/base/cpu.c b/drivers/base/cpu.c index cf6407c34ede..178936533d87 100644 --- a/drivers/base/cpu.c +++ b/drivers/base/cpu.c @@ -492,19 +492,25 @@ bool cpu_is_hotpluggable(unsigned int cpu) EXPORT_SYMBOL_GPL(cpu_is_hotpluggable);
#ifdef CONFIG_GENERIC_CPU_DEVICES -static DEFINE_PER_CPU(struct cpu, cpu_devices); +DEFINE_PER_CPU(struct cpu, cpu_devices); + +int __weak arch_register_cpu(int cpu) +{ + return register_cpu(&per_cpu(cpu_devices, cpu), cpu); +} #endif
static void __init cpu_dev_register_generic(void) { -#ifdef CONFIG_GENERIC_CPU_DEVICES int i;
+ if (!IS_ENABLED(CONFIG_GENERIC_CPU_DEVICES)) + return; + for_each_present_cpu(i) { - if (register_cpu(&per_cpu(cpu_devices, i), i)) + if (arch_register_cpu(i)) panic("Failed to register CPU device"); } -#endif }
#ifdef CONFIG_GENERIC_CPU_VULNERABILITIES diff --git a/include/linux/cpu.h b/include/linux/cpu.h index 314802f98b9d..86e79e702325 100644 --- a/include/linux/cpu.h +++ b/include/linux/cpu.h @@ -75,12 +75,17 @@ extern __printf(4, 5) struct device *cpu_device_create(struct device *parent, void *drvdata, const struct attribute_group **groups, const char *fmt, ...); +extern int arch_register_cpu(int cpu); #ifdef CONFIG_HOTPLUG_CPU extern void unregister_cpu(struct cpu *cpu); extern ssize_t arch_cpu_probe(const char *, size_t); extern ssize_t arch_cpu_release(const char *, size_t); #endif
+#ifdef CONFIG_GENERIC_CPU_DEVICES +DECLARE_PER_CPU(struct cpu, cpu_devices); +#endif + /* * These states are not related to the core CPU hotplug mechanism. They are * used by various (sub)architectures to track internal state
NUMA systems require the node descriptions to be ready before CPUs are registered. This is so that the node symlinks can be created in sysfs.
Currently none of these platforms use GENERIC_CPU_DEVICES.
Move node_dev_init() before cpu_dev_init() so that NUMA architectures can use GENERIC_CPU_DEVICES.
Signed-off-by: James Morse james.morse@arm.com --- drivers/base/init.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/base/init.c b/drivers/base/init.c index 397eb9880cec..05348dcec678 100644 --- a/drivers/base/init.c +++ b/drivers/base/init.c @@ -35,8 +35,8 @@ void __init driver_init(void) of_core_init(); platform_bus_init(); auxiliary_bus_init(); + node_dev_init(); cpu_dev_init(); memory_dev_init(); - node_dev_init(); container_dev_init(); }
To allow ACPI's _STA value to hide CPUs that are present, but not available to online right now due to VMM of firmware policy, the register_cpu() call needs to be made by the ACPI machinery when ACPI is in use.
Switching to GENERIC_CPU_DEVICES is an intermediate step to allow all four ACPI architectures to be modified at once.
Switch over to GENERIC_CPU_DEVICES, and provide an arch_register_cpu() that populates the hotpluggable flag. arch_register_cpu() is also the interface the ACPI machinery expects.
The struct cpu in struct cpuinfo_arm64 is never used directly, remove it to use the one GENERIC_CPU_DEVICES provides.
Signed-off-by: James Morse james.morse@arm.com --- arch/arm64/Kconfig | 1 + arch/arm64/include/asm/cpu.h | 1 - arch/arm64/kernel/setup.c | 13 ++++--------- 3 files changed, 5 insertions(+), 10 deletions(-)
diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig index 9fb9fff08c94..a8e5cdfda38e 100644 --- a/arch/arm64/Kconfig +++ b/arch/arm64/Kconfig @@ -124,6 +124,7 @@ config ARM64 select GENERIC_ARCH_TOPOLOGY select GENERIC_CLOCKEVENTS_BROADCAST select GENERIC_CPU_AUTOPROBE + select GENERIC_CPU_DEVICES select GENERIC_CPU_VULNERABILITIES select GENERIC_EARLY_IOREMAP select GENERIC_IDLE_POLL_SETUP diff --git a/arch/arm64/include/asm/cpu.h b/arch/arm64/include/asm/cpu.h index fd7a92219eea..eca1aea89e81 100644 --- a/arch/arm64/include/asm/cpu.h +++ b/arch/arm64/include/asm/cpu.h @@ -38,7 +38,6 @@ struct cpuinfo_32bit { };
struct cpuinfo_arm64 { - struct cpu cpu; struct kobject kobj; u64 reg_ctr; u64 reg_cntfrq; diff --git a/arch/arm64/kernel/setup.c b/arch/arm64/kernel/setup.c index fea3223704b6..6047f96e77ad 100644 --- a/arch/arm64/kernel/setup.c +++ b/arch/arm64/kernel/setup.c @@ -391,19 +391,14 @@ static inline bool cpu_can_disable(unsigned int cpu) return false; }
-static int __init topology_init(void) +int arch_register_cpu(int num) { - int i; + struct cpu *cpu = &per_cpu(cpu_devices, num);
- for_each_possible_cpu(i) { - struct cpu *cpu = &per_cpu(cpu_data.cpu, i); - cpu->hotpluggable = cpu_can_disable(i); - register_cpu(cpu, i); - } + cpu->hotpluggable = cpu_can_disable(num);
- return 0; + return register_cpu(cpu, num); } -subsys_initcall(topology_init);
static void dump_kernel_offset(void) {
ia64 has its own arch specific data structure for cpus: struct ia64_cpu. This has one member, making ia64's cpu_devices the same as that provided be GENERIC_CPU_DEVICES. ia64 craetes a percpu struct ia64_cpu called cpu_devices, which has no users. Instead it uses the struct ia64_cpu named sysfs_cpus allocated at boot.
Remove the arch specific structure allocation and initialisation. ia64's arch_register_cpu() now overrides the weak version from GENERIC_CPU_DEVICES, and uses the percpu cpu_devices defined by core code.
All uses of sysfs_cpus are changed to use the percpu cpu_devices.
This is an intermediate step to the logic being moved to drivers/acpi, where GENERIC_CPU_DEVICES will do the work when booting with acpi=off.
Signed-off-by: James Morse james.morse@arm.com --- arch/ia64/Kconfig | 1 + arch/ia64/include/asm/cpu.h | 6 ------ arch/ia64/kernel/topology.c | 35 +++++------------------------------ 3 files changed, 6 insertions(+), 36 deletions(-)
diff --git a/arch/ia64/Kconfig b/arch/ia64/Kconfig index 7a0ae73d81c8..0a60c049c67a 100644 --- a/arch/ia64/Kconfig +++ b/arch/ia64/Kconfig @@ -40,6 +40,7 @@ config IA64 select HAVE_FUNCTION_DESCRIPTORS select HAVE_VIRT_CPU_ACCOUNTING select HUGETLB_PAGE_SIZE_VARIABLE if HUGETLB_PAGE + select GENERIC_CPU_DEVICES select GENERIC_IRQ_PROBE select GENERIC_PENDING_IRQ if SMP select GENERIC_IRQ_SHOW diff --git a/arch/ia64/include/asm/cpu.h b/arch/ia64/include/asm/cpu.h index a3e690e685e5..6e9786c6ec98 100644 --- a/arch/ia64/include/asm/cpu.h +++ b/arch/ia64/include/asm/cpu.h @@ -7,12 +7,6 @@ #include <linux/topology.h> #include <linux/percpu.h>
-struct ia64_cpu { - struct cpu cpu; -}; - -DECLARE_PER_CPU(struct ia64_cpu, cpu_devices); - DECLARE_PER_CPU(int, cpu_state);
#ifdef CONFIG_HOTPLUG_CPU diff --git a/arch/ia64/kernel/topology.c b/arch/ia64/kernel/topology.c index 94a848b06f15..8f5cafde2bc9 100644 --- a/arch/ia64/kernel/topology.c +++ b/arch/ia64/kernel/topology.c @@ -26,8 +26,6 @@ #include <asm/numa.h> #include <asm/cpu.h>
-static struct ia64_cpu *sysfs_cpus; - void arch_fix_phys_package_id(int num, u32 slot) { #ifdef CONFIG_SMP @@ -41,50 +39,27 @@ EXPORT_SYMBOL_GPL(arch_fix_phys_package_id); #ifdef CONFIG_HOTPLUG_CPU int __ref arch_register_cpu(int num) { + struct cpu *cpu = &per_cpu(cpu_devices, num); + /* * If CPEI can be re-targeted or if this is not * CPEI target, then it is hotpluggable */ if (can_cpei_retarget() || !is_cpu_cpei_target(num)) - sysfs_cpus[num].cpu.hotpluggable = 1; + cpu->hotpluggable = 1; map_cpu_to_node(num, node_cpuid[num].nid); - return register_cpu(&sysfs_cpus[num].cpu, num); + return register_cpu(cpu, num); } EXPORT_SYMBOL(arch_register_cpu);
void __ref arch_unregister_cpu(int num) { - unregister_cpu(&sysfs_cpus[num].cpu); + unregister_cpu(&per_cpu(cpu_devices, num)); unmap_cpu_from_node(num, cpu_to_node(num)); } EXPORT_SYMBOL(arch_unregister_cpu); -#else -static int __init arch_register_cpu(int num) -{ - return register_cpu(&sysfs_cpus[num].cpu, num); -} #endif /*CONFIG_HOTPLUG_CPU*/
- -static int __init topology_init(void) -{ - int i, err = 0; - - sysfs_cpus = kcalloc(NR_CPUS, sizeof(struct ia64_cpu), GFP_KERNEL); - if (!sysfs_cpus) - panic("kzalloc in topology_init failed - NR_CPUS too big?"); - - for_each_present_cpu(i) { - if((err = arch_register_cpu(i))) - goto out; - } -out: - return err; -} - -subsys_initcall(topology_init); - - /* * Export cpu cache information through sysfs */
Now that GENERIC_CPU_DEVICES calls arch_register_cpu(), which can be overridden by the arch code, switch over to this to allow common code to choose when the register_cpu() call is made.
x86's struct cpus come from struct x86_cpu, which has no other members or users. Remove this and use the version defined by common code.
This is an intermediate step to the logic being moved to drivers/acpi, where GENERIC_CPU_DEVICES will do the work when booting with acpi=off.
Signed-off-by: James Morse james.morse@arm.com --- arch/x86/Kconfig | 1 + arch/x86/include/asm/cpu.h | 4 ---- arch/x86/kernel/topology.c | 19 +++---------------- 3 files changed, 4 insertions(+), 20 deletions(-)
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig index 6cd4ff7b2cab..c140c64c87c1 100644 --- a/arch/x86/Kconfig +++ b/arch/x86/Kconfig @@ -137,6 +137,7 @@ config X86 select GENERIC_CLOCKEVENTS_MIN_ADJUST select GENERIC_CMOS_UPDATE select GENERIC_CPU_AUTOPROBE + select GENERIC_CPU_DEVICES select GENERIC_CPU_VULNERABILITIES select GENERIC_EARLY_IOREMAP select GENERIC_ENTRY diff --git a/arch/x86/include/asm/cpu.h b/arch/x86/include/asm/cpu.h index f06b05843c45..207e48dabfd0 100644 --- a/arch/x86/include/asm/cpu.h +++ b/arch/x86/include/asm/cpu.h @@ -23,10 +23,6 @@ static inline void prefill_possible_map(void) {}
#endif /* CONFIG_SMP */
-struct x86_cpu { - struct cpu cpu; -}; - #ifdef CONFIG_HOTPLUG_CPU extern void arch_unregister_cpu(int); extern void start_cpu0(void); diff --git a/arch/x86/kernel/topology.c b/arch/x86/kernel/topology.c index 8617d1ed9d31..ff11fe82ad80 100644 --- a/arch/x86/kernel/topology.c +++ b/arch/x86/kernel/topology.c @@ -35,8 +35,6 @@ #include <asm/io_apic.h> #include <asm/cpu.h>
-static DEFINE_PER_CPU(struct x86_cpu, cpu_devices); - #ifdef CONFIG_HOTPLUG_CPU
#ifdef CONFIG_BOOTPARAM_HOTPLUG_CPU0 @@ -131,15 +129,15 @@ int arch_register_cpu(int num) } } if (num || cpu0_hotpluggable) - per_cpu(cpu_devices, num).cpu.hotpluggable = 1; + per_cpu(cpu_devices, num).hotpluggable = 1;
- return register_cpu(&per_cpu(cpu_devices, num).cpu, num); + return register_cpu(&per_cpu(cpu_devices, num), num); } EXPORT_SYMBOL(arch_register_cpu);
void arch_unregister_cpu(int num) { - unregister_cpu(&per_cpu(cpu_devices, num).cpu); + unregister_cpu(&per_cpu(cpu_devices, num)); } EXPORT_SYMBOL(arch_unregister_cpu); #else /* CONFIG_HOTPLUG_CPU */ @@ -149,14 +147,3 @@ static int __init arch_register_cpu(int num) return register_cpu(&per_cpu(cpu_devices, num).cpu, num); } #endif /* CONFIG_HOTPLUG_CPU */ - -static int __init topology_init(void) -{ - int i; - - for_each_present_cpu(i) - arch_register_cpu(i); - - return 0; -} -subsys_initcall(topology_init);
Now that GENERIC_CPU_DEVICES calls arch_register_cpu(), which can be overridden by the arch code, switch over to this to allow common code to choose when the register_cpu() call is made.
This allows topology_init() to be removed.
This is an intermediate step to the logic being moved to drivers/acpi, where GENERIC_CPU_DEVICES will do the work when booting with acpi=off.
Signed-off-by: James Morse james.morse@arm.com --- arch/loongarch/Kconfig | 1 + arch/loongarch/kernel/topology.c | 22 +--------------------- 2 files changed, 2 insertions(+), 21 deletions(-)
diff --git a/arch/loongarch/Kconfig b/arch/loongarch/Kconfig index 9788233d4fab..fbcd9bd9d781 100644 --- a/arch/loongarch/Kconfig +++ b/arch/loongarch/Kconfig @@ -59,6 +59,7 @@ config LOONGARCH select GENERIC_CLOCKEVENTS select GENERIC_CMOS_UPDATE select GENERIC_CPU_AUTOPROBE + select GENERIC_CPU_DEVICES select GENERIC_ENTRY select GENERIC_GETTIMEOFDAY select GENERIC_IRQ_MULTI_HANDLER diff --git a/arch/loongarch/kernel/topology.c b/arch/loongarch/kernel/topology.c index ab1a75c0b5a6..bfc1f199d07b 100644 --- a/arch/loongarch/kernel/topology.c +++ b/arch/loongarch/kernel/topology.c @@ -6,15 +6,13 @@ #include <linux/nodemask.h> #include <linux/percpu.h>
-static DEFINE_PER_CPU(struct cpu, cpu_devices); - #ifdef CONFIG_HOTPLUG_CPU int arch_register_cpu(int cpu) { int ret; struct cpu *c = &per_cpu(cpu_devices, cpu);
- c->hotpluggable = 1; + c->hotpluggable = !!cpu; ret = register_cpu(c, cpu); if (ret < 0) pr_warn("register_cpu %d failed (%d)\n", cpu, ret); @@ -32,21 +30,3 @@ void arch_unregister_cpu(int cpu) } EXPORT_SYMBOL(arch_unregister_cpu); #endif - -static int __init topology_init(void) -{ - int i, ret; - - for_each_present_cpu(i) { - struct cpu *c = &per_cpu(cpu_devices, i); - - c->hotpluggable = !!i; - ret = register_cpu(c, i); - if (ret < 0) - pr_warn("topology_init: register_cpu %d failed (%d)\n", i, ret); - } - - return 0; -} - -subsys_initcall(topology_init);
To allow ACPI to skip the call to arch_register_cpu() when the _STA value indicates the CPU can't be brought online right now, move the arch_register_cpu() call into acpi_processor_get_info().
Systems can still be booted with 'acpi=off', or in the case of arm64, not include an ACPI description at all. For these, the CPUs are registered by cpu_dev_register_generic().
This moves the CPU register logic back to a subsys_initcall(), while the memory nodes will have been registered earlier.
Signed-off-by: James Morse james.morse@arm.com --- drivers/acpi/acpi_processor.c | 7 +++++++ drivers/base/cpu.c | 2 +- 2 files changed, 8 insertions(+), 1 deletion(-)
diff --git a/drivers/acpi/acpi_processor.c b/drivers/acpi/acpi_processor.c index 16b314340e68..4507934c45cd 100644 --- a/drivers/acpi/acpi_processor.c +++ b/drivers/acpi/acpi_processor.c @@ -272,6 +272,13 @@ static int acpi_processor_get_info(struct acpi_device *device) pr->id = 0; }
+ if (!invalid_logical_cpuid(pr->id) && cpu_present(pr->id)) { + int ret = arch_register_cpu(pr->id); + + if (ret) + return ret; + } + /* * Extra Processor objects may be enumerated on MP systems with * less than the max # of CPUs. They should be ignored _iff diff --git a/drivers/base/cpu.c b/drivers/base/cpu.c index 178936533d87..0ba646022a5e 100644 --- a/drivers/base/cpu.c +++ b/drivers/base/cpu.c @@ -504,7 +504,7 @@ static void __init cpu_dev_register_generic(void) { int i;
- if (!IS_ENABLED(CONFIG_GENERIC_CPU_DEVICES)) + if (!IS_ENABLED(CONFIG_GENERIC_CPU_DEVICES) || !acpi_disabled) return;
for_each_present_cpu(i) {
The code behind ACPI_HOTPLUG_CPU allows a not-present CPU to become present. This isn't the only use of HOTPLUG_CPU. On arm64 an offline CPU may be disabled by firmware, preventing it from being brought back online, but it remains present throughout.
Adding code to prevent user-space trying to online these disabled CPUs needs some additional terminology.
Rename the Kconfig symbol CONFIG_ACPI_HOTPLUG_PRESENT_CPU to reflect that it makes possible CPUs present.
Signed-off-by: James Morse james.morse@arm.com --- arch/ia64/Kconfig | 2 +- arch/ia64/include/asm/acpi.h | 2 +- arch/ia64/kernel/acpi.c | 6 +++--- arch/ia64/kernel/setup.c | 2 +- arch/x86/Kconfig | 2 +- arch/x86/kernel/acpi/boot.c | 4 ++-- drivers/acpi/Kconfig | 4 ++-- drivers/acpi/acpi_processor.c | 10 +++++----- include/linux/acpi.h | 6 +++--- 9 files changed, 19 insertions(+), 19 deletions(-)
diff --git a/arch/ia64/Kconfig b/arch/ia64/Kconfig index 0a60c049c67a..d4eb8c8af2d9 100644 --- a/arch/ia64/Kconfig +++ b/arch/ia64/Kconfig @@ -15,7 +15,7 @@ config IA64 select ARCH_MIGHT_HAVE_PC_PARPORT select ARCH_MIGHT_HAVE_PC_SERIO select ACPI - select ACPI_HOTPLUG_CPU if ACPI + select ACPI_HOTPLUG_PRESENT_CPU if ACPI select ACPI_NUMA if NUMA select ARCH_ENABLE_MEMORY_HOTPLUG select ARCH_ENABLE_MEMORY_HOTREMOVE diff --git a/arch/ia64/include/asm/acpi.h b/arch/ia64/include/asm/acpi.h index 87927eb824cc..2fcdfd4ecce8 100644 --- a/arch/ia64/include/asm/acpi.h +++ b/arch/ia64/include/asm/acpi.h @@ -52,7 +52,7 @@ extern unsigned int is_cpu_cpei_target(unsigned int cpu); extern void set_cpei_target_cpu(unsigned int cpu); extern unsigned int get_cpei_target_cpu(void); extern void prefill_possible_map(void); -#ifdef CONFIG_ACPI_HOTPLUG_CPU +#ifdef CONFIG_ACPI_HOTPLUG_PRESENT_CPU extern int additional_cpus; #else #define additional_cpus 0 diff --git a/arch/ia64/kernel/acpi.c b/arch/ia64/kernel/acpi.c index 96d13cb7c19f..38b3155ce8d9 100644 --- a/arch/ia64/kernel/acpi.c +++ b/arch/ia64/kernel/acpi.c @@ -194,7 +194,7 @@ acpi_parse_plat_int_src(union acpi_subtable_headers * header, return 0; }
-#ifdef CONFIG_HOTPLUG_CPU +#ifdef CONFIG_ACPI_HOTPLUG_PRESENT_CPU unsigned int can_cpei_retarget(void) { extern int cpe_vector; @@ -711,7 +711,7 @@ int acpi_isa_irq_to_gsi(unsigned isa_irq, u32 *gsi) /* * ACPI based hotplug CPU support */ -#ifdef CONFIG_ACPI_HOTPLUG_CPU +#ifdef CONFIG_ACPI_HOTPLUG_PRESENT_CPU int acpi_map_cpu2node(acpi_handle handle, int cpu, int physid) { #ifdef CONFIG_ACPI_NUMA @@ -822,7 +822,7 @@ int acpi_unmap_cpu(int cpu) return (0); } EXPORT_SYMBOL(acpi_unmap_cpu); -#endif /* CONFIG_ACPI_HOTPLUG_CPU */ +#endif /* CONFIG_ACPI_HOTPLUG_PRESENT_CPU */
#ifdef CONFIG_ACPI_NUMA static acpi_status acpi_map_iosapic(acpi_handle handle, u32 depth, diff --git a/arch/ia64/kernel/setup.c b/arch/ia64/kernel/setup.c index fd6301eafa9d..874b4104d99c 100644 --- a/arch/ia64/kernel/setup.c +++ b/arch/ia64/kernel/setup.c @@ -569,7 +569,7 @@ setup_arch (char **cmdline_p) #ifdef CONFIG_ACPI_NUMA acpi_numa_init(); acpi_numa_fixup(); -#ifdef CONFIG_ACPI_HOTPLUG_CPU +#ifdef CONFIG_ACPI_HOTPLUG_PRESENT_CPU prefill_possible_map(); #endif per_cpu_scan_finalize((cpumask_empty(&early_cpu_possible_map) ? diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig index c140c64c87c1..6f6e4ac14bbf 100644 --- a/arch/x86/Kconfig +++ b/arch/x86/Kconfig @@ -59,7 +59,7 @@ config X86 # select ACPI_LEGACY_TABLES_LOOKUP if ACPI select ACPI_SYSTEM_POWER_STATES_SUPPORT if ACPI - select ACPI_HOTPLUG_CPU if ACPI + select ACPI_HOTPLUG_PRESENT_CPU if ACPI select ARCH_32BIT_OFF_T if X86_32 select ARCH_CLOCKSOURCE_INIT select ARCH_CORRECT_STACKTRACE_ON_KRETPROBE diff --git a/arch/x86/kernel/acpi/boot.c b/arch/x86/kernel/acpi/boot.c index 907cc98b1938..5a4869d6cd87 100644 --- a/arch/x86/kernel/acpi/boot.c +++ b/arch/x86/kernel/acpi/boot.c @@ -796,7 +796,7 @@ static void __init acpi_set_irq_model_ioapic(void) /* * ACPI based hotplug support for CPU */ -#ifdef CONFIG_ACPI_HOTPLUG_CPU +#ifdef CONFIG_ACPI_HOTPLUG_PRESENT_CPU #include <acpi/processor.h>
static int acpi_map_cpu2node(acpi_handle handle, int cpu, int physid) @@ -845,7 +845,7 @@ int acpi_unmap_cpu(int cpu) return (0); } EXPORT_SYMBOL(acpi_unmap_cpu); -#endif /* CONFIG_ACPI_HOTPLUG_CPU */ +#endif /* CONFIG_ACPI_HOTPLUG_PRESENT_CPU */
int acpi_register_ioapic(acpi_handle handle, u64 phys_addr, u32 gsi_base) { diff --git a/drivers/acpi/Kconfig b/drivers/acpi/Kconfig index 411601f34ffd..fdcc407abc30 100644 --- a/drivers/acpi/Kconfig +++ b/drivers/acpi/Kconfig @@ -307,7 +307,7 @@ config ACPI_IPMI To compile this driver as a module, choose M here: the module will be called as acpi_ipmi.
-config ACPI_HOTPLUG_CPU +config ACPI_HOTPLUG_PRESENT_CPU bool depends on ACPI_PROCESSOR && HOTPLUG_CPU select ACPI_CONTAINER @@ -402,7 +402,7 @@ config ACPI_PCI_SLOT
config ACPI_CONTAINER bool "Container and Module Devices" - default (ACPI_HOTPLUG_MEMORY || ACPI_HOTPLUG_CPU) + default (ACPI_HOTPLUG_MEMORY || ACPI_HOTPLUG_PRESENT_CPU) help This driver supports ACPI Container and Module devices (IDs ACPI0004, PNP0A05, and PNP0A06). diff --git a/drivers/acpi/acpi_processor.c b/drivers/acpi/acpi_processor.c index 4507934c45cd..6d6da38b730c 100644 --- a/drivers/acpi/acpi_processor.c +++ b/drivers/acpi/acpi_processor.c @@ -149,7 +149,7 @@ static int acpi_processor_errata(void) }
/* Initialization */ -#ifdef CONFIG_ACPI_HOTPLUG_CPU +#ifdef CONFIG_ACPI_HOTPLUG_PRESENT_CPU static int acpi_processor_hotadd_init(struct acpi_processor *pr) { unsigned long long sta; @@ -194,7 +194,7 @@ static inline int acpi_processor_hotadd_init(struct acpi_processor *pr) { return -ENODEV; } -#endif /* CONFIG_ACPI_HOTPLUG_CPU */ +#endif /* CONFIG_ACPI_HOTPLUG_PRESENT_CPU */
static int acpi_processor_get_info(struct acpi_device *device) { @@ -414,7 +414,7 @@ static int acpi_processor_add(struct acpi_device *device, return result; }
-#ifdef CONFIG_ACPI_HOTPLUG_CPU +#ifdef CONFIG_ACPI_HOTPLUG_PRESENT_CPU /* Removal */ static void acpi_processor_remove(struct acpi_device *device) { @@ -458,7 +458,7 @@ static void acpi_processor_remove(struct acpi_device *device) free_cpumask_var(pr->throttling.shared_cpu_map); kfree(pr); } -#endif /* CONFIG_ACPI_HOTPLUG_CPU */ +#endif /* CONFIG_ACPI_HOTPLUG_PRESENT_CPU */
#ifdef CONFIG_X86 static bool acpi_hwp_native_thermal_lvt_set; @@ -527,7 +527,7 @@ static const struct acpi_device_id processor_device_ids[] = { static struct acpi_scan_handler processor_handler = { .ids = processor_device_ids, .attach = acpi_processor_add, -#ifdef CONFIG_ACPI_HOTPLUG_CPU +#ifdef CONFIG_ACPI_HOTPLUG_PRESENT_CPU .detach = acpi_processor_remove, #endif .hotplug = { diff --git a/include/linux/acpi.h b/include/linux/acpi.h index 6f64b2f3dc54..82c31a23ac3c 100644 --- a/include/linux/acpi.h +++ b/include/linux/acpi.h @@ -324,12 +324,12 @@ static inline int acpi_processor_evaluate_cst(acpi_handle handle, u32 cpu, } #endif
-#ifdef CONFIG_ACPI_HOTPLUG_CPU +#ifdef CONFIG_ACPI_HOTPLUG_PRESENT_CPU /* Arch dependent functions for cpu hotplug support */ int acpi_map_cpu(acpi_handle handle, phys_cpuid_t physid, u32 acpi_id, int *pcpu); int acpi_unmap_cpu(int cpu); -#endif /* CONFIG_ACPI_HOTPLUG_CPU */ +#endif /* CONFIG_ACPI_HOTPLUG_PRESENT_CPU */
#ifdef CONFIG_ACPI_HOTPLUG_IOAPIC int acpi_get_ioapic_id(acpi_handle handle, u32 gsi_base, u64 *phys_addr); @@ -650,7 +650,7 @@ static inline u32 acpi_osc_ctx_get_cxl_control(struct acpi_osc_context *context) #define ACPI_GSB_ACCESS_ATTRIB_RAW_PROCESS 0x0000000F
/* Enable _OST when all relevant hotplug operations are enabled */ -#if defined(CONFIG_ACPI_HOTPLUG_CPU) && \ +#if defined(CONFIG_ACPI_HOTPLUG_PRESENT_CPU) && \ defined(CONFIG_ACPI_HOTPLUG_MEMORY) && \ defined(CONFIG_ACPI_CONTAINER) #define ACPI_HOTPLUG_OST
acpi_processor_hotadd_init() will make a CPU present by mapping it based on its hardware id.
'hotadd_init' is ambiguous once there are two different behaviours for cpu hotplug. This is for toggling the _STA present bit. Subsequent patches will add support for toggling the _STA enabled bit, named acpi_processor_make_enabled().
Rename it acpi_processor_make_present() to make it clear this is for CPUs that were not previously present.
Expose the function prototypes it uses to allow the preprocessor guards to be removed. The IS_ENABLED() check will let the compiler dead-code elimination pass remove this if it isn't going to be used.
Signed-off-by: James Morse james.morse@arm.com --- drivers/acpi/acpi_processor.c | 14 +++++--------- include/linux/acpi.h | 2 -- 2 files changed, 5 insertions(+), 11 deletions(-)
diff --git a/drivers/acpi/acpi_processor.c b/drivers/acpi/acpi_processor.c index 6d6da38b730c..30a03c2cb464 100644 --- a/drivers/acpi/acpi_processor.c +++ b/drivers/acpi/acpi_processor.c @@ -149,13 +149,15 @@ static int acpi_processor_errata(void) }
/* Initialization */ -#ifdef CONFIG_ACPI_HOTPLUG_PRESENT_CPU -static int acpi_processor_hotadd_init(struct acpi_processor *pr) +static int acpi_processor_make_present(struct acpi_processor *pr) { unsigned long long sta; acpi_status status; int ret;
+ if (!IS_ENABLED(CONFIG_ACPI_HOTPLUG_PRESENT_CPU)) + return -ENODEV; + if (invalid_phys_cpuid(pr->phys_id)) return -ENODEV;
@@ -189,12 +191,6 @@ static int acpi_processor_hotadd_init(struct acpi_processor *pr) cpu_maps_update_done(); return ret; } -#else -static inline int acpi_processor_hotadd_init(struct acpi_processor *pr) -{ - return -ENODEV; -} -#endif /* CONFIG_ACPI_HOTPLUG_PRESENT_CPU */
static int acpi_processor_get_info(struct acpi_device *device) { @@ -288,7 +284,7 @@ static int acpi_processor_get_info(struct acpi_device *device) * because cpuid <-> apicid mapping is persistent now. */ if (invalid_logical_cpuid(pr->id) || !cpu_present(pr->id)) { - int ret = acpi_processor_hotadd_init(pr); + int ret = acpi_processor_make_present(pr);
if (ret) return ret; diff --git a/include/linux/acpi.h b/include/linux/acpi.h index 82c31a23ac3c..34ce32126fd9 100644 --- a/include/linux/acpi.h +++ b/include/linux/acpi.h @@ -324,12 +324,10 @@ static inline int acpi_processor_evaluate_cst(acpi_handle handle, u32 cpu, } #endif
-#ifdef CONFIG_ACPI_HOTPLUG_PRESENT_CPU /* Arch dependent functions for cpu hotplug support */ int acpi_map_cpu(acpi_handle handle, phys_cpuid_t physid, u32 acpi_id, int *pcpu); int acpi_unmap_cpu(int cpu); -#endif /* CONFIG_ACPI_HOTPLUG_PRESENT_CPU */
#ifdef CONFIG_ACPI_HOTPLUG_IOAPIC int acpi_get_ioapic_id(acpi_handle handle, u32 gsi_base, u64 *phys_addr);
When called acpi_processor_remove() unconditionally make a CPU not-present and unregisters it.
To add support for AML events where the CPU has become disabled, but remains present, the _STA method should be checked before calling acpi_processor_remove().
Rename acpi_processor_remove() acpi_processor_remove_possible(), and check the _STA before calling.
Adding the function prototype for arch_unregister_cpu() allows the preprocessor guards to be removed.
Signed-off-by: James Morse james.morse@arm.com --- drivers/acpi/acpi_processor.c | 31 +++++++++++++++++++++++++------ include/linux/cpu.h | 1 + 2 files changed, 26 insertions(+), 6 deletions(-)
diff --git a/drivers/acpi/acpi_processor.c b/drivers/acpi/acpi_processor.c index 30a03c2cb464..66b695ca7945 100644 --- a/drivers/acpi/acpi_processor.c +++ b/drivers/acpi/acpi_processor.c @@ -410,13 +410,12 @@ static int acpi_processor_add(struct acpi_device *device, return result; }
-#ifdef CONFIG_ACPI_HOTPLUG_PRESENT_CPU /* Removal */ -static void acpi_processor_remove(struct acpi_device *device) +static void acpi_processor_make_not_present(struct acpi_device *device) { struct acpi_processor *pr;
- if (!device || !acpi_driver_data(device)) + if (!IS_ENABLED(CONFIG_ACPI_HOTPLUG_PRESENT_CPU)) return;
pr = acpi_driver_data(device); @@ -454,7 +453,29 @@ static void acpi_processor_remove(struct acpi_device *device) free_cpumask_var(pr->throttling.shared_cpu_map); kfree(pr); } -#endif /* CONFIG_ACPI_HOTPLUG_PRESENT_CPU */ + +static void acpi_processor_remove(struct acpi_device *device) +{ + struct acpi_processor *pr; + unsigned long long sta; + acpi_status status; + + if (!device) + return; + + pr = acpi_driver_data(device); + if (!pr || pr->id >= nr_cpu_ids || invalid_phys_cpuid(pr->phys_id)) + return; + + status = acpi_evaluate_integer(pr->handle, "_STA", NULL, &sta); + if (ACPI_FAILURE(status)) + return; + + if (cpu_present(pr->id) && !(sta & ACPI_STA_DEVICE_PRESENT)) { + acpi_processor_make_not_present(device); + return; + } +}
#ifdef CONFIG_X86 static bool acpi_hwp_native_thermal_lvt_set; @@ -523,9 +544,7 @@ static const struct acpi_device_id processor_device_ids[] = { static struct acpi_scan_handler processor_handler = { .ids = processor_device_ids, .attach = acpi_processor_add, -#ifdef CONFIG_ACPI_HOTPLUG_PRESENT_CPU .detach = acpi_processor_remove, -#endif .hotplug = { .enabled = true, }, diff --git a/include/linux/cpu.h b/include/linux/cpu.h index 86e79e702325..f6f198a3972b 100644 --- a/include/linux/cpu.h +++ b/include/linux/cpu.h @@ -76,6 +76,7 @@ struct device *cpu_device_create(struct device *parent, void *drvdata, const struct attribute_group **groups, const char *fmt, ...); extern int arch_register_cpu(int cpu); +extern void arch_unregister_cpu(int cpu); #ifdef CONFIG_HOTPLUG_CPU extern void unregister_cpu(struct cpu *cpu); extern ssize_t arch_cpu_probe(const char *, size_t);
ACPI firmware can trigger the events to add and remove CPUs, but the OS may not support this.
Print a warning when this happens.
This gives early warning on arm64 systems that don't support CONFIG_ACPI_HOTPLUG_PRESENT_CPU, as making CPUs not present has side effects for other parts of the system.
Signed-off-by: James Morse james.morse@arm.com --- arch/loongarch/include/asm/cpu.h | 2 +- drivers/acpi/acpi_processor.c | 8 ++++++-- 2 files changed, 7 insertions(+), 3 deletions(-)
diff --git a/arch/loongarch/include/asm/cpu.h b/arch/loongarch/include/asm/cpu.h index ee5703fa4121..1e2c7c61dbea 100644 --- a/arch/loongarch/include/asm/cpu.h +++ b/arch/loongarch/include/asm/cpu.h @@ -126,7 +126,7 @@ enum cpu_type_enum {
#if !defined(__ASSEMBLY__) #ifdef CONFIG_HOTPLUG_CPU -void arch_unregister_cpu(int num); +extern void arch_unregister_cpu(int); #endif #endif /* ! __ASSEMBLY__ */
diff --git a/drivers/acpi/acpi_processor.c b/drivers/acpi/acpi_processor.c index 66b695ca7945..1bd6e4b8ab66 100644 --- a/drivers/acpi/acpi_processor.c +++ b/drivers/acpi/acpi_processor.c @@ -155,8 +155,10 @@ static int acpi_processor_make_present(struct acpi_processor *pr) acpi_status status; int ret;
- if (!IS_ENABLED(CONFIG_ACPI_HOTPLUG_PRESENT_CPU)) + if (!IS_ENABLED(CONFIG_ACPI_HOTPLUG_PRESENT_CPU)) { + pr_err_once("Changing CPU present bit is not supported"); return -ENODEV; + }
if (invalid_phys_cpuid(pr->phys_id)) return -ENODEV; @@ -415,8 +417,10 @@ static void acpi_processor_make_not_present(struct acpi_device *device) { struct acpi_processor *pr;
- if (!IS_ENABLED(CONFIG_ACPI_HOTPLUG_PRESENT_CPU)) + if (!IS_ENABLED(CONFIG_ACPI_HOTPLUG_PRESENT_CPU)) { + pr_err_once("Changing CPU present bit is not supported"); return; + }
pr = acpi_driver_data(device); if (pr->id >= nr_cpu_ids)
Add arch_unregister_cpu() to allow the ACPI machinery to call unregister_cpu(). This is enough for arm64, (and shortly loongarch) but needs to be overridden by x86 and ia64 who need to do more work.
CC: Jean-Philippe Brucker jean-philippe@linaro.org Signed-off-by: James Morse james.morse@arm.com --- arch/ia64/include/asm/cpu.h | 4 ---- arch/loongarch/include/asm/cpu.h | 6 ------ arch/x86/include/asm/cpu.h | 1 - drivers/base/cpu.c | 5 +++++ 4 files changed, 5 insertions(+), 11 deletions(-)
diff --git a/arch/ia64/include/asm/cpu.h b/arch/ia64/include/asm/cpu.h index 6e9786c6ec98..3b36c6a382bb 100644 --- a/arch/ia64/include/asm/cpu.h +++ b/arch/ia64/include/asm/cpu.h @@ -9,8 +9,4 @@
DECLARE_PER_CPU(int, cpu_state);
-#ifdef CONFIG_HOTPLUG_CPU -extern void arch_unregister_cpu(int); -#endif - #endif /* _ASM_IA64_CPU_H_ */ diff --git a/arch/loongarch/include/asm/cpu.h b/arch/loongarch/include/asm/cpu.h index 1e2c7c61dbea..754f28506791 100644 --- a/arch/loongarch/include/asm/cpu.h +++ b/arch/loongarch/include/asm/cpu.h @@ -124,10 +124,4 @@ enum cpu_type_enum { #define LOONGARCH_CPU_GUESTID BIT_ULL(CPU_FEATURE_GUESTID) #define LOONGARCH_CPU_HYPERVISOR BIT_ULL(CPU_FEATURE_HYPERVISOR)
-#if !defined(__ASSEMBLY__) -#ifdef CONFIG_HOTPLUG_CPU -extern void arch_unregister_cpu(int); -#endif -#endif /* ! __ASSEMBLY__ */ - #endif /* _ASM_CPU_H */ diff --git a/arch/x86/include/asm/cpu.h b/arch/x86/include/asm/cpu.h index 207e48dabfd0..580ada2564bb 100644 --- a/arch/x86/include/asm/cpu.h +++ b/arch/x86/include/asm/cpu.h @@ -24,7 +24,6 @@ static inline void prefill_possible_map(void) {} #endif /* CONFIG_SMP */
#ifdef CONFIG_HOTPLUG_CPU -extern void arch_unregister_cpu(int); extern void start_cpu0(void); #ifdef CONFIG_DEBUG_HOTPLUG_CPU0 extern int _debug_hotplug_cpu(int cpu, int action); diff --git a/drivers/base/cpu.c b/drivers/base/cpu.c index 0ba646022a5e..bc2ce8c7f383 100644 --- a/drivers/base/cpu.c +++ b/drivers/base/cpu.c @@ -498,6 +498,11 @@ int __weak arch_register_cpu(int cpu) { return register_cpu(&per_cpu(cpu_devices, cpu), cpu); } + +void __weak arch_unregister_cpu(int num) +{ + unregister_cpu(&per_cpu(cpu_devices, num)); +} #endif
static void __init cpu_dev_register_generic(void)
LoongArch provides its own arch_unregister_cpu(). This clears the hotpluggable flag, then unregisters the CPU.
It isn't necessary to clear the hotpluggable flag when unregistering a cpu. unregister_cpu() writes NULL to the percpu cpu_sys_devices pointer, meaning cpu_is_hotpluggable() will return false, as get_cpu_device() has returned NULL.
Remove arch_unregister_cpu() and use the __weak version.
Signed-off-by: James Morse james.morse@arm.com --- arch/loongarch/kernel/topology.c | 9 --------- 1 file changed, 9 deletions(-)
diff --git a/arch/loongarch/kernel/topology.c b/arch/loongarch/kernel/topology.c index bfc1f199d07b..64798bdbef8c 100644 --- a/arch/loongarch/kernel/topology.c +++ b/arch/loongarch/kernel/topology.c @@ -20,13 +20,4 @@ int arch_register_cpu(int cpu) return ret; } EXPORT_SYMBOL(arch_register_cpu); - -void arch_unregister_cpu(int cpu) -{ - struct cpu *c = &per_cpu(cpu_devices, cpu); - - c->hotpluggable = 0; - unregister_cpu(c); -} -EXPORT_SYMBOL(arch_unregister_cpu); #endif
ACPI identifies CPUs by UID. get_cpu_for_acpi_id() maps the ACPI UID to the linux CPU number.
The helper to retrieve this mapping is only available in arm64's numa code.
Move it to live next to get_acpi_id_for_cpu().
Signed-off-by: James Morse james.morse@arm.com --- arch/arm64/include/asm/acpi.h | 11 +++++++++++ arch/arm64/kernel/acpi_numa.c | 11 ----------- 2 files changed, 11 insertions(+), 11 deletions(-)
diff --git a/arch/arm64/include/asm/acpi.h b/arch/arm64/include/asm/acpi.h index bd68e1b7f29f..0d1da93a5bad 100644 --- a/arch/arm64/include/asm/acpi.h +++ b/arch/arm64/include/asm/acpi.h @@ -97,6 +97,17 @@ static inline u32 get_acpi_id_for_cpu(unsigned int cpu) return acpi_cpu_get_madt_gicc(cpu)->uid; }
+static inline int get_cpu_for_acpi_id(u32 uid) +{ + int cpu; + + for (cpu = 0; cpu < nr_cpu_ids; cpu++) + if (uid == get_acpi_id_for_cpu(cpu)) + return cpu; + + return -EINVAL; +} + static inline void arch_fix_phys_package_id(int num, u32 slot) { } void __init acpi_init_cpus(void); int apei_claim_sea(struct pt_regs *regs); diff --git a/arch/arm64/kernel/acpi_numa.c b/arch/arm64/kernel/acpi_numa.c index e51535a5f939..0c036a9a3c33 100644 --- a/arch/arm64/kernel/acpi_numa.c +++ b/arch/arm64/kernel/acpi_numa.c @@ -34,17 +34,6 @@ int __init acpi_numa_get_nid(unsigned int cpu) return acpi_early_node_map[cpu]; }
-static inline int get_cpu_for_acpi_id(u32 uid) -{ - int cpu; - - for (cpu = 0; cpu < nr_cpu_ids; cpu++) - if (uid == get_acpi_id_for_cpu(cpu)) - return cpu; - - return -EINVAL; -} - static int __init acpi_parse_gicc_pxm(union acpi_subtable_headers *header, const unsigned long end) {
Add the new flag field to the MADT's GICC structure.
'Online Capable' indicates a disabled CPU can be enabled later.
Signed-off-by: James Morse james.morse@arm.com --- This patch probable needs to go via the upstream acpica project, but is included here so the feature can be testd. --- include/acpi/actbl2.h | 1 + 1 file changed, 1 insertion(+)
diff --git a/include/acpi/actbl2.h b/include/acpi/actbl2.h index 655102bc6d14..aad34fc3a845 100644 --- a/include/acpi/actbl2.h +++ b/include/acpi/actbl2.h @@ -1014,6 +1014,7 @@ struct acpi_madt_generic_interrupt { /* ACPI_MADT_ENABLED (1) Processor is usable if set */ #define ACPI_MADT_PERFORMANCE_IRQ_MODE (1<<1) /* 01: Performance Interrupt Mode */ #define ACPI_MADT_VGIC_IRQ_MODE (1<<2) /* 02: VGIC Maintenance Interrupt mode */ +#define ACPI_MADT_GICC_CPU_CAPABLE (1<<3) /* 03: CPU is online capable */
/* 12: Generic Distributor (ACPI 5.0 + ACPI 6.0 changes) */
ACPI, irqchip and the architecture code all inspect the MADT enabled bit for a GICC entry in the MADT.
The addition of an 'online capable' bit means all these sites need updating.
Move the current checks behind a helper to make future updates easier.
Signed-off-by: James Morse james.morse@arm.com --- arch/arm64/kernel/smp.c | 2 +- drivers/acpi/processor_core.c | 2 +- drivers/irqchip/irq-gic-v3.c | 10 ++++------ include/linux/acpi.h | 5 +++++ 4 files changed, 11 insertions(+), 8 deletions(-)
diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c index ffc5d76cf695..5669b013c2b7 100644 --- a/arch/arm64/kernel/smp.c +++ b/arch/arm64/kernel/smp.c @@ -525,7 +525,7 @@ acpi_map_gic_cpu_interface(struct acpi_madt_generic_interrupt *processor) { u64 hwid = processor->arm_mpidr;
- if (!(processor->flags & ACPI_MADT_ENABLED)) { + if (!acpi_gicc_is_usable(processor)) { pr_debug("skipping disabled CPU entry with 0x%llx MPIDR\n", hwid); return; } diff --git a/drivers/acpi/processor_core.c b/drivers/acpi/processor_core.c index 2ac48cda5b20..1ba273622faa 100644 --- a/drivers/acpi/processor_core.c +++ b/drivers/acpi/processor_core.c @@ -90,7 +90,7 @@ static int map_gicc_mpidr(struct acpi_subtable_header *entry, struct acpi_madt_generic_interrupt *gicc = container_of(entry, struct acpi_madt_generic_interrupt, header);
- if (!(gicc->flags & ACPI_MADT_ENABLED)) + if (!acpi_gicc_is_usable(gicc)) return -ENODEV;
/* device_declaration means Device object in DSDT, in the diff --git a/drivers/irqchip/irq-gic-v3.c b/drivers/irqchip/irq-gic-v3.c index 262658fd5f9e..5cf2470c6258 100644 --- a/drivers/irqchip/irq-gic-v3.c +++ b/drivers/irqchip/irq-gic-v3.c @@ -2188,8 +2188,7 @@ gic_acpi_parse_madt_gicc(union acpi_subtable_headers *header, u32 size = reg == GIC_PIDR2_ARCH_GICv4 ? SZ_64K * 4 : SZ_64K * 2; void __iomem *redist_base;
- /* GICC entry which has !ACPI_MADT_ENABLED is not unusable so skip */ - if (!(gicc->flags & ACPI_MADT_ENABLED)) + if (!acpi_gicc_is_usable(gicc)) return 0;
redist_base = ioremap(gicc->gicr_base_address, size); @@ -2239,7 +2238,7 @@ static int __init gic_acpi_match_gicc(union acpi_subtable_headers *header, * If GICC is enabled and has valid gicr base address, then it means * GICR base is presented via GICC */ - if ((gicc->flags & ACPI_MADT_ENABLED) && gicc->gicr_base_address) { + if (acpi_gicc_is_usable(gicc) && gicc->gicr_base_address) { acpi_data.enabled_rdists++; return 0; } @@ -2248,7 +2247,7 @@ static int __init gic_acpi_match_gicc(union acpi_subtable_headers *header, * It's perfectly valid firmware can pass disabled GICC entry, driver * should not treat as errors, skip the entry instead of probe fail. */ - if (!(gicc->flags & ACPI_MADT_ENABLED)) + if (!acpi_gicc_is_usable(gicc)) return 0;
return -ENODEV; @@ -2307,8 +2306,7 @@ static int __init gic_acpi_parse_virt_madt_gicc(union acpi_subtable_headers *hea int maint_irq_mode; static int first_madt = true;
- /* Skip unusable CPUs */ - if (!(gicc->flags & ACPI_MADT_ENABLED)) + if (!acpi_gicc_is_usable(gicc)) return 0;
maint_irq_mode = (gicc->flags & ACPI_MADT_VGIC_IRQ_MODE) ? diff --git a/include/linux/acpi.h b/include/linux/acpi.h index 34ce32126fd9..358d9b971de8 100644 --- a/include/linux/acpi.h +++ b/include/linux/acpi.h @@ -267,6 +267,11 @@ acpi_table_parse_cedt(enum acpi_cedt_type id, int acpi_parse_mcfg (struct acpi_table_header *header); void acpi_table_print_madt_entry (struct acpi_subtable_header *madt);
+static inline bool acpi_gicc_is_usable(struct acpi_madt_generic_interrupt *gicc) +{ + return (gicc->flags & ACPI_MADT_ENABLED); +} + /* the following numa functions are architecture-dependent */ void acpi_numa_slit_init (struct acpi_table_slit *slit);
gic_acpi_match_gicc() is only called via gic_acpi_count_gicr_regions(). It should only count the number of enabled redistributors, but it also tries to sanity check the GICC entry, currently returning an error if the Enabled bit is set, but the gicr_base_address is zero.
Adding support for the online-capable bit to the sanity check complictes it, for no benefit. The existing check implicitly depends on gic_acpi_count_gicr_regions() previous failing to find any GICR regions (as it is valid to have gicr_base_address of zero if the redistributors are described via a GICR entry).
Instead of complicating the check, remove it. Failures that happen at this point cause the irqchip not to register, meaning no irqs can be requested. The kernel grinds to a panic() pretty quickly.
Without the check, MADT tables that exhibit this problem are still caught by gic_populate_rdist(), which helpfully also prints what went wrong: | CPU4: mpidr 100 has no re-distributor!
Signed-off-by: James Morse james.morse@arm.com --- drivers/irqchip/irq-gic-v3.c | 18 ++++++------------ 1 file changed, 6 insertions(+), 12 deletions(-)
diff --git a/drivers/irqchip/irq-gic-v3.c b/drivers/irqchip/irq-gic-v3.c index 5cf2470c6258..bd3dfea76fe9 100644 --- a/drivers/irqchip/irq-gic-v3.c +++ b/drivers/irqchip/irq-gic-v3.c @@ -2236,21 +2236,15 @@ static int __init gic_acpi_match_gicc(union acpi_subtable_headers *header,
/* * If GICC is enabled and has valid gicr base address, then it means - * GICR base is presented via GICC + * GICR base is presented via GICC. The redistributor is only known to + * be accessible if the GICC is marked as enabled. If this bit is not + * set, we'd need to add the redistributor at runtime, which isn't + * supported. */ - if (acpi_gicc_is_usable(gicc) && gicc->gicr_base_address) { + if (gicc->flags & ACPI_MADT_ENABLED && gicc->gicr_base_address) acpi_data.enabled_rdists++; - return 0; - }
- /* - * It's perfectly valid firmware can pass disabled GICC entry, driver - * should not treat as errors, skip the entry instead of probe fail. - */ - if (!acpi_gicc_is_usable(gicc)) - return 0; - - return -ENODEV; + return 0; }
static int __init gic_acpi_count_gicr_regions(void)
To support virtual CPU hotplug, ACPI has added an 'online capable' bit to the MADT GICC entries. This indicates a disabled CPU entry may not be possible to online via PSCI until firmware has set enabled bit in _STA.
What about the redistributor in the GICC entry? ACPI doesn't want to say. Assume the worst: When a redistributor is described in the GICC entry, but the entry is marked as disabled at boot, assume the redistributor is inaccessible.
The GICv3 driver doesn't support late online of redistributors, so this means the corresponding CPU can't be brought online either. Clear the possible and present bits.
Systems that want CPU hotplug in a VM can ensure their redistributors are always-on, and describe them that way with a GICR entry in the MADT.
When mapping redistributors found via GICC entries, handle the case where the arch code believes the CPU is present and possible, but it does not have an accessible redistributor. Print a warning and clear the present and possible bits.
Signed-off-by: James Morse james.morse@arm.com ---- Disabled but online-capable CPUs cause this message to be printed if their redistributors are described via GICC: | GICv3: CPU 3's redistributor is inaccessible: this CPU can't be brought online
If ACPI's _STA tries to make the cpu present later, this message is printed: | Changing CPU present bit is not supported --- drivers/irqchip/irq-gic-v3.c | 14 ++++++++++++++ include/linux/acpi.h | 3 ++- 2 files changed, 16 insertions(+), 1 deletion(-)
diff --git a/drivers/irqchip/irq-gic-v3.c b/drivers/irqchip/irq-gic-v3.c index bd3dfea76fe9..1fc7d4778434 100644 --- a/drivers/irqchip/irq-gic-v3.c +++ b/drivers/irqchip/irq-gic-v3.c @@ -2186,11 +2186,25 @@ gic_acpi_parse_madt_gicc(union acpi_subtable_headers *header, (struct acpi_madt_generic_interrupt *)header; u32 reg = readl_relaxed(acpi_data.dist_base + GICD_PIDR2) & GIC_PIDR2_ARCH_MASK; u32 size = reg == GIC_PIDR2_ARCH_GICv4 ? SZ_64K * 4 : SZ_64K * 2; + int cpu = get_cpu_for_acpi_id(gicc->uid); void __iomem *redist_base;
if (!acpi_gicc_is_usable(gicc)) return 0;
+ /* + * Capable but disabled CPUs can be brought online later. What about + * the redistributor? ACPI doesn't want to say! + * Virtual hotplug systems can use the MADT's "always-on" GICR entries. + * Otherwise, prevent such CPUs from being brought online. + */ + if (!(gicc->flags & ACPI_MADT_ENABLED)) { + pr_warn_once("CPU %u's redistributor is inaccessible: this CPU can't be brought online\n", cpu); + set_cpu_present(cpu, false); + set_cpu_possible(cpu, false); + return 0; + } + redist_base = ioremap(gicc->gicr_base_address, size); if (!redist_base) return -ENOMEM; diff --git a/include/linux/acpi.h b/include/linux/acpi.h index 358d9b971de8..81f5df4a536a 100644 --- a/include/linux/acpi.h +++ b/include/linux/acpi.h @@ -269,7 +269,8 @@ void acpi_table_print_madt_entry (struct acpi_subtable_header *madt);
static inline bool acpi_gicc_is_usable(struct acpi_madt_generic_interrupt *gicc) { - return (gicc->flags & ACPI_MADT_ENABLED); + return ((gicc->flags & ACPI_MADT_ENABLED || + gicc->flags & ACPI_MADT_GICC_CPU_CAPABLE)); }
/* the following numa functions are architecture-dependent */
Hi James,
From: James Morse [mailto:james.morse@arm.com] Sent: Wednesday, August 31, 2022 12:09 PM To: linaro-open-discussions@op-lists.linaro.org Cc: Salil Mehta salil.mehta@huawei.com; james.morse@arm.com; lorenzo.pieralisi@linaro.org; Jean-Philippe Brucker jean-philippe@linaro.org Subject: [RFC PATCH v0.1 20/25] irqchip/gic-v3: Add support for ACPI's disabled but 'online capable' CPUs
To support virtual CPU hotplug, ACPI has added an 'online capable' bit to the MADT GICC entries. This indicates a disabled CPU entry may not be possible to online via PSCI until firmware has set enabled bit in _STA.
What about the redistributor in the GICC entry? ACPI doesn't want to say. Assume the worst: When a redistributor is described in the GICC entry, but the entry is marked as disabled at boot, assume the redistributor is inaccessible.
The GICv3 driver doesn't support late online of redistributors, so this means the corresponding CPU can't be brought online either. Clear the possible and present bits.
Systems that want CPU hotplug in a VM can ensure their redistributors are always-on, and describe them that way with a GICR entry in the MADT.
When mapping redistributors found via GICC entries, handle the case where the arch code believes the CPU is present and possible, but it does not have an accessible redistributor. Print a warning and clear the present and possible bits.
Signed-off-by: James Morse james.morse@arm.com
Disabled but online-capable CPUs cause this message to be printed if their redistributors are described via GICC: | GICv3: CPU 3's redistributor is inaccessible: this CPU can't be brought online
If ACPI's _STA tries to make the cpu present later, this message is printed: | Changing CPU present bit is not supported
[...]
redist_base = ioremap(gicc->gicr_base_address, size); if (!redist_base) return -ENOMEM; diff --git a/include/linux/acpi.h b/include/linux/acpi.h index 358d9b971de8..81f5df4a536a 100644 --- a/include/linux/acpi.h +++ b/include/linux/acpi.h @@ -269,7 +269,8 @@ void acpi_table_print_madt_entry (struct acpi_subtable_header *madt);
static inline bool acpi_gicc_is_usable(struct acpi_madt_generic_interrupt *gicc) {
- return (gicc->flags & ACPI_MADT_ENABLED);
- return ((gicc->flags & ACPI_MADT_ENABLED ||
gicc->flags & ACPI_MADT_GICC_CPU_CAPABLE));
This does not looks right to me.
As per the ACPI specification 6.5 Draft-12Aug2022" "Table 5.37: GICC CPU Interface Flags", below are the possible combinations of the existing "Enabled" Bit with *new* "online-capable" Bit:
Table 5.37: GICC CPU Interface Flags +---------+---------------------------------------+ | | Enabled Bit(0) | | +-------+---------------+---------------+ | | Bool | False | True | +---------+-------+---------------+---------------+ | | | | | | | False | | | | | | CPU is | CPU is | | | | Not Usable | Ready to Use | | Online | | | | | Capable +-------+---------------+---------------+ | Bit(3) | | | | | | | CPU is | *Invalid* | | | True | Online-Capable| Combination | | | | | | | | | | | +---------+-------+---------------+---------------+
Description:
Enabled Bit(0): If this bit is set, the processor is ready for use. If this bit is clear and the Online Capable bit is set, the system supports enabling this processor during OS runtime. If this bit is clear and the Online Capable bit is also clear, this processor is unusable, and the operating system support will not attempt to use it.
Online-capable Bit(3): The information conveyed by this bit depends on the value of the Enabled bit. If the Enabled bit is set, this bit is reserved and must be zero. Otherwise, if this bit is set, the system supports enabling this processor later during OS runtime.
Above check will return "cpu is usable" even for Enabled=true && online-capable=true (INVALID Case)?
CPU SHOULD be unusable for the kernel for below cases I guess: 1. Enabled=false && online-capable=false (Disabled Case)? 2. Enabled=true && online-capable=true (INVALID Case)?
Please correct my understanding if I am wrong here?
Thanks Salil.
On Thu, 1 Sep 2022 21:18:48 +0100 Salil Mehta salil.mehta@huawei.com wrote:
Hi James,
From: James Morse [mailto:james.morse@arm.com] Sent: Wednesday, August 31, 2022 12:09 PM To: linaro-open-discussions@op-lists.linaro.org Cc: Salil Mehta salil.mehta@huawei.com; james.morse@arm.com; lorenzo.pieralisi@linaro.org; Jean-Philippe Brucker jean-philippe@linaro.org Subject: [RFC PATCH v0.1 20/25] irqchip/gic-v3: Add support for ACPI's disabled but 'online capable' CPUs
To support virtual CPU hotplug, ACPI has added an 'online capable' bit to the MADT GICC entries. This indicates a disabled CPU entry may not be possible to online via PSCI until firmware has set enabled bit in _STA.
What about the redistributor in the GICC entry? ACPI doesn't want to say. Assume the worst: When a redistributor is described in the GICC entry, but the entry is marked as disabled at boot, assume the redistributor is inaccessible.
The GICv3 driver doesn't support late online of redistributors, so this means the corresponding CPU can't be brought online either. Clear the possible and present bits.
Systems that want CPU hotplug in a VM can ensure their redistributors are always-on, and describe them that way with a GICR entry in the MADT.
When mapping redistributors found via GICC entries, handle the case where the arch code believes the CPU is present and possible, but it does not have an accessible redistributor. Print a warning and clear the present and possible bits.
Signed-off-by: James Morse james.morse@arm.com
Disabled but online-capable CPUs cause this message to be printed if their redistributors are described via GICC: | GICv3: CPU 3's redistributor is inaccessible: this CPU can't be brought online
If ACPI's _STA tries to make the cpu present later, this message is printed: | Changing CPU present bit is not supported
[...]
redist_base = ioremap(gicc->gicr_base_address, size); if (!redist_base) return -ENOMEM; diff --git a/include/linux/acpi.h b/include/linux/acpi.h index 358d9b971de8..81f5df4a536a 100644 --- a/include/linux/acpi.h +++ b/include/linux/acpi.h @@ -269,7 +269,8 @@ void acpi_table_print_madt_entry (struct acpi_subtable_header *madt);
static inline bool acpi_gicc_is_usable(struct acpi_madt_generic_interrupt *gicc) {
- return (gicc->flags & ACPI_MADT_ENABLED);
- return ((gicc->flags & ACPI_MADT_ENABLED ||
gicc->flags & ACPI_MADT_GICC_CPU_CAPABLE));
This does not looks right to me.
As per the ACPI specification 6.5 Draft-12Aug2022" "Table 5.37: GICC CPU Interface Flags", below are the possible combinations of the existing "Enabled" Bit with *new* "online-capable" Bit:
Note the ACPI 6.5 specification is public: https://uefi.org/specifications so let's refer to that.
Table 5.37: GICC CPU Interface Flags
+---------+---------------------------------------+ | | Enabled Bit(0) | | +-------+---------------+---------------+ | | Bool | False | True | +---------+-------+---------------+---------------+ | | | | | | | False | | | | | | CPU is | CPU is | | | | Not Usable | Ready to Use | | Online | | | | | Capable +-------+---------------+---------------+ | Bit(3) | | | | | | | CPU is | *Invalid* | | | True | Online-Capable| Combination | | | | | | | | | | | +---------+-------+---------------+---------------+
Description:
Enabled Bit(0): If this bit is set, the processor is ready for use. If this bit is clear and the Online Capable bit is set, the system supports enabling this processor during OS runtime. If this bit is clear and the Online Capable bit is also clear, this processor is unusable, and the operating system support will not attempt to use it.
Online-capable Bit(3): The information conveyed by this bit depends on the value of the Enabled bit. If the Enabled bit is set, this bit is reserved and must be zero. Otherwise, if this bit is set, the system supports enabling this processor later during OS runtime.
Above check will return "cpu is usable" even for Enabled=true && online-capable=true (INVALID Case)?
CPU SHOULD be unusable for the kernel for below cases I guess:
- Enabled=false && online-capable=false (Disabled Case)?
- Enabled=true && online-capable=true (INVALID Case)?
Please correct my understanding if I am wrong here?
Thanks Salil.
Hi Salil,
On 9/2/22 10:12, Jonathan Cameron wrote:
On Thu, 1 Sep 2022 21:18:48 +0100 Salil Mehta salil.mehta@huawei.com wrote:
To support virtual CPU hotplug, ACPI has added an 'online capable' bit to the MADT GICC entries. This indicates a disabled CPU entry may not be possible to online via PSCI until firmware has set enabled bit in _STA.
What about the redistributor in the GICC entry? ACPI doesn't want to say. Assume the worst: When a redistributor is described in the GICC entry, but the entry is marked as disabled at boot, assume the redistributor is inaccessible.
The GICv3 driver doesn't support late online of redistributors, so this means the corresponding CPU can't be brought online either. Clear the possible and present bits.
Systems that want CPU hotplug in a VM can ensure their redistributors are always-on, and describe them that way with a GICR entry in the MADT.
When mapping redistributors found via GICC entries, handle the case where the arch code believes the CPU is present and possible, but it does not have an accessible redistributor. Print a warning and clear the present and possible bits.
diff --git a/include/linux/acpi.h b/include/linux/acpi.h index 358d9b971de8..81f5df4a536a 100644 --- a/include/linux/acpi.h +++ b/include/linux/acpi.h @@ -269,7 +269,8 @@ void acpi_table_print_madt_entry (struct acpi_subtable_header *madt);
static inline bool acpi_gicc_is_usable(struct acpi_madt_generic_interrupt *gicc) {
- return (gicc->flags & ACPI_MADT_ENABLED);
- return ((gicc->flags & ACPI_MADT_ENABLED ||
gicc->flags & ACPI_MADT_GICC_CPU_CAPABLE));
This does not looks right to me.
As per the ACPI specification 6.5 Draft-12Aug2022" "Table 5.37: GICC CPU Interface Flags", below are the possible combinations of the existing "Enabled" Bit with *new* "online-capable" Bit:
Note the ACPI 6.5 specification is public: https://uefi.org/specifications so let's refer to that.
Table 5.37: GICC CPU Interface Flags +---------+---------------------------------------+ | | Enabled Bit(0) | | +-------+---------------+---------------+ | | Bool | False | True | +---------+-------+---------------+---------------+ | | | | | | | False | | | | | | CPU is | CPU is | | | | Not Usable | Ready to Use | | Online | | | | | Capable +-------+---------------+---------------+ | Bit(3) | | | | | | | CPU is | *Invalid* | | | True | Online-Capable| Combination | | | | | | | | | | | +---------+-------+---------------+---------------+
Description:
Enabled Bit(0): If this bit is set, the processor is ready for use. If this bit is clear and the Online Capable bit is set, the system supports enabling this processor during OS runtime. If this bit is clear and the Online Capable bit is also clear, this processor is unusable, and the operating system support will not attempt to use it.
Online-capable Bit(3): The information conveyed by this bit depends on the value of the Enabled bit. If the Enabled bit is set, this bit is reserved and must be zero. Otherwise, if this bit is set, the system supports enabling this processor later during OS runtime.
Above check will return "cpu is usable" even for Enabled=true && online-capable=true (INVALID Case)?
Yes. Why should linux sanity check the ACPI tables?
In this case, trying to do this just results in harder to read code. What is the benefit? I very much doubt that combination can be used for anything that implies the CPU can't be used.
Thanks,
James
Hi James,
From: James Morse [mailto:james.morse@arm.com] Sent: Friday, September 2, 2022 10:21 AM To: Jonathan Cameron jonathan.cameron@huawei.com; Salil Mehta salil.mehta@huawei.com Cc: linaro-open-discussions@op-lists.linaro.org; lorenzo.pieralisi@linaro.org; Jean-Philippe Brucker jean-philippe@linaro.org; mehta.salil.lnk@gmail.com Subject: Re: [RFC PATCH v0.1 20/25] irqchip/gic-v3: Add support for ACPI's disabled but 'online capable' CPUs
Hi Salil,
On 9/2/22 10:12, Jonathan Cameron wrote:
On Thu, 1 Sep 2022 21:18:48 +0100 Salil Mehta salil.mehta@huawei.com wrote:
To support virtual CPU hotplug, ACPI has added an 'online capable' bit to the MADT GICC entries. This indicates a disabled CPU entry may not be possible to online via PSCI until firmware has set enabled bit in _STA.
What about the redistributor in the GICC entry? ACPI doesn't want to say. Assume the worst: When a redistributor is described in the GICC entry, but the entry is marked as disabled at boot, assume the redistributor is inaccessible.
The GICv3 driver doesn't support late online of redistributors, so this means the corresponding CPU can't be brought online either. Clear the possible and present bits.
Systems that want CPU hotplug in a VM can ensure their redistributors are always-on, and describe them that way with a GICR entry in the MADT.
When mapping redistributors found via GICC entries, handle the case where the arch code believes the CPU is present and possible, but it does not have an accessible redistributor. Print a warning and clear the present and possible bits.
diff --git a/include/linux/acpi.h b/include/linux/acpi.h index 358d9b971de8..81f5df4a536a 100644 --- a/include/linux/acpi.h +++ b/include/linux/acpi.h @@ -269,7 +269,8 @@ void acpi_table_print_madt_entry (struct acpi_subtable_header *madt);
static inline bool acpi_gicc_is_usable(struct
acpi_madt_generic_interrupt
*gicc) {
- return (gicc->flags & ACPI_MADT_ENABLED);
- return ((gicc->flags & ACPI_MADT_ENABLED ||
gicc->flags & ACPI_MADT_GICC_CPU_CAPABLE));
This does not looks right to me.
As per the ACPI specification 6.5 Draft-12Aug2022" "Table 5.37: GICC CPU Interface Flags", below are the possible combinations of the existing "Enabled" Bit with *new* "online-capable" Bit:
Note the ACPI 6.5 specification is public: https://uefi.org/specifications so let's refer to that.
Table 5.37: GICC CPU Interface Flags +---------+---------------------------------------+ | | Enabled Bit(0) | | +-------+---------------+---------------+ | | Bool | False | True | +---------+-------+---------------+---------------+ | | | | | | | False | | | | | | CPU is | CPU is | | | | Not Usable | Ready to Use | | Online | | | | | Capable +-------+---------------+---------------+ | Bit(3) | | | | | | | CPU is | *Invalid* | | | True | Online-Capable| Combination | | | | | | | | | | | +---------+-------+---------------+---------------+
Description:
Enabled Bit(0): If this bit is set, the processor is ready for use. If this bit is clear and the Online Capable bit is set, the system supports enabling this processor during OS runtime. If this bit is clear and the Online Capable bit is also clear, this processor is unusable, and the operating system support will not attempt to use it.
Online-capable Bit(3): The information conveyed by this bit depends on the value of the Enabled bit. If the Enabled bit is set, this bit is reserved and must be zero. Otherwise, if this bit is set, the system supports enabling this processor later during OS runtime.
Above check will return "cpu is usable" even for Enabled=true && online-capable=true (INVALID Case)?
Yes. Why should linux sanity check the ACPI tables?
To ensure that other two fields which Linux wants to use are correct, which partly above code is already doing but there is an ambiguity.
In this case, trying to do this just results in harder to read code. What is the benefit?
This code is common to all ARM64 based platforms but BIOS are specific to platforms and sometimes can be buggy. This little extra piece of logic can really save lots of hours of that poor fellow working onsite who is trying to figure out why things are not working when there is no error being purged out in UEFI/Linux at bootup.
Usually, everything we do should be customer driven and should facilitate ease of doing business(here, deployment of particular software using vCPU Hotplug say microvms?)
In my humble opinion, above reason will always outweigh maintainability, though I agree that there is a balance to that as well. But then later is not exactly true in this particular case since amount of code being added is very trivial but has huge end user gains.
I very much doubt that combination can be used for anything that implies the CPU can't be used.
This is the precise ambiguity we can clear if we can put a harmless print to assist the code reviewer and the end user?
Thanks Salil.
Hi Salil,
On 02/09/2022 11:04, Salil Mehta wrote:
Above check will return "cpu is usable" even for Enabled=true && online-capable=true (INVALID Case)?
Yes. Why should linux sanity check the ACPI tables?
To ensure that other two fields which Linux wants to use are correct, which partly above code is already doing but there is an ambiguity.
In this case, trying to do this just results in harder to read code. What is the benefit?
This code is common to all ARM64 based platforms but BIOS are specific to platforms and sometimes can be buggy. This little extra piece of logic can really save lots of hours of that poor fellow working onsite who is trying to figure out why things are not working when there is no error being purged out in UEFI/Linux at bootup.
Usually, everything we do should be customer driven and should facilitate ease of doing business(here, deployment of particular software using vCPU Hotplug say microvms?)
In my humble opinion, above reason will always outweigh maintainability, though I agree that there is a balance to that as well. But then later is not exactly true in this particular case since amount of code being added is very trivial but has huge end user gains.
I very much doubt that combination can be used for anything that implies the CPU can't be used.
This is the precise ambiguity we can clear if we can put a harmless print to assist the code reviewer and the end user?
Both zero is valid - it means the CPU can't be used. I don't think we should print about disabled cpus that are found in the MADT.
If Enabled is set, 'Online Capable' is reserved. We don't know that it won't be allocated in the future. Once it is, all previous kernels will be printing a snotty message about this bit being set, even though it may have a valid meaning. I don't think linux should try to validate the ACPI tables.
I can change the check to logical-xor instead of logical-or, but its harder to read, and I don't think it buys us anything.
In the cases where there are problems with the table, e.g. the redistributors are described in the GICC not the GICR, (so the power-domain isn't known), a warning is printed to try and help in the scenario you describe. I think this is a different case as the OS making a decision not to use the information in the table.
Thanks,
James
Hi James,
I'm confused with that why set cpu possible flag to false when ACPI_MADT_ENABLED is not set. IMO, we should check further about the online_capable bit, once it is set, cpu possible flag should be set to true. I consider "possible" means cpu is not present but can be hot add later. Am I right?
Cheers, Jianyong
Hi Jianyong,
On 30/10/2022 08:54, jianyong.wu--- via Linaro-open-discussions wrote:
I'm confused with that why set cpu possible flag to false when ACPI_MADT_ENABLED is not set.
Without any context .... are you referring to the code in gic_acpi_parse_madt_gicc()?
This is for CPUs that have a MADT:GICC:GICR entry instead of having their redistributor described by a MADT:GICR entry. The ACPI spec says entries in the MADT:GICR are in an always-on power domain, so the irqchip driver can always access them. It doesn't say anything about when the redistributor is accessible if the MADT:GICC:GICR field is used instead. This code is assuming the worst: that the redistributor is not online-capable, and any access by the irqchip driver will lead to an external-abort, or a lockup. Linux won't try to use CPUs that look like this as it can't probe the redistributor features at boot. This is why the CPUs are removed from the possible and present masks.
If you want to support this 'hotplug' mechanism, you need to use the MADT:GICR description of the redistributors because it is defined as always-on.
I tried to get the spec people to start defining the power management in this area, but without a physical piece of hardware that does this: its not going to be standardised.
IMO, we should check further about the online_capable bit, once it is set, cpu possible flag should be set to true.
I consider "possible" means cpu is not present but can be hot add later. Am I right?
Possible means the CPU can be present. It might be present right now.
But present/possible aren't relevant here. This is a different solution to the problem.
We can't invent cpu hotplug for arm64 without a physical machine that does it. As software engineers, we can't know the constraints the hardware designers will have. To make it possible to support physical cpu hotplug machines in the future, we need to add support for this thing in such a way that it doesn't mean stable kernels are broken on that hardware when it comes along.
This series lets EL3/the-VMM manage the CPU online/offline policy from outside the guest. This looks like cpu-hotplug to user-space, but its actually doing something different.
_all_ of arm64's ACPI tables are based on the assumption that everything described is present. We can't undermine this without creating problems. On a VM, everything really is present all the time.
Thanks,
James IMPORTANT NOTICE: The contents of this email and any attachments are confidential and may also be privileged. If you are not the intended recipient, please notify the sender immediately and do not disclose the contents to any other person, use it for any purpose, or store or copy the information in any medium. Thank you.
Thanks James. I learn a lot from your explanation here. But if online capable bit in GICC can't ensure the related CPU can be bring up, as it can't ensure GICR exist, what benefit we get from this bit? It seems the only way to enable a disabled CPU is that add GICR entry into MADT not by this bit. And how to understand the meaning conveyed by this bit that it indicates the related CPU can be enabled during OS runtime?
Hi Jianyong,
On 03/11/2022 11:40, jianyong.wu--- via Linaro-open-discussions wrote:
Thanks James. I learn a lot from your explanation here. But if online capable bit in GICC can't ensure the related CPU can be bring up, as it can't ensure GICR exist, what benefit we get from this bit?
Linux doesn't gain anything from this bit. It's needed for another operating system that can't tolerate errors being returned by PSCI.
Linux has to read the bit from the MADT:GICC so that firmware can have one MADT that works for Linux and this other operating system.
Clearing the Enabled bit from the MADT:GICC created a new problem, that the MADT:GICC:GICR isn't obviously accessible during boot. A side effect of the ACPI changes is that now we need to use the MADT:GICR entries to describe the redistributors.
This isn't a problem for a virtual machine as the redistributor is emulated by the hypervisor, and really is always present and always-on.
It seems the only way to enable a disabled CPU is that add GICR entry into MADT not by this bit. And how to understand the meaning conveyed by this bit that it indicates the related CPU can be enabled during OS runtime?
Describing the redistributors with an MADT:GICR entry is a prerequisite yes.
We have a hard requirement from the irqchip maintainer that he won't allow code that brings a redistributor online after boot - unless there is physical hardware that is doing that.
This is because linux accesses all the redistributors during boot to find the common set of features.
We don't know whether the GICR in a MADT:GICC:GICR is accessible if the CPU isn't marked as enabled, and isn't online. On real hardware that has CPUs that aren't present, it won't be accessible. If linux accesses the redistributor on this hardware, it would either crash due to an external-abort, or lockup. This would be very upsetting for users of stable-kernels on such hardware, so this series makes those CPUs impossible.
This isn't a problem for virtual machines, as the kernel emulates the redistributors, and they really are always present and always on. You can use the MADT:GICR to describe them.
Future physical hardware that does this won't be able to add CPUs that also need a redistributor until the ACPI spec is cleaned up to say whether the MADT:GICC:GICR is accessible - and linux supports bringing a redistributor online after boot, which it doesn't today.
As to what the 'online capable' bit means, it means those CPUs could return PSCI_DENIED. The distinction is only needed for operating systems that choke on errors from PSCI. Linux doesn't care, (you just get some weird messages during boot), hence this bit isn't relevant outside the irqchip code.
(does this answer your question!?)
Thanks,
James
Hi James,
Yes, I'm fully understand about it and have no other question now. Thanks for your great explanation!
Thanks Jianyong
From: Jean-Philippe Brucker jean-philippe@linaro.org
When a CPU is marked as disabled, but online capable in the MADT, PSCI applies some firmware policy to control when it can be brought online. PSCI returns DENIED to a CPU_ON request if this is not currently permitted. The OS can learn the current policy from the _STA enabled bit.
Handle the PSCI DENIED return code gracefully instead of printing an error.
Signed-off-by: Jean-Philippe Brucker jean-philippe@linaro.org [ morse: Rewrote commit message ] Signed-off-by: James Morse james.morse@arm.com --- arch/arm64/kernel/psci.c | 2 +- arch/arm64/kernel/smp.c | 3 ++- drivers/firmware/psci/psci.c | 2 ++ 3 files changed, 5 insertions(+), 2 deletions(-)
diff --git a/arch/arm64/kernel/psci.c b/arch/arm64/kernel/psci.c index ab7f4c476104..8ce9afd6c535 100644 --- a/arch/arm64/kernel/psci.c +++ b/arch/arm64/kernel/psci.c @@ -40,7 +40,7 @@ static int cpu_psci_cpu_boot(unsigned int cpu) { phys_addr_t pa_secondary_entry = __pa_symbol(function_nocfi(secondary_entry)); int err = psci_ops.cpu_on(cpu_logical_map(cpu), pa_secondary_entry); - if (err) + if (err && err != -EPROBE_DEFER) pr_err("failed to boot CPU%d (%d)\n", cpu, err);
return err; diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c index 5669b013c2b7..ea031641545d 100644 --- a/arch/arm64/kernel/smp.c +++ b/arch/arm64/kernel/smp.c @@ -125,7 +125,8 @@ int __cpu_up(unsigned int cpu, struct task_struct *idle) /* Now bring the CPU into our world */ ret = boot_secondary(cpu, idle); if (ret) { - pr_err("CPU%u: failed to boot: %d\n", cpu, ret); + if (ret != -EPROBE_DEFER) + pr_err("CPU%u: failed to boot: %d\n", cpu, ret); return ret; }
diff --git a/drivers/firmware/psci/psci.c b/drivers/firmware/psci/psci.c index cfb448eabdaa..a93ef45adebc 100644 --- a/drivers/firmware/psci/psci.c +++ b/drivers/firmware/psci/psci.c @@ -209,6 +209,8 @@ static int __psci_cpu_on(u32 fn, unsigned long cpuid, unsigned long entry_point) int err;
err = invoke_psci_fn(fn, cpuid, entry_point, 0); + if (err == PSCI_RET_DENIED) + return -EPROBE_DEFER; return psci_to_linux_errno(err); }
acpi_processor_get_info() registers all present CPUs. Registering a CPU is what creates the sysfs entries and triggers the udev notifications.
arm64 virtual machines that support 'virtual cpu hotplug' use the enabled bit to indicate whether the CPU can be brought online, as the existing ACPI tables require all hardware to be described and present.
If firmware describes a CPU as present, but disabled, skip the registration. Such CPUs are present, but can't be brought online for whatever reason. (e.g. firmware/hypervisor policy).
Once firmware sets the enabled bit, the CPU can be registered and brought online by user-space. Online CPUs, or CPUs that are missing an _STA method must always be registered.
Signed-off-by: James Morse james.morse@arm.com --- drivers/acpi/acpi_processor.c | 31 ++++++++++++++++++++++++++++++- 1 file changed, 30 insertions(+), 1 deletion(-)
diff --git a/drivers/acpi/acpi_processor.c b/drivers/acpi/acpi_processor.c index 1bd6e4b8ab66..42521d89c378 100644 --- a/drivers/acpi/acpi_processor.c +++ b/drivers/acpi/acpi_processor.c @@ -194,6 +194,32 @@ static int acpi_processor_make_present(struct acpi_processor *pr) return ret; }
+static int acpi_processor_make_enabled(struct acpi_processor *pr) +{ + unsigned long long sta; + acpi_status status; + bool present, enabled; + + if (!acpi_has_method(pr->handle, "_STA")) + return arch_register_cpu(pr->id); + + status = acpi_evaluate_integer(pr->handle, "_STA", NULL, &sta); + if (ACPI_FAILURE(status)) + return -ENODEV; + + present = sta & ACPI_STA_DEVICE_PRESENT; + enabled = sta & ACPI_STA_DEVICE_ENABLED; + + if (cpu_online(pr->id) && (!present || !enabled)) { + pr_err_once(FW_BUG "CPU %u is online, but described as not present or disabled!\n", pr->id); + add_taint(TAINT_FIRMWARE_WORKAROUND, LOCKDEP_STILL_OK); + } else if (!present || !enabled) { + return -ENODEV; + } + + return arch_register_cpu(pr->id); +} + static int acpi_processor_get_info(struct acpi_device *device) { union acpi_object object = { 0 }; @@ -271,7 +297,7 @@ static int acpi_processor_get_info(struct acpi_device *device) }
if (!invalid_logical_cpuid(pr->id) && cpu_present(pr->id)) { - int ret = arch_register_cpu(pr->id); + int ret = acpi_processor_make_enabled(pr);
if (ret) return ret; @@ -479,6 +505,9 @@ static void acpi_processor_remove(struct acpi_device *device) acpi_processor_make_not_present(device); return; } + + if (cpu_present(pr->id) && !(sta & ACPI_STA_DEVICE_ENABLED)) + arch_unregister_cpu(pr->id); }
#ifdef CONFIG_X86
Hi James,
From: James Morse [mailto:james.morse@arm.com] Sent: Wednesday, August 31, 2022 12:09 PM To: linaro-open-discussions@op-lists.linaro.org Cc: Salil Mehta salil.mehta@huawei.com; james.morse@arm.com; lorenzo.pieralisi@linaro.org; Jean-Philippe Brucker jean-philippe@linaro.org Subject: [RFC PATCH v0.1 22/25] ACPI: add support to register CPUs based on the _STA enabled bit
acpi_processor_get_info() registers all present CPUs. Registering a CPU is what creates the sysfs entries and triggers the udev notifications.
arm64 virtual machines that support 'virtual cpu hotplug' use the enabled bit to indicate whether the CPU can be brought online, as the existing ACPI tables require all hardware to be described and present.
If firmware describes a CPU as present, but disabled, skip the registration. Such CPUs are present, but can't be brought online for whatever reason. (e.g. firmware/hypervisor policy).
Once firmware sets the enabled bit, the CPU can be registered and brought online by user-space. Online CPUs, or CPUs that are missing an _STA method must always be registered.
Signed-off-by: James Morse james.morse@arm.com
drivers/acpi/acpi_processor.c | 31 ++++++++++++++++++++++++++++++- 1 file changed, 30 insertions(+), 1 deletion(-)
diff --git a/drivers/acpi/acpi_processor.c b/drivers/acpi/acpi_processor.c index 1bd6e4b8ab66..42521d89c378 100644 --- a/drivers/acpi/acpi_processor.c +++ b/drivers/acpi/acpi_processor.c @@ -194,6 +194,32 @@ static int acpi_processor_make_present(struct acpi_processor *pr) return ret; }
+static int acpi_processor_make_enabled(struct acpi_processor *pr) +{
- unsigned long long sta;
- acpi_status status;
- bool present, enabled;
- if (!acpi_has_method(pr->handle, "_STA"))
return arch_register_cpu(pr->id);
- status = acpi_evaluate_integer(pr->handle, "_STA", NULL, &sta);
- if (ACPI_FAILURE(status))
return -ENODEV;
- present = sta & ACPI_STA_DEVICE_PRESENT;
- enabled = sta & ACPI_STA_DEVICE_ENABLED;
- if (cpu_online(pr->id) && (!present || !enabled)) {
pr_err_once(FW_BUG "CPU %u is online, but described as not present or
disabled!\n", pr->id);
add_taint(TAINT_FIRMWARE_WORKAROUND, LOCKDEP_STILL_OK);
- } else if (!present || !enabled) {
return -ENODEV;
- }
This change and setting all possible cpus as *present* in smp_prepare_cpus() will always cause all present == possible in the guest kernel. I think we can avoid that by the trick which Jean-Phillipe exploited in his patch-set[1] sent earlier last year.
Please check Bit[2] in the ACPI specification: Spec Version: "Advanced Configuration and Power Interface (ACPI) Specification, Release 6.5, Section 6.3.7 _STA (Device Status)"
* Bit [0] - Set if the device is present. * Bit [1] - Set if the device is enabled and decoding its resources. * Bit [2] - Set if the device should be shown in the UI
File: linux/include/acpi/actypes.h /* Flags for _STA method */ #define ACPI_STA_DEVICE_PRESENT 0x01 #define ACPI_STA_DEVICE_ENABLED 0x02 #define ACPI_STA_DEVICE_UI 0x04---> (This bit)
_STA[0] & _STA[1] bits ensures that devices is present and enabled at the firmware level and OS can now decode its resources but if for some reason this device does not have to be made visible in the OS then _STA[2] bit could be *unset*. We can leverage this flag.
I guess we need a _STA[2]/UI bit check here before, [1] Registering the cpus and [2] Setting the present bit in __cpu_present_mask using set_cpu_present().
Both of above can co-exist in arch_{un}register_cpu().
This shall ensure that we correctly reflect only present vcpus to the linux kernel although the sizing and initialization of the GICC/GICR would have already happened for the complete set for possible vcpus i.e. the ones with
[1] _STA[0] is set & _STA[1] bit is set and [2] Either GICC_flag_Intf_Flag.Enabled set OR GICC_flag.online_capable set
so effectively we are only deferring populating the cpu present mask for the disabled cpus but which are now online capable(or Hotplug capable in future?)
Question: Q1: Current acpi_processor.c code is not using ACPI_STA_DEVICE_{ENABLED, UI} bits. Could it break other architecture if we use these bits but some of their legacy devices or firmware does not initialize these bits to their defaults?
I am trying to test this change. If you don't have any apprehensions then we can include this change (both at the kernel and qemu) as well. Thoughts?
Reference: [1] acpi/cpuhp: Allow hiding disabled CPUs for Arm https://jpbrucker.net/git/qemu/commit/?h=cpuhp/devel&id=5f12c581328bb2a8...
- return arch_register_cpu(pr->id);
+}
static int acpi_processor_get_info(struct acpi_device *device) { union acpi_object object = { 0 }; @@ -271,7 +297,7 @@ static int acpi_processor_get_info(struct acpi_device *device) }
if (!invalid_logical_cpuid(pr->id) && cpu_present(pr->id)) {
int ret = arch_register_cpu(pr->id);
int ret = acpi_processor_make_enabled(pr);
if (ret) return ret;
@@ -479,6 +505,9 @@ static void acpi_processor_remove(struct acpi_device *device) acpi_processor_make_not_present(device); return; }
- if (cpu_present(pr->id) && !(sta & ACPI_STA_DEVICE_ENABLED))
arch_unregister_cpu(pr->id);
}
#ifdef CONFIG_X86
2.30.2
Hi Salil,
On 09/09/2022 15:53, Salil Mehta wrote:
From: James Morse [mailto:james.morse@arm.com] Sent: Wednesday, August 31, 2022 12:09 PM To: linaro-open-discussions@op-lists.linaro.org Cc: Salil Mehta salil.mehta@huawei.com; james.morse@arm.com; lorenzo.pieralisi@linaro.org; Jean-Philippe Brucker jean-philippe@linaro.org Subject: [RFC PATCH v0.1 22/25] ACPI: add support to register CPUs based on the _STA enabled bit
acpi_processor_get_info() registers all present CPUs. Registering a CPU is what creates the sysfs entries and triggers the udev notifications.
arm64 virtual machines that support 'virtual cpu hotplug' use the enabled bit to indicate whether the CPU can be brought online, as the existing ACPI tables require all hardware to be described and present.
If firmware describes a CPU as present, but disabled, skip the registration. Such CPUs are present, but can't be brought online for whatever reason. (e.g. firmware/hypervisor policy).
Once firmware sets the enabled bit, the CPU can be registered and brought online by user-space. Online CPUs, or CPUs that are missing an _STA method must always be registered.
diff --git a/drivers/acpi/acpi_processor.c b/drivers/acpi/acpi_processor.c index 1bd6e4b8ab66..42521d89c378 100644 --- a/drivers/acpi/acpi_processor.c +++ b/drivers/acpi/acpi_processor.c @@ -194,6 +194,32 @@ static int acpi_processor_make_present(struct acpi_processor *pr) return ret; }
+static int acpi_processor_make_enabled(struct acpi_processor *pr) +{
- unsigned long long sta;
- acpi_status status;
- bool present, enabled;
- if (!acpi_has_method(pr->handle, "_STA"))
return arch_register_cpu(pr->id);
- status = acpi_evaluate_integer(pr->handle, "_STA", NULL, &sta);
- if (ACPI_FAILURE(status))
return -ENODEV;
- present = sta & ACPI_STA_DEVICE_PRESENT;
- enabled = sta & ACPI_STA_DEVICE_ENABLED;
- if (cpu_online(pr->id) && (!present || !enabled)) {
pr_err_once(FW_BUG "CPU %u is online, but described as not present or
disabled!\n", pr->id);
add_taint(TAINT_FIRMWARE_WORKAROUND, LOCKDEP_STILL_OK);
- } else if (!present || !enabled) {
return -ENODEV;
- }
This change and setting all possible cpus as *present* in smp_prepare_cpus() will always cause all present == possible in the guest kernel.
This is quite deliberate. I don't want to redefine present without a machine that actually supports hotplug/package-hotadd. This stuff is the tip of an ill-defined iceberg in the ACPI spec. Once there is hardware that supports this, we will have a better idea of what needs changing. Until then: everything described by ACPI must be present.
I think we can avoid that by the trick which Jean-Phillipe exploited in his patch-set[1] sent earlier last year.
That was the other side of this: https://gitlab.arm.com/linux-arm/linux-jm/-/commit/3106cccf5b9f01f44789b748a...
This was an attempt to do all this without changes to the ACPI spec - it doesn't touch the present cpumask.
[..]
This shall ensure that we correctly reflect only present vcpus to the linux kernel although the sizing and initialization of the GICC/GICR would have already happened for the complete set for possible vcpus i.e. the ones with
[1] _STA[0] is set & _STA[1] bit is set and [2] Either GICC_flag_Intf_Flag.Enabled set OR GICC_flag.online_capable set
so effectively we are only deferring populating the cpu present mask for the disabled cpus but which are now online capable(or Hotplug capable in future?)
What is the user observable effect of the kernel knowing this CPUs are really present?
The intention of this series is to do this as pure policy.
I anticipate pressure on the "use the MADT GICR" line, even though ACPI doesn't say anything about the presence of MADT GICC's redistributor entry. If this happens, we'd depend on present meaning present.
All the hotplug/package-hotadd machinery is triggered by udev. We don't need to hack the cpu present mask to make that work.
Question: Q1: Current acpi_processor.c code is not using ACPI_STA_DEVICE_{ENABLED, UI} bits. Could it break other architecture if we use these bits but some of their legacy devices or firmware does not initialize these bits to their defaults?
Almost certainly! I'm pretty confident some vendors generate their ACPI tables using markov-models. (It boots! Ship it!)
The approach that used the UI bit to mean sysfs had to be hidden behind a Kconfig symbol, which is only marginally better than #ifdef CONFIG_ARM64.
This new version walks a fine line described in the cover-letter: any platform with firmware tables that get this wrong should get the same user-experience as there is no policy enforcement on x86, so the !online_capable CPUs can be detected as being online, and the policy stuff gets ignored.
I don't want to bring the UI bi back, as that would mean bringing back the #ifdeffery. I'm pretty sure that bit was only for Windows-95!
Thanks,
James
I am trying to test this change. If you don't have any apprehensions then we can include this change (both at the kernel and qemu) as well. Thoughts?
Hi James
From: James Morse [mailto:james.morse@arm.com] Sent: Friday, September 9, 2022 5:53 PM To: Salil Mehta salil.mehta@huawei.com; linaro-open-discussions@op-lists.linaro.org Cc: lorenzo.pieralisi@linaro.org; Jean-Philippe Brucker jean-philippe@linaro.org; Jonathan Cameron jonathan.cameron@huawei.com Subject: Re: [RFC PATCH v0.1 22/25] ACPI: add support to register CPUs based on the _STA enabled bit
Hi Salil,
On 09/09/2022 15:53, Salil Mehta wrote:
From: James Morse [mailto:james.morse@arm.com] Sent: Wednesday, August 31, 2022 12:09 PM To: linaro-open-discussions@op-lists.linaro.org Cc: Salil Mehta salil.mehta@huawei.com; james.morse@arm.com; lorenzo.pieralisi@linaro.org; Jean-Philippe Brucker jean-philippe@linaro.org Subject: [RFC PATCH v0.1 22/25] ACPI: add support to register CPUs based on
the
_STA enabled bit
acpi_processor_get_info() registers all present CPUs. Registering a CPU is what creates the sysfs entries and triggers the udev notifications.
arm64 virtual machines that support 'virtual cpu hotplug' use the enabled bit to indicate whether the CPU can be brought online, as the existing ACPI tables require all hardware to be described and present.
If firmware describes a CPU as present, but disabled, skip the registration. Such CPUs are present, but can't be brought online for whatever reason. (e.g. firmware/hypervisor policy).
Once firmware sets the enabled bit, the CPU can be registered and brought online by user-space. Online CPUs, or CPUs that are missing an _STA method must always be registered.
diff --git a/drivers/acpi/acpi_processor.c b/drivers/acpi/acpi_processor.c index 1bd6e4b8ab66..42521d89c378 100644 --- a/drivers/acpi/acpi_processor.c +++ b/drivers/acpi/acpi_processor.c @@ -194,6 +194,32 @@ static int acpi_processor_make_present(struct acpi_processor *pr) return ret; }
+static int acpi_processor_make_enabled(struct acpi_processor *pr) +{
- unsigned long long sta;
- acpi_status status;
- bool present, enabled;
- if (!acpi_has_method(pr->handle, "_STA"))
return arch_register_cpu(pr->id);
- status = acpi_evaluate_integer(pr->handle, "_STA", NULL, &sta);
- if (ACPI_FAILURE(status))
return -ENODEV;
- present = sta & ACPI_STA_DEVICE_PRESENT;
- enabled = sta & ACPI_STA_DEVICE_ENABLED;
- if (cpu_online(pr->id) && (!present || !enabled)) {
pr_err_once(FW_BUG "CPU %u is online, but described as not present
or
disabled!\n", pr->id);
add_taint(TAINT_FIRMWARE_WORKAROUND, LOCKDEP_STILL_OK);
- } else if (!present || !enabled) {
return -ENODEV;
- }
This change and setting all possible cpus as *present* in smp_prepare_cpus() will always cause all present == possible in the guest kernel.
This is quite deliberate. I don't want to redefine present without a machine that actually supports hotplug/package-hotadd. This stuff is the tip of an ill-defined iceberg in the ACPI spec. Once there is hardware that supports this, we will have a better idea of what needs changing. Until then: everything described by ACPI must be present.
Present mask operates on the logical cpuids. Later are more closely related to the Linux abstract model. I see no problem in masking certain available devices(in this case cpus) from upper user. This is done at many places inside the kernel to intentionally not/conditionally expose certain devices to user even after getting discovered at the boot time or later.
As such, this change can co-exists irrespective of whether Hotplug or Hotadd will ever exist in the system.
I agree with the ACPI part and maybe interface is broken but then you have used ACPI_STA_DEVICE_ENABLED which has not been used yet in acpi_processor.c code which is ACPI related. How can you make sure this bit is being set by firmware of other architectures, especially legacy?
I think we can avoid that by the trick which Jean-Phillipe exploited in his patch-set[1] sent earlier last year.
That was the other side of this: https://gitlab.arm.com/linux-arm/linux-jm/-/commit/3106cccf5b9f01f44789b748 aaee3a95fee99a97
This was an attempt to do all this without changes to the ACPI spec - it doesn't touch the present cpumask.
Yes, I did refer those but the idea was not to use that change as it is.
[..]
This shall ensure that we correctly reflect only present vcpus to the linux kernel although the sizing and initialization of the GICC/GICR would have already happened for the complete set for possible vcpus i.e. the ones with
[1] _STA[0] is set & _STA[1] bit is set and [2] Either GICC_flag_Intf_Flag.Enabled set OR GICC_flag.online_capable set
so effectively we are only deferring populating the cpu present mask for the disabled cpus but which are now online capable(or Hotplug capable in future?)
What is the user observable effect of the kernel knowing this CPUs are really present?
User Interface looks inconsistent and can break existing scripts.
As you can see, user requested max possible cpus(=6) and cold booted cpus(=4) Hence, the number of cpus directories correctly being shown are 4 but then total number of cpus present are being shown as 6 (i.e. 0-5).
If we can defer the registration of the disabled cpus (but are online capable i.e. for possible - present) then I don’t see why we can't mask availability of these cpus by not marking them as present to user so that the entries are consistent. With this scripts/utils using these values can go horribly wrong.
At Guest Kernel --------------- estuary:/$ ls -al /sys/devices/system/cpu/ total 0 drwxr-xr-x 12 root 0 0 Sep 9 19:19 . drwxr-xr-x 8 root 0 0 Sep 9 19:19 .. drwxr-xr-x 7 root 0 0 Sep 9 19:19 cpu0 drwxr-xr-x 7 root 0 0 Sep 9 19:19 cpu1 drwxr-xr-x 7 root 0 0 Sep 9 19:19 cpu2 drwxr-xr-x 7 root 0 0 Sep 9 19:19 cpu3 drwxr-xr-x 2 root 0 0 Sep 9 19:19 cpufreq drwxr-xr-x 2 root 0 0 Sep 9 19:19 cpuidle [...]
estuary:/$ cat /sys/devices/system/cpu/possible 0-5 estuary:/$ cat /sys/devices/system/cpu/present 0-5 estuary:/$ cat /sys/devices/system/cpu/offline 4-5 estuary:/$
At Qemu ------- $QEMUBIN --enable-kvm -machine virt,gic-version=3 -cpu host -smp cpus=4,maxcpus=6 -append "console=ttyAMA0 root=/dev/ram earlycon rdinit=/init maxcpus=4 acpi=force"
The intention of this series is to do this as pure policy.
I anticipate pressure on the "use the MADT GICR" line, even though ACPI doesn't say anything about the presence of MADT GICC's redistributor entry. If this happens, we'd depend on present meaning present.
If we are confident that flag ACPI_STA_DEVICE_ENABLED is being set properly by ARM and other architecture firmware, then Qemu can take care of that policy. It has all the information of the vcpus which are possible and disabled (but are online capable). We can use this info to conditionally return appropriate status when _STA ACPI method is evaluated.
I intentionally refrained to use the this approach in my first RFC[1] as the default code in the acpi_processor.c was only making use of the ACPI_STA_DEVICE_PRESENT bit after evaluation of _STA method. Qemu was also setting only present bit in the returned status value. Plus, I wanted to minimize the changes in the kernel in the first version of the RFC.
[1] https://lore.kernel.org/qemu-devel/38a034f82da78b8861af6d25a83fddea@kernel.o...
All the hotplug/package-hotadd machinery is triggered by udev. We don't need to hack the cpu present mask to make that work.
May I know what exactly are your apprehensions with 'udev'?
As such 'udev' should make use of the Linux device model and it is not necessary to present 1:1 picture of the hardware to the abstract model(and which by the way we are not doing by not registering the disabled cpus). It will just expose that limited picture of the hardware to the user whatever is being presented by the kernel.
AFAICS it should work just fine but we need to limit the present cpus.
Question: Q1: Current acpi_processor.c code is not using ACPI_STA_DEVICE_{ENABLED, UI} bits. Could it break other architecture if we use these bits but some of their legacy devices or firmware does not initialize these bits to their defaults?
Almost certainly! I'm pretty confident some vendors generate their ACPI tables using markov-models. (It boots! Ship it!)
The approach that used the UI bit to mean sysfs had to be hidden behind a Kconfig symbol, which is only marginally better than #ifdef CONFIG_ARM64.
If there are problems in using the ACPI_STA_DEVICE_UI Bit because it might conflict with the legacy firmware of other architectures then let us drop that.
We can alternatively use the ACPI_STA_DEVICE_ENABLED Bit in the _STA method which can be conditionally set by the Qemu?
This new version walks a fine line described in the cover-letter: any platform with firmware tables that get this wrong should get the same user-experience as there is no policy enforcement on x86, so the !online_capable CPUs can be detected as being online, and the policy stuff gets ignored.
Yes, I do understand your predicament, but ideally user experience is dictated by what *end* user sees. Here, by not masking the disabled cpus in the cpu present mask user will not have similar experience on ARM64 and x86_64 platforms and that is undeniable and will in the end matter the most since this feature will mostly be used on the servers.
Thanks Salil
Hi Salil,
On 12/09/2022 19:09, Salil Mehta wrote:
From: James Morse [mailto:james.morse@arm.com] Sent: Friday, September 9, 2022 5:53 PM To: Salil Mehta salil.mehta@huawei.com; linaro-open-discussions@op-lists.linaro.org Cc: lorenzo.pieralisi@linaro.org; Jean-Philippe Brucker jean-philippe@linaro.org; Jonathan Cameron jonathan.cameron@huawei.com Subject: Re: [RFC PATCH v0.1 22/25] ACPI: add support to register CPUs based on the _STA enabled bit
On 09/09/2022 15:53, Salil Mehta wrote:
From: James Morse [mailto:james.morse@arm.com] Sent: Wednesday, August 31, 2022 12:09 PM To: linaro-open-discussions@op-lists.linaro.org Cc: Salil Mehta salil.mehta@huawei.com; james.morse@arm.com; lorenzo.pieralisi@linaro.org; Jean-Philippe Brucker jean-philippe@linaro.org Subject: [RFC PATCH v0.1 22/25] ACPI: add support to register CPUs based on
the
_STA enabled bit
acpi_processor_get_info() registers all present CPUs. Registering a CPU is what creates the sysfs entries and triggers the udev notifications.
arm64 virtual machines that support 'virtual cpu hotplug' use the enabled bit to indicate whether the CPU can be brought online, as the existing ACPI tables require all hardware to be described and present.
If firmware describes a CPU as present, but disabled, skip the registration. Such CPUs are present, but can't be brought online for whatever reason. (e.g. firmware/hypervisor policy).
Once firmware sets the enabled bit, the CPU can be registered and brought online by user-space. Online CPUs, or CPUs that are missing an _STA method must always be registered.
This change and setting all possible cpus as *present* in smp_prepare_cpus() will always cause all present == possible in the guest kernel.
This is quite deliberate. I don't want to redefine present without a machine that actually supports hotplug/package-hotadd. This stuff is the tip of an ill-defined iceberg in the ACPI spec. Once there is hardware that supports this, we will have a better idea of what needs changing. Until then: everything described by ACPI must be present.
Present mask operates on the logical cpuids. Later are more closely related to the Linux abstract model. I see no problem in masking certain available devices(in this case cpus) from upper user. This is done at many places inside the kernel to intentionally not/conditionally expose certain devices to user even after getting discovered at the boot time or later.
And that is what this series does via sysfs. But! The cpu numbers are primarily for making 0x4 and 0x100 contiguous so that the percpu allocator and other kernel data structures can use 'cpu' as an index.
I am strongly against touching the cpu present mask as these CPUs really are present. We depend on everything described in ACPI being present. Having a ugly hack where we removed the CPUs in this case means we need a new word for 'present' when we get machines that really do have not-present CPUs. (simon-says-present?). That will be hard enough, without muddying the waters with things like this.
Diverging the definitions of linux:present and acpi:present will put the maintainer in an asylum.
Part of the problem here is selling this as 'virtual CPU hotplug', while it is really "firmware CPU online policy", which is the problem description the Kubernetes folk had.
As such, this change can co-exists irrespective of whether Hotplug or Hotadd will ever exist in the system.
I agree with the ACPI part and maybe interface is broken but then you have used ACPI_STA_DEVICE_ENABLED which has not been used yet in acpi_processor.c code which is ACPI related. How can you make sure this bit is being set by firmware of other architectures, especially legacy?
I assume that this is broken on many platforms. The cover letter describes how the workaround works, if the CPUs are online because firmware isn't enforcing the policy, then the CPUs still get registered. From memory you get a warning and a kernel taint.
[..]
This shall ensure that we correctly reflect only present vcpus to the linux kernel although the sizing and initialization of the GICC/GICR would have already happened for the complete set for possible vcpus i.e. the ones with
[1] _STA[0] is set & _STA[1] bit is set and [2] Either GICC_flag_Intf_Flag.Enabled set OR GICC_flag.online_capable set
so effectively we are only deferring populating the cpu present mask for the disabled cpus but which are now online capable(or Hotplug capable in future?)
What is the user observable effect of the kernel knowing this CPUs are really present?
User Interface looks inconsistent and can break existing scripts.
As you can see, user requested max possible cpus(=6) and cold booted cpus(=4) Hence, the number of cpus directories correctly being shown are 4 but then total number of cpus present are being shown as 6 (i.e. 0-5).
If we can defer the registration of the disabled cpus (but are online capable i.e. for possible - present) then I don’t see why we can't mask availability of these cpus by not marking them as present to user so that the entries are consistent. With this scripts/utils using these values can go horribly wrong.
At Guest Kernel
estuary:/$ ls -al /sys/devices/system/cpu/ total 0 drwxr-xr-x 12 root 0 0 Sep 9 19:19 . drwxr-xr-x 8 root 0 0 Sep 9 19:19 .. drwxr-xr-x 7 root 0 0 Sep 9 19:19 cpu0 drwxr-xr-x 7 root 0 0 Sep 9 19:19 cpu1 drwxr-xr-x 7 root 0 0 Sep 9 19:19 cpu2 drwxr-xr-x 7 root 0 0 Sep 9 19:19 cpu3 drwxr-xr-x 2 root 0 0 Sep 9 19:19 cpufreq drwxr-xr-x 2 root 0 0 Sep 9 19:19 cpuidle [...]
estuary:/$ cat /sys/devices/system/cpu/possible 0-5 estuary:/$ cat /sys/devices/system/cpu/present 0-5 estuary:/$ cat /sys/devices/system/cpu/offline 4-5 estuary:/$
At Qemu
$QEMUBIN --enable-kvm -machine virt,gic-version=3 -cpu host -smp cpus=4,maxcpus=6 -append "console=ttyAMA0 root=/dev/ram earlycon rdinit=/init maxcpus=4 acpi=force"
I allege there is no user-space that does this.
About a year ago I trawled through the debian codesearch for use of the cpu present mask in sysfs. All I found were web browsers that are using it as a hack because they can't trust the enabled list on android. This fed into the memory allocator, so the outcome would be more memory use on a system where not all the CPUs are online.
The present and possible masks really shouldn't be exposed to user-space at all.
The intention of this series is to do this as pure policy.
I anticipate pressure on the "use the MADT GICR" line, even though ACPI doesn't say anything about the presence of MADT GICC's redistributor entry. If this happens, we'd depend on present meaning present.
If we are confident that flag ACPI_STA_DEVICE_ENABLED is being set properly by ARM and other architecture firmware,
I don't follow. I expect this to be wrong on many, many, x86 laptops. The warning and the kernel taint exist to spot such platforms and keep them running.
then Qemu can take care of that policy. It has all the information of the vcpus which are possible and disabled (but are online capable). We can use this info to conditionally return appropriate status when _STA ACPI method is evaluated.
I intentionally refrained to use the this approach in my first RFC[1] as the default code in the acpi_processor.c was only making use of the ACPI_STA_DEVICE_PRESENT bit after evaluation of _STA method. Qemu was also setting only present bit in the returned status value. Plus, I wanted to minimize the changes in the kernel in the first version of the RFC.
We must avoid undermining what present means. The whole ACPI edifice is standing on the assumption that everything in ACPI is present.
Once we get systems that really do support physical hotadd, arm will have to specify whether the cpu present bit implies the presence of the associated GICR, ITS, PMU, SMMU etc etc. Until then, any change to the way linux handles the present bit is creating problems for the future.
I have tried to get some guidance from the folk who write the specs on this, but as there are no systems that support the feature, it isn't possible to know what the hardware constraints are.
All the hotplug/package-hotadd machinery is triggered by udev. We don't need to hack the cpu present mask to make that work.
May I know what exactly are your apprehensions with 'udev'?
I have no apprehensions. As far as I can see the 'requirement' is that the udev event corresponding to cpu-register is seen by user-space. The Kubernetes folk don't want to put their online/offline policy in a user-space agent. (or to manage the _physical_ resources that are being consumed in the hypervisor). But they must have some user-space agent to bring CPUs online because the kernel doesn't do it. I assume those are udev rules. (if not - why not?)
With this series, the output for 'udevadm monitor' should be the same for x86 doing physical hot-add of a package (including all the things that come with a CPU), and notifying arm64 of a change in the firmware online/offline policy.
Can we get the Qemu parts done so we can check with the Kubernetes folk that this works for them?
As such 'udev' should make use of the Linux device model and it is not necessary to present 1:1 picture of the hardware to the abstract model(and which by the way we are not doing by not registering the disabled cpus). It will just expose that limited picture of the hardware to the user whatever is being presented by the kernel.
AFAICS it should work just fine but we need to limit the present cpus.
which would be an ugly hack. They really are present - we need them to be present because the irqchip driver has to access all the GICR to find the system wide supported features. The GIC maintainer said he would not consider bringing GICR online late until there is hardware that needs it. Leaving the present bit alone means we can spot that hardware when it comes.
Question: Q1: Current acpi_processor.c code is not using ACPI_STA_DEVICE_{ENABLED, UI} bits. Could it break other architecture if we use these bits but some of their legacy devices or firmware does not initialize these bits to their defaults?
Almost certainly! I'm pretty confident some vendors generate their ACPI tables using markov-models. (It boots! Ship it!)
The approach that used the UI bit to mean sysfs had to be hidden behind a Kconfig symbol, which is only marginally better than #ifdef CONFIG_ARM64.
If there are problems in using the ACPI_STA_DEVICE_UI Bit because it might conflict with the legacy firmware of other architectures then let us drop that.
We can alternatively use the ACPI_STA_DEVICE_ENABLED Bit in the _STA method which can be conditionally set by the Qemu?
Yes ... this corresponds with the enabled bit in the MADT GICC flags that was left clear at boot. Sorry, I thought this MADT:GICC:online_capable thing had been discussed on one of the LOD calls.
The expectation is firmware clears the MADT:GICC:enabled bit, and sets 'online capable' instead, to indicate this CPU is present, but subject to some kind of policy. The OS can read _STA to find the current policy, and try to online it if the policy has changed. Linux keys this on the _STA:Enabled bit, as that has the same name as the bit that was left clear in the MADT:GICC flags field. The _STA:Present bit has to remain set throughout as the MADT:GICC entry exists - and everything we have today in ACPI assumes everything is present.
I did ask to replicated the _STA flags in the MADT:GICC flags field, but it didn't happen.
This new version walks a fine line described in the cover-letter: any platform with firmware tables that get this wrong should get the same user-experience as there is no policy enforcement on x86, so the !online_capable CPUs can be detected as being online, and the policy stuff gets ignored.
Yes, I do understand your predicament, but ideally user experience is dictated by what *end* user sees. Here, by not masking the disabled cpus in the cpu present mask user will not have similar experience on ARM64 and x86_64 platforms and that is undeniable and will in the end matter the most since this feature will mostly be used on the servers.
But the CPUs are present. If they are not, we don't know how to describe the system in ACPI - the whole edifice stands on the assumption that everything is present.
I agree this is a different solution to the "I wanna a webpage to add CPUs to vms" problem than x86's physical package hot add. x86 can do that because they had these physical machines before they had virtual machines.
Thanks,
James
Hi James/Salil,
On 31-08-2022 16:39, James Morse wrote:
acpi_processor_get_info() registers all present CPUs. Registering a CPU is what creates the sysfs entries and triggers the udev notifications.
arm64 virtual machines that support 'virtual cpu hotplug' use the enabled bit to indicate whether the CPU can be brought online, as the existing ACPI tables require all hardware to be described and present.
If firmware describes a CPU as present, but disabled, skip the registration. Such CPUs are present, but can't be brought online for whatever reason. (e.g. firmware/hypervisor policy).
Once firmware sets the enabled bit, the CPU can be registered and brought online by user-space. Online CPUs, or CPUs that are missing an _STA method must always be registered.
Signed-off-by: James Morse <james.morse(a)arm.com>
drivers/acpi/acpi_processor.c | 31 ++++++++++++++++++++++++++++++- 1 file changed, 30 insertions(+), 1 deletion(-)
diff --git a/drivers/acpi/acpi_processor.c b/drivers/acpi/acpi_processor.c index 1bd6e4b8ab66..42521d89c378 100644 --- a/drivers/acpi/acpi_processor.c +++ b/drivers/acpi/acpi_processor.c @@ -194,6 +194,32 @@ static int acpi_processor_make_present(struct acpi_processor *pr) return ret; } +static int acpi_processor_make_enabled(struct acpi_processor *pr) +{
- unsigned long long sta;
- acpi_status status;
- bool present, enabled;
- if (!acpi_has_method(pr->handle, "_STA"))
return arch_register_cpu(pr->id);
- status = acpi_evaluate_integer(pr->handle, "_STA", NULL, &sta);
- if (ACPI_FAILURE(status))
return -ENODEV;
- present = sta & ACPI_STA_DEVICE_PRESENT;
- enabled = sta & ACPI_STA_DEVICE_ENABLED;
- if (cpu_online(pr->id) && (!present || !enabled)) {
pr_err_once(FW_BUG "CPU %u is online, but described as not present or disabled!\n", pr->id);
add_taint(TAINT_FIRMWARE_WORKAROUND, LOCKDEP_STILL_OK);
- } else if (!present || !enabled) {
return -ENODEV;
- }
- return arch_register_cpu(pr->id);
+}
- static int acpi_processor_get_info(struct acpi_device *device) { union acpi_object object = { 0 };
@@ -271,7 +297,7 @@ static int acpi_processor_get_info(struct acpi_device *device) } if (!invalid_logical_cpuid(pr->id) && cpu_present(pr->id)) {
int ret = arch_register_cpu(pr->id);
int ret = acpi_processor_make_enabled(pr);
if (ret) return ret; @@ -479,6 +505,9 @@ static void acpi_processor_remove(struct acpi_device *device) acpi_processor_make_not_present(device); return; }
- if (cpu_present(pr->id) && !(sta & ACPI_STA_DEVICE_ENABLED))
}arch_unregister_cpu(pr->id);
#ifdef CONFIG_X86
Enumeration failure while trying to hot plug vcpus after guest reboot
Following sequence of steps causes the issue
1. Boot VM with max 80 vcpus and current 72 vcpus(For example) 2. hot plug 3 more vcpus # virsh setvcpus vm2 75 --live 3. hot unplug 3 vcpus ( which were added earlier ) # virsh setvcpus vm2 72 --live 4. reboot VM # reboot 5. Again try to hot plug 3 vcpus -- ERROR # virsh setvcpus vm2 75 --live [ 47.367823] acpi ACPI0007:48: Enumeration failure [ 47.565173] acpi ACPI0007:49: Enumeration failure [ 47.703423] acpi ACPI0007:4a: Enumeration failure
I have tested the above scenario with following git repos
https://github.com/salil-mehta/qemu/tree/virt-cpuhp-armv8/rfc-v1-port2909202...
https://github.com/salil-mehta/linux/tree/virt-cpuhp-arm64/rfc-v2/jmorse-var... https://gitlab.arm.com/linux-arm/linux-jm/-/tree/virtual_cpu_hotplug/rfc/v/0...
I observed the following sequence while doing hotplug and unplug
When we do hotplug, making vcpu as *present*(_STA.Pres=1) and *enabled*(_STA.Enabled=1) When we do hot unplug, we just *disabled*(_STA.Enabled=0) vcpu but not *removed*(_STA.Pres=1)
In such a case CPU shall still be present at the firmware/VMM level and could be kept disabled i.e. _STA.Enable=0 and _STA.Present=1
After reboot, system is trying to enumerate vcpu(as *present*(_STA.Pres=1)) and it is marking flags flags.initialized = true and flags.visited = true
as the vcpu *disabled*(_STA.Enabled=0), vcpu has been made not present and flags flags.initialized and flags.visited were not cleared
And following fix resolves the issue
diff --git a/drivers/acpi/scan.c b/drivers/acpi/scan.c index 558664d169fc..e93b933cb8a4 100644 --- a/drivers/acpi/scan.c +++ b/drivers/acpi/scan.c @@ -2158,6 +2158,9 @@ static int acpi_scan_attach_handler(struct acpi_device *device) break;
device->handler = NULL; + device->flags.initialized = false; + acpi_device_clear_enumerated(device); + device->flags.power_manageable = 0; if (ret < 0) break; }
_Regards_, -Vishnu.
Hi Vishnu,
(CC: +Salil^2)
On 23/02/2023 06:10, Vishnu Pajjuri via Linaro-open-discussions wrote:
On 31-08-2022 16:39, James Morse wrote:
acpi_processor_get_info() registers all present CPUs. Registering a CPU is what creates the sysfs entries and triggers the udev notifications.
arm64 virtual machines that support 'virtual cpu hotplug' use the enabled bit to indicate whether the CPU can be brought online, as the existing ACPI tables require all hardware to be described and present.
If firmware describes a CPU as present, but disabled, skip the registration. Such CPUs are present, but can't be brought online for whatever reason. (e.g. firmware/hypervisor policy).
Once firmware sets the enabled bit, the CPU can be registered and brought online by user-space. Online CPUs, or CPUs that are missing an _STA method must always be registered.
Enumeration failure while trying to hot plug vcpus after guest reboot
Following sequence of steps causes the issue
- Boot VM with max 80 vcpus and current 72 vcpus(For example)
- hot plug 3 more vcpus
# virsh setvcpus vm2 75 --live 3. hot unplug 3 vcpus ( which were added earlier ) # virsh setvcpus vm2 72 --live 4. reboot VM # reboot 5. Again try to hot plug 3 vcpus -- ERROR # virsh setvcpus vm2 75 --live [ 47.367823] acpi ACPI0007:48: Enumeration failure [ 47.565173] acpi ACPI0007:49: Enumeration failure [ 47.703423] acpi ACPI0007:4a: Enumeration failure
I can reproduce this, thanks for the detailed description. (I didn't use 'virsh', that doesn't seem to be relevant)
I observed the following sequence while doing hotplug and unplug
When we do hotplug, making vcpu as *present*(_STA.Pres=1) and *enabled*(_STA.Enabled=1) When we do hot unplug, we just *disabled*(_STA.Enabled=0) vcpu but not *removed*(_STA.Pres=1)
In such a case CPU shall still be present at the firmware/VMM level and could be kept disabled i.e. _STA.Enable=0 and _STA.Present=1
After reboot, system is trying to enumerate vcpu(as *present*(_STA.Pres=1)) and it is marking flags flags.initialized = true and flags.visited = true
This would be no different to the first boot...
as the vcpu *disabled*(_STA.Enabled=0), vcpu has been made not present and flags flags.initialized and flags.visited were not cleared
(It's important that everything remains present all the time. When a CPU is disabled, its offline and unregistered, but all the bits and pieces of logic that go with the CPU are still present. This is what makes it safe to do this on physical hardware.)
And following fix resolves the issue
diff --git a/drivers/acpi/scan.c b/drivers/acpi/scan.c index 558664d169fc..e93b933cb8a4 100644 --- a/drivers/acpi/scan.c +++ b/drivers/acpi/scan.c @@ -2158,6 +2158,9 @@ static int acpi_scan_attach_handler(struct acpi_device *device) break;
device->handler = NULL; + device->flags.initialized = false; + acpi_device_clear_enumerated(device); + device->flags.power_manageable = 0; if (ret < 0) break; }
I think the bigger question is why is acpi_dev_ready_for_enumeration() false on the first boot, and true on the second, when the state of the CPU should be the same in both cases. Nothing should have been preserved over the reboot.
~
Turns out Qemu is returning in '0' in _STA until the CPU is added, which makes it 0xf. Once its removed, it becomes 0xd.
Qemu should be reporting the CPU as at least present from the very beginning, as it is described in the MADT.
Salil, could look into this on the qemu side?
~
I'd much prefer we change the ACPI logic to only enumerate devices that are both present and enabled. Do the last four patches here, (or indeed the whole branch), fix this issue for you: https://gitlab.arm.com/linux-arm/linux-jm/-/tree/virtual_cpu_hotplug/rfc/sna...
The only reason not to do this, is a widely deploy set of hardware where only the present bit is set, because previously that was sufficient. If this is found to be true, the workaround would be to use acpi_dev_ready_for_enumeration() in all the places I've added acpi_device_is_present_and_enabled(), and abuse acpi_dev_ready_for_enumeration() to return different values depending on the type of device, the architecture and/or phase of the moon.
Thanks for the bug report!
James
Hello, Thanks for this.
-----Original Message----- From: James Morse james.morse@arm.com Sent: Friday, February 24, 2023 5:39 PM To: Vishnu Pajjuri vishnu@amperemail.onmicrosoft.com; linaro-open- discussions@op-lists.linaro.org; Salil Mehta salil.mehta@huawei.com; salil.mehta@opnsrc.net Subject: Re: [Linaro-open-discussions] Re: [RFC PATCH v0.1 22/25] ACPI: add support to register CPUs based on the _STA enabled bit
Hi Vishnu,
(CC: +Salil^2)
On 23/02/2023 06:10, Vishnu Pajjuri via Linaro-open-discussions wrote:
On 31-08-2022 16:39, James Morse wrote:
acpi_processor_get_info() registers all present CPUs. Registering a CPU is what creates the sysfs entries and triggers the udev notifications.
arm64 virtual machines that support 'virtual cpu hotplug' use the enabled bit to indicate whether the CPU can be brought online, as the existing ACPI tables require all hardware to be described and present.
If firmware describes a CPU as present, but disabled, skip the registration. Such CPUs are present, but can't be brought online for whatever reason. (e.g. firmware/hypervisor policy).
Once firmware sets the enabled bit, the CPU can be registered and brought online by user-space. Online CPUs, or CPUs that are missing an _STA method must always be registered.
Enumeration failure while trying to hot plug vcpus after guest reboot
Following sequence of steps causes the issue
- Boot VM with max 80 vcpus and current 72 vcpus(For example)
- hot plug 3 more vcpus
# virsh setvcpus vm2 75 --live 3. hot unplug 3 vcpus ( which were added earlier ) # virsh setvcpus vm2 72 --live 4. reboot VM # reboot 5. Again try to hot plug 3 vcpus -- ERROR # virsh setvcpus vm2 75 --live [ 47.367823] acpi ACPI0007:48: Enumeration failure [ 47.565173] acpi ACPI0007:49: Enumeration failure [ 47.703423] acpi ACPI0007:4a: Enumeration failure
I can reproduce this, thanks for the detailed description. (I didn't use 'virsh', that doesn't seem to be relevant)
I observed the following sequence while doing hotplug and unplug
When we do hotplug, making vcpu as *present*(_STA.Pres=1) and *enabled*(_STA.Enabled=1) When we do hot unplug, we just *disabled*(_STA.Enabled=0) vcpu but not *removed*(_STA.Pres=1)
In such a case CPU shall still be present at the firmware/VMM level and could be kept disabled i.e. _STA.Enable=0 and _STA.Present=1
After reboot, system is trying to enumerate vcpu(as *present*(_STA.Pres=1)) and it is marking flags flags.initialized = true and flags.visited = true
This would be no different to the first boot...
as the vcpu *disabled*(_STA.Enabled=0), vcpu has been made not present and flags flags.initialized and flags.visited were not cleared
(It's important that everything remains present all the time. When a CPU is disabled, its offline and unregistered, but all the bits and pieces of logic that go with the CPU are still present. This is what makes it safe to do this on physical hardware.)
And following fix resolves the issue
diff --git a/drivers/acpi/scan.c b/drivers/acpi/scan.c index 558664d169fc..e93b933cb8a4 100644 --- a/drivers/acpi/scan.c +++ b/drivers/acpi/scan.c @@ -2158,6 +2158,9 @@ static int acpi_scan_attach_handler(struct
acpi_device *device)
break;
device->handler = NULL; + device->flags.initialized = false; + acpi_device_clear_enumerated(device); + device->flags.power_manageable = 0; if (ret < 0) break; }
I think the bigger question is why is acpi_dev_ready_for_enumeration() false on the first boot, and true on the second, when the state of the CPU should be the same in both cases. Nothing should have been preserved over the reboot.
~
Turns out Qemu is returning in '0' in _STA until the CPU is added, which makes it 0xf. Once its removed, it becomes 0xd.
Qemu should be reporting the CPU as at least present from the very beginning, as it is described in the MADT.
Salil, could look into this on the qemu side?
Yes, It’s a valid point but can interfere with the legacy? But will again check this today.
I'd much prefer we change the ACPI logic to only enumerate devices that are both present and enabled. Do the last four patches here, (or indeed the whole branch), fix this issue for you: https://gitlab.arm.com/linux-arm/linux-jm/- /tree/virtual_cpu_hotplug/rfc/snapshot/20230224
The only reason not to do this, is a widely deploy set of hardware where only the present bit is set, because previously that was sufficient.
Exactly this.
I think we knew such problems would surface if we start to bank upon "enabled" than "present". There could be more such corners waiting to be rolled out.
If this is found to be true, the workaround would be to use acpi_dev_ready_for_enumeration() in all the places I've added acpi_device_is_present_and_enabled(), and abuse acpi_dev_ready_for_enumeration() to return different values depending on the type of device, the architecture and/or phase of the moon.
I guess just enabled check would have suffice here. Device has to be "present" to be "enabled"?
Thanks
Hi Salil,
On 27/02/2023 09:49, Salil Mehta wrote:
-----Original Message----- From: James Morse james.morse@arm.com Sent: Friday, February 24, 2023 5:39 PM To: Vishnu Pajjuri vishnu@amperemail.onmicrosoft.com; linaro-open- discussions@op-lists.linaro.org; Salil Mehta salil.mehta@huawei.com; salil.mehta@opnsrc.net Subject: Re: [Linaro-open-discussions] Re: [RFC PATCH v0.1 22/25] ACPI: add support to register CPUs based on the _STA enabled bit
On 23/02/2023 06:10, Vishnu Pajjuri via Linaro-open-discussions wrote:
On 31-08-2022 16:39, James Morse wrote:
acpi_processor_get_info() registers all present CPUs. Registering a CPU is what creates the sysfs entries and triggers the udev notifications.
arm64 virtual machines that support 'virtual cpu hotplug' use the enabled bit to indicate whether the CPU can be brought online, as the existing ACPI tables require all hardware to be described and present.
If firmware describes a CPU as present, but disabled, skip the registration. Such CPUs are present, but can't be brought online for whatever reason. (e.g. firmware/hypervisor policy).
Once firmware sets the enabled bit, the CPU can be registered and brought online by user-space. Online CPUs, or CPUs that are missing an _STA method must always be registered.
Enumeration failure while trying to hot plug vcpus after guest reboot
as the vcpu *disabled*(_STA.Enabled=0), vcpu has been made not present and flags flags.initialized and flags.visited were not cleared
(It's important that everything remains present all the time. When a CPU is disabled, its offline and unregistered, but all the bits and pieces of logic that go with the CPU are still present. This is what makes it safe to do this on physical hardware.)
And following fix resolves the issue
Turns out Qemu is returning in '0' in _STA until the CPU is added, which makes it 0xf. Once its removed, it becomes 0xd.
Qemu should be reporting the CPU as at least present from the very beginning, as it is described in the MADT.
Salil, could look into this on the qemu side?
Yes, It’s a valid point but can interfere with the legacy? But will again check this today.
It doesn't. Existing versions of linux don't comprehend the MADT ONLINE_CAPABLE bit, so those present CPUs are removed from the present mask. When ACPI tries to set them up, acpi_processor_hotadd_init() will bale out with ENODEV because the CPUID isn't "mapped", and can't be mapped, because the arch code doesn't consider that CPU present.
You can't bodge round legacy things like this in Qemu as this reboot case shows. Even if you try and hook reboot, the guest may use kexec instead.
I'd much prefer we change the ACPI logic to only enumerate devices that are both present and enabled. Do the last four patches here, (or indeed the whole branch), fix this issue for you: https://gitlab.arm.com/linux-arm/linux-jm/- /tree/virtual_cpu_hotplug/rfc/snapshot/20230224
The only reason not to do this, is a widely deploy set of hardware where only the present bit is set, because previously that was sufficient.
Exactly this.
I think we knew such problems would surface if we start to bank upon "enabled" than "present". There could be more such corners waiting to be rolled out.
ACPI 6.3's 6.3.7 is pretty clear that both these bits have to be set. I'd argue this is a bug in linux if its trying to probe disabled devices.
I'm not too worried about CPUs, but docks for ten year old laptops. In the worst case the wreckage should be confined to x86, so we can have a per-arch 'ignore the enabled bit' that keeps the existing behaviour.
If this is found to be true, the workaround would be to use acpi_dev_ready_for_enumeration() in all the places I've added acpi_device_is_present_and_enabled(), and abuse acpi_dev_ready_for_enumeration() to return different values depending on the type of device, the architecture and/or phase of the moon.
I guess just enabled check would have suffice here. Device has to be "present" to be "enabled"?
It is, lets hope the firmware folk have been reading that bit of the spec!
I think the best thing to do is check (enabled || functional) in acpi_dev_ready_for_enumeration(), with a patch we hope we don't have to merge that gives x86 a 'ACPI_IGNORE_STA_ENABLED' to keep the current behaviour.
Thanks,
James
From: Jean-Philippe Brucker jean-philippe@linaro.org
When capability KVM_CAP_ARM_HVC_TO_USER is available, userspace can request to handle all hypercalls that aren't handled by KVM. With the help of another capability, this will allow userspace to handle PSCI calls.
Suggested-by: James Morse james.morse@arm.com Signed-off-by: Jean-Philippe Brucker jean-philippe@linaro.org Signed-off-by: James Morse james.morse@arm.com
---
Notes on this implementation:
* A similar mechanism was proposed for SDEI some time ago [1]. This RFC generalizes the idea to all hypercalls, since that was suggested on the list [2, 3].
* We're reusing kvm_run.hypercall. I copied x0-x5 into kvm_run.hypercall.args[] to help userspace but I'm tempted to remove this, because: - Most user handlers will need to write results back into the registers (x0-x3 for SMCCC), so if we keep this shortcut we should go all the way and read them back on return to kernel. - QEMU doesn't care about this shortcut, it pulls all vcpu regs before handling the call. - SMCCC uses x0-x16 for parameters. x0 does contain the SMCCC function ID and may be useful for fast dispatch, we could keep that plus the immediate number.
* Add a flag in the kvm_run.hypercall telling whether this is HVC or SMC? Can be added later in those bottom longmode and pad fields.
* On top of this we could share with userspace which HVC ranges are available and which ones are handled by KVM. That can actually be added independently, through a vCPU/VM device attribute which doesn't consume a new ioctl: - userspace issues HAS_ATTR ioctl on the vcpu fd to query whether this feature is available. - userspace queries the number N of HVC ranges using one GET_ATTR. - userspace passes an array of N ranges using another GET_ATTR. The array is filled and returned by KVM.
* Enabling this using a vCPU arch feature rather than the whole-VM capability would be fine, but it would be difficult to do the same for the following psci-in-user capability. So let's enable everything at the VM scope.
* No idea whether this work out of the box for AArch32 guests.
[1] https://lore.kernel.org/linux-arm-kernel/20170808164616.25949-12-james.morse... [2] https://lore.kernel.org/linux-arm-kernel/bf7e83f1-c58e-8d65-edd0-d08f27b8b76... [3] https://lore.kernel.org/linux-arm-kernel/f56cf420-affc-35f0-2355-801a924b8a3... --- Documentation/virt/kvm/api.rst | 17 +++++++++++++++-- arch/arm64/include/asm/kvm_host.h | 1 + arch/arm64/kvm/arm.c | 5 +++++ arch/arm64/kvm/hypercalls.c | 28 +++++++++++++++++++++++++++- include/kvm/arm_psci.h | 4 ++++ include/uapi/linux/kvm.h | 1 + 6 files changed, 53 insertions(+), 3 deletions(-)
diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst index abd7c32126ce..ea995e00d314 100644 --- a/Documentation/virt/kvm/api.rst +++ b/Documentation/virt/kvm/api.rst @@ -6204,8 +6204,12 @@ to the byte array. __u32 pad; } hypercall;
-Unused. This was once used for 'hypercall to userspace'. To implement -such functionality, use KVM_EXIT_IO (x86) or KVM_EXIT_MMIO (all except s390). +On x86 this was once used for 'hypercall to userspace'. To implement such +functionality, use KVM_EXIT_IO (x86) or KVM_EXIT_MMIO (all except s390). + +On arm64 it is used for hypercalls, when the KVM_CAP_ARM_HVC_TO_USER capability +is enabled. 'nr' contains the HVC or SMC immediate. 'args' contains registers +x0 - x5. The other parameters are unused.
.. note:: KVM_EXIT_IO is significantly faster than KVM_EXIT_MMIO.
@@ -8303,6 +8307,15 @@ structure. When getting the Modified Change Topology Report value, the attr->addr must point to a byte where the value will be stored or retrieved from.
+8.40 KVM_CAP_ARM_HVC_TO_USER +---------------------------- + +:Architecture: arm64 + +This capability indicates that KVM can pass unhandled hypercalls to userspace, +if the VMM enables it. Hypercalls are passed with KVM_EXIT_HYPERCALL in +kvm_run::hypercall. + 9. Known KVM API problems =========================
diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h index e9c9388ccc02..8d69a196e241 100644 --- a/arch/arm64/include/asm/kvm_host.h +++ b/arch/arm64/include/asm/kvm_host.h @@ -149,6 +149,7 @@ struct kvm_arch { #define KVM_ARCH_FLAG_EL1_32BIT 4 /* PSCI SYSTEM_SUSPEND enabled for the guest */ #define KVM_ARCH_FLAG_SYSTEM_SUSPEND_ENABLED 5 +#define KVM_ARCH_FLAG_HVC_TO_USER 6
unsigned long flags;
diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c index 2ff0ef62abad..ac76ab053478 100644 --- a/arch/arm64/kvm/arm.c +++ b/arch/arm64/kvm/arm.c @@ -101,6 +101,10 @@ int kvm_vm_ioctl_enable_cap(struct kvm *kvm, r = 0; set_bit(KVM_ARCH_FLAG_SYSTEM_SUSPEND_ENABLED, &kvm->arch.flags); break; + case KVM_CAP_ARM_HVC_TO_USER: + r = 0; + set_bit(KVM_ARCH_FLAG_HVC_TO_USER, &kvm->arch.flags); + break; default: r = -EINVAL; break; @@ -218,6 +222,7 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext) case KVM_CAP_VCPU_ATTRIBUTES: case KVM_CAP_PTP_KVM: case KVM_CAP_ARM_SYSTEM_SUSPEND: + case KVM_CAP_ARM_HVC_TO_USER: r = 1; break; case KVM_CAP_SET_GUEST_DEBUG2: diff --git a/arch/arm64/kvm/hypercalls.c b/arch/arm64/kvm/hypercalls.c index c9f401fa01a9..61ce10fdad40 100644 --- a/arch/arm64/kvm/hypercalls.c +++ b/arch/arm64/kvm/hypercalls.c @@ -121,6 +121,28 @@ static bool kvm_hvc_call_allowed(struct kvm_vcpu *vcpu, u32 func_id) } }
+static int kvm_hvc_user(struct kvm_vcpu *vcpu) +{ + int i; + struct kvm_run *run = vcpu->run; + + if (test_bit(KVM_ARCH_FLAG_HVC_TO_USER, &vcpu->kvm->arch.flags)) { + smccc_set_retval(vcpu, SMCCC_RET_NOT_SUPPORTED, 0, 0, 0); + return 1; + } + + run->exit_reason = KVM_EXIT_HYPERCALL; + run->hypercall.nr = kvm_vcpu_hvc_get_imm(vcpu); + /* Add the first parameters for fast access. */ + for (i = 0; i < 6; i++) + run->hypercall.args[i] = vcpu_get_reg(vcpu, i); + run->hypercall.ret = 0; + run->hypercall.longmode = 0; + run->hypercall.pad = 0; + + return 0; +} + int kvm_hvc_call_handler(struct kvm_vcpu *vcpu) { struct kvm_smccc_features *smccc_feat = &vcpu->kvm->arch.smccc_feat; @@ -219,8 +241,12 @@ int kvm_hvc_call_handler(struct kvm_vcpu *vcpu) case ARM_SMCCC_TRNG_RND32: case ARM_SMCCC_TRNG_RND64: return kvm_trng_call(vcpu); - default: + case KVM_PSCI_FN_BASE...KVM_PSCI_FN_LAST: + case PSCI_0_2_FN_BASE...PSCI_0_2_FN_LAST: + case PSCI_0_2_FN64_BASE...PSCI_0_2_FN64_LAST: return kvm_psci_call(vcpu); + default: + return kvm_hvc_user(vcpu); }
out: diff --git a/include/kvm/arm_psci.h b/include/kvm/arm_psci.h index 6e55b9283789..7391fa51419a 100644 --- a/include/kvm/arm_psci.h +++ b/include/kvm/arm_psci.h @@ -17,6 +17,10 @@
#define KVM_ARM_PSCI_LATEST KVM_ARM_PSCI_1_1
+#define KVM_PSCI_FN_LAST KVM_PSCI_FN(3) +#define PSCI_0_2_FN_LAST PSCI_0_2_FN(0x3f) +#define PSCI_0_2_FN64_LAST PSCI_0_2_FN64(0x3f) + static inline int kvm_psci_version(struct kvm_vcpu *vcpu) { /* diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h index eed0315a77a6..fd93cbc6f444 100644 --- a/include/uapi/linux/kvm.h +++ b/include/uapi/linux/kvm.h @@ -1177,6 +1177,7 @@ struct kvm_ppc_resize_hpt { #define KVM_CAP_VM_DISABLE_NX_HUGE_PAGES 220 #define KVM_CAP_S390_ZPCI_OP 221 #define KVM_CAP_S390_CPU_TOPOLOGY 222 +#define KVM_CAP_ARM_HVC_TO_USER 223
#ifdef KVM_CAP_IRQ_ROUTING
From: Jean-Philippe Brucker jean-philippe@linaro.org
When the KVM_CAP_ARM_PSCI_TO_USER capability is available, userspace can request to handle PSCI calls.
SMCCC probe requires PSCI v1.x. If userspace only implements PSCI v0.2, the guest won't query SMCCC support through PSCI and won't use the spectre workarounds. We could hijack PSCI_VERSION and pretend to support v1.0 if userspace does not, then handle all v1.0 calls ourselves (including guessing the PSCI feature set implemented by the guest), but that seems unnecessary. After all the API already allows userspace to force a version lower than v1.0 using the firmware pseudo-registers.
The KVM_REG_ARM_PSCI_VERSION pseudo-register currently resets to either v0.1 if userspace doesn't set KVM_ARM_VCPU_PSCI_0_2, or KVM_ARM_PSCI_LATEST (1.0).
Suggested-by: James Morse james.morse@arm.com Signed-off-by: Jean-Philippe Brucker jean-philippe@linaro.org Signed-off-by: James Morse james.morse@arm.com --- Documentation/virt/kvm/api.rst | 14 ++++++++++++++ Documentation/virt/kvm/arm/hypercalls.rst | 1 + arch/arm64/include/asm/kvm_host.h | 1 + arch/arm64/kvm/arm.c | 10 +++++++--- arch/arm64/kvm/hypercalls.c | 2 +- arch/arm64/kvm/psci.c | 13 +++++++++++++ include/kvm/arm_hypercalls.h | 1 + include/uapi/linux/kvm.h | 1 + 8 files changed, 39 insertions(+), 4 deletions(-)
diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst index ea995e00d314..72eaeb166e27 100644 --- a/Documentation/virt/kvm/api.rst +++ b/Documentation/virt/kvm/api.rst @@ -8316,6 +8316,20 @@ This capability indicates that KVM can pass unhandled hypercalls to userspace, if the VMM enables it. Hypercalls are passed with KVM_EXIT_HYPERCALL in kvm_run::hypercall.
+8.38 KVM_CAP_ARM_PSCI_TO_USER +----------------------------- + +:Architectures: arm64 + +When the VMM enables this capability, all PSCI calls are passed to userspace +instead of being handled by KVM. Capability KVM_CAP_ARM_HVC_TO_USER must be +enabled first. + +Userspace should support at least PSCI v1.0. Otherwise SMCCC features won't be +available to the guest. Userspace does not need to handle the SMCCC_VERSION +parameter for the PSCI_FEATURES function. The KVM_ARM_VCPU_PSCI_0_2 vCPU +feature should be set even if this capability is enabled. + 9. Known KVM API problems =========================
diff --git a/Documentation/virt/kvm/arm/hypercalls.rst b/Documentation/virt/kvm/arm/hypercalls.rst index 3e23084644ba..4c111afa7d74 100644 --- a/Documentation/virt/kvm/arm/hypercalls.rst +++ b/Documentation/virt/kvm/arm/hypercalls.rst @@ -34,6 +34,7 @@ The following registers are defined: - Allows any PSCI version implemented by KVM and compatible with v0.2 to be set with SET_ONE_REG - Affects the whole VM (even if the register view is per-vcpu) + - Defaults to PSCI 1.0 if userspace enables KVM_CAP_ARM_PSCI_TO_USER.
* KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_1: Holds the state of the firmware support to mitigate CVE-2017-5715, as diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h index 8d69a196e241..933be88e6f06 100644 --- a/arch/arm64/include/asm/kvm_host.h +++ b/arch/arm64/include/asm/kvm_host.h @@ -150,6 +150,7 @@ struct kvm_arch { /* PSCI SYSTEM_SUSPEND enabled for the guest */ #define KVM_ARCH_FLAG_SYSTEM_SUSPEND_ENABLED 5 #define KVM_ARCH_FLAG_HVC_TO_USER 6 +#define KVM_ARCH_FLAG_PSCI_TO_USER 7
unsigned long flags;
diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c index ac76ab053478..b031723fdc81 100644 --- a/arch/arm64/kvm/arm.c +++ b/arch/arm64/kvm/arm.c @@ -76,7 +76,7 @@ int kvm_arch_check_processor_compat(void *opaque) int kvm_vm_ioctl_enable_cap(struct kvm *kvm, struct kvm_enable_cap *cap) { - int r; + int r = -EINVAL;
if (cap->flags) return -EINVAL; @@ -105,8 +105,11 @@ int kvm_vm_ioctl_enable_cap(struct kvm *kvm, r = 0; set_bit(KVM_ARCH_FLAG_HVC_TO_USER, &kvm->arch.flags); break; - default: - r = -EINVAL; + case KVM_CAP_ARM_PSCI_TO_USER: + if (test_bit(KVM_ARCH_FLAG_HVC_TO_USER, &kvm->arch.flags)) { + r = 0; + set_bit(KVM_ARCH_FLAG_PSCI_TO_USER, &kvm->arch.flags); + } break; }
@@ -223,6 +226,7 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext) case KVM_CAP_PTP_KVM: case KVM_CAP_ARM_SYSTEM_SUSPEND: case KVM_CAP_ARM_HVC_TO_USER: + case KVM_CAP_ARM_PSCI_TO_USER: r = 1; break; case KVM_CAP_SET_GUEST_DEBUG2: diff --git a/arch/arm64/kvm/hypercalls.c b/arch/arm64/kvm/hypercalls.c index 61ce10fdad40..fcab358de228 100644 --- a/arch/arm64/kvm/hypercalls.c +++ b/arch/arm64/kvm/hypercalls.c @@ -121,7 +121,7 @@ static bool kvm_hvc_call_allowed(struct kvm_vcpu *vcpu, u32 func_id) } }
-static int kvm_hvc_user(struct kvm_vcpu *vcpu) +int kvm_hvc_user(struct kvm_vcpu *vcpu) { int i; struct kvm_run *run = vcpu->run; diff --git a/arch/arm64/kvm/psci.c b/arch/arm64/kvm/psci.c index 7fbc4c1b9df0..8505b26f0a83 100644 --- a/arch/arm64/kvm/psci.c +++ b/arch/arm64/kvm/psci.c @@ -418,6 +418,16 @@ static int kvm_psci_0_1_call(struct kvm_vcpu *vcpu) return 1; }
+static bool kvm_psci_call_is_user(struct kvm_vcpu *vcpu) +{ + /* Handle the special case of SMCCC probe through PSCI */ + if (smccc_get_function(vcpu) == PSCI_1_0_FN_PSCI_FEATURES && + smccc_get_arg1(vcpu) == ARM_SMCCC_VERSION_FUNC_ID) + return false; + + return test_bit(KVM_ARCH_FLAG_PSCI_TO_USER, &vcpu->kvm->arch.flags); +} + /** * kvm_psci_call - handle PSCI call if r0 value is in range * @vcpu: Pointer to the VCPU struct @@ -443,6 +453,9 @@ int kvm_psci_call(struct kvm_vcpu *vcpu) return 1; }
+ if (kvm_psci_call_is_user(vcpu)) + return kvm_hvc_user(vcpu); + switch (kvm_psci_version(vcpu)) { case KVM_ARM_PSCI_1_1: return kvm_psci_1_x_call(vcpu, 1); diff --git a/include/kvm/arm_hypercalls.h b/include/kvm/arm_hypercalls.h index 1188f116cf4e..ea7073d1a82e 100644 --- a/include/kvm/arm_hypercalls.h +++ b/include/kvm/arm_hypercalls.h @@ -6,6 +6,7 @@
#include <asm/kvm_emulate.h>
+int kvm_hvc_user(struct kvm_vcpu *vcpu); int kvm_hvc_call_handler(struct kvm_vcpu *vcpu);
static inline u32 smccc_get_function(struct kvm_vcpu *vcpu) diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h index fd93cbc6f444..75ef95ca6d04 100644 --- a/include/uapi/linux/kvm.h +++ b/include/uapi/linux/kvm.h @@ -1178,6 +1178,7 @@ struct kvm_ppc_resize_hpt { #define KVM_CAP_S390_ZPCI_OP 221 #define KVM_CAP_S390_CPU_TOPOLOGY 222 #define KVM_CAP_ARM_HVC_TO_USER 223 +#define KVM_CAP_ARM_PSCI_TO_USER 224
#ifdef KVM_CAP_IRQ_ROUTING
Add a description of physical and virtual CPU hotplug, explain the differences and elaborate on what is required in ACPI for a working virtual hotplug system.
Signed-off-by: James Morse james.morse@arm.com --- Documentation/arm64/cpu-hotplug.rst | 79 +++++++++++++++++++++++++++++ Documentation/arm64/index.rst | 1 + 2 files changed, 80 insertions(+) create mode 100644 Documentation/arm64/cpu-hotplug.rst
diff --git a/Documentation/arm64/cpu-hotplug.rst b/Documentation/arm64/cpu-hotplug.rst new file mode 100644 index 000000000000..76ba8d932c72 --- /dev/null +++ b/Documentation/arm64/cpu-hotplug.rst @@ -0,0 +1,79 @@ +.. SPDX-License-Identifier: GPL-2.0 +.. _cpuhp_index: + +==================== +CPU Hotplug and ACPI +==================== + +CPU hotplug in the arm64 world is commonly used to describe the kernel taking +CPUs online/offline using PSCI. This document is about ACPI firmware allowing +CPUs that were not available during boot to be added to the system later. + +``possible`` and ``present`` refer to the state of the CPU as seen by linux. + + +CPU Hotplug on physical systems - CPUs not present at boot +---------------------------------------------------------- + +Physical systems need to mark a CPU that is ``possible`` but not ``present`` as +being ``present``. An example would be a dual socket machine, where the package +in one of the sockets can be replaced while the system is running. + +This is not supported. + +In the arm64 world CPUs are not a single device but a slice of the system. +There are no systems that support the physical addition (or removal) of CPUs +while the system is running, and ACPI is not able to sufficiently describe +them. + +e.g. New CPUs come with new caches, but the platform's cache toplogy is +described in a static table, the PPTT. How caches are shared between CPUs is +not discoverable, and must be described by firmware. + +e.g. The GIC redistributor for each CPU must be accessed by the driver during +boot to discover the system wide supported features. ACPI's MADT GICC +structures can describe a redistributor associated with a disabled CPU, but +can't describe whether the redistributor is accessible, only that it is not +'always on'. + +arm64's ACPI tables assume that everything described is ``present``. + + +CPU Hotplug on virtual systems - CPUs not enabled at boot +--------------------------------------------------------- + +Virtual systems have the advantage that all the properties the system will +ever have can be described at boot. There are no power-domain considerations +as such devices are emulated. + +CPU Hotplug on virtual systems is supported. It is distinct from physical +CPU Hotplug as all resources are described as ``present``, but CPUs may be +marked as disabled by firmware. Only the CPU's online/offline behaviour is +influenced by firmware. An example is where a virtual machine boots with a +single CPU, and additional CPUs are added once a cloud orchestrator deploys +the workload. + +For a virtual machine, the VMM (e.g. Qemu) plays the part of firmware. + +Virtual hotplug is implemented as a firmware policy affecting which CPUs can be +brought online. Firmware can enforce its policy via PSCI's return codes. e.g. +``DENIED``. + +The ACPI tables must describe all the resources of the virtual machine. CPUs +that firmware wishes to disable either from boot (or later) should not be +``enabled`` in the MADT GICC structures, but should have the ``online capable`` +bit set, to indicate they can be enabled later. The boot CPU must be marked as +``enabled``. The 'always on' GICR structure must be used to describe the +redistributors. + +CPUs described as ``online capable`` but not ``enabled`` can be set to enabled +by the DSDT's Processor object's _STA method. On virtual systems the _STA method +must always report the CPU as ``present``. Changes to the firmware policy can +be notified to the OS via device-check or eject-request. + +CPUs described as ``enabled`` in the static table, should not have their _STA +modified dynamically by firmware. Soft-restart features such as kexec will +re-read the static properties of the system from these static tables, and +may malfunction if these no longer describe the running system. Linux will +re-discover the dynamic properties of the system from the _STA method later +during boot. diff --git a/Documentation/arm64/index.rst b/Documentation/arm64/index.rst index ae21f8118830..54da62534871 100644 --- a/Documentation/arm64/index.rst +++ b/Documentation/arm64/index.rst @@ -13,6 +13,7 @@ ARM64 Architecture asymmetric-32bit booting cpu-feature-registers + cpu-hotplug elf_hwcaps hugetlbpage legacy_instructions
linaro-open-discussions@op-lists.linaro.org