Hi James,
From: James Morse [mailto:james.morse@arm.com]
Sent: Friday, September 2, 2022 10:21 AM
To: Jonathan Cameron <jonathan.cameron@huawei.com>; Salil Mehta <salil.mehta@huawei.com>
Cc: linaro-open-discussions@op-lists.linaro.org; lorenzo.pieralisi@linaro.org; Jean-Philippe Brucker <jean-philippe@linaro.org>; mehta.salil.lnk@gmail.com
Subject: Re: [RFC PATCH v0.1 20/25] irqchip/gic-v3: Add support for ACPI's disabled but 'online capable' CPUs
Hi Salil,
On 9/2/22 10:12, Jonathan Cameron wrote:
On Thu, 1 Sep 2022 21:18:48 +0100 Salil Mehta salil.mehta@huawei.com wrote:
To support virtual CPU hotplug, ACPI has added an 'online capable' bit to the MADT GICC entries. This indicates that a disabled CPU entry may not be possible to online via PSCI until firmware has set the enabled bit in _STA.
What about the redistributor in the GICC entry? ACPI doesn't want to say. Assume the worst: When a redistributor is described in the GICC entry, but the entry is marked as disabled at boot, assume the redistributor is inaccessible.
The GICv3 driver doesn't support late online of redistributors, so this means the corresponding CPU can't be brought online either. Clear the possible and present bits.
Systems that want CPU hotplug in a VM can ensure their redistributors are always-on, and describe them that way with a GICR entry in the MADT.
When mapping redistributors found via GICC entries, handle the case where the arch code believes the CPU is present and possible, but it does not have an accessible redistributor. Print a warning and clear the present and possible bits.
diff --git a/include/linux/acpi.h b/include/linux/acpi.h
index 358d9b971de8..81f5df4a536a 100644
--- a/include/linux/acpi.h
+++ b/include/linux/acpi.h
@@ -269,7 +269,8 @@ void acpi_table_print_madt_entry (struct acpi_subtable_header *madt);
 static inline bool acpi_gicc_is_usable(struct acpi_madt_generic_interrupt *gicc)
 {
-	return (gicc->flags & ACPI_MADT_ENABLED);
+	return ((gicc->flags & ACPI_MADT_ENABLED ||
+		 gicc->flags & ACPI_MADT_GICC_CPU_CAPABLE));
This does not look right to me.
As per the ACPI specification 6.5 "Draft-12Aug2022", "Table 5.37: GICC CPU Interface Flags", below are the possible combinations of the existing "Enabled" bit with the *new* "online-capable" bit:
Note the ACPI 6.5 specification is public: https://uefi.org/specifications so let's refer to that.
Table 5.37: GICC CPU Interface Flags
+---------+---------------------------------------+
|         |            Enabled Bit(0)             |
|         +-------+---------------+---------------+
|         | Bool  |     False     |     True      |
+---------+-------+---------------+---------------+
|         |       |               |               |
|         | False |               |               |
|         |       |    CPU is     |    CPU is     |
|         |       |  Not Usable   | Ready to Use  |
| Online  |       |               |               |
| Capable +-------+---------------+---------------+
| Bit(3)  |       |               |               |
|         |       |    CPU is     |   *Invalid*   |
|         | True  | Online-Capable|  Combination  |
|         |       |               |               |
|         |       |               |               |
+---------+-------+---------------+---------------+
Description:
Enabled Bit(0): If this bit is set, the processor is ready for use. If this bit is clear and the Online Capable bit is set, the system supports enabling this processor during OS runtime. If this bit is clear and the Online Capable bit is also clear, this processor is unusable, and the operating system support will not attempt to use it.
Online-capable Bit(3): The information conveyed by this bit depends on the value of the Enabled bit. If the Enabled bit is set, this bit is reserved and must be zero. Otherwise, if this bit is set, the system supports enabling this processor later during OS runtime.
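The four cells of the table can be written down as a small classifier. This is only a sketch under stated assumptions: the bit positions (bit 0 and bit 3) come from the table, the macro name ACPI_MADT_GICC_CPU_CAPABLE is the one used in the patch, and the enum and function names are hypothetical and do not exist in the kernel.

```c
#include <stdint.h>

/* Assumed bit positions, taken from Table 5.37 (bit 0 and bit 3). */
#define ACPI_MADT_ENABLED           (1u << 0)
#define ACPI_MADT_GICC_CPU_CAPABLE  (1u << 3)

/* Hypothetical names for the four cells of the table. */
enum gicc_cpu_state {
	GICC_CPU_NOT_USABLE,      /* Enabled=0, Online Capable=0 */
	GICC_CPU_READY,           /* Enabled=1, Online Capable=0 */
	GICC_CPU_ONLINE_CAPABLE,  /* Enabled=0, Online Capable=1 */
	GICC_CPU_INVALID,         /* Enabled=1, Online Capable=1 */
};

/* Map the GICC flags word onto one of the four table cells. */
static enum gicc_cpu_state gicc_classify(uint32_t flags)
{
	int enabled = !!(flags & ACPI_MADT_ENABLED);
	int capable = !!(flags & ACPI_MADT_GICC_CPU_CAPABLE);

	if (enabled && capable)
		return GICC_CPU_INVALID;
	if (enabled)
		return GICC_CPU_READY;
	if (capable)
		return GICC_CPU_ONLINE_CAPABLE;
	return GICC_CPU_NOT_USABLE;
}
```

Only the Enabled=1, Online Capable=1 cell is one the specification rules out; the other three are all legitimate firmware descriptions.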
The above check will return "CPU is usable" even for Enabled=true && online-capable=true (the INVALID case)?
Yes. Why should linux sanity check the ACPI tables?
To ensure that the two fields Linux wants to use are correct. The code above already partly does this, but an ambiguity remains.
In this case, trying to do this just results in harder to read code. What is the benefit?
This code is common to all ARM64-based platforms, but BIOSes are platform-specific and can sometimes be buggy. This little extra piece of logic can save many hours for the poor fellow working on-site who is trying to figure out why things are not working when no error is printed by UEFI/Linux at boot.
Usually, everything we do should be customer driven and should facilitate ease of doing business (here, deployment of particular software using vCPU hotplug, say microVMs?).
In my humble opinion, the above reason will always outweigh maintainability, though I agree that there is a balance to strike as well. But the latter concern does not really apply in this particular case, since the amount of code being added is trivial and the gain for the end user is large.
I very much doubt that combination can be used for anything that implies the CPU can't be used.
This is precisely the ambiguity we can clear up by adding a harmless print to assist the code reviewer and the end user?
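A minimal sketch of what such a print could look like, written as a userspace model rather than kernel code. The assumptions: the bit positions follow Table 5.37, the function names are hypothetical, the return value is deliberately left identical to the check in the patch, and fprintf() stands in for whatever the kernel would use (something like pr_warn()).

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Assumed bit positions, taken from Table 5.37 (bit 0 and bit 3). */
#define ACPI_MADT_ENABLED           (1u << 0)
#define ACPI_MADT_GICC_CPU_CAPABLE  (1u << 3)

/* The check as written in the patch: true if either bit is set. */
static bool gicc_is_usable_patch(uint32_t flags)
{
	return (flags & ACPI_MADT_ENABLED) ||
	       (flags & ACPI_MADT_GICC_CPU_CAPABLE);
}

/*
 * Hypothetical variant: same return value as the patch, plus a
 * diagnostic when firmware sets both bits, which the table marks
 * as an invalid combination.
 */
static bool gicc_is_usable_warn(uint32_t flags)
{
	bool enabled = !!(flags & ACPI_MADT_ENABLED);
	bool capable = !!(flags & ACPI_MADT_GICC_CPU_CAPABLE);

	if (enabled && capable)
		fprintf(stderr,
			"firmware bug: GICC entry is both enabled and online-capable\n");

	return enabled || capable;
}
```

Because the two functions agree on all four flag combinations, the only cost of the second one is the extra branch and the message; whether that is worth the added lines is the point under discussion.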
Thanks,
Salil