() b On Thu, Aug 25, 2022 at 3:44 PM Harald Mommer harald.mommer@opensynergy.com wrote:
CAN Control
- "ip link set up can0" starts the virtual CAN controller,
- "ip link set up can0" stops the virtual CAN controller
CAN RX
Receive CAN frames. CAN frames can be standard or extended, classic or CAN FD. Classic CAN RTR frames are supported.
CAN TX
Send CAN frames. CAN frames can be standard or extended, classic or CAN FD. Classic CAN RTR frames are supported.
CAN Event indication (BusOff)
The bus off handling is considered code complete but until now bus off handling is largely untested.
Signed-off-by: Harald Mommer hmo@opensynergy.com
This looks nice overall, but as you say there is still some work needed in all the details. I've done a rough first pass at reviewing it, but I have no specific understanding of CAN, so these are mostly generic comments about coding style or network drivers.
drivers/net/can/Kconfig | 1 + drivers/net/can/Makefile | 1 + drivers/net/can/virtio_can/Kconfig | 12 + drivers/net/can/virtio_can/Makefile | 5 + drivers/net/can/virtio_can/virtio_can.c | 1176 +++++++++++++++++++++++ include/uapi/linux/virtio_can.h | 69 ++
Since the driver is just one file, you probably don't need the subdirectory.
+struct virtio_can_tx {
struct list_head list;
int prio; /* Currently always 0 "normal priority" */
int putidx;
struct virtio_can_tx_out tx_out;
struct virtio_can_tx_in tx_in;
+};
Having a linked list of these appears to add a little extra complexity. If they are always processed in sequence, using an array would be much simpler, as you just need to remember the index.
+#ifdef DEBUG +static void __attribute__((unused)) +virtio_can_hexdump(const void *data, size_t length, size_t base) +{ +#define VIRTIO_CAN_MAX_BYTES_PER_LINE 16u
This seems to duplicate print_hex_dump(), maybe just use that?
while (!virtqueue_get_buf(vq, &len) && !virtqueue_is_broken(vq))
cpu_relax();
mutex_unlock(&priv->ctrl_lock);
A busy loop is probably not what you want here. Maybe just wait_for_completion() until the callback happens?
/* Push loopback echo. Will be looped back on TX interrupt/TX NAPI */
can_put_echo_skb(skb, dev, can_tx_msg->putidx, 0);
err = virtqueue_add_sgs(vq, sgs, 1u, 1u, can_tx_msg, GFP_ATOMIC);
if (err != 0) {
list_del(&can_tx_msg->list);
virtio_can_free_tx_idx(priv, can_tx_msg->prio,
can_tx_msg->putidx);
netif_stop_queue(dev);
spin_unlock_irqrestore(&priv->tx_lock, flags);
kfree(can_tx_msg);
if (err == -ENOSPC)
netdev_info(dev, "TX: Stop queue, no space left\n");
else
netdev_warn(dev, "TX: Stop queue, reason = %d\n", err);
return NETDEV_TX_BUSY;
}
if (!virtqueue_kick(vq))
netdev_err(dev, "%s(): Kick failed\n", __func__);
spin_unlock_irqrestore(&priv->tx_lock, flags);
There should not be a need for a spinlock or disabling interrupts in the xmit function. What exactly are you protecting against here?
As a further optimization, you may want to use the xmit_more() function, as the virtqueue kick is fairly expensive and can be batched here.
kfree(can_tx_msg);
/* Flow control */
if (netif_queue_stopped(dev)) {
netdev_info(dev, "TX ACK: Wake up stopped queue\n");
netif_wake_queue(dev);
}
You may want to add netdev_sent_queue()/netdev_completed_queue() based BQL flow control here as well, so you don't have to rely on the queue filling up completely.
+static int virtio_can_probe(struct virtio_device *vdev) +{
struct net_device *dev;
struct virtio_can_priv *priv;
int err;
unsigned int echo_skb_max;
unsigned int idx;
u16 lo_tx = VIRTIO_CAN_ECHO_SKB_MAX;
BUG_ON(!vdev);
Not a useful debug check, just remove the BUG_ON(!vdev), here and elsewhere
echo_skb_max = lo_tx;
dev = alloc_candev(sizeof(struct virtio_can_priv), echo_skb_max);
if (!dev)
return -ENOMEM;
priv = netdev_priv(dev);
dev_info(&vdev->dev, "echo_skb_max = %u\n", priv->can.echo_skb_max);
Also remove the prints, I assume this is left over from initial debugging.
priv->can.do_set_mode = virtio_can_set_mode;
priv->can.state = CAN_STATE_STOPPED;
/* Set Virtio CAN supported operations */
priv->can.ctrlmode_supported = CAN_CTRLMODE_BERR_REPORTING;
if (virtio_has_feature(vdev, VIRTIO_CAN_F_CAN_FD)) {
dev_info(&vdev->dev, "CAN FD is supported\n");
} else {
dev_info(&vdev->dev, "CAN FD not supported\n");
}
Same here. There should be a way to see CAN FD support as an interactive user, but there is no point printing it to the console.
register_virtio_can_dev(dev);
/* Initialize virtqueues */
err = virtio_can_find_vqs(priv);
if (err != 0)
goto on_failure;
Should the register_virtio_can_dev() be done here? I would expect this to be the last thing after setting up the queues.
+static struct virtio_driver virtio_can_driver = {
.feature_table = features,
.feature_table_size = ARRAY_SIZE(features),
.feature_table_legacy = NULL,
.feature_table_size_legacy = 0u,
.driver.name = KBUILD_MODNAME,
.driver.owner = THIS_MODULE,
.id_table = virtio_can_id_table,
.validate = virtio_can_validate,
.probe = virtio_can_probe,
.remove = virtio_can_remove,
.config_changed = NULL,
+#ifdef CONFIG_PM_SLEEP
.freeze = virtio_can_freeze,
.restore = virtio_can_restore,
+#endif
You can remove the #ifdef here and above, and replace that with the pm_sleep_ptr() macro in the assignment.
diff --git a/include/uapi/linux/virtio_can.h b/include/uapi/linux/virtio_can.h new file mode 100644 index 000000000000..0ca75c7a98ee --- /dev/null +++ b/include/uapi/linux/virtio_can.h @@ -0,0 +1,69 @@ +/* SPDX-License-Identifier: BSD-3-Clause */ +/*
- Copyright (C) 2021 OpenSynergy GmbH
- */
+#ifndef _LINUX_VIRTIO_VIRTIO_CAN_H +#define _LINUX_VIRTIO_VIRTIO_CAN_H
+#include <linux/types.h> +#include <linux/virtio_types.h> +#include <linux/virtio_ids.h> +#include <linux/virtio_config.h>
Maybe a link to the specification here? I assume the definitions in this file are all lifted from that document, rather than specific to the driver, right?
Arnd
Hello,
currently in the preparation that changed code can go out to the list.
On 25.08.22 20:21, Arnd Bergmann wrote:
drivers/net/can/Kconfig | 1 + drivers/net/can/Makefile | 1 + drivers/net/can/virtio_can/Kconfig | 12 + drivers/net/can/virtio_can/Makefile | 5 + drivers/net/can/virtio_can/virtio_can.c | 1176 +++++++++++++++++++++++ include/uapi/linux/virtio_can.h | 69 ++
Since the driver is just one file, you probably don't need the subdirectory.
Easy to do, makes the changes smaller.
+struct virtio_can_tx {
struct list_head list;
int prio; /* Currently always 0 "normal priority" */
int putidx;
struct virtio_can_tx_out tx_out;
struct virtio_can_tx_in tx_in;
+};
Having a linked list of these appears to add a little extra complexity. If they are always processed in sequence, using an array would be much simpler, as you just need to remember the index.
The messages are not necessarily processed in sequence by the CAN stack. CAN is priority based. The lower the CAN ID the higher the priority. So a message with CAN ID 0x100 can surpass a message with ID 0x123 if the hardware is not just simple basic CAN controller using a single TX mailbox with a FIFO queue on top of it.
Thinking about this the code becomes more complex with the array. What I get from the device when the message has been processed is a pointer to the processed message by virtqueue_get_buf(). I can then simply do a list_del(), free the message and done.
+#ifdef DEBUG +static void __attribute__((unused)) +virtio_can_hexdump(const void *data, size_t length, size_t base) +{ +#define VIRTIO_CAN_MAX_BYTES_PER_LINE 16u
This seems to duplicate print_hex_dump(), maybe just use that?
Checked where it's still used. The code is not disabled by #ifdef DEBUG but simply commented out. Under this circumstances it's for now best to simply remove the code now and also the commented out places where is was used at some time in the past.
while (!virtqueue_get_buf(vq, &len) && !virtqueue_is_broken(vq))
cpu_relax();
mutex_unlock(&priv->ctrl_lock);
A busy loop is probably not what you want here. Maybe just wait_for_completion() until the callback happens?
Was done in the same way as elsewhere (virtio_console.c/__send_control_msg() & virtio_net.c/virtnet_send_command()). Yes, wait_for_completion() is better, this avoids polling.
/* Push loopback echo. Will be looped back on TX interrupt/TX NAPI */
can_put_echo_skb(skb, dev, can_tx_msg->putidx, 0);
err = virtqueue_add_sgs(vq, sgs, 1u, 1u, can_tx_msg, GFP_ATOMIC);
if (err != 0) {
list_del(&can_tx_msg->list);
virtio_can_free_tx_idx(priv, can_tx_msg->prio,
can_tx_msg->putidx);
netif_stop_queue(dev);
spin_unlock_irqrestore(&priv->tx_lock, flags);
kfree(can_tx_msg);
if (err == -ENOSPC)
netdev_info(dev, "TX: Stop queue, no space left\n");
else
netdev_warn(dev, "TX: Stop queue, reason = %d\n", err);
return NETDEV_TX_BUSY;
}
if (!virtqueue_kick(vq))
netdev_err(dev, "%s(): Kick failed\n", __func__);
spin_unlock_irqrestore(&priv->tx_lock, flags);
There should not be a need for a spinlock or disabling interrupts in the xmit function. What exactly are you protecting against here?
I'm using 2 NAPIs, one for TX and one for RX. The RX NAPI just receives RX messages and is of no interest here. The TX NAPI handles the TX messages which have been processed by the virtio CAN device in virtio_can_read_tx_queue(). If this was done without the TX NAPI this would have been done by the TX interrupt directly, no difference.
In virtio_can_start_xmit()
* Reserve putidx - done by an own mechanism using list operations in tx_putidx_list
Could be that it's simpler to use idr_alloc() and friends getting those numbers to get rid of this own mechanism, not sure yet. But this needs a locks as it's based on a linked list and the list operation has to be protected.
* Add the TX message to the pending list
Again a list operation which has to be protected.
* Try to send the message
Now it may happen that at the same time while we do something with the lists in virtio_can_start_xmit() the function virtio_can_read_tx_queue() is active accessing the same queue. Comment above virtqueue_add_sgs(): "Caller must ensure that we don't call this with other virtqueue operations at the same time (except when noted)."
Also tried, virtqueue_add_sgs() needs this lock.
* And then there is also a list operation on failure of the function
But the code needed to reworked to understand the necessity of each lock again.
As a further optimization, you may want to use the xmit_more() function, as the virtqueue kick is fairly expensive and can be batched here.
Looked elsewhere how it works and did.
kfree(can_tx_msg);
/* Flow control */
if (netif_queue_stopped(dev)) {
netdev_info(dev, "TX ACK: Wake up stopped queue\n");
netif_wake_queue(dev);
}
You may want to add netdev_sent_queue()/netdev_completed_queue() based BQL flow control here as well, so you don't have to rely on the queue filling up completely.
Not addressed, not yet completely understood.
+static int virtio_can_probe(struct virtio_device *vdev) +{
struct net_device *dev;
struct virtio_can_priv *priv;
int err;
unsigned int echo_skb_max;
unsigned int idx;
u16 lo_tx = VIRTIO_CAN_ECHO_SKB_MAX;
BUG_ON(!vdev);
Not a useful debug check, just remove the BUG_ON(!vdev), here and elsewhere
A lot of BUG_ON() were removed when not considered useful, some were reworked to contain better error handling code when this was possible, others were kept to ease further development. If anyone catches something would be seriously broken and had to be fixed in the code. But this then we want to know.
echo_skb_max = lo_tx;
dev = alloc_candev(sizeof(struct virtio_can_priv), echo_skb_max);
if (!dev)
return -ENOMEM;
priv = netdev_priv(dev);
dev_info(&vdev->dev, "echo_skb_max = %u\n", priv->can.echo_skb_max);
Also remove the prints, I assume this is left over from initial debugging.
Yes, this thing was overall too noisy.
register_virtio_can_dev(dev);
/* Initialize virtqueues */
err = virtio_can_find_vqs(priv);
if (err != 0)
goto on_failure;
Should the register_virtio_can_dev() be done here? I would expect this to be the last thing after setting up the queues.
Doing so makes the code somewhat simpler and shorter = better.
+static struct virtio_driver virtio_can_driver = {
.feature_table = features,
.feature_table_size = ARRAY_SIZE(features),
.feature_table_legacy = NULL,
.feature_table_size_legacy = 0u,
.driver.name = KBUILD_MODNAME,
.driver.owner = THIS_MODULE,
.id_table = virtio_can_id_table,
.validate = virtio_can_validate,
.probe = virtio_can_probe,
.remove = virtio_can_remove,
.config_changed = NULL,
+#ifdef CONFIG_PM_SLEEP
.freeze = virtio_can_freeze,
.restore = virtio_can_restore,
+#endif
You can remove the #ifdef here and above, and replace that with the pm_sleep_ptr() macro in the assignment.
This pm_sleep_ptr(_ptr) macro returns either the argument when CONFIG_PM_SLEEP is defined or NULL. But in struct virtio_driver there is
#ifdef CONFIG_PM int(*freeze) ...; int(*restore) ...; #endif
so without CONFIG_PM there are no freeze and restore structure members.
So
.freeze = pm_sleep_ptr(virtio_can_freeze)
won't work.
diff --git a/include/uapi/linux/virtio_can.h b/include/uapi/linux/virtio_can.h new file mode 100644 index 000000000000..0ca75c7a98ee --- /dev/null +++ b/include/uapi/linux/virtio_can.h @@ -0,0 +1,69 @@ +/* SPDX-License-Identifier: BSD-3-Clause */ +/*
- Copyright (C) 2021 OpenSynergy GmbH
- */
+#ifndef _LINUX_VIRTIO_VIRTIO_CAN_H +#define _LINUX_VIRTIO_VIRTIO_CAN_H
+#include <linux/types.h> +#include <linux/virtio_types.h> +#include <linux/virtio_ids.h> +#include <linux/virtio_config.h>
Maybe a link to the specification here? I assume the definitions in this file are all lifted from that document, rather than specific to the driver, right?
Arnd
The driver as made in parallel to the specification work. So there is no finished specification yet. To avoid traffic (mistake here) I've not sent the patch to the specification to all mailing lists.
Patch to the virtio specification is now here:
https://lore.kernel.org/all/20220825133410.18367-1-harald.mommer@opensynergy...
This was made on top of https://github.com/oasis-tcs/virtio-spec.git commit 26ed30ccb049
Harald
As mentioned, the updated code will be sent out to the list(s) soon.
On Thu, Nov 3, 2022, at 13:26, Harald Mommer wrote:
On 25.08.22 20:21, Arnd Bergmann wrote:
...
The messages are not necessarily processed in sequence by the CAN stack. CAN is priority based. The lower the CAN ID the higher the priority. So a message with CAN ID 0x100 can surpass a message with ID 0x123 if the hardware is not just simple basic CAN controller using a single TX mailbox with a FIFO queue on top of it.
Thinking about this the code becomes more complex with the array. What I get from the device when the message has been processed is a pointer to the processed message by virtqueue_get_buf(). I can then simply do a list_del(), free the message and done.
Ok
+#ifdef DEBUG +static void __attribute__((unused)) +virtio_can_hexdump(const void *data, size_t length, size_t base) +{ +#define VIRTIO_CAN_MAX_BYTES_PER_LINE 16u
This seems to duplicate print_hex_dump(), maybe just use that?
Checked where it's still used. The code is not disabled by #ifdef DEBUG but simply commented out. Under this circumstances it's for now best to simply remove the code now and also the commented out places where is was used at some time in the past.
Even better.
while (!virtqueue_get_buf(vq, &len) && !virtqueue_is_broken(vq))
cpu_relax();
mutex_unlock(&priv->ctrl_lock);
A busy loop is probably not what you want here. Maybe just wait_for_completion() until the callback happens?
Was done in the same way as elsewhere (virtio_console.c/__send_control_msg() & virtio_net.c/virtnet_send_command()). Yes, wait_for_completion() is better, this avoids polling.
Ok. FWIW, The others seem to do it like this because they are in non-atomic context where it is not allowed to call wait_for_completion(), but since you already have the mutex here, you know that sleeping is permitted.
/* Push loopback echo. Will be looped back on TX interrupt/TX NAPI */
can_put_echo_skb(skb, dev, can_tx_msg->putidx, 0);
err = virtqueue_add_sgs(vq, sgs, 1u, 1u, can_tx_msg, GFP_ATOMIC);
if (err != 0) {
list_del(&can_tx_msg->list);
virtio_can_free_tx_idx(priv, can_tx_msg->prio,
can_tx_msg->putidx);
netif_stop_queue(dev);
spin_unlock_irqrestore(&priv->tx_lock, flags);
kfree(can_tx_msg);
if (err == -ENOSPC)
netdev_info(dev, "TX: Stop queue, no space left\n");
else
netdev_warn(dev, "TX: Stop queue, reason = %d\n", err);
return NETDEV_TX_BUSY;
}
if (!virtqueue_kick(vq))
netdev_err(dev, "%s(): Kick failed\n", __func__);
spin_unlock_irqrestore(&priv->tx_lock, flags);
There should not be a need for a spinlock or disabling interrupts in the xmit function. What exactly are you protecting against here?
I'm using 2 NAPIs, one for TX and one for RX. The RX NAPI just receives RX messages and is of no interest here. The TX NAPI handles the TX messages which have been processed by the virtio CAN device in virtio_can_read_tx_queue(). If this was done without the TX NAPI this would have been done by the TX interrupt directly, no difference.
In virtio_can_start_xmit()
- Reserve putidx - done by an own mechanism using list operations in
tx_putidx_list
Could be that it's simpler to use idr_alloc() and friends getting those numbers to get rid of this own mechanism, not sure yet. But this needs a locks as it's based on a linked list and the list operation has to be protected.
Right, makes sense. Lockless transmission should generally work if your transmission queue is a strictly ordered ring buffer where you just need to atomically update the index, but you are right that this doesn't work with a linked list.
This probably directly ties into the specification of your tx virtqueue: if the device could guarantee that any descriptors are processed in sequence, you could avoid the spinlock in the tx path for a small performance optimization, but then you have to decide on the sequence in the driver already, which impacts the latency for high-priority frames that get queued to the device. It's possible that the reordering in the device would not be as critical if you correctly implement the byte queue limits.
kfree(can_tx_msg);
/* Flow control */
if (netif_queue_stopped(dev)) {
netdev_info(dev, "TX ACK: Wake up stopped queue\n");
netif_wake_queue(dev);
}
You may want to add netdev_sent_queue()/netdev_completed_queue() based BQL flow control here as well, so you don't have to rely on the queue filling up completely.
Not addressed, not yet completely understood.
https://lwn.net/Articles/454390/ is an older article but should still explain the background. Without byte queue limits, you risk introducing unbounded TX latency on a congested interface.
Ideally, the host device implementation should only send back the 'completed' interrupt after a frame has left the physical hardware. In this case, BQL will manage both the TX queue in the guest driver and the queue in the host device to keep the total queue length short enough to guarantee low latency even for low-priority frames but long enough to maintain wire-speed throughput.
register_virtio_can_dev(dev);
/* Initialize virtqueues */
err = virtio_can_find_vqs(priv);
if (err != 0)
goto on_failure;
Should the register_virtio_can_dev() be done here? I would expect this to be the last thing after setting up the queues.
Doing so makes the code somewhat simpler and shorter = better.
The problem is that as soon as an interface is registered, you can have userspace sending data to it. This may be a short race, but I fear that this would cause data corruption if data gets queued before the device is fully operational.
+#ifdef CONFIG_PM_SLEEP
.freeze = virtio_can_freeze,
.restore = virtio_can_restore,
+#endif
You can remove the #ifdef here and above, and replace that with the pm_sleep_ptr() macro in the assignment.
This pm_sleep_ptr(_ptr) macro returns either the argument when CONFIG_PM_SLEEP is defined or NULL. But in struct virtio_driver there is
#ifdef CONFIG_PM int(*freeze) ...; int(*restore) ...; #endif
so without CONFIG_PM there are no freeze and restore structure members.
So
.freeze = pm_sleep_ptr(virtio_can_freeze)
won't work.
I think this is a mistake in the virtio_driver definition, it would be best to send a small patch that removes this #ifdef along with your driver.
Arnd
On Fry. 4 nov. 2022 at 20:13, Arnd Bergmann arnd@kernel.org wrote:
On Thu, Nov 3, 2022, at 13:26, Harald Mommer wrote:
On 25.08.22 20:21, Arnd Bergmann wrote:
...
The messages are not necessarily processed in sequence by the CAN stack. CAN is priority based. The lower the CAN ID the higher the priority. So a message with CAN ID 0x100 can surpass a message with ID 0x123 if the hardware is not just simple basic CAN controller using a single TX mailbox with a FIFO queue on top of it.
Really? I acknowledge that it is priority based *on the bus*, i.e. if two devices A and B on the same bus try to send CAN ID 0x100 and 0x123 at the same time, then device A will win the CAN arbitration. However, I am not aware of any devices which reorder their own stack according to the CAN IDs. If I first send CAN ID 0x123 and then ID 0x100 on the device stack, 0x123 would still go out first, right?
Thinking about this the code becomes more complex with the array. What I get from the device when the message has been processed is a pointer to the processed message by virtqueue_get_buf(). I can then simply do a list_del(), free the message and done.
Ok
stratos-dev@op-lists.linaro.org