When a user calls read() or write() on a pipe file descriptor, the
kernel invokes pipe_read() or pipe_write(), respectively:
1. pipe_read():
1). Checks if the pipe is valid and if there is any data in the
pipe buffer.
2). Waits for data:
*If there is no data in the pipe and the write end is still open,
the current process sleeps (wait_event_interruptible_exclusive())
until data is written.
*If the write end is closed and the pipe is empty, it returns 0
(end of file).
3). Reads data:
*Once data is available, copies it from the pipe's buffer pages to
user space (using copy_page_to_iter()).
*If the pipe was full, a writer may be sleeping in pipe_write()
waiting for space. After data is read, pipe_read() wakes it up
(using the wake_up_interruptible_sync_poll() mechanism), so the
writer can continue writing while the reader continues reading new
data.
4). Returns the number of bytes read upon successful read.
2. pipe_write():
1). Checks if the pipe is valid and if there is any available
space in the pipe buffer.
2). Waits for buffer space:
*If the pipe buffer is full and the reading process has not
read any data, pipe_write() may put the current process to sleep
until there is space in the buffer.
*If the read end of the pipe is closed (no readers remain),
pipe_write() sends a SIGPIPE signal to the process and returns the
error code -EPIPE.
3). Writes data:
*If there is enough space in the pipe buffer, pipe_write() copies
data from the user-space buffer into the pipe's kernel buffer
(using copy_page_from_iter()).
*If the requested write is larger than the available space, the
data is copied in several chunks, and the process may sleep until
the reader frees more space.
4). Wakes up waiting reading processes:
*After the data is successfully written, pipe_write() wakes up
any processes that may be waiting to read data (using the
wake_up_interruptible_sync_poll() mechanism).
5). Returns the number of bytes successfully written.
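Both branches of step 2) can also be observed from user space. In this minimal sketch (the helper names are hypothetical), setting O_NONBLOCK on the write end makes write() fail with EAGAIN once the buffer is full, at the point where a blocking writer would have slept in pipe_write(); and writing after the read end has been closed fails with EPIPE. SIGPIPE is ignored temporarily so the process survives long enough to see the error. The pipe capacity depends on the kernel configuration, so no particular value is assumed.

```c
#include <errno.h>
#include <fcntl.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: fills a pipe whose write end is non-blocking and
 * returns how many bytes fit before write() failed with EAGAIN, or -1
 * on any other error.
 */
static long fill_pipe_nonblocking(void)
{
	int fds[2];
	char chunk[4096];
	long total = 0;

	if (pipe(fds) == -1)
		return -1;
	int flags = fcntl(fds[1], F_GETFL);
	if (flags == -1 || fcntl(fds[1], F_SETFL, flags | O_NONBLOCK) == -1) {
		close(fds[0]);
		close(fds[1]);
		return -1;
	}
	memset(chunk, 'x', sizeof chunk);
	for (;;) {
		ssize_t n = write(fds[1], chunk, sizeof chunk);

		if (n > 0) {
			total += n;
		} else if (n == -1 && (errno == EAGAIN || errno == EWOULDBLOCK)) {
			break;		/* buffer full: a blocking writer would sleep */
		} else {
			total = -1;	/* unexpected error */
			break;
		}
	}
	close(fds[0]);
	close(fds[1]);
	return total;
}

/*
 * Hypothetical helper: writes to a pipe whose read end is already closed
 * and returns the resulting errno (expected: EPIPE), or 0 if the write
 * unexpectedly succeeded.
 */
static int write_to_readerless_pipe(void)
{
	int fds[2];
	int err = 0;

	if (pipe(fds) == -1)
		return errno;
	/* The default SIGPIPE action would terminate the process. */
	void (*old)(int) = signal(SIGPIPE, SIG_IGN);
	if (old == SIG_ERR) {
		err = errno;
		close(fds[0]);
		close(fds[1]);
		return err;
	}
	close(fds[0]);			/* no readers remain */
	if (write(fds[1], "x", 1) == -1)
		err = errno;
	close(fds[1]);
	signal(SIGPIPE, old);		/* restore the previous disposition */
	return err;
}
```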
This patch introduces wq_has_sleeper() to check whether any process
is waiting on the relevant wait queue before waking processes up
during pipe read/write operations.
If no process is waiting, there is no need to call
wake_up_interruptible_sync_poll(), which would otherwise still take
the wait queue's spinlock, so unnecessary wake-ups are avoided.
Unnecessary wake-ups can lead to context switches, where a process
is woken up to handle I/O events even when there is no immediate
need.
Checking with wq_has_sleeper() first means processes are woken only
when some are actually waiting, which reduces context switches and
system overhead.
Additionally, by reducing unnecessary synchronization and wake-up
operations, wq_has_sleeper() can decrease system resource waste and
lock contention, improving overall system performance.
For pipe read/write operations, this eliminates ineffective scheduling
and enhances concurrency.
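The check-before-wake idea can be modelled in user space. The sketch below is an analogy built on POSIX threads and C11 atomics, not the kernel implementation: struct waitq and its functions are hypothetical, the `sleepers` counter stands in for a non-empty wait queue, and sequentially consistent atomics stand in for the full barrier that wq_has_sleeper() relies on. The waker publishes the condition first and only then checks for sleepers, so a sleeper that registers concurrently is either seen by the waker or sees the condition itself, and no wake-up is lost.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical model of a wait queue (not the kernel's wait_queue_head). */
struct waitq {
	pthread_mutex_t lock;
	pthread_cond_t cond;
	atomic_int sleepers;	/* > 0 plays the role of "queue not empty" */
	atomic_bool ready;	/* the condition sleepers wait for */
};

#define WAITQ_INIT { PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER, 0, false }

/* Sleeper side: register first, then (re)check the condition. */
static void waitq_wait(struct waitq *wq)
{
	pthread_mutex_lock(&wq->lock);
	atomic_fetch_add(&wq->sleepers, 1);	/* seq_cst read-modify-write */
	while (!atomic_load(&wq->ready))
		pthread_cond_wait(&wq->cond, &wq->lock);
	int prev = atomic_fetch_sub(&wq->sleepers, 1);
	assert(prev > 0);
	pthread_mutex_unlock(&wq->lock);
}

/*
 * Waker side: publish the condition, then look for sleepers, the analogue
 * of guarding wake_up_*() with wq_has_sleeper(). Returns true if a
 * wake-up was actually issued.
 */
static bool waitq_wake_if_sleepers(struct waitq *wq)
{
	atomic_store(&wq->ready, true);		/* seq_cst store */
	/* The seq_cst store/load pair orders this check against the sleeper. */
	if (atomic_load(&wq->sleepers) == 0)
		return false;			/* nobody waiting: skip lock + wake */
	pthread_mutex_lock(&wq->lock);
	pthread_cond_broadcast(&wq->cond);
	pthread_mutex_unlock(&wq->lock);
	return true;
}
```

When waitq_wake_if_sleepers() finds no sleepers, it skips both the mutex and pthread_cond_broadcast(), which is the user-space counterpart of skipping wake_up_interruptible_sync_poll() and its spinlock.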
Note that enabling this option means calling wq_has_sleeper() to
check for sleeping processes in the wait queue whenever a pipe read
or write is about to issue a wake-up.
Although this is a lightweight operation, it still incurs some
overhead, because wq_has_sleeper() issues a full memory barrier
(smp_mb()) before checking the queue.
In low-load or single-task scenarios, this overhead may not yield
significant benefits and could even cause a minor performance
degradation.
UnixBench Pipe benchmark results on a Zhaoxin KX-U6780A processor
(higher is better):
With the option disabled: Single-core: 841.8, Multi-core (8): 4621.6
With the option enabled: Single-core: 877.8, Multi-core (8): 4854.7
Relative to the disabled configuration, single-core performance
improved by about 4.3% and multi-core performance by about 5.0%.
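The percentages follow from the usual relative-change formula, (enabled - baseline) / baseline. A small helper (hypothetical, for illustration only) makes the arithmetic explicit:

```c
#include <assert.h>

/* Relative improvement of `enabled` over `baseline`, in percent. */
static double improvement_pct(double baseline, double enabled)
{
	assert(baseline > 0.0);
	return (enabled - baseline) / baseline * 100.0;
}
```

With the scores above, improvement_pct(841.8, 877.8) is about 4.28 and improvement_pct(4621.6, 4854.7) is about 5.04. Dividing by the enabled score instead of the baseline would give about 4.10 and 4.80.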
Co-developed-by: Shengjin Yu <yushengjin(a)uniontech.com>
Signed-off-by: Shengjin Yu <yushengjin(a)uniontech.com>
Co-developed-by: Dandan Zhang <zhangdandan(a)uniontech.com>
Signed-off-by: Dandan Zhang <zhangdandan(a)uniontech.com>
Tested-by: Dandan Zhang <zhangdandan(a)uniontech.com>
Signed-off-by: WangYuli <wangyuli(a)uniontech.com>
---
fs/Kconfig | 13 +++++++++++++
fs/pipe.c | 21 +++++++++++++++------
2 files changed, 28 insertions(+), 6 deletions(-)
diff --git a/fs/Kconfig b/fs/Kconfig
index 64d420e3c475..0dacc46a73fe 100644
--- a/fs/Kconfig
+++ b/fs/Kconfig
@@ -429,4 +429,17 @@ source "fs/unicode/Kconfig"
config IO_WQ
bool
+config PIPE_SKIP_SLEEPER
+ bool "Skip sleeping processes during pipe read/write"
+ default n
+ help
+ This option checks whether any process is sleeping on the pipe's
+ wait queue before issuing a wake-up during pipe read/write.
+
+ It often leads to a performance improvement. However, in
+ low-load or single-task scenarios, it may introduce minor
+ performance overhead.
+
+ If unsure, say N.
+
endmenu
diff --git a/fs/pipe.c b/fs/pipe.c
index 12b22c2723b7..c085333ae72c 100644
--- a/fs/pipe.c
+++ b/fs/pipe.c
@@ -247,6 +247,15 @@ static inline unsigned int pipe_update_tail(struct pipe_inode_info *pipe,
return tail;
}
+static inline bool
+pipe_check_wq_has_sleeper(struct wait_queue_head *wq_head)
+{
+ if (IS_ENABLED(CONFIG_PIPE_SKIP_SLEEPER))
+ return wq_has_sleeper(wq_head);
+ else
+ return true;
+}
+
static ssize_t
pipe_read(struct kiocb *iocb, struct iov_iter *to)
{
@@ -377,7 +386,7 @@ pipe_read(struct kiocb *iocb, struct iov_iter *to)
* _very_ unlikely case that the pipe was full, but we got
* no data.
*/
- if (unlikely(was_full))
+ if (unlikely(was_full) && pipe_check_wq_has_sleeper(&pipe->wr_wait))
wake_up_interruptible_sync_poll(&pipe->wr_wait, EPOLLOUT | EPOLLWRNORM);
kill_fasync(&pipe->fasync_writers, SIGIO, POLL_OUT);
@@ -398,9 +407,9 @@ pipe_read(struct kiocb *iocb, struct iov_iter *to)
wake_next_reader = false;
mutex_unlock(&pipe->mutex);
- if (was_full)
+ if (was_full && pipe_check_wq_has_sleeper(&pipe->wr_wait))
wake_up_interruptible_sync_poll(&pipe->wr_wait, EPOLLOUT | EPOLLWRNORM);
- if (wake_next_reader)
+ if (wake_next_reader && pipe_check_wq_has_sleeper(&pipe->rd_wait))
wake_up_interruptible_sync_poll(&pipe->rd_wait, EPOLLIN | EPOLLRDNORM);
kill_fasync(&pipe->fasync_writers, SIGIO, POLL_OUT);
if (ret > 0)
@@ -573,7 +582,7 @@ pipe_write(struct kiocb *iocb, struct iov_iter *from)
* become empty while we dropped the lock.
*/
mutex_unlock(&pipe->mutex);
- if (was_empty)
+ if (was_empty && pipe_check_wq_has_sleeper(&pipe->rd_wait))
wake_up_interruptible_sync_poll(&pipe->rd_wait, EPOLLIN | EPOLLRDNORM);
kill_fasync(&pipe->fasync_readers, SIGIO, POLL_IN);
wait_event_interruptible_exclusive(pipe->wr_wait, pipe_writable(pipe));
@@ -598,10 +607,10 @@ pipe_write(struct kiocb *iocb, struct iov_iter *from)
* Epoll nonsensically wants a wakeup whether the pipe
* was already empty or not.
*/
- if (was_empty || pipe->poll_usage)
+ if ((was_empty || pipe->poll_usage) && pipe_check_wq_has_sleeper(&pipe->rd_wait))
wake_up_interruptible_sync_poll(&pipe->rd_wait, EPOLLIN | EPOLLRDNORM);
kill_fasync(&pipe->fasync_readers, SIGIO, POLL_IN);
- if (wake_next_writer)
+ if (wake_next_writer && pipe_check_wq_has_sleeper(&pipe->wr_wait))
wake_up_interruptible_sync_poll(&pipe->wr_wait, EPOLLOUT | EPOLLWRNORM);
if (ret > 0 && sb_start_write_trylock(file_inode(filp)->i_sb)) {
int err = file_update_time(filp);
--
2.45.2
Hi Kevin,
Sorry, I know I said over a month ago that I would find a spare moment
to do this work, but the University of Manchester's elves are very busy
at this time of year! Here are the changes you requested...
Please let me know if there are issues building. I still don't fully
understand why the last series had a problem.
Changes from v3:
- [XX/05] Modified the use of __nf_kptr_t in the xtables plugin structs
to use a union, with the original struct as a member. This trick
removes the heavy casting in the kernel that the earlier version
required.
- [XX/05] Squashed many of the commits (those from the xtables plugin
header files) into a single commit, since each individual commit now
makes far fewer changes.
Testing:
- Tested with purecap iptables tests (nftables only): 65/68 tests
pass. The failures are expected at this point, primarily due to
incorrectly written test cases or missing versions of userspace
tooling.
Joshua Lant (5):
netfilter: Create new type for kernel pointers.
x_tables.h: pointers to unions in uapi struct
xt plugins: pointers to unions in uapi struct
ebtables: pointers to unions in uapi struct
xtables: move include to headers
include/linux/netfilter.h | 6 +++++
include/uapi/linux/netfilter.h | 8 ++++++
include/uapi/linux/netfilter/x_tables.h | 18 +++++++++++--
include/uapi/linux/netfilter/xt_CT.h | 10 +++++--
include/uapi/linux/netfilter/xt_IDLETIMER.h | 12 +++++++--
include/uapi/linux/netfilter/xt_RATEEST.h | 6 ++++-
include/uapi/linux/netfilter/xt_TEE.h | 6 ++++-
include/uapi/linux/netfilter/xt_bpf.h | 13 +++++++--
include/uapi/linux/netfilter/xt_connlimit.h | 6 ++++-
include/uapi/linux/netfilter/xt_hashlimit.h | 24 ++++++++++++++---
include/uapi/linux/netfilter/xt_limit.h | 6 ++++-
include/uapi/linux/netfilter/xt_nfacct.h | 12 +++++++--
include/uapi/linux/netfilter/xt_quota.h | 6 ++++-
include/uapi/linux/netfilter/xt_rateest.h | 9 +++++--
include/uapi/linux/netfilter/xt_statistic.h | 7 ++++-
.../uapi/linux/netfilter_bridge/ebtables.h | 27 +++++++++++++++----
net/netfilter/xt_bpf.c | 1 -
net/netfilter/xt_statistic.c | 1 -
18 files changed, 149 insertions(+), 29 deletions(-)
--
2.34.1