New subject: [RESEND PATCH] fs/pipe: Introduce a check to skip sleeping processes during pipe read/write

28 Dec 2024

      Hi,
(I had to remove many cc, my mail server rejected to send)
On 12/26/24 9:11 PM, Oleg Nesterov wrote:
...
I mostly agree, see my reply to this patch, but...
On 12/26, Linus Torvalds wrote:
...
If the optimization is correct, there is no point to having a config option.
If the optimization is incorrect, there is no point to having the code.
Either way, there's no way we'd ever have a config option for this.
Agreed,
...
...

  if (was_full && pipe_check_wq_has_sleeper(&pipe->wr_wait))
           wake_up_interruptible_sync_poll(&pipe->wr_wait, EPOLLOUT | EPOLLWRNORM);

End result: you need to explain why the race cannot exist.
Agreed,
...
And I think your patch is simply buggy
Agreed again, but probably my reasoning was wrong,
...
But now waiters use "wait_event_interruptible_exclusive()" explicitly
outside the pipe mutex, so the waiters and wakers aren't actually
serialized.
This is what I don't understand... Could you spell ?
I _think_ that
wait_event_whatever(WQ, CONDITION);
vs
CONDITION = 1;
   if (wq_has_sleeper(WQ))
   	wake_up_xxx(WQ, ...);

is fine.
This pattern is documented in wait.h:
https://elixir.bootlin.com/linux/v6.12.6/source/include/linux/wait.h#L96
Thus if there an issue, then the documentation should be updated.
...
Both wq_has_sleeper() and wait_event_whatever()->prepare_to_wait_event()
have the necessary barriers to serialize the waiters and wakers.
Damn I am sure I missed something ;)
Actually:
Does this work universally? I.e. can we add the optimization into 
__wake_up()?
But I do not understand this comment (from 2.6.0)
https://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.git/commit/kern...
...
/* * Note: we use "set_current_state()" _after_ the wait-queue add, * 
because we need a memory barrier there on SMP, so that any * 
wake-function that tests for the wait-queue being active * will be 
guaranteed to see waitqueue addition _or_ subsequent * tests in this 
thread will see the wakeup having taken place. * * The spin_unlock() 
itself is semi-permeable and only protects * one way (it only protects 
stuff inside the critical region and * stops them from bleeding out - 
it would still allow subsequent * loads to move into the the critical 
region). */ voidprepare_to_wait 
https://elixir.bootlin.com/linux/v2.6.0/C/ident/prepare_to_wait(wait_queue_head_t 
https://elixir.bootlin.com/linux/v2.6.0/C/ident/wait_queue_head_t*q,wait_queue_t 
https://elixir.bootlin.com/linux/v2.6.0/C/ident/wait_queue_t*wait 
https://elixir.bootlin.com/linux/v2.6.0/C/ident/wait,intstate) { 
unsignedlongflags; wait 
https://elixir.bootlin.com/linux/v2.6.0/C/ident/wait->flags&=~WQ_FLAG_EXCLUSIVE 
https://elixir.bootlin.com/linux/v2.6.0/C/ident/WQ_FLAG_EXCLUSIVE; 
spin_lock_irqsave 
https://elixir.bootlin.com/linux/v2.6.0/C/ident/spin_lock_irqsave(&q->lock,flags); 
if(list_empty 
https://elixir.bootlin.com/linux/v2.6.0/C/ident/list_empty(&wait 
https://elixir.bootlin.com/linux/v2.6.0/C/ident/wait->task_list 
https://elixir.bootlin.com/linux/v2.6.0/C/ident/task_list)) 
__add_wait_queue 
https://elixir.bootlin.com/linux/v2.6.0/C/ident/__add_wait_queue(q,wait 
https://elixir.bootlin.com/linux/v2.6.0/C/ident/wait); 
set_current_state 
https://elixir.bootlin.com/linux/v2.6.0/C/ident/set_current_state(state); 
spin_unlock_irqrestore 
https://elixir.bootlin.com/linux/v2.6.0/C/ident/spin_unlock_irqrestore(&q->lock,flags); 
}
set_current_state() now uses smp_store_mb(), which is a memory barrier 
_after_ the store. Thus I do not see what enforces that the store 
happens before the store for the __add_wait_queue().
--
Manfred

Re: [RESEND PATCH] fs/pipe: Introduce a check to skip sleeping processes during pipe read/write