From: Evgeniy Polyakov Signed-off-by: Andrew Morton --- Documentation/kevent.txt | 161 +++++++++++++++++++++++++++++++++++++ 1 files changed, 161 insertions(+) diff -puN /dev/null Documentation/kevent.txt --- /dev/null +++ a/Documentation/kevent.txt @@ -0,0 +1,161 @@ + +int kevent_ctl(int fd, unsigned int cmd, unsigned int num, struct ukevent *arg); + +fd - is the file descriptor referring to the kevent queue to manipulate. +It is created by opening "/dev/kevent" char device, which is created with dynamic +minor number and major number assigned for misc devices. + +cmd - is the requested operation. It can be one of the following: + KEVENT_CTL_ADD - add event notification + KEVENT_CTL_REMOVE - remove event notification + KEVENT_CTL_MODIFY - modify existing notification + +num - number of struct ukevent in the array pointed to by arg +arg - array of struct ukevent + +When called, kevent_ctl will carry out the operation specified in the cmd parameter. +------------------------------------------------------------------------------------- + + int kevent_get_events(int ctl_fd, unsigned int min_nr, unsigned int max_nr, __u64 timeout, struct ukevent *buf, unsigned flags) + +ctl_fd - file descriptor referring to the kevent queue +min_nr - minimum number of completed events that kevent_get_events will block waiting for +max_nr - number of struct ukevent in buf +timeout - number of nanoseconds to wait before returning less than min_nr events. + If this is -1, then wait forever. +buf - pointer to an array of struct ukevent. +flags - unused + +kevent_get_events will wait timeout milliseconds for at least min_nr completed events, +copying completed struct ukevents to buf and deleting any KEVENT_REQ_ONESHOT event requests. +In nonblocking mode it returns as many events as possible, but not more than max_nr. +In blocking mode it waits until timeout or if at least min_nr events are ready. +------------------------------------------------------------------------------------- + + int kevent_wait(int ctl_fd, unsigned int num, __u64 timeout) + +ctl_fd - file descriptor referring to the kevent queue +num - number of processed kevents +timeout - this timeout specifies number of nanoseconds to wait until there is free space in kevent queue + +This syscall waits until either timeout expires or at least one event becomes ready. +It also copies that num events into special ring buffer and requeues them (or removes depending on flags). +------------------------------------------------------------------------------------- + + int kevent_ring_init(int ctl_fd, struct kevent_ring *ring, unsigned int num) + +ctl_fd - file descriptor referring to the kevent queue +num - size of the ring buffer in events + + struct kevent_ring + { + unsigned int ring_kidx; + struct ukevent event[0]; + } + +ring_kidx - is an index in the ring buffer where kernel will put new events when + kevent_wait() or kevent_get_events() is called + +Example userspace code (ring_buffer.c) can be found on project's homepage. + +Each kevent syscall can be so called cancellation point in glibc, i.e. when thread has +been cancelled in kevent syscall, thread can be safely removed and no events will be lost, +since each syscall (kevent_wait() or kevent_get_events()) will copy event into special ring buffer, +accessible from other threads or even processes (if shared memory is used). + +When kevent is removed (not dequeued when it is ready, but just removed), even if it was ready, +it is not copied into ring buffer, since if it is removed, no one cares about it (otherwise user +would wait until it becomes ready and got it through usual way using kevent_get_events() or kevent_wait()) +and thus no need to copy it to the ring buffer. + +It is possible with userspace ring buffer, that events in the ring buffer can be replaced without knowledge +for the thread currently reading them (when other thread calls kevent_get_events() or kevent_wait()), +so appropriate locking between threads or processes, which can simultaneously access the same ring buffer, +is required. +------------------------------------------------------------------------------------- + +The bulk of the interface is entirely done through the ukevent struct. +It is used to add event requests, modify existing event requests, +specify which event requests to remove, and return completed events. + +struct ukevent contains the following members: + +struct kevent_id id + Id of this request, e.g. socket number, file descriptor and so on +__u32 type + Event type, e.g. KEVENT_SOCK, KEVENT_INODE, KEVENT_TIMER and so on +__u32 event + Event itself, e.g. SOCK_ACCEPT, INODE_CREATED, TIMER_FIRED +__u32 req_flags + Per-event request flags, + + KEVENT_REQ_ONESHOT + event will be removed when it is ready + + KEVENT_REQ_WAKEUP_ONE + When several threads wait on the same kevent queue and requested the same event, + for example 'wake me up when new client has connected, so I could call accept()', + then all threads will be awakened when new client has connected, but only one of + them can process the data. This problem is known as thundering nerd problem. + Events which have this flag set will not be marked as ready (and appropriate threads + will not be awakened) if at least one event has been already marked. + + KEVENT_REQ_ET + Edge Triggered behaviour. It is an optimisation which allows to move ready and dequeued + (i.e. copied to userspace) event to move into set of interest for given storage (socket, + inode and so on) again. It is very usefull for cases when the same event should be used + many times (like reading from pipe). It is similar to epoll()'s EPOLLET flag. + +__u32 ret_flags + Per-event return flags + + KEVENT_RET_BROKEN + Kevent is broken + + KEVENT_RET_DONE + Kevent processing was finished successfully + + KEVENT_RET_COPY_FAILED + Kevent was not copied into ring buffer due to some error conditions. + +__u32 ret_data + Event return data. Event originator fills it with anything it likes (for example + timer notifications put number of milliseconds when timer has fired +union { __u32 user[2]; void *ptr; } + User's data. It is not used, just copied to/from user. The whole structure is aligned + to 8 bytes already, so the last union is aligned properly. + +--------------------------------------------------------------------------------- + +Usage + +For KEVENT_CTL_ADD, all fields relevant to the event type must be filled +(id, type, possibly event, req_flags). After kevent_ctl(..., KEVENT_CTL_ADD, ...) +returns each struct's ret_flags should be checked to see if the event is already broken or done. + +For KEVENT_CTL_MODIFY, the id, req_flags, and user and event fields must be set and an +existing kevent request must have matching id and user fields. If a match is found, +req_flags and event are replaced with the newly supplied values and requeueing is started, +so modified kevent can be checked and probably marked as ready immediately. If a match can't +be found, the passed in ukevent's ret_flags has KEVENT_RET_BROKEN set. KEVENT_RET_DONE is always set. + +For KEVENT_CTL_REMOVE, the id and user fields must be set and an existing kevent request must +have matching id and user fields. If a match is found, the kevent request is removed. +If a match can't be found, the passed in ukevent's ret_flags has KEVENT_RET_BROKEN set. +KEVENT_RET_DONE is always set. + +For kevent_get_events, the entire structure is returned. + +--------------------------------------------------------------------------------- + +Usage cases + +kevent_timer +struct ukevent should contain following fields: + type - KEVENT_TIMER + event - KEVENT_TIMER_FIRED + req_flags - KEVENT_REQ_ONESHOT if you want to fire that timer only once + id.raw[0] - number of seconds after commit when this timer shout expire + id.raw[0] - additional to number of seconds number of nanoseconds + + _