Juggling unix child processes
I started writing a C program, sockterm, a couple of days ago that creates a socket and runs shell scripts sent to it by other processes through that socket. The idea is that you can open a pseudoterminal in your favorite window manager/terminal multiplexer and decide what program to put in it whenever convenient, for example to run a compiler for some source code when you trigger a keybinding.
Sockterm has to reliably detect both when there's an update on the socket
and when the currently running program has died. A newly sent shell script is
allowed to replace a currently running one, and a program can connect to the
socket and be notified if the child process dies or is replaced by someone
else, so waiting for socket updates and waiting for child process updates
need to happen simultaneously. The socket events can be handled with poll,
and when poll fails with EINTR, sockterm can check whether a SIGCHLD was the
signal that interrupted it.
    // remember what signal was caught
    // (volatile sig_atomic_t is the type that is safe to set from a handler)
    volatile sig_atomic_t raised_signal = -1;

    void signal_handler(int signal) {
        raised_signal = signal;
    }
and later:
    if (poll(fds, nfds, -1) == -1) {
        if (errno == EINTR && raised_signal == SIGCHLD) {
            // handle the zombie process
        }
    }
This only works, however, if the child process dies while poll is blocking.
If it dies at any other time, no poll call is interrupted, and the next poll
blocks without sockterm ever noticing that the child is gone. This could be
solved by handling the dead child in the signal handler function instead of
after poll, but that would open up a whole range of race conditions. We need
to be able to decide when and how each of these events is handled without
missing any. So instead, I did what poll does best, and used another file
descriptor. Specifically, a pipe.
    void signal_handler(int signal) {
        raised_signal = signal;
        if (raised_signal == SIGCHLD) {
            // wake up poll; the byte's value doesn't matter
            write(pspipe[1], "n", 1);
        }
    }
Update: I didn't think I came up with a revolutionary solution, and I was right.
People came up with this in the 1990s; it's usually called the self-pipe trick.
This way, pspipe, a previously created pipe, will have new data ready
whenever a SIGCHLD is caught, and poll can detect that even if it wasn't
blocking when the signal was sent. Once it's detected, sockterm can just read
all the pending data from the pipe, so the event isn't triggered again
instantly on the next poll, and then handle the event. This introduces a nice
selection of potential race conditions too, though, such as a new signal
arriving while the current one is being handled after poll, or multiple
signals arriving before poll runs again. It can probably be made workable,
despite the fact that it's probably very bug-ridden in its current state.
The problems aren't done there though. The design of sockterm is to have
a single child process running at any given time. If one is already running
when another is supposed to start, then the running child has to be killed
first. Unfortunately, a lot of programs don't handle signals in a way that
works well with that. Killing bash, for example (or many other shells I'd
imagine), with a SIGTERM doesn't actually kill bash's children, only bash
itself. So, if we ran something like /bin/sh -c yes on sockterm and then sent
a SIGTERM to the shell, yes would continue to clog up the terminal screen for
all of eternity until killed directly. There's a fairly easy (and hacky)
solution to this, which is to instead run /bin/sh -c 'exec program', so the
shell gets replaced by program and the pid held by sockterm is that of the
program you actually wanted to run in the first place. Now there's another
race condition, though, if the shell is killed before it gets a chance to
exec, so this is still a sub-optimal solution. It should work most of the
time though...
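The exec workaround could be sketched roughly as follows. The helper names spawn_exec and terminate_child are hypothetical, not taken from sockterm, and the sketch assumes the command is a single simple command with no shell quoting problems, since it is pasted straight after the word exec.

```c
#include <signal.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork and run /bin/sh -c 'exec <command>', so that the shell is replaced
 * by the command and the returned pid ends up belonging to the command
 * itself. Returns the child pid, or -1 on failure. */
static pid_t spawn_exec(const char *command)
{
    char script[1024];
    int len = snprintf(script, sizeof script, "exec %s", command);

    if (len < 0 || (size_t)len >= sizeof script)
        return -1; /* command too long for this sketch's buffer */

    pid_t pid = fork();
    if (pid == 0) {
        execl("/bin/sh", "sh", "-c", script, (char *)NULL);
        _exit(127); /* only reached if execl itself failed */
    }
    return pid;
}

/* Send SIGTERM to the child and wait for it to go away.
 * Returns the raw wait status, or -1 on error. */
static int terminate_child(pid_t pid)
{
    int status;

    if (kill(pid, SIGTERM) == -1)
        return -1;
    if (waitpid(pid, &status, 0) == -1)
        return -1;
    return status;
}
```

For example, terminate_child(spawn_exec("sleep 30")) should come back with a status for which WIFSIGNALED is true and WTERMSIG is SIGTERM. The race described above is still present: if the signal lands before the shell has reached its exec, it is the shell that dies, and nothing guarantees the command never starts or gets cleaned up.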
...except when it doesn't. Even if an interactive program is correctly
signaled to die, it still might not handle that signal in a helpful way. For
example, man spawns a child process to act as a pager for a manual entry. If
man is killed with a signal, it does die, but the pager doesn't, so we're
left with the same problem as with the shell. Even if a signal does kill all
the right processes, they might not clean up after themselves correctly. Vim
breaks your terminal when sent a SIGTERM, for example. If you were to run the
following script:
vim; echo first; echo second
(on a shell other than bash, which seems to fix your terminal for you)
and sent a TERM to vim, the output would look like this:
    Vim: Caught deadly signal TERM
    Vim: Finished.
    Terminated
    first
    second
So ideally you would also need to attempt to fix your terminal before trying to run the next program.
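One possible way to do that, sketched here as an assumption rather than as something sockterm does, is to snapshot the terminal settings with tcgetattr before starting a child and put them back with tcsetattr once the child is gone. The struct term_snapshot type and the term_save and term_restore helpers are invented for this example.

```c
#include <termios.h>
#include <unistd.h>

/* Saved terminal settings for one file descriptor. */
struct term_snapshot {
    int fd;
    int valid; /* nonzero only if attrs holds real settings */
    struct termios attrs;
};

/* Record the current settings of fd. Returns 0 on success, or -1 if fd
 * is not a terminal or the settings could not be read. */
static int term_save(struct term_snapshot *snap, int fd)
{
    snap->fd = fd;
    snap->valid = (tcgetattr(fd, &snap->attrs) == 0);
    return snap->valid ? 0 : -1;
}

/* Put the saved settings back. Returns 0 on success, or -1 if nothing
 * valid was saved or the terminal rejected the change. */
static int term_restore(const struct term_snapshot *snap)
{
    if (!snap->valid)
        return -1;
    return tcsetattr(snap->fd, TCSANOW, &snap->attrs);
}
```

The idea would be to call term_save on the terminal's file descriptor before launching each program and term_restore after reaping it. This only covers the termios settings (echo, line buffering, newline translation and so on); anything a program changed through escape sequences, such as switching to an alternate screen, would need separate handling.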
Anyways, that was fun. I'll probably try to find a proper set of solutions to all the brokenness eventually if I don't give up.