Juggling unix child processes

created: 2022-06-13 00:41:39 UTC
modified: 2022-06-23 17:44:05 UTC
tagged: programming, linux, unix, c

I started writing a C program, sockterm, a couple of days ago that creates a socket and runs shell scripts sent to it by other processes through that socket. The idea is that you can open a pseudoterminal in your favorite window manager/terminal multiplexer and decide what program to put in it whenever convenient, for example to run a compiler for some source code when you trigger a keybinding.

sockterm

Sockterm has to reliably detect both when there's an update on the socket and when the currently running program has died. A newly sent shell script is allowed to replace a currently running one, and a program can connect to the socket and be notified if the child process dies or is replaced by someone else, so waiting for socket updates and waiting for child process updates need to happen simultaneously. The socket events can be handled with poll, and when poll fails with EINTR, we can check whether it was a SIGCHLD that interrupted it.

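#include <signal.h>

// remember what signal was caught
// (volatile sig_atomic_t is safe to write from inside a signal handler)
volatile sig_atomic_t raised_signal = -1;
void signal_handler(int signal)
{
	raised_signal = signal;
}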

and later:

if(poll(fds, nfds, -1) == -1) {
	if(errno == EINTR && raised_signal == SIGCHLD) {
		// handle the zombie process
	}
}

This only works, however, if the child process dies while poll is blocking. If it happens at any other time, then the fate of the child will be unknown to sockterm. This could be solved by handling the dead child in the signal handler function instead of after poll, but that would open up a whole range of race conditions. We need to be able to decide when and how each of these events gets handled, without missing any. So instead, I did what poll does best, and used another file descriptor. Specifically, a pipe.

void signal_handler(int signal)
{
	raised_signal = signal;
	if(raised_signal == SIGCHLD) {
		// wake up poll by making the read end of the pipe readable
		write(pspipe[1], "n", 1);
	}
}

Update: I didn't think I came up with a revolutionary solution, and I was right.

People came up with this in the 1990s.

This way, pspipe, a previously created pipe, will have new data ready whenever a SIGCHLD is caught, and poll can detect that even if it wasn't blocking when the signal was sent. Once poll reports the pipe as readable, the program can read all the pending data from it, so the event isn't triggered again instantly on the next poll, and then handle the event. This introduces a nice selection of potential race conditions too, though, such as a new signal arriving while the current one is being handled after poll, or multiple signals arriving before poll runs again. It can probably be made workable, despite the fact that it's probably very bug-ridden in its current state.

The problems aren't done there though. The design of sockterm is to have a single child process running at any given time. If one is already running when another is supposed to start, then the running child has to be killed first. Unfortunately, a lot of programs don't handle signals in a way that works well with that. Killing bash, for example (or many other shells I'd imagine) with a SIGTERM doesn't actually kill bash's children, only bash itself. So, if we ran something like /bin/sh -c yes on sockterm and then sent a SIGTERM to the shell, yes would continue to clog up the terminal screen for all of eternity until killed directly. There's a fairly easy (and hacky) solution to this, which is to instead do /bin/sh -c 'exec program', so the shell gets replaced by program and the pid held by sockterm is that of the program you actually wanted to run in the first place. Now there's another race condition, though, if the shell is killed before it gets a chance to exec, so this is still a sub-optimal solution. It should work most of the time though...
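As a rough sketch of the exec hack (again, not sockterm's actual code), the two helpers below start a command through /bin/sh -c 'exec ...' and then terminate it through the pid returned by fork. The names spawn_exec and terminate_child and the 512-byte script buffer are made up for illustration.

```c
#include <signal.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run /bin/sh -c "exec <command>".
 * Returns the child's pid, or -1 on error. */
static pid_t spawn_exec(const char *command)
{
	char script[512];
	int len = snprintf(script, sizeof script, "exec %s", command);

	if (len < 0 || (size_t)len >= sizeof script)
		return -1; /* command too long for the buffer */

	pid_t pid = fork();
	if (pid == 0) {
		execl("/bin/sh", "sh", "-c", script, (char *)NULL);
		_exit(127); /* only reached if the shell could not be started */
	}
	return pid;
}

/* Hypothetical helper: send SIGTERM and wait for the child.
 * Returns the signal that killed it, 0 if it exited on its own, -1 on error. */
static int terminate_child(pid_t pid)
{
	int status;

	if (kill(pid, SIGTERM) == -1)
		return -1;
	if (waitpid(pid, &status, 0) == -1)
		return -1;
	return WIFSIGNALED(status) ? WTERMSIG(status) : 0;
}
```

Something like terminate_child(spawn_exec("yes")) should then stop yes itself rather than just a shell wrapped around it. Note that this only helps when the script is a single command: with 'exec a; b', the shell is gone after a starts, so b never runs.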

...except when it doesn't. Even if an interactive program is correctly signaled to die, it still might not handle that signal in a way that would be helpful. For example, man spawns a child process to act as a pager for a manual entry. If man is killed with a signal, it does die, but the pager doesn't, so we're left with the same problem as with the shell. Even if a signal does kill all the right processes, they might not clean up after themselves correctly. Vim breaks your terminal when sent a SIGTERM, for example. If you were to run the following script:

vim; echo first; echo second

(on a shell other than bash, which seems to fix your terminal for you)

and sent a TERM to vim, the output would look like this:

Vim: Caught deadly signal TERM
Vim: Finished.
Terminated
          first
               second

So ideally you would also need to attempt to fix your terminal before trying to run the next program.

Anyways, that was fun. I'll probably try to find a proper set of solutions to all the brokenness eventually if I don't give up.