L2 - Processes, Signals and Descriptors

Tutorial 2 - Processes, Signals and Descriptors #

This tutorial contains the explanations for the used functions and their parameters. It is still only a surface-level tutorial and it is vital that you read the man pages to familiarize yourself and understand all of the details.

Process Management #

Creating Processes #

A child process is created using the fork command. Let’s take a look at the definition of this function:

pid_t fork(void)

As you can see, it returns an object of type pid_t, which is a signed integer type. In the parent process, the function returns the identifier of the newly created process, while in the child process it returns 0. This makes it easy to distinguish between the logic executed by the child and the logic executed by the parent.

Of course, creating a new process may fail (for example, when the system runs out of necessary resources). In such a case, the fork function returns -1 and sets the appropriate value of the errno variable.

Processes created by a given process are called its children, while from the perspective of a child process, the process that created it is called the parent.

More information:

man 3p fork

Process Identification #

Each process has a unique identifier of type pid_t. To obtain information about the process identifier, we use the getpid() function, and to find out the identifier of the parent process, we use the getppid() function. Their definitions are as follows:

pid_t getpid(void)
pid_t getppid(void)

As you can see, they do not take any arguments and return an object of type pid_t.

According to the POSIX standard, both functions always succeed (man 3p getpid).

Exercise #

Write a program that creates ’n’ sub-processes (n is 1st program parameter), each of those processes waits for random [5-10]s time then prints its PID and terminates. Parent process prints the number of alive child processes every 3s. For now, do not worry about waiting for child processes.

New man pages

Solution #

solution prog13a.c:

#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/time.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

#define ERR(source) \
    (fprintf(stderr, "%s:%d\n", __FILE__, __LINE__), perror(source), kill(0, SIGKILL), exit(EXIT_FAILURE))

void child_work(int i)
{
    srand(time(NULL) * getpid());
    int t = 5 + rand() % (10 - 5 + 1);
    sleep(t);
    printf("PROCESS with pid %d terminates\n", getpid());
}

void create_children(int n)
{
    pid_t s;
    for (int i = 0; i < n; ++i)
    {
        if ((s = fork()) < 0)
            ERR("Fork:");

        if (s == 0)
        {
            child_work(n);
            exit(EXIT_SUCCESS);
        }
    }
}

void usage(char *name)
{
    fprintf(stderr, "USAGE: %s 0<n\n", name);
    exit(EXIT_FAILURE);
}

int main(int argc, char **argv)
{
    if (argc < 2)
        usage(argv[0]);

    int child_count = atoi(argv[1]);
    if (child_count <= 0)
        usage(argv[0]);

    create_children(child_count);
    return EXIT_SUCCESS;
}

Notes and questions #

  • Make sure you know how the process group is created by shell, what processes belong to it?

  • Please note that macro ERR was extended with kill(0, SIGKILL), it is meant to terminate the whole program (all other processes) in case of error.

  • Provide zero as pid argument of kill and you can send a signal to all the processes in the group. It is very useful not to keep the PID’s list in your program.

  • Please notice that we do not test for errors inside of ERR macro (during error reporting), it is so to keep the program action at minimal level at emergency exit. What else can we do ? Call ERR recursively and have the same errors again?

  • Why after you run this program the command line returns immediately while processes are still working?

    Answer
    Parent process is not waiting for child processes, no wait or waitpid call. It will be fixed in the 2nd stage.

  • How to check the current parent of the created sub-processes (after the initial parent quits)? Why this process?

    Answer
    Right after the command line returns run: $ps -f, you should see that the PPID (parent PID) is 1 (init/systemd). It is caused by premature end of parent process, the orphaned processes can not "hang" outside of process three so they have to be attached somewhere. To make it simple, it is not the shell but the first process in the system.

  • Random number generator seed is set in child process, can it be moved to parent process? Will it affect the program?

    Answer
    Child processes will get the same "random" numbers because they will have the same random seed. Seeding can not be moved to parent.

  • Can we change the seed from PID to time() call?

    Answer
    No. Time you get from time() is returned in seconds since 1970, in most cases all sub-processes will have the same seed and will get the same (not random) numbers.

  • Try to derive a formula to get random number from the range [A,B], it should be obvious.

  • How this program works if you remove the exit call in child code (right after child_work call)?

    Answer
    Child process after exiting the child_work will continue back into forking loop! It will start it's own children. Grandchildren can start their children and so on. To mess it up a bit more child processes do not wait for their children.

  • How many processes will be started in above case if you supply 3 as starting parameter?

    Answer
    1 parent 3 children, 3 grand children and 1 grand grand child, 8 in total, draw a process three for it, tag the branches with current (on `fork`) n value.

  • What sleep returns? Should we react to this value somehow?

    Answer
    It returns the time left to sleep at the moment of interruption by signal handling function. In this code child processes does not receive nor handle the signals so this interruption is not possible. In other codes it may be vital to restart sleep with remaining time.

  • In the next stage child waiting and child counting will be added. How can we know how many child processes have exited?

    Answer
    SIGCHLD counting will not be precise as signals can marge, the only sure method is to count successful calls to wait or waitpid.

Waiting for Child Processes #

After completing the execution of all its instructions, a child process enters the zombie state (its identifier still remains in the process table) and stays in this state until the parent process retrieves information about its status (Status Information from man 3p wait). Only then are the child process’s resources fully released from the system.

The parent can obtain the status information of a waiting child process using the wait function, which waits for any child process, or the waitpid function, which allows specifying which particular child process to wait for. Let’s take a look at their definitions:

pid_t wait(int *stat_loc);
pid_t waitpid(pid_t pid, int *stat_loc, int options);

As we can see, both functions return a pid_t object, which is the identifier of the process whose status we receive information about.

Both functions take the stat_loc argument of type int*, which points to a memory location where the information about the process’s status will be stored (if we do not need this information, we can pass NULL).

The waitpid function has two additional arguments: pid of type pid_t and options of type int.

The pid argument specifies which processes we want to wait for. Depending on its value, the function behaves as follows:

  • pid == -1 - wait for any child process.
  • pid > 0 - wait for the process with the identifier equal to pid.
  • pid == 0 - wait for any process whose group ID is equal to the group ID of the calling process.
  • pid < -1 - wait for any process whose group ID is equal to the absolute value of pid.

The options argument specifies modifications to the function’s behavior and can be a combination of the following options:

  • WCONTINUED - the function should also return information about processes that were resumed after being stopped.
  • WNOHANG - the waitpid function should not block the calling process if none of the processes we are waiting for can immediately report their status. In this case, the function returns 0.
  • WUNTRACED - the function should also return information about processes that were stopped.

For the purposes of the lab, it is enough to know the WNOHANG option.

More information:

man 3p wait

In summary, we can consider the waitpid function as a more advanced version of the wait function: calling wait(stat_loc) is equivalent to calling waitpid(-1, stat_loc, 0).

Of course, both functions may fail, in which case they return -1 and set the appropriate errno value.

Note: If we call wait or waitpid and the pool of child processes to wait for is empty, the function returns -1 and sets errno to ECHILD. It is useful to take advantage of this to ensure that no processes are left orphaned before the parent process terminates.

Exercise #

Extend the program from the previous exercise to correctly wait for child processes.

New man pages

Solution #

solution prog13b.c:

#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/time.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

#define ERR(source) \
    (fprintf(stderr, "%s:%d\n", __FILE__, __LINE__), perror(source), kill(0, SIGKILL), exit(EXIT_FAILURE))

void child_work(int i)
{
    srand(time(NULL) * getpid());
    int t = 5 + rand() % (10 - 5 + 1);
    sleep(t);
    printf("PROCESS with pid %d terminates\n", getpid());
}

void create_children(int n)
{
    pid_t s;
    for (int i = 0; i < n; ++i)
    {
        if ((s = fork()) < 0)
            ERR("Fork:");

        if (s == 0)
        {
            child_work(n);
            exit(EXIT_SUCCESS);
        }
    }
}

void usage(char *name)
{
    fprintf(stderr, "USAGE: %s 0<n\n", name);
    exit(EXIT_FAILURE);
}

int main(int argc, char **argv)
{
    if (argc < 2)
        usage(argv[0]);

    int n = atoi(argv[1]);
    if (n <= 0)
        usage(argv[0]);

    create_children(n);
    while (n > 0)
    {
        sleep(3);
        pid_t pid;
        while (1)
        {
            pid = waitpid(0, NULL, WNOHANG);
            if (pid > 0)
                n--;
            if (0 == pid)
                break;
            if (0 >= pid)
            {
                if (ECHILD == errno)
                    break;
                ERR("waitpid:");
            }
        }
        printf("PARENT: %d processes remain\n", n);
    }
    return EXIT_SUCCESS;
}

Notes and questions #

  • It is worth knowing that waitpid can tell us about temporary lack of terminated children (returns zero) and about permanent lack of them (error ECHILD). The second case is not a critical error, your code should expect it.

  • Why waitpid is in a loop?

    Answer
    we do not know how many zombie processes are there to collect, it can be from zero to n of them.

  • Why waitpid has the WNOHANG flag on?

    Answer
    we do not want to wait for alive child processes as we have to report the counter every 3 sec. to the user

  • Why zero in place of pid in waitpid call?

    Answer
    We want to wait for any child process, we do not need to know children pids, zero means any of them.

  • Does this program encounter signals?

    Answer
    Yes - SIGCHILD. there is no handling routine but in this case it's alright, children are handled promptly by the above loop.

  • Shouldn’t we check sleep return value as we have signals in this code?

    Answer
    No, as we do not handle them.

Signals #

Signals are an asynchronous event-handling mechanism used in Unix-like operating systems. They allow notifying processes about the occurrence of specific system events, exceptions, or requests to control execution.

Sending signals #

Signals are sent using the kill function:

#include <signal.h>

int kill(pid_t pid, int sig);

The pid argument specifies which process or group of processes the signal is directed to:

  • pid > 0 - the signal is sent to the process with PID equal to pid
  • pid = 0 - the signal is sent to processes belonging to the sender’s process group (the sender also receives the signal)
  • pid = -1 - the signal is sent to all processes to which the sender has permission (including itself)
  • pid < -1 - the signal is sent to the processes with group ID equal to the absolute value of pid

The sig argument specifies which signal should be sent. It can take the following values:

  • one of the macros defined in <signal.h>, such as SIGTERM, SIGKILL, or SIGUSR1. The full list can be found in the manual:
man 7 signal
  • value zero - in this case, no signal is sent the function only checks for potential execution errors.

The function returns 0 on success. Otherwise, it returns -1 and an appropriate value of errno is set.

More information:

man 3p kill

Signal handling #

Each signal has a default behavior assigned to it. You can check the list of signals and their default behavior using:

man 7 signal

The way a signal is handled can be checked or changed using:

#include <signal.h>

int sigaction(int sig, const struct sigaction *restrict act, struct sigaction *restrict oact);

Arguments:

  • sig specifies which signal we are referring to and takes one of the macros from <signal.h>.
  • act sets a new signal handler if it points to a sigaction structure. If it is NULL, the handler is not changed.
  • oact - if set to NULL, this argument is ignored. Otherwise, the structure it points to is set to the old (or current) handler.

According to POSIX, the sigaction structure must contain at least the following fields:

  • void(*) (int) sa_handler - a pointer to the signal-handling function, or one of the values SIG_IGN or SIG_DFL. The handler function must accept an int (the code of the signal being handled) and return nothing. The macro SIG_IGN means that the signal will be ignored, while SIG_DFL selects the default handling behavior for that signal.
  • sigset_t sa_mask - the set of signals that will be blocked during execution of the handler
  • int sa_flags - flags modifying signal behavior
  • void(*) (int, siginfo_t *, void *) sa_sigaction – pointer to the alternative handler called when SA_SIGINFO flag in sa_mask is set

The function returns 0 on success or -1 on error, setting errno.

It is worth noting, that child processes created with fork inherit the parent’s signal-handling settings.

More information:

man 3p sigaction

Task #

The program takes 4 positional parameters (n, k, p, and r). It creates n child processes. The parent alternately sends SIGUSR1 and SIGUSR2 to all children in a loop, waiting k and p seconds respectively. It terminates when all child processes end.

Each child process randomly selects a sleep time between 5 and 10 seconds, then in a loop sleeps and prints SUCCESS if the last received signal was SIGUSR1, or FAILURE if it was SIGUSR2. This loop repeats r times.

What the student must know:

  • man 7 signal
  • man 3p sigaction
  • man 3p nanosleep
  • man 3p alarm
  • man 3p memset
  • man 3p kill

Solution #

solution prog14.c:

#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

#define ERR(source) \
    (fprintf(stderr, "%s:%d\n", __FILE__, __LINE__), perror(source), kill(0, SIGKILL), exit(EXIT_FAILURE))

volatile sig_atomic_t last_signal = 0;

void sethandler(void (*f)(int), int sigNo)
{
    struct sigaction act;
    memset(&act, 0, sizeof(struct sigaction));
    act.sa_handler = f;

    if (-1 == sigaction(sigNo, &act, NULL))
        ERR("sigaction");
}

void sig_handler(int sig)
{
    printf("[%d] received signal %d\n", getpid(), sig);
    last_signal = sig;
}

void sigchld_handler(int sig)
{
    pid_t pid;

    while (1)
    {
        pid = waitpid(0, NULL, WNOHANG);

        if (pid == 0)
            return;

        if (pid <= 0)
        {
            if (errno == ECHILD)
                return;
            ERR("waitpid");
        }
    }
}

void child_work(int l)
{
    int t, tt;

    srand(getpid());
    t = rand() % 6 + 5;

    for (int i = 0; i < l; i++)
    {
        for (tt = t; tt > 0; tt = sleep(tt))
        {
        }

        if (last_signal == SIGUSR1)
            printf("Success [%d]\n", getpid());
        else
            printf("Failed [%d]\n", getpid());
    }
    printf("[%d] Terminates \n", getpid());
}

void parent_work(int k, int p, int l)
{
    struct timespec tk = {k, 0};
    struct timespec tp = {p, 0};

    sethandler(sig_handler, SIGALRM);
    alarm(l * 10);

    while (last_signal != SIGALRM)
    {
        nanosleep(&tk, NULL);
        if (kill(0, SIGUSR1) < 0)
            ERR("kill");

        nanosleep(&tp, NULL);
        if (kill(0, SIGUSR2) < 0)
            ERR("kill");
    }

    printf("[PARENT] Terminates \n");
}

void create_children(int n, int l)
{
    for (int i = 0; i < n; i++)
    {
        switch (fork())
        {
            case 0:
                sethandler(sig_handler, SIGUSR1);
                sethandler(sig_handler, SIGUSR2);
                child_work(l);
                exit(EXIT_SUCCESS);
            case -1:
                perror("Fork:");
                exit(EXIT_FAILURE);
        }
    }
}

void usage(void)
{
    fprintf(stderr, "USAGE: signals n k p l\n");
    fprintf(stderr, "n - number of children\n");
    fprintf(stderr, "k - Interval before SIGUSR1\n");
    fprintf(stderr, "p - Interval before SIGUSR2\n");
    fprintf(stderr, "l - lifetime of child in cycles\n");

    exit(EXIT_FAILURE);
}

int main(int argc, char **argv)
{
    int n, k, p, l;
    if (argc != 5)
        usage();

    n = atoi(argv[1]);
    k = atoi(argv[2]);
    p = atoi(argv[3]);
    l = atoi(argv[4]);
    if (n <= 0 || k <= 0 || p <= 0 || l <= 0)
        usage();

    sethandler(sigchld_handler, SIGCHLD);
    sethandler(SIG_IGN, SIGUSR1);
    sethandler(SIG_IGN, SIGUSR2);

    create_children(n, l);
    parent_work(k, p, l);

    while (wait(NULL) > 0)
    {
    }

    return EXIT_SUCCESS;
}

To exchange the data between signal handling routine and the main code we must use global variables, please remember that it is an exceptional situation as in general we do not use global variables. Also please remember that global variables are not shared between related processes. It should be obvious but sometimes students get confused.

The type of the global variable last_signal is not accidental moreover, it is the only SAFE and CORRECT type. This results from the asynchronous nature of calling the signal handler function and more precisely:

  • volatile disables compiler optimizations; it’s important that the compiler does not assume the value of the variable is constant, because its changes do not result from the code and it could turn a readable loop while(work) where work is a global variable into while(1) after optimization.
  • sig_atomic_t means the largest numerical type that is processed in a single CPU instruction. If we take a larger numerical type, an interrupt invoked by a signal handler may disrupt the resulting value even of a simple comparison a==0, if the interruption occurs during the comparison and changes already compared bytes.

From the above it follows that we do not pass anything between the handler and the main code besides simple numbers and states. Additionally, good practice dictates not interrupting the program for too long, which leaves very few correct, portable, and safe solutions regarding how to distribute program logic between the main code and the signal handler. The simplest rule is that handlers should be extremely short (assignment, variable increment, etc.) and all logic should remain in the main code.

The memset function is sometimes necessary and usually useful when initializing structures whose full layout is not completely known to us. In this case, POSIX explicitly states that the sigaction structure may contain more fields than required by the standard. In such a situation, these additional fields—whose values we would not set (here we zero them using memset) - may cause different behavior on different systems, or even different behavior between program runs.

Can more than one child process be terminated when handling the SIGCHLD signal?

Answer
Yes, signals may coalesce; a child may terminate during the handling of `SIGCHLD`. Hence the loop in the handler function.

Can we expect no terminated child during SIGCHLD handling? Look at the end of main.

Answer
Yes, the final `wait` in main may "steal" waiting zombies, i.e., `wait` may execute before the handler.

Remember the possible CONFLICT between sleep and alarm - according to POSIX, sleep may internally use SIGALRM and signals cannot be nested. Therefore code waiting for an alarm never uses sleep instead, you can use nanosleep like in the code above.

In sending signals (kill), PID zero appears; this allows us not to know the exact PIDs of child processes but also means that we send a signal to ourselves!

The placement of setting and blocking signal handlers in this program is very important. Note how it works and answer the questions below. Always carefully consider the order of these settings in your program - many student errors come from this!

Note the sleep, why is it in a loop? Can the sleep time be exact?

Answer
`sleep` is interrupted by signal handling, so restart is necessary. Since `sleep` returns remaining sleep time as seconds, rounding means that exact timing cannot be achieved after restart.

What is the default disposition of most signals (including SIGUSR1 and SIGUSR2)?

Answer
To kill the target process; in this program, lack of a handler, blocking, or ignoring `SIGUSR1` and `SIGUSR2` would cause premature termination.

What is the consequence of the parent process sending SIGUSR1/SIGUSR2 to the whole process group?

Answer
The parent must react to these signals even though it does not need them thus it ignores them.

What would happen if we did not enable ignoring of SIGUSR1 and SIGUSR2 in the parent?

Answer
The parent would kill itself with the first sent signal.

Can we move the ignoring of these signals after the create_children function? The children do not need ignoring - they get a handler on start.

Answer
No. It could happen (rarely) that children are created but not yet started, and CPU time returns to the parent, which sends them a `SIGUSR1`. On the next CPU allocation for children, they would first handle the signal and thus terminate!

Can we modify this program so that ignoring SIGUSR1 and SIGUSR2 is unnecessary?

Answer
Yes. This specific program can have identical reactions to these signals in both parent and children, so the handler can be set before fork.

What happens if we move the handling of SIGCHLD after fork?

Answer
If a child dies before we enable the handler, it will stay a "zombie" until another child terminates. Not a major error, but worth attention.

Is wait at the end of main necessary? Parent’s work should last at least as long as the slowest child process.

Answer
Timing in the parent loop is insufficient; in a loaded system, any delay is possible. Without `wait`, a race condition appears --- the parent or child may finish first.

Waiting for a Signal #

Often, when writing programs, we encounter a situation where a process, before performing its work, must be informed that another process has completed its task.
As you might guess, this problem can be easily solved using signals. Inspired by the previous task, we could write logic where our process sleeps in a loop and checks whether the last signal it received is the one it is waiting for. Unfortunately, not only is this solution inelegant, but it is also incorrect, it could happen that the signal we are waiting for gets “merged” with another signal, and we would never actually know that our process can start its work.

Fortunately, the operating system provides tools to solve this problem.

To block the program until it receives a signal, we use the sigsuspend function. Let’s look at its definition:

int sigsuspend(const sigset_t *sigmask);

As we can see, this function returns an int value, which is used to report a potential error (in that case, it returns -1). It takes an argument sigmask of type const sigset_t *, which is a pointer to a set of signals that the function will wait for.

The function works as follows: it sets the signal mask to the one provided in the argument, waits to catch one of these signals, then restores the previous signal mask and resumes the execution of the process.

More information:

man 3p sigsuspend

Managing the Signal Mask #

A set of signals is called a signal mask. We will store the signal mask as an object of type sigset_t. The standard does not specify how this type should be implemented; it can be either an int or a structure.

To modify the signal mask, we use the functions sigemptyset, which initializes the mask as an empty set, and sigaddset, which adds a signal to the mask.
Let’s look at their definitions:

int sigemptyset(sigset_t *set);
int sigaddset(sigset_t *set, int signo);

As we can see, both functions take set of type sigset_t * as the first argument, which is a pointer to the mask we want to edit.
The function sigaddset additionally takes the signo argument, which is the code of the signal that, we want to add to the mask.

Both functions return an int value, used to indicate the operation’s outcome: on success, they return 0, and on error, they return -1 and set the appropriate value of the errno variable.

More information:

man 3p sigaddset
man 3p sigemptyset

Changing the Signal Mask #

Once we have defined a new signal mask, we want it to affect the operation of our process.
For this purpose, we use the sigprocmask function, which determines how the signal mask we defined should affect the current signal mask of the process.
Let’s look at its definition:

int sigprocmask(int how, const sigset_t *restrict set, sigset_t *restrict oset);

As we can see, this function takes the following arguments:

  • how of type int specifies how the new mask should affect the current mask. Available options are:

    • SIG_BLOCK - the resulting signal mask is the union of the mask pointed to by set and the current signal mask (we specify which signals we want to add to the mask).
    • SIG_SETMASK - the resulting signal mask is the signal mask pointed to by set.
    • SIG_UNBLOCK - the resulting signal mask is the intersection of the current mask and the complement of the mask pointed to by set (we specify which signals we want to remove from the mask).
  • set of type const sigset_t * is a pointer to the mask we want to use to modify the previous mask.

  • oset of type sigset_t * is a pointer to an object where we want to save the signal mask before modification.

More information:

man 3p pthread_sigmask

Exercise #

Write a program that starts one child process, which sends every “m” (parameter) microseconds a SIGUSR1 signal to the parent. Every n-th signal is changed to SIGUSR2. Parent anticipates SIGUSR2 and counts the amount of signals received. Child process also counts the amount of SIGUSR2 sent. Both processes print out the counted amounts at each signal operation. We reuse some functions from previous code.

New man pages:

  • man 3p sigsuspend
  • Glibc signal waiting here
  • man 3p pthread_sigmask (sigprocmask only)
  • man 3p sigaddset
  • man 3p sigemptyset

Solution #

solution part prog15.c:

#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

#define ERR(source) \
    (fprintf(stderr, "%s:%d\n", __FILE__, __LINE__), perror(source), kill(0, SIGKILL), exit(EXIT_FAILURE))

volatile sig_atomic_t last_signal = 0;

void sethandler(void (*f)(int), int sigNo)
{
    struct sigaction act;
    memset(&act, 0, sizeof(struct sigaction));
    act.sa_handler = f;
    if (-1 == sigaction(sigNo, &act, NULL))
        ERR("sigaction");
}

void sig_handler(int sig) { last_signal = sig; }

void sigchld_handler(int sig)
{
    pid_t pid;
    while (1)
    {
        pid = waitpid(0, NULL, WNOHANG);
        if (pid == 0)
            return;
        if (pid <= 0)
        {
            if (errno == ECHILD)
                return;
            ERR("waitpid");
        }
    }
}

void child_work(int m, int p)
{
    int count = 0;
    struct timespec t = {0, m * 10000};
    while (1)
    {
        for (int i = 0; i < p; i++)
        {
            nanosleep(&t, NULL);
            if (kill(getppid(), SIGUSR1))
                ERR("kill");
        }
        nanosleep(&t, NULL);
        if (kill(getppid(), SIGUSR2))
            ERR("kill");
        count++;
        printf("[%d] sent %d SIGUSR2\n", getpid(), count);
    }
}

void parent_work(sigset_t oldmask)
{
    int count = 0;
    while (1)
    {
        last_signal = 0;
        while (last_signal != SIGUSR2)
            sigsuspend(&oldmask);
        count++;
        printf("[PARENT] received %d SIGUSR2\n", count);
    }
}

void usage(char *name)
{
    fprintf(stderr, "USAGE: %s m  p\n", name);
    fprintf(stderr,
            "m - number of 1/1000 milliseconds between signals [1,999], "
            "i.e. one milisecond maximum\n");
    fprintf(stderr, "p - after p SIGUSR1 send one SIGUSER2  [1,999]\n");
    exit(EXIT_FAILURE);
}

int main(int argc, char **argv)
{
    if (argc != 3)
        usage(argv[0]);

    int m = atoi(argv[1]);
    int p = atoi(argv[2]);
    if (m <= 0 || m > 999 || p <= 0 || p > 999)
        usage(argv[0]);

    sethandler(sigchld_handler, SIGCHLD);
    sethandler(sig_handler, SIGUSR1);
    sethandler(sig_handler, SIGUSR2);

    sigset_t mask, oldmask;
    sigemptyset(&mask);
    sigaddset(&mask, SIGUSR1);
    sigaddset(&mask, SIGUSR2);
    sigprocmask(SIG_BLOCK, &mask, &oldmask);

    pid_t pid;
    if ((pid = fork()) < 0)
        ERR("fork");
    if (pid == 0)
        child_work(m, p);
    else
    {
        parent_work(oldmask);
        while (wait(NULL) > 0)
            ;
    }
    sigprocmask(SIG_UNBLOCK, &mask, NULL);
    return EXIT_SUCCESS;
}

The program terminates on SIGINT (C-c)

Notes and questions #

  • Try it with various parameters. The shorter microsecond brake and more frequent SIGUSER2 the faster growing gap between counters should be observable. In a moment the difference in numbers will be explained. If you do not observe the shift between counters let the program run a bit longer - 1 minute should do.

  • This code was written to show and explain certain problems, it can be easily improved, please keep this in mind when reusing the code!

  • Please do remember about getppid function. I have seen students programs passing parent pid as a parameter to the child process function.

  • Waiting for the signal with sigsuspend is a very common technique you must know. It is very well explained on GNU page linked above. The rule of the thumb is to block the anticipated signal first and for most of the program time. It gets unblocked at the moment program can wait - at sigsuspend call. Now the signal can influence our main code only in well defined points when it is not processing. It is a great advantage for us to limit the signals to certain moments only.

  • When above method is in use you can stop worrying about asynchronous codes, they are now synchronous and you can use more data types for communication via globals and have longer signal handlers.

  • Which counter gets skewed? Parent’s or child’s?

    Answer
    It must be the slower one, program can not count not sent signals, it can only lose some. Only the receiver can miss some of the signal thus the problem is in the parent process.

  • Why counters are shifted?

    Answer
    You probably blame signal merging but it has small chance to make any impact. The source of the problem is within sigsuspend as THERE IS NO GUARANTEE THAT DURING ONE CALL TO IT ONLY ONE SIGNAL WILL BE HANDLED! It is a very common misconception! Right after program executes the handler for SIGUSR2 in the duration of the same sigsuspend it executes the handler for SIGUSR1, global variable gets overwritten and parent process has no chance to count the SIGUSR2!

  • How can we run the program to lower SIGUSR2 merging chances to zero and still observe skewed counter?

    Answer
    Run with short brakes between signals and lots of SIGUSR1 between SIGUSR2. Now SIGUSR2 are very unlikely to merge as signals are separated in time by a lot of SIGUSR1, short brakes between signals rises the chance to have multiple handlers run in one sigsuspend.

  • Correct the above program to eliminate the above problem.

    Answer
    You can have a dedicated global variable only for SIGUSR2, increasing of the counter of SIGUSR2 can run in handler itself it will eliminate the problem of multiple SIGUSR2 handler call in one sigsuspend. Modify the counter printout and it is ready.

Low-level file operations and signals #

In this part of the tutorial, we will begin by showing what problems can arise during file operations while signals are being handled simultaneously, and then we will demonstrate how to deal with them.

Task #

Modify the previous program so that the parent process receives SIGUSR1 signals sent at a specified interval (parameter 1) and counts them. Additionally, the main process creates a file with a name given as parameter 4, containing a specified number of blocks of a given size (parameters 2 and 3). The content of the file should come from /dev/urandom. Each block should be copied separately, ensuring that the sizes are controlled. After copying a block, print to stderr the number of blocks actually copied and the state of the signal counters.

What the student must know:

  • man 4 urandom

Incorrect solution #

file prog16a.c:

#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

#define ERR(source) \
    (fprintf(stderr, "%s:%d\n", __FILE__, __LINE__), perror(source), kill(0, SIGKILL), exit(EXIT_FAILURE))

volatile sig_atomic_t sig_count = 0;

void sethandler(void (*f)(int), int sigNo)
{
    struct sigaction act;
    memset(&act, 0, sizeof(struct sigaction));
    act.sa_handler = f;

    if (-1 == sigaction(sigNo, &act, NULL))
        ERR("sigaction");
}

void sig_handler(int sig) { sig_count++; }

void child_work(int m)
{
    struct timespec t = {0, m * 10000};
    sethandler(SIG_DFL, SIGUSR1);

    while (1)
    {
        nanosleep(&t, NULL);
        if (kill(getppid(), SIGUSR1))
            ERR("kill");
    }
}

void parent_work(int b, int s, char *name)
{
    int i, in, out;
    ssize_t count;
    char *buf = malloc(s);
    if (!buf)
        ERR("malloc");

    if ((out = open(name, O_WRONLY | O_CREAT | O_TRUNC | O_APPEND, 0777)) < 0)
        ERR("open");

    if ((in = open("/dev/urandom", O_RDONLY)) < 0)
        ERR("open");

    for (i = 0; i < b; i++)
    {
        if ((count = read(in, buf, s)) < 0)
            ERR("read");

        if ((count = write(out, buf, count)) < 0)
            ERR("read");

        if (fprintf(stderr, "Block of %ld bytes transferred. Signals RX:%d\n", count, sig_count) < 0)
            ERR("fprintf");
    }

    if (close(in))
        ERR("close");
    if (close(out))
        ERR("close");
    free(buf);
    if (kill(0, SIGUSR1))
        ERR("kill");
}

void usage(char *name)
{
    fprintf(stderr, "USAGE: %s m b s \n", name);
    fprintf(stderr,
            "m - number of 1/1000 milliseconds between signals [1,999], "
            "i.e. one milisecond maximum\n");
    fprintf(stderr, "b - number of blocks [1,999]\n");
    fprintf(stderr, "s - size of of blocks [1,999] in MB\n");
    fprintf(stderr, "name of the output file\n");
    exit(EXIT_FAILURE);
}

int main(int argc, char **argv)
{
    int m, b, s;
    char *name;

    if (argc != 5)
        usage(argv[0]);

    m = atoi(argv[1]);
    b = atoi(argv[2]);
    s = atoi(argv[3]);
    name = argv[4];

    if (m <= 0 || m > 999 || b <= 0 || b > 999 || s <= 0 || s > 999)
        usage(argv[0]);

    sethandler(sig_handler, SIGUSR1);

    pid_t pid;
    if ((pid = fork()) < 0)
        ERR("fork");
    if (0 == pid)
        child_work(m);
    else
    {
        parent_work(b, s * 1024 * 1024, name);
        while (wait(NULL) > 0)
        {
        }
    }

    return EXIT_SUCCESS;
}

Remember: from /dev/random you can obtain truly random bytes but only in small amounts; from /dev/urandom the numbers are pseudo-random but available in unlimited quantities.
Whenever memory is allocated in a correct program, it must also be freed!
The permissions in the open function can also be given using constants (man 3p mknod). Due to strong tradition among programmers/administrators using octal notation and the ease of searching for such numbers in code, we do not consider this a stylistic error (“magic numbers” in this case are acceptable).

Problems #

After running the program with parameters 1 20 40 out.txt, you should observe the following problems:

  • Copying shorter blocks than expected. On my laptop, I never exceed 33,554,431 bytes even though it should be 40 MB, and shorter blocks sometimes occur as well. The reason is that the read operation is interrupted (DURING the transfer) by handling the signal.
  • Possible appearance of the error fprintf: Interrupted system call — the signal handler interrupts fprintf BEFORE it can display anything.
  • Similar messages may appear for open and close. This may be difficult to observe in this particular program, but according to POSIX, it is possible.
  • You can see that the parent process counts fewer signals than the child sends. Since the summing happens directly in a non-blocking signal handler, it is easy to guess that signal coalescing is happening — but why is it so strong in this program?
    Answer
    In this architecture (GNU/Linux), the CPU scheduler blocks the execution of signal handling during larger I/O operations, and during that time signals accumulate.

Notes and questions #

Why does the parent process send SIGUSR1 to the entire group at the end?

Answer
To terminate the child process.

How can the child process terminate upon receiving SIGUSR1 if it inherited the handler for this signal?

Answer
Immediately after starting, the child restores the default action for that signal, which ensures the process is killed.

Why doesn’t the parent kill itself with that signal?

Answer
It has the handler installed before it sends the signal to the group.

Can this strategy fail?

Answer
Yes, if the parent finishes its task before the child restores the default disposition for `SIGUSR1`.

Can this be improved? So that the parent always kills the child but does not risk killing itself prematurely?

Answer
Send `SIGUSR2` to the child.

Is this strategy for terminating a child always correct and easy to implement?

Answer
Only if the process being killed has no resources. Otherwise you must implement a handler for the terminating signal, which may not be easy.

Why do we check whether the allocated pointer is NULL after a memory allocation?

Answer
The system may be unable to allocate more memory; we must be prepared for this. Failing to check is a very common student mistake.

Couldn’t we make this buffer an automatic variable and avoid the code related to allocation and freeing?

Answer
Stacks on known architectures are not large enough for such huge variables (40 MB in the example). Typically the stack is only a few MB. If our buffer were small (a few KB) then we could do that.

Why are the permissions for the new file set to full (0777)?

Answer
`umask` will reduce the permissions. If we don’t require specific settings, this is a good strategy.

Solving the problems #

During I/O operations, functions can be interrupted by the signal handler. In such a case, the functions return -1 to indicate an error and set errno to EINTR. The POSIX standard states that in such a case, the function is interrupted before it achieves anything. Therefore, the correct and recommended response is to restart the function with the same parameters as before.

Manually handling this error can become inconvenient over time (especially when performing many I/O operations). For this reason, we can use the macro TEMP_FAILURE_RETRY, which is an extension provided by the GNU C library. Here you can read more about this function. To use the macros, we first need to define the _GNU_SOURCE, which gives us access to this kind of non-standard extensions.

solution, second stage, file prog16b.c:

#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

#define ERR(source) \
    (fprintf(stderr, "%s:%d\n", __FILE__, __LINE__), perror(source), kill(0, SIGKILL), exit(EXIT_FAILURE))

volatile sig_atomic_t sig_count = 0;

void sethandler(void (*f)(int), int sigNo)
{
    struct sigaction act;
    memset(&act, 0, sizeof(struct sigaction));
    act.sa_handler = f;
    if (-1 == sigaction(sigNo, &act, NULL))
        ERR("sigaction");
}

void sig_handler(int sig) { sig_count++; }

void child_work(int m)
{
    struct timespec t = {0, m * 10000};
    sethandler(SIG_DFL, SIGUSR1);

    while (1)
    {
        nanosleep(&t, NULL);
        if (kill(getppid(), SIGUSR1))
            ERR("kill");
    }
}

ssize_t bulk_read(int fd, char *buf, size_t count)
{
    ssize_t c;
    ssize_t len = 0;

    do
    {
        c = TEMP_FAILURE_RETRY(read(fd, buf, count));
        if (c < 0)
            return c;

        if (c == 0)
            return len;  // EOF

        buf += c;
        len += c;
        count -= c;
    } while (count > 0);

    return len;
}

ssize_t bulk_write(int fd, char *buf, size_t count)
{
    ssize_t c;
    ssize_t len = 0;

    do
    {
        c = TEMP_FAILURE_RETRY(write(fd, buf, count));
        if (c < 0)
            return c;

        buf += c;
        len += c;
        count -= c;
    } while (count > 0);

    return len;
}

void parent_work(int b, int s, char *name)
{
    int i, in, out;
    ssize_t count;

    char *buf = malloc(s);
    if (!buf)
        ERR("malloc");

    if ((out = TEMP_FAILURE_RETRY(open(name, O_WRONLY | O_CREAT | O_TRUNC | O_APPEND, 0777))) < 0)
        ERR("open");

    if ((in = TEMP_FAILURE_RETRY(open("/dev/urandom", O_RDONLY))) < 0)
        ERR("open");

    for (i = 0; i < b; i++)
    {
        if ((count = bulk_read(in, buf, s)) < 0)
            ERR("read");

        if ((count = bulk_write(out, buf, count)) < 0)
            ERR("read");

        if (TEMP_FAILURE_RETRY(fprintf(stderr, "Block of %ld bytes transferred. Signals RX:%d\n", count, sig_count)) <
            0)
            ERR("fprintf");
    }

    if (TEMP_FAILURE_RETRY(close(in)))
        ERR("close");
    if (TEMP_FAILURE_RETRY(close(out)))
        ERR("close");
    free(buf);
    if (kill(0, SIGUSR1))
        ERR("kill");
}

void usage(char *name)
{
    fprintf(stderr, "USAGE: %s m b s \n", name);
    fprintf(stderr,
            "m - number of 1/1000 milliseconds between signals [1,999], "
            "i.e. one milisecond maximum\n");
    fprintf(stderr, "b - number of blocks [1,999]\n");
    fprintf(stderr, "s - size of of blocks [1,999] in MB\n");
    fprintf(stderr, "name of the output file\n");
    exit(EXIT_FAILURE);
}

int main(int argc, char **argv)
{
    int m, b, s;
    char *name;

    if (argc != 5)
        usage(argv[0]);

    m = atoi(argv[1]);
    b = atoi(argv[2]);
    s = atoi(argv[3]);
    name = argv[4];

    if (m <= 0 || m > 999 || b <= 0 || b > 999 || s <= 0 || s > 999)
        usage(argv[0]);

    sethandler(sig_handler, SIGUSR1);

    pid_t pid;
    if ((pid = fork()) < 0)
        ERR("fork");
    if (0 == pid)
        child_work(m);
    else
    {
        parent_work(b, s * 1024 * 1024, name);
        while (wait(NULL) > 0)
        {
        }
    }

    return EXIT_SUCCESS;
}

Run the program as before — the errors disappear.

Notes and questions #

What other interruption in the program can the signal handler cause?

Answer
It can interrupt I/O operations or sleeping; these are not reported through `EINTR`. In both cases, handling the event is not trivial.

How do we know which functions can be interrupted before they achieve anything (EINTR)?

Answer
The man pages, in the section about returned errors. It is easy to guess: these are functions that may or must wait before doing anything.

Analyze how bulk_read and bulk_write work. You should know what cases are recognized in those functions, what types of interruption they can handle, how to recognize EOF on the descriptor.

Unlike during L1 lab, during L2 and following labs you have to use these functions (or similar ones) when calling read or write (because we use signals now). If you do not use them, you wont get points for your solution.

Both bulk_ functions are useful not only when dealing with signal handling or large I/O transfers, but also in situations where data is not available continuously — pipes/FIFOs/sockets - it will be covered by following tutorials.

All similar functions behave like read/write: fread/fwrite, send/recv.

As you know SA_RESTART flag can cause automatic restarts on delivery of a signal if this flag is set in the handler, it may not be apparent but this method has a lot of shortcomings:

  • You must control all the signal handlers used in the code, they all must be set with this flag, if one does not use this flag then you must handle EINTR as usual. It is easy to forget about this requirement if you extend/reuse the older code.
  • If you try to make some library functions (like bulk_read and write) you can not assume anything about the signals in the caller code.
  • It is hard to reuse a code depending on SA_RESTART flag, it can only be transferred to the similar strict handler control code.
  • Sometimes we want interruption notifications (e.g., sigsuspend loses meaning with SA_RESTART).

Why do we not check for errors other than EINTR after calling fprintf?

Answer
If we cannot write to stderr (usually the screen), then reporting an error is pointless.

Really big (f)printfs can get interrupted in the middle of the process (like write). Then it is difficult to restart the process especially if formatting is complicated. Avoid using printf where restarting would be critical (most cases except for the screen output) and the volume of transferred data is significant, use write instead.

Example tasks #

Do the example tasks. During the laboratory you will have more time and a starting code. If you do following tasks in the allotted time, it means that you are well-prepared.

Source codes presented in this tutorial #