Tutorial 2 - Processes, Signals and Descriptors

Introduction notes:
A quick look at this material will not suffice. Compile and run all the programs, check how they work, and read the additional materials such as man pages. As you read, please do all the exercises and answer the questions. At the end you will find a sample task similar to the one you will do during the labs; please do it at home.
Additional information is placed in yellow sections, questions and tasks in blue ones. Each question is followed by its answer, which you have to click to reveal. Please try to answer on your own before checking.
Full program sources are attached at the bottom of this page. Only the vital parts of the code are displayed on the page itself.
Code, information and tasks are organized in a logical sequence; to fully understand them you should follow this order. Sometimes an earlier task provides the context for the next one, which is then harder to comprehend without studying the previous parts.
Most exercises require a command line. I usually assume that all the files are placed in the current working directory, so no path needs to be added to file names.
Quite often you will find a $ sign placed before commands you should run in the shell. You do not need to type this sign on the command line; it is there to remind you that this is a command to execute.
What you learn and practice in this tutorial will be required for the next ones. If you have a problem with this material after the graded lab, you can still ask the teachers for help.
This time some of the solutions are divided into two stages.

Task 1 - processes

Goal: The program creates n sub-processes (n is the 1st program parameter). Each of them waits for a random time of [5-10] s, then prints its PID and terminates. The parent process prints the number of alive child processes every 3 s.

What you need to know:

man 3p fork
man 3p getpid
man 3p wait
man 3p waitpid
man 3p sleep
Job Control

solution, 1st stage prog13a.c:

#include <errno.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/time.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>
#define ERR(source) \
    (fprintf(stderr, "%s:%d\n", __FILE__, __LINE__), perror(source), kill(0, SIGKILL), exit(EXIT_FAILURE))

void child_work(int i)
{
    srand(time(NULL) * getpid());
    int t = 5 + rand() % (10 - 5 + 1);
    sleep(t);
    printf("PROCESS with pid %d terminates\n", getpid());
}

void create_children(int n)
{
    pid_t s;
    for (n--; n >= 0; n--)
    {
        if ((s = fork()) < 0)
            ERR("Fork:");
        if (!s)
        {
            child_work(n);
            exit(EXIT_SUCCESS);
        }
    }
}

void usage(char *name)
{
    fprintf(stderr, "USAGE: %s 0<n\n", name);
    exit(EXIT_FAILURE);
}

int main(int argc, char **argv)
{
    int n;
    if (argc < 2)
        usage(argv[0]);
    n = atoi(argv[1]);
    if (n <= 0)
        usage(argv[0]);
    create_children(n);
    return EXIT_SUCCESS;
}

Use the general makefile (the last one) from the first tutorial and execute “make prog13a”.

Make sure you know how the shell creates a process group and which processes belong to it.

Please note that the ERR macro was extended with kill(0, SIGKILL). It is meant to terminate the whole program (all its processes) in case of an error.

If you pass zero as the pid argument of kill, the signal is sent to all the processes in the sender's process group. This is very useful, as you do not have to keep a list of PIDs in your program.

Please notice that we do not test for errors inside the ERR macro (during error reporting). This keeps the program's actions to a minimum at an emergency exit. What else could we do? Call ERR recursively and get the same errors again?

Why does the command line return immediately after you run this program, while the processes are still working?

Answer

The parent process does not wait for its child processes: there is no wait or waitpid call. This will be fixed in the 2nd stage.

How can you check the current parent of the created sub-processes (after the initial parent quits)? Why this process?

Answer

Right after the command line returns, run $ ps -f. You should see that the PPID (parent PID) is 1 (init/systemd); on some systems it may instead be a designated subreaper, for example the per-user systemd instance. This is caused by the premature end of the parent process: orphaned processes cannot "hang" outside of the process tree, so they have to be attached somewhere. To keep it simple, they are attached not to the shell but to the first process in the system.

The random number generator seed is set in the child process. Can it be moved to the parent process? Will it affect the program?

Answer

Child processes would get the same "random" numbers because they would all inherit the same generator state. Seeding cannot be moved to the parent.

Can we change the seed from the PID to a time() call?

Answer

No. time() returns the number of seconds since 1970, so in most cases all sub-processes would get the same seed and therefore the same (not random) numbers.
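
The effect of a shared seed is easy to check in isolation. The sketch below is only an illustration (the helper name first_values_match is made up here): it seeds the generator twice with the same value and compares the first few results, which is exactly what happens when several children call srand with an identical time() value.

```c
#include <stdlib.h>

/* Hypothetical helper: returns 1 when seeding twice with `seed` yields
 * the same first `count` values from rand(), 0 otherwise. */
int first_values_match(unsigned int seed, int count)
{
    int first[16];
    if (count > 16)
        count = 16;
    srand(seed);
    for (int i = 0; i < count; i++)
        first[i] = rand();
    srand(seed); /* same seed again, like children seeded with time() only */
    for (int i = 0; i < count; i++)
        if (rand() != first[i])
            return 0;
    return 1;
}
```

Multiplying the time by the PID, as child_work does, gives each child a different seed even when they all start within the same second.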

Try to derive a formula to get random number from the range [A,B], it should be obvious.
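
One possible answer, generalizing the expression 5 + rand() % (10 - 5 + 1) used in child_work, is sketched below. The function name rand_range is an assumption made for this example, and the result is slightly biased towards lower values whenever the range size does not divide RAND_MAX + 1 evenly, which does not matter for this exercise.

```c
#include <stdlib.h>

/* Returns a pseudo random integer from the closed range [a, b].
 * Assumes a <= b and that (b - a + 1) does not exceed RAND_MAX. */
int rand_range(int a, int b)
{
    return a + rand() % (b - a + 1);
}
```

With this helper the delay in child_work could be written as rand_range(5, 10).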

How does this program behave if you remove the exit call in the child code (right after the child_work call)?

Answer

After leaving child_work, the child process continues back into the forking loop! It starts its own children. Grandchildren can start their children and so on. To mess it up a bit more, child processes do not wait for their children.

How many processes will be started in the above case if you supply 3 as the starting parameter?

Answer

1 parent, 3 children, 3 grandchildren and 1 great-grandchild, 8 in total. Draw a process tree for it and tag the branches with the current (at fork) value of n.
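
The count can be double-checked without forking anything. The sketch below (count_processes is a name invented for this illustration) replays the loop from create_children as a recursion: every simulated child re-enters the loop with the value of n it had at fork time.

```c
/* Simulates the fork loop without forking. `n` is the value of the loop
 * variable a process holds when it reaches `for (n--; n >= 0; n--)`.
 * Returns the number of processes in its subtree, itself included. */
long count_processes(int n)
{
    long total = 1; /* the process itself */
    for (n--; n >= 0; n--)
        total += count_processes(n); /* the child continues with this n */
    return total;
}
```

For a starting parameter of 3 this gives 8, and in general the total doubles with each increment of n.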

What does sleep return? Should we react to this value somehow?

Answer

It returns the time left to sleep at the moment it was interrupted by a signal handling function. In this code the child processes neither receive nor handle signals, so this interruption is not possible. In other programs it may be vital to restart sleep with the remaining time.

In the next stage child waiting and child counting will be added. How can we know how many child processes have exited?

Answer

Counting SIGCHLD signals will not be precise as signals can merge. The only sure method is to count successful calls to wait or waitpid.

solution 2nd stage prog13b.c:

#include <errno.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/time.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>
#define ERR(source) \
    (fprintf(stderr, "%s:%d\n", __FILE__, __LINE__), perror(source), kill(0, SIGKILL), exit(EXIT_FAILURE))

void child_work(int i)
{
    srand(time(NULL) * getpid());
    int t = 5 + rand() % (10 - 5 + 1);
    sleep(t);
    printf("PROCESS with pid %d terminates\n", getpid());
}

void create_children(int n)
{
    pid_t s;
    for (n--; n >= 0; n--)
    {
        if ((s = fork()) < 0)
            ERR("Fork:");
        if (!s)
        {
            child_work(n);
            exit(EXIT_SUCCESS);
        }
    }
}

void usage(char *name)
{
    fprintf(stderr, "USAGE: %s 0<n\n", name);
    exit(EXIT_FAILURE);
}

int main(int argc, char **argv)
{
    int n;
    if (argc < 2)
        usage(argv[0]);
    n = atoi(argv[1]);
    if (n <= 0)
        usage(argv[0]);
    create_children(n);
    while (n > 0)
    {
        sleep(3);
        pid_t pid;
        for (;;)
        {
            pid = waitpid(0, NULL, WNOHANG);
            if (pid > 0)
                n--;
            if (0 == pid)
                break;
            if (0 > pid)
            {
                if (ECHILD == errno)
                    break;
                ERR("waitpid:");
            }
        }
        printf("PARENT: %d processes remain\n", n);
    }
    return EXIT_SUCCESS;
}

It is worth knowing that waitpid can tell us about temporary lack of terminated children (returns zero) and about permanent lack of them (error ECHILD). The second case is not a critical error, your code should expect it.

Why is waitpid in a loop?

Answer

We do not know how many zombie processes there are to collect; it can be anything from zero to n.

Why does waitpid have the WNOHANG flag set?

Answer

We do not want to wait for child processes that are still alive, as we have to report the counter to the user every 3 seconds.

Why is zero passed in place of the pid in the waitpid call?

Answer

We want to wait for any child process, without having to know the children's PIDs. Zero means any child from the caller's process group, which here covers all our children (-1 would mean any child at all).

Does this program encounter signals?

Answer

Yes, SIGCHLD. There is no handling routine, but in this case that is all right: the default disposition of SIGCHLD is to ignore it, and the children are collected promptly by the above loop.

Shouldn’t we check the sleep return value as we have signals in this code?

Answer

No, as we do not handle them: a signal that is ignored does not interrupt sleep.

Task 2 - signals

Goal: The program takes 4 positional parameters (n, k, p and r; r is called l in the code below) and creates n sub-processes. In a loop, the parent process sends SIGUSR1 and SIGUSR2 alternately to all sub-processes, with delays of k and p seconds (k sec. before SIGUSR1 and p sec. before SIGUSR2). The parent process terminates after all its sub-processes. Each sub-process determines its own random time delay of [5,10] sec. and then, in a loop, sleeps for this time and prints SUCCESS on stdout if the last signal received was SIGUSR1, or FAILURE if it was SIGUSR2. This loop iterates r times.

What you need to know:

man 7 signal
man 3p sigaction
man 3p nanosleep
man 3p alarm
man 3p memset
man 3p kill

solution prog14.c:

volatile sig_atomic_t last_signal = 0;

void sethandler( void (*f)(int), int sigNo) {
        struct sigaction act;
        memset(&act, 0, sizeof(struct sigaction));
        act.sa_handler = f;
        if (-1==sigaction(sigNo, &act, NULL)) ERR("sigaction");
}

void sig_handler(int sig) {
        printf("[%d] received signal %d\n", getpid(), sig);
        last_signal = sig;
}

void sigchld_handler(int sig) {
        pid_t pid;
        for(;;){
                pid=waitpid(0, NULL, WNOHANG);
                if(pid==0) return;
                if(pid<0) {
                        if(errno==ECHILD) return;
                        ERR("waitpid");
                }
        }
}

To exchange data between the signal handling routine and the main code we must use global variables. Please remember that this is an exceptional situation, as in general we do not use global variables. Also remember that global variables are not shared between related processes. It should be obvious, but sometimes students get confused.

The type of the global variable used for this communication should be volatile sig_atomic_t; it is the only CORRECT and SAFE type you can use here. The reason originates from the asynchronous nature of the interruption. First, "volatile" tells the compiler that the variable may change outside the visible flow of the code, so every access must really read or write memory. This is critical, as otherwise the compiler may remove from a loop condition a variable that does not change inside the loop: with optimization on, while(work) may become while(1), because the compiler is unable to anticipate the asynchronous change of the “work” variable. Second, sig_atomic_t is an integer type that is read and written as a single, uninterruptible operation. With a wider type, a simple comparison such as a==0 could get interrupted, and the already compared bits could be altered during the comparison!

As you can see, not much data can be exchanged between the handling function and the rest of the code, only simple integers/states. Additionally, we should not interrupt the main code for long. Putting it all together: always keep most of the program logic in the main code and keep the signal handling functions as short as possible, with only very simple expressions such as assignments, increments and the like.

memset used to initialize the structures is quite often useful, especially if you do not know all the members of the structure (quite often only part of members is described in man page, internally used members are unknown).

The SIGCHLD handling function is very similar to the waitpid loop you have seen in the previous task.

Do we expect more than one terminated child during SIGCHLD handling?

Answer

Yes, signals can merge, another child can terminate at the very moment of signal handling.

Do we expect zero terminated children in this handler? Look ahead at the end of main.

Answer

Yes, the wait at the end of main can collect the child before the SIGCHLD handler does, and the handler is then left with zero children. It is a race condition.

solution prog14.c:

#include <errno.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

#define ERR(source) \
    (fprintf(stderr, "%s:%d\n", __FILE__, __LINE__), perror(source), kill(0, SIGKILL), exit(EXIT_FAILURE))

volatile sig_atomic_t last_signal = 0;

void sethandler(void (*f)(int), int sigNo)
{
    struct sigaction act;
    memset(&act, 0, sizeof(struct sigaction));
    act.sa_handler = f;
    if (-1 == sigaction(sigNo, &act, NULL))
        ERR("sigaction");
}

void sig_handler(int sig)
{
    printf("[%d] received signal %d\n", getpid(), sig);
    last_signal = sig;
}

void sigchld_handler(int sig)
{
    pid_t pid;
    for (;;)
    {
        pid = waitpid(0, NULL, WNOHANG);
        if (pid == 0)
            return;
        if (pid < 0)
        {
            if (errno == ECHILD)
                return;
            ERR("waitpid");
        }
    }
}

void child_work(int l)
{
    int t, tt;
    srand(getpid());
    t = rand() % 6 + 5;
    while (l-- > 0)
    {
        for (tt = t; tt > 0; tt = sleep(tt))
            ;
        if (last_signal == SIGUSR1)
            printf("Success [%d]\n", getpid());
        else
            printf("Failed [%d]\n", getpid());
    }
    printf("[%d] Terminates \n", getpid());
}

void parent_work(int k, int p, int l)
{
    struct timespec tk = {k, 0};
    struct timespec tp = {p, 0};
    sethandler(sig_handler, SIGALRM);
    alarm(l * 10);
    while (last_signal != SIGALRM)
    {
        nanosleep(&tk, NULL);
        if (kill(0, SIGUSR1) < 0)
            ERR("kill");
        nanosleep(&tp, NULL);
        if (kill(0, SIGUSR2) < 0)
            ERR("kill");
    }
    printf("[PARENT] Terminates \n");
}

void create_children(int n, int l)
{
    while (n-- > 0)
    {
        switch (fork())
        {
            case 0:
                sethandler(sig_handler, SIGUSR1);
                sethandler(sig_handler, SIGUSR2);
                child_work(l);
                exit(EXIT_SUCCESS);
            case -1:
                perror("Fork:");
                exit(EXIT_FAILURE);
        }
    }
}

void usage(void)
{
    fprintf(stderr, "USAGE: signals n k p l\n");
    fprintf(stderr, "n - number of children\n");
    fprintf(stderr, "k - Interval before SIGUSR1\n");
    fprintf(stderr, "p - Interval before SIGUSR2\n");
    fprintf(stderr, "l - lifetime of child in cycles\n");
    exit(EXIT_FAILURE);
}

int main(int argc, char **argv)
{
    int n, k, p, l;
    if (argc != 5)
        usage();
    n = atoi(argv[1]);
    k = atoi(argv[2]);
    p = atoi(argv[3]);
    l = atoi(argv[4]);
    if (n <= 0 || k <= 0 || p <= 0 || l <= 0)
        usage();
    sethandler(sigchld_handler, SIGCHLD);
    sethandler(SIG_IGN, SIGUSR1);
    sethandler(SIG_IGN, SIGUSR2);
    create_children(n, l);
    parent_work(k, p, l);
    while (wait(NULL) > 0)
        ;
    return EXIT_SUCCESS;
}

Please notice that the sleep and alarm functions can conflict: according to POSIX, sleep may be implemented with SIGALRM, in which case the two uses of this signal would interfere. Never mix the two in one process; where alarm is in use, use nanosleep instead of sleep, as parent_work does in the code above.

The kill function is invoked with zero as the pid, which means the signal is sent to the whole process group. We do not need to keep track of PIDs, but do notice that the signal is delivered to the sender as well!

The placement of the signal handling and blocking setup is not trivial; please analyze the example and answer the questions below. Always plan the reactions to signals in your program in advance. Overlooking this problem is a common student mistake.

Why is sleep in a loop? Can the sleep time be exact in this case?

Answer

It gets interrupted by signal handling, so a restart is a must. sleep returns the remaining time rounded to whole seconds, so it cannot be precise.

What is the default disposition of most signals (incl. SIGUSR1 and SIGUSR2)?

Answer

Most unhandled signals terminate the receiver. In this example, without handling, ignoring or blocking of SIGUSR1 and SIGUSR2 the children would be killed.

How does sending SIGUSR1 and SIGUSR2 to the process group affect the program?

Answer

The parent process has to be immune to them; the simplest solution is to ignore them.

What would happen if you turned this ignoring off?

Answer

The parent would kill itself with the first signal sent.

Can we move the signal ignoring setup past create_children? Child processes set their own signal disposition right at the start, so do they need this ignoring?

Answer

They do need it. If you move the setup and no ignoring is inherited from the parent process, it may happen (rarely, but it is possible) that a child process gets created but has not started its code yet. Immediately after the creation the CPU slice goes to the parent, which continues its code and sends SIGUSR1 to the children. If the CPU slice then goes back to the child, the default signal disposition will kill it before it has a chance to set up its own handler!

Can we modify this program to avoid ignoring in the code?

Answer

In this program both the child and the parent can have the same signal handling routines for SIGUSR1 and SIGUSR2. You can set them just before fork and it will solve the problem.

Would moving the setup of the SIGCHLD handler past the fork change the program?

Answer

If one of the offspring "dies" very quickly (before the parent sets its SIGCHLD handler), it will remain a zombie until another child terminates. It is not a major mistake, but it is worth attention.

Is the wait call at the end of the parent really needed? The parent waits long enough for the children to finish, right?

Answer

The calculated time may not suffice. In an overloaded system expect lags of any duration (a few seconds or more); without "wait" the children could terminate after the parent because of those lags.

Task 3 - signal waiting

Goal: The program starts one child process, which sends a SIGUSR1 signal to the parent every “m” (parameter) microseconds. Every n-th signal is replaced with SIGUSR2. The parent waits for SIGUSR2 and counts how many of them it has received. The child process also counts the SIGUSR2 signals it has sent. Both processes print their counters at each signal operation. We reuse some functions from the previous code.

What you need to know:

man 3p sigsuspend
Glibc manual, section on waiting for a signal
man 3p getppid
man 3p pthread_sigmask (sigprocmask only)
man 3p sigaddset
man 3p sigemptyset

solution part prog15.c:

#include <errno.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

#define ERR(source) \
    (fprintf(stderr, "%s:%d\n", __FILE__, __LINE__), perror(source), kill(0, SIGKILL), exit(EXIT_FAILURE))

volatile sig_atomic_t last_signal = 0;

void sethandler(void (*f)(int), int sigNo)
{
    struct sigaction act;
    memset(&act, 0, sizeof(struct sigaction));
    act.sa_handler = f;
    if (-1 == sigaction(sigNo, &act, NULL))
        ERR("sigaction");
}

void sig_handler(int sig) { last_signal = sig; }

void sigchld_handler(int sig)
{
    pid_t pid;
    for (;;)
    {
        pid = waitpid(0, NULL, WNOHANG);
        if (pid == 0)
            return;
        if (pid < 0)
        {
            if (errno == ECHILD)
                return;
            ERR("waitpid");
        }
    }
}

void child_work(int m, int p)
{
    int count = 0;
    struct timespec t = {0, m * 1000};  // m microseconds expressed in nanoseconds
    while (1)
    {
        for (int i = 0; i < p; i++)
        {
            nanosleep(&t, NULL);
            if (kill(getppid(), SIGUSR1))
                ERR("kill");
        }
        nanosleep(&t, NULL);
        if (kill(getppid(), SIGUSR2))
            ERR("kill");
        count++;
        printf("[%d] sent %d SIGUSR2\n", getpid(), count);
    }
}

void parent_work(sigset_t oldmask)
{
    int count = 0;
    while (1)
    {
        last_signal = 0;
        while (last_signal != SIGUSR2)
            sigsuspend(&oldmask);
        count++;
        printf("[PARENT] received %d SIGUSR2\n", count);
    }
}

void usage(char *name)
{
    fprintf(stderr, "USAGE: %s m  p\n", name);
    fprintf(stderr,
            "m - number of 1/1000 milliseconds between signals [1,999], "
            "i.e. one millisecond maximum\n");
    fprintf(stderr, "p - after p SIGUSR1 send one SIGUSR2  [1,999]\n");
    exit(EXIT_FAILURE);
}

int main(int argc, char **argv)
{
    int m, p;
    if (argc != 3)
        usage(argv[0]);
    m = atoi(argv[1]);
    p = atoi(argv[2]);
    if (m <= 0 || m > 999 || p <= 0 || p > 999)
        usage(argv[0]);
    sethandler(sigchld_handler, SIGCHLD);
    sethandler(sig_handler, SIGUSR1);
    sethandler(sig_handler, SIGUSR2);
    sigset_t mask, oldmask;
    sigemptyset(&mask);
    sigaddset(&mask, SIGUSR1);
    sigaddset(&mask, SIGUSR2);
    sigprocmask(SIG_BLOCK, &mask, &oldmask);
    pid_t pid;
    if ((pid = fork()) < 0)
        ERR("fork");
    if (0 == pid)
        child_work(m, p);
    else
    {
        parent_work(oldmask);
        while (wait(NULL) > 0)
            ;
    }
    sigprocmask(SIG_UNBLOCK, &mask, NULL);
    return EXIT_SUCCESS;
}

The program terminates on SIGINT (C-c).

Try it with various parameters. The shorter the microsecond break and the more frequent SIGUSR2 is, the faster the gap between the counters should grow. The difference in numbers will be explained in a moment. If you do not observe the shift between the counters, let the program run a bit longer; 1 minute should do.

This code was written to show and explain certain problems and it can easily be improved. Please keep this in mind when reusing the code!

Please do remember about the getppid function. I have seen students' programs passing the parent PID as a parameter to the child process function.

Waiting for a signal with sigsuspend is a very common technique you must know. It is very well explained on the GNU page linked above. The rule of thumb is to block the anticipated signal first, and to keep it blocked for most of the program's run time. It gets unblocked only at the moment the program can wait, at the sigsuspend call. Now the signal can influence our main code only at well defined points, when it is not processing anything. It is a great advantage for us to limit the signals to certain moments only.

When the above method is in use you can worry much less about asynchronous code: the handlers now run at known, synchronous points, so you can use more data types for communication via globals and have longer signal handlers.

Which counter gets skewed? The parent's or the child's?

Answer

It must be the one that grows more slowly: a program cannot count signals that were never sent, it can only lose some. Only the receiver can miss signals, thus the problem is in the parent process.

Why do the counters drift apart?

Answer

You probably blame signal merging, but it has only a small chance of making any impact. The source of the problem is within sigsuspend, as THERE IS NO GUARANTEE THAT DURING ONE CALL TO IT ONLY ONE SIGNAL WILL BE HANDLED! It is a very common misconception! Right after the program executes the handler for SIGUSR2, still within the same sigsuspend call, it executes the handler for SIGUSR1; the global variable gets overwritten and the parent process has no chance to count the SIGUSR2!

How can we run the program to lower the chance of SIGUSR2 merging to practically zero and still observe a skewed counter?

Answer

Run it with short breaks between signals and lots of SIGUSR1 between SIGUSR2. Now SIGUSR2 signals are very unlikely to merge, as they are separated in time by a lot of SIGUSR1, while the short breaks between signals raise the chance of multiple handlers running in one sigsuspend.

Correct the above program to eliminate this problem.

Answer

You can have a dedicated global variable only for SIGUSR2. The SIGUSR2 counter can be incremented in the handler itself, which eliminates the problem of multiple handler calls in one sigsuspend. Modify the counter printout and it is ready.

Task 4 - low level file access and signals

Goal: Modify the task 3 code. The parent receives SIGUSR1 from the child at a set interval (1st parameter) and counts them. Additionally, the parent process creates a file with the name given as the 4th parameter, consisting of a set number of blocks (2nd parameter) of a set size (3rd parameter). The content of the file is a copy of data read from /dev/urandom. Each block must be copied separately, with size control. After each copy operation the program prints on stderr the effective amount of data transferred and the number of signals received.

What you need to know:

man 4 urandom

This task has two stages.

solution 1st stage, parts of prog16a.c:

#include <errno.h>
#include <fcntl.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

#define ERR(source) \
    (fprintf(stderr, "%s:%d\n", __FILE__, __LINE__), perror(source), kill(0, SIGKILL), exit(EXIT_FAILURE))

volatile sig_atomic_t sig_count = 0;

void sethandler(void (*f)(int), int sigNo)
{
    struct sigaction act;
    memset(&act, 0, sizeof(struct sigaction));
    act.sa_handler = f;
    if (-1 == sigaction(sigNo, &act, NULL))
        ERR("sigaction");
}

void sig_handler(int sig)
{
    sig_count++;
}

void child_work(int m)
{
    struct timespec t = {0, m * 1000};  // m microseconds expressed in nanoseconds
    sethandler(SIG_DFL, SIGUSR1);
    while (1)
    {
        nanosleep(&t, NULL);
        if (kill(getppid(), SIGUSR1))
            ERR("kill");
    }
}

void parent_work(int b, int s, char *name)
{
    int i, in, out;
    ssize_t count;
    char *buf = malloc(s);
    if (!buf)
        ERR("malloc");
    if ((out = open(name, O_WRONLY | O_CREAT | O_TRUNC | O_APPEND, 0777)) < 0)
        ERR("open");
    if ((in = open("/dev/urandom", O_RDONLY)) < 0)
        ERR("open");
    for (i = 0; i < b; i++)
    {
        if ((count = read(in, buf, s)) < 0)
            ERR("read");
        if ((count = write(out, buf, count)) < 0)
            ERR("write");
        if (fprintf(stderr, "Block of %zd bytes transferred. Signals RX:%d\n", count, sig_count) < 0)
            ERR("fprintf");
    }
    if (close(in))
        ERR("close");
    if (close(out))
        ERR("close");
    free(buf);
    if (kill(0, SIGUSR1))
        ERR("kill");
}

void usage(char *name)
{
    fprintf(stderr, "USAGE: %s m b s name\n", name);
    fprintf(stderr,
            "m - number of 1/1000 milliseconds between signals [1,999], "
            "i.e. one millisecond maximum\n");
    fprintf(stderr, "b - number of blocks [1,999]\n");
    fprintf(stderr, "s - size of blocks [1,999] in MB\n");
    fprintf(stderr, "name - name of the output file\n");
    exit(EXIT_FAILURE);
}

int main(int argc, char **argv)
{
    int m, b, s;
    char *name;
    if (argc != 5)
        usage(argv[0]);
    m = atoi(argv[1]);
    b = atoi(argv[2]);
    s = atoi(argv[3]);
    name = argv[4];
    if (m <= 0 || m > 999 || b <= 0 || b > 999 || s <= 0 || s > 999)
        usage(argv[0]);
    sethandler(sig_handler, SIGUSR1);
    pid_t pid;
    if ((pid = fork()) < 0)
        ERR("fork");
    if (0 == pid)
        child_work(m);
    else
    {
        parent_work(b, s * 1024 * 1024, name);
        while (wait(NULL) > 0)
            ;
    }
    return EXIT_SUCCESS;
}

Do remember that /dev/random is meant to supply high quality random bytes, but the amount it can deliver at once may be limited (reads can block), while /dev/urandom supplies an unlimited amount of data that is pseudo random.

You should see the following flaws if you run the program with the parameters 1 20 40 out.txt:

Copying of blocks shorter than 40 MB, in my case at most 33554431 bytes. This is due to signal interruption DURING the I/O operation. Note that Linux also limits a single read from /dev/urandom to about 32 MB (see man 4 random), so the value observed here may reflect that limit as well.

fprintf: Interrupted system call. The function was interrupted by signal handling BEFORE it did anything.

Similar messages for open and close. This may be hard to observe in this program, but it is possible and described by the POSIX documentation.

How to get rid of those flaws is explained in the 2nd stage.

If there is a memory allocation in your code, there also HAS to be a memory release! Always.

Permissions passed to open function can also be expressed with predefined constants (man 3p mknod). As octal permission representation is well recognized by programmers and administrators it can also be noted in this way and will not be considered as “magic number” style mistake. It is fairly easy to trace those constants in the code.
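
For comparison, the same open call written with the predefined constants might look like the sketch below. FULL_PERMS and open_output are names invented for this example; the S_I* constants themselves come from sys/stat.h.

```c
#include <fcntl.h>
#include <sys/stat.h>

/* rwx for user, group and others: the symbolic spelling of 0777. */
#define FULL_PERMS (S_IRWXU | S_IRWXG | S_IRWXO)

/* Opens (creating or truncating) a file for appending writes; returns
 * the descriptor or -1 with errno set, exactly like open(). */
int open_output(const char *name)
{
    return open(name, O_WRONLY | O_CREAT | O_TRUNC | O_APPEND, FULL_PERMS);
}
```

Both spellings produce the same value, so the choice is a matter of readability.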

Obviously the parent counts fewer signals than the child sends. As the summing runs inside the handler, we can only blame merging for it. Can you tell why signal merging is so strong in this code?

Answer

On this platform (GNU/Linux) the delivery of signals is held back during I/O operations (up to some transfer size, as we can see), so during I/O signals have more time to merge.

Why is SIGUSR1 sent to the process group at the end of the parent process?

Answer

To terminate the child.

How come it works? Isn't SIGUSR1 handling inherited from the parent?

Answer

The child's first action is to restore the default signal disposition, which kills the receiver.

Why does the parent not kill itself with this signal?

Answer

It sets the handler for SIGUSR1 before it sends the signal to the group.

Can this strategy fail?

Answer

Yes, if the parent process finishes its job before the child is even able to start its code and reset the SIGUSR1 disposition.

Can you improve it and at the same time not kill the parent with the signal from a child?

Answer

Send SIGUSR2 to the child.

Is this child (children) termination strategy easy and correct at the same time in all possible programs?

Answer

Only if the child process does not have resources to release. If it has something to release, you must add proper signal handling, and this may be complicated.

Why check whether a pointer to newly allocated memory is not null?

Answer

The operating system may not be able to grant your program additional memory; in this case it reports the error with NULL. You must be prepared for it. The lack of this check is a common students' mistake.

Can you turn the allocated buffer into an automatic variable and avoid the effort of allocating and releasing the memory?

Answer

I do not know of an OS architecture that uses stacks large enough to accommodate 40 MB; a typical stack has a few MB at most. For smaller buffers (a few KB) it can work.

Why are the permissions of a newly created file supposed to be full (0777)? Are they really full?

Answer

umask will reduce the permissions. If no specific permissions are required, it is a good idea to let the umask regulate the effective rights.
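
The effect of umask can be observed directly. The sketch below (effective_mode is a hypothetical helper, not part of the tutorial code) creates a file with the requested permissions and reports the bits the file actually received, which should equal the requested bits with the umask bits cleared.

```c
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Creates `name` requesting `requested` permissions and returns the
 * permission bits the file actually received, or -1 on error.
 * The file must not exist yet (O_EXCL); it is removed again afterwards. */
int effective_mode(const char *name, mode_t requested)
{
    struct stat st;
    int fd = open(name, O_WRONLY | O_CREAT | O_EXCL, requested);
    if (fd < 0)
        return -1;
    int result = fstat(fd, &st) == 0 ? (int)(st.st_mode & 0777) : -1;
    close(fd);
    unlink(name);
    return result;
}
```

With the common umask of 022 a request for 0777 typically ends up as 0755.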

solution 2nd stage, parts of prog16b.c:

#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

#define ERR(source) \
    (fprintf(stderr, "%s:%d\n", __FILE__, __LINE__), perror(source), kill(0, SIGKILL), exit(EXIT_FAILURE))

volatile sig_atomic_t sig_count = 0;

void sethandler(void (*f)(int), int sigNo)
{
    struct sigaction act;
    memset(&act, 0, sizeof(struct sigaction));
    act.sa_handler = f;
    if (-1 == sigaction(sigNo, &act, NULL))
        ERR("sigaction");
}

void sig_handler(int sig) { sig_count++; }

void child_work(int m)
{
    struct timespec t = {0, m * 1000};  // m microseconds expressed in nanoseconds
    sethandler(SIG_DFL, SIGUSR1);
    while (1)
    {
        nanosleep(&t, NULL);
        if (kill(getppid(), SIGUSR1))
            ERR("kill");
    }
}

ssize_t bulk_read(int fd, char *buf, size_t count)
{
    ssize_t c;
    ssize_t len = 0;
    do
    {
        c = TEMP_FAILURE_RETRY(read(fd, buf, count));
        if (c < 0)
            return c;
        if (c == 0)
            return len;  // EOF
        buf += c;
        len += c;
        count -= c;
    } while (count > 0);
    return len;
}

ssize_t bulk_write(int fd, char *buf, size_t count)
{
    ssize_t c;
    ssize_t len = 0;
    do
    {
        c = TEMP_FAILURE_RETRY(write(fd, buf, count));
        if (c < 0)
            return c;
        buf += c;
        len += c;
        count -= c;
    } while (count > 0);
    return len;
}

void parent_work(int b, int s, char *name)
{
    int i, in, out;
    ssize_t count;
    char *buf = malloc(s);
    if (!buf)
        ERR("malloc");
    if ((out = TEMP_FAILURE_RETRY(open(name, O_WRONLY | O_CREAT | O_TRUNC | O_APPEND, 0777))) < 0)
        ERR("open");
    if ((in = TEMP_FAILURE_RETRY(open("/dev/urandom", O_RDONLY))) < 0)
        ERR("open");
    for (i = 0; i < b; i++)
    {
        if ((count = bulk_read(in, buf, s)) < 0)
            ERR("read");
        if ((count = bulk_write(out, buf, count)) < 0)
            ERR("write");
        if (TEMP_FAILURE_RETRY(fprintf(stderr, "Block of %zd bytes transferred. Signals RX:%d\n", count, sig_count)) < 0)
            ERR("fprintf");
    }
    if (TEMP_FAILURE_RETRY(close(in)))
        ERR("close");
    if (TEMP_FAILURE_RETRY(close(out)))
        ERR("close");
    free(buf);
    if (kill(0, SIGUSR1))
        ERR("kill");
}

void usage(char *name)
{
    fprintf(stderr, "USAGE: %s m b s name\n", name);
    fprintf(stderr,
            "m - number of 1/1000 milliseconds between signals [1,999], "
            "i.e. one millisecond maximum\n");
    fprintf(stderr, "b - number of blocks [1,999]\n");
    fprintf(stderr, "s - size of blocks [1,999] in MB\n");
    fprintf(stderr, "name - name of the output file\n");
    exit(EXIT_FAILURE);
}

int main(int argc, char **argv)
{
    int m, b, s;
    char *name;
    if (argc != 5)
        usage(argv[0]);
    m = atoi(argv[1]);
    b = atoi(argv[2]);
    s = atoi(argv[3]);
    name = argv[4];
    if (m <= 0 || m > 999 || b <= 0 || b > 999 || s <= 0 || s > 999)
        usage(argv[0]);
    sethandler(sig_handler, SIGUSR1);
    pid_t pid;
    if ((pid = fork()) < 0)
        ERR("fork");
    if (0 == pid)
        child_work(m);
    else
    {
        parent_work(b, s * 1024 * 1024, name);
        while (TEMP_FAILURE_RETRY(wait(NULL)) > 0)  // a late SIGUSR1 must not end the loop early
            ;
    }
    return EXIT_SUCCESS;
}

Run it with the same parameters as before - flaws are gone now.

What does the error code EINTR represent?

Answer

This is not an error, it is a way for the OS to inform the program that a signal handler has been invoked.

How should you react to EINTR?

Answer

Unlike with real errors, do not exit the program. In most cases, to recover, simply restart the interrupted function with the same set of parameters as in the initial call.
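The restart described above can be written by hand as a simple loop. The sketch below does for read what TEMP_FAILURE_RETRY does in prog16b.c; the name read_retry is hypothetical and used only for this illustration:

```c
#include <errno.h>
#include <unistd.h>

/* Hypothetical helper: call read again with the same arguments for as long
 * as it fails with EINTR. Any other result (data, EOF or a real error) is
 * passed back to the caller unchanged. */
ssize_t read_retry(int fd, void *buf, size_t count)
{
    ssize_t c;
    do
    {
        c = read(fd, buf, count);
    } while (c < 0 && errno == EINTR);
    return c;
}
```

Note that this loop only covers the EINTR case. A read that was interrupted after some data had already been transferred returns a short count, which is what bulk_read handles.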

At what stage are functions interrupted if EINTR is reported?

Answer

Only before they start doing their job, in the waiting stage. This means that you can safely restart, with the same arguments, all the functions used in OPS tutorials except "connect" (OPS2 sockets).

What other types of interruption can a signal handler cause?

Answer

An IO transfer can be interrupted in the middle; this case is not reported with EINTR, the function simply returns fewer bytes than requested. sleep and nanosleep behave similarly: they can return before the full time has elapsed. In both cases restarting cannot reuse the same parameters, so it gets complicated.
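For sleeping, one possible way to restart with corrected parameters is to let nanosleep store the time still left and use that value in the next call. This is only a sketch; sleep_fully is a hypothetical helper name:

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <time.h>

/* Hypothetical helper: sleep for the whole requested time even if signal
 * handlers interrupt nanosleep. Returns 0 on success, -1 on a real error. */
int sleep_fully(struct timespec t)
{
    /* When interrupted, nanosleep stores the remaining time in its second
     * argument, so passing &t twice makes every restart continue with what
     * is left instead of starting from the beginning. */
    while (nanosleep(&t, &t) < 0)
    {
        if (errno != EINTR)
            return -1;
    }
    return 0;
}
```

In child_work this is deliberately not done, because there a shortened sleep is harmless.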

How do you know which functions can report EINTR?

Answer

Read the man pages, error sections. It is easy to guess: those functions must wait before they do their job.

Analyze how bulk_read and bulk_write work. You should know what cases are recognized in those functions, what types of interruption they can handle, and how to recognize EOF on the descriptor. Unlike during the L1 lab, during L2 and the following labs you have to use these functions (or similar ones) when calling read or write (because we use signals now). If you do not use them, you won't get points for your solution.

Both bulk_ functions can be useful not only with signals but also to “glue” IO transfers where data comes from non-continuous data sources like a pipe/fifo or a socket. This will be covered by the following tutorials.

Not only read/write can be interrupted in the described way; the problem also applies to related pairs like fread/fwrite and send/recv.

As you know, the SA_RESTART flag causes automatic restarts on delivery of a signal if it is set for the handler. It may not be apparent, but this method has a lot of shortcomings:

You must control all the signal handlers used in the code; they all must be set with this flag. If even one does not use it, then you must handle EINTR as usual. It is easy to forget about this requirement if you extend/reuse older code.

If you try to write library functions (like bulk_read and bulk_write) you cannot assume anything about the signals in the caller's code.

It is hard to reuse code that depends on the SA_RESTART flag; it can only be transferred to code with similarly strict handler control.

Sometimes you wish to know about interruption ASAP to react quickly. Sigsuspend would not work if you use this flag!
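For reference, a handler is registered with this flag as in the sketch below. It is a hypothetical variant of sethandler from the listing (the name sethandler_restart is an assumption) that returns the result instead of calling ERR:

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>

/* Hypothetical variant of sethandler: install f for sigNo and ask the OS to
 * restart interrupted functions automatically.
 * Returns 0 on success, -1 on error (as sigaction does). */
int sethandler_restart(void (*f)(int), int sigNo)
{
    struct sigaction act;
    memset(&act, 0, sizeof(struct sigaction));
    act.sa_handler = f;
    act.sa_flags = SA_RESTART;
    return sigaction(sigNo, &act, NULL);
}
```

The only difference from sethandler is the sa_flags line, which is exactly why it is easy to miss a handler that was installed without it.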

Why do we not react to other (apart from EINTR) errors of fprintf? If the program cannot write to stderr (most likely the screen) then it cannot report errors.

Really big (f)printfs can get interrupted in the middle of the process (like write). Then it is difficult to restart the process especially if formatting is complicated. Avoid using printf where restarting would be critical (most cases except for the screen output) and the volume of transferred data is significant, use write instead.
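One way to follow this advice is to format the text in memory first and then send it with a restartable write loop. The sketch below assumes two hypothetical helpers, write_all and report_block; the message text mirrors the progress line printed by parent_work:

```c
#include <errno.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: write the whole buffer, restarting after EINTR and
 * continuing after partial writes. Returns 0 on success, -1 on error. */
int write_all(int fd, const char *buf, size_t count)
{
    while (count > 0)
    {
        ssize_t c = write(fd, buf, count);
        if (c < 0)
        {
            if (errno == EINTR)
                continue;
            return -1;
        }
        buf += c;
        count -= (size_t)c;
    }
    return 0;
}

/* Hypothetical helper: build the progress line in a local buffer, so that
 * the formatting is done exactly once, then send it with write_all.
 * Returns 0 on success, -1 if formatting or writing fails. */
int report_block(int fd, long bytes, int signals)
{
    char line[128];
    int len = snprintf(line, sizeof(line), "Block of %ld bytes transferred. Signals RX:%d\n", bytes, signals);
    if (len < 0 || (size_t)len >= sizeof(line))
        return -1;
    return write_all(fd, line, (size_t)len);
}
```

Because the formatted text already sits in a buffer, an interruption in the middle of the transfer only moves the pointer forward; nothing has to be formatted again.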

Do the example tasks. During the laboratory you will have more time and a starting code. If you do the following tasks in the allotted time, it means that you are well-prepared.

Task 1 ~75 minutes
Task 2 ~120 minutes
Task 3 ~120 minutes
