Linux system-process concept

0. Preface

This chapter mainly explains some basic conceptual knowledge of operating systems to pave the way for learning processes.

1. Von Neumann architecture

  • concept:

The von Neumann architecture specifies how data flows between hardware components. Most computers, such as laptops and servers, follow it.

  • Diagram:
  • Basic computer hardware components:
  1. Input unit: including keyboard, mouse, scanner, writing tablet, etc.

  2. Central processing unit (CPU): includes arithmetic units and controllers, etc.

  3. Output unit: monitor, printer, etc.

  4. Memory: physical memory

Note: Input units and output units are collectively referred to as peripherals

  • working principle:

When a program runs, data from the input device is first loaded into memory, and the CPU reads it from memory for processing. The CPU then writes the results back to memory, and finally memory flushes the results to the output device.

  • Notice:
  1. Ignoring caches, the CPU reads and writes only memory; it does not access peripherals (input or output devices) directly.

  2. When peripherals (input or output devices) input or output data, they likewise only write to or read from memory.

  3. In short, at the data level every device deals directly only with memory.

  • Example: The data flow process of logging in to QQ and starting to chat with a friend
  1. Your computer: the keyboard (input device) writes the data to memory, the CPU reads it from memory and processes it, the CPU writes the result back to memory, and finally memory flushes the data to the network card.

  2. Friend’s computer: the network card (acting as the input device) writes the data to memory, the CPU reads it from memory and processes it, the CPU writes the result back to memory, and finally memory flushes the data to the monitor.

  • Why running programs must be loaded into memory first:

This is not only stipulated by the von Neumann architecture, but also caused by the memory hierarchy.

  • Diagram: Memory Hierarchy
  1. Storage inside the CPU (registers, caches) is fast but expensive, while memory is cheaper and slower. To balance speed and cost, the storage inside the CPU is kept small, so the CPU cannot hold a program and its data by itself.

  2. Peripherals are slower still. If the CPU communicated with them directly, the whole program would stall on their I/O, so memory acts as the intermediary that all devices exchange data with.

2. Operating system

  • concept:

Any computer system contains a basic collection of programs called an operating system (OS)

  • Operating systems include:
  1. Kernel (process management, memory management, file management, driver management)

  2. Other programs (such as function libraries, shell programs, etc.)

  • The purpose of designing OS:
  1. Interact with hardware and manage all software and hardware resources

  2. Provide a good execution environment for user programs (applications)

Note: The OS needs to protect the system software and hardware, so it does not trust any user. You need to access the system software and hardware through the OS.

  • position:

The operating system is the software that manages software and hardware resources.

  • How to understand “management”:
  1. As a manager, the operating system mainly makes decisions

  2. The drivers under the operating system act as the executors and carry out the operating system's decisions.

  3. The underlying hardware is what ultimately gets managed

  • Diagram:
  • How to manage:
  1. Describe: first describe the managed object by collecting its attributes (the attribute data is stored in a struct)

  2. Organize: link the structs together with a linked list or another efficient data structure, so that managing the objects becomes a matter of operating on that data structure.

  • System call and library function concepts:
  1. From a development perspective, the operating system will appear as a whole to the outside world, but it will expose some of its interfaces for upper-layer development. This part of the interface provided by the operating system is called a system call.

  2. System calls offer fairly basic functionality and demand more of the caller, so developers wrap some of them into libraries, which makes development easier for higher-level users and developers.

3. Process

  • concept:
  1. A process is a running instance of a program. From the kernel's point of view, a process is the entity to which system resources (CPU time, memory) are allocated

  2. That is, process = program (code and data) + kernel PCB

1. Describe the process-PCB

  • concept:
  1. Process information is placed in a data structure called the process control block, which can be understood as a collection of process attributes; the structure describing the process in Linux is called task_struct

  2. In Linux the PCB is task_struct, a kernel data structure that resides in RAM (memory) and holds the process's information.

  • task_struct content classification:
Identifier: a unique identifier for this process, used to distinguish it from other processes
Status: task status, exit code, exit signal, etc.
Priority: Priority relative to other processes
Program Counter: The address of the next instruction to be executed in the program
Memory pointers: including pointers to program code and process-related data, as well as pointers to memory blocks shared with other processes
Context data: the data in the processor's registers while the process is executing
Note: when several programs share the CPU, each runs for a time slice and is switched out when the slice expires. Saving the context data at the switch and restoring it later lets the process resume exactly where it stopped.
I/O status information: includes outstanding I/O requests, I/O devices assigned to the process and a list of files used by the process
Accounting information: may include the amount of processor time and clock time used, time limits, account numbers, etc.
other information

Note: The processes running in the system are stored in the kernel in the form of task_struct linked lists.

2. Check the process

  1. Process information can be viewed through the /proc system folder
  • Example:
  2. Most process information can also be obtained using user-level tools such as top and ps.
  • Example:

3. Get the process identifier

  • System call function:
  1. Use the getpid() system call function to obtain the current process id (PID)

  2. Use the getppid() system call function to obtain the parent process ID (PPID) of the current process

Note: getpid() and getppid() functions need to include the header file unistd.h

  • Example:
#include <stdio.h>
#include <unistd.h>

int main()
{
    printf("pid: %d\n", getpid());
    printf("ppid: %d\n", getppid());
    return 0;
}
  • result:

4. Create a process-fork()

  • fork() function:

fork() creates a child process of the current process. The parent and child share the code, and each gets a private copy of the data through copy-on-write (the copy is made only when one of them writes).

  • fork() return value:
  1. In the parent process, fork() returns the pid of the child if creation succeeds; otherwise it returns -1.

  2. In the child process, fork() returns 0 (creation succeeded)

  • Notice:
  1. The child receives 0 because a child has exactly one parent and can always find it directly (with getppid()).

  2. The meaning of returning the child process pid to the parent process is that you can directly get the child process ID in the parent process (there may be multiple child processes), and operate and manage a certain child process.

Note: fork() is declared in unistd.h, and its return type pid_t is defined in sys/types.h, so both header files need to be included.

  • Example:
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>
int main()
{
    pid_t ret = fork();
    printf("hello proc : %d!, ret: %d\n", getpid(), ret);
    sleep(1);
    return 0;
}
  • result:
  • Parent-child process split execution:

The purpose of creating a child process is to perform different tasks with the parent process. Because the code of the parent and child processes is shared, we use a branch structure to split the execution program.

  • Example:
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>
int main()
{
    pid_t ret = fork(); //fork() returns pid_t, not int
    if(ret < 0){
        perror("fork");
        return 1;
    }
    else if(ret == 0){
        //child
        printf("I am child : %d!, ret: %d\n", getpid(), ret);
    }else{
        //father
        printf("I am father : %d!, ret: %d\n", getpid(), ret);
    }
    sleep(1);
    return 0;
}
  • result:
  • Why ret has two return values:

By the time fork() is about to return, its main work is already done, that is, the child process has been created. From that point on there are two processes that share the code and whose data is copy-on-write, and both of them execute the return. Storing the return value into ret is a write, so it triggers a real copy: the parent's ret holds the pid of the child, and the child's ret holds 0. ret does not have two values; each process has its own ret.

5. Process status

At any moment a process is in one of several states.

Note: In the Linux kernel, processes are sometimes called tasks

  • The state is defined in the kernel source code:
/*
* The task state array is a strange "bitmap" of
* reasons to sleep. Thus "running" is zero, and
* you can test for combinations of others with
* simple bit tests.
*/
static const char * const task_state_array[] = {
    "R (running)", /* 0 */
    "S (sleeping)", /* 1 */
    "D (disk sleep)", /* 2 */
    "T (stopped)", /* 4 */
    "t (tracing stop)", /* 8 */
    "X (dead)", /* 16 */
    "Z (zombie)", /* 32 */
};
  • explain:
  1. R, running: the process is not necessarily executing right now; it is either on a CPU or in the run queue, ready to be scheduled

  2. S, sleeping: the process is waiting for an event to complete (this sleep is also called interruptible sleep, or light sleep)

  3. D, disk sleep: also called uninterruptible sleep. A process in this state is usually waiting for I/O to finish and cannot be killed until the wait ends (zombie processes cannot be killed either)

  4. T, stopped: a process can be stopped by sending it the SIGSTOP signal, and a stopped process resumes when it is sent the SIGCONT signal.

  5. X, dead: a transient state reported as the process is torn down; it ends so quickly that you will not see it in the task list

  6. t, tracing stop: the process is stopped while being traced, for example at a breakpoint during debugging (some kernel versions do not have this state)

  • Diagram:
  • View the basic format and options of the status command:
ps aux / ps axj: View all processes in the system
ps -la: View the processes attached to a terminal, in long format
a: Show processes of all users that have a terminal (written with a dash, -a, session leaders are excluded)
u: User-oriented format, showing the user that owns each process and its CPU and memory usage
x: Also show processes that have no controlling terminal
-l: Long format, displays more detailed information
-e: Show all processes
  • Effect:

6. Zombie process

  • concept:
  1. Zombie is a rather special state. A process becomes a zombie when it exits and its parent has not yet read its exit code (with the wait() system call).

  2. A zombie process remains in the process table in a terminated state, waiting for the parent process to read its exit status code.

  3. That is, whenever a child exits while its parent is still running and the parent has not read the child's status, the child enters the Z state.

  • Example:
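#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h> //pid_t
#include <unistd.h>    //fork, getpid, sleep
int main()
{
    pid_t id = fork();
    if(id < 0){
        perror("fork");
        return 1;
    }
    else if(id > 0){ //parent: sleeps without calling wait()
        printf("parent[%d] is sleeping...\n", getpid());
        sleep(30);
    }else{ //child: exits first and stays in Z until the parent ends
        printf("child[%d] is begin Z...\n", getpid());
        sleep(5);
        exit(EXIT_SUCCESS);
    }
    return 0;
}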
  • result:
  • Why are there zombie processes?

A process is created to carry out a task, and its exit status records how that task went. The kernel therefore keeps this information until the parent collects it and can decide what to do next.

  • Dangers of zombie processes:

If the parent process never reads the exit status, the child stays in the Z state indefinitely. The exit status is data that has to be kept somewhere, and since it is basic process information it is stored in task_struct (the PCB). As long as the zombie is not reaped, its PCB cannot be freed, which wastes memory.

7. Orphan process

  • concept:

If the parent process exits first, the child is called an “orphan process”. An orphan process is adopted by process 1 (the init process) and is eventually reaped by it.

  • Example:
#include <stdio.h>
#include <unistd.h>
#include <stdlib.h>
int main()
{
    pid_t id = fork();
    if(id < 0){
    	perror("fork");
    	return 1;
    }
    else if(id == 0){//child
    	printf("I am child, pid : %d\n", getpid());
    	sleep(10);
    }else{//parent
    	printf("I am parent, pid: %d\n", getpid());
    	sleep(3);
    	exit(0);
    }
    return 0;
}
  • result:

8. Process priority

  • concept:
  1. Priority is the order in which processes are allocated CPU resources. Processes with higher priority get to run first.

  2. Configuring process priorities is very useful for Linux in a multi-tasking environment and can improve system performance.

  3. You can also bind a process to a designated CPU, which for some workloads can improve the overall performance of the system.

  • The difference between permissions and priorities:
  1. Permission is a question of whether something can be executed at all
  2. Priority assumes it can be executed; the question is whether it runs sooner or later
  3. Priority exists because resources are limited and processes have to compete for them.
  • View priority:

Use the ps -l command

  • Example:
  • explain:
  1. UID: represents the identity of the executor

  2. PID: the ID of this process

  3. PPID: the ID of the parent process, that is, the process from which this process was derived

  4. PRI: represents the priority at which this process can be executed. The smaller the value, the earlier it will be executed.

  5. NI: represents the nice value of this process

  • PRI and NI:
  1. PRI is the priority of the process, or in layman’s terms, the order in which programs are executed by the CPU. The smaller the value, the higher the priority of the process.

  2. NI is the nice value, a correction applied to the priority at which the process runs.

  3. So the final priority: PRI(new)=PRI(old)+nice

    Note: The PRI (old) here can be understood as always being the baseline value of 80

  4. When the nice value is negative, the PRI value becomes smaller, that is, the priority becomes higher and the process is scheduled sooner.

  5. Therefore, adjusting the process priority is to adjust the process nice value under Linux.

  6. The value range of nice is -20 to 19, a total of 40 levels

  • PRI vs NI:
  1. The nice value of a process is not the priority of the process, but the nice value of the process will affect the priority change of the process.

  2. It can be understood that the nice value is the correction data of the process priority.

  • Modify the nice value:

Run top, press “r”, enter the PID of the process, then enter the new nice value.

  • Example:
  • Why does PRI have a base value and NI has a range:
  1. Avoid excessively high or low priorities, ensure controllability and relatively fair competition, and improve efficiency.

  2. Easy to calculate, no need to read priority information, and simple to implement

  • Possibility of process exiting the CPU:
  1. A higher priority process seizes the CPU

  2. The process time slice has arrived (multi-process running)

Note: When a process gives up or occupies CPU resources, it is necessary to save or restore the context data of the process.

  • Other concepts:
  1. Competition: there are many processes in the system but only a few CPUs, possibly just one, so processes compete for them. Priorities exist so that tasks complete efficiently and this competition is resolved sensibly.

  2. Independence: each process needs its own resources, and running processes do not interfere with one another.

  3. Parallelism: Multiple processes run on multiple CPUs at the same time. This is called parallelism.

  4. Concurrency: Multiple processes use process switching under one CPU to advance multiple processes within a period of time, which is called concurrency.

9. Environment variables

  • concept:
  1. Environment variables generally refer to some parameters used in the operating system to specify the operating environment of the operating system.

  2. Environment variables usually have some special purposes, and usually have global characteristics in the system (can be inherited by child processes)

  • Example:

When we write C/C++ code, we usually do not know where the dynamic and static libraries we link against are located, yet linking still succeeds and produces an executable program. The reason is that relevant environment variables help the compiler find them.

  • Common environment variables:
  1. PATH: Specify the search path for the command

  2. HOME: Specify the user’s home working directory (that is, the default directory when the user logs in to the Linux system)

  3. SHELL: The current Shell, its value is usually /bin/bash

  • View environment variables:

echo $NAME //NAME: your environment variable name

  • Example:

1) Test PATH

Note: Take PATH as an example to show the role of environment variables

  • Example: simply write a program
#include <stdio.h>
int main()
{
    printf("hello world!\n");
    return 0;
}
  • result:
  • Introduction:

Why can some instructions be executed directly without a path, but our binary program needs a path to execute?

  • reason:
  1. Before executing the program, the system will search for the corresponding program in a specific path

  2. PATH assists that search: the PATH variable stores the list of directories where commands or programs may be found.

  • Diagram:

Note: In fact, programs, commands, instructions, executable programs, etc. are all the same concept

  • How to execute program like instruction (execute without path):
  1. Add the path where our program is located to the environment variable PATH

Use the command export PATH=$PATH:path (the path where the program is located)

  • Example:

Note: This change only lasts for the current login session and is lost after you log out. To make it permanent, add it to a shell startup file that sets environment variables (for example ~/.bashrc).

  2. Copy the program to a path in the PATH variable
  • Example:

2) Test HOME

  • Comparison effect: root and ordinary users execute echo $HOME
  • explain:

A user's default directory is determined by the environment variable HOME, which specifies the user’s home working directory.

  • Commands related to environment variables:
  1. echo: display the value of an environment variable

  2. export: Set a new environment variable

  3. env: displays all environment variables

  4. unset: clear environment variables

  5. set: Display locally defined shell variables and environment variables

  • How environment variables are organized:
  • explain:

Each program will receive an environment table. The environment table is an array of character pointers. Each pointer points to an environment string terminated by ‘\0’.

3) How to obtain environment variables

  1. The third parameter of main()
  • Example:
#include <stdio.h>
int main(int argc,char *argv[],char* env[])//Number of command line parameters, command line parameters, environment variables 
{
   int i = 0;
   for(; env[i]; i++) //Traverse the env pointer array to print the value of the environment variable
   {
     printf("env[%d]:%s\n", i, env[i]);                           
   }
   return 0;
}
  • result:
  2. Through the global variable environ
  • Example:
#include <stdio.h>
int main(int argc, char *argv[])
{
    extern char **environ;
    int i = 0;
    for(; environ[i]; i++)
    {
    	printf("%s\n", environ[i]);
    }
    return 0;
}
  • Notice:

The global variable environ defined in libc points to the environment variable table. By default it is not declared by the standard header files, so it must be declared with extern before use.

  3. Get or set environment variables through library functions
  • Example:
#include <stdio.h>
#include <stdlib.h>
int main()
{
    printf("PATH:%s\n", getenv("PATH"));
    return 0;
}
  • result:

Note: getenv and putenv are the functions commonly used to read and set a specific environment variable.

4) Command line variables

  • In the command line, we can define two types of variables:
  1. Local variables: can only be accessed within the current shell command line interpreter and cannot be inherited by child processes

Note: For instructions run on the command line, its parent process is bash.

  2. Environment variables: can be inherited by child processes
  • Example:
