I once became curious about what the stack of a Linux process contains by the time its main function runs. I did some research, and here are the results.

Possible declarations of the main function:
1. int main()
2. int main(int argc, char **argv)
3. int main(int argc, char **argv, char **env)
4. int main(int argc, char **argv, char **env, ElfW(auxv_t) auxv[])
5. int main(int argc, char **argv, char **env, char **apple)

argc - the number of command-line parameters
argv - a null-terminated array of pointers to the command-line parameter strings
env - a null-terminated array of pointers to environment variable strings, each in the format NAME=VALUE
auxv - an array of auxiliary values (only available on PowerPC)
apple - the path to the executable (on macOS and Darwin)
The auxiliary vector is an array of additional information such as the effective user ID, the setuid flag, the memory page size, and the like.

The size of the stack segment can be seen in the process's maps file:
cat /proc/10918/maps

7ffffffa3000-7ffffffff000 rw-p 00000000 00:00 0

Before the loader transfers control to main, it initializes the arrays of command-line parameters and environment variables, as well as the auxiliary vector.
After initialization, the top of the stack looks roughly like this in the 64-bit version.
Higher addresses are at the top.

No.  Address         Name           Type          Size    Value / comment
1.   0x7ffffffff000                                       Top of the stack segment; accessing it causes a segfault
     0x7ffffffff0f8  NULL           void*         8       0x00 (*)
2.                   filename       char          1+      "/tmp/a.out"
                                    char          1       0x00
     ...
                     env            char          1       0x00
     ...
                                    char          1       0x00
3.   0x7fffffffe5e0  env            char          1       ..
                                    char          1       0x00
     ...
                     argv           char          1       0x00
     ...
                                    char          1       0x00
4.   0x7fffffffe5be  argv           char          1+      "/tmp/a.out"
5.   Array of random length
6.                   data for auxv  void*         48 (*)
                     AT_NULL        Elf64_auxv_t  16      {0,0}
     ...
                     auxv           Elf64_auxv_t  16
7.                   auxv           Elf64_auxv_t  16      e.g. (0x0e,0x3e8)
                     NULL           void*         8       0x00
     ...
                     env            char*         8
8.   0x7fffffffe308  env            char*         8       0x7fffffffe5e0
                     NULL           void*         8       0x00
     ...
                     argv           char*         8
9.   0x7fffffffe2f8  argv           char*         8       0x7fffffffe5be
10.  0x7fffffffe2f0  argc           long int      8 (*)   number of arguments + 1
11.  Local variables and arguments of functions called before main
12.  Local variables of main
13.  0x7fffffffe1fc  argc           int           4       number of arguments + 1
     0x7fffffffe1f0  argv           char**        8       0x7fffffffe2f8
     0x7fffffffe1e8  env            char**        8       0x7fffffffe308
14.  Local variables of called functions

(*) I did not find descriptions of these fields in the documentation, but they are clearly visible in the dump.

I did not check the 32-bit case, but most likely it is enough to halve the sizes.

1. Accessing addresses above the top of the stack causes a segfault.
2. A string containing the path to the executable file.
3. The environment variable strings.
4. The command-line parameter strings.
5. An array of random length. Its allocation can be turned off with either of the commands
sysctl -w kernel.randomize_va_space=0
echo 0 > /proc/sys/kernel/randomize_va_space
6. Data for the auxiliary vector (for example, the string "x86_64").
7. The auxiliary vector. More details below.
8. A null-terminated array of pointers to the environment variable strings.
9. A null-terminated array of pointers to the command-line parameter strings.
10. A machine word containing the number of command-line parameters (one of the arguments of the "higher" functions, see item 11).
11. Local variables and arguments of functions called before main (_start, __libc_start_main, ...).
12. Variables declared in main.
13. The arguments of main.
14. Variables and arguments of the functions that main calls.

Auxiliary vector
On i386 and x86_64, main does not receive the address of the first element of the auxiliary vector, but the contents of the vector can be obtained in other ways. One of them is to read the memory immediately after the array of pointers to the environment variable strings.
It should look something like this:
#include <stdio.h>
#include <elf.h>

int main(int argc, char **argv, char **env)
{
    Elf64_auxv_t *auxv; // x86_64
    // Elf32_auxv_t *auxv; // i386

    while (*env++ != NULL); // find the beginning of the auxiliary vector

    for (auxv = (Elf64_auxv_t *)env; auxv->a_type != AT_NULL; auxv++) {
        printf("addr: %p type: %lx is: 0x%lx\n", (void *)auxv, auxv->a_type, auxv->a_un.a_val);
    }

    // pointer differences are taken on char* so that they are measured in bytes
    printf("\n (void*)(*argv) - (void*)auxv= %p - %p = %ld\n (void*)(argv)-(void*)(&auxv) =%p-%p = %ld\n ",
           (void *)(*argv), (void *)auxv, (long)((char *)(*argv) - (char *)auxv),
           (void *)(argv), (void *)(&auxv), (long)((char *)(argv) - (char *)(&auxv)));
    printf("\n argc copy: %d\n", *((int *)(argv - 1)));
    return 0;
}
The Elf{32,64}_auxv_t structures are described in /usr/include/elf.h. The functions that fill them in are in fs/binfmt_elf.c in the Linux kernel sources.

The second way to get the contents of the vector:
hexdump /proc/self/auxv

The most readable representation is obtained by setting the LD_SHOW_AUXV environment variable.

LD_SHOW_AUXV=1 ls
AT_HWCAP: bfebfbff // processor capabilities
AT_PAGESZ: 4096 // memory page size
AT_CLKTCK: 100 // clock tick frequency used by times()
AT_PHDR: 0x400040 // program header information
AT_PHENT: 56
AT_PHNUM: 9
AT_BASE: 0x7fd00b5bc000 // base address of the interpreter, i.e. ld.so
AT_FLAGS: 0x0
AT_ENTRY: 0x402490 // program entry point
AT_UID: 1000 // user and group IDs,
AT_EUID: 1000 // real and effective
AT_GID: 1000
AT_EGID: 1000
AT_SECURE: 0 // whether the program was started in setuid mode
AT_RANDOM: 0x7fff30bdc809 // address of 16 random bytes generated at startup
AT_SYSINFO_EHDR: 0x7fff30bff000 // pointer to the page used for system calls
AT_EXECFN: /bin/ls
AT_PLATFORM: x86_64
The name of each entry is on the left and its value is on the right. All possible entry names and their descriptions can be found in elf.h (the constants prefixed with AT_).

Return from main()
After the process context is initialized, control is transferred not to main() but to the _start() function.
main() is then called from __libc_start_main. This function has an interesting feature: it is passed a pointer to a function that is to be executed after main(), and this pointer is, naturally, passed through the stack.
In general, according to the file glibc-2.11/sysdeps/ia64/elf/start.S, the arguments of __libc_start_main look like this:
/*
 * Arguments for __libc_start_main:
 * out0: main
 * out1: argc
 * out2: argv
 * out3: init
 * out4: fini // function called after main
 * out5: rtld_fini
 * out6: stack_end
 */
That is, to get the address of the fini pointer, you need to move two machine words up from the last local variable of main.
Here is what I came up with (whether it works depends on the compiler version):
#include <stdio.h>

void **ret;
void *leave;

void foo()
{
    void (*boo)(void); // function pointer
    printf("Stack rewrite!\n");
    boo = (void (*)(void))leave;
    boo(); // fini()
}

int main(int argc, char *argv[], char *envp[])
{
    unsigned long int mark = 0xbfbfbfbfbfbfbfbf; // marker to work from
    ret = (void **)(&mark + 2); // locate the address of the function called after completion (fini)
    leave = *ret; // save it
    *ret = (void *)foo; // overwrite it
    return 0; // calls foo()
}

I hope it was interesting.
Good luck.

Thanks to user Xeor for the helpful tip.

When creating a console application in the C++ programming language, a line very similar to this is automatically created:

int main(int argc, char* argv[]) // main() function parameters

This line is the header of the main function main(); the parameters argc and argv are declared in the parentheses. If the program is run from the command line, some information can be passed to it, and the parameters argc and argv exist for this purpose. The argc parameter has type int and contains the number of parameters passed to main. Moreover, argc is always at least 1, even when we pass no information, since the name of the program is counted as the first parameter. The argv parameter is an array of pointers to strings. Only string data can be passed through the command line. Pointers and strings are two big topics covered in separate sections. So all information is passed through the argv parameter. Let's write a program that we will run from the Windows command line and pass some information to.

// argc_argv.cpp: Specifies the entry point for the console application.
// In Visual Studio, #include "stdafx.h" goes first; Code::Blocks and Dev-C++ do not need it.
#include <iostream>
using namespace std;

int main(int argc, char* argv[])
{
    if (argc > 1) // if we pass arguments, then argc will be greater than 1 (depending on the number of arguments)
    {
        cout << argv[1] << endl; // print the first argument
    }
    else
    {
        cout << "Not arguments" << endl;
    }
    return 0;
}

After building the program, open the Windows command line and drag the program's executable into the command line window. The full path to the program will appear on the command line (you can also type the path manually). Then press ENTER and the program will start (see Figure 1).

Figure 1 - Parameters of the main function

Since we just ran the program and did not pass any arguments to it, the message Not arguments appeared. Figure 2 shows the launch of the same program through the command line, but with the Open argument passed to it.

Figure 2 - Parameters of the main function

The argument is the word Open, and as the figure shows, this word appeared on the screen. You can pass several parameters at once by separating them with spaces. If a parameter consists of several words, they must be enclosed in double quotes so that they are treated as one parameter. For example, the figure shows the program being run with a single argument consisting of two words: It work.

Figure 3 - Parameters of the main function

If you remove the quotes, only the word It is displayed. If you do not plan to pass any information when starting the program, you can remove the parameters from the main() function; you can also rename them. Variations of the argc and argv parameters sometimes occur, but that depends on the type of application being created or on the development environment.

Optional and named arguments

Optional Arguments

C# 4.0 introduces a new feature that improves the convenience of specifying arguments when calling a method. This tool is called optional arguments and allows you to define a default value for a method parameter. This value will be used by default if no corresponding argument is specified for the parameter when the method is called. Therefore, it is not necessary to specify an argument for such a parameter. Optional arguments make it easier to call methods, where default arguments are applied to some parameters. They can also be used as a "short" form of method overloading.

The main impetus for adding optional arguments was the need to simplify interaction with COM objects. In several Microsoft object models (for example, Microsoft Office), functionality is provided through COM objects, many of which were written long ago and were designed to use optional parameters.

An example of using optional arguments is shown below:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

namespace ConsoleApplication1
{
    class Program
    {
        // Arguments b and c are optional when calling
        static int mySum(int a, int b = 5, int c = 10)
        {
            return a + b + c;
        }

        static void Main()
        {
            int sum1 = mySum(3);
            int sum2 = mySum(3,12);

            Console.WriteLine("Sum1 = "+sum1);
            Console.WriteLine("Sum2 = "+sum2);
            Console.ReadLine();
        }
    }
}

It should be borne in mind that all optional arguments must necessarily be indicated to the right of the required ones. In addition to methods, optional arguments can be used in constructors, indexers, and delegates.

One advantage of optional arguments is that they make it easier for the programmer to deal with complex method and constructor calls. After all, it is often necessary to set more parameters in a method than is usually required. And in cases like this, some of these parameters can be made optional by careful use of the optional arguments. This means that only those arguments that are important in this particular case need to be passed, and not all the arguments that should otherwise be required. This approach allows us to rationalize the method and simplify the programmer's handling of it.

Named Arguments

Another feature that was added to C# with the release of .NET 4.0 is support for so-called named arguments. As you know, when passing arguments to a method, the order in which they appear, as a rule, must match the order in which the parameters are defined in the method itself. In other words, the argument value is assigned to the parameter by its position in the argument list.

Named arguments are designed to overcome this limitation. A named argument allows you to specify the name of the parameter to which its value is assigned. And in this case, the order of the arguments no longer matters. Thus, named arguments are somewhat similar to the object initializers mentioned earlier, although they differ from them in their syntax. To specify an argument by name, use the following form of syntax:

parameter_name: value

Here parameter_name denotes the name of the parameter to which the value is passed. Of course, parameter_name must be the name of a valid parameter for the method being called.

You can pass certain arguments to C programs. When main() is called at the beginning of the computation, three parameters are passed to it. The first of them determines the number of command arguments when accessing the program. The second one is an array of pointers to character strings containing these arguments (one argument per string). The third one is also an array of pointers to character strings, it is used to access operating system parameters (environment variables).

Any such line is represented as:

variable = value\0

The end of the last string can be recognized by two consecutive zero bytes.

Let's name the main() function arguments argc, argv and env respectively (any other names are possible). Then the following declarations are allowed:

main(int argc, char *argv[])

main(int argc, char *argv[], char *env[])

Suppose there is a program prog.exe on drive A:. Let's invoke it like this:

A:\>prog.exe file1 file2 file3

Then argv[0] is a pointer to the string A:\prog.exe, argv[1] is a pointer to the string file1, and so on. The first actual argument is pointed to by argv[1] and the last by argv[argc-1]. If argc=1, there are no parameters after the program name on the command line. In our example, argc=4.

Recursion

Recursion is a way of organizing calls in which a function calls itself.

An important point in writing a recursive program is organizing the exit. It is easy to make a mistake here that causes the function to keep calling itself indefinitely. Therefore, the recursive process must simplify the problem step by step so that a non-recursive solution eventually appears. Recursion is not always desirable, since deep recursion can lead to a stack overflow.

Library Functions

In programming systems, subroutines for solving common problems are combined into libraries. These tasks include: calculation of mathematical functions, data input/output, string processing, interaction with operating system tools, etc. The use of library subroutines relieves the user of the need to develop appropriate tools and provides him with an additional service. The functions included in the libraries are supplied with the programming system. Their declarations are given in *.h files (these are the so-called include or header files). Therefore, as mentioned above, at the beginning of the program with library functions, there should be lines like:

#include <header_file.h>

For example:

#include <stdio.h>

There are also facilities for extending and creating new libraries with user programs.

Global variables are assigned a fixed place in memory for the duration of the program. Local variables are stored on the stack. Between them is a memory area for dynamic allocation.

The malloc() and free() functions are used to allocate free memory dynamically. The malloc() function allocates memory, the free() function frees it. The prototypes of these functions are stored in the stdlib.h header file and look like this:

void *malloc(size_t size);

void free(void *p);

The malloc() function returns a pointer of type void *; for proper use, the returned value must be converted to a pointer of the appropriate type. On success, the function returns a pointer to the first byte of a free memory block of size size. If there is not enough memory, a null pointer is returned. The sizeof operator is used to determine the number of bytes needed for a variable.

An example of using these functions:

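#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    int *p, i;

    p = (int *) malloc(100 * sizeof(int)); /* Allocate memory for 100 integers */
    if (!p) {
        printf("Out of memory\n");
        exit(1);
    }

    for (i = 0; i < 100; ++i) *(p+i) = i; /* Use the memory */

    /* p itself is not advanced, so that free() receives the original pointer */
    for (i = 0; i < 100; ++i) printf("%d", *(p+i));

    free(p); /* Free memory */
    return 0;
}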

Before using the pointer returned by malloc(), you need to make sure that there is enough memory (the pointer is not null).

Preprocessor

The C preprocessor is a program that processes the input to the compiler. The preprocessor scans the source program and performs the following actions: includes the specified files in it, performs substitutions, and controls conditional compilation. The preprocessor handles program lines that begin with the # symbol. Only one command (preprocessor directive) is allowed per line.

Directive

#define identifier substitution

causes the following program text to replace the named identifier with the substitution text (note the lack of a semicolon at the end of this command). Essentially, this directive introduces a macro definition (macro), where "identifier" is the name of the macro definition, and "substitution" is the sequence of characters that the preprocessor replaces the specified name with when it finds it in the program text. The name of a macro is usually capitalized.

Consider examples:

#define MAX 25

#define BEGIN {

The first line causes the identifier MAX to be replaced with the constant 25 throughout the program. The second lets you write the word BEGIN in the text instead of the opening curly brace ({).

Note that since the preprocessor does not check the compatibility between the symbolic names of macro definitions and the context in which they are used, it is recommended to define such identifiers not with the #define directive but with the const keyword and an explicit type (this applies more to C++):

const int MAX = 25;

(in old C the type int could be omitted, since it was assumed by default; C99 and C++ require it to be written).

If the #define directive looks like:

#define identifier(identifier, ..., identifier) substitution

and there is no space between the first identifier and the opening parenthesis, then this is a macro substitution definition with arguments. For example, after the appearance of a line like:

#define READ(val) scanf("%d", &val)

statement READ(y); is treated the same as scanf("%d",&y);. Here val is an argument and macro substitution is performed with the argument.

If the substitution is long and continues on the next line, a \ character is placed at the end of each line that is continued.

You can place objects separated by ## characters in a macro definition, for example:

#define PR(x, y) x##y

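After that, PR(a, 3) expands to a3. Or, for example, the macro definition

#define z(a, b, c, d) a(b##c##d)

is intended to turn z(sin, x, +, y) into sin(x+y). Note that standard C requires every ## to produce a single valid token, and x+ is not one, so compilers such as GCC reject this particular macro; writing the substitution as a(b c d) gives the same result without ##.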

The # character placed before a macro argument indicates that the argument is converted to a string. For example, after the directive

#define PRIM(var) printf(#var"= %d", var)

the following fragment of the program text

PRIM(year);

is converted like this:

printf("year""= %d", year);

Let's describe other preprocessor directives. The #include directive has been seen before. It can be used in two forms:

#include "file name"

#include <file name>

The effect of both commands is to include the file with the specified name in the program. The first one loads a file from the current directory or from the directory given as a prefix. The second searches for the file in the standard locations defined by the programming system. If a file whose name is written in double quotes is not found in the specified directory, the search continues in the directories used for #include <...>. #include directives can be nested within each other.

The next group of directives allows you to selectively compile parts of the program. This process is called conditional compilation. This group includes the directives #if, #else, #elif, #endif, #ifdef, #ifndef. The basic form of the #if directive is:

#if constant_expression

statement_sequence

#endif

Here the value of the constant expression is checked. If it is true, the given sequence of statements is compiled; if it is false, this sequence of statements is skipped.

The action of the #else directive is similar to that of the else keyword in the C language, for example:

#if constant_expression

statement_sequence_1

#else

statement_sequence_2

#endif

Here, if the constant expression is true, statement_sequence_1 is compiled, and if it is false, statement_sequence_2 is compiled.

The #elif directive means an "else if" type action. The main form of its use is as follows:

#if constant_expression

statement_sequence

#elif constant_expression_1

statement_sequence_1

...

#elif constant_expression_n

statement_sequence_n

#endif

This form is similar to the C language construct of the form: if...else if...else if...

The directive

#ifdef identifier

checks whether the specified identifier is currently defined, i.e. whether it has appeared in a directive of the form #define. A line of the form

#ifndef identifier

checks whether the specified identifier is currently undefined. Either of these directives can be followed by an arbitrary number of lines of text, possibly containing an #else directive (#elif may be used as well), and ending with the line #endif. If the condition being checked is true, all lines between #else and #endif are ignored; if it is false, the lines between the check and #else are ignored (or up to #endif if there is no #else). The #if and #ifndef directives can be nested one inside the other.

A directive of the form

#undef identifier

causes the specified identifier to be considered undefined, i.e. no longer subject to replacement.

Consider examples. The three directives

#ifdef WRITE

#undef WRITE

#endif

check whether the identifier WRITE is defined (i.e. whether there was a command of the form #define WRITE...), and if so, the name WRITE is made undefined, i.e. no longer subject to replacement.

The directives

#ifndef WRITE

#define WRITE fprintf

#endif

check whether the identifier WRITE is undefined, and if so, WRITE is defined as a substitute for the name fprintf.

The #error directive is written in the following form:

#error error_message

If it occurs in the program text, then compilation stops and an error message is displayed on the display screen. This command is mainly used during the debug phase. Note that the error message does not need to be enclosed in double quotes.

The #line directive is intended to change the values of the __LINE__ and __FILE__ identifiers defined in the C programming system. __LINE__ contains the number of the source line currently being compiled. __FILE__ is a string with the name of the file being compiled. The #line directive is written as follows:

#line number "filename"

Here number is any positive integer that will be assigned to __LINE__, and filename is an optional parameter that overrides the value of __FILE__.

The #pragma directive allows you to pass instructions to the compiler. For example, a pragma can indicate that a C program contains assembly language lines. The set of supported pragmas depends on the compiler.

Consider some global identifiers, or macro names (names of macro definitions). Five such names are defined: __LINE__, __FILE__, __DATE__, __TIME__, __STDC__. Two of them (__LINE__ and __FILE__) have already been described above. The __DATE__ identifier is a string that holds the date on which the source file was translated into object code. The __TIME__ identifier is a string that holds the time at which the source file was translated into object code. The __STDC__ macro has the value 1 if the compiler conforms to the standard. Otherwise, this macro will not be defined.


Sometimes when starting a program it is useful to pass some information to it. Typically, this information is passed to the main() function via command-line arguments. Command line argument is information that is entered on the operating system command line following the program name. For example, to start compiling a program, you need to type the following on the command line after the prompt:

CC program_name

program_name is a command line argument that specifies the name of the program you are about to compile.

To accept command line arguments, two special built-in arguments are used: argc and argv . The argc parameter contains the number of arguments on the command line and is an integer, and it is always at least 1 because the first argument is the name of the program. And the argv parameter is a pointer to an array of pointers to strings. In this array, each element points to some command line argument. All command-line arguments are strings, so converting any numbers to the desired binary format must be provided in the program when it is developed.

Here is a simple example of using the command line argument. The screen displays the word Hello and your name, which must be specified as a command line argument.

#include <stdio.h>
#include <stdlib.h>

int main(int argc, char *argv[])
{
  if(argc!=2) {
    printf("You forgot to enter your name.\n");
    exit(1);
  }
  printf("Hello %s", argv[1]);

  return 0;
}

If you named this program name and your name is Tom, then to run the program, enter name Tom at the command line. As a result, the message Hello Tom will appear on the screen.

In many environments, all command-line arguments must be separated from each other by a space or tab. Commas, semicolons, and similar characters are not considered separators. For example,

Run Spot, run

consists of three character strings, while

Eric,Rick,Fred

is a single character string - commas are generally not considered delimiters.

If the string contains spaces, then in some environments the string can be enclosed in double quotes to prevent multiple arguments from being made. As a result, the entire string will be considered as one argument. To learn more about how command-line options are set on your operating system, see the documentation for that system.

It is very important to declare argv correctly. This is how it is done most often:

char *argv[];

The empty square brackets indicate that the array has an indefinite length. You can now access individual arguments by indexing the argv array. For example, argv[0] points to the first character string, which is always the program name; argv[1] points to the first argument, and so on.

Another small example of using command line arguments is the countdown program below. This program counts down from some value (specified on the command line) and beeps when it reaches 0. Note that the first argument, which contains the initial value, is converted to an integer using the standard function atoi(). If the second command line argument (the third, if the program name is counted) is the string "display", the countdown will also be shown on the screen.

/* Program to count backwards. */
#include <stdio.h>
#include <stdlib.h>
#include <ctype.h>
#include <string.h>

int main(int argc, char *argv[])
{
  int disp, count;

  if(argc<2) {
    printf("You must enter the number to count down from\n");
    printf("on the command line. Try again.\n");
    exit(1);
  }

  if(argc==3 && !strcmp(argv[2], "display")) disp = 1;
  else disp = 0;

  for(count=atoi(argv[1]); count; --count)
    if(disp) printf("%d\n", count);

  putchar('\a'); /* this sounds the beep */
  printf("Countdown finished");

  return 0;
}

Note that if command line arguments are not specified, an error message will be displayed. Programs with command-line arguments often do the following: when the user runs these programs without entering the required information, instructions are displayed on how to correctly specify the arguments.

To access a single character in one of the command line arguments, enter the second index into argv. For example, the following program prints character-by-character all the arguments with which it was called:

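#include <stdio.h>

int main(int argc, char *argv[])
{
  int t, i;

  for(t=0; t<argc; ++t) {
    i = 0;
    while(argv[t][i]) {
      putchar(argv[t][i]);
      ++i;
    }
    printf("\n");
  }

  return 0;
}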

Remember, the first index of argv provides access to the string, and the second index provides access to its individual characters.

Usually argc and argv are used to pass the initial commands to the program that it will need when it starts. For example, command-line arguments often specify information such as a filename, an option, or an alternate behavior. Using command line arguments gives your program a "professional look" and makes it easier to use in batch files.

The names argc and argv are traditional but not required. You can name these two parameters in the main() function whatever you like. Also, some compilers may support additional arguments for main(), so be sure to check the documentation for your compiler.

When a program does not require command line parameters, it is most common to explicitly declare the main() function to have no parameters. In this case, the keyword void is used in the parameter list of this function.