In Linux, a process never actually starts; it changes. One of the oldest and most commonly misunderstood assumptions about Unix-like systems is that running a program is a one-time creation event. We say “the process starts” without thinking about it, but the kernel has no such operation: it never creates a process out of nothing. Instead, everything is a fork. Each process begins as a duplicate of an existing one and, if desired, then changes its identity.
This distinction matters because Linux separates the concept of a process from the program image it executes. A process is a living kernel object: a scheduling entity represented by a task structure. An ELF file, by contrast, is just a program image stored on disk; it has no runtime existence until it is loaded into memory. Mixing the two up makes it difficult to grasp how performance, security boundaries, and startup latency interact.
Creating a process is therefore a two-step dance. First, the kernel duplicates an existing task via `fork()`, `vfork()`, or `clone()`, producing a new process that is almost identical to its creator. Second, the `execve()` system call replaces the process’s memory image with a new one, erasing the process’s previous identity while retaining its PID and kernel bookkeeping. This basic architecture is the foundation on which the dynamic linker, language runtimes, and environment configuration all rest.
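The two-step dance can be sketched in a few lines of Python, whose `os` module is a thin wrapper over the underlying syscalls (a minimal sketch; error handling omitted):

```python
import os
import sys

# Step 1: duplicate the current task. fork() returns twice:
# 0 in the child, the child's PID in the parent.
pid = os.fork()

if pid == 0:
    # Step 2 (child): replace this duplicate's image with a new
    # program. The PID and kernel bookkeeping survive; the address
    # space does not. execv() only returns if it fails.
    os.execv(sys.executable, [sys.executable, "-c", "raise SystemExit(0)"])
    os._exit(127)  # unreachable unless execv failed

# Parent: wait for the transformed child and collect its status.
_, status = os.waitpid(pid, 0)
exit_code = os.WEXITSTATUS(status)
```

The parent’s identity is untouched throughout; only the child’s image is replaced.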
The Genetic Split (Fork and Clone)
- `task_struct` Cloning: When `fork()` is called, the kernel does not immediately copy memory pages; it copies metadata. At the heart of this is the `task_struct`, a massive structure holding scheduling state, credentials, signal handlers, namespace references, the open file table, and memory descriptors. This structure is duplicated and inserted into the kernel’s global task list. From the scheduler’s point of view the child exists almost immediately: it has a PID, can be scheduled, and can receive signals before executing any user-space code.
- Copy-on-Write Semantics: The notion that creating a new process is “cheap” comes from copy-on-write. Linux does not duplicate the parent’s address space. Instead, it marks memory pages read-only and shares them between parent and child. The kernel intervenes only when either process attempts to modify a page, at which point it creates a private copy and updates the page table. This is why forking a process with a 32 GB virtual address space takes almost no time, as long as neither process touches most of it: the cost is proportional to the number of pages actually dirtied, not to the size of the address space.
- Cgroup and Namespace Inheritance: The child inherits its parent’s namespaces, such as PID, mount, network, and user, unless the `clone()` flags specify otherwise. This inheritance determines what the process can see: filesystem roots, process trees, network interfaces, and UID mappings. Cgroup membership is inherited as well, placing the child under defined CPU, memory, and I/O limits. Containers are not a distinct kernel feature; they are a disciplined application of these inheritance rules.
- The Return Value Paradox: After the fork, parent and child resume at the same instruction, but their register states differ. The return value (held in `RAX` on x86-64, `EAX` on 32-bit x86) is zero in the child and the child’s PID in the parent. User space relies on this single difference to distinguish two otherwise nearly identical execution contexts. It is also why `fork()` is notorious for subtle bugs: both processes keep executing the code that follows unless you explicitly direct them apart.
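The return-value split is easy to demonstrate with Python’s wrapper over the syscall: both processes resume after `os.fork()`, and only the returned value tells them apart (a minimal sketch):

```python
import os

pid = os.fork()

if pid == 0:
    # Child: fork() returned 0 here. Exit immediately with a marker
    # code so we do not fall through into the parent's logic below.
    os._exit(42)

# Parent: fork() returned the child's PID, usable with waitpid().
reaped_pid, status = os.waitpid(pid, 0)
child_code = os.WEXITSTATUS(status)
```

Without the early `os._exit()`, the child would continue running the parent’s code, which is exactly the class of bug the text describes.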
Change of Identity (`execve`)
- The Point of No Return: Calling `execve()` is a kind of controlled self-destruction. The process survives, but almost nothing else does. The kernel discards the entire user-space address space: code, stack, heap, and thread-local storage. File descriptors remain open (unless marked close-on-exec), caught signal handlers are reset to their defaults, and execution begins in a new image. There is no going back.
- ELF Anatomy and Program Headers: The kernel does not understand symbols, functions, or variables; it understands ELF program headers. These describe how to map the file into memory: the Program Header Table decides which regions get mapped, and each `PT_LOAD` entry supplies a virtual address range, access permissions, and backing file offsets. The kernel maps these segments into the new address space, usually lazily, deferring actual I/O until page faults occur.
- Kernel-Side Setup: Execution is routed through `search_binary_handler`, a dispatcher that selects the correct loader for the file type. ELF binaries go to the ELF handler; scripts take a separate path.
The kernel then constructs:
- A new user stack
- The initial `argv` and `envp`
- The auxiliary vector (`auxv`), which silently tells user space about the page size, CPU features, and the program entry point
- The `.bss` segment, mapped to zero-filled anonymous memory so that uninitialized globals behave as the C standard requires
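The auxiliary vector can be inspected from user space. The sketch below is Linux-specific and assumes a libc that exports `getauxval(3)` (glibc and musl both do); it reads `AT_PAGESZ` and compares it with the page size Python reports through other means:

```python
import ctypes
import mmap

AT_PAGESZ = 6  # constant from <elf.h>

# CDLL(None) dlopen()s the running program itself, which exposes
# libc symbols on a dynamically linked interpreter.
libc = ctypes.CDLL(None)
libc.getauxval.restype = ctypes.c_ulong
libc.getauxval.argtypes = [ctypes.c_ulong]

# The kernel placed this value on the initial stack at execve() time.
auxv_page_size = libc.getauxval(AT_PAGESZ)
```

`mmap.PAGESIZE` should agree with the value read from the auxiliary vector, since both ultimately originate from the kernel.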
The Shebang Case
Linux does not treat a file that starts with the characters `#!` as a native executable. The kernel reads this prefix as an instruction to hand execution to another program, an interpreter, named by the rest of the first line. This mechanism lets plain-text files participate in the execution model without containing any machine code, while remaining fully integrated into the process lifecycle.
Shebang handling happens entirely inside the kernel, during `execve()`. After the target file is opened, the kernel’s binary-format dispatcher (`search_binary_handler`) examines its initial bytes. If there is no ELF magic but a valid `#!` is found, the script handler takes over. The kernel parses the interpreter path and at most one optional argument, under strict rules about line length and formatting. No shell is involved by default; the behavior is predictable and unaffected by user-space policy.
Execution then continues with a recursive call to `execve()`. The kernel builds a new argument vector with the interpreter as `argv[0]`, followed by the script’s path and the original arguments. Crucially, this recursion never returns to the original program image. The kernel does not execute the script itself; it merely passes the script’s path to the interpreter as data. The process undergoes the same identity change as in any other `execve()` call.
This design has significant consequences. First, the choice of interpreter is fixed at exec time and cannot be altered by shell aliasing or environment-based substitution. Second, permission checks apply to both the script and the interpreter; if either is inaccessible, or sits on a filesystem mounted without execute permission, execution fails in ways that can be hard to diagnose. Third, because the recursion happens entirely in kernel space, user-space tools that inspect the process see only the final interpreter image, never an intermediate “script execution.”
The shebang mechanism is a good example of Unix’s economy of design. Linux does not build a separate execution path for scripts; it reuses the process-replacement machinery that already exists. The result is a uniform model in which binaries and scripts differ only in how they resolve their initial program image, not in how they run.
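The argument rewriting is directly observable. This sketch (assuming `/bin/sh` exists and the temporary directory allows execution) writes a throwaway script and runs it; the kernel turns the invocation into `/bin/sh <script-path> hello`, so the script sees the path as `$0` and our argument as `$1`:

```python
import os
import subprocess
import tempfile

# A minimal script whose first line names its interpreter.
with tempfile.NamedTemporaryFile("w", suffix=".sh", delete=False) as f:
    f.write('#!/bin/sh\nprintf "%s %s" "$0" "$1"\n')
    script = f.name
os.chmod(script, 0o755)

# The kernel rewrites argv to: /bin/sh <script> hello
result = subprocess.run([script, "hello"], capture_output=True, text=True)
os.unlink(script)
output = result.stdout
```

No shell parsed the shebang line here; `subprocess.run` with a list goes straight to `execve()`.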
The Function of the Dynamic Linker
- The Interpreter Segment: Dynamically linked ELF binaries contain a special `PT_INTERP` segment that tells the kernel, in effect, “do not start this program directly.” Instead, the kernel loads the dynamic linker (typically `ld-linux.so`) and transfers control to it. Your program’s `main()` is not the first user-space code to execute.
- Symbol Resolution and Relocation: The dynamic linker maps all required shared libraries into the address space, resolves symbols, performs relocations, and populates the Global Offset Table (GOT) and Procedure Linkage Table (PLT). Depending on configuration, some symbols are bound eagerly while others are bound lazily on first use. Many real-world applications spend the majority of their startup time in this phase.
- Environment Injection: Before executing user code, the dynamic linker processes environment variables such as `LD_PRELOAD` and `LD_LIBRARY_PATH`, along with audit hooks. This enables code injection, profiling, sandboxing, and exploitation without modifying the binary. By the time control reaches your program, the runtime environment has already been significantly shaped.
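`PT_INTERP` is plain data in the program header table and can be read without any ELF library. The sketch below handles 64-bit little-endian binaries (struct offsets from the System V ABI) and assumes `/bin/sh` is a dynamically linked ELF, which it is on typical Linux systems:

```python
import struct

PT_INTERP = 3  # program header type from <elf.h>

def elf_interp(path):
    """Return the PT_INTERP string of a 64-bit LE ELF, or None."""
    with open(path, "rb") as f:
        ehdr = f.read(64)
        if ehdr[:4] != b"\x7fELF" or ehdr[4] != 2:  # magic + ELFCLASS64
            return None
        e_phoff, = struct.unpack_from("<Q", ehdr, 32)
        e_phentsize, e_phnum = struct.unpack_from("<HH", ehdr, 54)
        for i in range(e_phnum):
            f.seek(e_phoff + i * e_phentsize)
            phdr = f.read(e_phentsize)
            p_type, = struct.unpack_from("<I", phdr, 0)
            if p_type == PT_INTERP:
                p_offset, = struct.unpack_from("<Q", phdr, 8)
                p_filesz, = struct.unpack_from("<Q", phdr, 32)
                f.seek(p_offset)
                # The segment holds a NUL-terminated path string.
                return f.read(p_filesz).rstrip(b"\0").decode()
    return None

interp = elf_interp("/bin/sh")
```

On a glibc system the result is something like `/lib64/ld-linux-x86-64.so.2`; on musl, a `ld-musl-*.so.1` path.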
Context Switching and Jump to Userspace
- Preparing the CPU State: The kernel initializes the user-space register state: `RIP` is set to the ELF entry point (typically `_start`), `RSP` points to the newly built stack, and segment selectors and flags are sanitized. There is no C runtime yet, no stack frames; just raw execution state.
- Ring Transition: The final phase is a controlled transition from Ring 0 to Ring 3, enforced by the CPU’s privilege model. From this point forward, the process runs with restricted rights, interacting with the kernel exclusively via system calls and exceptions.
- Language Runtimes: The `_start` symbol comes from the language runtime, not from your program. In glibc it calls `__libc_start_main`, which sets up thread-local storage, runs constructors, and initializes the runtime before invoking `main()`. By the time your first line of code runs, dozens of invisible steps have already happened.
Hardware and Kernel Bookkeeping
- TLB Management: Installing a new address space requires flushing or re-tagging the Translation Lookaside Buffer (TLB). Modern CPUs soften this with Address Space Identifiers (ASIDs) or Process Context Identifiers (PCIDs), which let translations from different address spaces coexist and avoid full flushes. Even so, TLB pressure and shootdowns have a measurable cost, especially on systems with frequent process switches.
- Page Tables: The kernel builds a new set of page tables describing the program’s virtual memory layout. These tables enumerate valid virtual addresses and their permissions, forming the hardware-enforced isolation barrier. Every memory access the process makes is translated through this structure, which makes page-table design a significant factor in both security and performance.
- Scheduler Integration: The process enters the TASK_RUNNING state and is placed on the correct run queue. From now on, it will compete for CPU time based on the active scheduling class (such as CFS or RT) and any cgroup constraints. It is the same as any other runnable task at the scheduler level.
- Signal and Timer State: Before the new image runs, the kernel establishes per-process signal dispositions, pending-signal queues, and timers. Default handlers are installed, alternate signal stacks are cleared, and interval timers are reset. This guarantees well-defined signal delivery semantics and ensures that no state from the previous program image leaks into the new execution environment.
- Credentials and Security Context: The kernel finalizes the process’s credential state: user IDs, group IDs, capability sets, and security-module labels. Kernel logic and LSM hooks enforce these credentials against the new program image, controlling what the process can access and modify.
- Accounting and Resource Tracking: Finally, the kernel attaches the process to its accounting infrastructure. The new image begins with zeroed counters for CPU time, memory use, I/O statistics, and page faults. These metrics feed kernel enforcement mechanisms, user-space observability tools, and administrative controls, ensuring the process can be observed, measured, and governed throughout its life.
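One of the signal rules above is directly observable: POSIX specifies that across `execve()` an ignored signal stays ignored, while caught signals fall back to their defaults (the old handler code no longer exists in the new image). A small sketch, assuming CPython, whose `signal.getsignal` reflects the disposition inherited at interpreter startup:

```python
import signal
import subprocess
import sys

# Ignore SIGUSR1, then exec a fresh interpreter and ask it what
# disposition it inherited. SIG_IGN should survive the exec.
signal.signal(signal.SIGUSR1, signal.SIG_IGN)

probe = ("import signal; "
         "print(signal.getsignal(signal.SIGUSR1) is signal.SIG_IGN)")
inherited = subprocess.run(
    [sys.executable, "-c", probe], capture_output=True, text=True
).stdout.strip()

# Restore the default disposition in this process.
signal.signal(signal.SIGUSR1, signal.SIG_DFL)
```

A handler installed with `signal.signal(SIGUSR1, some_function)` would not survive the same experiment; the child would report the default disposition instead.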
Conclusion
In Linux, bringing a process to life is a carefully orchestrated set of interactions between kernel subsystems and hardware mechanisms. Users think of “program start” as a single simple event, but it is really a pipeline: task duplication, address-space replacement, binary interpretation, dynamic relocation, and architectural state transitions. The kernel’s design deliberately separates the existence of a process from its program identity, which makes processes cheap to create and lets them assume entirely new roles while they run.

The most important insight is that the main cost of this pipeline is not the fork itself. Copy-on-write semantics and metadata cloning make the genetic split remarkably cheap, even for processes with huge virtual address spaces. The real overhead lies in `execve()`: tearing down the previous address space, mapping ELF segments, building page tables, and absorbing the inevitable stream of page faults. Dynamic linking adds to this cost by loading shared objects, resolving symbols, applying relocations, and shaping the runtime environment before user code takes over. Cache invalidation, TLB effects, and cold instruction paths ensure that short-lived processes cost far more than their useful work alone would suggest.
This is why workloads composed of many short-lived tasks scale poorly, and why modern systems look for alternatives. APIs like `posix_spawn()` consolidate process creation and execution into a single kernel-assisted operation, avoiding unnecessary duplication and reducing the risk of synchronization problems. Similarly, I/O models like `io_uring` reflect a broader shift away from fork-based concurrency toward persistent processes with asynchronous interfaces that better match contemporary hardware and cache behavior.
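Python exposes this API directly as `os.posix_spawn`, which creates the child and loads its new image in one call, with no user-visible intermediate duplicate of the caller (a minimal sketch):

```python
import os
import sys

# One call replaces the fork()/execve() pair. Depending on the
# platform, libc may implement it with vfork()/clone() internally,
# but the caller never sees an intermediate copy of itself.
pid = os.posix_spawn(
    sys.executable,
    [sys.executable, "-c", "raise SystemExit(5)"],
    dict(os.environ),
)
_, status = os.waitpid(pid, 0)
spawn_code = os.WEXITSTATUS(status)
```

File descriptor manipulation and signal setup, which would normally happen between `fork()` and `execve()`, are expressed declaratively through the `file_actions` and `setsigdef` parameters instead.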
Even with these adjustments, the original Unix process design remains remarkably elegant. A Linux process is more than a container for instructions: it is a security and resource boundary, a programmable object, and an accounting unit. The kernel creates it by inheritance, transforms it by execution, and governs it within strict, well-defined limits. The principle is simple, yet it has shaped system architecture for decades. Its strength lies in its durability: a design flexible enough to evolve yet disciplined enough to endure, which is why it still sits at the heart of modern Linux systems.
References
- Linux Kernel Documentation
- Manual pages (POSIX + Linux): `fork(2)` (COW semantics, return values, inheritance); `clone(2)` (namespace and resource-sharing semantics); `execve(2)` (address-space replacement, file-descriptor inheritance); `posix_spawn(3)` (modern alternative to `fork` + `exec`); `ld.so(8)` (dynamic linker behavior, `LD_PRELOAD`, relocation)
- System V ABI – ELF Specification (program headers, `PT_LOAD`, `PT_INTERP`, entry points)
- Linux Standard Base (LSB) ABI (process startup, dynamic linking expectations)
- Ulrich Drepper, How To Write Shared Libraries (dynamic linking, relocation costs, loader internals)
- Ulrich Drepper, What Every Programmer Should Know About Memory (caches, TLBs, page faults, startup costs)
- Brendan Gregg, Linux Performance (fork/exec overhead, page faults, scheduler effects)
- LWN.net, Process Creation & execve Internals (detailed kernel walk-throughs)
- glibc source code (`_start`, `__libc_start_main`, TLS initialization)
- musl libc documentation (minimal runtime startup path)
- Go runtime initialization (`runtime.rt0_go`, ELF entry behavior)
- Rust runtime / crt0 (startup before `main`)
- Intel 64 and IA-32 Architectures Software Developer’s Manual (ring transitions, TLBs, page tables)
- AMD64 Architecture Programmer’s Manual (virtual memory and privilege levels)
- `io_uring` documentation (reducing syscall and process overhead)
- Facebook / Meta, posix_spawn at Scale (why fork is expensive in large address spaces)



