Two privilege levels
The x86-64 CPU has four privilege levels, rings 0 through 3, but Linux (like other mainstream operating systems) uses only two: ring 0 (kernel) and ring 3 (user). The kernel runs in ring 0 with full access to hardware: it can program the network card, read any physical memory, and change page tables. Your program runs in ring 3 with restricted access: it can only touch its own virtual address space and cannot directly execute privileged instructions.
This separation is what makes the OS possible: the kernel enforces policy (which process can access which file, how much memory each gets), and user processes cannot bypass it. A bug in your program corrupts your memory, not the kernel's.
What a syscall is
A syscall is a deliberate, controlled transfer from ring 3 to ring 0.
The CPU provides a dedicated instruction for this (syscall on x86-64).
When your program executes it, the CPU saves the return address (in rcx) and the
flags (in r11), switches to ring 0, and jumps to a kernel entry point. The kernel
saves the rest of your registers, performs the operation, restores your registers,
and returns to ring 3. The whole round trip costs roughly 100–300 nanoseconds:
cheap, but measurable.
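Every syscall has a number. On Linux x86-64, read is 0,
write is 1, open is 2, and exit is 60.
The syscall number goes in rax, and up to six arguments go in rdi, rsi, rdx,
r10, r8, and r9. The return value comes back in rax. On failure the kernel
returns a negative error code (for example -2, which is -ENOENT); the C library
wrapper turns that into a return value of -1 and sets errno.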
printf is not a syscall; write is.
printf is a C library function that formats its arguments into a buffer in your
process's memory; only when that buffer is flushed does it call the write syscall
to actually send the bytes to the file descriptor.
The C standard library wraps raw syscalls in friendly, portable functions,
but underneath every library I/O function, the bytes eventually pass through a syscall.
The core I/O syscalls
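open returns -1 if the file doesn't exist or the permissions don't allow the
requested access. read returns -1 on error, 0 at end of file, and otherwise the
number of bytes read, which can be fewer than requested. write can also write
fewer bytes than requested (a short write), so robust code loops until everything
is written. Ignoring these return values leads to silently truncated or corrupted
data and to processes that carry on in a wrong state.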
Errno: how syscall errors work
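When a syscall fails, the C library wrapper returns -1 and sets errno
(declared in <errno.h>; it looks like a global variable but is per-thread in
modern C libraries) to an error code such as ENOENT or EACCES. perror() prints
a human-readable message for the current errno value, prefixed by a string you
pass it. errno is meaningful only right after a call that reported failure:
successful calls are not required to clear it, and later library calls may change it.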
strace: seeing every syscall your program makes
strace traces all syscalls made by a process, printing each one
with its arguments and return value. It's one of the most useful debugging tools
for systems work: when a program silently fails, strace shows which syscall
returned an error and the errno name it failed with (for example, ENOENT for a
missing file).
Even a trivial "Hello, world!" program makes dozens of syscalls before
reaching main: the dynamic linker consults /etc/ld.so.cache, opens libc, and
maps it into memory, and the runtime sets up further memory mappings.
Only a handful are your code.
The cost of syscalls
Each syscall takes ~100–300 ns on modern hardware: saving registers, switching privilege level, switching page tables as well on kernels with Meltdown mitigations (KPTI), executing kernel code, and reversing all of that. For most programs this doesn't matter. For high-throughput servers handling millions of requests per second, it does.
This is why printf buffers output instead of calling write
for each character. It's why databases use mmap or large
read buffers instead of reading one record at a time.
Minimizing syscalls is a real optimization technique.
A syscall is a controlled trap into the kernel — every file I/O, memory mapping, and process operation goes through one, and each one costs a privilege-level switch.