Learning Rust: System Calls

Published in Level Up Coding · 8 min read · Apr 1, 2024

Do you know how your apps talk to the Linux kernel? Let’s explore how we can communicate directly, skipping any intermediate layer.

Imagine a simple application that reads a file and outputs its content to stdout. The code can be very simple in any modern language. We need to open a file, allocate a buffer, loop to read the file, write to stdout and finally free the memory and close the file. All these operations are well abstracted for a developer in any programming language, so devs don’t have to worry about the OS or CPU architecture.

In this story, we will bypass the abstraction provided by Rust and even run on a bare-metal setup. In the first iteration, we will define all necessary system calls and write a Rust app that resembles a C program. In the second iteration, we will introduce some abstraction to maintain Rust’s safety as much as possible.

I’ve already collected all the necessary system calls and completed basic assembly bindings in Rust. We don’t need to worry about them, assuming they are correct. But, of course, there could be some mistakes.

// 0, reads from a file descriptor
pub fn sys_read(fd: u32, buf: *mut u8, count: usize) -> isize;

// 1, writes to a file descriptor
pub fn sys_write(fd: u32, buf: *const u8, count: usize) -> isize;

// 2, opens file
pub fn sys_open(pathname: *const u8, flags: i32, mode: u16) -> isize;

// 3, closes a file descriptor
pub fn sys_close(fd: u32) -> isize;

// 9, allocates heap memory
pub fn sys_mmap(addr: *mut u8, length: usize, prot: usize, flags: usize, fd: usize, offset: usize) -> isize;

// 11, frees heap memory
pub fn sys_munmap(addr: *mut u8, length: usize) -> isize;

// 60, terminates the calling process
pub fn sys_exit(status: i32) -> !;
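
The bindings themselves are not shown here. As a rough idea of what one of them might look like, below is a minimal sketch of sys_write built on inline assembly. It assumes x86_64 Linux (syscall number in rax, arguments in rdi, rsi and rdx) and is not necessarily how the repository implements it.

```rust
use core::arch::asm;

/// Sketch of a raw `write` binding (syscall number 1 on x86_64 Linux).
/// Returns the number of bytes written, or a negative errno value.
pub fn sys_write(fd: u32, buf: *const u8, count: usize) -> isize {
    let fd = fd as usize;
    let ret: isize;

    unsafe {
        asm!(
            "syscall",
            // rax carries the syscall number in and the result out
            inlateout("rax") 1isize => ret,
            in("rdi") fd,
            in("rsi") buf,
            in("rdx") count,
            // the syscall instruction clobbers rcx and r11
            lateout("rcx") _,
            lateout("r11") _,
            options(nostack),
        );
    }

    ret
}
```

Writing to a descriptor that is not open should come back as a negative number rather than a panic or an exception, which is exactly the convention the rest of the code has to deal with.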

The system calls above can be assembled into a very naive program that does not check for any errors. It will still work as long as nothing goes wrong, since most of the errors it ignores are rare or transient. As mentioned earlier, this approach resembles a very naive C program.

#![no_std]
#![no_main]

mod syscall;

use crate::syscall::*;
use core::ptr::null_mut;

#[no_mangle]
extern "C" fn _start() -> ! {
    let size = 16 * 4096;
    // 0x03 = PROT_READ | PROT_WRITE, 0x22 = MAP_PRIVATE | MAP_ANONYMOUS
    let buffer = sys_mmap(null_mut(), size, 0x03, 0x22, 0, 0) as *mut u8;

    let filename = c"enwiki-20240320-pages-articles-multistream-index.txt.bz2";
    let filename = filename.to_bytes_with_nul().as_ptr();
    let file = sys_open(filename, 0, 0);

    loop {
        let read = sys_read(file as u32, buffer, size);
        if read <= 0 {
            break;
        }

        let mut index = 0;
        while index < read {
            let buffer = unsafe { buffer.offset(index) };
            // write only the bytes that have not been written yet
            let write = sys_write(1, buffer, (read - index) as usize);
            index += write;
        }
    }

    sys_munmap(buffer, size);
    sys_close(file as u32);

    sys_exit(0);
}

#[panic_handler]
fn panic(_panic: &core::panic::PanicInfo<'_>) -> ! {
    sys_exit(-1)
}

When I ran it on my local machine, I confirmed it generated the same hash as the regular cat command. Also, the performance was exactly the same. In my local environment, I use Docker running on Windows, so processing the 200MB file took around 1.3 seconds — fetching files from Windows is not the fastest.

vscode $ time cargo run --release | sha256sum

    Finished release [optimized] target(s) in 0.02s
     Running `/tmp/cargo/x86_64-unknown-none/release/learning-rust`
6144fc7e5abfa0baf60f37b92a4e839f5be2bb3287fa3bdb9758251a6a22f270  -

real    0m1.307s
user    0m0.992s
sys     0m0.147s
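
The naive version never looks at what the system calls return. On Linux, a raw system call reports failure as a negative value between -4095 and -1, which is the negated errno. As a sketch of what checking could look like, the hypothetical helper below (decode is my name, not part of the original code) turns such a return value into a Result:

```rust
/// Decodes the raw return value of a Linux system call.
/// Values in -4095..=-1 are negated errno codes; anything else is a success.
pub fn decode(ret: isize) -> Result<usize, i32> {
    if (-4095..0).contains(&ret) {
        // flip the sign to get the positive errno, e.g. 2 for ENOENT
        Err(-ret as i32)
    } else {
        Ok(ret as usize)
    }
}
```

For example, decode(-2) yields Err(2), which is ENOENT (no such file), while decode(65536) yields Ok(65536), a full 64 KiB read.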

How can the naive program be improved? Let’s experiment with the abstraction of memory management. At the bare metal level, we don’t have an allocator, and we are not going to implement a new one using the standard allocator trait. Instead, we will approach it in a more classical way.

pub enum MemoryAllocation {
    Succeeded(MemoryAddress),
    Failed(isize),
}

pub fn mem_alloc(length: usize) -> MemoryAllocation {
    // 0x03 = PROT_READ | PROT_WRITE, 0x22 = MAP_PRIVATE | MAP_ANONYMOUS
    let address = match sys_mmap(null_mut(), length, 0x03, 0x22, 0, 0) {
        value if value <= 0 => return MemoryAllocation::Failed(value),
        value => MemoryAddress {
            ptr: value as *mut u8,
            len: length,
        },
    };

    MemoryAllocation::Succeeded(address)
}

pub enum MemoryDeallocation {
    Succeeded(),
    Failed(isize),
}

pub fn mem_free(memory: MemoryAddress) -> MemoryDeallocation {
    match sys_munmap(memory.ptr, memory.len) {
        value if value == 0 => MemoryDeallocation::Succeeded(),
        value => MemoryDeallocation::Failed(value),
    }
}

The snippet above can allocate a slice of memory and free it. It enhances Rust safety by explicitly separating successes and failures into different variants. Additionally, memory deallocation can be done exactly once, as the value is consumed. Let’s explore how we can work with the requested memory blocks.

pub struct MemorySlice {
    ptr: *const u8,
    len: usize,
}

impl MemorySlice {
    pub fn from(data: &[u8]) -> Self {
        Self {
            ptr: data.as_ptr(),
            len: data.len(),
        }
    }
}

pub struct MemoryAddress {
    ptr: *mut u8,
    len: usize,
}

pub enum MemorySlicing {
    Succeeded(MemorySlice),
    InvalidParameters(),
    OutOfRange(),
}

impl MemoryAddress {
    pub fn between(&self, start: usize, end: usize) -> MemorySlicing {
        if start > self.len || end > self.len {
            return MemorySlicing::OutOfRange();
        }

        if start > end {
            return MemorySlicing::InvalidParameters();
        }

        let slice = MemorySlice {
            ptr: unsafe { self.ptr.offset(start as isize) },
            len: end - start,
        };

        MemorySlicing::Succeeded(slice)
    }
}

We wrapped a pointer to a memory block together with its length. The wrapper can also be sliced into smaller blocks. We can even create a block from Rust’s native slice. Everything is safeguarded.
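
To make the bounds rules concrete, here is a small sketch that can run in a regular hosted test. It repeats the types from above but backs MemoryAddress with an ordinary byte array instead of mmap. The view function is a hypothetical helper added only for this illustration.

```rust
pub struct MemorySlice {
    ptr: *const u8,
    len: usize,
}

pub struct MemoryAddress {
    ptr: *mut u8,
    len: usize,
}

pub enum MemorySlicing {
    Succeeded(MemorySlice),
    InvalidParameters(),
    OutOfRange(),
}

impl MemoryAddress {
    pub fn between(&self, start: usize, end: usize) -> MemorySlicing {
        if start > self.len || end > self.len {
            return MemorySlicing::OutOfRange();
        }

        if start > end {
            return MemorySlicing::InvalidParameters();
        }

        let slice = MemorySlice {
            ptr: unsafe { self.ptr.offset(start as isize) },
            len: end - start,
        };

        MemorySlicing::Succeeded(slice)
    }
}

/// Hypothetical helper: wraps `block`, slices it, and exposes the result
/// as a native slice so the outcome can be inspected.
pub fn view(block: &mut [u8], start: usize, end: usize) -> Result<&[u8], &'static str> {
    let memory = MemoryAddress {
        ptr: block.as_mut_ptr(),
        len: block.len(),
    };

    match memory.between(start, end) {
        MemorySlicing::Succeeded(slice) => {
            // safe here because between() already validated the range
            Ok(unsafe { core::slice::from_raw_parts(slice.ptr, slice.len) })
        }
        MemorySlicing::InvalidParameters() => Err("invalid parameters"),
        MemorySlicing::OutOfRange() => Err("out of range"),
    }
}
```

With an 8-byte block holding "abcdefgh", view(&mut data, 2, 5) gives the bytes "cde", a reversed range such as (5, 2) is rejected as invalid parameters, and (0, 9) is rejected as out of range.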

Apart from memory, we operate with various file descriptors: we open a file, write to stdout, and may want to write to stderr. Let’s see how opening and closing a file can be safer:

use core::ffi::CStr;

pub struct FileDescriptor {
    value: u32,
}

pub enum FileOpenining {
    Succeeded(FileDescriptor),
    Failed(isize),
}

pub fn file_open(pathname: &CStr) -> FileOpenining {
    match sys_open(pathname.to_bytes_with_nul().as_ptr(), 0, 0) {
        value if value < 0 => FileOpenining::Failed(value),
        value => FileOpenining::Succeeded(FileDescriptor {
            value: value as u32,
        }),
    }
}

pub enum FileClosing {
    Succeeded(),
    Failed(isize),
}

pub fn file_close(file: FileDescriptor) -> FileClosing {
    match sys_close(file.value) {
        value if value < 0 => FileClosing::Failed(value),
        _ => FileClosing::Succeeded(),
    }
}

In the code above, we rely on the obtained file descriptor. As with memory management, we create variants for successes and failures. Closing a file consumes the descriptor, making the process a bit safer than operating on plain integers. Now, we can even try to return stdout and stderr:

pub fn stdout_open() -> FileDescriptor {
    FileDescriptor { value: 1 }
}

pub fn stderr_open() -> FileDescriptor {
    FileDescriptor { value: 2 }
}

When the file descriptor is open, we can try to write to it or read from it. Both operations benefit from previous memory management and wrap error handling:

pub enum FileReading {
    Succeeded(usize),
    EndOfFile(),
    Failed(isize),
}

pub fn file_read(file: &FileDescriptor, buffer: &mut MemoryAddress) -> FileReading {
    match sys_read(file.value, buffer.ptr, buffer.len) {
        value if value < 0 => FileReading::Failed(value),
        value if value == 0 => FileReading::EndOfFile(),
        value => FileReading::Succeeded(value as usize),
    }
}

pub enum FileWriting {
    Succeeded(usize),
    Failed(isize),
}

pub fn file_write(file: &mut FileDescriptor, buffer: &MemorySlice) -> FileWriting {
    match sys_write(file.value, buffer.ptr, buffer.len) {
        value if value < 0 => FileWriting::Failed(value),
        value => FileWriting::Succeeded(value as usize),
    }
}

All operations have been successfully abstracted, offering more safety and reducing the likelihood of making mistakes. Yes, we can still forget about closing a file or freeing memory. We can even try to read from stdout or attempt to close it. There are many ways to improve it, but let’s focus on how to modify the naive code I wrote at the beginning. Now it looks like the following snippet:

#![no_std]
#![no_main]

mod linux;
mod syscall;

use crate::linux::{file_close, FileClosing};
use crate::linux::{file_open, FileOpenining};
use crate::linux::{file_read, FileReading};
use crate::linux::{file_write, FileWriting};
use crate::linux::{mem_alloc, mem_free, MemoryAllocation, MemoryDeallocation};
use crate::linux::{stderr_open, stdout_open};
use crate::linux::{MemorySlice, MemorySlicing};
use crate::syscall::sys_exit;

#[no_mangle]
extern "C" fn _start() -> ! {
    let buffer_size = 16 * 4096;
    let mut target = stdout_open();

    let mut buffer = match mem_alloc(buffer_size) {
        MemoryAllocation::Failed(_) => fail(-2, b"Cannot allocate memory.\n"),
        MemoryAllocation::Succeeded(value) => value,
    };

    let pathname = c"enwiki-20240320-pages-articles-multistream-index.txt.bz2";
    let source = match file_open(pathname) {
        FileOpenining::Failed(_) => fail(-2, b"Cannot open source file.\n"),
        FileOpenining::Succeeded(value) => value,
    };

    loop {
        let read = match file_read(&source, &mut buffer) {
            FileReading::Failed(_) => fail(-2, b"Cannot read from source file.\n"),
            FileReading::Succeeded(value) => value,
            FileReading::EndOfFile() => break,
        };

        let mut index = 0;
        while index < read {
            let write = match buffer.between(index, read) {
                MemorySlicing::Succeeded(value) => value,
                MemorySlicing::InvalidParameters() => fail(-2, b"Cannot slice memory.\n"),
                MemorySlicing::OutOfRange() => fail(-2, b"Cannot slice memory.\n"),
            };

            index += match file_write(&mut target, &write) {
                FileWriting::Failed(_) => fail(-2, b"Cannot write to stdout.\n"),
                FileWriting::Succeeded(value) => value,
            };
        }
    }

    if let MemoryDeallocation::Failed(_) = mem_free(buffer) {
        fail(-2, b"Cannot free memory.\n");
    }

    if let FileClosing::Failed(_) = file_close(source) {
        fail(-2, b"Cannot close source file descriptor.\n");
    }

    sys_exit(0);
}

fn fail(status: i32, msg: &[u8]) -> ! {
    file_write(&mut stderr_open(), &MemorySlice::from(msg));
    sys_exit(status);
}

#[panic_handler]
fn panic(_panic: &core::panic::PanicInfo<'_>) -> ! {
    sys_exit(-1)
}

The snippet is much longer, but it handles the failure of every call it makes. It reduces the chance of mistakes by introducing strong types, which Rust and LLVM compile away at no extra runtime cost. The listing of generated assembly code has doubled to 170 lines, yet the compiled binary remains around 1.4kB.
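
The inner loop in the listing exists because a write may accept fewer bytes than requested. One possible refinement, sketched below, is to move that loop into a helper. To keep the sketch testable without real file descriptors, write_all (a hypothetical name) takes the write operation as a closure that returns the number of bytes accepted or a negative error code, mirroring sys_write.

```rust
/// Calls `write` repeatedly until all of `data` has been accepted.
/// `write` stands in for a raw write call: it returns how many bytes it took,
/// or a negative error code.
pub fn write_all<F>(data: &[u8], mut write: F) -> Result<(), isize>
where
    F: FnMut(&[u8]) -> isize,
{
    let mut index = 0;

    while index < data.len() {
        match write(&data[index..]) {
            // a zero-byte write is treated as an error too,
            // otherwise the loop would spin forever
            value if value <= 0 => return Err(value),
            value => index += value as usize,
        }
    }

    Ok(())
}
```

A writer that takes at most three bytes per call still ends up delivering the whole input, just over several iterations, and the first negative return value stops the loop.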

What about arguments passed from the command line? We could easily add some global assembly to pass them directly to our main function:

use core::arch::global_asm;

global_asm! {
    ".global _start",
    "_start:",
    // on entry, [rsp] holds argc and the argv pointers follow it
    "mov rdi, [rsp]",
    "lea rsi, [rsp + 8]",
    "call main"
}

#[no_mangle]
extern "C" fn main(argc: usize, argv: *const *const u8) -> ! {
    ...
}

But it would bring us again to the world of pointers. Both parameters are placed on the stack just for us. The first one is argc, and then comes the argv array. We can rearrange the stack and map it to a pure Rust structure:

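global_asm! {
    ".global _start",
    "_start:",
    "mov rdi, [rsp]",
    "lea rsi, [rsp + 8]",
    // build ProcessArguments on the stack: argv is pushed first, then argc
    "push rsi",
    "push rdi",
    // pass a pointer to the struct as the first argument
    "mov rdi, rsp",
    "call main"
}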

#[repr(C)]
pub struct ProcessArguments {
    argc: usize,
    argv: *const *const u8,
}

impl ProcessArguments {
    pub fn len(&self) -> usize {
        self.argc
    }

    pub fn get(&self, index: usize) -> Option<&CStr> {
        if index >= self.argc {
            return None
        }

        unsafe {
            Some(CStr::from_ptr(*self.argv.add(index) as *const i8))
        }
    }
}

The ProcessArguments struct hides most of the pointer handling. It enables the main function to be completely free of unsafe code.

#[no_mangle]
extern "C" fn main(args: &ProcessArguments) -> ! {
    let buffer_size = 16 * 4096;
    let mut target = stdout_open();

    let mut buffer = match mem_alloc(buffer_size) {
        MemoryAllocation::Failed(_) => fail(-2, b"Cannot allocate memory.\n"),
        MemoryAllocation::Succeeded(value) => value,
    };

    let pathname = match args.get(1) {
        None => fail(-2, b"Cannot find path in the first argument.\n"),
        Some(value) => value
    };

    ...
}
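
To see how the argc/argv layout maps onto the struct without writing any assembly, the sketch below builds a ProcessArguments by hand over a fake argv table. The from_raw constructor is a hypothetical addition for this illustration only. In the real program the struct is laid out on the stack by the _start stub.

```rust
use core::ffi::{c_char, CStr};

#[repr(C)]
pub struct ProcessArguments {
    argc: usize,
    argv: *const *const u8,
}

impl ProcessArguments {
    /// Hypothetical constructor for experiments and tests.
    /// The caller must guarantee that `argv` points to at least `argc`
    /// valid, NUL-terminated strings that outlive the struct.
    pub unsafe fn from_raw(argc: usize, argv: *const *const u8) -> Self {
        Self { argc, argv }
    }

    pub fn len(&self) -> usize {
        self.argc
    }

    pub fn get(&self, index: usize) -> Option<&CStr> {
        if index >= self.argc {
            return None;
        }

        unsafe { Some(CStr::from_ptr(*self.argv.add(index) as *const c_char)) }
    }
}
```

Given a table with the two entries "cat" and "file.txt" followed by a null pointer, get(1) returns the path, and any index at or beyond argc returns None instead of reading past the table.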

Programming in Rust is not about being dogmatic and only working as Rustaceans would like you to work. The language itself is very powerful, thanks to features like pattern matching, borrowing, lifetimes and zero-cost abstractions. It’s up to you how the language can be adapted in a given context. Sometimes it’s better to be 100% on the Rust side; other times, taking just the language and sticking to C ways of thinking is the way to go.

As always, I’ve left it on GitHub: https://github.com/amacal/learning-rust/tree/linux-syscalls

Written by Adrian Macal