Learning Rust: Streaming Tarball

Adrian Macal · Published in Level Up Coding · 15 min read · Mar 18, 2024

Is an async stream a mystery to you? You can totally grasp it if you write your own.

Async programming has become approachable for most software developers through the introduction of the async/await paradigm. The code almost looks like the classical blocking flow, which is achieved thanks to the compiler building a sophisticated state machine behind the scenes.

One layer down, Rust uses polling, wakers and pinning. These components do not use the async/await syntax, yet they still offer full non-blocking concurrency. The price to pay is the absence of a generated state machine, which you may need to write on your own. You will gain performance and full control over the working code.

One example where you need to drop down to the polling level is async streams. They integrate perfectly into the async/await world while remaining a truly non-blocking implementation underneath, without significant overhead.

In this story, we are going to write a stream of binary chunks for an outgoing TAR archive. We will create a sophisticated state machine and attempt to demonstrate the final outcome. Additionally, we will try to integrate it into the process of uploading files into a running container using the Docker Runtime API.

Asynchronous streams are very similar to regular iterators. You often iterate over various collections. For example, you might write a regular loop like the following one:

fn main() {
    let numbers = vec![1, 2, 3, 4, 5];

    for number in numbers.iter() {
        println!("Number: {}", number);
    }
}

The code uses a for loop to idiomatically iterate over the collection. This is essentially equivalent to the following syntax. Note that the temporarily created iterator is a mutable variable.

fn main() {
    let numbers = vec![1, 2, 3, 4, 5];
    let mut iter = numbers.iter();

    while let Some(number) = iter.next() {
        println!("Number: {}", number);
    }
}

Because we are iterating over a vector, we know it is incredibly fast. The next function doesn’t perform any I/O operations. But what if our iterator had to fetch its numbers over the network, or yield incoming network connections? We could change just a few lines to benefit from async/await.

#[tokio::main]
async fn main() {
    let mut numbers = fetch_numbers();

    while let Some(number) = numbers.next().await {
        println!("{}", number);
    }
}

You can see it almost resembles a regular iterator, but it’s actually an async stream. Isn’t that beautiful? Let’s look at how the two underlying traits are defined.

pub trait Iterator {
    type Item;

    fn next(&mut self) -> Option<Self::Item>;
}

pub trait Stream {
    type Item;

    fn poll_next(
        self: Pin<&mut Self>,
        cx: &mut Context<'_>
    ) -> Poll<Option<Self::Item>>;
}

You may notice similarities. Both define an associated item type, exactly the one you’re matching on in the while loop. Both require implementing a next-like function that provides the next item to be yielded.
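To make the parallel concrete, here is a minimal hand-written stream; it is a small sketch of mine assuming the Stream and StreamExt traits from the futures crate, and the Counter type is hypothetical, not part of the article’s project:

use std::pin::Pin;
use std::task::{Context, Poll};

use futures::{Stream, StreamExt};

// A trivial stream that yields the numbers 1..=max without any real I/O.
struct Counter {
    current: u32,
    max: u32,
}

impl Stream for Counter {
    type Item = u32;

    fn poll_next(self: Pin<&mut Self>, _cx: &mut Context<'_>) -> Poll<Option<Self::Item>> {
        // Counter is Unpin, so we can safely take a mutable reference out of the Pin.
        let this = self.get_mut();

        if this.current < this.max {
            this.current += 1;
            Poll::Ready(Some(this.current))
        } else {
            Poll::Ready(None)
        }
    }
}

#[tokio::main]
async fn main() {
    let mut numbers = Counter { current: 0, max: 3 };

    // StreamExt::next() turns poll_next into an awaitable future.
    while let Some(number) = numbers.next().await {
        println!("Number: {}", number);
    }
}

This toy stream never returns Pending; the tarball stream we are about to build will.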

You may ask what TAR actually is and how it relates to async streams in Rust. The TAR file format is essentially a method for grouping a collection of files and directories into a single file. This format is closely linked to the history of tape drives, which were a form of data storage in the early days of computing. It was specifically designed for tape storage, reflecting its name “Tape ARchive”.

To create a tarball, we process multiple files. For each file, we need to emit a 512-byte header, then append the file content, padded with zeros up to the next 512-byte boundary if needed. You see, a lot of files means a lot of I/O operations. Isn’t this an invitation to play with async streams?
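As a small sketch of the block arithmetic involved (my own illustration, not code from the archive implementation):

// Every file starts with a 512-byte header block; the content is then
// padded with zero bytes up to the next 512-byte boundary.
fn padded_len(content_len: u64) -> u64 {
    match content_len % 512 {
        0 => content_len,
        rest => content_len + (512 - rest),
    }
}

fn main() {
    assert_eq!(padded_len(56), 512);    // a 56-byte file occupies one data block
    assert_eq!(padded_len(1024), 1024); // already aligned, no padding needed
}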

Let’s start by looking at a quite simple TAR header created for a sample Cargo.toml file I had on hand. As mentioned earlier, the header must be exactly 512 bytes and encodes plenty of information about a single file.

00000000: 4361 7267 6f2e 746f 6d6c 0000 0000 0000  Cargo.toml......
00000010: 0000 0000 0000 0000 0000 0000 0000 0000 ................
00000020: 0000 0000 0000 0000 0000 0000 0000 0000 ................
00000030: 0000 0000 0000 0000 0000 0000 0000 0000 ................
00000040: 0000 0000 0000 0000 0000 0000 0000 0000 ................
00000050: 0000 0000 0000 0000 0000 0000 0000 0000 ................
00000060: 0000 0000 3030 3030 3634 3400 3030 3031 ....0000644.0001
00000070: 3735 3000 3030 3031 3735 3000 3030 3030 750.0001750.0000
00000080: 3030 3030 3037 3000 3134 3537 3136 3533 0000070.14571653
00000090: 3534 3000 3031 3233 3435 0020 3000 0000 540.012345. 0...
000000a0: 0000 0000 0000 0000 0000 0000 0000 0000 ................
000000b0: 0000 0000 0000 0000 0000 0000 0000 0000 ................
000000c0: 0000 0000 0000 0000 0000 0000 0000 0000 ................
000000d0: 0000 0000 0000 0000 0000 0000 0000 0000 ................
000000e0: 0000 0000 0000 0000 0000 0000 0000 0000 ................
000000f0: 0000 0000 0000 0000 0000 0000 0000 0000 ................
00000100: 0075 7374 6172 2020 0076 7363 6f64 6500 .ustar  .vscode.
00000110: 0000 0000 0000 0000 0000 0000 0000 0000 ................
00000120: 0000 0000 0000 0000 0076 7363 6f64 6500 .........vscode.
00000130: 0000 0000 0000 0000 0000 0000 0000 0000 ................
00000140: 0000 0000 0000 0000 0000 0000 0000 0000 ................
00000150: 0000 0000 0000 0000 0000 0000 0000 0000 ................
00000160: 0000 0000 0000 0000 0000 0000 0000 0000 ................
00000170: 0000 0000 0000 0000 0000 0000 0000 0000 ................
00000180: 0000 0000 0000 0000 0000 0000 0000 0000 ................
00000190: 0000 0000 0000 0000 0000 0000 0000 0000 ................
000001a0: 0000 0000 0000 0000 0000 0000 0000 0000 ................
000001b0: 0000 0000 0000 0000 0000 0000 0000 0000 ................
000001c0: 0000 0000 0000 0000 0000 0000 0000 0000 ................
000001d0: 0000 0000 0000 0000 0000 0000 0000 0000 ................
000001e0: 0000 0000 0000 0000 0000 0000 0000 0000 ................
000001f0: 0000 0000 0000 0000 0000 0000 0000 0000 ................

With the documentation at hand, we can try to decode what is encoded:

  • File name: Cargo.toml
  • File permissions: 0000644, encoded as an octal string
  • User and Group: 0001750, encoded as an octal string
  • File size: 0000070, encoded as an octal string -> 56 bytes
  • File timestamp: 14571653540, encoded as an octal string -> 1709660000
  • Header checksum: 012345, encoded as an octal string

Apart from that, the header contains other optional fields that we can ignore when generating our own.
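To verify the decoding, here is a tiny sketch (my own helper, not part of the project) that parses the NUL- or space-terminated octal ASCII fields used by the format:

// TAR numeric fields are octal digits encoded as ASCII, terminated by NUL or space.
fn parse_octal(field: &[u8]) -> u64 {
    let mut value = 0;

    for byte in field {
        match byte {
            b'0'..=b'7' => value = value * 8 + u64::from(byte - b'0'),
            _ => break,
        }
    }

    value
}

fn main() {
    assert_eq!(parse_octal(b"0000070\0"), 56);                // file size
    assert_eq!(parse_octal(b"14571653540\0"), 1_709_660_000); // modification time
}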

It’s time to write some basic Rust code. We will start by defining a few structs and a sample usage of them, without the actual implementation. Imagine you can create a new instance of the TarArchive struct and add a few files, just by name.

enum TarEntry {
    File(String),
}

struct TarArchive {
    entries: Vec<TarEntry>,
}

impl TarArchive {
    fn new() -> Self {
        Self { entries: Vec::new() }
    }

    fn append_file(&mut self, file: String) {
        self.entries.push(TarEntry::File(file));
    }
}

You see, the archive contains only entries to be archived. Now, we can add the possibility of converting it into a stream. This will consume the archive.

impl TarArchive {
    ...

    fn into_stream(self, buffer_size: usize) -> TarStream {
        TarStream::new(self.entries, buffer_size)
    }
}

struct TarStream {
    buffer_size: usize,
    entries: VecDeque<TarEntry>,
}

impl TarStream {
    fn new(entries: Vec<TarEntry>, buffer_size: usize) -> Self {
        Self {
            buffer_size: buffer_size / 512 * 512,
            entries: entries.into(),
        }
    }
}

We’ve almost created a stream. The only missing piece is the actual implementation of the Stream trait.

enum TarChunk {
    Header(String, Box<[u8; 512]>),
    Data(Vec<u8>),
    Padding(usize),
}

enum TarError {}
type TarResult<T> = Result<T, TarError>;

impl Stream for TarStream {
    type Item = TarResult<TarChunk>;

    fn poll_next(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Option<Self::Item>> {
        todo!()
    }
}

The trait implementation requires defining what a stream item is. In our case, we want to return a chunk, which comes in exactly three variants. The header variant contains a file name and 512 bytes of payload. The data variant is a slice of the file being read, which may include padding, as the documentation specifies. Finally, the padding variant represents the two empty 512-byte blocks that are always sent at the end of each tarball.
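As an illustration of what each variant ultimately represents on the wire, here is a plausible conversion into raw bytes. This is my own sketch; the article relies on such a conversion later via chunk.into(), but does not show its implementation:

// Assumed mapping: a header chunk is its 512-byte block, a data chunk is the
// bytes read from the file, and each padding chunk is one zero-filled block.
impl From<TarChunk> for Vec<u8> {
    fn from(chunk: TarChunk) -> Vec<u8> {
        match chunk {
            TarChunk::Header(_path, data) => data.to_vec(),
            TarChunk::Data(data) => data,
            TarChunk::Padding(_index) => vec![0u8; 512],
        }
    }
}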

With such a draft, we can write a consumer. Printing to the console what happens may be good enough. The following code will generate a tarball for three files. It won’t write the archive anywhere; it will only show interactive progress.

#[tokio::main]
async fn main() {
    let mut archive = TarArchive::new();

    archive.append_file("enwiki-20230801-pages-meta-history27.xml-p74198591p74500204".to_owned());
    archive.append_file("lubuntu-22.04.3-desktop-amd64.iso".to_owned());
    archive.append_file("qemu-8.2.1.tar.xz".to_owned());

    let mut stream = archive.into_stream(10 * 1024 * 1024);

    while let Some(chunk) = stream.next().await {
        match chunk {
            Ok(TarChunk::Header(path, _)) => println!("\nheader {path}"),
            Ok(TarChunk::Data(_)) => print!("."),
            Ok(TarChunk::Padding(0)) => println!("\npadding 0"),
            Ok(TarChunk::Padding(index)) => println!("padding {index}"),
            Err(error) => println!("error: {:?}", error),
        }

        std::io::stdout().flush().unwrap();
    }
}

On my machine it produces the following output:

header enwiki-20230801-pages-meta-history27.xml-p74198591p74500204
.........................................................................
.........................................................................
.........................................................................
.........................................................................
.........................................................................
.........................................................................
.............................................................
header lubuntu-22.04.3-desktop-amd64.iso
.........................................................................
.........................................................................
.........................................................................
............................................................
header qemu-8.2.1.tar.xz
.............
padding 0
padding 1

Each dot represents one 10 MB data chunk. The app runs for only a few seconds.

What could be added to our implementation to achieve such an outcome? We could add a state field to the stream to know where we are in the state machine.

struct TarStream {
    state: TarState,
    ...
}

impl TarStream {
    pub fn new(entries: Vec<TarEntry>, buffer_size: usize) -> Self {
        Self {
            state: TarState::init(),
            ...
        }
    }
}

impl Stream for TarStream {
    ...

    fn poll_next(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Option<Self::Item>> {
        let self_mut = self.get_mut();

        loop {
            let mut state = TarState::completed();
            mem::swap(&mut state, &mut self_mut.state);

            let result = match state {
                TarState::Init(state) => state.poll(cx),
                TarState::Open(state) => state.poll(cx),
                TarState::Header(state) => state.poll(cx),
                TarState::Read(state) => state.poll(cx),
                TarState::Padding(state) => state.poll(cx),
                TarState::Completed(state) => state.poll(cx),
            };

            let (state, poll) = match result {
                TarPollResult::ContinueLooping(state) => (state, None),
                TarPollResult::ReturnPolling(state, poll) => (state, Some(poll)),
                TarPollResult::NextEntry() => match self_mut.entries.pop_front() {
                    None => (TarState::padding(), None),
                    Some(entry) => (TarState::open(self_mut.buffer_size, entry), None),
                },
            };

            self_mut.state = state;

            if let Some(poll) = poll {
                return poll;
            }
        }
    }
}

You see, each state variant is polled. It returns a polling result, and based on it a decision is made: continue looping, return the obtained poll, or just jump to the next entry. Each time, the current state is updated.

But what exactly is polling? We will be using it quite often. From one perspective, we are being polled when we implement an asynchronous stream. From the other side, we will delegate polling to other pollable entities. There is one common aspect in each poll function: it takes a Context and returns a Poll variant.

The Context contains a waker, which is used to resume a task if it’s blocked. We won’t delve deeper into this topic in this story. What’s more important to understand is the Poll enum. It has two variants: Pending and Ready.

The Pending variant indicates that the polling process is not yet complete. When you receive this variant from the function, you know you need to wait more, but there’s no need to poll again immediately. If you receive Ready, it means you’ve obtained the value you were looking for.

I hope this explanation has helped you understand the previous listing. You might also wonder why we need to have a state machine. Imagine a situation where someone calls your poll function and you return Pending. This means you’ve promised the caller to notify them when they can continue. When that happens, the notified caller will call your poll function again, and you need to identify where you last left off with them. You are responsible for tracking their progress. The state machine pattern is an elegant way to model this.
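A minimal sketch may help here (my own example, not part of the project): a hand-written future that must remember, across polls, whether it has already returned Pending.

use std::future::Future;
use std::pin::Pin;
use std::task::{Context, Poll};

// The future is pending on the first poll and ready on the second one, so it
// has to track its own progress between calls.
struct TwoStep {
    polled_once: bool,
}

impl Future for TwoStep {
    type Output = &'static str;

    fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Self::Output> {
        let this = self.get_mut();

        if !this.polled_once {
            this.polled_once = true;
            // Keep the promise to notify the caller; here we wake immediately.
            cx.waker().wake_by_ref();
            Poll::Pending
        } else {
            Poll::Ready("done")
        }
    }
}

#[tokio::main]
async fn main() {
    println!("{}", TwoStep { polled_once: false }.await);
}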

The defined state machine for handling tarball creation contains a few states modeled as enum variants:

enum TarState {
    Init(TarStateInit),
    Open(TarStateOpen),
    Header(TarStateHeader),
    Read(TarStateRead),
    Padding(TarStatePadding),
    Completed(TarStateCompleted),
}

Each of these states has a special meaning:

  • Init represents the beginning of processing a single file.
  • Open keeps information about the started process of opening a file.
  • Header holds information about the started process of reading metadata, such as length and permissions.
  • Read tracks the progress of reading from a file, for each chunk.
  • Padding indicates which of the trailing padding blocks has already been emitted.
  • Completed represents the final stage of stream generation.

The essential flow of the state machine is: Init → Open → Header → Read (then back to Init for the next entry) → Padding → Completed. Keep in mind that in the case of any error, we transition directly to the completed state.

Let’s first try to understand the initial state as a warming-up exercise.

impl TarStateHandler for TarStateInit {
    fn poll(self, _cx: &mut Context<'_>) -> TarPollResult {
        TarPollResult::NextEntry()
    }
}

You see, the initial state does almost nothing. Its essential function is to instruct the stream to proceed to the next entry. Remember, the process of interpreting the result returned from any state is handled by the stream itself:

let (state, poll) = match result {
    TarPollResult::ContinueLooping(state) => (state, None),
    TarPollResult::ReturnPolling(state, poll) => (state, Some(poll)),
    TarPollResult::NextEntry() => match self_mut.entries.pop_front() {
        None => (TarState::padding(), None),
        Some(entry) => (TarState::open(self_mut.buffer_size, entry), None),
    },
};

In this case, “next entry” results in picking up the next entry from a vector. If nothing is found, we start padding; otherwise, we transition to the ‘open’ state with the newly selected entry.

The open state is a bit more complicated. Opening a file is an operation that may block. Truly asynchronous file opening is not well supported by operating systems, so Tokio wraps the blocking standard-library call and runs it on a background thread to make it appear asynchronous.

pub async fn open(path: impl AsRef<Path>) -> io::Result<File> {
    let path = path.as_ref().to_owned();
    let std = asyncify(|| StdFile::open(path)).await?;

    Ok(File::from_std(std))
}

What does this mean for us? It presents some challenges because we need to poll a future. Polling a future requires storing it as a pinned and boxed dynamic object.

struct TarStateOpen {
    buffer_size: usize,
    task: Pin<Box<dyn Future<Output = Result<(String, File), std::io::Error>> + Send>>,
}

impl TarStateOpen {
    fn new(buffer_size: usize, entry: TarEntry) -> Self {
        let task = async move {
            match entry {
                TarEntry::File(path) => match File::open(&path).await {
                    Ok(file) => Ok((path, file)),
                    Err(error) => Err(error),
                },
            }
        };

        Self {
            buffer_size: buffer_size,
            task: Box::pin(task),
        }
    }
}

impl TarStateHandler for TarStateOpen {
    fn poll(mut self, cx: &mut Context<'_>) -> TarPollResult {
        let (path, file) = match self.task.as_mut().poll(cx) {
            Poll::Pending => return TarState::Open(self).pending(),
            Poll::Ready(Err(error)) => return TarState::failed(TarError::IOFailed(error)),
            Poll::Ready(Ok((path, file))) => (path, file),
        };

        TarStateHeader::new(self.buffer_size, path, file).poll(cx)
    }
}

You see, in the constructor, we store a pinned, boxed future returned from the async block. We cannot await it, but we can poll it later in the poll function. If we get Pending, we simply return the same state along with a Pending result for the stream. If we successfully receive a tuple, we pass it on to the next state. Isn’t it straightforward?

The header state also needs to poll a future, for the same reason: fetching metadata is only available through a blocking call, which Tokio again wraps to make asynchronous:

pub async fn metadata(&self) -> io::Result<Metadata> {
    let std = self.std.clone();
    asyncify(move || std.metadata()).await
}

This means we need to store a future that captures the file we received in the previous state.

struct TarStateHeader {
    buffer_size: usize,
    path: String,
    task: Pin<Box<dyn Future<Output = Result<(File, Metadata), std::io::Error>> + Send>>,
}

impl TarStateHeader {
    fn new<'a>(buffer_size: usize, path: String, file: File) -> TarStateHeader {
        let task = async move {
            match file.metadata().await {
                Ok(metadata) => Ok((file, metadata)),
                Err(error) => Err(error),
            }
        };

        Self {
            path: path,
            task: Box::pin(task),
            buffer_size: buffer_size,
        }
    }
}

impl TarStateHandler for TarStateHeader {
    fn poll(mut self, cx: &mut Context<'_>) -> TarPollResult {
        let (file, metadata) = match self.task.as_mut().poll(cx) {
            Poll::Pending => return TarState::Header(self).pending(),
            Poll::Ready(Err(error)) => return TarState::failed(TarError::IOFailed(error)),
            Poll::Ready(Ok(metadata)) => metadata,
        };

        let length: u64 = metadata.len();
        let header: TarHeader = TarHeader::empty(self.path);

        match header.write(&metadata) {
            Ok(chunk) => TarState::read(self.buffer_size, file, length).ready(chunk),
            Err(error) => TarState::failed(error),
        }
    }
}

The metadata is used to populate the TAR header structure, and the file is passed on to the next stage. As in the previously described state, a Pending result means we return it along with the state (self) so that we can resume later.

You might be interested in how the TAR header is constructed and converted into a chunk. I have intentionally omitted some code to demonstrate just the essential steps of building the header.

struct TarHeader {
    path: String,
    data: Box<[u8; 512]>,
}

impl TarHeader {
    ...

    fn write(mut self, metadata: &Metadata) -> TarResult<TarChunk> {
        let data = &mut self.data;

        Self::write_name(data, &self.path)?;
        Self::write_mode(data, metadata)?;
        Self::write_uid(data, 0)?;
        Self::write_gid(data, 0)?;
        Self::write_size(data, metadata)?;
        Self::write_mtime(data, metadata)?;
        Self::write_magic(data)?;
        Self::write_type_flag(data)?;
        Self::write_chksum(data)?;

        Ok(self.into())
    }
}

impl Into<TarChunk> for TarHeader {
    fn into(self) -> TarChunk {
        TarChunk::header(self.path, self.data)
    }
}
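One of the omitted writers is worth a note. The ustar checksum is computed over the whole 512-byte header with the checksum field itself treated as spaces. Here is a hedged sketch of that rule (my own code, not necessarily how the repository implements write_chksum):

// The chksum field occupies bytes 148..156. Per the ustar convention, it is
// filled with spaces first, then the unsigned sum of all 512 header bytes is
// written as six octal digits followed by a NUL byte and a space.
fn write_chksum(data: &mut [u8; 512]) {
    data[148..156].fill(b' ');

    let sum: u32 = data.iter().map(|byte| u32::from(*byte)).sum();
    let encoded = format!("{:06o}\0 ", sum);

    data[148..156].copy_from_slice(encoded.as_bytes());
}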

Finally, the read state. It appears to be the most important because the CPU will spend most of its time here. It will also be the most frequently polled state.

struct TarStateRead {
    buffer_size: usize,
    file: File,
    left: usize,
    completed: usize,
    chunk: TarChunk,
    offset: usize,
}

impl TarStateRead {
    fn new(buffer_size: usize, file: File, length: u64) -> Self {
        let left = length as usize / 512;
        let available = buffer_size / 512;

        let pages = std::cmp::min(available, left);
        let pages = pages + if length as usize > 0 { 1 } else { 0 };

        Self {
            buffer_size: buffer_size,
            file: file,
            left: length as usize,
            completed: 0,
            chunk: TarChunk::data(pages),
            offset: 0,
        }
    }

    fn advance(self, bytes: usize) -> Self {
        Self {
            buffer_size: self.buffer_size,
            file: self.file,
            left: self.left - bytes,
            completed: self.completed + bytes,
            chunk: self.chunk,
            offset: self.offset + bytes,
        }
    }

    fn next(self) -> (TarChunk, Self) {
        let left = self.left / 512;
        let available = self.buffer_size / 512;

        let pages = std::cmp::min(available, left);
        let pages = pages + if self.left % 512 > 0 { 1 } else { 0 };

        (
            self.chunk,
            Self {
                buffer_size: self.buffer_size,
                file: self.file,
                left: self.left,
                completed: self.completed,
                chunk: TarChunk::data(pages),
                offset: 0,
            },
        )
    }
}

You see, we have many more fields to manage. The state contains not only a file to read from but also the number of remaining bytes, the number of bytes already completed, and the offset within the current chunk. You can also notice a current chunk that is not yet fully filled or might even be empty.

We defined two useful functions: advance and next. The first one assists us in advancing the offset in the current chunk, while the latter creates a new state, returning the previously filled chunk.

Because reading is the most crucial part of our stream, the state handler also requires several lines of code:

impl TarStateHandler for TarStateRead {
    fn poll(mut self, cx: &mut Context<'_>) -> TarPollResult {
        let pinned: Pin<&mut File> = Pin::new(&mut self.file);
        let data = match self.chunk.offset(self.offset) {
            Err(error) => return TarState::failed(error),
            Ok(data) => data,
        };

        let mut buffer: ReadBuf<'_> = ReadBuf::new(data);
        match pinned.poll_read(cx, &mut buffer) {
            Poll::Pending => return TarState::Read(self).pending(),
            Poll::Ready(Err(error)) => return TarState::failed(TarError::IOFailed(error)),
            _ => (),
        }

        let read: usize = buffer.filled().len();
        let advanced: TarStateRead = self.advance(read);

        if advanced.left == 0 {
            return TarState::init().ready(advanced.chunk);
        }

        if advanced.offset == advanced.chunk.len() {
            let (chunk, state) = advanced.next();
            return TarState::from(TarState::Read(state)).ready(chunk);
        }

        TarState::from(TarState::Read(advanced)).looping()
    }
}

The main flow is similar. When the data is not ready, we return the pending state. But when we receive some data, we need to make a few decisions. Is the current chunk fully filled? Have we finished reading the file? The answers to these questions determine the next state and whether the stream yields a chunk.

What’s next? The padding state, right? It’s one of the simplest states. It includes a field for its index.

struct TarStatePadding {
    index: usize,
}

impl TarStatePadding {
    fn new() -> Self {
        Self { index: 0 }
    }

    fn next(self) -> Self {
        Self { index: self.index + 1 }
    }
}

We know we need to send exactly two of them at the end. This makes the state handler relatively straightforward.

impl TarStateHandler for TarStatePadding {
    fn poll(self, _cx: &mut Context<'_>) -> TarPollResult {
        match self.index {
            0 => TarState::Padding(self.next()).ready(TarChunk::padding(0)),
            index => TarState::completed().ready(TarChunk::padding(index)),
        }
    }
}

Once we know we’ve sent two paddings, we need to transition to the completed state. Is it sophisticated? Not at all. It simply involves returning None to indicate the end of the stream.

impl TarStateHandler for TarStateCompleted {
    fn poll(self, _cx: &mut Context<'_>) -> TarPollResult {
        TarPollResult::ReturnPolling(TarState::completed(), Poll::Ready(None))
    }
}

Now that we’ve covered all the states, you may feel overwhelmed. All of this code could actually be generated by the compiler if we used async/await. It would be a bit more concise, because maintainability is not the compiler’s priority.

I mentioned in the introduction that we would also try to use this async stream to upload a few files into the created Docker container. In my previous story, I built a lightweight HTTP client via Unix Sockets to communicate with the Docker Runtime API. We will extend its capabilities.

Let’s start by integrating our tarball stream with hyper’s body, which is essentially yet another async stream to implement. Hyper is smart enough not to expect us to pass 8 GB of data for a PUT request; we can provide a stream of body chunks instead. Let’s write one:

pub struct TarBody {
    inner: TarStream,
}

impl TarBody {
    pub fn from(stream: TarStream) -> Self {
        Self { inner: stream }
    }
}

impl Body for TarBody {
    type Data = Bytes;
    type Error = DockerError;

    fn poll_frame(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Option<Result<Frame<Self::Data>, Self::Error>>> {
        let self_mut: &mut TarBody = self.get_mut();
        let pointer: &mut TarStream = &mut self_mut.inner;
        let inner: Pin<&mut TarStream> = Pin::new(pointer);

        match inner.poll_next(cx) {
            Poll::Pending => Poll::Pending,
            Poll::Ready(chunk) => match chunk {
                None => Poll::Ready(None),
                Some(Err(error)) => Poll::Ready(Some(DockerError::raise_outgoing_archive_failed(error))),
                Some(Ok(chunk)) => {
                    let data: Vec<u8> = chunk.into();
                    let frame: Frame<Bytes> = Frame::data(Bytes::from(data));

                    Poll::Ready(Some(Ok(frame)))
                }
            },
        }
    }
}

In 33 lines of code, we’ve magically converted a tar chunk into a frame of bytes expected by hyper, handling all the errors in the process. How could this be integrated into the HTTP client? Consider the following code:

async fn container_upload(&self, id: &str, path: &str, archive: TarArchive) -> DockerResult<ContainerUpload> {
    let url: String = format!("/v1.42/containers/{id}/archive?path={path}");
    let connection: DockerConnection<TarBody> = DockerConnection::open(&self.socket).await?;

    let stream: TarStream = archive.into_stream(256 * 1024);
    let data: TarBody = TarBody::from(stream);

    match connection.put(&url, data).await {
        Ok(response) => match response.into_bytes().await {
            Ok(_) => Ok(ContainerUpload::Succeeded),
            Err(error) => Err(error),
        },
        Err(error) => match error {
            DockerError::StatusFailed(url, status, response) => match status.as_u16() {
                400 => Ok(ContainerUpload::BadParameter(response.into_error().await?)),
                403 => Ok(ContainerUpload::PermissionDenied(response.into_error().await?)),
                404 => Ok(ContainerUpload::NoSuchContainer(response.into_error().await?)),
                500 => Ok(ContainerUpload::ServerError(response.into_error().await?)),
                _ => Err(DockerError::StatusFailed(url, status, response)),
            },
            error => Err(error),
        },
    }
}

The upload function takes our tarball stream, converts it into a tar body stream and passes it to the put function. I would call it a perfect integration.

How do we know it works as expected? Let’s create a container, upload a few files, and compute their hashes inside it. We can then compare those hashes with our local files to confirm everything works.

In this story, we covered a very detailed implementation of an async stream. We created a quite sophisticated state machine. We verified tarball creation in the console and also confirmed that the created TAR was fully recognized by the Docker Runtime API.

You may find the code from this story here: https://github.com/amacal/etl0/tree/b19e7762e7aaa13c2907279b7a6cdebd200c178e/src/etl0/src/tar
