Link to files inside of compressed TAR archives

Example scenario

You have two compressed TAR archives, archive-1.tar.gz & archive-2.tar.gz, which have somewhat similar content. Ideally, you’d wrap the uncompressed TAR archives in another TAR archive which is then compressed, which we’ll name wrapper.tar.gz, to better compress the shared content between them.
However, the applications that use these archives expect them to be separate files; the wrapping TAR archive would make the content inaccessible.
You can’t, or don’t want to, use FUSE to mount the wrapping archive to the relevant path… so how could you solve this?

FIFO pipes

FIFO pipes, also known more simply as “named pipes”, are an underrated and powerful mechanism for solving a multitude of parallelisation problems. To explain what they do, here’s a demonstration:

First, create a FIFO pipe:

Example of the mkfifo command, followed by the ls -l command to show the FIFO pipe’s properties.

$ mkfifo fifo
$ ls -l fifo
prw-r--r-- 1 user user 0 Dec 21 12:00 fifo

Notice how the permissions of the fifo pipe start with p; this indicates that the file is a pipe.

Next, try to write to the fifo pipe:

Example of writing to a FIFO pipe using the echo command, which in this case never terminates.

$ echo "hello world!" > fifo
_

As you can see, writing to the pipe causes… nothing to happen; the echo command never terminates. Until…

Example of reading from a FIFO pipe after first writing to it in another terminal, showing how the write was waiting on the pipe to be read from.

[terminal 2]$ cat fifo
hello world!
---
[terminal 1]$ echo "hello world!" > fifo
[terminal 1]$ _

FIFO pipes are IPC mechanisms which don’t store data on disk. Opening a pipe for writing blocks until another process opens it for reading, so any attempt to write to a pipe must be paired with a process on the other side that is ready to read, which in this case was the cat command. Similarly, if we had instead run cat fifo as the first command, it would have been blocked until a process wrote to the fifo pipe; the blocking works in both directions.
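If you’d rather not juggle two terminals, the same behaviour can be sketched in a single script. This is only an illustration: the temporary directory, the log file and the one second sleep are my own assumptions, used purely to observe that the writer is still blocked before anything reads from the pipe.

```shell
#!/bin/sh
# Sketch: show that a FIFO writer blocks until a reader shows up.
dir=$(mktemp -d) || exit 1
fifo="$dir/fifo"
log="$dir/log"    # hypothetical scratch file, written once the writer is done
mkfifo "$fifo" || exit 1

# Start the writer in the background; it blocks because nothing is reading yet.
{ echo "hello world!" > "$fifo"; echo "writer finished" > "$log"; } &
writer=$!

# Give the writer time to finish if it were able to (it isn't).
sleep 1
if [ -e "$log" ]; then before="finished"; else before="blocked"; fi

# Reading from the pipe unblocks the writer and receives its data.
message=$(cat "$fifo")
wait "$writer"
after=$(cat "$log")

echo "writer before the read: $before"
echo "message read from pipe: $message"
echo "writer after the read:  $after"

rm -r "$dir"
```

The writer only gets to create the log file after cat has opened the pipe, which is the same waiting you saw in terminal 1.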

As you can maybe imagine, this is incredibly useful for managing functions executed in parallel, since FIFO pipes provide an effective way to block commands on one another. But, as an additional bonus, we can exploit their behaviour to fake a file.
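As a rough sketch of that blocking trick, here’s a background job that waits at a FIFO “gate” until the main script lets it through. The names gate and out, and the messages, are made up for this example.

```shell
#!/bin/sh
# Sketch: use a FIFO as a gate so a background job waits for the main script.
dir=$(mktemp -d) || exit 1
gate="$dir/gate"    # hypothetical FIFO used purely for synchronisation
out="$dir/out"      # hypothetical log recording the order of events
mkfifo "$gate" || exit 1

# Worker: blocks on the gate until something is written to it.
{ read -r signal < "$gate"; echo "worker got: $signal" >> "$out"; } &
worker=$!

# Main: finish the preparation first, then open the gate.
echo "main: preparing" >> "$out"
echo "go" > "$gate"
wait "$worker"

result=$(cat "$out")
echo "$result"

rm -r "$dir"
```

The worker can’t write its line until the main script opens the gate, so the preparation line always lands first without any sleep or polling.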

Fake, dynamic files

Instead of explaining step-by-step how you can create fake files, here’s a script which does all that for you:

A POSIX shell script which takes two arguments: The first argument is the path to the real file, and the second argument is the path to the FIFO pipe (aka. the “fake” file).

#!/bin/sh

input="$1"
fifo="$2"

print_exit() { printf "%b" "$1"; exit 1; }
clean() {
	if [ -p "$fifo" ]; then
		if ! rm "$fifo"; then
			print_exit "Failed to remove '$fifo' pipe!\n"; fi
	fi
}
# Ensures this script's FIFO pipes are removed when the script is stopped.
trap clean EXIT

# Clean up old FIFO pipes at the start, just in case.
clean

# General safety checks for the arguments.
if [ -z "$input" ] || [ -z "$fifo" ]; then
	print_exit "Not enough arguments given!\n"; fi

if [ ! -f "$input" ]; then
	print_exit "'$input' doesn't exist, or isn't a file!\n"; fi

if [ -e "$fifo" ] && [ ! -p "$fifo" ]; then
	print_exit "'$fifo' already exists, but isn't a pipe!\n"; fi

# Queue up another `cat` command after the current one is finished, and check
# that the FIFO pipe is still there.
while true; do
	if [ ! -p "$fifo" ]; then
		if ! mkfifo "$fifo"; then
			print_exit "Failed to create '$fifo' pipe!\n"; fi
	fi

	if ! cat "$input" > "$fifo"; then
		print_exit "Failed to write to '$fifo'!\n"; fi
done

I’ve tried to make this script as robust as possible, so no matter what, the FIFO pipe is removed once the script exits, and otherwise, the FIFO pipe is always available to be read without any downtime.

While this script is running, cat /path/to/fifo-pipe will act almost identically to cat /path/to/file. This is thanks to the cat "$input" > "$fifo" line in the while-loop, which ensures every read of the FIFO pipe is answered with the contents of the input file.

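To be clear, there are differences between the original file and the pipe: for example, a pipe can only be read sequentially, so tools that need to seek around the file won’t work. It’s not the same as a symlink either (and even symlinks differ from the real file in some ways). However, most tools that simply read a file from start to finish don’t know or care about the differences.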
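To make that a bit more concrete, here’s a small sketch comparing a real file with a pipe that serves its contents. It uses a one-shot cat in the background as a stand-in for the script’s loop, and the file names real and fake are assumptions for this example.

```shell
#!/bin/sh
# Sketch: a FIFO is a different file type, yet a plain read returns the same bytes.
dir=$(mktemp -d) || exit 1
file="$dir/real"    # hypothetical real file
fifo="$dir/fake"    # hypothetical FIFO pretending to be that file
printf 'hello world!\n' > "$file"
mkfifo "$fifo" || exit 1

# One-shot stand-in for the script's loop: answer a single read of the pipe.
cat "$file" > "$fifo" &
server=$!

# Type checks can tell the two apart...
if [ -f "$fifo" ]; then regular="yes"; else regular="no"; fi
if [ -p "$fifo" ]; then pipe="yes"; else pipe="no"; fi

# ...but a sequential read gets identical content.
from_fifo=$(cat "$fifo")
wait "$server"
from_file=$(cat "$file")

echo "regular file? $regular"
echo "pipe?         $pipe"
if [ "$from_fifo" = "$from_file" ]; then echo "same content"; fi

rm -r "$dir"
```

Tools that check the file type, like the script’s own [ -f "$input" ] test, would notice the pipe; tools that just read it would not.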

But uh, how does this help us with our problem exactly?

Extracting files from TAR archives

Simple: We just replace the cat "$input" > "$fifo" command at the end of the while-loop with the following:

Example snippet which, instead of running cat on the input, uses the tar command to extract the necessary content from the input and writes it to a FIFO pipe.

#!/bin/sh

input="$1"
fifo="$2"
extract="$3"

...

while true; do
	...
	if ! tar -x -O -f "$input" "$extract" > "$fifo"; then
		print_exit "Failed to extract '$extract' from '$input'!\n"; fi
done

tar command options:

-x: Extract files from the archive.

-f: The input path to the archive.

-O: Write the output to STDOUT (to then redirect to the FIFO pipe).
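To see those options working on their own, without any pipes involved, here’s a minimal sketch. The file names a.txt and b.txt are made up, and it assumes a tar that detects Gzip compression by itself when reading, which GNU tar and bsdtar both do.

```shell
#!/bin/sh
# Sketch: extract a single member of a compressed archive straight to STDOUT.
dir=$(mktemp -d) || exit 1
printf 'first\n' > "$dir/a.txt"     # hypothetical archive members
printf 'second\n' > "$dir/b.txt"

# -C makes tar change into "$dir" before adding the files to the archive.
tar -czf "$dir/archive.tar.gz" -C "$dir" a.txt b.txt
rm "$dir/a.txt" "$dir/b.txt"

# Only b.txt is extracted, and it goes to STDOUT instead of the disk.
content=$(tar -xOf "$dir/archive.tar.gz" b.txt)
if [ -e "$dir/b.txt" ]; then on_disk="yes"; else on_disk="no"; fi

echo "content of b.txt: $content"
echo "written to disk?  $on_disk"

rm -r "$dir"
```

Since nothing is written to disk, redirecting that output into the FIFO pipe is all the script has to do.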

With this change, the syntax of the script now looks like this:

./fifo.sh /path/to/archive /path/to/fifo-pipe /path/to/file-inside-archive &

Notice the & character at the end, which runs the script as a background process so that it doesn’t block your terminal; the script itself keeps running until it’s stopped.

So, to finally get back to the original scenario, you could execute this command:

./fifo.sh wrapper.tar.gz archive-1.tar archive-1.tar &

…and then repeat this for archive-2.tar too. Once that’s done, you can read from archive-1.tar & archive-2.tar as if they were real files, while the actual file contents stay inside the compressed wrapper archive.
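For a rough end-to-end picture, here’s a self-contained sketch of the whole scenario in a temporary directory. It serves a single read with a one-shot background tar instead of the looping script, only handles archive-1.tar, and the inner file names are assumptions for this example.

```shell
#!/bin/sh
# Sketch: read an inner archive through a FIFO backed by the wrapper archive.
dir=$(mktemp -d) || exit 1

# Build two inner archives with overlapping content (hypothetical file names).
printf 'shared content\n' > "$dir/shared.txt"
printf 'only in one\n' > "$dir/one.txt"
printf 'only in two\n' > "$dir/two.txt"
tar -cf "$dir/archive-1.tar" -C "$dir" shared.txt one.txt
tar -cf "$dir/archive-2.tar" -C "$dir" shared.txt two.txt

# Wrap both uncompressed archives in one compressed archive, then drop the originals.
tar -czf "$dir/wrapper.tar.gz" -C "$dir" archive-1.tar archive-2.tar
rm "$dir/archive-1.tar" "$dir/archive-2.tar"

# Put a FIFO where archive-1.tar used to be and answer one read from the wrapper.
mkfifo "$dir/archive-1.tar" || exit 1
tar -xOf "$dir/wrapper.tar.gz" archive-1.tar > "$dir/archive-1.tar" &
server=$!

# A consumer lists the "file" without knowing it's a pipe.
listing=$(tar -tf "$dir/archive-1.tar")
wait "$server"

echo "$listing"

rm -r "$dir"
```

The consumer here is just another tar listing the archive, but any tool that reads the file from start to finish should behave the same way.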

Solved!

So uh, why did you need-

OKAY FINE! I was trying to combine the /var/lib/pacman/sync/*.db database files, which are just Gzip-compressed TAR archives, into one compressed archive to see if it would save space.

Did it work?

Yes, if you ignore the problem of writing back to the databases inside the wrapper archive.

Was it worth it?

Well… no, but you got a blog post out of it, so eh, it could be worse.