Example scenario
- You have two compressed TAR archives, archive-1.tar.gz & archive-2.tar.gz, which have somewhat similar content. Ideally, you’d wrap the uncompressed TAR archives in another TAR archive which is then compressed, which we’ll name wrapper.tar.gz, to better compress the shared content between them.
- However, the applications that use these archives expect them to be separate files; the wrapping TAR archive would make the content inaccessible.
- You can’t, or don’t want to, use FUSE to mount the wrapping archive to the relevant path… so how could you solve this?
FIFO pipes
FIFO pipes, also known more simply as “named pipes”, are an underrated and powerful mechanism for solving a multitude of parallelisation problems. To explain what they do, here’s a demonstration:
First, create a FIFO pipe:
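A minimal sketch of this step, assuming the standard mkfifo utility and a scratch directory (the workdir variable is only there for illustration, so that nothing existing is touched):

```shell
# Scratch directory for the demonstration (an assumption of this sketch).
workdir=$(mktemp -d)

# Create the named pipe; "fifo" is the name used in the text.
mkfifo "$workdir/fifo"

# List it: the first character of the mode string identifies the file type.
listing=$(ls -l "$workdir/fifo")
echo "$listing"
```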
Notice how the permissions of the fifo pipe start with p; this is the indicator that the file is a pipe.
Next, try to write to the fifo pipe:
As you can see, writing to the pipe causes… nothing to happen; the echo command never terminates. Until… a second process reads from the pipe with cat fifo.
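The whole exchange can be sketched non-interactively. This is only an illustration: the writer is started in the background so one shell can play both roles, and the workdir, writer_pid, writer_state and received names are made up for the sketch.

```shell
workdir=$(mktemp -d)
fifo="$workdir/fifo"
mkfifo "$fifo"

# Writer: opening the pipe for writing blocks until a reader opens it too.
echo "Hello, world!" > "$fifo" &
writer_pid=$!

# Give the writer a moment, then check whether it is still around.
# kill -0 sends no signal; it only tests that the process exists.
sleep 1
if kill -0 "$writer_pid" 2>/dev/null; then
    writer_state="blocked"
else
    writer_state="finished"
fi
echo "writer after one second: $writer_state"

# Reader: opening the other end releases the writer and receives its message.
received=$(cat "$fifo")
echo "reader got: $received"

wait "$writer_pid"
rm -r "$workdir"
```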
FIFO pipes are IPC mechanisms which don’t store data on disk; therefore, any attempt to write to a pipe must be paired with a process on the other side that is ready to read, which in this case was the cat command. Similarly, if we had instead run cat fifo as the first command, it would have been blocked until a process wrote to the fifo pipe; the blocking is bidirectional.
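The reverse order can be sketched the same way. Again, this is just an illustration with made-up names (workdir, reader_pid, reader_state, out.txt): the reader starts first and stays blocked until something writes.

```shell
workdir=$(mktemp -d)
fifo="$workdir/fifo"
mkfifo "$fifo"

# Reader first: cat blocks while opening the pipe, since no writer exists yet.
cat "$fifo" > "$workdir/out.txt" &
reader_pid=$!

sleep 1
if kill -0 "$reader_pid" 2>/dev/null; then
    reader_state="blocked"
else
    reader_state="finished"
fi
echo "reader after one second: $reader_state"

# Writer: this releases the reader, which copies the message and then sees EOF.
echo "late message" > "$fifo"
wait "$reader_pid"

received=$(cat "$workdir/out.txt")
echo "reader got: $received"
rm -r "$workdir"
```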
As you can maybe imagine, this is incredibly useful for managing functions executed in parallel, since FIFO pipes provide an effective way to block commands on one another. But, as an additional bonus, we can exploit their behaviour to fake a file.
Fake, dynamic files
Instead of explaining step-by-step how you can create fake files, here’s a script which does all that for you:
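What follows is a rough sketch of such a script, reconstructed from the description in this post rather than copied from anywhere. The fifo.sh name and the input and fifo variables come from the text; the child variable, the cleanup function and the scratch files in the demo part are assumptions of the sketch. One deliberate choice: cat runs in the background and the script waits on it, so that a termination signal is handled straight away instead of only after the next reader shows up.

```shell
workdir=$(mktemp -d)

# Write the sketch to a file so it can be run like the fifo.sh in the text.
cat > "$workdir/fifo.sh" <<'EOF'
#!/bin/sh
# Usage: fifo.sh INPUT FIFO
input=$1
fifo=$2
child=

# Create the pipe first, so a failure here never deletes an existing file.
mkfifo "$fifo" || exit 1

cleanup() {
    # Stop a writer that is still waiting for a reader, then remove the pipe.
    [ -n "$child" ] && kill "$child" 2>/dev/null
    rm -f "$fifo"
}
trap cleanup EXIT
trap 'exit 1' INT TERM HUP

while true; do
    # Blocks until a reader opens the pipe, then sends the whole file.
    cat "$input" > "$fifo" &
    child=$!
    wait "$child"
done
EOF

# Demo: serve a small file and read it twice through the pipe.
printf 'hello from the input file\n' > "$workdir/input.txt"
sh "$workdir/fifo.sh" "$workdir/input.txt" "$workdir/pipe" &
server_pid=$!

# Wait until the script has created the pipe.
until [ -p "$workdir/pipe" ]; do sleep 1; done

first_read=$(cat "$workdir/pipe")
second_read=$(cat "$workdir/pipe")
echo "first:  $first_read"
echo "second: $second_read"

# Let the script settle into its next wait, then stop it;
# the EXIT trap removes the pipe on the way out.
sleep 1
kill "$server_pid"
wait "$server_pid" || true
```

The demo runs the script with sh so it works even where the scratch directory does not allow executables; with the file saved and marked executable, it would be started as ./fifo.sh instead.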
I’ve tried to make this script as robust as possible, so no matter what, the FIFO pipe is removed once the script exits, and otherwise, the FIFO pipe is always available to be read without any downtime.
When running this script, cat /path/to/fifo-pipe will act almost identically to cat /path/to/file. This is thanks to the cat "$input" > "$fifo" line in the while-loop which ensures every read of the FIFO pipe is answered with the contents of the input file.
To be clear, there are differences between the original file and the pipe; it’s not the same as a symlink (and even they differ in some ways too). However, most tools don’t know or care about the differences.
But uh, how does this help us with our problem exactly?
Extracting files from TAR archives
Simple: We just replace the cat "$input" > "$fifo" command at the end of the while-loop with the following:
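A sketch of what that replacement line might look like, wrapped in a small self-contained demo. The file variable (the path inside the archive) and the scratch archive are assumptions of the sketch, and it relies on a tar that detects compression by itself when reading, as GNU tar and bsdtar do.

```shell
workdir=$(mktemp -d)

# Build a small compressed archive to extract from (demo data only).
mkdir "$workdir/src"
printf 'inner file contents\n' > "$workdir/src/notes.txt"
tar -czf "$workdir/archive.tar.gz" -C "$workdir/src" notes.txt

input="$workdir/archive.tar.gz"   # path to the archive
fifo="$workdir/pipe"              # path to the FIFO pipe
file="notes.txt"                  # path of the file inside the archive
mkfifo "$fifo"

# The replacement line: extract one member to STDOUT and send it to the pipe.
# It runs in the background here only so this demo can also act as the reader.
tar -xOf "$input" "$file" > "$fifo" &

received=$(cat "$fifo")
wait
echo "read through the pipe: $received"
rm -r "$workdir"
```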
tar command options:
-x: Extract files from the archive.
-f: The input path to the archive.
-O: Write the output to STDOUT (to then redirect to the FIFO pipe).
With this change, the syntax of the script now looks like this:
./fifo.sh /path/to/archive /path/to/fifo-pipe /path/to/file-inside-archive &
Notice the & character at the end, which runs the script as a background process; otherwise it would occupy your shell, since the script never terminates on its own.
So, to finally get back to the original scenario, you could execute this command:
./fifo.sh wrapper.tar.gz archive-1.tar archive-1.tar &
…and then repeat this for archive-2.tar too. Once that’s done, you can read from archive-1.tar & archive-2.tar as if they’re real files; however, the actual file contents are still within the compressed TAR archive wrapper.
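To make the scenario concrete, here is an end-to-end sketch under a few assumptions: the two input archives are tiny stand-ins created on the spot, the pipe is served once rather than in a loop, and tar -t plays the part of the application reading the archive. The archive and wrapper names follow the text; everything else is made up for the demo.

```shell
workdir=$(mktemp -d)

# Two stand-in archives with partly shared content (demo data only).
mkdir "$workdir/a" "$workdir/b"
printf 'shared line\nonly in one\n' > "$workdir/a/data.txt"
printf 'shared line\nonly in two\n' > "$workdir/b/data.txt"
tar -czf "$workdir/archive-1.tar.gz" -C "$workdir/a" data.txt
tar -czf "$workdir/archive-2.tar.gz" -C "$workdir/b" data.txt

# Decompress both, wrap the plain TAR files, and compress the wrapper once.
gzip -dc "$workdir/archive-1.tar.gz" > "$workdir/archive-1.tar"
gzip -dc "$workdir/archive-2.tar.gz" > "$workdir/archive-2.tar"
tar -czf "$workdir/wrapper.tar.gz" -C "$workdir" archive-1.tar archive-2.tar
rm "$workdir/archive-1.tar" "$workdir/archive-2.tar"

# Put a pipe where archive-1.tar used to be and feed it from the wrapper.
mkfifo "$workdir/archive-1.tar"
tar -xOf "$workdir/wrapper.tar.gz" archive-1.tar > "$workdir/archive-1.tar" &

# A consumer lists the "archive" without knowing it is reading a pipe.
listing=$(tar -tf "$workdir/archive-1.tar")
wait
echo "members seen through the pipe: $listing"
rm -r "$workdir"
```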
Solved!
So uh, why did you need-
OKAY FINE! I was trying to combine the /var/lib/pacman/sync/*.db database files, which are just Gzip-compressed TAR archives, into one compressed archive to see if it would save space.
Did it work?
Yes, if you ignore the problem of writing back to the databases inside the wrapper archive.
Was it worth it?
Well… no, but you got a blog post out of it, so eh, it could be worse.