File Descriptors and Redirects in POSIX shell

Take a look at the following.

{ cmd1 3>&- | tee /dev/fd/4 4>&1 >&3 3>&- | cmd2 3>&-; } 3>&1

Uh… what?

Background

While setting up this site I encountered an interesting problem. I knew I wanted a simple static site, nothing heavy to operate. I also didn’t want to force readers to run JavaScript for a good experience. Still, I wanted to collect some analytics to gauge what was happening. I searched for JavaScript-free analytics and discovered a neat little tool: GoAccess. It will generate analytics from the server access logs and store them on local disk. Perfect!

The documentation mentions a couple of ways to run goaccess: from a cron job, or as a service that collects real-time data and opens WebSockets for communication. The former is an out-of-band process disconnected from log rotation, so how would I pick a sensible frequency for the job? The latter was too complicated for my needs.

The tool I use for managing and rotating logs, s6-log, supports running a “processor” during rotation, for tasks like compressing the rotated logs with gzip. This is exactly what I wanted: it couples the log rotation event with a background parsing run for analytics. Rotation is the perfect time to run goaccess.

The Challenge

s6-log’s documentation says the following about a custom processor.

When a rotation occurs, current (which has then been renamed previous) is fed to processor’s stdin, and processor’s stdout is saved and archived.

So s6-log rotates the current active log to a new file called previous. Then it feeds the contents of previous to the standard input of your custom processor. It then waits for your processor to spit out the “processed” logs for archival.
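To make that contract concrete, here is a rough simulation of it in plain shell. The run_processor helper, the passthrough function, and the file names are all made up for illustration; this is a sketch of the behavior described in the quote, not how s6-log is implemented.

```shell
#!/bin/sh
# Hypothetical helper mimicking the quoted contract: feed the rotated log to
# the processor's stdin and save whatever the processor writes to stdout.
run_processor() {
    previous=$1
    archive=$2
    shift 2
    "$@" <"$previous" >"$archive"
}

# Simplest possible processor: hand the log back unchanged.
passthrough() { cat; }

workdir=$(mktemp -d)
printf 'GET / 200\nGET /about 200\n' >"$workdir/previous"
run_processor "$workdir/previous" "$workdir/archived" passthrough
cat "$workdir/archived"    # the archive holds exactly what the processor printed
rm -rf "$workdir"
```

Swap passthrough for something that prints nothing and the archive ends up empty, which is the trap with a tool that only consumes its input.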

goaccess can accept logs on its standard input file descriptor for parsing, but it has no option for also outputting them afterwards.

I needed a way to split the log output.

There’s a nice utility called tee that can take input and send it to both a file and to standard out. But the typical way of sending output to another process is by using standard out like so:

echo something | tee /somefile | goaccess

That doesn’t give me what I want, because tee’s standard output is used up by the pipe into goaccess. I need to keep a copy of the log on standard output so it can still be handed back to s6-log.
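Here is that dead end as a small sketch you can run. goaccess is replaced by a hypothetical stand-in called consumer that swallows its input, and the side file is a throwaway temp file, so treat this as an illustration rather than my actual setup.

```shell
#!/bin/sh
# Hypothetical stand-in for goaccess: reads the log on stdin, prints nothing.
consumer() { cat >/dev/null; }

naive_processor() {
    # tee's stdout is the pipe into the consumer, so the only surviving copy
    # of the log is the side file, not the pipeline's own stdout.
    tee "$1" | consumer
}

side_file=$(mktemp)
archived=$(printf 'GET / 200\nGET /about 200\n' | naive_processor "$side_file")
echo "stdout of the pipeline: '${archived}'"
echo "copy in the side file:"
cat "$side_file"
rm -f "$side_file"
```

The pipeline's stdout comes back empty, so a processor written this way would give s6-log nothing to archive.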

The Solution

Looking around to see if anyone else had encountered this scenario, I found a nice Stack Exchange page. And while the accepted answer may be good for some, it wouldn’t work for my setup, partly because my shell (ash from busybox) didn’t like the syntax. Instead, it was this comment on the accepted answer that caught my eye.

And that is the same shell snippet you saw at the beginning of this post. I needed to adapt it slightly to work with s6-log, partly because it is already using fds 3 and 4.

Here’s the final ‘processor’ script:

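#!/bin/sh
# fd 6 saves the processor's real stdout (the copy s6-log archives).
# tee sends one copy into the pipe via fd 7 and one to that saved stdout.
{ tee /dev/fd/7 7>&1 >&6 6>&- <&0 | \
  goaccess - -a -o /srv/report.html >/dev/null 2>&1 6>&-; } 6>&1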

I saved that to /usr/local/bin/goaccess_processor.sh and then set up s6-log to use it like this:

s6-log -b -- !"/usr/local/bin/goaccess_processor.sh" /var/log/nginx

This works. But how? What are all those difficult-to-read redirects doing?

I was going to break it down bit by bit here, but existing tools already do that really well. Here is one: explainshell.com
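If you would rather poke at it locally, here is a stand-in version of my processor with the redirects annotated. It assumes a system that provides /dev/fd (Linux does), and fake_analyzer is a hypothetical replacement for goaccess that just counts lines into a report file.

```shell
#!/bin/sh
# Hypothetical stand-in for goaccess: counts the lines it receives and
# writes the count to the file named by its first argument.
fake_analyzer() { wc -l >"$1"; }

split_log() {
    {
        # Inside the braces, fd 6 is a copy of the original stdout.
        #   7>&1  fd 7 = tee's current stdout, i.e. the pipe into the analyzer
        #   >&6   tee's stdout = the original stdout saved on fd 6
        #   6>&-  fd 6 has done its job, so close it
        # tee then writes every line to stdout and to /dev/fd/7 (the pipe).
        tee /dev/fd/7 7>&1 >&6 6>&- |
            fake_analyzer "$1" 6>&-
    } 6>&1
}

report=$(mktemp)
printf 'GET / 200\nGET /about 200\n' | split_log "$report"   # log comes back on stdout
cat "$report"                                                 # line count from the analyzer
rm -f "$report"
```

Redirections on a command are applied left to right, which is why 7>&1 has to come before >&6: fd 7 must capture the pipe before stdout is pointed somewhere else.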