Take a look at the following.
{ cmd1 3>&- | tee /dev/fd/4 4>&1 >&3 3>&- | cmd2 3>&-; } 3>&1
Uh… what?
Background
While setting up this site I encountered an interesting problem. I knew I wanted a simple static site, nothing heavy to operate. I also didn’t want to force readers to run JavaScript for a good experience. Still, I wanted to collect some analytics to gauge what was happening. I searched for JavaScript-free analytics and discovered a neat little tool: GoAccess. It will generate analytics from the server access logs and store them on local disk. Perfect!
The documentation mentions a couple of ways to run goaccess. You could run it
in a cron job, or as a long-running service that collects data in real time and
communicates over WebSockets. The former is an out-of-band process, disconnected
from log rotation; how would I pick the right frequency for the job? The latter
was too complicated for my needs.
The tool I use for managing and rotating logs, s6-log,
supports running a “processor” during rotation, for tasks like compressing the
logs with gzip. This was exactly what I wanted: it couples each log rotation
with a background parsing run for analytics, and rotation is the perfect time
to run it.
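To make the idea concrete, here is what a processor for the gzip case could look like. It is a minimal sketch, assuming `gzip` is on the PATH; the function name `gzip_processor` and the sample log lines are made up, and the last lines only check that the output decompresses back to the input.

```shell
#!/bin/sh
# Minimal processor sketch: read the rotated log on stdin and write a
# compressed copy to stdout, which is the part that gets archived.
gzip_processor() {
    gzip -nc   # -n: do not store a name/timestamp, -c: write to stdout
}

# Round-trip check with made-up log lines.
sample='127.0.0.1 - - "GET / HTTP/1.1" 200
127.0.0.1 - - "GET /about/ HTTP/1.1" 200'
restored=$(printf '%s\n' "$sample" | gzip_processor | gzip -dc)

if [ "$restored" = "$sample" ]; then
    echo "round trip ok"
fi
```

A processor is just a filter from stdin to stdout, so anything that behaves that way can fill the slot.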
The Challenge
s6-log’s documentation says the following about a custom processor.
When a rotation occurs, current (which has then been renamed previous) is fed to processor’s stdin, and processor’s stdout is saved and archived.
So s6-log renames the active log, current, to previous. Then it feeds the
contents of previous to the standard input of your custom processor and waits
for the processor to spit out the “processed” logs on its standard output for
archival.
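That contract is easy to mimic locally, which helps when testing a processor without a running s6-log. The sketch below models only the data flow described in the quote, not how s6-log is implemented: `simulate_rotation` and the `archive` file name are hypothetical, and s6-log picks its own archive names.

```shell
#!/bin/sh
# Rough model of the processor contract, for local testing only.
# simulate_rotation is a hypothetical helper; s6-log itself does
# considerably more, none of which is modeled here.
simulate_rotation() {
    logdir=$1
    shift
    # Rotation: the active log "current" is renamed to "previous"...
    mv "$logdir/current" "$logdir/previous"
    # ...and logging continues into a fresh, empty "current".
    : > "$logdir/current"
    # "previous" is fed to the processor's stdin; its stdout is kept.
    "$@" < "$logdir/previous" > "$logdir/archive"
}

dir=$(mktemp -d)
printf 'line one\nline two\n' > "$dir/current"
simulate_rotation "$dir" cat    # cat = a processor that changes nothing
cat "$dir/archive"
```

Swapping `cat` for any other filter shows what would end up archived.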
goaccess can read logs from its standard input for parsing, but it has no
option to pass them through to standard output afterwards. I needed a way to
split the log stream.
There’s a nice utility called tee that copies its input to both a file and
standard output. But the usual way of sending output to another process is
through standard output, like so:
echo something | tee /somefile | goaccess
That doesn’t give me what I want, because tee’s standard output in the middle
is used up by the pipe into goaccess, and goaccess doesn’t pass the logs on. I
need a copy to keep flowing to the processor’s own standard output so s6-log
can still archive it.
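Here is the problem in miniature. A placeholder function stands in for goaccess (an assumption so this runs anywhere); like goaccess, it consumes its input and passes nothing on. The file gets its copy, but the pipeline's standard output, which is what s6-log would archive, stays empty.

```shell
#!/bin/sh
# Placeholder for goaccess: it consumes stdin and passes nothing through.
fake_goaccess() {
    wc -l > /dev/null
}

copy=$(mktemp)
# The naive pipeline from above, with the placeholder at the end.
out=$(printf 'a\nb\nc\n' | tee "$copy" | fake_goaccess)

echo "file copy: $(tr '\n' ' ' < "$copy")"
echo "pipeline stdout: '$out'"   # empty: nothing left for s6-log to archive
```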
The Solution
Looking around to see if anyone else had encountered this scenario, I found a
nice Stack Exchange page.
And while the accepted answer may be good for some, it wouldn’t work for my
setup, partly because my shell (ash from BusyBox) didn’t like the syntax.
Instead, it was this comment
on the accepted answer that caught my eye.
That is the same shell snippet you saw at the beginning of this post. I
needed to adapt it slightly to work with s6-log, partly because s6-log already
uses fds 3 and 4.
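To see that snippet actually split a stream, here it is with placeholder commands: `printf` plays `cmd1`, and `wc -l` writing to a temporary file plays `cmd2`. Both are stand-ins of my choosing, not part of the original comment.

```shell
#!/bin/sh
# Stand-ins: cmd1 -> printf emitting three "log lines",
#            cmd2 -> wc -l writing its count to a temp file.
count_file=$(mktemp)

# 3>&1 on the group: fd 3 becomes a saved copy of the real stdout.
# In tee: 4>&1 points fd 4 at the pipe into cmd2, >&3 points stdout
# back at the saved copy, and 3>&- drops the now-redundant fd 3.
out=$({ printf 'a\nb\nc\n' 3>&- |
        tee /dev/fd/4 4>&1 >&3 3>&- |
        wc -l > "$count_file" 3>&-; } 3>&1)

echo "real stdout got: $(printf '%s' "$out" | tr '\n' ' ')"
echo "cmd2 counted: $(cat "$count_file")"
```

Closing fd 3 in `cmd1` and `cmd2` keeps them from holding an extra copy of the real stdout open, which would otherwise delay end-of-file for whatever reads it.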
Here’s the final ‘processor’ script:
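#!/bin/sh
# 6>&1: fd 6 saves the script's stdout, which s6-log archives.
# tee writes stdin to /dev/fd/7 (7>&1: the pipe into goaccess) and to its
# own stdout, which >&6 points back at the saved copy.
{ tee /dev/fd/7 7>&1 >&6 6>&- <&0 | \
goaccess - -a -o /srv/report.html >/dev/null 2>&1 6>&-; } 6>&1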
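The processor script can be exercised without goaccess installed. Below, its body is wrapped in a function with goaccess swapped for a `wc -l` placeholder and a temporary file standing in for /srv/report.html; both swaps are assumptions for testing, not part of the real setup. The point is that what comes out on stdout, the part s6-log archives, is identical to what went in.

```shell
#!/bin/sh
# Test-only swaps (assumptions): goaccess -> wc -l,
# /srv/report.html -> a temp file.
report=$(mktemp)

processor() {
    # fd 6 = saved stdout (what s6-log archives); tee writes one copy
    # to /dev/fd/7 (the pipe into the placeholder) and one to fd 6's target.
    { tee /dev/fd/7 7>&1 >&6 6>&- <&0 |
        wc -l > "$report" 2>&1 6>&-; } 6>&1
}

input='GET / 200
GET /feed.xml 200'
archived=$(printf '%s\n' "$input" | processor)

echo "archived: $(printf '%s' "$archived" | tr '\n' ' ')"
echo "placeholder counted: $(cat "$report")"
```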
I saved that to /usr/local/bin/goaccess_processor.sh and then set up s6-log
to use it like this:
s6-log -b -- !"/usr/local/bin/goaccess_processor.sh" /var/log/nginx
This works. But how? What are all those difficult-to-read redirects doing?
I was going to try to break it down bit by bit here, but there are existing tools that do that really well. Here is one: explainshell.com