[cs631apue] extra credit / revline

Jan Schaumann jschauma at stevens.edu
Tue Dec 20 23:02:36 EST 2011


Hello,

Thanks for submitting your extra credit solutions.  They were all very
good.  The key to this problem lies in the definition of the problem --
recall that the problem description said that you should be able to
handle "arbitrarily large" input files.  That is, the file might be 4GB
in size or more.

That means, you can't just read it into an array and then just traverse
the array backwards to print the contents.

Some of you attempted to solve the problem by recursively getting lines
from the file, so that once the end is reached the whole stack unloops
and you print the lines in reverse order.  I like that approach, it's
fairly elegant, but the problem is that you're still bound by your stack
size (since you can't do true tail-recursion, which the compiler could
optimize -- you have to retain the buffer and print it before
returning).

Note also that not only can the entire file be arbitrarily large, the
lines may also be arbitrarily long.  That is, the file might contain a
single line that is 5 bazillion characters long.  You can't just create
a buffer and say "oh, surely no line will be longer than 1024 characters
- that'd just be crazy".

So the only approach to handle *arbitrarily large* files (and lines) is
to painstakingly seek backwards from the end of the file one character
at a time, check if that is a '\n', and if so, print all the characters
between your current position and the last '\n' (or EOF).

Please note that there exists a tool called tac(1) on some systems that
does exactly what revline(1) does (since "tac" is "cat" backwards).  You
can look at the source for this tool (I believe it's in Linux's
"coreutils" package) to find that it does precisely this.

This tool also accepts "-" or input on STDIN, much like cat(1), which is
kind of surprising, since we know we can't seek on STDIN.  tac(1) checks
if it can seek, and if not it first writes the input to a temporary
file, which it then processes as any other file and unlinks after it's
done.  This isn't very elegant (and prone to failure if the input on
STDIN is large enough to fill up the disk space, for example), but the
only way for this tool to work without reading contents into memory.

-Jan


More information about the cs631apue mailing list