[cs615asa] Wrong Answers

Fri Apr 6 18:04:32 EDT 2018

To keep all of you from wasting time, here are the answers that I came
up with.  You can use them as a basis for checking your method (many
other ways of finding the answers exist) or for discussing which answer
is the right one:

Jason G Ajmo <jajmo at stevens.edu> wrote:

> 1:
> # gzcat data.gz | grep "^en[\. ]" | wc -l
>  2233318

That's correct.  I don't know why I had the wrong answer, but most
likely I used a different dataset when I tried to reconstruct the
assignment.  I'm sorry about that.

FYI: You can save yourself a few extra processes by using a single grep
command:

zgrep -c "^en[\. ]" pagecounts-20160803-090000.gz
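
If you want to convince yourself that the two forms count the same
lines, here is a minimal sketch.  The sample lines and the temporary
file are made up for illustration; only the four-column layout mirrors
the real data.  I've written the bracket expression as [. ] since the
backslash isn't needed inside brackets.

```shell
# Build a tiny made-up sample in the pagecounts layout and compress it.
tmp=$(mktemp -d)
printf '%s\n' 'en Foo 10 1000' 'en.b Bar 25 500' 'de Baz 99 10' 'eng Nope 1 1' |
	gzip > "$tmp/sample.gz"

# The long way: decompress, filter, count (tr strips wc's padding).
pipeline=$(gzip -dc "$tmp/sample.gz" | grep '^en[. ]' | wc -l | tr -d ' ')

# The short way: zgrep decompresses, and -c counts the matching lines.
single=$(zgrep -c '^en[. ]' "$tmp/sample.gz")

echo "$pipeline $single"
rm -rf "$tmp"
```

Both numbers should agree; only the 'en' and 'en.b' lines match.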

> 2:
> # gzcat data.gz | grep "^en[\. ]" | awk '{ print $2 " " $(NF - 1) }' | sort -nrk 2 | head -n 1
> en 3127515

I think that answer should be right, too.  I ended up doing:

gzcat data.gz | awk '/^en[\. ]/ { print $(NF - 1) " " $0 }' | sort -n | tail -1

By using a /regex/ pattern in awk, I avoid the extra grep(1) process.
If you want, you can do the whole thing in awk, too, including the
sorting and extraction, but it becomes less pipe-liney that way.
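
For the curious, an awk-only version might look roughly like the sketch
below.  It is one possible way, not necessarily the best one: instead of
sorting, it just remembers the largest count seen so far.  The function
name top_en and the sample lines are made up for illustration.

```shell
# Hypothetical helper: read pagecounts-style lines on stdin and print the
# request count followed by the full line of the most-requested "en" entry.
top_en() {
	awk '/^en[. ]/ {
		n = $(NF - 1) + 0	# request count, forced to a number
		if (n > max) { max = n; line = $0 }
	}
	END { if (line != "") print max " " line }'
}

# Made-up sample in the same layout: project, title, count, bytes.
printf '%s\n' 'en Foo 10 1000' 'en.b Bar 25 500' 'de Baz 99 10' 'en Qux 7 7000' |
	top_en
```

On the real data you would feed it with 'gzcat data.gz | top_en'.
Keeping a running maximum means there is nothing to sort, at the cost of
only ever getting the single top entry.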

For #3, I (now) ended up with:

gzcat data.gz | awk '/^en[\. ]/ { sum += $NF; } END { print sum ; }'

yielding '260243085755'.

For #4:

$ gzcat data.gz | awk '/^en[\. ]/ { sum += $(NF - 1); } END { print sum / 3600 ; }'
3256.12

For #5:

$ gzcat data.gz | awk '/^en[\. ]/ {s = $NF / $(NF - 1);
	if (s > l) { l = s; largest = $2; }}
	END { print largest; }'
Module:Syrian_and_Iraqi_insurgency_detailed_map/doc

The important lesson in this first part of the assignment is to always
seek clarification, to ask questions, and not to blindly trust that
your instructor (or the test framework you are using, which is the role
I play here) is necessarily correct.

Keep that in mind for the second and third parts of the assignment...

-Jan