[cs615asa] [CS615] HW5 Handling Garbage Input Properly

Matthew Gomez mgomez1 at stevens.edu
Fri Apr 6 15:45:23 EDT 2018


I believe en.mw is an aggregation of all the traffic to all mobile sites of
all projects. Also, I believe the link to the file format description is
wrong. I believe the dataset we’re working with is this:
https://wikitech.wikimedia.org/wiki/Analytics/Archive/Data/Pagecounts-raw
while the link on the assignment is to this:
https://wikitech.wikimedia.org/wiki/Analytics/Archive/Data/Pagecounts-all-sites
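
For reference, here is a rough Python sketch of the "first field begins
with 'en'" filter from the quoted message below. It assumes the
Pagecounts-raw layout of four space-separated fields per line (project
code, page title, request count, bytes). The function names and sample
lines are made up for illustration. Matching only "en" or "en.<suffix>",
rather than any string starting with "en", is my own assumption and not
something the assignment specifies.

```python
def is_english(project):
    """Assumed rule: 'en' itself or any 'en.<suffix>' code (e.g. 'en.mw')."""
    return project == "en" or project.startswith("en.")


def count_english_requests(lines):
    """Sum the request-count field over well-formed English lines."""
    total = 0
    for line in lines:
        fields = line.split()
        if len(fields) != 4:
            continue  # garbage: wrong number of fields
        project, _title, count, _size = fields
        if not is_english(project):
            continue
        try:
            requests = int(count)
        except ValueError:
            continue  # garbage: request count is not an integer
        if requests < 0:
            continue  # garbage: a negative count makes no sense
        total += requests
    return total


# Hypothetical sample lines, not taken from the real dataset.
sample = [
    "en Main_Page 10 2000",
    "en.mw Main_Page 5 1000",
    "eng Foo 7 100",          # starts with 'en' but is not 'en' or 'en.*'
    "de Hauptseite 3 500",
    "en broken line",         # only three fields
    "en.b Cookbook x 10",     # non-numeric count
]
print(count_english_requests(sample))
```

Swapping is_english for a plain project.startswith("en") check gives the
looser rule, which would also count the "eng" line.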

Matt

On Fri, Apr 6, 2018 at 2:59 PM Jan Schaumann <jschauma at stevens.edu> wrote:

> Matthew Gomez <mgomez1 at stevens.edu> wrote:
> > I'm not sure if we're supposed to be looking for 'en' or 'en.*'.
>
> Asking for anything that's not clear is a good approach! :-)
>
> Any and all programming assignments will make assumptions and require
> follow-up and clarification.
>
> "en" means all of English language, so it seems reasonable to say that
> "all en" means "all things that fall into the 'en' category, including
> e.g. en.m".
>
> So I think "lines where the first field begins with 'en'" is probably a
> good start.  (Are there domains starting with 'en' that designate a
> different domain?  I don't know off the top of my head.)
>
> > worked on it for about 6.5 hours yesterday and couldn't get the first
> > answer.
>
> I think you're overthinking the problem.  This shouldn't take you this
> long.
>
> Also keep in mind that it's entirely possible that whatever I claim as
> the correct answer is actually wrong and your answer is right.  If you
> get a wrong answer, and you don't understand why it's wrong, post here
> showing how you arrived at the answer.
>
> -Jan