[cs615asa] HM#N: Wen Zhang

Sat May 2 12:38:35 EDT 2015

Hi all, 

The following is my report for HW#N. 

 Last Thursday I went to the meet-up "A gentle intro to regex" with my
friends and learned a lot of important extra tips about regex. The main
purpose of the event was to give us a simple but effective set of
principles for writing regexes well.
 At the beginning of the session, the lecturer introduced the main uses
of regex: Validate, Route & Matching, Search & Replace (substitution),
and Parse. As system administrators, we are always analyzing the status
of groups of servers or automating the work of updating or configuring
the team's work environment, so regex is fundamentally useful in our
routine work. We use "Route & Matching" for analysis: e.g. if we want
to write a script that collects basic network information (IP, MASK,
MAC), it is very convenient to implement with regex. Search & Replace
is also our friend; the typical example is that we use vim all the time
to locate and modify strings via regex. However, the lecturer said that
"Parse" is a use case where regex is limited.
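
 As a rough illustration of the IP / MASK / MAC idea above, here is a
minimal sketch. The sample text imitates ifconfig-style output and is
made up for this example (real output differs between systems), and the
helper name firstGroup is my own, not something from the talk.

```javascript
// Hypothetical sample of ifconfig-style output; real output varies by OS.
const sample = [
  'eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500',
  '        inet 192.168.1.10  netmask 255.255.255.0  broadcast 192.168.1.255',
  '        ether 00:1a:2b:3c:4d:5e  txqueuelen 1000  (Ethernet)',
].join('\n');

// One small expression per field. These only check the shape of the
// value (four dotted groups of digits), not that each number is <= 255.
const ipRe = /inet (\d{1,3}(?:\.\d{1,3}){3})/;
const maskRe = /netmask (\d{1,3}(?:\.\d{1,3}){3})/;
const macRe = /ether ((?:[0-9a-f]{2}:){5}[0-9a-f]{2})/i;

// Return the first capture group, or null when the field is absent.
function firstGroup(re, text) {
  const m = text.match(re);
  return m ? m[1] : null;
}

const info = {
  ip: firstGroup(ipRe, sample),
  mask: firstGroup(maskRe, sample),
  mac: firstGroup(macRe, sample),
};

console.log(info);
```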
 Then the lecturer moved on to the most interesting part of the event.
He gave us an example: suppose we want to capture the video id (e.g.
dQw4w9WgXcQ) from the url
https://youtube.com/watch?v=dQw4w9WgXcQ. There are lots of ways to match
the specified string. However, we should follow two important rules to
write a comfortable and elegant regex.
1: match only what's necessary
E.g. url.match(/.+:\/\/.+\/.+v?=(\w+)/).pop()
 Here we match only what is necessary and spend no effort on the parts
of the url we do not need.
 The ugly one could be

/(?:youtube\.com\/\S*(?:(?:\/e(?:mbed))?\/|watch\?(?:\S*?&?v=))|youtu\.be\/)([a-zA-Z0-9_-]{6,11})/g;
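
 To compare the two, here is a small sketch that runs both expressions
against the example url. The backslashes are my reconstruction of the
patterns quoted above, and the variable names are only for this
illustration. The longer pattern is used without the g flag so that
match() returns the capture group.

```javascript
const url = 'https://youtube.com/watch?v=dQw4w9WgXcQ';

// Short pattern: skip to the "=" and capture the word characters after it.
const simple = /.+:\/\/.+\/.+v?=(\w+)/;

// Longer pattern: spells out the youtube.com, embed and youtu.be forms.
const thorough = /(?:youtube\.com\/\S*(?:(?:\/e(?:mbed))?\/|watch\?(?:\S*?&?v=))|youtu\.be\/)([a-zA-Z0-9_-]{6,11})/;

// match() returns [whole match, group 1]; pop() takes the last element.
const idSimple = url.match(simple).pop();
const idThorough = url.match(thorough).pop();

console.log(idSimple, idThorough);
```

 For this one url both give the same id; the short one is simply much
easier to read and to change later.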

2: your expression should be good at doing a single task
 We can split a complicated string into several chunks of simple,
easily handled strings, which means each regex can be simpler, clearer
and easier to understand. As system administrators, we are always
dealing with problems that follow similar patterns. If we can re-use a
snippet of code written earlier in our current project, it saves us a
large amount of time.
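
 A minimal sketch of this "split first, then match" idea. The
key=value status line is a format I made up for illustration; it is not
from the talk.

```javascript
// Hypothetical status line; the key=value format is an assumption.
const line = 'host=web01 ip=10.0.0.5 status=up';

// Step 1: split the complicated string into simple chunks.
const chunks = line.split(/\s+/);

// Step 2: one small, reusable expression that handles a single pair.
const pairRe = /^(\w+)=(\S+)$/;

const record = {};
for (const chunk of chunks) {
  const m = chunk.match(pairRe);
  if (m) record[m[1]] = m[2]; // m[1] is the key, m[2] is the value
}

console.log(record);
```

 The same pairRe can be re-used for any other line in this format,
which is the kind of snippet re-use described above.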

OTHER INTERESTING QUESTIONS FROM THE AUDIENCE
1: How do we scale regex in a production environment? E.g. an ugly regex
inside a loop statement like this:
(?:[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*
 | "(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21\x23-\x5b\x5d-\x7f]
 | \\[\x01-\x09\x0b\x0c\x0e-\x7f])*")
@
(?:(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?
 | \[(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}
 (?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?|[a-z0-9-]*[a-z0-9]:
 (?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21-\x5a\x53-\x7f]
 | \\[\x01-\x09\x0b\x0c\x0e-\x7f])+)
 \])
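Matching cost grows with both the input and the complexity of the
pattern (a complicated pattern that backtracks can be far slower than
linear in the input length), so it is better to write simple,
independent regexes that each deal with one piece of the string, which
eliminates unnecessary matching and saves time.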
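
 One way to read this advice, sketched under my own assumptions: define
the expression once outside the loop, and use a cheap substring test to
skip lines that cannot match. The log lines are hypothetical, and
emailRe is a deliberately simplified pattern, not the full expression
quoted above.

```javascript
// Hypothetical log lines for illustration.
const lines = [
  'login ok user=alice@example.com',
  'cron job finished',
  'login failed user=bob@example.org',
  'disk usage 71%',
];

// Simplified address pattern (an assumption, not RFC 5322 compliant),
// defined once so the loop body does not rebuild it.
const emailRe = /[\w.+-]+@[\w-]+(?:\.[\w-]+)+/;

const found = [];
for (const line of lines) {
  // Cheap check first: a line without "@" cannot contain an address.
  if (!line.includes('@')) continue;
  const m = line.match(emailRe);
  if (m) found.push(m[0]);
}

console.log(found);
```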
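2: In what order does a regex turn a string into an array of strings?
E.g. how about this regex: /.+:\/\/.+\/.+v?=(\w+)/
The result array is ordered like a stack: the whole match comes first at
index 0, followed by the captured groups, so .pop() returns the last
captured group.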
 ['https://youtube.com/watch?v=dQw4w9WgXcQ', // index 0
 'dQw4w9WgXcQ'] // index 1 
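
 To make the ordering easier to see, here is a small sketch with two
extra capture groups added to the pattern. The extra groups are my own
addition for illustration and were not part of the talk's example.

```javascript
const url = 'https://youtube.com/watch?v=dQw4w9WgXcQ';

// Same idea as above, but the scheme and host are captured as well.
const m = url.match(/(.+):\/\/(.+)\/.+v?=(\w+)/);

// Index 0 is the whole match; groups follow in the order their
// opening parentheses appear in the pattern.
const whole = m[0];
const scheme = m[1];
const host = m[2];
const id = m[3];

// pop() on a copy returns the last element, i.e. the last group.
const last = [...m].pop();

console.log(whole, scheme, host, id, last);
```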

REFERENCE:
Event slides: http://moimikey.github.io/a-gentle-intro-to-regex/#/
github: https://github.com/moimikey/a-gentle-intro-to-regex
Challenge: https://gist.github.com/jdaudier/743d2b56091e688702d8 

Wen Zhang
CWID: 10402152
Contact:201-565-6871 