printing specific fields from a file


Sam

Jul 20, 2016, 7:54:50 PM
to scala-user
While learning, I'm trying to read a web log and extract a few fields from it. The web log looks like this:

147.172.225.10 - 16401 [16/Sep/2013:23:52:35 +0100] "GET /KBDOC-00057.html HTTP/1.0" 200 11761 "http://www.newbie.com"  "test F20L"
147.172.225.10 - 16401 [16/Sep/2013:23:52:35 +0100] "GET /theme.css HTTP/1.0" 200 12353 "http://www.newbie.com"  "test Mobile Browser Sorrento F20L"
23.53.29.101 - 32693 [16/Sep/2013:23:49:50 +0100] "GET /KBDOC-00035.html HTTP/1.0" 200 9337 "http://www.newbie.com"  "test Mobile Browser i3"

I need to extract just the IP address and the user ID (3rd field) from each line and print them as

147.172.225.10/16401
147.172.225.10/16401
23.53.29.101/32693

I'm thinking of running a flatMap over the lines, splitting each one on ' ', picking fields 1 and 3, and printing them with a / in between.
Could someone show me how to do this, and let me know if there is a better way to accomplish it? Thanks in advance!

Kevin Wright

Jul 21, 2016, 4:31:02 AM
to Sam, scala-user
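import scala.io.Source

val src = Source.fromFile("<your file name>")
val splitLines = src.getLines() map (_ split " ")  // is an Iterator[Array[String]]; fields(1) is the "-" placeholder
val output = splitLines map { fields => fields(0) + "/" + fields(2) } // is an Iterator[String], e.g. 147.172.225.10/16401
output foreach println  // consumes the iterator, so the file is actually read here
src.close()             // close only after all processing has finished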

Just be wary of the iterators here. Instead of loading the entire file into memory, lines are read and parsed on demand. This is great for performance and keeps memory pressure low, but it also means that you must be careful to close the source only after you’ve finished processing it.
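
One way to make the "close only when finished" rule hard to get wrong is to force the lazy iterator into a strict collection inside a try/finally. This is only a sketch: the object name LogFields and the helpers ipAndUser and readIpAndUser are made up for illustration, and it assumes the file is small enough to hold the results in memory.

```scala
import scala.io.Source

object LogFields {
  // Hypothetical helper: "ip/userId" for a line with at least three
  // space-separated fields, None for anything shorter (e.g. a blank line).
  def ipAndUser(line: String): Option[String] = {
    val fields = line.split(" ")
    if (fields.length >= 3) Some(fields(0) + "/" + fields(2)) else None
  }

  // Hypothetical helper: reads every line eagerly, then closes the file.
  def readIpAndUser(path: String): List[String] = {
    val src = Source.fromFile(path)
    try {
      // toList forces the whole file to be read before finally runs
      src.getLines().flatMap(line => ipAndUser(line)).toList
    } finally {
      src.close()
    }
  }
}
```

Usage would be something like LogFields.readIpAndUser("<your file name>") foreach println. The flatMap over an Option also silently skips malformed lines, which may or may not be what you want.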

You’ll also want to avoid using those iterators more than once: an iterator is consumed by its first traversal, so a second pass will find nothing left.
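
To illustrate the single-use behaviour, here is a small sketch that uses an in-memory list in place of the file. The object name IteratorReuse, the sample lines (shortened from the log excerpt in the question) and the format helper are assumptions for the example.

```scala
object IteratorReuse {
  // Stand-in for the file contents: shortened lines from the log excerpt.
  val lines = List(
    "147.172.225.10 - 16401 [16/Sep/2013:23:52:35 +0100]",
    "23.53.29.101 - 32693 [16/Sep/2013:23:49:50 +0100]"
  )

  // Hypothetical helper: joins the 1st and 3rd space-separated fields.
  def format(line: String): String = {
    val fields = line.split(" ")
    fields(0) + "/" + fields(2)
  }

  // Materialised copy: safe to traverse as many times as needed.
  val formatted: Vector[String] = lines.iterator.map(format).toVector

  def main(args: Array[String]): Unit = {
    val once = lines.iterator.map(format)
    once foreach println   // first pass prints both entries
    println(once.hasNext)  // prints "false": the iterator is exhausted

    formatted foreach println  // a Vector can be traversed again...
    println(formatted.size)    // ...and again
  }
}
```

If you need the results more than once, converting to a Vector or List like this trades the memory savings of the lazy iterator for reusability.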



Sam

Jul 22, 2016, 2:16:42 PM
to scala-user, sam...@gmail.com
Kevin,
Thanks so much for the detailed explanation!