regular expressions how-to: recognize lines?


Peter Mancini

May 29, 2013, 12:18:45 PM
to clo...@googlegroups.com
(def testcase "Line 1\nLine 2\nTarget Line\nLine 4\nNot a target line")
(println testcase)
(re-seq #"(?i)^target" testcase)
(re-seq #"(?i)target" testcase)

The third form, (re-seq #"(?i)^target" testcase), finds nothing. It should find the first word of the third line. Ultimately I'd like #"(?i)^Target.*$" to work in finding the entire line. I am confused about why this is failing. Where do I find all the switches? I only found (?i) because of comments. Where is it in the documentation? Thanks!

P.S. I did read the Java documentation and that wasn't much help.

Alan Thompson

May 29, 2013, 12:28:19 PM
to clo...@googlegroups.com
I am new to re-seq, but this is progress: 

(re-seq #"(?i)target.*" testcase)
("Target Line" "target line")



--
--
You received this message because you are subscribed to the Google
Groups "Clojure" group.
To post to this group, send email to clo...@googlegroups.com
Note that posts from new members are moderated - please be patient with your first post.
To unsubscribe from this group, send email to
clojure+u...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en
---
You received this message because you are subscribed to the Google Groups "Clojure" group.
To unsubscribe from this group and stop receiving emails from it, send an email to clojure+u...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.
 
 

Alan Thompson

May 29, 2013, 12:42:57 PM
to clo...@googlegroups.com
Here it is:

(def lines (clojure.string/split "Line 1\nLine 2\nTarget Line\nLine 4\nNot a target line" #"\n"))

user=>  (doseq [line lines]
  #_=>    (if-let [match (re-find #"(?i)^targ.*" line)]
  #_=>      (println match)))
Target Line
nil
user=>

I normally find it easiest to process input like this one line at a time.  You must then add ".*" to the search pattern to match the remainder of the line if you want that returned (try leaving it off to see the difference).
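
To make that difference concrete, here is a minimal sketch that runs the same pattern with and without the trailing ".*" on a single line. The var names (line, prefix-only, whole-line) are made up for illustration and are not from the session above.

```clojure
;; Illustrative names only; not part of the original REPL session.
(def line "Target Line")

;; Without ".*", re-find returns just the text the pattern consumed.
(def prefix-only (re-find #"(?i)^targ" line))    ; "Targ"

;; With ".*", the match runs on to the end of the line.
(def whole-line (re-find #"(?i)^targ.*" line))   ; "Target Line"

(println prefix-only)
(println whole-line)
```

Either form is truthy for a matching line, so the if-let test behaves the same; only the returned text changes.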

Alan

Peter Mancini

May 29, 2013, 12:43:37 PM
to clo...@googlegroups.com
There has to be some way to turn on line recognition. It's a basic function of regex. I know the string has lines, I can even use clojure.string/split-lines on it. I shouldn't have to do that and map against it. It should be built into the regular expression system. I'm certain my problem is ignorance and not some oversight in the design of the system.

Alan Thompson

May 29, 2013, 12:46:21 PM
to clo...@googlegroups.com
Also, I saw what you meant that the docs are not quite complete.  Perhaps you would like to add this example to ClojureDocs?  http://clojuredocs.org/clojure_core/clojure.core/re-find

Also, note that re-seq is aimed at finding a sequence of results within a single string.  For this problem, you really want to process a sequence of strings (broken up by line) and just filter out the ones you want to keep.  So, re-find seems like the right tool here.
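
A small sketch of the distinction, using the test string from the original question. The var names all-matches and first-match are only for illustration.

```clojure
(def testcase "Line 1\nLine 2\nTarget Line\nLine 4\nNot a target line")

;; re-seq: every match found in the one string, as a sequence.
(def all-matches (re-seq #"(?i)target" testcase))    ; ("Target" "target")

;; re-find: only the first match in the string (nil if there is none).
(def first-match (re-find #"(?i)target" testcase))   ; "Target"

(println all-matches)
(println first-match)
```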

Alan

Neale Swinnerton

May 29, 2013, 12:45:55 PM
to clo...@googlegroups.com
You need to pass the multiline 'm' flag to the regex. Some variant of:

(def testcase "Line 1\nLine 2\nTarget Line\nLine 4\nNot a target line")
(println testcase)
(re-seq #"(?im)^target" testcase)
(re-seq #"(?im)target" testcase)
#'user/testcase
Line 1
Line 2
Target Line
Line 4
Not a target line
nil
("Target")
("Target" "target")



Neale Swinnerton
{t: @sw1nn, w: sw1nn.com }



Ben Wolfson

May 29, 2013, 12:46:31 PM
to clo...@googlegroups.com
user> (def testcase "Line 1\nLine 2\nTarget Line\nLine 4\nNot a target line")
#'user/testcase
user> (re-seq #"(?im)^target" testcase)
("Target")
user> (re-seq #"(?im)target" testcase)
("Target" "target")
user>

You need to enable multiline mode:

MULTILINE

public static final int MULTILINE
Enables multiline mode.

In multiline mode the expressions ^ and $ match just after or just before, respectively, a line terminator or the end of the input sequence. By default these expressions only match at the beginning and the end of the entire input sequence.

Multiline mode can also be enabled via the embedded flag expression (?m).
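
As a hedged sketch of both routes the javadoc describes, the following matches the whole target line once with the embedded (?im) flags and once with a Pattern compiled from the flag constants. The var names (embedded, compiled, via-constants, no-multiline) are made up for this example.

```clojure
(import 'java.util.regex.Pattern)

(def testcase "Line 1\nLine 2\nTarget Line\nLine 4\nNot a target line")

;; Embedded flags: with (?m), ^ and $ match at line boundaries,
;; so the whole third line is returned.
(def embedded (re-seq #"(?im)^target.*$" testcase))   ; ("Target Line")

;; The same flags passed as constants to Pattern/compile.
(def compiled
  (Pattern/compile "^target.*$"
                   (bit-or Pattern/CASE_INSENSITIVE Pattern/MULTILINE)))
(def via-constants (re-seq compiled testcase))

;; Without multiline mode, ^ only matches at the start of the input,
;; so nothing is found.
(def no-multiline (re-seq #"(?i)^target.*$" testcase))

(println embedded)
(println via-constants)
```

The last line of the test string is not returned because "target" does not sit at the start of that line.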






--
Ben Wolfson
"Human kind has used its intelligence to vary the flavour of drinks, which may be sweet, aromatic, fermented or spirit-based. ... Family and social life also offer numerous other occasions to consume drinks for pleasure." [Larousse, "Drink" entry]

Peter Mancini

May 29, 2013, 12:48:51 PM
to clo...@googlegroups.com
Winner Winner Chicken Dinner!

Thanks. Where do I find that? Much appreciated!

Alan Thompson

May 29, 2013, 12:50:55 PM
to clo...@googlegroups.com

Look under "Special Constructs", the 3rd entry.  Still pretty terse!
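
Since that entry is so terse, here is a rough sketch of a few of the other embedded flags from the java.util.regex.Pattern javadoc (the full set is i, d, m, s, u, x). The example strings and var names are made up for illustration.

```clojure
(def text "Line 1\nLine 2")

;; (?s) DOTALL: "." also matches line terminators.
(def without-s (re-find #"Line 1.Line 2" text))       ; nil
(def with-s    (re-find #"(?s)Line 1.Line 2" text))   ; the whole string

;; (?x) COMMENTS: whitespace and #-comments inside the pattern are ignored.
(def with-x (re-find #"(?x) tar get  # split only for readability" "target"))

;; A flag can also be scoped to a group: (?i:...) is case-insensitive
;; inside the group only.
(def scoped     (re-find #"(?i:target) line" "TARGET line"))   ; "TARGET line"
(def scoped-miss (re-find #"(?i:target) line" "TARGET LINE"))  ; nil

(println with-s)
(println with-x)
(println scoped)
```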
Alan

Alan Thompson

May 29, 2013, 12:57:55 PM
to clo...@googlegroups.com
OK, I even have the O'Reilly book mentioned in the javadoc, "Mastering Regular Expressions".  Looking in the index under "multiline mode" says, "see Perl, modifier: multi-line mode (\m)".

So, it is certainly possible to do this without splitting first, but I think composing two simple functions (re-find ... (clojure.string/split ...)) will be much easier for readers of the code to understand than the complicated do-it-all-in-one-regex method.  Also, this is what people are used to from unix grep, which operates on each line of a file separately.
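
A minimal sketch of that grep-style composition, next to the single-regex version for comparison. grep-lines is a hypothetical helper name, and clojure.string/split-lines is used here in place of an explicit split on "\n".

```clojure
(require '[clojure.string :as str])

(def testcase "Line 1\nLine 2\nTarget Line\nLine 4\nNot a target line")

;; Hypothetical helper: keep the lines in which the pattern is found,
;; the way grep treats each line of a file separately.
(defn grep-lines [re text]
  (filter #(re-find re %) (str/split-lines text)))

;; Per-line approach: ^ needs no (?m) because each line is its own string.
(def by-line (grep-lines #"(?i)^target" testcase))

;; Single-regex approach over the whole string, using multiline mode.
(def one-regex (re-seq #"(?im)^target.*$" testcase))

(println by-line)
(println one-regex)
```

On this input both approaches return the same single line; the per-line version keeps the pattern itself simpler.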

Alan