Account Options

  1. Sign in
The old Google Groups will be going away soon, but your browser is incompatible with the new version.
Google Groups Home
« Groups Home
a little parsing challenge ☺
There are currently too many topics in this group that display first. To make this topic appear first, remove this option from another topic.
There was an error processing your request. Please try again.
flag
  20 messages - Collapse all  -  Translate all to Translated (View all originals)
The group you are posting to is a Usenet group. Messages posted to this group will make your email address visible to anyone on the Internet.
Your reply message has not been sent.
Your post was successful
 
From:
To:
Cc:
Followup To:
Add Cc | Add Followup-to | Edit Subject
Subject:
Validation:
For verification purposes please type the characters you see in the picture below or the numbers you hear by clicking the accessibility icon. Listen and type the numbers you hear
 
Xah Lee  
View profile  
 More options Jul 17 2011, 3:47 am
Newsgroups: comp.lang.lisp, comp.lang.python, comp.lang.perl.misc, comp.lang.ruby, comp.lang.scheme
From: Xah Lee <xah...@gmail.com>
Date: Sun, 17 Jul 2011 00:47:42 -0700 (PDT)
Local: Sun, Jul 17 2011 3:47 am
Subject: a little parsing challenge ☺
2011-07-16

folks, this one will be interesting one.

the problem is to write a script that can check a dir of text files
(and all subdirs) and reports if a file has any mismatched matching
brackets.

• The files will be utf-8 encoded (unix style line ending).

• If a file has mismatched matching-pairs, the script will display the
file name, and the  line number and column number of the first
instance where a mismatched bracket occures. (or, just the char number
instead (as in emacs's “point”))

• the matching pairs are all single unicode chars. They are these and
nothing else: () {} [] “” ‹› «» 【】 〈〉 《》 「」 『』
Note that ‘single curly quote’ is not consider matching pair here.

• You script must be standalone. Must not be using some parser tools.
But can call lib that's part of standard distribution in your lang.

Here's a example of mismatched bracket: ([)], (“[[”), ((, 】etc. (and
yes, the brackets may be nested. There are usually text between these
chars.)

I'll be writing a emacs lisp solution and post in 2 days. Ι welcome
other lang implementations. In particular, perl, python, php, ruby,
tcl, lua, Haskell, Ocaml. I'll also be able to eval common lisp
(clisp) and Scheme lisp (scsh), Java. Other lang such as Clojure,
Scala, C, C++, or any others, are all welcome, but i won't be able to
eval it. javascript implementation will be very interesting too, but
please indicate which and where to install the command line version.

I hope you'll find this a interesting “challenge”. This is a parsing
problem. I haven't studied parsers except some Wikipedia reading, so
my solution will probably be naive. I hope to see and learn from your
solution too.

i hope you'll participate. Just post solution here. Thanks.

 Xah


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Raymond Hettinger  
View profile  
 More options Jul 17 2011, 5:48 am
Newsgroups: comp.lang.lisp, comp.lang.python, comp.lang.perl.misc, comp.lang.ruby, comp.lang.scheme
From: Raymond Hettinger <pyt...@rcn.com>
Date: Sun, 17 Jul 2011 02:48:42 -0700 (PDT)
Local: Sun, Jul 17 2011 5:48 am
Subject: Re: a little parsing challenge ☺
On Jul 17, 12:47 am, Xah Lee <xah...@gmail.com> wrote:

> i hope you'll participate. Just post solution here. Thanks.

http://pastebin.com/7hU20NNL

Raymond


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Robert Klemme  
View profile  
 More options Jul 17 2011, 9:20 am
Newsgroups: comp.lang.lisp, comp.lang.python, comp.lang.perl.misc, comp.lang.ruby, comp.lang.scheme
From: Robert Klemme <shortcut...@googlemail.com>
Date: Sun, 17 Jul 2011 15:20:25 +0200
Local: Sun, Jul 17 2011 9:20 am
Subject: Re: a little parsing challenge ☺
On 07/17/2011 11:48 AM, Raymond Hettinger wrote:

> On Jul 17, 12:47 am, Xah Lee<xah...@gmail.com>  wrote:
>> i hope you'll participate. Just post solution here. Thanks.

> http://pastebin.com/7hU20NNL

Ruby solution: https://gist.github.com/1087583

Kind regards

robert


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
mhenn  
View profile  
 More options Jul 17 2011, 9:55 am
Newsgroups: comp.lang.lisp, comp.lang.python, comp.lang.perl.misc, comp.lang.ruby, comp.lang.scheme
From: mhenn <michih...@hotmail.com>
Date: Sun, 17 Jul 2011 15:55:25 +0200
Local: Sun, Jul 17 2011 9:55 am
Subject: Re: a little parsing challenge ☺
Am 17.07.2011 15:20, schrieb Robert Klemme:

> On 07/17/2011 11:48 AM, Raymond Hettinger wrote:
>> On Jul 17, 12:47 am, Xah Lee<xah...@gmail.com>  wrote:
>>> i hope you'll participate. Just post solution here. Thanks.

>> http://pastebin.com/7hU20NNL

> Ruby solution: https://gist.github.com/1087583

I acutally don't know Ruby that well, but it looks like your program
recognizes "[(])" as correct although it is not, because you translate
"[(])" to "(())" (which is indeed correct, but does not resemble the
input correctly anymore).


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Thomas Jollans  
View profile  
 More options Jul 17 2011, 11:31 am
Newsgroups: comp.lang.lisp, comp.lang.python, comp.lang.perl.misc, comp.lang.ruby, comp.lang.scheme
From: Thomas Jollans <t...@jollybox.de>
Date: Sun, 17 Jul 2011 08:31:33 -0700 (PDT)
Local: Sun, Jul 17 2011 11:31 am
Subject: Re: a little parsing challenge ☺
On Jul 17, 9:47 am, Xah Lee <xah...@gmail.com> wrote:

I thought I'd have some fun with multi-processing:

https://gist.github.com/1087682


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Thomas Boell  
View profile  
 More options Jul 17 2011, 11:49 am
Newsgroups: comp.lang.lisp, comp.lang.python, comp.lang.perl.misc, comp.lang.ruby, comp.lang.scheme
From: Thomas Boell <tbo...@domain.invalid>
Date: Sun, 17 Jul 2011 17:49:11 +0200
Local: Sun, Jul 17 2011 11:49 am
Subject: Re: a little parsing challenge ☺
On Sun, 17 Jul 2011 02:48:42 -0700 (PDT)

Raymond Hettinger <pyt...@rcn.com> wrote:
> On Jul 17, 12:47 am, Xah Lee <xah...@gmail.com> wrote:
> > i hope you'll participate. Just post solution here. Thanks.

> http://pastebin.com/7hU20NNL

I'm new to Python. I think I'd have done it in a similar way (in any
language). Your use of openers/closers looks nice though. In the
initialization of openers, I guess you implicitly create a kind of
hash, right? Then the 'in' operator checks for the keys. That is elegant
because you have the openers and closers right next to each other, not
in separate lists.

But why do you enumerate with start=1? Shouldn't you start with index 0?


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Robert Klemme  
View profile  
 More options Jul 17 2011, 12:01 pm
Newsgroups: comp.lang.lisp, comp.lang.python, comp.lang.perl.misc, comp.lang.ruby, comp.lang.scheme
From: Robert Klemme <shortcut...@googlemail.com>
Date: Sun, 17 Jul 2011 18:01:32 +0200
Local: Sun, Jul 17 2011 12:01 pm
Subject: Re: a little parsing challenge ☺
On 07/17/2011 03:55 PM, mhenn wrote:

> Am 17.07.2011 15:20, schrieb Robert Klemme:
>> On 07/17/2011 11:48 AM, Raymond Hettinger wrote:
>>> On Jul 17, 12:47 am, Xah Lee<xah...@gmail.com>   wrote:
>>>> i hope you'll participate. Just post solution here. Thanks.

>>> http://pastebin.com/7hU20NNL

>> Ruby solution: https://gist.github.com/1087583

> I acutally don't know Ruby that well, but it looks like your program
> recognizes "[(])" as correct although it is not, because you translate
> "[(])" to "(())" (which is indeed correct, but does not resemble the
> input correctly anymore).

Right you are.  The optimization breaks the logic.  Good catch!

Kind regards

        robert


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Robert Klemme  
View profile  
 More options Jul 17 2011, 12:54 pm
Newsgroups: comp.lang.lisp, comp.lang.python, comp.lang.perl.misc, comp.lang.ruby, comp.lang.scheme
From: Robert Klemme <shortcut...@googlemail.com>
Date: Sun, 17 Jul 2011 18:54:59 +0200
Local: Sun, Jul 17 2011 12:54 pm
Subject: Re: a little parsing challenge ☺
On 07/17/2011 06:01 PM, Robert Klemme wrote:

Turns out with a little possessiveness I can fix my original approach
which has the added benefit of not needing three passes through the file
(the two #tr's are obsolete now).

https://gist.github.com/1087583

Cheers

        robert


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Raymond Hettinger  
View profile  
 More options Jul 17 2011, 3:16 pm
Newsgroups: comp.lang.lisp, comp.lang.python, comp.lang.perl.misc, comp.lang.ruby, comp.lang.scheme
From: Raymond Hettinger <pyt...@rcn.com>
Date: Sun, 17 Jul 2011 12:16:13 -0700 (PDT)
Local: Sun, Jul 17 2011 3:16 pm
Subject: Re: a little parsing challenge ☺
On Jul 17, 8:49 am, Thomas Boell <tbo...@domain.invalid> wrote:

> But why do you enumerate with start=1? Shouldn't you start with index 0?

The problem specification says that the the char number should match
the emacs goto-char function which is indexed from one, not from
zero.  This is testable by taking the output of the program and
running it through emacs to see that the cursor gets moved exactly to
the location of the mismatched delimiter.

Raymond


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Xah Lee  
View profile  
 More options Jul 18 2011, 10:39 am
Newsgroups: comp.lang.lisp, comp.lang.python, comp.lang.perl.misc, comp.lang.ruby, comp.lang.scheme
From: Xah Lee <xah...@gmail.com>
Date: Mon, 18 Jul 2011 07:39:28 -0700 (PDT)
Local: Mon, Jul 18 2011 10:39 am
Subject: Re: a little parsing challenge ☺

On Jul 17, 12:47 am, Xah Lee <xah...@gmail.com> wrote:

> 2011-07-16

> folks, this one will be interesting one.

> the problem is to write a script that can check a dir of text files
> (and all subdirs) and reports if a file has any mismatched matching
> brackets.
> …

Ok, here's my solution (pasted at bottom). I haven't tried to make it
elegant or terse, yet, seeing that many are already much elegent than
i could possibly do so with my code.

my solution basically use a stack. (i think all of us are doing
similar) Here's the steps:

• Go thru the file char by char, find a bracket char.
• check if the one on stack is a matching opening char. If so remove
it. Else, push the current onto the stack.
• Repeat the above till end of file.
• If the stack is not empty, then the file got mismatched brackets.
Report it.
• Do the above on all files.

Many elegant solutions. Raymond Hettinger is very quick, posted a
solution only after a hour or so when i posted it. Many others are
very short, very nice. Thank you all for writing them. I haven't
studied them yet. I'll run them all and post a summary in 2 days. (i
have few thousands files to run this test thru, many of them have
mismatched brackets. So i have good data to test with.)

PS we still lack a perl, Scheme lisp, tcl, lua versions. These
wouldn't be hard and would be interesting to read.  If you are picking
up one of these lang, this would be a good exercise.  Haskell too. I
particularly would like to see a javascript version ran from command
line. Maybe somebody can put this exercise to Google folks ... they
are like the js gods.

also, now that we have these home-brewed code, how'd a parser expert
do it? Is it possible to make it even simpler by using some parser
tools? (have no idea what those lex yacc do, or modern incarnations)
I've also been thinking whether this can be done with Parsing
Expression Grammar. That would make the code semantics really elegant
(as opposed home-cooked stack logic).

 Xah

;; -*- coding: utf-8 -*-
;; 2011-07-15, Xah Lee
;; go thru a file, check if all brackets are properly matched.
;; e.g. good: (…{…}… “…”…)
;; bad: ( [)]
;; bad: ( ( )

(setq inputDir "~/web/xahlee_org/p/") ; must end in slash

(defvar matchPairs '() "a alist. For each air, the car is opening
char, cdr is closing char.")

(setq matchPairs '(
                   ("(" . ")")
                   ("{" . "}")
                   ("[" . "]")
                   ("“" . "”")
                   ("‹" . "›")
                   ("«" . "»")
                   ("【" . "】")
                   ("〈" . "〉")
                   ("《" . "》")
                   ("「" . "」")
                   ("『" . "』")
                   )
      )

(defvar searchRegex "" "regex string of all pairs to search.")
(setq searchRegex "")
(mapc
 (lambda (mypair) ""
   (setq searchRegex (concat searchRegex (regexp-quote (car mypair))
"|" (regexp-quote (cdr mypair)) "|") )
   )
 matchPairs)

(setq searchRegex (replace-regexp-in-string "|$" "" searchRegex t
t)) ; remove the ending “|”

(setq searchRegex (replace-regexp-in-string "|" "\\|" searchRegex t
t)) ; change | to \\| for regex “or” operation

(defun my-process-file (fpath)
  "process the file at fullpath fpath ..."
  (let (myBuffer (ii 0) myStack ξchar ξpos)

    (setq myStack '() ) ; each element is a vector [char position]
    (setq ξchar "")

    (setq myBuffer (get-buffer-create " myTemp"))
    (set-buffer myBuffer)
    (insert-file-contents fpath nil nil nil t)

    (goto-char 1)
    (while (search-forward-regexp searchRegex nil t)
      (setq ξpos (point)  )
      (setq ξchar (buffer-substring-no-properties ξpos (- ξpos 1))  )

      ;; (princ (format "-----------------------------\nfound char: %s
\n" ξchar) )

      (let ((isClosingCharQ nil) (matchedOpeningChar nil) )
        (setq isClosingCharQ (rassoc ξchar matchPairs))
        (when isClosingCharQ (setq matchedOpeningChar (car
isClosingCharQ) ) )

        ;; (princ (format "isClosingCharQ is: %s\n" isClosingCharQ) )
        ;; (princ (format "matchedOpeningChar is: %s\n"
matchedOpeningChar) )

        (if
            (and
             (car myStack) ; not empty
             (equal (elt (car myStack) 0) matchedOpeningChar )
             )
            (progn
              ;; (princ (format "matched this bottom item on stack: %s
\n" (car myStack)) )
              (setq myStack (cdr myStack) )
              )
          (progn
            ;; (princ (format "did not match this bottom item on
stack: %s\n" (car myStack)) )
            (setq myStack (cons (vector ξchar ξpos) myStack) ) )
          )
        )
      ;; (princ "current stack: " )
      ;; (princ myStack )
      ;; (terpri )
      )

    (when (not (equal myStack nil))
      (princ "Error file: ")
      (princ fpath)
      (print (car myStack) )
      )
    (kill-buffer myBuffer)
    ))

;; (require 'find-lisp)

(let (outputBuffer)
  (setq outputBuffer "*xah match pair output*" )
  (with-output-to-temp-buffer outputBuffer
    (mapc 'my-process-file (find-lisp-find-files inputDir "\\.html$"))
    (princ "Done deal!")
    )
  )


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Discussion subject changed to "a little parsing challenge ?" by s...@netherlands.com
s...@netherlands.com  
View profile  
 More options Jul 18 2011, 3:34 pm
Newsgroups: comp.lang.lisp, comp.lang.python, comp.lang.perl.misc, comp.lang.ruby, comp.lang.scheme
From: s...@netherlands.com
Date: Mon, 18 Jul 2011 12:34:45 -0700
Local: Mon, Jul 18 2011 3:34 pm
Subject: Re: a little parsing challenge ?

On Sun, 17 Jul 2011 00:47:42 -0700 (PDT), Xah Lee <xah...@gmail.com> wrote:
>2011-07-16

>folks, this one will be interesting one.

>the problem is to write a script that can check a dir of text files
>(and all subdirs) and reports if a file has any mismatched matching
>brackets.

[snip]
>i hope you'll participate. Just post solution here. Thanks.

I have to hunt for a job so I'm not writing a solution for you.
Here is a thin regex framework that may get you started.

-sln

---------------------

use strict;
use warnings;

 my @samples = qw(
  A98(y[(np)r]x)tp[kk]a.exeb
  A98(y[(np)r]x)tp[kk]a}.exeb
  A98(‹ynprx)tpk›ka.mpeg
  ‹A98(ynprx)tpk›ka
  “A9«8(yn«pr{{[g[x].}*()+}»)tpkka».”
  “A9«8(yn«pr{{[g[x].]}*()+}»)tpkka».”
  “A9«8(yn«pr»)tpkka».”
  “A9«8(yn«pr»)»”t(()){}[a[b[d]{}]pkka.]“«‹“**^”{[()]}›»”
  “A9«8(yn«pr»)”t(()){}[a[b[d]{}]pkka.]“«‹“**^”{[()]}›»”
 );

 my $regex = qr/

  ^ (?&FileName) $

  (?(DEFINE)

      (?<Delim>    
            \( (?&Content) \)
          | \{ (?&Content) \}
          | \[ (?&Content) \]
          | \“ (?&Content) \”
          | \‹ (?&Content) \›
          | \« (?&Content) \»
             # add more here ..
      )

      (?<Content>
           (?:  (?> [^(){}\[\]“”‹›«»]+ ) # add more here ..
              | (?&Delim)
           )*
      )

      (?<FileName>
           (?&Content)
      )
    )
 /x;

 for (@samples)
 {
    print "$_ - ";
    if ( /$regex/ ) {
       print "passed \n";
    }
    else {
       print "failed \n";
    }
 }

__END__

Output:

A98(y[(np)r]x)tp[kk]a.exeb - passed
A98(y[(np)r]x)tp[kk]a}.exeb - failed
A98(‹ynprx)tpk›ka.mpeg - failed
‹A98(ynprx)tpk›ka - passed
“A9«8(yn«pr{{[g[x].}*()+}»)tpkka».” - failed
“A9«8(yn«pr{{[g[x].]}*()+}»)tpkka».” - passed
“A9«8(yn«pr»)tpkka».” - passed
“A9«8(yn«pr»)»”t(()){}[a[b[d]{}]pkka.]“«‹“**^”{[()]}›»” - passed
“A9«8(yn«pr»)”t(()){}[a[b[d]{}]pkka.]“«‹“**^”{[()]}›»” - failed


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Discussion subject changed to "a little parsing challenge ☺" by Robert Klemme
Robert Klemme  
View profile  
 More options Jul 20 2011, 2:23 am
Newsgroups: comp.lang.lisp, comp.lang.python, comp.lang.perl.misc, comp.lang.ruby, comp.lang.scheme
From: Robert Klemme <shortcut...@googlemail.com>
Date: Wed, 20 Jul 2011 08:23:09 +0200
Local: Wed, Jul 20 2011 2:23 am
Subject: Re: a little parsing challenge ☺
On 18.07.2011 16:39, Xah Lee wrote:

Small correction: my solution works differently (although internally the
regexp engine will roughly do the same).  So, my approach summarized

- traverse a directory tree
- for each found item of type "file"
-    read the whole content
-    throw it at a regexp which is anchored at the beginning
      and does the recursive parsing
-    report file if the match is shorter than the file

Note: special feature for recursive matching is used which Perl's regexp
engine likely can do as well but many others don't.

Cheers

        robert

--
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Mark Tarver  
View profile  
 More options Jul 20 2011, 1:43 am
Newsgroups: comp.lang.lisp, comp.lang.python, comp.lang.perl.misc, comp.lang.ruby, comp.lang.scheme
From: Mark Tarver <dr.mtar...@gmail.com>
Date: Tue, 19 Jul 2011 22:43:42 -0700 (PDT)
Local: Wed, Jul 20 2011 1:43 am
Subject: Re: a little parsing challenge ☺
On Jul 17, 8:47 am, Xah Lee <xah...@gmail.com> wrote:

Parsing technology based on BNF enables an elegant solution.  First
take a basic bracket balancing program which parenthesises the
contents of the input.  e.g. in Shen-YACC

(defcc <br>
   "(" <br> ")" <br$> := [<br> | <br$>];
   <item> <br>;
   <e> := [];)

(defcc <br$>
  <br>;)

(defcc <item>
  -*- := (if (element? -*- ["(" ")"]) (fail) [-*-]);)

Given (compile <br> ["(" 1 2 3 ")" 4]) the program produces [[1 2 3]
4]. When this program is used to parse the input, whatever residue is
left indicates where the parse has failed.  In Shen-YACC

(define tellme
  Stuff -> (let Br (<br> (@p Stuff []))
                Residue (fst Br)
                (if (empty? Residue)
                    (snd Br)
                    (error "parse failure at position ~A~%"
                          (- (length Stuff) (length Residue))))))

e.g.

(tellme ["(" 1 2 3 ")" "(" 4])
parse failure at position 5

(tellme ["(" 1 2 3 ")" "(" ")" 4])
[[1 2 3] [] 4]

The extension of this program to the case described is fairly simple.
Qi-YACC is very similar.

Nice problem.

I do not have further time to correspond right now.

Mark


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Xah Lee  
View profile  
 More options Jul 20 2011, 6:31 am
Newsgroups: comp.lang.lisp, comp.lang.python, comp.lang.perl.misc, comp.lang.ruby, comp.lang.scheme
From: Xah Lee <xah...@gmail.com>
Date: Wed, 20 Jul 2011 03:31:24 -0700 (PDT)
Local: Wed, Jul 20 2011 6:31 am
Subject: Re: a little parsing challenge ☺
i've just cleaned up my elisp code and wrote a short elisp tutorial.

Here:

〈Emacs Lisp: Batch Script to Validate Matching Brackets〉
http://xahlee.org/emacs/elisp_validate_matching_brackets.html

plain text version follows. Please let me know what you think.

am still working on going thru all code in other langs. Will get to
the ruby one, and that perl regex, and the other fixed python ones.
(possibly also the 2 common lisp codes but am not sure they are
runnable as is or just some non-working showoff. lol)

===============================================
Emacs Lisp: Batch Script to Validate Matching Brackets

Xah Lee, 2011-07-19

This page shows you how to write a elisp script that checks thousands
of files for mismatched brackets.

----------------------------------------------------------------
The Problem

------------------------------------------------
Summary

I have 5 thousands files containing many matching pairs. I want to to
know if any of them contains mismatched brackets.

------------------------------------------------
Detail

The matching pairs includes these: () {} [] “” ‹› «» 〈〉 《》 【】 〖〗 「」
『』.

The program should be able to check all files in a dir, and report any
file that has mismatched bracket, and also indicate the line number or
positon where a mismatch occurs.

For those curious, if you want to know what these brackets are, see:

    • Syntax Design: Use of Unicode Matching Brackets as Specialized
Delimiters
    • Intro to Chinese Punctuation with Computer Language Syntax
Perspectives

For other notes and conveniences about dealing with brackets in emacs,
see:

    • Emacs: Defining Keys to Navigate Brackets
    • “extend-selection” at A Text Editor Feature: Extend Selection by
Semantic Unit
    • “select-text-in-quote” at Suggestions on Emacs's mark-word
Command

----------------------------------------------------------------
Solution

Here's outline of steps.

    • Go thru the file char by char, find a bracket char.
    • Check if the one on stack is a matching opening char. If so
remove it. Else, push the current onto the stack.
    • Repeat the above till no more bracket char in the file.
    • If the stack is not empty, then the file got mismatched
brackets. Report it.
    • Do the above on all files.

Here's some interesting use of lisp features to implement the above.

------------------------------------------------
Define Matching Pair Chars as “alist”

We begin by defining the chars we want to check, as a “association
list” (aka “alist”). Like this:

(setq matchPairs '(
                   ("(" . ")")
                   ("{" . "}")
                   ("[" . "]")
                   ("“" . "”")
                   ("‹" . "›")
                   ("«" . "»")
                   ("【" . "】")
                   ("〖" . "〗")
                   ("〈" . "〉")
                   ("《" . "》")
                   ("「" . "」")
                   ("『" . "』")
                   )
      )

If you care only to check for curly quotes, you can remove elements
above. This is convenient because some files necessarily have
mismatched pairs such as the parenthesis, because that char is used
for many non-bracketing purposes (e.g. ASCII smiley).

A “alist” in lisp is basically a list of pairs (called key and value),
with the ability to search for a key or a value. The first element of
a pair is called its key, the second element is its value. Each pair
is a “cons”, like this: (cons mykey myvalue), which can also be
written using this syntax: (mykey . myvalue) for more easy reading.

The purpose of lisp's “alist” is similar to Python's dictionary or
Pretty Home Page's array. It is also similar to hashmap, except that
alist can have duplicate keys, can search by values, maintains order,
and alist is not intended for massive number of elements. Elisp has a
hashmap datatype if you need that. (See: Emacs Lisp Tutorial: Hash
Table.)

(info "(elisp) Association Lists")

------------------------------------------------
Generate Regex String from alist

To search for a set of chars in emacs, we can read the buffer char-by-
char, or, we can simply use “search-forward-regexp”. To use that,
first we need to generate a regex string from our matchPairs alist.

First, we defines/declare the string. Not a necessary step, but we do
it for clarity.

(setq searchRegex "")

Then we go thru the matchPairs alist. For each pair, we use “car” and
“cdr” to get the chars and “concat” it to the string. Like this:

(mapc
 (lambda (mypair) ""
   (setq searchRegex (concat searchRegex (regexp-quote (car mypair))
"|" (regexp-quote (cdr mypair)) "|") )
   )
 matchPairs)

Then we remove the ending “|”.

(setq searchRegex (substring searchRegex 0 -1)) ; remove the ending
“|”

Then, change | it to \\|. In elisp regex, the | is literal. The “regex
or” is \|. And if you are using regex in elisp, elisp does not have a
special regex string syntax, it only understands normal strings. So,
to feed to regex \|, you need to espace the first backslash. So, your
regex needs to have \\|. Here's how we do it:

(setq searchRegex (replace-regexp-in-string "|" "\\|" searchRegex t
t)) ; change | to \\| for regex “or” operation

You could shorten the above into just 2 lines by using \\| in the
“mapc” step and not as a extra step of replacing | by \\|.

See also: emacs regex tutorial.

------------------------------------------------
Implement Stack Using Lisp List

Stack is done using lisp's list. e.g. '(1 2 3). The bottom of stack is
the first element. To add to the stack, do it like this: (setq mystack
(cons newitem mystack)). To remove a item from stack is this: (setq
mystack (cdr mystack)). The stack begin as a empty list: '().

For each element in the stack, we need the char and also its position,
so that we can report the position if the file does have mismatched
pairs.

We use a vector as entries for the stack. Each entry is like this:
(vector char pos). (See: Emacs Lisp Tutorial: List & Vector.)

Here's how to fetch a char from stack bottom, check if current char
matches, push to stack, pop from stack.

; check if current char is a closing char and is in our match pairs
alist.
; use “rassoc” to check alist's set of “values”.
; It returns the first key/value pair found, or nil
(rassoc char matchPairs)

; add to stack
(setq myStack (cons (vector char pos) myStack) )

; pop stack
(setq myStack (cdr myStack) )

------------------------------------------------
Complete Code

Here's the complete code.

;; -*- coding: utf-8 -*-
;; 2011-07-15
;; go thru a file, check if all brackets are properly matched.
;; e.g. good: (…{…}… “…”…)
;; bad: ( [)]
;; bad: ( ( )

(setq inputFile "xx_test_file.txt" ) ; a test file.
(setq inputDir "~/web/xahlee_org/") ; must end in slash

(defvar matchPairs '() "a alist. For each pair, the car is opening
char, cdr is closing char.")
(setq matchPairs '(
                   ("(" . ")")
                   ("{" . "}")
                   ("[" . "]")
                   ("“" . "”")
                   ("‹" . "›")
                   ("«" . "»")
                   ("【" . "】")
                   ("〖" . "〗")
                   ("" . "")
                   ("" . "")
                   ("「" . "」")
                   ("『" . "』")
                   )
      )

(defvar searchRegex "" "regex string of all pairs to search.")
(setq searchRegex "")
(mapc
 (lambda (mypair) ""
   (setq searchRegex (concat searchRegex (regexp-quote (car mypair))
"|" (regexp-quote (cdr mypair)) "|") )
   )
 matchPairs)

(setq searchRegex (replace-regexp-in-string "|$" "" searchRegex t
t)) ; remove the ending “|”

(setq searchRegex (replace-regexp-in-string "|" "\\|" searchRegex t
t)) ; change | to \\| for regex “or” operation

(defun my-process-file (fpath)
  "process the file at fullpath fpath ..."
  (let (myBuffer myStack ξchar ξpos)

    (setq myStack '() ) ; each element is a vector [char position]
    (setq ξchar "") ; the current char found

    (when t
      ;; (not (string-match "/xx" fpath)) ; in case you want to skip
certain files

      (setq myBuffer (get-buffer-create " myTemp"))
      (set-buffer myBuffer)
      (insert-file-contents fpath nil nil nil t)

      (goto-char 1)
      (setq case-fold-search t)
      (while (search-forward-regexp searchRegex nil t)
        (setq ξpos (point)  )
        (setq ξchar (buffer-substring-no-properties ξpos (- ξpos
1))  )

        ;; (princ (format "-----------------------------\nfound char:
%s\n" ξchar) )

        (let ((isClosingCharQ nil) (matchedOpeningChar nil) )
          (setq isClosingCharQ (rassoc ξchar matchPairs))
          (when isClosingCharQ (setq matchedOpeningChar (car
isClosingCharQ) ) )

          ;; (princ (format "isClosingCharQ is: %s\n"
isClosingCharQ) )
          ;; (princ (format "matchedOpeningChar is: %s\n"
matchedOpeningChar) )

          (if
              (and
               (car myStack) ; not empty
               (equal (elt (car myStack) 0) matchedOpeningChar )
               )
              (progn
                ;; (princ (format "matched this bottom item on stack:
%s\n" (car myStack)) )
                (setq myStack (cdr myStack) )
                )
            (progn
              ;; (princ (format "did not match this bottom item on
stack: %s\n" (car myStack)) )
              (setq myStack (cons (vector ξchar ξpos) myStack) ) )
            )
          )
        ;; (princ "current stack: " )
        ;; (princ myStack )
        ;; (terpri )
        )

      (when (not (equal myStack nil))
        (princ "Error file: ")
        (princ fpath)
        (print (car myStack) )
        )
      (kill-buffer myBuffer)
      )
    ))

(require 'find-lisp)

(let (outputBuffer)
  (setq outputBuffer "*xah match pair output*" )
  (with-output-to-temp-buffer outputBuffer
    ;;
...

read more »


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Uri Guttman  
View profile  
 More options Jul 20 2011, 12:31 pm
Newsgroups: comp.lang.lisp, comp.lang.python, comp.lang.perl.misc, comp.lang.ruby, comp.lang.scheme
From: "Uri Guttman" <u...@StemSystems.com>
Date: Wed, 20 Jul 2011 12:31:05 -0400
Local: Wed, Jul 20 2011 12:31 pm
Subject: Re: a little parsing challenge ☺

a better parsing challenge. how can you parse usenet to keep this troll
from posting on the wrong groups on usenet? first one to do so, wins the
praise of his peers. 2nd one to do it makes sure the filter stays in
place. all the rest will be rewarded by not seeing the troll anymore.

anyone who actually engages in a thread with the troll should parse
themselves out of existance.

uri

--
Uri Guttman  --  uri AT perlhunter DOT com  ---  http://www.perlhunter.com --
------------  Perl Developer Recruiting and Placement Services  -------------
-----  Perl Code Review, Architecture, Development, Training, Support -------


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
rusi  
View profile  
 More options Jul 20 2011, 1:30 pm
Newsgroups: comp.lang.lisp, comp.lang.python, comp.lang.perl.misc, comp.lang.ruby, comp.lang.scheme
From: rusi <rustompm...@gmail.com>
Date: Wed, 20 Jul 2011 10:30:03 -0700 (PDT)
Local: Wed, Jul 20 2011 1:30 pm
Subject: Re: a little parsing challenge ☺
On Jul 20, 9:31 pm, "Uri Guttman" <u...@StemSystems.com> wrote:

> a better parsing challenge. how can you parse usenet to keep this troll
> from posting on the wrong groups on usenet? first one to do so, wins the
> praise of his peers. 2nd one to do it makes sure the filter stays in
> place. all the rest will be rewarded by not seeing the troll anymore.

> anyone who actually engages in a thread with the troll should parse
> themselves out of existance.

Goedelian paradox: Is this thread in existence?

 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Randal L. Schwartz  
View profile  
 More options Jul 20 2011, 3:06 pm
Newsgroups: comp.lang.lisp, comp.lang.python, comp.lang.perl.misc, comp.lang.ruby, comp.lang.scheme
From: mer...@stonehenge.com (Randal L. Schwartz)
Date: Wed, 20 Jul 2011 12:06:18 -0700
Local: Wed, Jul 20 2011 3:06 pm
Subject: Re: a little parsing challenge ☺

>>>>> "Uri" == Uri Guttman <u...@StemSystems.com> writes:

Uri> a better parsing challenge. how can you parse usenet to keep this troll
Uri> from posting on the wrong groups on usenet? first one to do so, wins the
Uri> praise of his peers. 2nd one to do it makes sure the filter stays in
Uri> place. all the rest will be rewarded by not seeing the troll anymore.

Uri> anyone who actually engages in a thread with the troll should parse
Uri> themselves out of existance.

Since the newsgroups: line is not supposed to have spaces in it, that
makes both his post and your post invalid.  Hence, filter on invalid
posts.

--
Randal L. Schwartz - Stonehenge Consulting Services, Inc. - +1 503 777 0095
<mer...@stonehenge.com> <URL:http://www.stonehenge.com/merlyn/>
Smalltalk/Perl/Unix consulting, Technical writing, Comedy, etc. etc.
See http://methodsandmessages.posterous.com/ for Smalltalk discussion


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Jason Earl  
View profile  
 More options Jul 20 2011, 4:57 pm
Newsgroups: comp.lang.lisp, comp.lang.python, comp.lang.perl.misc, comp.lang.ruby, comp.lang.scheme
From: Jason Earl <je...@notengoamigos.org>
Date: Wed, 20 Jul 2011 14:57:25 -0600
Local: Wed, Jul 20 2011 4:57 pm
Subject: Re: a little parsing challenge ☺

On Wed, Jul 20 2011, Randal L. Schwartz wrote:
>>>>>> "Uri" == Uri Guttman <u...@StemSystems.com> writes:

> Uri> a better parsing challenge. how can you parse usenet to keep this troll
> Uri> from posting on the wrong groups on usenet? first one to do so, wins the
> Uri> praise of his peers. 2nd one to do it makes sure the filter stays in
> Uri> place. all the rest will be rewarded by not seeing the troll anymore.

> Uri> anyone who actually engages in a thread with the troll should parse
> Uri> themselves out of existance.

> Since the newsgroups: line is not supposed to have spaces in it, that
> makes both his post and your post invalid.  Hence, filter on invalid
> posts.

I suspect that the spaces you are seeing are being added by Gnus.  I see
them too (and I see them in your post as well), but they disappear when
I use "C-u g" and view the source of the posts.

Jason


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Xah Lee  
View profile  
 More options Jul 21 2011, 11:36 am
Newsgroups: comp.lang.lisp, comp.lang.python, comp.lang.perl.misc, comp.lang.ruby, comp.lang.scheme
From: Xah Lee <xah...@gmail.com>
Date: Thu, 21 Jul 2011 08:36:18 -0700 (PDT)
Local: Thurs, Jul 21 2011 11:36 am
Subject: Re: a little parsing challenge ☺
Ok. Here's a preliminary report.

〈Lisp, Python, Perl, Ruby … Code to Validate Matching Brackets〉
http://xahlee.org/comp/validate_matching_brackets.html

it's taking too much time to go thru.

right now, i consider only one valid code, by Raymond Hettinger (with
minor edit from others).

right now, there's 2 other possible correct solution. One by Robert
Klemme but requires ruby19 but i only have ruby18x. One by Thomas
Jollans in Python 3 but didn't run on my machine perhaps due to some
unix/Windows issue, yet to be done.

the other 3 or 4 seems to be incomplete or just suggestion of ideas.

i haven't done extensive testing on my own code neither.
I'll revisit maybe in a few days.

Feel free to grab my report and make it nice. If you would like to fix
your code, feel free to email.

 Xah

On Jul 21, 7:26 am, Ian Kelly <ian.g.ke...@gmail.com> wrote:


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Xah Lee  
View profile  
 More options Jul 21 2011, 2:53 pm
Newsgroups: comp.lang.lisp, comp.lang.python, comp.lang.perl.misc, comp.lang.ruby, comp.lang.scheme
From: Xah Lee <xah...@gmail.com>
Date: Thu, 21 Jul 2011 11:53:26 -0700 (PDT)
Local: Thurs, Jul 21 2011 2:53 pm
Subject: Re: a little parsing challenge ☺

On Jul 21, 9:43 am, pyt...@bdurham.com wrote:

> Xah,

> 1. Is the following string considered legal?

> [ { ( ] ) }

> Note: Each type of brace opens and closes in the proper sequence. But
> inter-brace opening and closing does not make sense.

nu!

> Or must a closing brace always balance out with the most recent opening
> brace like so?

> [ { ( ) } ]

yeah!

> 2. If there are multiple unclosed braces at EOF, is the answer you're
> looking for the position of the first open brace that hasn't been closed
> out yet?

well, as many pointed out, i really haven't thought it out well.

originally, i just want to know the position of a un-matched char.

i haven't taken the time to think about what really should be the
desired behavior. For me, the problem started because i wanted to use
the script to check my 5k html files, in particular, classic novels
that involves double curly quotes and french quotes. So, the desired
behavior is one based on the question of what would best for the user
to see in order to correct a bracket mismatch error in a file. (which,
can get quite complex for nested syntax, because, usually, once you
have one missed, it's all hell from there. I think this is similar to
the problem when a compiler/interpreter encounters a bad syntax in
source code, and thus the poplar situation where error code of
computer programs are hard to understand...)

but anyway, just for this exercise, the requirement needn't be
stringent. I still think that at least the reported position should be
a matching char in the file. (and if we presume this, then only my
code works. LOL)

PS this is a warmup problem for writing a HTML tag validator. I looked
high and lo in past years, but just couldn't find a script that does
simple validation in batch. The w3c one is based on SGML, really huge
amount of un-unstandable irregular historical baggage. XML lexical
validator is much closer, but still not regular. I simply wanted one
just like the match-pair validator in our problem, except the opening
char is not a single char but string of the form <xyz …> and the
*matching* closing one is of the form </xyz>, and with just one
exception: when a tag has “/>” in ending such as <br/> then it is
skipped (i.e. not considered as opening or closing).

I'll be writing this soon in elisp… since i haven't studied parsers, i
had hopes that parser expert would show some proper parser solutions…
in particular i think such can be expressed in Parsing Expression
Grammar in just a few lines… but so far no deity came forward to show
the light. lol

getting ranty… it's funny, somehow the tech geekers all want regex to
solve the problem. Regex, regex, regex, a 40 years old deviant bastard
that by some twist of luck became a tool for matching text patterns.
One bloke came forward to show-off a perl regex obfuscation. That's
like, lol. But it might be good for the lulz if his code is actually
complete and worked. Then, you have a few who'd nonchalantly remark
“O, you just need push-down automata”. LOL, unless they show actual
working code, its Automata their asses.

folks, don't get angry with me. I'm a learner. I'm curious. I always
am eager to learn. And there's always things we can learn. Don't get
into a fit and do the troll dance in a pit with me. Nobody's gonna
give a shit if you think u knew it all. If u are not the master of one
thousand and one languages yet, you can learn with me. ☺ troll!!!!

 Xah


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
End of messages
« Back to Discussions « Newer topic     Older topic »