Advanced Find


C.H. Fred

May 12, 2022, 7:42:28 PM
to sem...@googlegroups.com
Is there an advanced find macro that would let me type in two words near each other, same line or nearby lines?

-- 
Rick C. Hodgin

Guy Rouillier

May 12, 2022, 9:08:42 PM
to sem...@googlegroups.com
The standard find will help you find two words near each other on the same line: just use a regexp. Unfortunately, it doesn't work across newline boundaries.

Guy Rouillier
--

---
You received this message because you are subscribed to the Google Groups "SemWare TSE Pro text editor" group.
To unsubscribe from this group and stop receiving emails from it, send an email to semware+u...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/semware/CAOtq5xrDVRE8TLzg22H-dUYrOZQHG%2BeneY84pUhmn4_SLMXA4g%40mail.gmail.com.

Carlo Hogeveen

May 12, 2022, 10:53:50 PM
to sem...@googlegroups.com

 

I cobbled this together, which matches your specification:

https://ecarlo.nl/tse/index.html#FindAcrossLines

 

With a regular expression like “word1.*word2” and options “gix”, it finds the expression across lines if the matched text is at most 255 characters long. This matches your “near” requirement.
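
To illustrate what "near" means here, this is a minimal sketch in Python rather than TSE's macro language. It is not the tool's implementation: the function name `find_near` and its parameters are made up for the example, and the two words are treated as literal text, not as regular expressions.

```python
import re

def find_near(text, word1, word2, max_len=255, ignore_case=True):
    """Return (start, end) of the first span that starts with word1, ends with
    word2 and is at most max_len characters long, or None if there is none."""
    flags = re.IGNORECASE if ignore_case else 0
    # Characters left for the gap between the two words.
    budget = max_len - len(word1) - len(word2)
    if budget < 0:
        return None
    # [\s\S] matches any character including line endings, so the gap may
    # cross lines; {0,budget} enforces the length limit.
    pattern = re.compile(
        re.escape(word1) + r"[\s\S]{0,%d}?" % budget + re.escape(word2), flags
    )
    match = pattern.search(text)
    return match.span() if match else None

sql = "SELECT name, COUNT(*)\nFROM t\nGROUP BY name\nHAVING COUNT(*) > 1\n"
print(find_near(sql, "select", "having"))              # span crossing three line ends
print(find_near(sql, "select", "having", max_len=20))  # too far apart: None
```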

 

I did notice a small positioning error, which I will have to address at a later date.

 

 

 

C.H. Fred

May 13, 2022, 10:47:49 AM
to sem...@googlegroups.com
Very nice.  I was able to search for SQL expressions that have things like SELECT.*HAVING.

Thank you, Carlo.

-- 
Rick C. Hodgin




Carlo Hogeveen

May 13, 2022, 11:57:11 AM
to sem...@googlegroups.com

 

Glad to hear you find it useful too.

 

I fixed known bugs and added several new features in v0.2.

 

https://ecarlo.nl/tse/index.html#FindAcrossLines

 

Overview:

  This tool works like TSE's Find command, but with these differences:

  - The search string:
    - It searches the search string across lines.
    - When searching it sees each line ending as a line feed character (LF, 10).
    - If the search options contain an "x", then in the search string "$" matches
      a line feed character, and "\s" is short for "[\x00-\x20]", which matches
      a white space character (here meaning a space, tab, carriage return, line
      feed or any other control character).
    - It marks the found string as a block.

  - The search options:
    - Only the search options g, i, m, x, + and digits are allowed.
    - By default the search string matches a range of at most 255 characters.
    - In the search options you may add a different character range
      from 1 to MAXLINELEN - 1 (currently 31999 characters).
    - Or you may use the "m" option to indicate the maximum search range.
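
As a rough illustration of the option rules above, here is a small Python sketch (not the tool's macro code) that splits an option string such as "gix5000" into flag letters and a character range. The names `parse_options`, `MAX_LINE_LEN` and `DEFAULT_RANGE` are hypothetical, and letting "m" win over explicit digits is an assumption the overview does not confirm.

```python
MAX_LINE_LEN = 32000   # assumption: derived from "MAXLINELEN - 1 (currently 31999)"
DEFAULT_RANGE = 255    # default range stated in the overview

def parse_options(options):
    """Split an option string into (set of flag letters, search range)."""
    flags = set()
    digits = ""
    for ch in options.lower():
        if ch in "gimx+":
            flags.add(ch)
        elif ch in "0123456789":
            digits += ch
        else:
            raise ValueError("search option not allowed: %r" % ch)
    if "m" in flags:
        # Assumption: "m" selects the maximum range even if digits are given.
        search_range = MAX_LINE_LEN - 1
    elif digits:
        search_range = int(digits)
        if not 1 <= search_range <= MAX_LINE_LEN - 1:
            raise ValueError("range must be 1..%d" % (MAX_LINE_LEN - 1))
    else:
        search_range = DEFAULT_RANGE
    return flags, search_range

print(parse_options("gix5000"))
print(parse_options("gix"))
```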

 

 

 

knud van eeden

May 13, 2022, 1:03:27 PM
to sem...@googlegroups.com
When I use v0.2 in my very large working file (more than 1 gigabyte),
FindAcrossLines hangs.

Steps:

1. Start a clean TSE 4.42.00.

2. Load the large 1 gigabyte file.

3. Menu > Execute Macro > FindAcrossLines.mac

4. Search for e.g.

Thomas.*FlatFile

and then enter the options

gix5000

5. I know the text is there within that range, so it should be found.

6. Instead it hangs TSE (or at least does not return even after a long wait), and the TSE process has to be killed.

with friendly greetings
Knud van Eeden


knud van eeden

May 13, 2022, 1:04:39 PM
to sem...@googlegroups.com
It is reproducible each time.


C.H. Fred

May 13, 2022, 1:08:30 PM
to sem...@googlegroups.com
I had a similar slowdown on a 32MB file with 650,000 lines.  It does eventually come back, but takes a while.

On smaller files it's good.

-- 
Rick C. Hodgin


knud van eeden

May 13, 2022, 1:15:26 PM
to sem...@googlegroups.com
===

> I had a similar slowdown on a 32MB file with 650,000 lines.  It does eventually come back, but takes a while.

But in a file about 30 times larger (e.g. 1 gigabyte, 15 million lines) it hangs until Microsoft Windows reports that the program is not responding, or until I kill the TSE process manually.

===

> On smaller files it's good.

Yes, I tried e.g. in FindAcrossLines.s itself to search for

only.*range

then

gix1000

it finds it successfully by highlighting the found block starting with 'only' and ending with 'range' and all characters in between.



Carlo Hogeveen

May 13, 2022, 1:54:49 PM
to sem...@googlegroups.com

Knud and Rick,

Yes, the algorithm that FindAcrossLines currently uses is slow, even more so when searching with a large character range.
From a technical viewpoint it makes sense that in such a case the tool seems to hang for very large files.
It does not hang, it is just that slow.
From a user viewpoint that is not good, of course.
Thanks for reporting this.
I will improve the algorithm at some later date.
I can tell you beforehand that the tool will always be significantly slower than TSE’s built-in Find(), but methinks there is room for improvement.

Carlo



C.H. Fred

May 13, 2022, 2:36:31 PM
to sem...@googlegroups.com
I think it's fantastic.  Most files I'll use this on are under 10,000 lines, and probably most under 2,000.

Thank you again!

-- 
Rick C. Hodgin



knud van eeden

May 13, 2022, 6:13:18 PM
to sem...@googlegroups.com
Advanced Find

There have been some attempts over the years. E.g. after reading some books about NLP (Natural Language Processing) I wrote a 'concordance' algorithm, which collects N characters to the left and right of a given word and shows them in a vertical column.

Carlo has previously written at least one implementation of multi-line search.

I also wrote at least one attempt, but with plain search only, not regular expression search.

It is important that this regular expression search implementation be very fast, because multi-line regular expression search is missing out of the box in TSE.

I have not studied the source code, apart from a very quick peek at its structure, which showed that it is relatively short. But combining that with the information and the parameters it accepts, this must be the algorithm:

===

Given is that TSE can only do a regular expression search within a single line.

So the characters in which to search are collected and concatenated into one line.

This one-liner can e.g. be put in a buffer.

That is also the origin of the 32000 character maximum, because that is the maximum length of a single line in the current TSE 4.42.00.

That is a very smart idea and is at the center of this algorithm: convert the multi-line text to a single line, then use a regular expression on it.

Then TSE can do its regular expression search, business as usual, and apply the full power of that search to that one line, using the same regular expression notation and syntax that TSE provides out of the box.

Thus an algorithm could be:

1. Given that you search for 'word1.*word2'.

2. Given that you search within e.g. 1000 characters.

3. Extract the first word, e.g. word1, from this expression.

4. Then you use a while loop

WHILE ( first word found in the file or block )
 Extract the 1000 characters after that first word
 Goto a temporary buffer
 Paste and concatenate that 1000 characters as a 1 liner in that buffer
 Do a regular expression search 'word1.*word2' in that 1-liner in that buffer (this can give multiple hits)
 If found, store that line or show it (e.g. store e.g. a filename, linenumber, begincharacter and endcharacter)
ENDWHILE

5. After the loop ends, show the 0 or more hits it found.

6. This is then made to run as fast as possible.
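
The steps above can be sketched outside TSE as follows. This is a Python illustration of the guessed algorithm, not FindAcrossLines itself; the name `find_word_pairs` and its parameters are invented for the example, and joining lines with a single space is an assumption.

```python
import re

def find_word_pairs(text, word1, pattern, window=1000, ignore_case=True):
    """For each plain occurrence of word1, take `window` characters starting
    there, flatten them into one line and run the regular expression on it.
    Returns a list of (line number of word1, matched text)."""
    flags = re.IGNORECASE if ignore_case else 0
    regex = re.compile(pattern, flags)
    hits = []
    line_no = 1
    last_pos = 0
    pos = text.find(word1)
    while pos != -1:
        # Keep a running line count instead of recounting from the top.
        line_no += text.count("\n", last_pos, pos)
        last_pos = pos
        # Turn the multi-line window into a one-liner.
        one_liner = text[pos:pos + window].replace("\r\n", " ").replace("\n", " ")
        match = regex.search(one_liner)
        if match:
            hits.append((line_no, match.group(0)))
        pos = text.find(word1, pos + 1)
    return hits

sample = "a\nThomas wrote\nthe FlatFile parser\nx\nThomas only\n"
print(find_word_pairs(sample, "Thomas", r"Thomas.*FlatFile", window=40))
```

The first word is located with a plain substring search, so only the windows that can possibly match are ever flattened and handed to the regular expression engine.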

with friendly greetings
Knud van Eeden

===




knud van eeden

May 13, 2022, 6:39:17 PM
to sem...@googlegroups.com, S.E. Mitchell
Maybe Sammy could also have a go at an ultra-fast implementation of this idea of a one-liner of at most 32000 characters, because multi-line search is still missing out of the box in TSE.

It would be a very important, or at least useful, building block.

It could then e.g. be added to the Potpourri menu or even to the default TSE menu.


knud van eeden

May 13, 2022, 7:39:46 PM
to sem...@googlegroups.com
1. This is a *very rough prototype*, certainly not completely finished, but the beginning is there.

2. There is no checking whatsoever in this version.

3. The algorithm is based on the idea of creating one-liners and then doing a regular expression search in each one-liner.

4. It successfully extracts e.g. all occurrences of 'Thomas' followed by 'Flatfile' from my 1 gigabyte working file with 15 million lines in a few seconds.

5. You input the first word, then the second word(s), then the search option (e.g. 'gix'), then the total number of lines to search in after the first line.

6. So it does not work with a number of characters after the first word, but with a number of lines, because that is (maybe much) faster.

7. All hits are collected in a buffer as one-liners and output (that output can certainly be improved).

---

INTEGER PROC FNBlockSearchExpressionRegularLineMultiB( STRING in1S, STRING in2S, STRING searchOptionS, INTEGER lineTotalI, INTEGER buffer2I )
 INTEGER B = FALSE
 INTEGER buffer1I = 0
 IF ( NOT ( IsBlockInCurrFile() ) ) Warn( "Please mark a block" ) B = FALSE RETURN( B ) ENDIF // return from the current procedure if no block is marked
 PushPosition()
 buffer1I = CreateTempBuffer()
 PopPosition()
 PushPosition()
 PushBlock()
 GotoBlockBegin()
 WHILE ( LFind( in1S, "l" ) AND IsCursorInBlock() )
  PushPosition()
  PushBlock()
  UnMarkBlock()
  MarkStream()
  Down( lineTotalI )
  Copy()
  GotoBufferId( buffer1I )
  MarkLine( 1, NumLines() )
  DelBlock()
  Paste()
  DO lineTotalI TIMES
   EndLine()
   Right()
   JoinLine()
  ENDDO
  BegLine()
  IF LFind( Format( in1S, ".*", in2S ), searchOptionS )
   UpDateDisplay() // IF WaitForKeyPressed( 0 ) ENDIF // Activate if using a loop
   MarkLine( 1, NumLines() )
   Copy()
   GotoBufferId( buffer2I )
   EndFile()
   Paste()
  ENDIF
  PopBlock()
  PopPosition()
  NextChar()
 ENDWHILE
 PopPosition()
 PopBlock()
 B = TRUE
 RETURN( B )
END

PROC Main()
 STRING s1[255] = "Thomas" // 1st word
 STRING s2[255] = "FlatFile" // 2nd word
 STRING s3[255] = "gix" // search option for regular expression search
 STRING s4[255] = "20" // total amount of lines
 INTEGER buffer2I = 0
 PushPosition()
 buffer2I = CreateTempBuffer()
 PopPosition()
 IF ( NOT ( Ask( "block: search: expression: regular: line: multi: inS1 = ", s1, _EDIT_HISTORY_ ) AND ( Length( s1 ) > 0 ) ) ) RETURN() ENDIF // stop if cancelled or left empty
 IF ( NOT ( Ask( "block: search: expression: regular: line: multi: inS2 = ", s2, _EDIT_HISTORY_ ) AND ( Length( s2 ) > 0 ) ) ) RETURN() ENDIF
 IF ( NOT ( Ask( "block: search: expression: regular: line: multi: searchOptionS = ", s3, _EDIT_HISTORY_ ) AND ( Length( s3 ) > 0 ) ) ) RETURN() ENDIF
 IF ( NOT ( Ask( "block: search: expression: regular: line: multi: lineTotalI = ", s4, _EDIT_HISTORY_ ) AND ( Length( s4 ) > 0 ) ) ) RETURN() ENDIF
 Message( FNBlockSearchExpressionRegularLineMultiB( s1, s2, s3, Val( s4 ), buffer2I ) ) // gives e.g. TRUE
 GotoBufferId( buffer2I )
END

---

with friendly greetings
Knud van Eeden



knud van eeden

May 14, 2022, 2:49:05 PM
to sem...@googlegroups.com
01. E.g. FindAcrossLines.s finds only ONE occurrence (the first), then
    highlights the block starting with the first word and ending at
    the second word, then stops.

    But it is currently very slow and hangs TSE if a very large file (e.g. 1 gigabyte) is tried.

    The method used to highlight the block in FindAcrossLines.s is basically NextChar( <range> ),
    e.g. NextChar( 1000 ).

02. The program in this email finds ALL occurrences of a first word
    followed by a second word in the highlighted block very fast
    (maybe 2 to 3 times slower than TSE Find() itself) and puts
    the output in a buffer.

03. It extracts e.g. all occurrences of for example 'Thomas' followed by
    'Flatfile' in my 1 gigabyte working file with 15 million lines in a
    few seconds successfully.

04. Though this is a *very rough prototype*, it is certainly good enough
    for me to use from now on daily.

05. There is no checking whatsoever in this version, so the program is
    as short and as fast as possible.

06. The algorithm is based on the idea of creating one-liners and then
    doing a regular expression search in each one-liner.

07. The first word is searched without regular expression.
    But for the second word the full TSE regular expression syntax
    can be applied.

     E.g.

      first.*second.*third

    will search first without regular expression, but second.*third
    with a regular expression, thus searches also for second followed
    by third.

08. How to use:

09. First you highlight part of the file or the whole file.

10. Then you enter the first word, then the second word(s),
    then the search option (e.g. 'gix'), then the total number of lines
    to search in after the first line.

11. So it does not work with characters after the first word (e.g. using

     NextChar( <range> )

    ) but with lines after it, because that is maybe a little faster.

12. All hits (all found occurrences of the first word followed by
    the second word) are collected in a buffer and output as
    one-liners.
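
The line-based variant described in this list can be sketched in Python (again only an illustration, not the macro that follows). The name `find_in_following_lines` is hypothetical. As in item 07, the first word is searched as plain text and the regular expression is applied only to the joined one-liner.

```python
import re

def find_in_following_lines(lines, first_word, pattern, line_total=20, ignore_case=True):
    """For each plain occurrence of first_word, join the rest of that line with
    the next line_total lines and search the result with the regular expression.
    Returns a list of (line number, joined one-liner)."""
    flags = re.IGNORECASE if ignore_case else 0
    regex = re.compile(pattern, flags)
    hits = []
    for index, line in enumerate(lines):
        column = line.find(first_word)
        while column != -1:
            following = lines[index + 1:index + 1 + line_total]
            joined = " ".join([line[column:]] + following)
            if regex.search(joined):
                hits.append((index + 1, joined))
            column = line.find(first_word, column + 1)
    return hits

query = ["SELECT a", "FROM t", "having a > 1", "SELECT b", "FROM u"]
print(find_in_following_lines(query, "SELECT", r"SELECT.*HAVING", line_total=2))
```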

---

--- cut here: begin --------------------------------------------------
    MarkLine( 1, NumLines() )
    Copy()
    GotoBufferId( buffer2I )
    EndFile()
    IF LFind( "^$", "cgx" )
     DelLine()
    ENDIF
    Paste()
   ENDIF
   PopBlock()
   PopPosition()
   NextChar()
  ENDWHILE
  PopPosition()
  PopBlock()
  B = TRUE
  RETURN( B )
 END
 //
 PROC Main()
  STRING s1[255] = "Thomas" // 1st word
  STRING s2[255] = "FlatFile" // 2nd word
  STRING s3[255] = "gix" // search option for regular expression search
  STRING s4[255] = "20" // total amount of lines
  INTEGER buffer2I = 0
  IF ( NOT ( IsBlockInCurrFile() ) ) Warn( "Please mark a block" ) RETURN() ENDIF // return from the current procedure if no block is marked
  PushPosition()
  buffer2I = CreateTempBuffer()
  PopPosition()
  IF ( NOT ( Ask( "block: search: expression: regular: line: multi: inS1 = ", s1, _EDIT_HISTORY_ ) AND ( Length( s1 ) > 0 ) ) ) RETURN() ENDIF // stop if cancelled or left empty
  IF ( NOT ( Ask( "block: search: expression: regular: line: multi: inS2 = ", s2, _EDIT_HISTORY_ ) AND ( Length( s2 ) > 0 ) ) ) RETURN() ENDIF
  IF ( NOT ( Ask( "block: search: expression: regular: line: multi: searchOptionS = ", s3, _EDIT_HISTORY_ ) AND ( Length( s3 ) > 0 ) ) ) RETURN() ENDIF
  IF ( NOT ( Ask( "block: search: expression: regular: line: multi: lineTotalI = ", s4, _EDIT_HISTORY_ ) AND ( Length( s4 ) > 0 ) ) ) RETURN() ENDIF
  Message( FNBlockSearchExpressionRegularLineMultiB( s1, s2, s3, Val( s4 ), buffer2I ) ) // gives e.g. TRUE
  GotoBufferId( buffer2I )
 END

--- cut here: end ----------------------------------------------------

---

with friendly greetings
Knud van Eeden


Carlo Hogeveen

May 14, 2022, 5:31:32 PM
to sem...@googlegroups.com

https://ecarlo.nl/tse/index.html#FindaXlines

I renamed my “FindAcrossLines” tool to "FindaXlines", so that it fits in the Potpourri menu (of newer TSE versions).

In v0.4 a better algorithm increased its speed 37-fold.

On my pc finding a string at the bottom of a 1 GB file now takes 12 minutes.

Given that this tool's intended functionality is to be like TSE's own Find command, it is now at its speed limit, meaning it will not get any faster.

Carlo



C.H. Fred

May 14, 2022, 5:47:22 PM
to sem...@googlegroups.com
Sammy,

Is there, or could you add one, a low-level API hook that would allow iteration through every line in a block or in the editor top-down?

LowLevel_enumLines(myDllPathname, FuncName)

It calls FuncName() in my.dll, and I return true or false indicating if TSE should continue iterating.

Could be read-only, or return a string to replace the line contents with if not NULL.
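
To make the proposed callback contract concrete, here is a Python sketch of it. Nothing like this exists in TSE as far as this thread shows: `enum_lines`, the callback signature and the `(keep_going, replacement)` return pair are all hypothetical stand-ins for the suggested LowLevel_enumLines hook.

```python
def enum_lines(lines, callback):
    """Call callback(line_number, text) for every line, top-down.
    The callback returns (keep_going, replacement); a replacement of None
    leaves the line unchanged, and keep_going=False stops the iteration."""
    for number in range(len(lines)):
        keep_going, replacement = callback(number + 1, lines[number])
        if replacement is not None:
            lines[number] = replacement
        if not keep_going:
            break
    return lines

def upper_until_stop(number, text):
    # Example callback: upper-case every line until a line reading "stop".
    if text.strip() == "stop":
        return False, None
    return True, text.upper()

doc = ["select a", "from t", "stop", "having x"]
enum_lines(doc, upper_until_stop)
print(doc)
```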

-- 
Rick C. Hodgin

knud van eeden

May 14, 2022, 10:12:27 PM
to sem...@googlegroups.com
> Given that this tool's intended functionality is to be like TSE's own Find command

It looks like it finds only one occurrence, highlights it, and then stops.

If it behaved like Find() in general, it would be expected to find between 0 and N occurrences.

E.g. Find() stores the N results, as is known, in a buffer which one can access using e.g. compressed view (ALT E).

Note: Of course one could stop Find() after only 1 hit.

with friendly greetings
Knud van Eeden





Guy Rouillier

May 15, 2022, 12:40:40 AM
to sem...@googlegroups.com
I've been following along, and reading the valiant attempts to get this working.  I wanted to mention that pcregrep handles multiline searches out of the box.  I don't have a gigabyte-length text file handy to test speed, but it does work.  A Windows version is readily available.  It might be easier, and produce results faster, to use that and spend the time parsing the output.

Guy Rouillier


Carlo Hogeveen

May 15, 2022, 2:22:51 AM
to sem...@googlegroups.com

Hi Guy,

You are not wrong. As stand-alone tools, the one you mention, many other command line tools, and possibly other editors already do this well and faster.

However, when extending TSE functionality, I prefer the first of the following two options where possible, because my choice is imposed on others:
- Install an open source TSE macro.
- Install an open source TSE macro and an executable downloaded from the internet.

Besides, this is fun. And you called me valiant.

Carlo



Carlo Hogeveen

May 15, 2022, 5:24:00 AM
to sem...@googlegroups.com

https://ecarlo.nl/tse/index.html#FindaXlines

In v0.5 of the FindaXlines tool I implemented the "v" search option to list all occurrences, including the <Alt E> key to edit the list.

For example, executing FindaXlines with parameters "\sfor\s.*\sendfor\s" and "givx" on this macro's source will list both its for-statements.
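
What listing all occurrences looks like can be shown with a Python sketch (not the tool's code). Two things are assumptions of this example: "\s" is translated to "[\x00-\x20]" as in the v0.2 overview, using a naive text replacement, and the sample call below swaps in Python's non-greedy ".*?" so that each for-statement is reported separately. The name `list_occurrences` is made up.

```python
import re

def list_occurrences(text, pattern, ignore_case=True):
    """Return (line number, matched text) for every match across lines."""
    # Naive translation of \s to the character class from the overview.
    translated = pattern.replace(r"\s", r"[\x00-\x20]")
    # DOTALL lets "." cross line endings.
    flags = re.DOTALL | (re.IGNORECASE if ignore_case else 0)
    results = []
    for match in re.finditer(translated, text, flags):
        line_no = text.count("\n", 0, match.start()) + 1
        results.append((line_no, match.group(0)))
    return results

source = (
    "proc main()\n"
    " for i = 1 to 3\n"
    "  a()\n"
    " endfor\n"
    " x()\n"
    " for j = 1 to 2\n"
    "  b()\n"
    " endfor\n"
    "end\n"
)
for line_no, found in list_occurrences(source, r"\sfor\s.*?\sendfor\s"):
    print(line_no, repr(found))
```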

Carlo



C.H. Fred

May 15, 2022, 8:07:30 AM
to sem...@googlegroups.com
I took the code and moved it into my ui file, and modified it so it works on <Ctrl+Q><2>.

It prompts for word one, word two, and then goes.

My next tweak will be to make <Ctrl+L> a smart repeat-find that reruns the last search, whether it came from the normal find or from this one.

Easy peasy. :-)

-- 
Rick C. Hodgin



knud van eeden

May 15, 2022, 9:22:58 AM
to sem...@googlegroups.com
Could you please upload the latest version v0.5 to 


(still v0.4 at this moment)

Thanks
with friendly greetings
Knud van Eeden





Knud van Eeden

May 15, 2022, 9:35:21 AM
to sem...@googlegroups.com
> pcregrep 

1. Where (URL) do you download your Microsoft Windows version (zip, exe, ...)?

2. And how do you use it on the command line? 


Thanks
with friendly greetings
Knud van Eeden

Knud van Eeden

May 15, 2022, 9:39:51 AM
to sem...@googlegroups.com
Latest version (FindAXLines.s) v0.5 (not available at this moment) should handle it out of the box (using the 'v' option).

with friendly greetings
Knud van Eeden

Carlo Hogeveen

May 15, 2022, 10:01:46 AM
to sem...@googlegroups.com
> Could you please upload the latest version v0.5 to

Oops!
Done.

https://ecarlo.nl/tse/index.html#FindaXlines



knud van eeden

May 15, 2022, 10:19:07 AM
to sem...@googlegroups.com
> Could you please upload the latest version v0.5 
I do not see that option box appearing e.g. with <ALT E> after searching with v0.5:

Steps to reproduce:

1. Load the file (e.g. 1 gigabytes file size) containing that text

2. Search for

Thomas.*FlatFile

3. Options

givx200

4. It takes only about 2 minutes now (so useable now) before it comes back and highlights the first occurrence.

5. Result: But no <ALT E> option seen.

6. Expected: <ALT E> option seen.

knud van eeden

May 15, 2022, 10:30:01 AM
to sem...@googlegroups.com
FYI only: Also the 'V' option is missing in the search option box (it only shows [GIMX]) when providing the input to FindAXLines.s.



knud van eeden

May 15, 2022, 10:41:11 AM
to sem...@googlegroups.com
> ... search for SQL expressions that have things like SELECT.*HAVING ...
> Rick C. Hodgin

If you have to find all of those using <CTRL><L>, it takes a lot of manual effort:

===

FYI only: Here you see output from my TSE program searching in a 1 gigabyte working file for

 SELECT.*HAVING

Thus the search produces more than one hit.

Each output line shows first the line number, then the first word (case-sensitive SELECT), followed by case-insensitive HAVING within the first 40 lines after the first word.

(My TSE program generates this multi-line regular expression output in a few seconds natively in TSE memory).

(see attached screenshot).


Carlo Hogeveen

May 15, 2022, 10:49:31 AM
to sem...@googlegroups.com

Knud,

RE your problem with v0.5 of the FindaXlines tool.

I have checked the server version, and it's OK.
I have completely restarted the server to make extra sure there is no old version in the server's cache.
I have downloaded the tool again and compared it to the intended version, and they are the same.

> ... givx200 ...
> ... before it comes back and highlights the first occurrence. ...

Given that you supply the "v" option and "... get the first occurrence ... " instead of a "View Finds" window, it still behaves like an older version of the tool.
I cannot reproduce that with the v0.5 version of the tool.
Here I get a very nice "View Finds" window.

https://ecarlo.nl/tse/index.html#FindaXlines

Carlo



knud van eeden

May 15, 2022, 10:58:39 AM
to sem...@googlegroups.com
> RE your problem with v0.5 of the FindaXlines tool.
> I have checked the server version, and it's OK.
> I have completely restarted the server to make extra sure there is no old version in the server's cache.
> I have downloaded the tool again and compared it to the intended version, and they are the same.

Yes, when I tried to download it again in Google Chrome, my default browser, it kept being version v0.4, even after your message that v0.5 was uploaded.

Then I started another browser, Microsoft Edge, and this time it downloaded v0.5 successfully, while Google Chrome was still downloading v0.4.

Root cause: caching behavior in the Google Chrome browser, which kept the old file for some reason.

knud van eeden

May 15, 2022, 11:20:24 AM
to sem...@googlegroups.com
> ... givx200 ...
> ... before it comes back and highlights the first occurrence. ...
> Given that you supply the "v" option and "... get the first occurrence ... " instead of a "View Finds" window, it still behaves like an older version of the tool.
> I cannot reproduce that with the v0.5 version of the tool.
> Here I get a very nice "View Finds" window.

Yes, the old v0.4 was still in the compilation path, while v0.5 had been compiled elsewhere, so it turns out v0.4 took priority unnoticed.

Indeed, the result is now very useful: all hits are shown, as with <ALT E>, and one can go to such a line by pressing ENTER on it.

(Note: Here it takes now about 5 minutes to search in the 1 gigabytes file, so doable).

knud van eeden

May 15, 2022, 1:08:14 PM
to sem...@googlegroups.com
> Is there an advanced find macro that would let me type in two words near each other, same line or nearby lines?
> Rick C. Hodgin

Latest version of searbllm.s

1. About as fast as native TSE Find() (e.g. a few seconds to search for SELECT.*HAVING in a 1 gigabyte file)
2. Highlight the file or part of file first.
3. Run, then input first word (e.g. SELECT), second word (e.g. HAVING), search option (e.g. 'gix'), total amount of *characters* to search after the first word (e.g. 1000)
4. It will output all found occurrences as concatenated 1-liners, starting with the line number the first word was found, followed by the first word, followed by the text until the second word, followed by the second word.

See below and also in attached searbllm.s.

INTEGER PROC FNBlockSearchExpressionRegularLineMultiB( STRING in1S, STRING in2S, STRING searchOptionS, INTEGER characterTotalI, INTEGER buffer2I )
 INTEGER B = FALSE
 INTEGER buffer1I = 0
 INTEGER J = 0
 IF ( NOT ( IsBlockInCurrFile() ) ) Warn( "Please mark a block" ) B = FALSE RETURN( B ) ENDIF // return from the current procedure if no block is marked
 PushPosition()
 PushBlock()
 Set( Break, ON )
 PushPosition()
 buffer1I = CreateTempBuffer()
 PopPosition()
 GotoBlockBegin()
 WHILE ( LFind( in1S, "l" ) AND IsCursorInBlock() )
  J = CurrLine()
  PushPosition()
  PushBlock()
  UnMarkBlock()
  MarkStream()
  NextChar( characterTotalI )
  MarkStream()
  Copy()
  GotoBufferId( buffer1I )
  MarkLine( 1, NumLines() )
  DelBlock()
  Paste()
  REPEAT
   EndLine()
   Right()
   B = JoinLine()
  UNTIL ( NOT B )
  BegLine()
  IF LFind( Format( in1S, ".*", in2S ), searchOptionS )
   MarkLine( 1, NumLines() )
   Copy()
   GotoBufferId( buffer2I )
   WHILE LFind( "^$", "gix" )
    DelLine()
   ENDWHILE
   EndFile()
   Paste()
   BegLine()
   InsertText( Format( Format( Str( J ), " " ) : 10 : " " ), _INSERT_ )
  ENDIF
  AddLine()
  PopBlock()
  PopPosition()
  NextChar()
 ENDWHILE
 PopPosition()
 PopBlock()
 B = TRUE
 RETURN( B )
END
//
PROC Main()
 STRING s1[255] = "Thomas" // 1st word
 STRING s2[255] = "FlatFile" // 2nd word
 STRING s3[255] = "gix" // search option for regular expression search
 STRING s4[255] = "40" // total amount of characters to search in after the first word
 INTEGER buffer2I = 0
 IF ( NOT ( IsBlockInCurrFile() ) ) Warn( "Please mark a block" ) RETURN() ENDIF // return from the current procedure if no block is marked
 PushPosition()
 buffer2I = CreateTempBuffer()
 PopPosition()
 IF ( NOT ( Ask( "block: search: expression: regular: line: multi: inS1 = ", s1, _EDIT_HISTORY_ ) AND ( Length( s1 ) > 0 ) ) ) RETURN() ENDIF // stop if cancelled or left empty
 IF ( NOT ( Ask( "block: search: expression: regular: line: multi: inS2 = ", s2, _EDIT_HISTORY_ ) AND ( Length( s2 ) > 0 ) ) ) RETURN() ENDIF
 IF ( NOT ( Ask( "block: search: expression: regular: line: multi: searchOptionS = ", s3, _EDIT_HISTORY_ ) AND ( Length( s3 ) > 0 ) ) ) RETURN() ENDIF
 IF ( NOT ( Ask( "block: search: expression: regular: line: multi: characterTotalI = ", s4, _EDIT_HISTORY_ ) AND ( Length( s4 ) > 0 ) ) ) RETURN() ENDIF
 Message( FNBlockSearchExpressionRegularLineMultiB( s1, s2, s3, Val( s4 ), buffer2I ) ) // gives e.g. TRUE
 GotoBufferId( buffer2I )
END
searbllm.s

knud van eeden

May 15, 2022, 3:54:47 PM
to sem...@googlegroups.com
>> I wanted to mention that pcregrep handles multiline searches out of the box.
>> Guy Rouillier

---

> And how do you use the multi-line regular expression notation on the pcregrep.exe command line? 





---

> pcregrep 
> Where (URL) do you download your Microsoft Windows version (zip, exe, ...)?

The Microsoft Windows version can be downloaded here: http://www.rexegg.com/pcregrep-pcretest.html
Not PCRE2, because for some reason that has to be compiled yourself, which is almost always a big disaster on Microsoft Windows (e.g. the needed tools or libraries are not available).

If you download PCRE2 you get a .zip which you need to compile yourself.

Thus download the older PCRE1, current version 8.45.

---

>>> Is there an advanced find macro that would let me type in two words near each other, same line or nearby lines?
>>> Rick C. Hodgin

with friendly greetings
Knud van Eeden

Guy Rouillier

May 16, 2022, 3:46:23 AM
to sem...@googlegroups.com
The version I have on my system (Windows 7 64-bit) came packaged with MSYS2, which I use to run the MinGW-w64 compiler.  You can install just the pcre package from here:


With the following test file grep.txt:
======
this is on the first line
and this is on the second line
======

both the following invocations find a match:

pcregrep -Mi "first(.*\r\n).*second" grep.txt
pcregrep -Mi "(?s)first.*second" grep.txt
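
For readers without pcregrep at hand, the same two patterns can be tried in Python. Note that Python's `re` module is not PCRE, so this only illustrates the two ways of crossing a line ending; the sample text assumes Windows (CRLF) line endings, which the first pattern requires.

```python
import re

sample = "this is on the first line\r\nand this is on the second line\r\n"

# Variant 1: "." stops at the line feed, so the line break is matched explicitly.
explicit_newline = re.compile(r"first(.*\r\n).*second", re.IGNORECASE)
# Variant 2: (?s) lets "." match line endings as well.
dot_matches_all = re.compile(r"(?s)first.*second", re.IGNORECASE)

for regex in (explicit_newline, dot_matches_all):
    match = regex.search(sample)
    print(repr(match.group(0)) if match else "no match")
```

The first variant depends on the file using CRLF line endings, while the second works regardless of the line-ending style.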

Guy Rouillier