Problem with large file and highlight.current.word=1

Jérôme LAFORGE

Aug 20, 2012, 5:42:01 PM
to scite-i...@googlegroups.com
Hello,
If we open large files (e.g. log files) with highlight.current.word=1, the editor can freeze while it searches for all occurrences of the current word (the word at the caret) (issue: http://sourceforge.net/tracker/index.php?func=detail&aid=3530365&group_id=2439&atid=102439).
To avoid this problem, you have to turn the feature off. But if you turn it off, you lose this useful feature in all open buffers.
Moreover, if you forget to turn off highlight.current.word, your editor freezes. Usually you then have to kill the process, and you lose your context (unsaved files, open buffers, caret position, search marks and so on).

IMHO, a simple workaround can be proposed:
If we attach struct currentWordHighlight to each buffer (in class Buffer) rather than to the SciTE process (in class SciTEBase), then we can disable this feature for the current buffer (because the file is too large, or for other conditions) and leave it enabled for the other buffers. Enabling or disabling the feature could be done with command.discover.properties (for example, with a Python script in charge of counting the number of lines in the current file and disabling or enabling the feature).

Jérôme LAFORGE


Neil Hodgson

Aug 22, 2012, 6:43:00 AM
to scite-i...@googlegroups.com
Jérôme LAFORGE:

> If we open large files (e.g. log files) with highlight.current.word=1, the editor can freeze while it searches for all occurrences of the current word (the word at the caret) (issue: http://sourceforge.net/tracker/index.php?func=detail&aid=3530365&group_id=2439&atid=102439).
> To avoid this problem, you have to turn the feature off. But if you turn it off, you lose this useful feature in all open buffers.

   You can turn it off per-directory with local options.

> IMHO, a simple workaround can be proposed:
> If we attach struct currentWordHighlight to each buffer (in class Buffer) rather than to the SciTE process (in class SciTEBase),

   Current word highlighting also affects the output pane so moving this into each buffer would be even more confusing.

   I'd prefer to just limit search to an arbitrary amount of time, say 100 ms, with *NO* property to change this.

   Neil

Jérôme LAFORGE

Aug 22, 2012, 8:40:33 AM
to scite-i...@googlegroups.com
> You can turn it off per-directory with local options.

I know, but then you always have to open the file in a directory with the right local options. If you forget to do that, your editor can freeze. From my point of view, an automatic solution is preferable.

> I'd prefer to just limit search to an arbitrary amount of time, say 100 ms, with *NO* property to change this.

This solution is quite straightforward. But with this limit, the user can be misled: he sees some highlighted occurrences, but sometimes (on a slow computer or with a large file) not all occurrences in the file are highlighted.

For example, my .SciTEUser.properties defines a lot of lexer styles. Its size is about 25.1 KB. If I want to highlight the word "style", then only the first occurrences of "style" are highlighted, and if I don't pay attention, I can miss some occurrences (at the end of the file).

Do you think it would be better to disable this feature completely?
For example, if the file is larger than 500 KB, then highlight.current.word is disabled.

--
"The box said 'Requires Windows 95, NT, or better,' so I installed Linux."

Jérôme LAFORGE

Aug 22, 2012, 11:46:58 AM
to scite-i...@googlegroups.com
> Do you think it would be better to disable this feature completely?
> For example, if the file is larger than 500 KB, then highlight.current.word is disabled.

Alternatively, to define this maximum file size for highlighting the current word,
maybe the property max.file.size could be used (http://www.scintilla.org/SciTEDoc.html#property-max.file.size).

Neil Hodgson

Aug 22, 2012, 9:07:02 PM
to scite-i...@googlegroups.com
Jérôme LAFORGE:

>> You can turn it off per-directory with local options.
>
> I know, but then you always have to open the file in a directory with the right local options. If you forget to do that, your editor can freeze. From my point of view, an automatic solution is preferable.

   Fixing "editor is frozen" isn't as strict as making the feature usable: it only has to be limited to allowing the user to close the file or open local properties to turn it off. One second of lag, even two is fine for that.

> This solution is quite straightforward. But with this limit, the user can be misled: he sees some highlighted occurrences, but sometimes (on a slow computer or with a large file) not all occurrences in the file are highlighted.

   If the timer expires then all highlights should be removed.
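
To illustrate the rule above, here is a minimal C++ sketch, not the patch discussed in this thread. It collects matches until a time budget expires and, on timeout, discards everything so that no partial set of highlights is ever shown. The function name FindAllWithinBudget, its parameters and the plain substring matching (no whole-word check) are assumptions made for illustration; SciTE's real code searches through Scintilla rather than over a std::string.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper, not SciTE code: collect the start offsets of every
// occurrence of `word` in `text`, giving up once `budget` has elapsed.
// On timeout the result is emptied, so the caller never shows a partial set.
std::vector<std::size_t> FindAllWithinBudget(const std::string &text,
                                             const std::string &word,
                                             std::chrono::milliseconds budget,
                                             bool &timedOut) {
    assert(!word.empty());
    using Clock = std::chrono::steady_clock;
    const Clock::time_point deadline = Clock::now() + budget;
    std::vector<std::size_t> matches;
    timedOut = false;
    std::size_t pos = text.find(word);
    while (pos != std::string::npos) {
        if (Clock::now() >= deadline) {
            matches.clear();   // remove all highlights rather than keep some
            timedOut = true;
            break;
        }
        matches.push_back(pos);
        pos = text.find(word, pos + word.size());
    }
    return matches;
}
```

A caller would apply the returned offsets as highlights only when timedOut is false. Checking the clock once per match keeps the sketch short; a very long stretch of text with no match would still be scanned in one call to find.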

> Do you think it would be better to disable this feature completely?
> For example, if the file is larger than 500 KB, then highlight.current.word is disabled.

   I'd select 1 MB as a nice round number which would produce about a 250 ms lag for "style" in a file similar to src/Embedded.properties on my machine although it will be slower on an Atom.


> Alternatively, to define this maximum file size for highlighting the current word,
> maybe the property max.file.size could be used (http://www.scintilla.org/SciTEDoc.html#property-max.file.size).

   That is tying two distinct features together.

   Neil

Jérôme LAFORGE

Aug 23, 2012, 4:01:31 AM
to scite-i...@googlegroups.com
> If the timer expires then all highlights should be removed.

This solution is fine for me.

> I'd select 1 MB as a nice round number which would produce about a 250 ms lag for "style" in a file similar to src/Embedded.properties on my machine although it will be slower on an Atom.

Your Atom processor is faster than mine :). For the same file, the search takes about 2.943372 seconds.

Please find the patch, in which the duration is limited to 250 ms. If you prefer another duration, let me know.
 

Neil Hodgson

Aug 25, 2012, 3:23:42 AM
to scite-i...@googlegroups.com
Jérôme LAFORGE:

> Your Atom processor is faster than mine :). For the same file, the search takes about 2.943372 seconds.

   That sounds like a debug build instead of a release build or it could be a compiler difference. I'm using the free version of Visual C++ 2010.

> Please find the patch, in which the duration is limited to 250 ms. If you prefer another duration, let me know.

   MSVC doesn't like the assignment inside the conditional expression. The boolean variable doesn't seem to be needed.

   Neil

Jérôme LAFORGE

Aug 25, 2012, 4:54:03 AM
to scite-i...@googlegroups.com
> That sounds like a debug build instead of a release build or it could be a compiler difference. I'm using the free version of Visual C++ 2010.

I use GNU/Linux only. For compilation, I use the build defined by your makefile (no DEBUG=1 and no CLANG=1).
 
gcc --version :
gcc (Ubuntu 4.4.3-4ubuntu5.1) 4.4.3
Copyright (C) 2009 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.



> MSVC doesn't like the assignment inside the conditional expression. The boolean variable doesn't seem to be needed.

Please find the new patch:

 

Neil Hodgson

Aug 25, 2012, 5:42:11 AM
to scite-i...@googlegroups.com
Jérôme LAFORGE:

> I use GNU/Linux only. For compilation, I use the build defined by your makefile (no DEBUG=1 and no CLANG=1).

   Mmm. Maybe some of the GTK+ calls are slower. Try switching to UTF-8.

   Neil

Jérôme LAFORGE

Aug 25, 2012, 10:57:18 AM
to scite-i...@googlegroups.com
> Try switching to UTF-8.

Much faster with UTF-8: 0.056006 seconds

Neil Hodgson

Aug 25, 2012, 6:38:58 PM
to scite-i...@googlegroups.com
Jérôme LAFORGE:

>> Try switching to UTF-8.
>
> Much faster with UTF-8: 0.056006 seconds

   OK, the problem is that even for case-sensitive matches, a case folding object is being populated with the folded values of characters 0x80 .. 0xff which is slower on GTK+. The case folder can be avoided by changing Editor::SearchInTarget to have a conditional expression:

ScopedCaseFolder pcf((searchFlags & SCFIND_MATCHCASE) ? NULL : CaseFolderForEncoding());

   Other approaches to this could include caching the folding object until the encoding is changed. The folded values might be found faster by processing the whole set together instead of character by character.
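
As a self-contained illustration of why the conditional expression helps, the sketch below builds a folding table only when the search is case-insensitive. FoldTable, its build counter and Find are hypothetical stand-ins, not Scintilla classes, and the folding uses std::tolower purely for simplicity; the real change is the one-line conditional shown above.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <memory>
#include <string>

// Hypothetical stand-in for a case folding table; not Scintilla's CaseFolder.
struct FoldTable {
    static int built;            // counts constructions, for illustration only
    unsigned char map[256];
    FoldTable() {
        ++built;
        for (int ch = 0; ch < 256; ++ch)
            map[ch] = static_cast<unsigned char>(std::tolower(ch));
    }
};
int FoldTable::built = 0;

// Return the offset of the first occurrence of `needle` in `text`, or -1.
// The table is only constructed when folding is actually needed.
long Find(const std::string &text, const std::string &needle, bool matchCase) {
    assert(!needle.empty());
    std::unique_ptr<FoldTable> fold(matchCase ? nullptr : new FoldTable());
    for (std::size_t i = 0; i + needle.size() <= text.size(); ++i) {
        std::size_t j = 0;
        while (j < needle.size()) {
            unsigned char a = static_cast<unsigned char>(text[i + j]);
            unsigned char b = static_cast<unsigned char>(needle[j]);
            if (fold) {
                a = fold->map[a];
                b = fold->map[b];
            }
            if (a != b)
                break;
            ++j;
        }
        if (j == needle.size())
            return static_cast<long>(i);
    }
    return -1;
}
```

With matchCase set, FoldTable::built stays unchanged across any number of calls, which is the saving described above for the highlight search.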

   Neil

Jérôme LAFORGE

Aug 26, 2012, 5:47:33 AM
to scite-i...@googlegroups.com
> The case folder can be avoided by changing Editor::SearchInTarget to have a conditional expression:
>
> ScopedCaseFolder pcf((searchFlags & SCFIND_MATCHCASE) ? NULL : CaseFolderForEncoding());

This solution works fine with both ASCII and UTF-8 encodings:
ASCII: 0.040070 seconds
UTF-8: 0.049127 seconds

If you could commit this solution, that would be great.
 

Neil Hodgson

Aug 26, 2012, 8:25:46 PM
to scite-i...@googlegroups.com
Jérôme LAFORGE:

> If you can commit this solution, that would be great.

It's a change with significant potential impact so it shouldn't be applied near the end of a release cycle. I should be making a 3.2.2 release this week.

A more comprehensive change that doesn't recompute the folding table for each run of case-insensitive searches will be more widely useful.

Neil

Jérôme LAFORGE

Aug 27, 2012, 4:23:36 AM
to scite-i...@googlegroups.com
For my information, you told me that the search for the word "style" takes about 250 ms on your Atom on Windows.
Could you run this test again without the case folder for the encoding, to see whether there is a gain on Windows?

Neil Hodgson

Aug 27, 2012, 8:37:43 PM
to scite-i...@googlegroups.com
Jérôme LAFORGE:

> For my information, you told me that the search for the word "style" takes about 250 ms on your Atom on Windows.

The measured machine was an i7 870, not an Atom.

> Could you run this test again without the case folder for the encoding, to see whether there is a gain on Windows?

I didn't keep the test file or code. Recreating as close as I can remember, with a 1 MB file containing 4900 matches, I get 200 ms for the original code and 40 ms with the modification.

Neil

Jérôme LAFORGE

Sep 1, 2012, 1:51:24 PM
to scite-i...@googlegroups.com
> A more comprehensive change that doesn't recompute the folding table for each run of case-insensitive searches will be more widely useful.

Please find the enclosed patch.
 
scintilla_rev4279.patch

Neil Hodgson

Sep 4, 2012, 5:18:49 AM
to scite-i...@googlegroups.com
Jérôme LAFORGE:

> Please find the enclosed patch.

   Thanks, but I implemented this a bit differently by locating the case folder on the Document object so it wouldn't be confused when changing documents. It also needs to free the case folder when setting the character set as case folding values may differ between Latin1 and ISO-8859-7 (Greek), for example.

   Committed.
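
The approach described above can be sketched as follows. This is a simplified model, not the committed Scintilla code: the FolderSketch and DocSketch types are hypothetical stand-ins, the character set is reduced to a plain integer, and foldersBuilt exists only to make the caching observable.

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-in for a case folder built for one character set.
struct FolderSketch {
    int characterSet;
    explicit FolderSketch(int cs) : characterSet(cs) {}
};

// Hypothetical stand-in for a document that owns its cached case folder.
class DocSketch {
    int characterSet = 0;
    std::unique_ptr<FolderSketch> folder;   // cached per document
public:
    int foldersBuilt = 0;                   // instrumentation for this sketch only

    // Setting the character set frees the cached folder, because folded
    // values may differ from one character set to another.
    void SetCharacterSet(int cs) {
        characterSet = cs;
        folder.reset();
    }

    // Build lazily on first use, then reuse across searches.
    const FolderSketch &FolderForSearch() {
        if (!folder) {
            folder.reset(new FolderSketch(characterSet));
            ++foldersBuilt;
        }
        assert(folder->characterSet == characterSet);
        return *folder;
    }
};
```

Because each document owns its folder, switching between documents cannot hand one document a folder built for another document's character set.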

   As a stylistic issue, it is not necessary to check whether a pointer is NULL before deleting it and some lint tools don't like seeing this. Instead of 

   if (p)
      delete p;

   just use

   delete p;
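
The check is redundant because C++ defines delete on a null pointer as having no effect. A small sketch with a hypothetical Tracked type and Release helper (neither is from Scintilla) makes this observable:

```cpp
#include <cassert>

// Hypothetical type used only to observe how many objects were destroyed.
struct Tracked {
    static int destroyed;
    ~Tracked() { ++destroyed; }
};
int Tracked::destroyed = 0;

// Release the object without a null check; deleting a null pointer is a no-op.
void Release(Tracked *&p) {
    delete p;
    p = nullptr;
}
```

Calling Release on a null pointer leaves Tracked::destroyed unchanged, while calling it on a live object increments the counter once.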

   Neil

Jérôme LAFORGE

Sep 4, 2012, 12:35:15 PM
to scite-i...@googlegroups.com
> Thanks, but I implemented this a bit differently by locating the case folder on the Document object so it wouldn't be confused when changing documents. It also needs to free the case folder when setting the character set as case folding values may differ between Latin1 and ISO-8859-7 (Greek), for example.

OK, no problem.

> Committed.

Thanks for this commit, which has drastically improved the search for the highlight current word feature.
 