SET FILTER speed up


Nenad Batocanin

Jul 26, 2013, 10:03:30 PM
to harbou...@googlegroups.com

Topic for all time: I'm trying to speed up TBrowse with a filter. I have a DBF table with about 30,000 records on a server. The user can enter a complex condition (one that cannot be optimized by an index), and I want to let him browse the filtered table. I have already tried some solutions, like creating a temporary conditional index, but with no success. Creating an index is not a good solution because the user waits too long while it is built. The other solution is SET FILTER. This command is slow because it tests the condition on every record, again and again. For example, if only the first few records satisfy the condition, Ctrl-PgDn will always check all the remaining 30,000 records. Of course, that is sometimes necessary. But what if the condition (approximately) always gives the same results?

 

I was thinking about a new temporary table that stores the numbers of the records that satisfy the condition, plus the numbers of the previous and next matching records. When the browse is activated, the table is filled with the first few records. As the user moves the cursor, the table is filled with new data; if the user re-crosses records already in the table, the view is much faster. The last record is clearly marked, so Ctrl-PgDn will not check thousands of unnecessary records. The problem is the implementation - it requires quite a change, and TBrowse:Skip(), :goTop(), :goBottom()... would all need to change. Some sort of system support for this would be more than welcome, in my opinion.

 

What is your opinion?

 

Regards, NB

SD

Jul 27, 2013, 1:32:12 AM
to harbou...@googlegroups.com
Yes, indeed a good topic.

Once I was stuck with SET FILTER and had a discussion here in this forum; Francesco took the pains to look into the source. We tried a lot of combinations - copying to temporary .dbfs, filter expression optimization, etc. - but one small change did wonders for me.

As the SET FILTER command is very convenient and never fails to fetch exactly the required data, I tend to use it.

Yes, you need optimization and refactoring.

The one thing I found wreaking havoc in my case was the following code inside a DO WHILE loop (on a local PC it was fine, but over the LAN it became far too slow):

SELECT Cons1
SET ORDER TO 1
GO TOP

I just commented out the 2nd and 3rd lines... and got the much-needed "accelerator"!

The usage of GO TOP should be treated with great care, and there shouldn't be multiple GO TOPs - I found them to be "silent killers". If you test on a local PC the cost may be insignificant, but over a network the slowdown is very noticeable.

If something could come directly from Harbour as an alternative to SET FILTER - an optimized, better way to get "filtered data" - nothing would beat it.

Regards,
Swapan

Francesco Perillo

Jul 27, 2013, 1:57:26 AM
to harbou...@googlegroups.com

 

I was thinking about a new temporary table that stores numbers of records that satisfy the condition and number of previous and the next ok records.



I personally don't use TBrowse on big data tables but this is a subject I was thinking about lately.
The solution I have in mind - but never tested - starts like yours: create a temporary table with just the RECNO() of the selected records. Please keep in mind that this is a "snapshot" of the "pointers" to the data set.
This snapshot can be created using MT (a thread that does the LOCATE FOR/CONTINUE and fills the snapshot) or by a process on the server, so that the data doesn't cross the LAN (I was thinking about netio RPC or exposing a web service), but then the server platform must be supported...
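The MT variant could look roughly like this - a hedged sketch only: the file, alias and field names (snap.dbf, SRC, SNAP, GO) are my own inventions, and bFilter stands for the user's condition compiled into a codeblock:

```harbour
// Hedged sketch of the MT idea: a worker thread scans the main
// table and stores the RECNO() of each match in a snapshot DBF.
// All names here (snap.dbf, SRC, SNAP, GO, bFilter) are illustrative.
STATIC PROCEDURE FillSnapshot( cMainDbf, bFilter )
   dbCreate( "snap.dbf", { { "GO", "N", 10, 0 } } )
   USE ( cMainDbf ) NEW SHARED ALIAS "SRC"  // the thread has its own work areas
   USE snap.dbf NEW SHARED ALIAS "SNAP"
   SRC->( dbGoTop() )
   DO WHILE ! SRC->( Eof() )
      IF SRC->( Eval( bFilter ) )           // the user's filter condition
         SNAP->( dbAppend() )
         SNAP->GO := SRC->( RecNo() )
         SNAP->( dbUnlock() )
      ENDIF
      SRC->( dbSkip() )
   ENDDO
   RETURN

// started from the main thread, e.g.:
//    hb_threadStart( @FillSnapshot(), "main.dbf", {|| SRC->Amount > 1000 } )
```

Since each Harbour thread gets its own set of work areas, the browse in the main thread can open the same snapshot SHARED and start showing rows while the worker is still appending.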

In both ways you may start to browse immediately and you don't need to change TBrowse code (note 1).

You then set up a SET RELATION from the snapshot records to the real data records. With this solution you must be aware that it is a snapshot of "pointers": if someone on the LAN changes values, you will see the changed values when the screen refreshes, but if someone changes a field involved in the filter, you will continue to see those records!

It is also a bit more complicated to make changes to the data...


Another solution is copying into the snapshot all the fields you need to show in the TBrowse (again with MT or on the server). This is a REAL snapshot - a COPY of the data - and you will never see changes made by others until you refresh. Since it is a DBF copy, you should take care with editing, if editing is allowed.


Francesco

note 1: TBrowse is a class and you may easily subclass from it and change only the methods you need to change, like I did here:
CREATE CLASS FlagBrowse FROM TBrowse


do...@people.net.au

Jul 27, 2013, 5:13:13 AM
to harbou...@googlegroups.com
Hi Nenad


> Create index is not a good solution because the user waits too long for the index creating.

Are you doing this:
1. locally (which should be rather fast), or
2. across a local network (which will be somewhat slower), or
3. across a wide area network (which will be painfully slow)?

Once the index is created, response should be good (up or down).

If it is either 2 or 3, the solution may be to have the indexing done on the computer that houses the data (i.e. a client-server model). I use the TIP library to program client-server applications, but you may well be able to do it using netio? Others may be able to comment on that possibility.

Depending upon the size of the data file, having it stored on a solid-state disk might also give quite a speed improvement.

Good luck!
Regards
Doug







Clippero

Jul 27, 2013, 8:42:48 AM
to harbou...@googlegroups.com
Hi - the subject of filters on .dbf files is complicated if what you want is speed.
Besides the slowness of the method SET FILTER uses, you have to add in the disk access, the speed of the machine running the application, whether it is standalone or on a network, whether that is a small local network or a larger one extended over a VPN, the speed of the network connections, whether the .dbf is related to one or more other .dbfs, etc.

I have tried filtering .dbf files with 3,000 to 10,000 records; let me share some points.

0 - In a network environment, the connections must be as good as possible: avoid hops through switches and routers, give the application good bandwidth and the highest execution priority, the antivirus should not even know the application is running, etc.
1 - The .dbf to be filtered must NOT be in a relation.
2 - If possible, there should be an index that helps the SET FILTER. An index created locally is faster than one created on the server and, needless to say, the bigger the file, the slower the index creation.

3 - You have two options: work with the .dbf directly from the server, or with a local copy. If we copy, we are no longer using the "real" data, since the original .dbf may keep changing without our knowledge.

a - If possible, load into an array the numbers of the records that satisfy the condition. This requires a pass over the .dbf (causing a delay), loading into an array the records that satisfy the filter condition, for example:

_vendedor := 1
DBGOTOP()
WHILE !EOF()
      IF vendedor = _vendedor
         AADD(Afiltro, RECNO())
      ENDIF
      DBSKIP()
ENDDO
SET FILTER TO ASCAN(Afiltro, RECNO()) # 0
DBGOTOP()
TBROWSE()

While it takes some time to load the array with the RECNO()s that satisfy the condition, the evaluation of the SET FILTER is afterwards always the same whatever the original filter was; besides, it is a simple condition - either the RECNO() is in the array or it isn't, done.
The limitation is the size of the array.
In this case an index helps speed up the pass made before the filter; to speed up the view after the filter, you could prepare in advance an index on RECNO() and set that ORDER for the view with the TBROWSE().
When using an index that does not help the SET FILTER, the speed is a little lower.

b - The second option is to do the same pass, loading the array with the RECNO()s that satisfy the condition, and then copy that data to a new .dbf, on the local machine if possible.
The TBROWSE() is then done without any filter, and the speed is the normal speed of browsing a .dbf - which may even be in a relation, since there is no filter slowing down the view.
We are back to the problem of not working on the real data: any change would not be taken into account, either to refresh the view or to update the original file, unless an update process is put in place from the temporary file to the real one - something that does not always go well :-(

Note: Regarding indexes I use .NTX, but I believe .CDX are faster.

That is my experience. It would be good to hear more experiences; gunpowder and the wheel are already invented, but you can always improve on what has been invented.


Fermín              

Rick Lipkin

Jul 27, 2013, 9:33:59 AM
to harbou...@googlegroups.com
Nenad

If you are using DBFCDX, have a look at ord.ch and use scopes - MUCH faster than using SET FILTER:

 OrdScope( 0, cTopKeyValue ) ; OrdScope( 1, cBottomKeyValue )
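For what it's worth, scopes limit navigation to a contiguous range of index key values (they take key values, not arbitrary conditions). A hedged sketch - the file, index and field names here are made up:

```harbour
#include "ord.ch"                 // TOPSCOPE / BOTTOMSCOPE constants

// Hypothetical example: show only customers whose indexed name
// falls in "SM".."SZ"; assumes an index on Upper( Name ) exists.
USE customer NEW SHARED VIA "DBFCDX"
SET INDEX TO custname
OrdScope( TOPSCOPE, "SM" )        // same as OrdScope( 0, ... )
OrdScope( BOTTOMSCOPE, "SZ" )     // same as OrdScope( 1, ... )
GO TOP                            // lands on the first record in scope
// ... a TBrowse here can never leave the scoped range ...
OrdScope( TOPSCOPE, NIL )         // clear both scopes when done
OrdScope( BOTTOMSCOPE, NIL )
```

Because the RDD seeks directly to the scope boundaries, Ctrl-PgUp/Ctrl-PgDn never scan records outside the range, unlike SET FILTER.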


Rick Lipkin

Itamar M. Lins Jr. Lins

Jul 27, 2013, 9:47:15 AM
to harbou...@googlegroups.com
The best option!

el->( OrdScope( 0, NIL ) )
el->( OrdScope( 1, NIL ) )
el->( dbGoTop() )
DO WHILE el->( OrdWildSeek( "*" + cStrFiltro + "*", .T. ) )
   IF eq->( dbSeek( el->cod_mercad ) )
      AAdd( aFiltro, { eq->cod_mercad, eq->mercadoria, eq->avista, eq->quantidade } )
   ENDIF
ENDDO

//Hwgui function
//   hwg_CREATEARLIST(oBrw,aFiltro)

TIA
Itamar M. Lins Jr.



2013/7/27 Rick Lipkin <r1....@live.com>

Francesco Perillo

Jul 27, 2013, 10:06:58 AM
to harbou...@googlegroups.com

Everything everybody says is interesting, but Nenad asked about SET FILTER, not about using indexes.

Itamar M. Lins Jr. Lins

Jul 27, 2013, 10:28:53 AM
to harbou...@googlegroups.com
Using SET FILTER over a LAN is not recommended - an application killer!

Best regards,
Itamar M. Lins Jr.


2013/7/27 Francesco Perillo <fper...@gmail.com>

Nenad Batoćanin

Jul 27, 2013, 2:12:25 PM
to harbou...@googlegroups.com

I use my own solution (it inherits from TBrowse), but I believe the other solutions have a similar problem: the temporary table must contain all the reviewed records, because if the user jumps the cursor to a record that is not in the tmp table, the program will not know whether that record fails the condition, nor where to move the cursor.

NB

Nenad Batoćanin

Jul 27, 2013, 2:15:24 PM
to harbou...@googlegroups.com

Obviously an interesting topic :)

I thought about some details. For example, suppose that the main table is:

1  ok
2
3
4 ok
5 ok
6 ok
7
8
9 ok
10
11

Let's say it is indexed by record number, though that does not matter. Records 1, 4, 5, 6 and 9 satisfy the condition and should be seen in the TBrowse. The program uses a temporary table like this (I hope Google will not spoil the table):

rec     go      prev
-----   -----   ------
1       1       0
2       4
3       4
4       4       1
5       5       4
6       6       5
7       9
8       9
9       9       6
10      9
11      9

The REC field is the original record number (the table has an index on that field). The GO field holds the correct record number to land on. For example, if the user tries to go to record 5, he stays on that record, because it meets the condition (REC = GO). But if the user tries to jump to record 7, the cursor is moved to record 9, because that is the first proper record after it. SKIP operations are now easily implemented. The table is initially empty and is filled as the user moves the cursor up and down.

I think this can be implemented without major changes to the main code.
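The lookup step might be sketched like this (an illustration only - the TMP and MAIN aliases and the bFilter codeblock are assumptions, and TMP needs a controlling index on its numeric REC field):

```harbour
// Hedged sketch: given any record number of the main table, return
// the record that should actually be shown, using the TMP cache.
FUNCTION ResolveRec( nRec, bFilter )
   LOCAL nGo
   IF TMP->( dbSeek( nRec ) )        // this record was evaluated before
      nGo := TMP->GO
   ELSE
      // not cached yet: scan forward from nRec in the main table
      MAIN->( dbGoto( nRec ) )
      DO WHILE ! MAIN->( Eof() ) .AND. ! MAIN->( Eval( bFilter ) )
         MAIN->( dbSkip() )
      ENDDO
      nGo := MAIN->( RecNo() )
      // ... here the visited records would be appended to TMP ...
   ENDIF
   RETURN nGo
```

A skip block built on top of this lookup never re-evaluates the condition for records that are already cached, which is exactly the second-pass speedup described above.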

Regards, NB

 

Nenad Batoćanin

Jul 27, 2013, 6:09:09 PM
to harbou...@googlegroups.com, do...@people.net.au
Across a network. I was thinking of using netio, but there were some problems and I did not do it. I will probably use ADS, but I will still have customers who use plain file sharing.

NB

Nenad Batoćanin

Jul 27, 2013, 6:22:08 PM
to harbou...@googlegroups.com
We have a variety of customers: from single-user setups to networks of 50+ stations. The quality of the network also varies, and unfortunately we cannot have much influence on it. Perhaps I was not clear: my goal is to speed up the _second_ pass. For example, the user moves the cursor down and the program draws the records slowly; but if he goes back and moves the cursor over the same records, the user cannot be asked to wait all over again.

regards, NB

Nenad Batoćanin

Jul 27, 2013, 6:23:17 PM
to harbou...@googlegroups.com
I use NTX, but I plan to move to CDX.

NB

Francesco Perillo

Jul 28, 2013, 3:23:56 AM
to harbou...@googlegroups.com

I thought about some details. For example, suppose that the main table is:

1  ok
2
3
4 ok
5 ok
6 ok
7
8
9 ok
10
11

No need to index on recno()... 

Records 1, 4, 5, 6, 9 satisfy the condition, and should be seen in tbrowse. Program uses a temporary table like this (I hope that Google will not spoil this table):


No, the table should be different, with 5 records holding the RECNO() of the OK records:
SELECTED.dbf:

rec     go
-----   -----
1       1
2       4
3       5
4       6
5       9


Now, with a SET RELATION from GO into MAIN (no index is needed when relating on RECNO()), whenever TBrowse moves the record pointer in the snapshot, the related database is repositioned as well.

You probably don't even need to modify the TBrowse source code.
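That relation could be sketched as follows - file and alias names are invented, and the GO field is assumed to hold the RECNO() values collected into SELECTED.dbf:

```harbour
// Hedged sketch: browse MAIN through the snapshot. Skipping in SEL
// automatically repositions MAIN via the relation.
USE selected NEW ALIAS "SEL"      // one numeric field: GO
USE main NEW SHARED ALIAS "MAIN"
SELECT SEL
SET RELATION TO GO INTO MAIN      // numeric expression + child without a
                                  // controlling order = a plain dbGoto()
GO TOP
// A TBrowse over SEL can show the real data directly, e.g.:
//    oBrw:AddColumn( TBColumnNew( "Name", {|| MAIN->Name } ) )
```

Since the child work area has no controlling index, the numeric relation expression positions MAIN by record number, which is as fast as a GOTO.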


This table is initially empty and is filled as the user moves the cursor up and down.

You need to decide whether the table can be filled asynchronously (the user starts browsing while an MT local process, or a process on the server using netio RPC, keeps filling it) or synchronously (the user must wait until all the records are selected - again via a local process or netio RPC).
In the async case you have to tell the user somehow when the filter action is complete.



Gmail

Jul 28, 2013, 6:27:30 AM
to harbou...@googlegroups.com
Hello Nenad,
 
    I can't help much, but my suggestions are:
 
    1) Use CDX (for me it is cleaner and nicer)
    2) Ordwildseek() is amazingly fast and easy to use
    3) Scopes are a very good option to try
    4) Creating a temporary DBF in memory is a very good option too ( mem: ). Look at \contrib\hbmemio
    5) Maybe a combination of all the above?
 
    I would try each option and see which is best for your case.
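Option 4 might look like this - a minimal sketch; the file and field names are my own:

```harbour
// Hypothetical example: keep the snapshot entirely in RAM with the
// hbmemio RDD ("mem:" prefix), so scanning it never touches the
// disk or the network.
REQUEST HB_MEMIO                  // link the memory file system

PROCEDURE Main()
   dbCreate( "mem:snap.dbf", { { "GO", "N", 10, 0 } }, "DBFCDX" )
   USE ( "mem:snap.dbf" ) NEW EXCLUSIVE VIA "DBFCDX" ALIAS "SNAP"
   SNAP->( dbAppend() )
   SNAP->GO := 1                  // ... fill with matching RECNO()s ...
   // browse here, then throw the in-memory file away:
   SNAP->( dbCloseArea() )
   hb_dbDrop( "mem:snap.dbf" )
   RETURN
```

Combined with the scope or RECNO()-array approaches above, this keeps all the per-keystroke work local even when the real table lives on the server.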
    Regards,
 
Qatan

Nenad Batoćanin

Jul 28, 2013, 11:36:38 AM
to harbou...@googlegroups.com
Sorry, it's a little difficult for me to explain this in English :)

I first thought to create a table with only the correct records, but that was not enough. The table must contain all the records, because if a user jumps to a random record (one not yet "registered" in the TMP), there is no way to quickly determine what to do. The REC field must have an index for a similar reason: for example, the user jumps to record 1234; the program seeks it in the TMP and quickly determines the correct record. In my example there is no index on the main table, but the table could look like this (indexed on the second column):

3 Anta
5 Banta
1 Centa
2 Denta
6 Fenta
4 Zenta

This requires a TMP table with a REC field. In fact, the TMP table is a doubly linked list.

I wrote a small test program that demonstrates how the system works. There are a couple of bugs and not all the commands are implemented (Ctrl-PgDn), but I think it is enough as an illustration :)

NB
test.zipped

Patrick Mast

Jul 29, 2013, 4:27:26 AM
to harbou...@googlegroups.com
Hello,

Look at this blog post, "Tip for faster filters".

Patrick

Gmail

Jul 29, 2013, 5:29:55 AM
to harbou...@googlegroups.com
Hello Patrick,
 
>Look at this blogbost "Tip for faster filters"
 
That's right.
I believe you know about "Record Recycling". I remember reading about it in the Six driver documentation, because it was a problem for it too.
I also found information about it here, on the Holmes page: http://www.ghservices.com/gregh/clipper/trix0007.htm
I hope it helps a bit more...
Regards,
 
Qatan

G3...@aol.com

Jul 29, 2013, 7:24:09 AM
to harbou...@googlegroups.com
Hi All
 
I developed a Clipper-based record caching system in the late 1980s to get round these problems and avoid the use of SET DELETED. To make a cached record appear above any valid (non-deleted) records, I filled every indexed field with a value that was always greater than anything contained in valid records - typically a graphics character like CHR(160), or a fixed future-century date. It was then easy to seek a cached record when a new record was required.
 
All that is then required is two custom functions to "delete" and "add" records, with code to add to the cache or look in the cache respectively. An end-of-day (last person signing out) or overnight process checked the cache sizes against an application configuration setting and added new records to the cache if required. I kept all functions of this type that were not directly related to a particular application in a common library used by all applications; an intelligent linker will only link the functions that are actually used.
 
As I recall, the cache setting that gave the best results was typically 10% of the number of valid records on that particular system, but that would need reviewing for applications with higher daily transaction counts.
 
Regards

Bob

Bob F Burns G3OOU
C Eng, FIET, MSE
G-QRP 6907
Secretary of Crystal Palace Radio & Electronics Club: www.g3oou.co.uk
Technical web site: www.qsl.net/g3oou
@BobFBurns on Twitter: http://twitter.com/BobFBurns

Nenad Batocanin

Jul 29, 2013, 10:28:43 PM
to harbou...@googlegroups.com

Thanks, this is helpful information. Does this apply to other Harbour RDDs, or just RMDBFCDX?

 

But my problem is not related to deleted records: I need a way to efficiently filter a large group of records from the table.

 

Regards, NB

--

Nenad Batocanin

Jul 29, 2013, 10:32:09 PM
to harbou...@googlegroups.com

Of course, it's always good to hear more opinions and information :)

--

Nenad Batocanin

Jul 29, 2013, 10:42:23 PM
to harbou...@googlegroups.com

I must say that all the solutions we're talking about are only part of the final solution. The real solution is a simple data server that can execute operations such as filtering, indexing, etc. There is ADS, but it is very expensive, and I have the impression that less and less attention is paid to its DBF support. I think a simple, efficient and cost-effective data server (e.g. based on NetIO) would hit the center of the target.

 

Regards, NB

 


Mario H. Sabado

Jul 29, 2013, 11:27:42 PM
to harbou...@googlegroups.com
Hi Nenad,

How about LetoDB? Have you looked at its features?

http://sourceforge.net/projects/letodb/

Regards,
Mario
--

G3...@aol.com

Jul 30, 2013, 4:18:38 AM
to harbou...@googlegroups.com
In a message dated 30/07/2013 02:28:42 GMT Standard Time, nbato...@wings.rs writes:

But my problem is not related to the deleted records. I need a way to efficiently filter a large group of records from the table.

=============================================
Hi All
 
Regardless of the application programming platform, any processing of high volumes of multi-user data needs to avoid transporting that data across a network, which will always have unpredictable speed and response times. You are therefore left with doing the data processing on the server, which should be as fast as economically possible.
 
Is the above requirement a fixed filter definition, just requiring the index to be updated as transactions occur, or is it more like a report generator, where the definition may change very often and requires a complete rebuild of the index in real time each time? The latter is the most demanding, and I would certainly do it with the data, index(es) and the filter program all located on the server, to avoid network transport issues of any sort.
 
On a recent data analysis contract, I used MS-Access on a networked PC to undertake analyses of a legacy database with 5 million records on a server. The best I could manage over the network was around 1000-2000 records per second whereas running the same analyses programs directly on the server using a remote desktop connection gave me a minimum 5 times speed improvement.

Nenad Batocanin

Aug 3, 2013, 12:10:13 PM
to harbou...@googlegroups.com

LetoDB is interesting, but I have read some posts here about bugs in it. Also, I do not see any recent work on it, which is a bit worrying. I have seen some Russian forums with a lot of messages about LetoDB, but I have not read them yet.

 

Of course, any additional experience is welcome.

 

NB
