How can I search for a text string in a text file (TXT)?


Guillermo Varona Silupú

Nov 9, 2012, 4:17:17 PM
to harbou...@googlegroups.com
Hello:
How can I search for a text string in a text file (TXT)?

TIA

BestRegards
GVS

Giovanni Di Maria

Nov 9, 2012, 4:30:43 PM
to harbou...@googlegroups.com
You have to load the TXT file into memory:
- memoread
- hb_memoread
.......

There are many solutions:

- "xx" $ cString
- At( "cde", "abcsefgfedcba" )
- Rat( "cde", "abcsefgfedcba" )
- ATNum()
- BEFORATNUM()
- FT_AT2()
......
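
A minimal sketch of the load-then-search approach listed above, assuming the file fits comfortably in memory. The file name and search string are placeholders, not from the thread. As far as I know, AtNum() and BeforAtNum() come from the hbct contrib library and FT_AT2() from hbnf, so they need those libraries linked; the sketch sticks to core functions only.

```harbour
PROCEDURE Main()

   LOCAL cFile := "txtfile.txt"          // hypothetical file name
   LOCAL cFind := "needle"               // hypothetical search string
   LOCAL cText := hb_MemoRead( cFile )   // whole file as one string, "" on failure

   IF cFind $ cText                      // substring test, .T. or .F.
      // At() gives the 1-based position of the first match, RAt() of the last
      ? "first match at position", At( cFind, cText )
      ? "last match at position", RAt( cFind, cText )
   ELSE
      ? "not found (or file missing/empty)"
   ENDIF

   RETURN
```

Note that hb_MemoRead() returns an empty string when the file cannot be read, so "not found" and "could not open" look the same here; a real program would probably check hb_FileExists() first.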

   Giovanni Di Maria

Guillermo Varona Silupú

Nov 9, 2012, 5:34:10 PM
to harbou...@googlegroups.com
Hello Giovanni:

Thanks for your answer.
Is there a limit to the file size?

TIA

BestRegards
GVS
--
You received this message because you are subscribed to the Google
Groups "Harbour Users" group.
Unsubscribe: harbour-user...@googlegroups.com
Web: http://groups.google.com/group/harbour-users

Daniele Campagna

Nov 9, 2012, 8:03:08 PM
to harbou...@googlegroups.com, Guillermo Varona Silupú
On 09 November 2012 at 22:17:17, Guillermo Varona Silupú
<gvar...@hotmail.com> wrote:
hi,
in the GOTOC (Good Old Times Of Clipper) I had the very same problem.
I imposed the following guideline:
- no RAM-dependent solution (no matter how big the txt file and the
memory limits)

I know that now with Harbour you could load a couple of gigabytes of a txt
file directly into RAM and then parse it, but no programmer coming from the
past can even conceive of such an obscenity. :-)
So my solution was:
- an outer loop reading a chunk of the txt file (fopen, fread). This was
the reading buffer.
- an inner loop parsing the buffer and locating the delimiters of a
"paragraph" (CR+LF or LF, OS dependent).
If the end-of-paragraph was found, pass the "paragraph" to the parser,
remove it from the buffer and keep the remainder of the buffer.
Otherwise, read another chunk (if EOF not reached), append it to the
remainder and re-parse the buffer, etc.
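
The chunked reading described above can be sketched roughly as follows. This is not Daniele's original code: instead of splitting the buffer into paragraphs, it swaps in a simpler overlap technique, carrying the last few bytes of each chunk over to the next one so that a match split across two reads is still found. The names FileHasText() and CHUNK_SIZE, the buffer size, the file name and the search string are all assumptions made for illustration.

```harbour
#include "fileio.ch"

// assumed buffer size, tune as needed
#define CHUNK_SIZE 65536

PROCEDURE Main()

   // hypothetical file name and search string
   ? FileHasText( "txtfile.txt", "needle" )

   RETURN

// Returns .T. if cFind occurs anywhere in the file.
// Memory use stays close to CHUNK_SIZE regardless of the file size.
FUNCTION FileHasText( cFile, cFind )

   LOCAL nHandle := FOpen( cFile, FO_READ )
   LOCAL cBuffer := Space( CHUNK_SIZE )   // FRead() needs a preallocated buffer
   LOCAL cTail := ""                      // end of the previous chunk
   LOCAL cWindow, nRead
   LOCAL lFound := .F.

   IF nHandle == F_ERROR
      RETURN .F.
   ENDIF

   DO WHILE ! lFound .AND. ( nRead := FRead( nHandle, @cBuffer, CHUNK_SIZE ) ) > 0
      // prepend the carried-over tail so a match split across chunks is seen
      cWindow := cTail + Left( cBuffer, nRead )
      lFound := cFind $ cWindow
      // keep just enough bytes to complete a match that starts in this chunk
      cTail := Right( cWindow, Len( cFind ) - 1 )
   ENDDO

   FClose( nHandle )

   RETURN lFound
```

Because the search runs on raw bytes rather than on paragraphs, a search string that itself contains line terminators is handled the same way as any other string.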

The parser of the "paragraph", BTW, in some implementations then located
"words" (character strings between spaces, commas, periods etc.) and made
further evaluations on the "words" extracted; in other implementations it
looked for keywords or substrings etc. In another one it replaced
characters, translating them... etc.

I used such a schema to read a variety of files ranging from EBCDIC files
from mainframes (with conversion routines EBCDIC->OEM) to any other file
you can imagine.

It is not very easy, eh. Anyway, I remember I started from the examples in
the Clipper example programs, IIRC copyfile.prg or something similar.
So before posting tons of old and badly written code, the question is: what
do you need exactly?
Dan

Giovanni Di Maria

Nov 9, 2012, 10:52:41 PM
to harbou...@googlegroups.com
Hi.
I have opened text files > 40 MB with no problem.
Giovanni

P.S.
The limit on array size is also very high (for example: aName[8000000]).

----------------------------------------------

do...@people.net.au

Nov 10, 2012, 1:33:07 AM
to harbou...@googlegroups.com
Hi Daniele

I think you may be overly cautious and overcomplicating things, now that the operating system can happily page to disk if necessary. You can happily process files that exceed the size of available RAM, and the OS will almost certainly do a more efficient job than we can.

You may also introduce additional problems, apart from having to write and test fairly complicated code. Paragraph boundaries may vary in nature (Windows versus Linux versus MemoEdit() for starters), and the more such "cutting up" of the text you do, the more restrictions you place on what you can search for. E.g. what if I want to search for some text plus two line feeds plus some more text?

There is never such a thing as no limit. Even operating systems impose limits, including maximum file sizes, and those limits will vary depending on the hardware, the OS version and the file system chosen for the device. Let's say you think the maximum file size will be 100 MB. You could consider testing well beyond this, say to 250 MB, and, if you want to be fairly certain, check the size of each file in your program before processing it and issue a warning message if it exceeds that size.
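
A rough sketch of such a size check before processing, assuming the 250 MB figure above is the size the program was actually tested with. TxtFileSize() and MAX_TESTED_SIZE are hypothetical names, and the file name is a placeholder.

```harbour
#include "fileio.ch"

// assumed limit: the largest size the program was tested with (250 MB)
#define MAX_TESTED_SIZE ( 250 * 1024 * 1024 )

PROCEDURE Main()

   LOCAL cFile := "txtfile.txt"   // hypothetical file name
   LOCAL nSize := TxtFileSize( cFile )

   IF nSize < 0
      ? "cannot open", cFile
   ELSEIF nSize > MAX_TESTED_SIZE
      ? "warning:", cFile, "is larger than the tested size:", nSize, "bytes"
   ELSE
      ? "size OK:", nSize, "bytes"
   ENDIF

   RETURN

// Returns the file size in bytes, or -1 if the file cannot be opened.
FUNCTION TxtFileSize( cFile )

   LOCAL nHandle := FOpen( cFile, FO_READ )
   LOCAL nSize := -1

   IF nHandle != F_ERROR
      // seeking to the end of the file returns its offset, i.e. the size
      nSize := FSeek( nHandle, 0, FS_END )
      FClose( nHandle )
   ENDIF

   RETURN nSize
```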

Regards
Doug



On Sat 10/11/12 02:03 , "Daniele Campagna" cyber...@tiscalinet.it sent:
On 09 November 2012 at 22:17:17, Guillermo Varona Silupú
<gvar...@hotmail.com> wrote:

> Hello:
> How I can search for a text string in a text file (TXT)?
>
> TIA
>
> BestRegards
> GVS
>



Daniele Campagna

Nov 12, 2012, 9:59:41 AM
to harbou...@googlegroups.com
On 10 November 2012 at 03:48:51, Guillermo Varona Silupú
<gvar...@hotmail.com> wrote:


>
> Hi Dan:
> Thank you very much for your answer.
> What I need is to look for a specific string in a txt file

I use something like this:

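function main()
   local hdl
   public eol, ln:=""
   if upper(os())="LINUX"
      eol=chr(10)
   else
      eol=chr(13)+chr(10)
   endif
   hdl:=fopen("txtfile.txt")
   if hdl=-1
      ? "cannot open txtfile.txt, error", ferror()
      return nil
   endif
   do while readline(hdl,@ln,eol)=0
      ? ln
      // do whatever you want with ln (= a line of text), for example:
      if "mysearchstring"$ln   // placeholder: put your search string here
         // ...etc
      endif
   enddo
   fclose(hdl)
return nil

// reads one line into linea (passed by reference), without the terminator
// returns 0 if a line was read, -1 at end of file
function readline(nHdl,linea,eol)
   local RetVal:=0,z
   local byte:=" "   // one-byte read buffer, filled by fread()
   linea=""
   do while .t.
      z:=fread(nHdl,@byte,1)
      if z=0   // end of file
         if len(linea)=0
            RetVal:=-1   // nothing left to return
         endif
         exit
      endif
      linea=linea+byte
      if right(linea,len(eol))=eol
         linea=left(linea,len(linea)-len(eol))   // strip the line terminator
         exit
      endif
   enddo
return RetVal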

HTH
salu2
Dan

Guillermo Varona Silupú

Nov 12, 2012, 10:22:51 AM
to harbou...@googlegroups.com
Thanks Dan, I will try it.

BestRegards
GVS

Klas Engwall

Nov 12, 2012, 5:30:44 PM
to harbou...@googlegroups.com
Hi Daniele,

> function main()
> public eol, ln:=""
> if upper(os())="LINUX"
> EOL=chr(10)
> else
> eol=chr(13)+chr(10)
> endif
> hdl:=fopen("txtfile.txt")
> do while readline(hdl,@ln,eol)=0
> ? ln

Why make <ln> public? You are only passing the address of its memory
location, not the variable name, to readline() anyway.

And why make <eol> public? And why not use the hb_eol() function instead
of the os() + manual assignment trick?

... not to mention the ambiguous single equal sign :-)
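
For illustration, the three points above (local variables instead of publics, hb_eol() instead of the os() test, and == instead of =) could look roughly like this. It is a sketch, not a drop-in replacement: ReadLine() is a compact stand-in for Daniele's readline(), and the file name and search string are placeholders.

```harbour
#include "fileio.ch"

PROCEDURE Main()

   LOCAL cEol := hb_eol()     // line terminator of the current platform
   LOCAL cLine := ""
   LOCAL nHandle := FOpen( "txtfile.txt", FO_READ )   // hypothetical file name

   IF nHandle == F_ERROR
      ? "cannot open file"
      RETURN
   ENDIF

   DO WHILE ReadLine( nHandle, @cLine, cEol ) == 0
      IF "needle" $ cLine     // hypothetical search string
         ? cLine
      ENDIF
   ENDDO

   FClose( nHandle )

   RETURN

// Reads one line into cLine (passed by reference), without the terminator.
// Returns 0 if a line was read, -1 at end of file.
FUNCTION ReadLine( nHandle, cLine, cEol )

   LOCAL cByte := " "         // one-byte read buffer

   cLine := ""
   DO WHILE FRead( nHandle, @cByte, 1 ) == 1
      cLine += cByte
      IF Right( cLine, Len( cEol ) ) == cEol
         cLine := Left( cLine, Len( cLine ) - Len( cEol ) )
         RETURN 0
      ENDIF
   ENDDO

   // end of file: a last line without terminator still counts as a line
   RETURN iif( Len( cLine ) == 0, -1, 0 )
```

The == operator always compares strings exactly, while the result of = on strings depends on SET EXACT. Also keep in mind that hb_eol() describes the platform the program runs on, which is not necessarily the line ending used inside the file.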

Regards,
Klas

Daniele Campagna

Nov 13, 2012, 4:23:46 AM
to harbou...@googlegroups.com, Klas Engwall
On 12 November 2012 at 23:30:44, Klas Engwall
<har...@engwall.com> wrote:

> Hi Daniele,
>
>> function main()
>> public eol, ln:=""
>> if upper(os())="LINUX"
>> EOL=chr(10)
>> else
>> eol=chr(13)+chr(10)
>> endif
>> hdl:=fopen("txtfile.txt")
>> do while readline(hdl,@ln,eol)=0
>> ? ln
>
> Why make <ln> public? You are only passing the address of its memory
> location, not the variable name, to readline() anyway.

This is old stuff. Back then I used ln in child functions and did not
always pass it as a parameter. Moreover, I was not sure about passing
static vars by reference. I think I was just lazy.
>
> And why make <eol> public? And why not use the hb_eol() function instead
> of the os() + manual assignment trick?

You are right, of course. When I wrote that, there was no hb_eol(),
I think...
eol is public so that it is evaluated once and for all; otherwise you have
to evaluate it every time readline() is called! Another option was a #define.

>
> ... not to mention the ambiguous single equal sign :-)

Oh well...

>
> Regards,
> Klas
>

Regards, Dan ;-)

Guillermo Varona Silupú

Nov 10, 2012, 1:36:34 PM
to harbou...@googlegroups.com
Ok.
TVM

BestRegards
GVS