Virtualtext crash


Alexey Pechnikov

Dec 10, 2009, 10:10:51 AM
to SpatiaLite Users
$ uname -a
Linux ... 2.6.26-2-686-bigmem #1 SMP Fri Aug 14 01:52:30 UTC 2009 i686 GNU/Linux

$ ls -lh /tmp/access.log.1
-rw-rw-rw- 1 root root 539M 2009-12-10 18:01 /tmp/access.log.1

sqlite> create virtual table test using VirtualText ('/tmp/access.log.1','utf8','','.','"',' ');
sqlite3: gconv.c:75: __gconv: Assertion `outbuf != ((void *)0) &&
*outbuf != ((void *)0)' failed.



a.furieri

Dec 15, 2009, 1:59:15 PM
to SpatiaLite Users
Hi Alexey,

thanks a lot for noticing this.

I really love your brilliant idea: querying
the Apache logfile by SQL is very nice
[and useful] :-)

during my post-mortem analysis I realized that
VirtualText has lots and lots of really *awful*
traps in its current implementation:

a) the text file is first read and entirely
CACHED IN RAM :-(
this is fine for small files, but is
completely unsustainable for huge files
[like the one you are using in your test]

b) this way we have lots of small (heavily fragmented)
dynamic memory allocations: again, this performs
well for small files, but very badly
for huge files.

c) worst of all, insufficient memory conditions
were completely mishandled.

as things stand, the current VirtualText
implementation is perfectly able to access
any text file not exceeding a few tens of MB

but is completely useless when attempting
to access any file exceeding 100MB.

----
conclusion: I'll try to re-implement the VirtualText
access logic from scratch, in order to offer
decent and robust support for huge files
as well, not only for the smallest ones.
----

BTW, I've found some errors in your sample query;
the correct syntax is:

CREATE VIRTUAL TABLE test USING VirtualText ('access_log.1', 'UTF-8',0,'.','"',' ');


bye,
Sandro

a.furieri

Dec 17, 2009, 1:49:18 PM
to SpatiaLite Users
Hi Alexey,

I've just finished a complete rewrite
of VirtualText:
- now caching in-memory row-pointers, and
not full row-data as before
- very low memory footprint
- successfully tested using a huge
10-million-row test file / 1.8GB
- not exactly lightning fast
when accessing such huge files, but
still usable
- performance impact for more realistically
sized files (50MB) is barely noticeable

already in SVN, to be publicly released in 2.4.0-RC-3
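
To illustrate the row-pointer idea described above, here is a minimal sketch in Python. It is not the actual SpatiaLite C code; the function names `build_row_index` and `read_row` are made up for this example. The assumption is simply that one pass over the file records the byte offset where each row starts, and that a row's text is read back from the file only when it is requested, so memory use grows with the number of rows rather than with the file size.

```python
import io


def build_row_index(stream):
    """Scan a binary stream once and return the byte offset of each row start.

    Only the offsets are kept in memory, never the row contents.
    (Hypothetical helper, for illustration only.)
    """
    offsets = []
    while True:
        pos = stream.tell()          # where the next row would begin
        if not stream.readline():    # empty result means end of file
            break
        offsets.append(pos)
    return offsets


def read_row(stream, offsets, n):
    """Fetch row n on demand by seeking to its recorded offset."""
    stream.seek(offsets[n])
    return stream.readline().rstrip(b"\r\n")


# Small in-memory stand-in for a log file; a real file opened with
# open(path, "rb") would be used the same way.
sample = io.BytesIO(b"GET /a 200\nGET /b 404\nGET /c 200")
index = build_row_index(sample)
second_row = read_row(sample, index, 1)
```

The trade-off matches what is reported above: each row access costs a seek plus a read instead of a memory lookup, so very large files are slower to query, but the index stays small.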

bye,
Sandro
