
STDumper (Initial version released.)


Gazza

Oct 2, 2009, 3:59:15 PM
Now that I've hooked up preliminary SmartWord filtering into the main
Filter utility, I think I'm about ready to make a first public
release. There are still a few issues to iron out and these are listed
below...

Filter is dropping words...
This seems to happen when the word in question spans two "lines". (I
define a line here as 255 characters.) At the moment I read the input
file one line at a time and split each line into words, which I then
check against the database. If a word is found, I write it to the
output line and move on to the next word. If not, it goes into the
unknowns list, which is written to a file when the program has
finished with the document. What I think I should be doing is this:
check whether the word is in the dictionary and, if it isn't, check
whether we're at the end of the line. If we are, store the word and
move on to the beginning of the next line. When we then see a word
that isn't in the dictionary at the start of a new line, append it to
the stored word and see if that gives a match. I've tried variations
of this a couple of times now, with different but undesired results.
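A minimal sketch of one way to stop words being dropped at a line boundary, written in Python rather than BASIC. It is a variation on the idea above, not the approach Filter uses: instead of consulting the dictionary to decide whether a word was cut short, it holds back whatever fragment touches the end of each 255-character chunk and prepends it to the next chunk. The `dictionary` set (assumed to hold lower-case words) and the function names are hypothetical stand-ins for the SmartWord database lookups.

```python
import io

CHUNK_SIZE = 255  # mirrors the 255-character line limit described above


def read_words(stream, chunk_size=CHUNK_SIZE):
    """Yield whole words from a text stream read in fixed-size chunks.

    A fragment that touches the end of a chunk is carried over and
    joined to the start of the next chunk, so a word that straddles
    a chunk boundary is never split in two.
    """
    carry = ""
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        text = carry + chunk
        words = text.split()
        if words and not text[-1].isspace():
            # The last token may continue in the next chunk: hold it back.
            carry = words.pop()
        else:
            carry = ""
        yield from words
    if carry:
        yield carry  # flush the final word at end of input


def filter_words(stream, dictionary):
    """Sort words into (known, unknown) lists.

    Assumes the hypothetical dictionary set holds lower-case words;
    punctuation stripping is left out to keep the sketch short.
    """
    known, unknown = [], []
    for word in read_words(stream):
        (known if word.lower() in dictionary else unknown).append(word)
    return known, unknown


sample = io.StringIO("a " * 126 + "boundary word")  # "boundary" straddles char 255
print(list(read_words(sample))[-2:])
```

Because a fragment is only ever looked up once it is known to be complete, there is no need to guess from dictionary misses whether two pieces belong together.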

AddWords single word logging...
Still haven't got anywhere with this long-standing bug. A full
description of this one is in the !ReadMe file distributed with the
archive.

You can download what I have so far from http://www.garethlock.com/acorn/stdumper/stdump.zip

Gazza

Oct 10, 2009, 9:43:14 PM
On Oct 2, 8:59 pm, Gazza <use...@garethlock.com> wrote:
> You can download what I have so far from http://www.garethlock.com/acorn/stdumper/stdump.zip

Done a little optimisation here and there... Managed to shave 2k off
the SmartWord API (Libs.SmartW). Also got the word counts for each
start letter properly right-aligned in the database statistics report
options in SWAdmin.
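For anyone curious how a per-letter breakdown like this can be lined up, here's a small Python sketch. It counts words by their first letter and right-aligns each count to the width of the largest one. The "letter: count" layout and the `letter_report` name are assumptions for illustration, not SWAdmin's actual report format.

```python
from collections import Counter
import string


def letter_report(words):
    """Return one report line per letter A-Z with a right-aligned word count."""
    counts = Counter(w[0].upper() for w in words if w and w[0].isalpha())
    letters = string.ascii_uppercase
    # Pad every count to the width of the widest one so the column lines up.
    width = max(len(str(counts[c])) for c in letters)
    return [f"{c}: {counts[c]:>{width}}" for c in letters]


print("\n".join(letter_report(["apple", "ant", "bee"] + ["zebra"] * 12)))
```

Working out the column width from the data, rather than fixing it, keeps the report aligned as the dictionary grows.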

Gazza

Nov 3, 2009, 9:47:44 AM
New release available...

This release seems to have introduced a few more bugs. For some
reason, the output produced by Filter when SmartWord filtering is
turned on loses all the spaces between words. I have no idea why, as
I haven't made any changes to that part. The main part of the update
has been to expand the dictionary to over 5000 words. There's still a
long way to go, but the program is now at a stage where it can scan a
document and produce a list of unknowns from it. This list is then
tidied up by hand and inserted into the database. Yes... I do put
each of the test documents through a spell-checker BEFORE I run them
through this, so all the words are correctly spelt.
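The scan-and-collect step described above could look something like the following Python sketch: it pulls the words out of a document, keeps the ones not already in the dictionary, and writes them out de-duplicated and sorted, one per line, ready for tidying by hand. The word-matching regular expression, the lower-case dictionary set, and the `collect_unknowns` / `write_unknowns` names are all assumptions here, not how Filter actually does it.

```python
import re

# Hypothetical word pattern: runs of letters, allowing internal apostrophes.
WORD_RE = re.compile(r"[A-Za-z]+(?:'[A-Za-z]+)*")


def collect_unknowns(text, dictionary):
    """Return the sorted, de-duplicated words in text missing from dictionary."""
    found = {m.group(0).lower() for m in WORD_RE.finditer(text)}
    return sorted(found - dictionary)


def write_unknowns(path, unknowns):
    """Write one unknown word per line, ready for manual tidying."""
    with open(path, "w", encoding="utf-8") as f:
        f.writelines(word + "\n" for word in unknowns)


known = {"the", "cat", "sat", "on", "mat"}
print(collect_unknowns("The cat sat on the mat. The dog didn't!", known))
```

Collecting into a set before sorting means each unknown word appears once in the file, however often it occurs in the document.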

As usual, you can find the latest update at http://www.garethlock.com/acorn/stdumper/stdump.zip

Gazza

Nov 6, 2009, 12:23:44 PM
Made a few changes to the way the tools initialise libraries. Added a
feature to the SWAdmin tool that lets the user save a dump of the
brief report to the root directory of the database. Cleaned a few
other things up and included LibASH in preparation for a rewrite of
the SmartWord parsing feature inside the Filter tool.

Hopefully, by using LibASH blocks rather than BASIC strings, I can get
around the 255-character limit that occasionally causes words to be
split between lines.
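To illustrate why moving away from fixed 255-character strings should help, here's a short Python sketch. Python has no such string limit, so the 255-character pieces are simulated by slicing. Tokenising each piece on its own cuts any word that straddles a boundary, whereas tokenising the whole text held in one buffer (roughly the role a LibASH block would play) keeps it intact. This doesn't model LibASH itself, and the function names are hypothetical.

```python
LINE_LIMIT = 255  # the BASIC string limit mentioned above


def words_per_piece(text, limit=LINE_LIMIT):
    """Split text into limit-sized pieces and tokenise each piece on its own."""
    pieces = [text[i:i + limit] for i in range(0, len(text), limit)]
    return [w for piece in pieces for w in piece.split()]


def words_whole_buffer(text):
    """Tokenise the whole text at once, as if it were held in one large block."""
    return text.split()


sample = "a " * 126 + "boundary word"  # "boundary" starts at character 252
print(words_per_piece(sample)[-3:])     # ['bou', 'ndary', 'word']
print(words_whole_buffer(sample)[-3:])  # ['a', 'boundary', 'word']
```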

Anyhow... For those of you that are following, the latest download is
at...

http://www.garethlock.com/acorn/stdumper/stdump.zip
