Max File Limit


Jason Wolfe

Nov 29, 2014, 9:13:38 AM
to antword...@googlegroups.com
I was wondering if there is a max file limit for AntWordProfiler? I was just trying to run it on about 1900 txt files and it wasn't working.

I haven't tried much troubleshooting yet, just thought I'd ask.

Thanks!

Jason Wolfe

Laurence Anthony

Nov 29, 2014, 9:20:28 AM
to antword...@googlegroups.com
Hi Jason,

Welcome to the discussion group!

AWP (AntWordProfiler) stores all information in a database on disk, so there really isn't any limit on file size or on the number of files. But you need to use UTF-8 encoded files, and I suspect you aren't. Try profiling a simple "this is a test" text file, which is in ASCII (a subset of UTF-8), to make sure everything is running normally.

A new tool I made called EncodeAnt can take a set of files and convert them all to UTF-8 in a batch process. It's on my website software page: http://www.laurenceanthony.net/software.html. It only runs on Windows at the moment, though.
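
As a platform-independent first check, a short Python script can list which files in a folder are not valid UTF-8. This is only a minimal sketch and is not part of AntWordProfiler or EncodeAnt; the function name `find_non_utf8_files` is made up for illustration. It reports problem files but does not convert them, because the correct source encoding has to be known before a conversion is safe.

```python
from pathlib import Path


def find_non_utf8_files(folder):
    """Return the .txt files under `folder` whose bytes are not valid UTF-8.

    Hypothetical helper for illustration only. Plain ASCII files pass the
    check, because ASCII is a subset of UTF-8.
    """
    bad = []
    for path in sorted(Path(folder).rglob("*.txt")):
        try:
            # Decode the raw bytes strictly; invalid sequences raise an error.
            path.read_bytes().decode("utf-8")
        except UnicodeDecodeError:
            bad.append(path)
    return bad
```

Calling `find_non_utf8_files("my_corpus")` (the folder name is a placeholder) returns a list of paths to inspect or re-save as UTF-8.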

Laurence.



###############################################################
Laurence ANTHONY, Ph.D.
Professor
Center for English Language Education in Science and Engineering (CELESE)
Faculty of Science and Engineering
Waseda University
3-4-1 Okubo, Shinjuku-ku, Tokyo 169-8555, Japan
E-mail: antho...@gmail.com
WWW: http://www.laurenceanthony.net/
###############################################################



Jason Wolfe

Nov 29, 2014, 9:00:24 PM
to antword...@googlegroups.com
Hi Laurence

Thanks for adding me. Glad to be here! I just finished the CL MOOC at Lancaster and got to know your software better.

When I run a test set of 10-50 files, it seems to work ok. Perhaps I am not waiting long enough?

I thought plain text .txt files were ASCII. Aren't they?

I attached one as an example. I know there were some issues with " " marks in the file name. I am changing that now.

Jason
5Chris Bangle_Great cars are great art.txt

Laurence Anthony

Nov 30, 2014, 4:17:45 AM
to antword...@googlegroups.com
Hi,

A text file is not automatically an ASCII file. ASCII is just one of many encodings. It was developed in the 1960s and covers only 128 characters: the unaccented English letters, digits, basic punctuation, and control codes. UTF-8 is a Unicode encoding that can represent the characters of virtually all of the world's writing systems.
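
The relationship between the encodings can be illustrated with a few lines of Python. This is a small sketch with a made-up helper, `is_valid`; Latin-1 is used only as one example of a non-UTF-8 encoding and is not known to be involved in this thread.

```python
def is_valid(data, encoding):
    """Return True if the bytes `data` decode cleanly in `encoding`."""
    try:
        data.decode(encoding)
        return True
    except UnicodeDecodeError:
        return False


# Pure ASCII text produces exactly the same bytes in ASCII and in UTF-8.
text = "this is a test"
assert text.encode("ascii") == text.encode("utf-8")

# A non-ASCII character is stored differently in different encodings.
word = "caf\u00e9"                      # "café"
utf8_bytes = word.encode("utf-8")       # b'caf\xc3\xa9'
latin1_bytes = word.encode("latin-1")   # b'caf\xe9'

print(is_valid(utf8_bytes, "utf-8"))    # True
print(is_valid(latin1_bytes, "utf-8"))  # False: not a valid UTF-8 sequence
print(is_valid(utf8_bytes, "ascii"))    # False: bytes above 127 are not ASCII
```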

Your file was UTF-8 encoded and ran fine in AntWordProfiler.

Laurence.



Jason Wolfe

Nov 30, 2014, 4:59:31 AM
to antword...@googlegroups.com
Thanks Laurence

If some of the 1900 txt files were not ASCII, could that mess up the whole process?

Jason

Laurence Anthony

Nov 30, 2014, 5:46:47 AM
to antword...@googlegroups.com
Yes. Did you try the EncodeAnt program that I suggested?


Jason Wolfe

Nov 30, 2014, 5:52:23 AM
to antword...@googlegroups.com
I have not...we run a strict Mac household! I will do it on Monday at work.

Thanks for your help.

Jason

Jason Wolfe

Dec 1, 2014, 7:15:04 AM
to antword...@googlegroups.com
Hello Anthony

So when I load the first ~1000 and the second ~1000 text files separately, it processes them. But when I try to do all of them in one go, it gives me a "DBD SQL" error.

A screenshot of the error warning is attached, along with a txt file containing the complete error report.

Is it something I am doing? Is it my files?

Jason
AWP Error Grab.tiff

Laurence Anthony

Dec 1, 2014, 8:08:53 AM
to antword...@googlegroups.com
Hi Jason,

Hmm... it looks like a problem with the data insertion into the database. The line number is there, so I'll have a look. Nobody has reported this before, so I'm wondering if there is something in your files that might be causing this problem, but as you say, when you split the files into chunks of about 1000, it works OK.

Is the number 1000 relevant? Does it work on 999 files but then die on 1000 files?
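
If the failure really does depend on how many files are loaded at once, the threshold can be narrowed down with a binary search instead of guessing. The sketch below is only an illustration under stated assumptions: `fails` is a hypothetical callback (for example, the user loads the first `n` files into the tool by hand and reports whether the error appeared), and the search assumes that once a prefix of the file list fails, every longer prefix fails too. If the error is caused by something else, that assumption may not hold.

```python
def smallest_failing_count(total, fails):
    """Return the smallest n in 1..total for which fails(n) is True.

    `fails(n)` is a hypothetical, user-supplied check that reports whether
    processing the first n files produces the error. Returns None if even
    the full set of `total` files does not fail.
    """
    if not fails(total):
        return None
    lo, hi = 1, total  # fails(hi) is known to be True throughout the loop
    while lo < hi:
        mid = (lo + hi) // 2
        if fails(mid):
            hi = mid        # the threshold is at mid or below
        else:
            lo = mid + 1    # the threshold is above mid
    return lo
```

For about 1900 files this needs roughly a dozen trial runs at most.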

Laurence.




Jason Wolfe

Dec 1, 2014, 8:19:07 AM
to antword...@googlegroups.com
Hi Anthony

The number wasn't that important. I thought it could be my files. The error appeared somewhere between 1000 and 1500 files: the first 1000 were OK, but the first 1500, the first 1400, and the first 1300 each produced the error. Then I ran files 1000 to 1300 (about 300) on their own and it worked fine, so I think the files are clean.

Before all this I ran the first 1000 in 100-file chunks and it was all good.

I'm on a 2011 MacBook Pro with all the latest updates on Mavericks. 

I can send you the files if you want. It's about 20 MB. 

Thanks for all your help. 

Jason


Laurence Anthony

Dec 1, 2014, 8:23:49 AM
to antword...@googlegroups.com
Hi Jason,

If you can send me the files, that would help me to immediately identify the problem. If you zip them up, they should become small enough to send via email.

Were you simply running the standard analysis with the built-in reference corpus files?

If you do send me the files, please use my email address (not this discussion group address).

Laurence.

