Is FLEx suitable for building a text corpus?

MKR LAB

Apr 16, 2017, 6:22:30 AM
to FLEx list
Hi,

I'm working on building a text corpus of about 156k sentences, but there is one problem. Loading is quite slow, and navigating from one tab to another is a real pain. Is there any solution to this? Or is there any better software for working on a text corpus? I'd like to have a smoother experience in navigating and loading data.

Thanks,
Makara

Paul Nelson

Apr 16, 2017, 6:41:46 AM
to flex...@googlegroups.com
What version of FieldWorks are you using?

Happy Easter!
Paul

--
You are subscribed to the publicly accessible group "FLEx list".
Only members can post but anyone can view messages on the website.
To change your status, please write to flex_d...@sil.org.
You can join this group by going to http://groups.google.com/group/flex-list.
---
You received this message because you are subscribed to the Google Groups "FLEx list" group.
To unsubscribe from this group and stop receiving emails from it, send an email to flex-list+unsubscribe@googlegroups.com.
To post to this group, send email to flex...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/flex-list/99d62ef2-eb52-454f-bfcd-338910d262aa%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Jonathan Dailey

Apr 17, 2017, 12:07:59 PM
to FLEx List
Try upgrading to the newest release candidate. 




--
SIL International
Language Technology Consultant

MKR LAB

Apr 17, 2017, 12:12:20 PM
to FLEx list

Please see the screenshot above. I've also tried to 'check for update', but it says mine is the latest version.

Makara



Jonathan Dailey

Apr 17, 2017, 12:17:12 PM
to FLEx List
Updates in FLEx do not work that way. Go to http://software.sil.org/fieldworks/ and download the most recent version.


MKR LAB

Apr 17, 2017, 12:31:44 PM
to FLEx list
I see, but why is 'Check for update...' on the 'Help' menu? If it doesn't work as expected, then it should be removed; otherwise, after seeing that the current version is up to date, one may not be aware that they have an old version.

Makara

Jonathan Dailey

Apr 17, 2017, 12:36:55 PM
to FLEx List
I agree.


Ann Bush

Apr 17, 2017, 1:11:40 PM
to flex...@googlegroups.com

It was removed in a later version than the one you are on. It never worked. One of our team members implemented it and then left our group.

 


MKR LAB

Apr 17, 2017, 7:32:58 PM
to FLEx list
I got that. Thanks a lot! :)

Aaron Broadwell

Apr 19, 2017, 1:07:28 PM
to FLEx list
To return to the original question about slow loading -- my experience is that long texts do not work very well in FLEx.  The program functions slowly with them and the user experience is not so good.

The way our corpus works is that a long text like a book is broken into units that correspond to about one or two pages.  For example, we have a book Movilla 1635 that is about 240 folia long.  In our FLEx corpus, it corresponds to about 140 texts, and each one loads quickly and smoothly.
To search for things across the whole book, we use the Genre function.  We created a new genre titled Movilla 1635 and each text is marked as coming from this genre. (Genre here stands in for book, really...)

To search across all the texts in this book/genre, we use the Choose texts option to select Movilla 1635.

This method is working smoothly for several fairly large corpora (75,000 -- 300,000 words). I am not sure whether it will scale up to corpora that are several times larger than this. Possibly other FLEx users have some larger corpora?
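
The splitting step described above could be scripted before the texts are brought into FLEx. What follows is only a minimal sketch, not part of FLEx and not the workflow actually used for the Movilla corpus. It assumes the source is plain text with paragraphs separated by blank lines. The function names, the 500-word default, and the title scheme are hypothetical placeholders to adjust.

```python
import re


def split_into_chunks(text, max_words=500):
    """Group blank-line-separated paragraphs into chunks of at most max_words words.

    A single paragraph longer than max_words is kept whole rather than
    being cut in the middle. The 500-word default is an assumed placeholder.
    """
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]
    chunks, current, count = [], [], 0
    for para in paragraphs:
        n = len(para.split())
        # Start a new chunk when adding this paragraph would exceed the limit.
        if current and count + n > max_words:
            chunks.append("\n\n".join(current))
            current, count = [], 0
        current.append(para)
        count += n
    if current:
        chunks.append("\n\n".join(current))
    return chunks


def chunk_titles(book, n_chunks):
    """Hypothetical naming scheme: one numbered title per chunk of a book."""
    width = max(3, len(str(n_chunks)))
    return [f"{book} - {i:0{width}d}" for i in range(1, n_chunks + 1)]
```

Each returned chunk would then be pasted or imported as its own text and marked with the book's genre, so that Choose texts can still select the whole book.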

Mike Aubrey

Apr 19, 2017, 1:43:17 PM
to flex...@googlegroups.com

What you're describing is exactly what I've been doing for about six or seven years with my corpus. My corpus is about the same size as what you're working with right now, at about 100,000 words, with individual 'texts' being about 2,000 to 3,000 words each.

I'd like to get to around 2,000,000 words in time, but that's a ways off.


Mike



KenK

Apr 19, 2017, 1:51:33 PM
to flex...@googlegroups.com
Hi,

I have experienced that same slowness in loading, especially with long legal texts, such as the constitution and the criminal, tax, labor, and immigration codes. Sounds like you have found a work-around. I guess the answer is to break up the text into chunks.

Does anyone know how big the chunks can be before the slowing occurs? So, is 2k-3k the ideal?

Ken


Allan Johnson

Apr 20, 2017, 4:23:07 PM
to flex...@googlegroups.com
Hi Ken,

I worked through this question for a project once. Oddly, I'm not able to find the project to refresh my memory on what text size I settled on. The preference may differ somewhat between users and computers, depending on how powerful the computer is and how much slowdown you can live with. The size I chose was based on kilobytes rather than the number of words. I'm not sure of this part, but if I'm remembering correctly, a 50kb size showed a noticeable but tolerable amount of slowdown, and I chose to avoid anything bigger than that. A 20kb size worked nicely with no noticeable slowdown. I may have settled on 20kb for that project. That size refers to the original baseline texts, which were then pasted into FLEx.

That would likely represent a good number of words - maybe roughly equivalent to the 2000 to 3000 word length previously suggested here, if the average word length was 7-10 bytes. 

Note that the number of bytes may not be equivalent to the number of alphabetic characters. With Unicode there can be 2-3 bytes in one alphabetic character.

I think the more consistent measure as far as gauging how much slowdown to expect in FLEx is probably the size in kb rather than the number of words. I might be wrong on that, though.
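
To make the byte-versus-word comparison concrete, here is a rough sketch of how a baseline text could be measured before pasting it into FLEx. It is an illustration under stated assumptions, not a FLEx feature: the text is assumed to be encoded as UTF-8, 20kb is taken as 20,000 bytes, words are counted by splitting on whitespace, and the function names are hypothetical.

```python
def text_stats(text):
    """Return character, UTF-8 byte, and whitespace-delimited word counts."""
    return {
        "chars": len(text),
        "bytes": len(text.encode("utf-8")),  # assumes UTF-8 storage
        "words": len(text.split()),
    }


def words_for_budget(budget_bytes, bytes_per_word):
    """Rough number of words that fit in a byte budget."""
    return budget_bytes // bytes_per_word


def exceeds_budget(text, budget_bytes=20_000):
    """True if the UTF-8 size of text is larger than the assumed 20,000-byte budget."""
    return len(text.encode("utf-8")) > budget_bytes


# Two accented letters take 2 bytes each in UTF-8, so bytes exceed characters.
sample = "na\u00efve caf\u00e9"
print(text_stats(sample))            # {'chars': 10, 'bytes': 12, 'words': 2}

# The 7-10 bytes per word estimate gives roughly 2,000 to 2,857 words for 20,000 bytes.
print(words_for_budget(20_000, 10))  # 2000
print(words_for_budget(20_000, 7))   # 2857
```

For scripts whose letters all take 2-3 bytes, the same byte budget would hold correspondingly fewer words, which is one reason a byte count and a word count can point to different chunk sizes.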

Allan



