filtering data


Alan Vogel

Oct 29, 2017, 6:39:12 AM
to flex...@googlegroups.com
I would like to know if there is a way to filter data based on a different field than the field that the original display is based on. I have a concordance display based on a string that is located in the Note field. Now that I have the list of sentences that have this string in the Note field, I want to filter these sentences according to a string that is located in the Baseline field. Is there a way of doing this?

Alternatively, if there were a way of exporting the data, that would help, too. That is, once I have the list of sentences based on a string in the Note field, if I could export the sentences contained in the list, I could then search in the resulting database for another string that I am interested in. Alan
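For illustration only, here is how the two-step filter could look outside FLEx once the sentences have been exported or copied out. This is a minimal sketch, not a FLEx feature: the record layout and the field names `note` and `baseline` are assumptions made for the example.

```python
import re

# Hypothetical records: one dict per sentence, with made-up field names.
sentences = [
    {"baseline": "the dog ran up the hill", "note": "motion verb"},
    {"baseline": "she ran up a bill", "note": "idiom"},
    {"baseline": "he paid the bill", "note": "idiom candidate"},
]

def filter_by_field(records, field, pattern):
    """Keep the records whose `field` matches the regular expression."""
    regex = re.compile(pattern)
    return [r for r in records if regex.search(r[field])]

# First pass: filter on the Note field.
by_note = filter_by_field(sentences, "note", r"idiom")
# Second pass: filter the survivors on the Baseline field.
by_both = filter_by_field(by_note, "baseline", r"\bran\b")
```

Chaining the two calls keeps each pass independent, so the second filter can be changed without redoing the first.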

Ron Moe

Oct 31, 2017, 4:04:48 PM
to flex...@googlegroups.com
Hi Alan,
I can't answer your question about the functionality of FLEx. But I would like to address the need, at least what I assume is your need. Most of my time these days is devoted to lexical analysis. I'm using a text corpus, but not in FLEx. I would like to use FLEx to interlinearize the text corpus, but I don't see how the interlinearized texts would enable me to do what I need to do. FLEx has the parsing/interlinearizing function and the text charting function. Both are very useful for certain kinds of things, but not for lexical (semantic) analysis.

I find that most of my time is spent in sorting example sentences. But in order to sort the sentences, I have to annotate them in some way, or at least identify some significant factor in the sentences. For instance one significant factor is the syntactic structure. Usually a text corpus is tagged. But the tags are attached to words. A good corpus query system will enable the user to search for sequences of tags. For instance if the words are tagged for the lexeme form (the technical term is "lemmatized"), then you can search for a sequence of lemmas (e.g. "run + up + a + bill" will return "ran up a bill"). If the words are tagged for grammatical category, then you can search for a sequence of grammatical categories (e.g. "v + prep + art + n" will return "ran up a bill"). A good corpus query system will allow you to mix and match these things (e.g. "run + prep + art + n" will return "run up the flag").
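The kind of mixed lemma/category query described above can be sketched in a few lines. This is only an illustration, not how any particular corpus query system works: the token layout (form, lemma, category) and the function names are assumptions, and a term is simply taken to match either the lemma or the category, which would be ambiguous if a lemma happened to be spelled like a category label.

```python
# Each token is (surface form, lemma, category); the tag names are illustrative.
sentence = [
    ("she", "she", "pro"),
    ("ran", "run", "v"),
    ("up", "up", "prep"),
    ("a", "a", "art"),
    ("bill", "bill", "n"),
]

def term_matches(token, term):
    """A query term matches a token if it equals its lemma or its category."""
    _form, lemma, category = token
    return term == lemma or term == category

def find_sequence(tokens, query):
    """Return the surface forms of every contiguous run that matches the query."""
    terms = [t.strip() for t in query.split("+")]
    hits = []
    for start in range(len(tokens) - len(terms) + 1):
        window = tokens[start:start + len(terms)]
        if all(term_matches(tok, term) for tok, term in zip(window, terms)):
            hits.append(" ".join(tok[0] for tok in window))
    return hits
```

With this sample sentence, `find_sequence(sentence, "run + up + a + bill")`, `find_sequence(sentence, "v + prep + art + n")`, and `find_sequence(sentence, "run + prep + art + n")` all find the same stretch of text.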

But there are limits to this sort of system. One problem is that the user needs to know what to look for in order to search intelligently. None of the queries above will return "ran up a big bill". So you need wild cards and the ability to search within x number of words ("run up +3 n"). But even these functions won't enable me to do what I want to do. I have two needs. 1. I want to analyze and sort all the sentences. To do that I would have to search for every conceivable pattern. Obviously that is not a practical solution. 2. I want to analyze and sort the sentences by clause structure, such as syntactic structure or semantic case frame. For these types of things word level tags don't work. You have to annotate (or mark up) each sentence and then sort the data by the annotation.
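A "within x words" search such as "run up +3 n" could be sketched as follows. The interpretation is an assumption: the lemma sequence must be followed by a token of the given category somewhere in the next `max_gap` words. The token layout and the function name are hypothetical.

```python
# Each token is (surface form, lemma, category); the tag names are illustrative.
tokens = [
    ("ran", "run", "v"),
    ("up", "up", "prep"),
    ("a", "a", "art"),
    ("big", "big", "adj"),
    ("bill", "bill", "n"),
]

def find_with_gap(tokens, lemmas, category, max_gap):
    """Find the lemma sequence followed, within max_gap words, by the category."""
    hits = []
    n = len(lemmas)
    for start in range(len(tokens) - n + 1):
        if [tok[1] for tok in tokens[start:start + n]] != lemmas:
            continue
        # Look at up to max_gap tokens after the matched lemmas.
        for end in range(start + n, min(start + n + max_gap, len(tokens))):
            if tokens[end][2] == category:
                hits.append(" ".join(tok[0] for tok in tokens[start:end + 1]))
                break
    return hits
```

Here `find_with_gap(tokens, ["run", "up"], "n", 3)` picks up "ran up a big bill", which the fixed-length queries miss, while a gap of 2 is too short to reach the noun.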

A good corpus query system also looks at the text surrounding the key word and counts the collocates. A good system ranks the collocates and returns a list of collocates ranked by frequency. But I also want to sort the sentences by collocate and group the collocates by semantic domain, grammatical relationship to the key word, etc. I don't just want a list of collocates. I want to study how the collocates interact with the key word to disambiguate senses, create meaning, etc.
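Counting and ranking collocates is the easy part, and a rough sketch shows why the list alone is not enough for the analysis described here. The window size, the whitespace tokenization, and the sample sentences are all assumptions made for the example.

```python
from collections import Counter

def collocates(sentences, keyword, window=2):
    """Count the words within `window` positions of each occurrence of keyword."""
    counts = Counter()
    for sentence in sentences:
        words = sentence.lower().split()
        for i, word in enumerate(words):
            if word != keyword:
                continue
            lo = max(0, i - window)
            neighbours = words[lo:i] + words[i + 1:i + 1 + window]
            counts.update(neighbours)
    # Most frequent collocates first.
    return counts.most_common()

corpus = [
    "she ran up a bill",
    "he paid the bill",
    "they ran up a big bill",
]
ranked = collocates(corpus, "bill", window=2)
```

The result is a frequency-ranked list of (word, count) pairs; grouping those collocates by semantic domain or grammatical relation would still need the manual annotation discussed above.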

So my problem is that I have to scrutinize my concordance examples, look for significant features in the context, mark up and/or annotate the sentences, and then sort the sentences by each significant feature. By that point I usually have a pretty good idea of what senses the word has, so I also have to sort by sense. I don't know of any easy or efficient way to analyze and tag the sentences. But a computer could save me a lot of time by sorting the data, especially sorting on multiple parameters.
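Once the sentences carry annotations, sorting on several parameters at once is mechanical. The sketch below assumes each example has already been tagged by hand; the tag names `sense` and `frame` and their values are invented for illustration.

```python
from itertools import groupby

# Hypothetical hand-annotated examples; tag names and values are made up.
examples = [
    {"text": "she ran up a bill", "sense": "accumulate", "frame": "v prep art n"},
    {"text": "he ran up the hill", "sense": "move", "frame": "v prep art n"},
    {"text": "they ran up a big bill", "sense": "accumulate", "frame": "v prep art adj n"},
]

def sort_by_tags(records, *keys):
    """Sort records by several tags in turn (first key is most significant)."""
    return sorted(records, key=lambda r: tuple(r[k] for k in keys))

def group_by_tag(records, key):
    """Collect the example texts under each value of one tag."""
    ordered = sort_by_tags(records, key)
    return {
        value: [r["text"] for r in group]
        for value, group in groupby(ordered, key=lambda r: r[key])
    }

by_sense_then_frame = sort_by_tags(examples, "sense", "frame")
by_sense = group_by_tag(examples, "sense")
```

The hard part remains the tagging itself; the sorting and grouping are a few lines once the tags exist.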

So my need is for a database that enables me to extract a concordance of a word, creatively tag (or otherwise mark up) the example sentences, and then sort the sentences by each type of tag or a combination of tags. This sounds like what you are wanting to do. So one of the questions I am asking myself is, Is there a program out there that would enable me to do this? I actually did something very similar to this 30 years ago using the SIL Edit program and some other early programs. When Shoebox came out I put the database into Shoebox. But I had to create new fields for every type of tag and then type the contents of the tag. It was very time consuming. But when I was done I could print out a variety of reports. For instance I printed a thick stack of paper with all the syntactic structures in my text corpus. I also printed a concordance of all the lexemes. But this is chicken feed compared to what we can do today and what I would like to be able to do.

Does anyone have any suggestions? I don't know if the FLEx team has sufficient resources to prioritize this functionality.
Ron Moe


--
You are subscribed to the publicly accessible group "FLEx list".
Only members can post but anyone can view messages on the website.
To change your status, please write to flex_d...@sil.org.
You can join this group by going to http://groups.google.com/group/flex-list.
---
You received this message because you are subscribed to the Google Groups "FLEx list" group.
To unsubscribe from this group and stop receiving emails from it, send an email to flex-list+unsubscribe@googlegroups.com.
To post to this group, send email to flex...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/flex-list/8678d6e1-39e5-5e45-b09a-f373c36e6bd9%40sil.org.
For more options, visit https://groups.google.com/d/optout.

Leaders, Marlin

Nov 1, 2017, 11:47:33 AM
to flex...@googlegroups.com

Ron,

You have probably already considered this, but have you thought of creating a second FLEx project with your example sentences as the lexeme form? Then you can use Semantic Domains to categorize your sentences semantically. Create a grammatical category for each combination of v + prep + art + n you want for the sentences. Then you could filter and sort to mix and match either the sentences (lexeme) or the grammatical category tags of the sentence. Plus you get the advantage of the bulk edit tools in the lexical area.

Then if you put your texts in the Texts area, you would have context of the whole sentence.

Anyway, just thinking outside the box by using the box we’ve got.

Marlin


Ron Moe

Nov 1, 2017, 2:09:37 PM
to flex...@googlegroups.com
I had not thought of that. It is an intriguing idea. However I'm still stuck on a couple of methodological issues. I'm afraid I will have to explain what I do, what I want to be able to do, how I envision doing it, and what kind of computer program would make it easy and efficient.

One of the things I can easily do in Word is to color code elements of the example sentences. I have set up a system of 22 character styles, each with a different color "border" which creates a box around the word and fills it with a background color. Word displays these styles in a box on the left of the screen. So all I have to do is select a word (or phrase) in the example sentence and then click on the style on the left to highlight the word. I can mark up clause or phrase structure by assigning a different color for each constituent (verb-magenta, nominative-green, accusative-cyan, dative-orange, genitive-yellow, preposition-blue, conjunction-gray, adjective-brown, etc.). I can use the same colors to mark collocates or anything else that seems significant.

Once I am done marking up the example sentences, I sort them. But I have to sort them "manually" and this takes a lot of time. The colors enable me to rapidly scan over a lot of data and pick out the element or structure that I am sorting. Once the sorting is done, the colors enable me to scan through all the examples with a particular structure or collocate. This makes it easy to see the feature that I need to describe. Without this tool in Word it would be very difficult to analyze masses of example sentences. Every time I wanted to look at another example sentence, I would have to read the sentence and find the word, phrase, or structure that I was interested in. But once the sentences are color coded, I can easily find things. One of the really great benefits of this system is that other people can also read the analysis file and quickly see what I have seen. So I can create a "permanent" record of the syntactic and semantic analysis that I have done. I can easily review it later and other people can easily review it too.

What I haven't figured out is how to combine this color coding method with some sort of database system that would make it easy (or easier) to mark up the data and sort it. I can imagine a very sophisticated "lexical analysis" program that would create a giant database out of your text corpus. It would enable you to mark up (color code) each sentence for clause structure. Then it would hide that mark up and enable you to mark up noun phrase structure. Then it would hide that and enable you to mark it up in other ways. So you wouldn't have to duplicate the data for each different feature that you want to investigate. It would enable you to tag words, phrases, or the whole sentence for anything you wanted to. It would generate displays (e.g. key word in context) and reports (collocates of the key word). It would sort the data by any feature you had set up. It would enable you to extract examples of each significant feature that you had identified and put these examples into a permanent summary file that documents your analysis for posterity.

FLEx is really good at early stages of linguistic documentation and description. The parsing and interlinear functions are good. The text charting is good. The grammar sketch is good. But I have always been interested in the methodology of linguistic analysis. When I find that I cannot easily and efficiently do what I need to do, I start thinking about *how* to get the job done. Linguistic theory is good and tools like FLEx are good. But I find huge gaps in our thinking about field methods. Throughout my career I've had to invent tools and methods to get the job done.
Ron Moe


David Wilkinson

Nov 1, 2017, 7:11:19 PM
to flex...@googlegroups.com

Alan, I may well be missing something or not have understood exactly what you are wanting. But once you have done the filter on the note string in ‘concordance’ view, and the results show in the bottom left ‘concordance results’ window, is there a reason you can’t select the column ‘occurrence’ (which seems to show the baseline) and where it says ‘show all’ click the down arrow and choose ‘filter for’ and use whatever search string or regex to restrict the search further?

 

I cannot see a way to export the data. But if you click on the ‘Concordance results’ heading to make sure that window is active, then edit: select all, edit: copy, you should be able to paste that into Excel and it should have kept reasonable formatting and you could search in Excel if needed.

 

David


Alan Vogel

Nov 10, 2017, 9:13:14 AM
to flex...@googlegroups.com
Thank you for your comments, Ron. I think one of the new features in the version of XLingPaper that just came out has some relation to what you are talking about. I wanted to bracket sentence constituents in texts, and I looked at the tagging tab in FLEx. This did what I wanted to do, but only on one level. That is, it cannot handle nesting. If you want to tag the subject and the predicate of a sentence, you can do that. But if you want to tag a noun within the subject, you can't do that if you want the subject to still be tagged. I made a feature request to the FLEx developers. This would be a very valuable addition to FLEx, especially since one can search for tags and build a concordance based on them.
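To make the nesting requirement concrete, here is a small sketch of nested constituent tags stored as spans over the words of a sentence and rendered as labelled brackets. It is not FLEx's tagging model or the XLingPaper macro; the span format (start, end, label), the labels, and the function name are assumptions, and the spans are assumed to nest properly without crossing.

```python
def bracket(tokens, spans):
    """Render labelled brackets for (start, end, label) spans; end is exclusive."""
    opens = {}
    closes = {}
    # At the same start position, the longer (outer) span opens first.
    for start, end, label in sorted(spans, key=lambda s: (s[0], -s[1])):
        opens.setdefault(start, []).append("[" + label)
        closes[end - 1] = closes.get(end - 1, 0) + 1
    pieces = []
    for i, token in enumerate(tokens):
        pieces.extend(opens.get(i, []))
        pieces.append(token + "]" * closes.get(i, 0))
    return " ".join(pieces)

words = ["the", "old", "dog", "barked"]
# The noun is tagged inside the subject, and the subject keeps its own tag.
tags = [(0, 3, "SUBJ"), (2, 3, "N"), (3, 4, "PRED")]
bracketed = bracket(words, tags)
```

Because each tag is stored as its own span rather than as a property of a word, a word can belong to any number of enclosing constituents at once.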

In the meantime, I mentioned to Andy Black that it would be valuable to have a macro in XLingPaper to do this kind of bracketing of constituents, and he made a macro right away to do this, which is in the new version of XLingPaper. This does handle multiple levels of nesting, and is a very nice addition to XLingPaper. Alan