[omeka-dev] AjaxCreate and Controlled Vocabularies

Tom Scheinfeldt

Apr 21, 2010, 3:09:22 PM
to Omeka Dev
One of the things our user community (or potential user community) is
dying for is something to handle controlled vocabularies. I'm
wondering if Patrick (and maybe Jim Safley and the rest of the crew)
think AjaxCreate could be hacked to handle authority lists and the
like?

--
You received this message because you are subscribed to the Google Groups "Omeka Dev" group.
To post to this group, send email to omek...@googlegroups.com.
To unsubscribe from this group, send email to omeka-dev+...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/omeka-dev?hl=en.

Ethan Gruber

Apr 21, 2010, 3:17:31 PM
to omek...@googlegroups.com
I've done a lot of work with controlled vocabularies in a totally different framework (XForms), but the vocabularies themselves were stored in a Solr index and queried through TermsComponent. You could use Ajax to take advantage of terms stored in Solr in the Omeka framework just as easily, I think. I guess that is one more item in the list of features a Solr plugin may provide in the future.

Ethan
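
A minimal sketch of the kind of TermsComponent request Ethan describes, assuming a Solr instance with a /terms request handler and an indexed field named "subject". The base URL, the handler path, the field name, and the helper name solr_terms_url are all assumptions for illustration; only the terms.fl, terms.prefix, and terms.limit parameters come from TermsComponent itself.

```php
<?php
// Sketch only: the base URL, the /terms handler path and the field name
// are assumptions, not taken from any real Omeka or Solr deployment.
function solr_terms_url($baseUrl, $field, $prefix, $limit = 10)
{
    // terms.fl, terms.prefix and terms.limit are TermsComponent
    // parameters; wt=json asks Solr for a JSON response.
    $params = array(
        'terms.fl'     => $field,
        'terms.prefix' => $prefix,
        'terms.limit'  => $limit,
        'wt'           => 'json',
    );
    return rtrim($baseUrl, '/') . '/terms?' . http_build_query($params, '', '&');
}

// Build the URL an autosuggest box might request after the user types "Hist".
echo solr_terms_url('http://localhost:8983/solr', 'subject', 'Hist'), "\n";
```

An Ajax handler on the item form would request a URL like this on each keystroke and render the returned terms as suggestions.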

Tom Scheinfeldt

Apr 21, 2010, 3:22:29 PM
to omeka-dev
I hadn't thought about Solr. That's a better long term solution. But
definitely something we should keep on the radar.

Thanks, Ethan.

Patrick Murray-John

Apr 21, 2010, 3:51:24 PM
to omek...@googlegroups.com
AjaxCreate might be a way to get at part of it, at least in the short term. Where
and how the vocabularies are stored is definitely a key question.

I'm a little fuzzy about the use cases. Is this on the right track?
I'm thinking that the mission is to create Tags in Omeka, selecting from
the controlled vocabulary?
So the idea would be that, instead of a text field to create a new Tag,
it would provide a dropdown, populated (somehow) with the vocabulary?

If that's the case, then we'd really just need an array of the terms in
the vocabulary to stuff into the dialog form, and the game would just be
how to generate and/or store that array.

Or, are we talking about a way to create the Omeka Tags entirely from
scratch?

I guess what I'm wondering about is what other workflow AjaxCreate might
be useful in, and at what point in the workflow?

Thanks for the shout-outs on Twitter today--much appreciated!!

Patrick

Tom Scheinfeldt

Apr 21, 2010, 3:56:17 PM
to omeka-dev
Patrick-- I'm actually thinking about core and item type metadata in
addition to tags. But that's the idea: creating metadata from
pre-defined lists to ensure consistency across items and collections.

Ethan Gruber

Apr 21, 2010, 4:01:47 PM
to omek...@googlegroups.com
So as a practical example, connect the Dublin Core subject input box with the LCSH controlled vocabulary with autosuggested terms?

Tom Scheinfeldt

Apr 21, 2010, 4:14:44 PM
to omeka-dev
Right. Or any arbitrary, pre-configured list of terms. I don't even
know if it has to autosuggest. It could just be a drop down menu.

Thanks for talking this through.

Patrick Murray-John

Apr 21, 2010, 4:51:46 PM
to omek...@googlegroups.com
Hmmm....this is getting interesting quickly!

Here's what I'm thinking, though it moves us outside the realm of
hacking AjaxCreate.

Create a plugin (did I mention I suffer from scope-creep a lot?) called
ControlledVocab, whose job it is to store user-created or imported
vocabs and/or manage the connection to a web service publishing a
controlled vocab (so, for example, we don't have to import all of LCSH!).

Models might look like this:

class ControlledVocabTerm extends Omeka_Record {

    public $id; // duh
    // the text name of the term (e.g., "History--18th century")
    public $name;
    // the uri of the term, if it has one
    // (e.g., http://id.loc.gov/authorities/sh2002006124#concept)
    public $uri;
    public $vocab_id; // id in a ControlledVocab table (see below)
    // the id of the element that the term is relevant to (e.g., the id
    // for Dublin Core Subject), or maybe serialize an array of ids?
    public $element_id;

    // or would it be better to have $element_name and $element_set?
}

class ControlledVocab extends Omeka_Record {
    public $id; // just 'cuz
    // the name of the controlled vocabulary (e.g., LCSH or whatever
    // name an admin creates for their new vocab)
    public $name;
    public $uri; // the uri of the vocabulary, if it has one
    // the url of a service that allows querying the vocab, if it has one
    public $api_url;
}

Then, in AjaxCreate, a helper function like so:

ajax_create_controlled_vocab_dialog(array $options, $callback = false)
that returns the link to open a dialog, just like the current
ajax_create_dialog helper

$options would carry the info needed to ajax to the server to get the
array of term names to stuff into a dropdown (or an autocomplete?)
array(
    'vocab_name'   => 'My Vocab',
    'element_name' => 'Subject',
    'element_set'  => 'Dublin Core',
    // . . . other settings similar to the helper that's there now.
)

Based on the vocab name, a Controller in ControlledVocab would know how
to handle the request (e.g., look up data from the tables or use an api
at a web service url) and send the array of term names back.

Some way to import terms would be nice. I'm not sure whether
admin-created vocabs or importing vocabs is the higher priority.

Whaddya all think?

Thanks!
Patrick
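
A rough sketch of the lookup such a controller might perform for a locally stored vocabulary, assuming the two proposed tables have already been loaded into plain arrays. The function name controlled_vocab_term_names, the sample rows, and the element ids 49 and 44 are hypothetical; a real plugin would read these through Omeka's record classes rather than from hard-coded arrays.

```php
<?php
// Hypothetical stand-ins for rows in the two proposed tables.
$vocabs = array(
    array('id' => 1, 'name' => 'My Vocab', 'api_url' => null),
);
$terms = array(
    array('name' => 'History--18th century', 'vocab_id' => 1, 'element_id' => 49),
    array('name' => 'Maps',                  'vocab_id' => 1, 'element_id' => 49),
    array('name' => 'English',               'vocab_id' => 1, 'element_id' => 44),
);

// Return the term names stored for one vocabulary and one element.
function controlled_vocab_term_names(array $vocabs, array $terms, $vocabName, $elementId)
{
    // Find the vocabulary's id from its name.
    $vocabId = null;
    foreach ($vocabs as $vocab) {
        if ($vocab['name'] === $vocabName) {
            $vocabId = $vocab['id'];
            break;
        }
    }
    if ($vocabId === null) {
        return array(); // unknown vocabulary: nothing to offer
    }

    // Collect the names of the terms tied to that vocabulary and element.
    $names = array();
    foreach ($terms as $term) {
        if ($term['vocab_id'] === $vocabId && $term['element_id'] === $elementId) {
            $names[] = $term['name'];
        }
    }
    return $names;
}

print_r(controlled_vocab_term_names($vocabs, $terms, 'My Vocab', 49));
```

The returned array is what would be sent back to the dialog to fill the dropdown or autocomplete.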

Jim Safley

Apr 21, 2010, 6:34:43 PM
to omek...@googlegroups.com
Controlled vocabularies are surprisingly easy to implement in Omeka
using a combination of filters. See the filters under Miscellaneous
Filters, here:

http://omeka.org/codex/Plugin_API/Filters

Just write a plugin that uses the Form Input Filter to generate a
drop-down select box of terms, either hard coded or pulling from a
vocabulary API. For example:

<?php
// Register a Form filter for the Dublin Core Subject element on Items.
add_filter(array('Form', 'Item', 'Dublin Core', 'Subject'),
           'ControlledVocabPlugin::filterDCSubject');

class ControlledVocabPlugin
{
    // The controlled vocabulary, as value => label pairs.
    public static $dcSubjects = array('red'    => 'red',
                                      'yellow' => 'yellow',
                                      'green'  => 'green',
                                      'blue'   => 'blue',
                                      'brown'  => 'brown',
                                      'black'  => 'black',
                                      'white'  => 'white');

    // Return a drop-down of the terms in place of the default input.
    public static function filterDCSubject($html, $inputNameStem, $value,
                                           $options, $record, $element)
    {
        return __v()->formSelect($inputNameStem . '[text]',
                                 $value,
                                 $options,
                                 ControlledVocabPlugin::$dcSubjects);
    }
}
?>

This is the simplest example I can think of, but I imagine it could
form the basis for a generalized controlled vocabulary plugin.

Jim

Patrick Murray-John

Apr 21, 2010, 8:45:22 PM
to omek...@googlegroups.com
Jim,

Awesome...I was completely unfamiliar with that approach.

And, I think I was wrong in my assessment before -- I don't think that
hacking/tweaking AjaxCreate could do the job. The way the javascript
works is too different from the intended outcome. That said, I see some
possibilities for a different tack on the javascript if it comes to that.

There's a branch I'd like to ask about in this approach. What if I'd
like to have the usual open text input available, as well as the
dropdown for controlled vocab options? Could something with this
approach let that happen? And, how would it get stuffed into the Item
Edit page? And, if this isn't really a desired feature, the question is
moot, but I thought it might be worth bringing up.

Very awesome to be introduced to the formSelect method. Must wrap my
brain around it!

Thanks,
Patrick

Ethan Gruber

Apr 21, 2010, 9:56:47 PM
to omek...@googlegroups.com
Something that I think is really important to note is that controlled vocabulary indices may have tens or hundreds of thousands of entries, even millions. A solution for populating a drop down menu from LCSH, for example, won't work. It's not scalable enough. There are close to 400,000 LCSH terms, and the Library of Congress doesn't maintain a service by which you can access terms on the fly. LCSH terms are publicly available in an RDF file that is 400-500 megabytes, though if you parse out only the term, id, and creation/modified dates, you can whittle the size down to less than 75 megabytes. Even so, you need to develop a process by which the vocabulary list is queried for terms that match your inputted keystrokes, which is why I recommend autosuggest for controlled vocabulary terms if you are dealing with LCSH, Getty geographical place names, or other large collections of terms.

I don't think this is as easy as creating a few filters.  You'll need to write a lot of javascript, assuming you already have a service that can provide the terms in a machine-processable fashion.

Ethan
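
To make the "query the list for terms that match the keystrokes" step concrete, here is a small sketch of server-side prefix matching with a result cap. The function name suggest_terms and the sample terms are hypothetical. It does a linear scan, which is fine for a short local list but not for something the size of LCSH, where an index such as Solr would do this work instead.

```php
<?php
// Return up to $limit terms that start with what the user has typed.
function suggest_terms(array $terms, $typed, $limit = 10)
{
    $typed = trim($typed);
    if ($typed === '') {
        return array(); // nothing typed yet: suggest nothing
    }

    $matches = array();
    foreach ($terms as $term) {
        // Case-insensitive comparison of the first strlen($typed) bytes.
        if (strncasecmp($term, $typed, strlen($typed)) === 0) {
            $matches[] = $term;
            if (count($matches) >= $limit) {
                break; // cap the response so it stays small
            }
        }
    }
    return $matches;
}

$sample = array('Historiography', 'History', 'History--18th century', 'Maps');
print_r(suggest_terms($sample, 'hist', 2));
```

The cap matters as much as the match: the point of autosuggest is that the browser only ever receives a handful of terms at a time.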

Tom Scheinfeldt

Apr 21, 2010, 10:05:11 PM
to omek...@googlegroups.com

It's definitely true that dropdowns won't work for LCSH, Getty, or other large, standard vocabularies.  But I think a lot of projects would be happy with a "lite" version of a controlled vocabulary plugin just to speed up the data entry process and maintain some internal consistency in their homegrown metadata conventions. So, maybe this is two plugins rather than one.

Thanks, all.

Jim Safley

Apr 21, 2010, 11:08:55 PM
to omek...@googlegroups.com
Patrick,

> What if I'd like to have the usual open text input available, as well
> as the dropdown for controlled vocab options?

Good question. The "Before Saving Item" filter will do the trick:

<?php
// Form filter: render the inputs. Save filter: decide what gets stored.
add_filter(array('Form', 'Item', 'Dublin Core', 'Subject'),
           'ControlledVocabPlugin::filterItemForm');
add_filter(array('Save', 'Item', 'Dublin Core', 'Subject'),
           'ControlledVocabPlugin::filterItemSave');

class ControlledVocabPlugin
{
    // The empty first entry lets the user leave the drop-down unselected.
    public static $dcSubjects = array(''       => '',
                                      'red'    => 'red',
                                      'yellow' => 'yellow',
                                      'green'  => 'green',
                                      'blue'   => 'blue',
                                      'brown'  => 'brown',
                                      'black'  => 'black',
                                      'white'  => 'white');

    // Append a free-text input ([text][0]) and a drop-down ([text][1]).
    public static function filterItemForm($html, $inputNameStem, $value,
                                          $options, $record, $element)
    {
        $html .= __v()->formText($inputNameStem . '[text][0]', $value,
                                 $options);
        $html .= '<br />';
        $html .= __v()->formSelect($inputNameStem . '[text][1]', $value,
                                   $options,
                                   ControlledVocabPlugin::$dcSubjects);
        return $html;
    }

    // Prefer the typed text; fall back to the selected option.
    public static function filterItemSave($elementText, $record, $element)
    {
        if (strlen(trim($elementText[0]))) {
            return $elementText[0];
        }
        return $elementText[1];
    }
}
?>

Note that ControlledVocabPlugin::filterItemSave returns the text input
if filled out, else it returns the selected option. This is what
eventually gets saved to the database.

Jim
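
The precedence rule in filterItemSave can be exercised outside Omeka. The sketch below is a standalone variant, not the plugin code: the function name resolve_element_text is hypothetical, and unlike the original it trims the result and tolerates a missing key or a plain string, which are assumptions about what a form might post.

```php
<?php
// Decide which submitted value is stored: typed text wins over the drop-down.
function resolve_element_text($elementText)
{
    // A single plain field would post a string rather than an array.
    if (!is_array($elementText)) {
        return trim((string) $elementText);
    }

    // Index 0 is the free-text input, index 1 the selected option.
    $typed    = isset($elementText[0]) ? trim($elementText[0]) : '';
    $selected = isset($elementText[1]) ? trim($elementText[1]) : '';

    return $typed !== '' ? $typed : $selected;
}

// Both filled in: the typed value is kept and the selection is ignored.
echo resolve_element_text(array('maroon', 'red')), "\n";
```

Note that when both inputs are filled in, the selection is silently discarded; nothing in the form tells the user which one was saved.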

Patrick Murray-John

Apr 22, 2010, 8:52:51 PM
to omek...@googlegroups.com
I spent a little while today thinking and coding through some of this,
and want to throw some questions back out there.

Am I right in thinking that the filterItemSave method could yield odd
results if someone hits the button to add another value to the
ElementTexts? I haven't sorted out details or tested, but it looks like
that might create some conflicts.

I'm a little concerned about the user not having a signal about which
value is saved if they create an ambiguity. That is, if they both type
something in and select from the dropdown. It'd be an oddity, yes, but I
definitely believe in users' ability to create oddities.

So, what do you think of this tack on it? Using the filterItemForm
method, insert a <select> that will update the textarea when selected
via some javascript trickery. I'm thinking that'd let people keep
attention on the text that's in the box as what's going to be saved.

It'd mean building that javascript, but there would be the step of
building the <select> in any case, and it might be easy enough to update
the textarea based on the change to the <select>.

Whaddya think?

Thanks,
Patrick
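
One way the "select updates the textarea" idea could be sketched, assuming the textarea has a known id and using a plain inline onchange handler rather than Prototype or jQuery. The function name vocab_select_html and the id "subject-0" are hypothetical, and the id is assumed to contain only letters, digits, and hyphens.

```php
<?php
// Build a <select> whose onchange handler copies the chosen term into
// the textarea with the given id. The select itself has no name, so only
// the textarea's contents are submitted with the form.
function vocab_select_html($textareaId, array $terms)
{
    // Inline JavaScript: ignore the blank option, otherwise copy the value.
    $onchange = "if (this.value) { document.getElementById('"
              . $textareaId . "').value = this.value; }";

    $html  = '<select onchange="'
           . htmlspecialchars($onchange, ENT_QUOTES, 'UTF-8') . '">' . "\n";
    $html .= '  <option value="">Select a term</option>' . "\n";
    foreach ($terms as $term) {
        // Escape each term once and use it as both value and label.
        $escaped = htmlspecialchars($term, ENT_QUOTES, 'UTF-8');
        $html .= '  <option value="' . $escaped . '">' . $escaped . "</option>\n";
    }
    return $html . '</select>';
}

echo vocab_select_html('subject-0', array('red', 'green', 'blue')), "\n";
```

Because the select has no name attribute, there is no second value to reconcile at save time; what is in the textarea is what gets saved.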


Patrick Murray-John

Apr 22, 2010, 9:17:15 PM
to omek...@googlegroups.com
Out of curiosity, what range of ElementSet/Elements does it seem like
people would like a controlled vocabulary to help with? DC Subject is a
clear one, and Tom said that some Item Type Metadata would be targets.
Has anyone put together custom ElementSets that they're working with? Or
custom Item Type Metadata?

I'm just curious about how people are putting the metadata to use.

Thanks!
Patrick

Shirley Lincicum

Apr 22, 2010, 9:45:40 PM
to omek...@googlegroups.com
Speaking as a metadata librarian who has been inputting some test
items to evaluate Omeka recently, here's a list of the basic DC fields
where I would love to be able to select from a controlled vocabulary:

Subject (LCSH, Getty vocabularies, ERIC Thesaurus, locally maintained
list, etc.)
Creator (LCNAF, ULAN, locally maintained list, etc.)
Contributor (LCNAF, ULAN, locally maintained list, etc.)
Rights (Creative Commons, locally maintained list)
Format (MIME type list, etc.)
Language (RFC 4646)
Type (DCMI Type Vocabulary, etc.)
Coverage (Getty TGN, geonames, locally maintained list, etc.)

I don't know how out of scope this might be for what you're currently
working on, or if this is already possible and I just don't know how
to accomplish it, but it would be extremely helpful if an
administrator could specify which controlled vocabularies to offer or
require on a collection-by-collection basis. It would also be helpful
if the input form could support adding terms from multiple
vocabularies in separate fields. I visualize this as having some sort
of drop down or radio buttons indicating available vocabularies
adjacent to the input box that the user could click to control the
source of the terms to match against as they fill in the box. I agree
with the earlier comment that for large vocabularies like LCSH and
Getty's, having some sort of smart auto-suggest, as opposed to a
drop-down menu is a must.

Shirley
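
A sketch of how the per-collection setup Shirley describes might be expressed as configuration, assuming a simple nested array keyed by collection id and element name with a 'default' fallback. The array layout, the collection id 7, and the function name vocabularies_for are all hypothetical.

```php
<?php
// Hypothetical configuration: collection id => element name => vocabularies.
$vocabConfig = array(
    'default' => array(
        'Language' => array('RFC 4646'),
        'Type'     => array('DCMI Type Vocabulary'),
    ),
    7 => array(
        'Subject' => array('LCSH', 'Local subjects'),
        'Creator' => array('LCNAF'),
    ),
);

// List the vocabularies to offer for one element in one collection.
function vocabularies_for(array $config, $collectionId, $elementName)
{
    // Collection-specific settings win; otherwise fall back to the defaults.
    foreach (array($collectionId, 'default') as $key) {
        if (isset($config[$key][$elementName])) {
            return $config[$key][$elementName];
        }
    }
    return array(); // no controlled vocabulary: plain text input
}

print_r(vocabularies_for($vocabConfig, 7, 'Subject'));
```

When more than one vocabulary comes back, the form could show them as the radio buttons or drop-down next to the input box, so the user picks which source to match against.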

Jim Safley

Apr 23, 2010, 10:30:54 AM
to omek...@googlegroups.com
Patrick,

> Am I right in thinking that the filterItemSave method could yield odd
> results if someone hits the button to add another value to the ElementTexts?
> I haven't sorted out details or tested, but it looks like that might create
> some conflicts.

You should try the code I provided. Just make a directory in plugins/
called ControlledVocab, place the code in plugin.php, and install it.
When adding an input, you'll find a strange warning error (something
we have to straighten out), but I think you'll find the results to be
typical and expected.

> I'm a little concerned about the user not having a signal about which value
> is saved if they create an ambiguity. That is, if they both type something
> in and select from the dropdown. It'd be an oddity, yes, but I definitely
> believe in users' ability to create oddities.

I agree that ambiguity can be problematic in the current setup, and I
will never underestimate the ability of users to create oddities.
Though I wonder how far we should go to hold their hand. Data input is
a learned task, requiring trial and error. But I digress...

> So, what do you think of this tack on it? Using the filterItemForm method,
> insert a <select> that will update the textarea when selected via some
> javascript trickery. I'm thinking that'd let people keep attention on the
> text that's in the box as what's going to be saved.

That's a good tack, one that will certainly reduce ambiguity. I think
we'll request a simple controlled vocabulary plugin from the
developers during the May playdate. It would be a good problem to hack
around.

Tom Scheinfeldt

Apr 23, 2010, 11:43:51 AM
to omek...@googlegroups.com

Thanks, Shirley. That's exactly the kind of input we need.

Patrick Murray-John

Apr 23, 2010, 1:55:00 PM
to omek...@googlegroups.com

Awesome! Thanks!

And, uh-oh.

I'd actually spent a while building things up over the last day or two,
and it looked like there were troubles when adding an additional input.

Even more uh-oh -- I kinda obsessed about the idea of the plugin and
went nuts. What I have is here:
http://github.com/patrickmj/ControlledVocab .

The step I'm working on now is taking the stored Controlled Vocab terms
and building the array to pass to the _formSelect in filterItemForm, and
setting up the filters for each of the Elements that have term data.

Hope I didn't jump the gun too much on playdate plans!

Patrick


Jim Safley

Apr 23, 2010, 1:54:16 PM
to omek...@googlegroups.com
Patrick,

> Has anyone put together custom ElementSets that they're working with? Or custom
> Item Type Metadata?
>
> I'm just curious about how people are putting the metadata to use.

The Dublin Core Extended plugin extends the Dublin Core element set,
i.e. it adds the full set of DCMI metadata terms to the element set
rather than creating an entirely new one.

The Zotero Importer plugin creates a Zotero element set containing all
possible Zotero fields (there are dozens). Imported item metadata map
to the Zotero AND Dublin Core element sets. This, I think, is when
data redundancy is appropriate. Mapping to Dublin Core ES is
convenient for theme writers, but is ambiguous; mapping to Zotero ES
is unambiguous, but possibly tedious for theme writers.

I haven't worked on it, but the Contribution plugin appears to create
a Contribution Form element set that contains elements specific to
online contributions.

As you see, we haven't fully explored the possibilities for element
sets. Most of our efforts focus on Dublin Core because, well, it's
easier and has near-universal application. But I would like to see
more element sets out there (not to mention item types).

Jim

Patrick Murray-John

Apr 23, 2010, 1:57:46 PM
to omek...@googlegroups.com
Shirley,

Many thanks! I had skipped over this email, and thought that Tom's email
thanking you was directed at Jim Safley. I figured it was just a weird
CHNM thing that Tom would call Jim Shirley. :)

Patrick

Shirley Lincicum

Apr 23, 2010, 5:01:36 PM
to omek...@googlegroups.com
Patrick, Tom, et al.,

I'm glad you found my contributions useful. It's a little intimidating
to jump into the middle of a developers' discussion when I feel like my
comprehension is about 50%. :-) I'm thrilled to see work being done on
a controlled vocabulary plugin, as this is exactly the functionality
that I feel Omeka (along with most other catalog/repository
applications) desperately needs, but I lack the programming skills to
create myself. I'm happy to help with further refinement and testing
of the plugin, if needed.

Perhaps a bit of an aside, but is anyone working on the MetaComplete
plugin that was solicited as part of the plugin "rush" announced
following code4lib?
http://omeka.org/c/index.php/Plugin_Rush_2010#MetaComplete_.281.1-1.0.29
I see some overlap between MetaComplete and ControlledVocab. For
example, the specs for MetaComplete would essentially allow my
existing metadata to function as a local controlled list.

Shirley

Jim Safley

Apr 23, 2010, 9:03:15 PM
to omek...@googlegroups.com
> I figured it was just a weird CHNM thing that Tom would call Jim Shirley. :)

Tom has called me weirder things.

Shirley, yes we value your input and thank you for braving the deep
end. I'm not sure if anyone is working on the MetaComplete plugin, but
I also see overlap. Technically speaking, your idea to specify a
vocabulary for individual fields is possible, and if anyone attempts
it (or any other strategy for that matter), I recommend abstracting
out the way the plugin accesses/retrieves/parses the various
vocabularies so more vocabularies can be added without too much
hassle.

This is becoming quite the ambitious imaginary plugin!
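
The abstraction Jim recommends could look something like the sketch below, assuming each vocabulary backend is wrapped in a class with one shared search method. The interface name VocabularySource, the two classes, and the sample data are hypothetical; the "remote" source is represented by a callable so the example runs without a network.

```php
<?php
// Hypothetical interface: every vocabulary backend answers the same question.
interface VocabularySource
{
    // Return up to $limit terms matching what the user has typed.
    public function search($query, $limit);
}

// A locally maintained list held in memory.
class LocalListSource implements VocabularySource
{
    private $terms;

    public function __construct(array $terms)
    {
        $this->terms = $terms;
    }

    public function search($query, $limit)
    {
        $found = array();
        foreach ($this->terms as $term) {
            // An empty query lists everything; otherwise match substrings.
            if ($query === '' || stripos($term, $query) !== false) {
                $found[] = $term;
            }
        }
        return array_slice($found, 0, $limit);
    }
}

// A remote service, stood in for by a callable that returns an array.
class CallbackSource implements VocabularySource
{
    private $fetch;

    public function __construct($fetch)
    {
        $this->fetch = $fetch;
    }

    public function search($query, $limit)
    {
        return array_slice(call_user_func($this->fetch, $query), 0, $limit);
    }
}

// Registry keyed by vocabulary name; adding a vocabulary means adding an entry.
$sources = array(
    'Local colors' => new LocalListSource(array('red', 'green', 'blue')),
    'Fake remote'  => new CallbackSource(function ($q) {
        return array($q . ' (remote)');
    }),
);

print_r($sources['Local colors']->search('re', 5));
```

The form code only ever calls search(), so supporting another vocabulary would mean writing one more class rather than touching the filters.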

Patrick Murray-John

Apr 23, 2010, 9:13:30 PM
to omek...@googlegroups.com
Shirley,

Yeah, I know that some of the convo can be intimidating. But on the
other hand, as Tom said, yours is exactly the kind of response that is
hugely important. It affirms the connection between what happens on the
code-side and what's needed on the boots-on-the-ground-side. From my
perspective, the more contact between the two, the better!

I've been thinking about what you said, and it makes wonderful sense.
And, now that you're here -haha!- we might just take you up on that
offer to test and refine, and most importantly keep insights coming.

I haven't peeked at the MetaComplete plugin stuff, but my intuition says
to agree.

I hope you'll join in the fun on this list. For me at least, it did give
a needed perspective that I wouldn't have thought of myself. That's part
of the fun of contributing to open source projects.

Patrick

Patrick Murray-John

Apr 23, 2010, 9:30:51 PM
to omek...@googlegroups.com
I realized tonight that I need to go on a mission of figuring out how
the model filtering mechanism works. I'm hoping that that is a step
toward abstracting things, but not entirely sure.

I see some awkwardness about some of the controlled vocabs based on
licensing. That is, it might be tricky to develop a plugin to work with
a licensed controlled vocabulary (i.e., Getty's stuff and others).

On the other hand, I'm really curious about Shirley's mention of
geonames, which is all kinds of open with an api and even has an RDF
representation easily GETable. Could I ask more about what kinds of
geonames info you are drawing on and want to use?

Patrick

Patrick Murray-John

Apr 23, 2010, 9:38:02 PM
to omek...@googlegroups.com
Sorry, all,

I had two thoughts in my head on that last post. About abstracting
jQuery, for real.

It looks like ImageAnnotation plugin pulls in jQuery, but that's the
only one and core doesn't use it?

I'm wondering if anyone else is working on plugins that do, or would
like to, take advantage of jQuery? And if there is enough interest in
making a jQuery plugin that does nothing more than slap in jQuery? I
think a jQuery plugin might be a good route so that any jQuery
users/developers could keep to a codified base.

Whaddya think?

Patrick

Jim Safley

Apr 25, 2010, 10:10:31 AM
to omek...@googlegroups.com
Jeremy can correct me if I'm wrong, but I think we're moving away from
Prototype in favor of jQuery for Omeka 2.0. However (and there's no
way you could've known this) we do include jQuery in the core, here:
/application/views/scripts/javascripts/jquery.js. Include it in the
view header by using js('jquery'). It is an older version (v1.3.2),
and Omeka 2.0 is not coming out anytime soon, so a jQuery plugin as a
stopgap could be useful.

Jim