Language Context in Verb and ObjectType identifiers


James M Snell

Jul 19, 2014, 3:34:31 PM
to activity...@googlegroups.com
Up to this point, verb and objectType identifiers have always been
assumed to be either simple tokens or opaque IRIs. Typically, they end
up being expressed in terms of the English infinitive or singular noun
root forms of the words. For instance,

"pet"
"http://example.org/verb/pet"

"dog"
"http://example.org/noun/dog"

While this works, it is limiting in terms of internationalization.
Verbs and objectTypes may be expressed using any language.
Unfortunately, there's no existing language context we can use to
process those.

What I want to propose is a new convention (and slight spec change)
that allows the language context to be optionally included with the
verb or objectType identifier. For instance:

"pet"
"en/pet"
"ru/питомца" (forgive me if my Russian is a bit off)
"http://example.org/verb/pet"
"http://example.org/verb/en/pet"
"http://example.org/verb/ru/питомца"

The idea here is to precede the verb or objectType identifier segment
with a Language Tag identifier. In the absolute IRI form, this can be
done easily by convention today. In the simple form, however, the
Activity Streams spec currently forbids the "en/pet" or "ru/питомца"
forms. I would like to amend the specification to allow for simple
token forms to be optionally prefixed by a language tag.
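A minimal Python sketch of how a processor might split a simple token under the proposed convention. The `parse_simple_token` helper and `VerbId` tuple are hypothetical, and the regular expression is only a loose approximation of a BCP 47 language tag, not the full grammar. Identifiers without a tag prefix (including absolute IRIs) come back unchanged with no language.

```python
import re
from typing import NamedTuple, Optional

# Loose approximation of a BCP 47 tag (assumption): a 2-3 letter primary
# subtag, optionally followed by "-"-separated subtags, then a "/".
_LANG_PREFIX = re.compile(r"^([A-Za-z]{2,3}(?:-[A-Za-z0-9]{1,8})*)/(.+)$")


class VerbId(NamedTuple):
    language: Optional[str]  # e.g. "en"; None when no prefix is present
    token: str               # the verb/objectType token itself


def parse_simple_token(identifier: str) -> VerbId:
    """Split a simple identifier such as "en/pet" into (language, token).

    Hypothetical helper illustrating the proposed convention only.
    Absolute IRIs never match the pattern and are returned unchanged.
    """
    match = _LANG_PREFIX.match(identifier)
    if match:
        return VerbId(match.group(1), match.group(2))
    return VerbId(None, identifier)


print(parse_simple_token("pet"))     # VerbId(language=None, token='pet')
print(parse_simple_token("en/pet"))  # VerbId(language='en', token='pet')
```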

The basic idea here is to provide a simple way for internationalized
verb and objectType identifiers to be related back to one another via
translation. This is particularly helpful for Idiomatic Verbs such as
"flag-as-inappropriate", which could be represented in Russian as
"ru/знак-как-нарушение" (again, forgive me if my Russian is off).

It's a simple convention that ought to prove useful. Thoughts?

- James

Owen Shepherd

unread,
Jul 19, 2014, 4:18:58 PM7/19/14
to activity...@googlegroups.com

-1

 

What are we gaining here? The verb/objectType tokens are not user facing data. Here we end up with a million mappings to canonicalize the multilingual terms into the English form so processors can understand them. Why?

 

We already have a simple route to internationalization: the processor is responsible for mapping the verb to a displayable string anyway. My own processor does a table lookup in order to compose an internationalized activity string. This kind of feature would make that job harder, not easier.

Verbs and objectTypes don't need a language context. That is implied: the reader's language. Processors already know what language to render the presentation text in.
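A minimal Python sketch of the lookup-table approach described here, assuming the verb stays an opaque key and the reader's locale picks the presentation string. The `TEMPLATES` table, the `render` helper, and the generic fallback wording are hypothetical illustrations, not the actual processor mentioned above.

```python
# Hypothetical per-locale template table: the verb is an opaque key and
# the reader's locale selects the display template.
TEMPLATES = {
    "post": {
        "en": "{actor} posted {object}",
        "fr": "{actor} a publié {object}",
    },
    "share": {
        "en": "{actor} shared {object}",
        "fr": "{actor} a partagé {object}",
    },
}


def render(activity: dict, reader_locale: str, fallback_locale: str = "en") -> str:
    """Compose a display string in the reader's locale via table lookup."""
    by_locale = TEMPLATES.get(activity["verb"])
    if by_locale is None:
        # Verb not in the table: fall back to a generic description.
        return f'{activity["actor"]} performed "{activity["verb"]}" on {activity["object"]}'
    # Unknown locale: use the fallback locale's template instead.
    template = by_locale.get(reader_locale, by_locale[fallback_locale])
    return template.format(actor=activity["actor"], object=activity["object"])
```

The limitation raised later in the thread is visible here: any verb missing from `TEMPLATES` can only get the generic fallback.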

 


James M Snell

Jul 19, 2014, 4:52:21 PM
to activity...@googlegroups.com
Given a statement such as:

{
"actor": "acct:j...@example.org",
"verb": "comer",
"object": "http://example.org/abc"
}

How does a processor determine how to represent the statement in human
readable terms? You mention using a lookup table. Yes, that works, but
then the kinds of statements you are able to represent become limited
by the finite set of keys in your table, placing strict limits on
extensibility. By encouraging publishers to follow some basic
conventions it becomes possible to apply simple heuristics to
reasonably interpret verbs and object types that may not currently be
in your lookup table.

The optional language tag prefixing here gives a bit more metadata to
help inform the heuristics used to generate the human readable
statement. It's not about specifying what language to render the
statement in when displaying it; it's about making sure that the processor
can understand what it has received.

The language tag itself can be easily ignored. Systems can treat
"comer" and "es/comer" as equivalent. In your system, for instance, if
the language tag provides no additional value relative to your lookup
table, drop the language tag segment and proceed as normal... or treat
the entire identifier as just another extension verb/objectType
identifier that your system does not support.
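A minimal Python sketch of the "drop the language tag segment and proceed as normal" option, using a Spanish-tagged identifier (`es` is the language tag for Spanish). The `KNOWN_VERBS` table, the helper names, and the tag-matching regular expression are hypothetical; the sketch only shows the exact-then-stripped lookup order, with `None` signalling an unsupported extension identifier.

```python
import re
from typing import Optional

# Loose approximation of a leading "<language-tag>/" segment (assumption).
_LANG_PREFIX = re.compile(r"^[A-Za-z]{2,3}(?:-[A-Za-z0-9]{1,8})*/")

# Hypothetical lookup table keyed by bare (untagged) verb tokens.
KNOWN_VERBS = {
    "comer": "{actor} comió {object}",
    "post": "{actor} posted {object}",
}


def strip_language_tag(identifier: str) -> str:
    """Drop a leading language-tag segment such as "es/" if present."""
    return _LANG_PREFIX.sub("", identifier, count=1)


def lookup(identifier: str) -> Optional[str]:
    """Try the identifier as-is, then again without its language tag.

    Returns None when neither form is known, i.e. the caller should treat
    it as an extension verb/objectType the system does not support.
    """
    if identifier in KNOWN_VERBS:
        return KNOWN_VERBS[identifier]
    return KNOWN_VERBS.get(strip_language_tag(identifier))
```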

- James

Evan Prodromou

Jul 19, 2014, 6:31:51 PM
to activity...@googlegroups.com
-1

Verbs are opaque identifiers. Having thousands of ways to represent "eat" or "post" does not help interop. I18N belongs at the UX layer.

Sent from my iPhone

James M Snell

Jul 19, 2014, 7:54:44 PM
to activity...@googlegroups.com
I'm not suggesting having thousands of ways to represent "eat" or
"post". I'm suggesting a simple convention that can be employed when
new verb identifiers are minted. Implementations that are based on
constrained lists of verb and object types would remain unchanged. If
a server wishes to accept only the verb "eat", then it would continue
to do so.

Note that the specification currently places no constraints on verb or
object type identifiers -- that is, we already have thousands of ways
to represent "eat" or "post". Who decides which is correct?

- James

Andreas Kuckartz

Jul 20, 2014, 1:41:37 AM
to activity...@googlegroups.com
Please do _not_ create a new type of "context" for this but use the JSON-LD features @language and language maps.

And please also consider using skos:Concept.
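A minimal Python sketch of how a consumer might read a natural-language field expressed as a JSON-LD language map, assuming `displayName` is declared with `"@container": "@language"` in the `@context`. The `pick_language` helper is hypothetical; its truncation strategy is only loosely modeled on the "Lookup" matching scheme of RFC 4647 (no case folding, no special handling of singleton subtags).

```python
from typing import Optional

# Assumed JSON-LD input: "displayName" is a language map keyed by tag,
# while "verb" remains an opaque, untranslated identifier.
activity = {
    "verb": "post",
    "displayName": {
        "en": "John posted a new entry",
        "fr": "John a publié un nouvel article",
    },
}


def pick_language(language_map: dict, preferred: list, default: str = "en") -> Optional[str]:
    """Return the value for the first preferred tag found in the map.

    Each tag is tried in full, then with trailing subtags removed
    ("fr-CA" -> "fr"). Falls back to `default`, or None if that is absent.
    """
    for tag in preferred:
        subtags = tag.split("-")
        while subtags:
            candidate = "-".join(subtags)
            if candidate in language_map:
                return language_map[candidate]
            subtags.pop()
    return language_map.get(default)
```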

Cheers,
Andreas

Andreas Kuckartz

Jul 20, 2014, 1:41:46 AM
to activity...@googlegroups.com
For those who have no idea what SKOS is about:
http://www.w3.org/TR/skos-primer/

As you can see, i18n is built in.

Cheers,
Andreas

James M Snell

Jul 20, 2014, 2:10:50 PM
to activity...@googlegroups.com
AS2 already leverages language maps for natural language fields. This
is different. verb and objectType identifiers are opaque identifiers
that carry semantic meaning. They are not (directly) intended for
display and are not intended to vary depending on user preference.
Once a verb or objectType identifier is minted, it is expected to
remain constant. The conventions I am suggesting are applicable only
when the identifier is minted.

To be clear, the verb identifiers "pet", "en/pet" and "ru/питомца"
would still be distinct, non-equivalent opaque verb identifiers under
my proposal. However, given that they follow some basic conventions
(infinitive form + embedded language context clues), a
heuristics-based analyzer can come along later and do far more
interesting things with those than what can be accomplished with just
a look-up table approach.

Up to this point, we've gone on the general assumption that there
would be some form of centralized registry of common verbs and object
types. That's the road we started to pave with the AS 1.0 base schema
[1]. The reality, however, is that various communities of practice
have gone down the path of creating their own libraries of verbs and
objectType identifiers. If we make the assumption that this will be a
common practice, then we help interop the most by providing best
practice conventions and guidelines for the creation of new
identifiers *in addition to* encouraging those communities of practice
to reuse existing ones.

It is important to note, however, that while verb and object type
identifiers themselves are not intended for display, they are used to
inform implementations about what to display. For instance, a verb
"post" could result in the display "John posted a new entry" while the
verb "upload" could result in the display "John uploaded a new entry".
With the lookup table approach, there is a static mapping between the
verb identifier and the template used to display information to the
user. This is limiting both in that it requires a priori knowledge of
how to represent each possible type of activity and it requires that
we restrict extensibility to keep things from becoming overly
complicated. By applying the conventions I suggest, we can make this
task much more dynamic.
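A rough Python sketch of the kind of heuristic described here, assuming simple tokens are English infinitives when they are untagged or tagged `en/`. The `IRREGULAR_PAST` table, the helper names, and the spelling rules are hypothetical and deliberately naive: there is no consonant doubling, so "pet" would come out as "peted", and any irregular verb missing from the table is wrong.

```python
from typing import Optional

# Hypothetical exceptions table: explicit entries win over the heuristic.
IRREGULAR_PAST = {"eat": "ate", "write": "wrote", "send": "sent"}

VOWELS = set("aeiou")


def english_past_tense(verb: str) -> str:
    """Very naive English past tense for a bare infinitive."""
    if verb in IRREGULAR_PAST:
        return IRREGULAR_PAST[verb]
    if verb.endswith("e"):
        return verb + "d"                      # share -> shared
    if verb.endswith("y") and len(verb) > 1 and verb[-2] not in VOWELS:
        return verb[:-1] + "ied"               # reply -> replied
    return verb + "ed"                         # upload -> uploaded


def describe(actor: str, identifier: str, obj: str) -> Optional[str]:
    """Render "<actor> <past tense> <object>" for English verb tokens.

    Returns None for other language tags and for absolute IRIs, which
    this heuristic cannot interpret.
    """
    if identifier.startswith("en/"):
        verb = identifier[len("en/"):]
    elif "/" in identifier or ":" in identifier:
        return None
    else:
        # Assumption: untagged tokens follow the existing
        # English-infinitive convention.
        verb = identifier
    return f"{actor} {english_past_tense(verb)} {obj}"


print(describe("John", "en/upload", "a new entry"))  # John uploaded a new entry
```

Unlike a pure lookup table, this produces a readable sentence for verbs it has never seen, at the cost of occasional spelling errors that an explicit table entry would have to correct.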

[1] https://github.com/activitystreams/activity-schema/blob/master/activity-schema.md

Evan Prodromou

Jul 20, 2014, 3:05:25 PM
to activity...@googlegroups.com
James,

I understand the motivation: to make the "verb" somewhat self-documenting.

However, I think the demand for this is low, and the risk is that the "verb" and "type" properties slide into becoming natural-language fields.

I think that's a big risk, and that's why I gave a -1.

-Evan


Erik Wilde

Jul 20, 2014, 5:34:52 PM
to activity...@googlegroups.com
joining the -1 momentum here. if you want to make things globally
unique, mint a globally unique URI (and pray that a global concept makes
sense and is adopted). if you want to make i18n a first-level concept,
make it first-level instead of hiding it in some brittle string conventions.

cheers,

dret.
--
erik wilde | mailto:dr...@berkeley.edu - tel:+1-510-2061079 |
| UC Berkeley - School of Information (ISchool) |
| http://dret.net/netdret http://twitter.com/dret |

Pat Cappelaere

Sep 23, 2014, 11:40:26 AM
to activity...@googlegroups.com
I am curious now… How do you intend to internationalize actions?

Example: In my case, users can download various imagery products, [re]-process them, or browse them.

"actions": {
  "download": [],
  "process": {},
  "browse": []
}

A consumer may get an action that has never been seen before. How would it display the action back to a user with a different locale?
Thanks,
Pat.

cappelaere

Sep 27, 2014, 9:21:02 AM
to activity...@googlegroups.com
To avoid having to internationalize property names such as action names, I would consider changing:


"actions": {
        "download": [],
        "process": {},
        "browse": []
}

To:

"actions": [
        {  "verb": "download",
           "objects": [] },
        {  "verb": "process",
           "object": {} },
        {  "verb": "browse",
           "objects": [] }
]

The property value generated on the server side can then be used for display and this makes the syntax a little closer to the activity streams.
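A minimal Python sketch of converting the keyed "actions" object into the array form proposed above, so that each verb becomes a property value rather than a property name. The `to_verb_array` helper is hypothetical, and the rule of putting list values under "objects" and everything else under "object" simply mirrors the example; it is not taken from a published schema.

```python
import json

# The original shape: each verb is a property *name*.
keyed = {
    "actions": {
        "download": [],
        "process": {},
        "browse": [],
    }
}


def to_verb_array(actions: dict) -> list:
    """Turn {"<verb>": value, ...} into [{"verb": "<verb>", ...}, ...].

    List values go under "objects", anything else under "object".
    Dict insertion order (Python 3.7+) preserves the original ordering.
    """
    result = []
    for verb, value in actions.items():
        key = "objects" if isinstance(value, list) else "object"
        result.append({"verb": verb, key: value})
    return result


print(json.dumps({"actions": to_verb_array(keyed["actions"])}, indent=2))
```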
WDYT?
Thanks again,
Pat.