A use case for xOperator


david.alston

Oct 29, 2008, 12:47:43 PM
to xOperator Open Discussion
Greetings!

About a month ago I stumbled across the concepts of the semantic
web and I have been gorging on as much documentation about it as I
can. I work at a University and have begun trying to put together
some of the semantic tools in ways that will make the lives of the
Faculty/Staff/Students easier.

I would like to set up an xOperator agent to be available to
anyone with a University LoginID so that they can ask questions like
"What time is my next class?"

It looks like xOperator will be able to manage most of what I
need after I set up a SPARQL endpoint that knows about all of the
students' classes. The "personal agent" aspect of xOperator will also
provide amazing opportunities for our userbase, but my first focus is
providing an IM bot on which I can adjust responses via AIML.

Are the following possibilities within the scope/roadmap of
xOperator? I'd like to get an idea of how difficult this dream is
going to be to accomplish so that I can schedule my time
appropriately.


If any of these ideas seem impractical or there is a better way
to accomplish the same results I'd love to find an easier way :^)

) able to generate SPARQL queries that include the XMPP username of
the person sending messages to the bot
- perhaps some variable like %%USERNAME%%

) able to check the XMPP username's access permissions to SPARQL
endpoints requested (via LDAP or SPARQL/LDAP)

) some sort of encryption for queries leaving the bot
- I'm still trying to figure out how to encrypt access to SPARQL
end-points.. we don't want to have a SPARQL endpoint that anyone can query
without authenticating in some fashion first
- maybe I could set up an xOperator agent for each resource and limit
who they communicate with to only the "master bot".. maybe that would
send the SPARQL queries over an encrypted XMPP channel..

) respond only to University LoginID's
- eg. "only allow Jabber ID's that are in the @university.edu domain"

) find some way to limit queries to only those SPARQL endpoints that
have the information requested
- perhaps an optional list of query-able attributes per namespace
- if no attributes are listed for a namespace, then it always tries
the query there

) allow one agent to keep track of multiple users
- eg. the bot would have a database where it can store user-specific
settings like what their iCal stores are, what SPARQL endpoints they
have configured, and possibly store their custom templates as well
- most of our users won't know what the semantic web is and if I can
have one xOperator agent running for all of the users it would make it
easier for users to personalize the bot for their own needs


If anyone has any ideas or feedback about these things I'm
practically starving for information. Thanks for all the work ya'll
have put into xOperator (as well as the other AKSW projects)! They've
inspired me :^)

--David Alston

Sebastian Dietzold

Oct 30, 2008, 8:45:26 AM
to xOperator Open Discussion
quote david.alston (29.10.2008):

Hi David,

> About a month ago I stumbled across the concepts of the semantic web
> and I have been gorging on as much documentation about it as I can. I
> work at a University and have begun trying to put together some of the
> semantic tools in ways that will make the lives of the
> Faculty/Staff/Students easier.

Sounds great. In which university do you work?

> I would like to setup an xOperator agent to be available to anyone
> with a University LoginID so that they can ask questions like "What time
> is my next class?"

Does this mean that anyone with a University LoginID automatically has an
XMPP account?

> It looks like xOperator will be able to manage most of what I need
> after I setup a SPARQL endpoint that knows about all of the students'
> classes. The "personal agent" aspect of xOperator will also provide
> amazing opportunities for our userbase, but my first focus is providing
> an IM bot on which I can adjust responses via AIML.

I agree. Your use case matches the group agent scenario, which means the
agent has no proxy account.

> ) able to generate SPARQL queries that include the XMPP username of the
> person sending messages to the bot - perhaps some variable like
> %%USERNAME%%

This is not such a problem. The script context has to be extended with
that value.
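
A minimal sketch of how such a value could be substituted into a query template, written as plain Java that a Groovy script could call. The `%%USERNAME%%` placeholder is David's suggestion, not an existing xOperator feature, and the class name, method names and the `ex:loginId` property in the usage note are hypothetical. The JID is reduced to its local part and escaped before it lands in a SPARQL string literal, so that a crafted JID cannot change the query.

```java
final class UsernameTemplate {
    // Hypothetical placeholder, taken from the suggestion in this thread.
    private static final String PLACEHOLDER = "%%USERNAME%%";

    /** Returns the local part of a JID, e.g. "jdoe" for "jdoe@university.edu/laptop". */
    static String localPart(String jid) {
        // The resource part starts at the first '/', so strip it before looking for '@'.
        int slash = jid.indexOf('/');
        String bare = slash < 0 ? jid : jid.substring(0, slash);
        int at = bare.indexOf('@');
        if (at <= 0) {
            throw new IllegalArgumentException("JID has no local part: " + jid);
        }
        return bare.substring(0, at);
    }

    /** Escapes a value so it can sit inside a double-quoted SPARQL string literal. */
    static String escapeLiteral(String value) {
        StringBuilder out = new StringBuilder();
        for (char c : value.toCharArray()) {
            switch (c) {
                case '\\': out.append("\\\\"); break;
                case '"':  out.append("\\\""); break;
                case '\n': out.append("\\n"); break;
                case '\r': out.append("\\r"); break;
                default:   out.append(c);
            }
        }
        return out.toString();
    }

    /** Replaces every placeholder in the template with the sender's escaped local part. */
    static String fill(String template, String senderJid) {
        return template.replace(PLACEHOLDER, escapeLiteral(localPart(senderJid)));
    }
}
```

For example, `UsernameTemplate.fill("SELECT ?c WHERE { ?s ex:loginId \"%%USERNAME%%\" }", "jdoe@university.edu/laptop")` fills in `jdoe`.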

> ) able to check the XMPP username's access permissions to SPARQL
> endpoints requested (via LDAP or SPARQL/LDAP)

What exactly do you mean by that? In general, Groovy scripts can use any
Java API you want, which means you can use an LDAP API too.
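
As a rough sketch of the LDAP route, the standard JNDI API (`javax.naming`) can be called from a Groovy script. Everything about the directory layout here is an assumption: the `posixGroup` object class, the `cn` and `memberUid` attributes, the base DN passed in by the caller, and the class and method names are all hypothetical and would need to match the real LDAP schema.

```java
import javax.naming.NamingEnumeration;
import javax.naming.NamingException;
import javax.naming.directory.DirContext;
import javax.naming.directory.SearchControls;
import javax.naming.directory.SearchResult;

final class LdapAccessCheck {
    /** Escapes a value for use inside an LDAP search filter (RFC 4515). */
    static String escapeFilterValue(String value) {
        StringBuilder out = new StringBuilder();
        for (char c : value.toCharArray()) {
            switch (c) {
                case '\\': out.append("\\5c"); break;
                case '*':  out.append("\\2a"); break;
                case '(':  out.append("\\28"); break;
                case ')':  out.append("\\29"); break;
                case '\0': out.append("\\00"); break;
                default:   out.append(c);
            }
        }
        return out.toString();
    }

    /** Builds a filter matching the group only if the user is a member (assumed schema). */
    static String membershipFilter(String group, String user) {
        return "(&(objectClass=posixGroup)(cn=" + escapeFilterValue(group)
                + ")(memberUid=" + escapeFilterValue(user) + "))";
    }

    /** Returns true if the search under baseDn finds the group with the user as member. */
    static boolean isMember(DirContext ctx, String baseDn, String group, String user)
            throws NamingException {
        SearchControls controls = new SearchControls();
        controls.setSearchScope(SearchControls.SUBTREE_SCOPE);
        NamingEnumeration<SearchResult> results =
                ctx.search(baseDn, membershipFilter(group, user), controls);
        try {
            return results.hasMore();
        } finally {
            results.close();
        }
    }
}
```

The `DirContext` would come from an `InitialDirContext` configured for the university's LDAP server; that setup is left out because it depends on the local environment.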

Another idea: why not manage the access control in the RDF triple store
or at the endpoint? In OntoWiki, we have at least model-based access
control, so you can manage, at a coarse level, which information is
given back to users.

> ) some sort of encryption for queries leaving the bot

You mean SPARQL Endpoints which use HTTPS?

> - I'm still trying to figure out how to encrypt access to SPARQL end-
> points.. we don't want to have a SPARQL endpoint that anyone can query
> without authenticating in some fashion first

You do not need to configure the endpoint in the config for this. You can
encode it in the Groovy script directly, which means no "query" command
will be fired against the endpoint, and no queries from other agents either.

In detail: normally, query scripts fire queries against all endpoints, all
neighbouring agents and the local store. You can use the query command to
access only a specific endpoint that is not configured. Have a look at the
dbpedia*groovy files and look for context.queryRemote(documentQuery,url)
...

> ) respond only to University LoginID's - eg. "only allow Jabber ID's
> that are in the @university.edu domain"

The university XMPP server can achieve this for you, so that only
university users are in the roster of the agent (I don't know how stable
huge rosters are ...)

> ) find some way to limit queries to only those SPARQL endpoints that
> have the information requested
> - perhaps an optional list of query-able attributes per namespace
> - if no attributes are listed for a namespace, then it always tries
> the query there

These requests are very hard to tackle. Which type of queries do you want
to use, and in which way should the agent fire the queries and answer the
user requests? Maybe you can manage this in your scripts?
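
One way to manage this in a script, sketched under the assumption that each endpoint gets an optional list of predicates it can answer, as proposed in the quoted text. An endpoint with an empty list is always tried. The class name, the endpoint URLs and the predicate names are hypothetical, and how the predicates of a query are extracted is left open.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class EndpointSelector {
    // Endpoint URL -> predicates it advertises; insertion order is preserved.
    private final Map<String, Set<String>> advertised = new LinkedHashMap<>();

    /** Registers an endpoint; an empty set means "no list given, always try it". */
    void register(String endpointUrl, Set<String> predicates) {
        advertised.put(endpointUrl, predicates);
    }

    /** Returns the endpoints worth querying for the given predicates, in registration order. */
    List<String> candidatesFor(Set<String> queryPredicates) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Set<String>> entry : advertised.entrySet()) {
            Set<String> known = entry.getValue();
            if (known.isEmpty() || known.containsAll(queryPredicates)) {
                result.add(entry.getKey());
            }
        }
        return result;
    }
}
```

The filter only skips endpoints that cannot answer; it does not say anything about access rights, which would still be checked separately.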

> ) allow one agent to keep track of multiple users
> - eg. the bot would have a database where it can store user-specific
> settings like what their iCal stores are, what SPARQL endpoints they
> have configured, and possibly store their custom templates as well
> - most of our users won't know what the semantic web is and if I can
> have one xOperator agent running for all of the users it would make it
> easier for users to personalize the bot for their own needs

In this case, agents can't communicate in their neighbourhood, because they
do not act as their users and other agents do not allow them to query them.

There are ideas to run only one xOperator to serve many users but in a
more independent way (completely separate configs, scripts and files). But
for us, this development direction has a low priority for now. A first step is
scheduled for 0.3 as issue 29: Scripts to install xOperator as a service.

> If anyone has any ideas or feedback about these things I'm
> practically starving for information. Thanks for all the work ya'll
> have put into xOperator (as well as the other AKSW projects)! They've
> inspired me :^)


--
Sebastian Dietzold - Department of Computer Science; University of Leipzig
Tel/Fax: +49 341 97 323-66/-29 http://bis.uni-leipzig.de/SebastianDietzold

David Alston

Nov 1, 2008, 1:42:42 PM
to xope...@googlegroups.com
Greetings!

     Responses inline..

On Thu, Oct 30, 2008 at 7:45 AM, Sebastian Dietzold <diet...@informatik.uni-leipzig.de> wrote:

> quote david.alston (29.10.2008):
>
> Sounds great. In which university do you work?

University of Texas at Dallas in the US.
 

> Does this mean that anyone with an University LoginID has automatically an XMPP account?

Yes.  Unfortunately, most people don't know about it yet, but I'm hoping that this idea will be one of the features we can provide that will help with the advertising :^)


> I agree. your usecase matches the group agent scenario, which means the agent has no proxy account.

Yeah, that is how I have my test xOperator bot set up at the moment.
 


>> ) able to generate SPARQL queries that include the XMPP username of the person sending messages to the bot - perhaps some variable like %%USERNAME%%
>
> This is not such a problem. The script context has to be extended with that value.

I'm not really a programmer.. maybe I'll be able to pick up the Groovy scripting language, though.
 

>> ) able to check the XMPP username's access permissions to SPARQL endpoints requested (via LDAP or SPARQL/LDAP)
>
> What exactly do you mean with that? In general: groovy scripts can use any java API you want, which means you can use an LDAP API too.
>
> another idea about this is: why not managing the access control over the RDF triple store or the endpoint? In OntoWiki, we have at least model based access control, so you can manage, which information is given back to users in a raw scope.

We already manage access groups in LDAP and I want to be able to integrate with that.  I intend to make an encrypted RDF front-end for the LDAP server, but for something that is going to be accessed as often as the access groups will be, I'd prefer to go straight to LDAP if I can.  Later, I intend to create a triple-store that will also be able to verify access permissions so that I can answer questions like "What hosts can Joe Zanzabar log in to?" but that is further in the future and less efficient than a direct LDAP query :^)

Being able to use Java API's in Groovy is good news!  I guess I'll have to learn a bit of Java too.. :^)


>> ) some sort of encryption for queries leaving the bot
>
> You mean SPARQL Endpoints which use HTTPS?

Yes!  That's what I would like to do.. I just don't know how the SPARQL query clients (eg. the IM bot) will handle the certificates.  I have some experience dealing with certs, but I haven't seen any documentation saying how a SPARQL client might use them.

I guess I'll have to write a Groovy script for that..



>> - I'm still trying to figure out how to encrypt access to SPARQL end-points.. we don't want to have a SPARQL endpoint that anyone can query without authenticating in some fashion first
>
> you do not need to configure the endpoint in the config for this. you can encode it in the groovy script directly, which means no "query" command will fired against the endpoint and also no queries from other agents.
>
> in detail: normally, query script fire queries against all endpoints, all neighbouring agents and the local store. you can use the query command to access only a specific, not configured, endpoint. have a look in the dbpedia*groovy files and look for context.queryRemote(documentQuery,url) ...

That's good to know!  I'll be able to have a "default" SPARQL endpoint that knows about the rest of the SPARQL endpoints I'll set up, and if that doesn't return any results, then check the "user-configured" SPARQL endpoints.



>> ) respond only to University LoginID's - eg. "only allow Jabber ID's that are in the @university.edu domain"
>
> this can achieve the university XMPP server for you so that only university users are in the roster of the agent (I dont know, how stable are huge rosters ...)

We have thousands of LoginID's.. I'm a little hesitant to begin testing rosters of that size...

Maybe I can achieve this by using the Groovy script that verifies the access permissions, but I'd like to be able to add an extra layer of security.  Otherwise, j...@gmail.com will be able to add my bot's JabberID to his google talk account and begin asking questions of the UTD bot.. Our Information Security team would sleep better at night if we weren't so open to a DOS attack :^)


>> ) find some way to limit queries to only those SPARQL endpoints that have the information requested
>> - perhaps an optional list of query-able attributes per namespace
>> - if no attributes are listed for a namespace, then it always tries
>> the query there
>
> these request are very hard to tackle. which type of queries do you want to use and in which way should the agent fire the queries and answer the user requests? Maybe you can manage this in your scripts?

I suppose (after I learn Groovy) I'll be able to write my own version of this for our private SPARQL endpoints.. but this is the algorithm I was thinking of..

1) translate AIML to SPARQL query
2) run SPARQL query against end-point
3) if there is an error, and there are other endpoints, then switch to next endpoint and goto step 2.
4) if no end-point returns success then print "No End-Points can answer your query"
5) if the user has specified a "debug" state, then print all the error messages from the queries back to the user
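
Steps 2 to 5 above could be sketched roughly as follows. This is only an illustration: the `Endpoint` interface stands in for whatever actually sends the SPARQL query (it is not an xOperator API), the class and method names are hypothetical, and step 1 (AIML to SPARQL) is assumed to have happened already.

```java
import java.util.ArrayList;
import java.util.List;

final class FallbackQuery {
    /** Hypothetical stand-in for "send this SPARQL query to one endpoint". */
    interface Endpoint {
        String query(String sparql) throws Exception;
    }

    /** The answer (null if no endpoint succeeded) plus the errors seen on the way. */
    static final class Outcome {
        final String answer;
        final List<String> errors;

        Outcome(String answer, List<String> errors) {
            this.answer = answer;
            this.errors = errors;
        }
    }

    /** Steps 2 and 3: try each endpoint in order, moving on when one fails. */
    static Outcome run(String sparql, List<Endpoint> endpoints) {
        List<String> errors = new ArrayList<>();
        for (Endpoint endpoint : endpoints) {
            try {
                return new Outcome(endpoint.query(sparql), errors);
            } catch (Exception e) {
                errors.add(e.getMessage());
            }
        }
        return new Outcome(null, errors);
    }

    /** Steps 4 and 5: build the reply, appending the errors only in debug mode. */
    static String reply(Outcome outcome, boolean debug) {
        StringBuilder text = new StringBuilder(
                outcome.answer != null ? outcome.answer : "No End-Points can answer your query");
        if (debug) {
            for (String error : outcome.errors) {
                text.append('\n').append("error: ").append(error);
            }
        }
        return text.toString();
    }
}
```

Note that this treats only an exception as an error; whether an empty result set should also trigger the fallback is a separate decision.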

Of course, this doesn't handle the case where the user might want to include data from multiple end-points in the same query.. in which case I imagine the algorithm would look like this..

1) use AIML to break down sentence into multiple query strings
2) run through the known endpoints with each query
3) store the "select" variables from the SPARQL queries in a hash table
4) run the SPARQL queries against each end-point until all the "select" variables have been filled
5) return the response string with the variables replaced with their values

Obviously this last algorithm would have to be modified for queries that need to be presented in table form..
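
A rough sketch of steps 2 to 5 of this second algorithm, for single-valued answers only (no tables). The `Endpoint` interface, the class and method names, and the `{name}` placeholder syntax in the response string are all assumptions made for the example; an endpoint is modelled as returning a binding only for the variables it can answer.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class VariableFiller {
    /** Hypothetical endpoint: returns bindings for whichever variables it can answer. */
    interface Endpoint {
        Map<String, String> select(String sparql);
    }

    /** Runs the queries against the endpoints until every wanted variable has a value. */
    static Map<String, String> collect(List<String> queries, List<Endpoint> endpoints,
                                       Set<String> wanted) {
        Map<String, String> bindings = new LinkedHashMap<>();  // the "hash table" of step 3
        for (String query : queries) {
            for (Endpoint endpoint : endpoints) {
                if (bindings.keySet().containsAll(wanted)) {
                    return bindings;  // step 4: stop once everything is filled
                }
                for (Map.Entry<String, String> entry : endpoint.select(query).entrySet()) {
                    if (wanted.contains(entry.getKey())) {
                        // The first endpoint to answer a variable wins.
                        bindings.putIfAbsent(entry.getKey(), entry.getValue());
                    }
                }
            }
        }
        return bindings;
    }

    /** Step 5: replaces each {name} placeholder with its value; unfilled ones stay as-is. */
    static String render(String template, Map<String, String> bindings) {
        String text = template;
        for (Map.Entry<String, String> entry : bindings.entrySet()) {
            text = text.replace("{" + entry.getKey() + "}", entry.getValue());
        }
        return text;
    }
}
```

Leaving unfilled placeholders visible is a deliberate choice here, so that a partial answer is noticeable rather than silently wrong.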

 

>> ) allow one agent to keep track of multiple users
>> - eg. the bot would have a database where it can store user-specific settings like what their iCal stores are, what SPARQL endpoints they have configured, and possibly store their custom templates as well
>> - most of our users won't know what the semantic web is and if I can have one xOperator agent running for all of the users it would make it easier for users to personalize the bot for their own needs
>
> In this case, agents cant communicate in their neighourhood cause they do not act as their users and other agents do not allow them to query them.
>
> There are ideas to run only one xOperator to serve many users but in a more independent way (complete separate configs, scripts and files). But for us, these dev-direction has a low priority for now. A first step is scheduled for 0.3 as issue 29: Scripts to install xOperator as a service.

Wahoo :^)
 


>> If anyone has any ideas or feedback about these things I'm practically starving for information.  Thanks for all the work ya'll have put into xOperator (as well as the other AKSW projects)!  They've inspired me :^)
>
> --
> Sebastian Dietzold - Department of Computer Science; University of Leipzig
> Tel/Fax: +49 341 97 323-66/-29 http://bis.uni-leipzig.de/SebastianDietzold

     Thanks for the great reply!  I'm becoming more and more confident that this will work :^)

--
"Without rules there  is no game for it is by the rules the game is defined."
         --SOv

Jörg Unbehauen

Nov 3, 2008, 2:49:31 PM
to xope...@googlegroups.com
Hi!

I have a few things to add and remark on. Find them inline.

2008/11/1 David Alston <david....@gmail.com>:


> Greetings!
>
> Responses inline..
>
> On Thu, Oct 30, 2008 at 7:45 AM, Sebastian Dietzold
> <diet...@informatik.uni-leipzig.de> wrote:
>>
>> quote david.alston (29.10.2008):
>>
>> Sounds great. In which university do you work?
>
> University of Texas at Dallas in the US.
>

Great!

With Groovy you can execute non-SPARQL queries inside a script as
well. Basically, you could even instantiate your own application server
in a script, which would be nonsense, but possible.


>
> Being able to use Java API's in Groovy is good news! I guess I'll have to
> learn a bit of java too.. :^)
>
>
>> ) some sort of encryption for queries leaving the bot
>>
>> You mean SPARQL Endpoints which use HTTPS?
>
> Yes! That's what I would like to do.. I just don't know how the SPARQL
> query clients (eg. the IM bot) will handle the certificates. I have some
> experience dealing with certs, but I haven't seen any documentation saying
> how a SPARQL client might use them.

Querying an endpoint that uses https should not be a problem at all,
at least for the client and as long as the certificate is valid. The
library used (http-client from apache commons) should handle it, see:
http://hc.apache.org/httpclient-3.x/sslguide.html
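
To illustrate the client side, here is a minimal sketch that uses the JDK's built-in `HttpsURLConnection` instead of the Apache HttpClient library mentioned above. It sends the query as a `query` parameter of an HTTP GET, as the SPARQL protocol defines. The class name and the endpoint URL in the usage note are hypothetical. Certificate checking is left to the JVM's default trust store; for a certificate signed by a private CA, the `javax.net.ssl.trustStore` system property can point the JVM at a keystore containing that CA.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UnsupportedEncodingException;
import java.net.URL;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import javax.net.ssl.HttpsURLConnection;

final class HttpsSparqlClient {
    /** Builds the GET URL; assumes the endpoint URL has no query string of its own. */
    static String queryUrl(String endpoint, String sparql) {
        try {
            return endpoint + "?query=" + URLEncoder.encode(sparql, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    /** Sends the query over HTTPS and returns the raw response body. */
    static String fetch(String endpoint, String sparql) throws IOException {
        HttpsURLConnection connection =
                (HttpsURLConnection) new URL(queryUrl(endpoint, sparql)).openConnection();
        connection.setRequestProperty("Accept", "application/sparql-results+xml");
        // The TLS handshake, including certificate validation, happens here.
        try (InputStream in = connection.getInputStream()) {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            byte[] chunk = new byte[4096];
            int read;
            while ((read = in.read(chunk)) != -1) {
                buffer.write(chunk, 0, read);
            }
            return new String(buffer.toByteArray(), StandardCharsets.UTF_8);
        }
    }
}
```

A call such as `HttpsSparqlClient.fetch("https://sparql.example.edu/sparql", "SELECT * WHERE { ?s ?p ?o } LIMIT 1")` fails with an `SSLHandshakeException` if the certificate cannot be validated, which is the behaviour one wants here. This covers encryption only; authenticating the client to the endpoint is a separate step.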

Creating a filter based upon the JID should be no problem and will
most likely be implemented in the next few days, together with the
new access control system.
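
Until such a filter exists, the check itself is small. The following is a sketch only, not the planned xOperator filter: the class and method names are hypothetical, and it assumes the sender's full JID is available to the script as a string.

```java
import java.util.Locale;

final class JidDomainFilter {
    private final String allowedDomain;

    JidDomainFilter(String allowedDomain) {
        this.allowedDomain = allowedDomain.toLowerCase(Locale.ROOT);
    }

    /** Extracts the domain part of a JID of the form [local@]domain[/resource]. */
    static String domainOf(String jid) {
        // Strip the resource first: it may itself contain '@' characters.
        int slash = jid.indexOf('/');
        String bare = slash < 0 ? jid : jid.substring(0, slash);
        int at = bare.indexOf('@');
        return (at < 0 ? bare : bare.substring(at + 1)).toLowerCase(Locale.ROOT);
    }

    /** Accepts only senders whose domain matches exactly (no suffix matching). */
    boolean accepts(String jid) {
        return domainOf(jid).equals(allowedDomain);
    }
}
```

With `new JidDomainFilter("university.edu")`, a message from `jdoe@university.edu/laptop` is accepted and one from `someone@gmail.com` is rejected. The check is only as trustworthy as the sender address the XMPP server hands to the bot.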

>
>>
>>> ) find some way to limit queries to only those SPARQL endpoints that have
>>> the information requested
>>> - perhaps an optional list of query-able attributes per namespace
>>> - if no attributes are listed for a namespace, then it always tries
>>> the query there
>>
>> these request are very hard to tackle. which type of queries do you want
>> to use and in which way should the agent fire the queries and answer the
>> user requests? Maybe you can manage this in your scripts?
>
> I suppose (after I learn Groovy) I'll be able to write my own version of
> this for our private SPARQL endpoints.. but this is the algorithm I was
> thinking of..
>
> 1) translate AIML to SPARQL query
> 2) run SPARQL query against end-point
> 3) if there is an error, and there are other endpoints, then switch to next
> endpoint and goto step 2.
> 4) if no end-point returns success then print "No End-Points can answer your
> query"
> 5) if the user has specified a "debug" state, then print all the error
> messages from the queries back to the user

Having a fall-back sounds like a good idea to me, but if the endpoints
are identical, then maybe some kind of load balancing on the server
side would be better, as the client is currently not able to
efficiently detect that a server is gone or is back.

>
> Of course, this doesn't handle the case where the user might want to include
> data from multiple end-points in the same query.. in which case I imagine
> the algorithm would look like this..
>
> 1) use AIML to break down sentence into multiple query strings
> 2) run through the known endpoints with each query
> 3) store the "select" variables from the SPARQL queries in a hash table
> 4) run the SPARQL queries against each end-point until all the "select"
> variables have been filled
> 5) return the response string with the variables replaced with their values
>
> Obviously this last algorithm would have to be modified for queries that
> need to be presented in table form..


Combining multiple queries into one result could be a bit of work. We
have, for example, the "where is * now" template, in which we execute
multiple queries, but in a more step-by-step fashion (find the
calendar of a person, then query that calendar, and show only the results
of the last query). But it is definitely doable and I would be happy
to help.

David Alston

Nov 6, 2008, 10:52:05 AM
to xope...@googlegroups.com
Greetings!

     Responses inline..

On Mon, Nov 3, 2008 at 1:49 PM, Jörg Unbehauen <joerg.u...@googlemail.com> wrote:

> Hi!
>
> i got some things to add and remark. Find them inline.
>
> 2008/11/1 David Alston <david....@gmail.com>:
>>
>> On Thu, Oct 30, 2008 at 7:45 AM, Sebastian Dietzold
>> <diet...@informatik.uni-leipzig.de> wrote:
>>>
>>> quote david.alston (29.10.2008):
>> Yes!  That's what I would like to do.. I just don't know how the SPARQL
>> query clients (eg. the IM bot) will handle the certificates.  I have some
>> experience dealing with certs, but I haven't seen any documentation saying
>> how a SPARQL client might use them.
>
> Querying an endpoint that uses https should not be a problem at all,
> at least for the client and as long as the certificate is valid. The
> library used (http-client from apache commons) should handle it, see:
> http://hc.apache.org/httpclient-3.x/sslguide.html

This is great!  I'm glad it's going to be fairly straightforward!  More reasons for me to learn Java :^)

> Creating a filter based upon the jid should be no problem and will
> most likely implemented in the next few days, all together with the
> new access control system.

Looks like I stumbled across the right project.. ya'll are active and going in the exact directions I need :^)
 
>>
>> I suppose (after I learn Groovy) I'll be able to write my own version of
>> this for our private SPARQL endpoints.. but this is the algorithm I was
>> thinking of..
>>
>> 1) translate AIML to SPARQL query
>> 2) run SPARQL query against end-point
>> 3) if there is an error, and there are other endpoints, then switch to next
>> endpoint and goto step 2.
>> 4) if no end-point returns success then print "No End-Points can answer your
>> query"
>> 5) if the user has specified a "debug" state, then print all the error
>> messages from the queries back to the user
>
> Having a fall-back sounds like a good idea to me, but if the endpoints
> are identical, then may be some kind of load balancing on the server
> side would be better, as the client is currently not able to
> efficiently realize that one server is gone/ is back.

Maybe I could provide some sort of "presence notification" for each of the endpoints through XMPP.. sounds like a long-shot, though.. :^|



> Combining multiple queries into one result could be a bit of work. We
> have for example the where is * now template, in which we execute
> multiple queries, but more in a step by step fashion (find the
> calendar of a person, then query that calendar, shows only the results
> of the last query), but they are definitely doable and i would be happy
> to help.

This might be better done through AIML, then.. I know that you can use <srai> tags and split up the sentence into multiple SPARQL queries.. it will mean manually editing the AIML files, but that isn't difficult :^)
