Starting work on Feature #6876


Steve Breker

Feb 26, 2015, 5:12:20 PM
to ica-ato...@googlegroups.com
Hello!

I am working on adding a new Repository facet to the Authority record browse and Authority record view pages and would like some input on the best way to accomplish this. Please let me know if I am on the right track here or not! :)

As for putting a facet on the Authority record view page, I have not yet looked at this in detail. Any tips would be appreciated!


With regard to adding a Repository facet to the authority record browse screen, I will need to:

1) add a property 'repository_id' in the mappings.yml file for the 'Actor' type.  e.g.:

  actor:
    _attributes:
      i18n: true
      timestamp: true
      autocompleteFields: [authorizedFormOfName]
      rawFields:  [authorizedFormOfName]
    dynamic: strict
    properties:
      slug: { type: string, index: not_analyzed }
      description_identifier: { type: string, index: not_analyzed }
      entity_type_id: { type: integer, index: not_analyzed, include_in_all: false }
      repository_id: { type: integer, index: not_analyzed, include_in_all: false }

I had considered trying to add the Repository type as a "_foreign_types" to the Actor type in mappings.yml but I don't think this gets me what I want (a repository facet) so I abandoned this.

2) update plugins/arElasticSearchPlugin/lib/model/arElasticSearchActorPdo.class.php (and not arElasticSearchActor.class.php) to associate the repository IDs with the QubitActor in Elasticsearch. This would leverage what I build in (3) below to get all the repository_ids and serialize them. Is the PDO class the correct place to put this?

3) add a method to lib/model/QubitActor.php that will return the repositories linked to the Actor via its relationships. (The fonds are associated with the Actor using the "Relationships Area" when an Actor is edited.) The following SQL does what I think it should do (this is an example I was using to get the repositories for a test Actor):

SELECT r2.* FROM `actor`
left join relation on relation.object_id = actor.id
left join event on event.actor_id = actor.id
left join information_object on information_object.id = relation.subject_id
left join information_object as t2 on event.information_object_id = t2.id
left join repository on repository.id = information_object.repository_id
left join repository as r2 on r2.id = t2.repository_id
where actor.id = 42326


4) My initial thought is that code will have to be added to handle removing Relationships, so that if a Repository is removed from the Actor in the Names detail window, it is also removed from Elasticsearch. Do I have this correct? I am not sure yet where this needs to be added.
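Once repository_id is indexed for each actor, the browse facet itself comes down to asking Elasticsearch for a count of actors per repository_id. A minimal sketch of such a request body as a plain PHP array, assuming a terms aggregation (available since Elasticsearch 1.0) named "repository"; the aggregation name, the size of 10 and the helper bucketsToCounts() are all hypothetical. AtoM's arElasticSearchPlugin talks to Elasticsearch through the Elastica library (and may use its facet classes instead), so real plugin code would look different; this only shows the shape of the request and response.

```php
<?php
// Sketch of an Elasticsearch request body for a "repository" facet on actors.
// The aggregation name "repository" and size 10 are illustrative choices.
$body = [
    'size' => 0, // return only the counts, no actor hits
    'query' => ['match_all' => new stdClass()], // stdClass encodes as {}
    'aggs' => [
        'repository' => [
            // One bucket per distinct repository_id; an actor whose
            // repository_id array holds several ids is counted in each bucket.
            'terms' => ['field' => 'repository_id', 'size' => 10],
        ],
    ],
];

// Hypothetical helper: turn a decoded response into [repository_id => actor count].
function bucketsToCounts(array $response): array
{
    $counts = [];
    foreach ($response['aggregations']['repository']['buckets'] as $bucket) {
        $counts[(int) $bucket['key']] = $bucket['doc_count'];
    }
    return $counts;
}

echo json_encode($body), PHP_EOL;
// {"size":0,"query":{"match_all":{}},"aggs":{"repository":{"terms":{"field":"repository_id","size":10}}}}
```

The repository ids in the buckets would still need to be looked up to display repository names next to each count.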



Thanks,
Steve

David at Artefactual

Mar 5, 2015, 1:59:21 PM
to ica-ato...@googlegroups.com
Hi Steve,

Thanks for asking on-list for feedback about this feature, and sorry for the delayed response.

My initial thought is that I would *love* for this to be done in conjunction with issue #4266, which would involve linking an authority record to a "controlling" institution (or institutions) at the database level. One of the problems we often have with Union lists created in AtoM is multiple institutions each having their own biographical/administrative history for a prominent historical person or organization. The way CSV import currently works in AtoM 2.1, the repository_id is checked on import to prevent one institution's data from stomping on another institution's data for the same person or organization, BUT in the AtoM UI the link between the controlling institution and an authority record must be inferred via the institution's holdings (archival descriptions), which is not super transparent and can be computationally expensive to trace.

Anyway, this is not necessarily germane to your question, but I wanted to raise it for consideration. I've added some more directly relevant comments inline below.


On Thursday, February 26, 2015 at 2:12:20 PM UTC-8, Steve Breker wrote:
Hello!

I am working on adding a new Repository facet to the Authority record browse and Authority record view pages and would like some input on the best way to accomplish this. Please let me know if I am on the right track here or not! :)

As for putting a facet on the Authority record view page, I have not yet looked at this in detail. Any tips would be appreciated!


With regard to adding a Repository facet to the authority record browse screen, I will need to:

1) add a property 'repository_id' in the mappings.yml file for the 'Actor' type.  e.g.:

  actor:
    _attributes:
      i18n: true
      timestamp: true
      autocompleteFields: [authorizedFormOfName]
      rawFields:  [authorizedFormOfName]
    dynamic: strict
    properties:
      slug: { type: string, index: not_analyzed }
      description_identifier: { type: string, index: not_analyzed }
      entity_type_id: { type: integer, index: not_analyzed, include_in_all: false }
      repository_id: { type: integer, index: not_analyzed, include_in_all: false }

I had considered trying to add the Repository type as a "_foreign_types" to the Actor type in mappings.yml but I don't think this gets me what I want (a repository facet) so I abandoned this.

I think the "_foreign_types" are specifically for data entities that are related in the database via a foreign key relation.  I haven't worked with ES in AtoM much, but your configuration above looks good to me.
 

2) update plugins/arElasticSearchPlugin/lib/model/arElasticSearchActorPdo.class.php (and not arElasticSearchActor.class.php) to associate the repository IDs with the QubitActor in Elasticsearch. This would leverage what I build in (3) below to get all the repository_ids and serialize them. Is the PDO class the correct place to put this?

Yes,  arElasticSearchActorPdo.class.php is the correct place to add your code. arElasticSearchActor.class.php is used to find all actors in the database and add them to the search index via the arElasticSearchActorPdo class.

You should use Elasticsearch's support for array values here rather than serializing the repository ids into a string. This may be what you meant, but I just want to be sure.
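To illustrate the array-value point: Elasticsearch has no separate array type, and any field can hold zero or more values, so the integer repository_id field can simply carry every linked id. A minimal sketch of how the serialized actor document might be extended, assuming the serializer builds a plain PHP array that is later JSON-encoded; addRepositoryIds() and the "jane-doe" slug are hypothetical, not existing AtoM code.

```php
<?php
// Hypothetical helper: add an actor's repository ids to its search document.
function addRepositoryIds(array $serialized, array $repositoryIds): array
{
    // Drop NULLs (descriptions without a repository), cast to int to match
    // the "integer" mapping, remove duplicates and reindex so that
    // json_encode() emits a JSON array rather than an object.
    $ids = array_filter($repositoryIds, function ($id) {
        return $id !== null;
    });
    $ids = array_values(array_unique(array_map('intval', $ids)));

    if (count($ids) > 0) {
        $serialized['repository_id'] = $ids;
    }

    return $serialized;
}

$doc = addRepositoryIds(['slug' => 'jane-doe'], [7, null, '3', 7]);
echo json_encode($doc), PHP_EOL; // {"slug":"jane-doe","repository_id":[7,3]}
```

Leaving the field out when an actor has no linked repositories is one choice; an empty array would also be accepted under the mapping above.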


3) add a method to lib/model/QubitActor.php that will return the repositories linked to the Actor via its relationships. (The fonds are associated with the Actor using the "Relationships Area" when an Actor is edited.) The following SQL does what I think it should do (this is an example I was using to get the repositories for a test Actor):

SELECT r2.* FROM `actor`
left join relation on relation.object_id = actor.id
left join event on event.actor_id = actor.id
left join information_object on information_object.id = relation.subject_id
left join information_object as t2 on event.information_object_id = t2.id
left join repository on repository.id = information_object.repository_id
left join repository as r2 on r2.id = t2.repository_id
where actor.id = 42326

This is a really complex query, and I think it will miss any repository_ids linked through the first join (the relation table) to information_object, since you are only returning the r2 (event-linked) repository rows. I've attempted to write one big query that does what you want, but I haven't had any luck. I would suggest doing two separate queries and creating a union list of repository_ids programmatically. E.g. (in pseudo-code):

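# Assumes $conn is a PDO connection to the AtoM database and $actorId is the
# actor's id (both names are placeholders)

# Get repository ids of information objects related to the actor via name access points (relation table)
# Inner joins plus IS NOT NULL, so unlinked rows don't come back as NULLs
$stmt = $conn->prepare('SELECT t1.repository_id FROM actor
  JOIN relation ON relation.object_id = actor.id
  JOIN information_object AS t1 ON relation.subject_id = t1.id
  WHERE actor.id = ? AND t1.repository_id IS NOT NULL');
$stmt->execute(array($actorId));
$r1 = $stmt->fetchAll(PDO::FETCH_COLUMN);

# Get repository ids of information objects related to the actor via events
$stmt = $conn->prepare('SELECT t2.repository_id FROM actor
  JOIN event ON event.actor_id = actor.id
  JOIN information_object AS t2 ON event.information_object_id = t2.id
  WHERE actor.id = ? AND t2.repository_id IS NOT NULL');
$stmt->execute(array($actorId));
$r2 = $stmt->fetchAll(PDO::FETCH_COLUMN);

# Join the two together: array_unique() takes a single array, so merge first,
# then reindex the result
$runion = array_values(array_unique(array_merge($r1, $r2)));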
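For what it's worth, a single query may also work: SQL's UNION (as opposed to UNION ALL) removes duplicate rows, so it can combine the two lookups and deduplicate in one step. Since relation.object_id and event.actor_id already hold the actor id, the join through the actor table can be dropped. A minimal sketch, written for PHP 7 or later and assuming the column names used in this thread (relation.subject_id/object_id, event.actor_id/information_object_id, information_object.repository_id); fetchActorRepositoryIds() is a hypothetical name, and the real relation table may also need filtering on its relation type, which is left out here. The usage part runs against a throwaway in-memory SQLite database and needs the pdo_sqlite extension.

```php
<?php
// Sketch: one query returning the distinct repository ids linked to an actor
// through name access points (relation) and through events.
function fetchActorRepositoryIds(PDO $conn, int $actorId): array
{
    // UNION (unlike UNION ALL) removes duplicate ids across both branches;
    // inner joins plus IS NOT NULL skip descriptions with no repository.
    $sql = 'SELECT io.repository_id FROM relation
              JOIN information_object AS io ON relation.subject_id = io.id
             WHERE relation.object_id = ? AND io.repository_id IS NOT NULL
            UNION
            SELECT io.repository_id FROM event
              JOIN information_object AS io ON event.information_object_id = io.id
             WHERE event.actor_id = ? AND io.repository_id IS NOT NULL';
    $stmt = $conn->prepare($sql);
    // Positional placeholders, one value per "?": a named placeholder can't be
    // reused in one statement unless PDO's prepare emulation is on.
    $stmt->execute([$actorId, $actorId]);
    $ids = array_map('intval', $stmt->fetchAll(PDO::FETCH_COLUMN));
    sort($ids); // UNION gives no ordering guarantee
    return $ids;
}

// Usage with an in-memory SQLite database; tables trimmed to the columns used.
$conn = new PDO('sqlite::memory:');
$conn->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$conn->exec('CREATE TABLE information_object (id INTEGER PRIMARY KEY, repository_id INTEGER)');
$conn->exec('CREATE TABLE relation (id INTEGER PRIMARY KEY, subject_id INTEGER, object_id INTEGER)');
$conn->exec('CREATE TABLE event (id INTEGER PRIMARY KEY, information_object_id INTEGER, actor_id INTEGER)');
$conn->exec('INSERT INTO information_object VALUES (1, 7), (2, 3), (3, NULL), (4, 7)');
$conn->exec('INSERT INTO relation VALUES (1, 1, 42), (2, 3, 42)'); // access points of actor 42
$conn->exec('INSERT INTO event VALUES (1, 2, 42), (2, 4, 42)');    // events of actor 42
$ids = fetchActorRepositoryIds($conn, 42); // returns [3, 7]
```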


 


4) My initial thought is that code will have to be added to handle removing Relationships, so that if a Repository is removed from the Actor in the Names detail window, it is also removed from Elasticsearch. Do I have this correct? I am not sure yet where this needs to be added.
 
I don't understand. In AtoM there is currently no way to link an actor directly to a repository, so I don't know what you mean by "if a Repository is removed from the Actor in the Names detail window". Can you clarify?
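Whatever the UI path turns out to be, the concern behind item 4 is real for the design in this thread: an actor's repository_id values are derived from its linked descriptions, so removing a relation or event can change them, and the actor's index entry then goes stale. A minimal sketch of the comparison that decides whether a re-index is needed; repositoryIdsChanged() is a hypothetical helper, and where AtoM would call it (for example when a relation or event is deleted) is left open.

```php
<?php
// Hypothetical helper: compare the repository ids stored in an actor's index
// entry with the ids recomputed from the database (e.g. with a query like the
// ones above). Order and duplicates are ignored.
function repositoryIdsChanged(array $indexedIds, array $currentIds): bool
{
    $normalize = function (array $ids): array {
        $ids = array_unique(array_map('intval', $ids));
        sort($ids); // sort() also reindexes the keys
        return $ids;
    };

    return $normalize($indexedIds) !== $normalize($currentIds);
}

// Actor indexed with repositories 3 and 7; a removed relation leaves only 7.
var_dump(repositoryIdsChanged([3, 7], [7]));      // bool(true)
var_dump(repositoryIdsChanged([7, 3], ['3', 7])); // bool(false)
```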

Regards,
David Juhasz