some concerns re geoconcerns

James, Eric

Aug 18, 2016, 12:55:06 PM
to hydra-gis-working-group

Eliot, James, Darren, John, all,


Sorry I was not able to attend the last meeting, but I was able to look around a bit and wanted to share some concerns:


1) I was running into a dependency issue with the Rails version pinned by the GeoConcerns gem in the Vagrant app. It's just a temporary version issue, but here's a pull request with a patch that gets it running:

https://github.com/geoconcerns/geo-concerns-vagrant/pull/2

2) I was looking at the sprint schedule:

https://github.com/projecthydra-labs/geo_concerns/milestone/4
I'm definitely interested in the "Export to Geoblacklight" ticket and have a few questions:

a) Would this involve replacing the forms in GeoConcerns with the GeoBlacklight schema, or mapping the forms to it?

b) Would it make sense, or be possible, to upload serialized metadata (MODS, FGDC) and create a hook that auto-populates these forms where applicable?


3) The primary use case I'm focused on is Sanborn maps. We have ~400 volumes of multi-image maps. On the GeoConcerns end of things, how would that look in terms of PCDM? There is that nice diagram relating scanned maps, raster, and vector. Given a works:image representing the volume, how would the individual images relate to it? We have a PDF of all the images in a volume, sometimes an "index" image for the volume, and then 1..n images in that volume. It seems to me the ideal representation in the discovery interface would be to search on the volume and, within the volume result, return a list of all the child maps alongside a zoomed-in map of the volume on which a child's bounding box appears when hovering over that child in the volume list. I guess once these are related effectively in the ontology (maybe with hasMember), this is more a geoBL view thing.


4) Also, it would be tedious to upload ~400 volumes with dozens of images each through the interface, so a CSV-based or other automated upload would probably be a requirement, ideally one that could also map metadata serializations to the geoBL schema.


5) Some general CurationConcerns issues: 

a) Uploading JPGs worked, but uploading a large PDF (80 MB) resulted in a "failed to allocate memory" error.

b) I guess this is just the way PCDM works, but loading one scanned map with 2 images and a metadata file resulted in 42 Solr documents, which seems like a lot of overhead; see:

https://gist.github.com/yulgit1/a202eeb87e20ca078bc488c2f264c75c

Most of them are access control documents that don't seem to control anything (everything I uploaded is open). When this does get integrated with the geoBL schema, I'm not sure how the PCDM overhead would relate to the geoBL schema doc. I guess there would be a link from the PCDM work to the schema doc? Or maybe the PCDM work doc would be the schema doc?
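For the bulk upload in (4), a minimal sketch of reading a CSV manifest and grouping image rows under their volume before ingest. The column names (volume_id, volume_title, file, sheet_title) and the VolumeRecord/group_by_volume names are hypothetical; the step that actually creates works is left out because it depends on whatever batch-ingest API GeoConcerns ends up providing.

```python
import csv
import io
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class VolumeRecord:
    title: str
    sheets: List[Dict[str, str]] = field(default_factory=list)  # one entry per image file


def group_by_volume(manifest_text: str) -> Dict[str, VolumeRecord]:
    """Group manifest rows by volume_id, keeping file order within each volume.

    Assumes hypothetical columns: volume_id, volume_title, file, sheet_title.
    """
    volumes: Dict[str, VolumeRecord] = {}
    for row in csv.DictReader(io.StringIO(manifest_text)):
        record = volumes.setdefault(row["volume_id"], VolumeRecord(title=row["volume_title"]))
        record.sheets.append({"file": row["file"], "title": row["sheet_title"]})
    return volumes


# Hypothetical manifest: an index image plus one sheet in volume 1, one sheet in volume 2.
MANIFEST = """volume_id,volume_title,file,sheet_title
v001,Volume 1,v001_index.tif,Index
v001,Volume 1,v001_0001.tif,Sheet 1
v002,Volume 2,v002_0001.tif,Sheet 1
"""

for volume_id, record in group_by_volume(MANIFEST).items():
    print(volume_id, record.title, [s["file"] for s in record.sheets])
```

With a real manifest on disk, the same function can be fed open(path, newline="").read().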


Anyway, sorry I don't have more time to devote to this, but I wanted to raise these cases with the group. Do you share these concerns? If it helps, maybe I can break these down into some tickets?


Thanks,

Eric

Darren Hardy Ph.D.

Aug 18, 2016, 6:48:19 PM
to James, Eric, hydra-gis-working-group
Hi Eric,

Great comments. For (2a), I think our plan is to map the GeoConcerns schema (which uses CurationConcerns’ schema) into the GeoBlacklight schema. For example, if you look here:

https://github.com/projecthydra/curation_concerns/blob/master/app/models/concerns/curation_concerns/basic_metadata.rb

this is where much of the GeoBlacklight metadata will be mapped from.
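A minimal sketch of what such a mapping could look like, assuming a work's metadata arrives as a plain dict of list-valued, CurationConcerns-style properties (title, creator, description, subject, publisher, identifier) plus a visibility string. The target names (dc_title_s, dc_creator_sm, layer_slug_s, dct_provenance_s, and so on) are GeoBlacklight 1.0 schema fields; FIELD_MAP, to_geoblacklight, and the choice to derive dc_rights_s from visibility are hypothetical and are not the actual GeoConcerns exporter.

```python
from typing import Any, Dict, List, Tuple

# Hypothetical mapping: CurationConcerns-style property -> (GeoBlacklight 1.0 field, multi-valued?)
FIELD_MAP: Dict[str, Tuple[str, bool]] = {
    "title": ("dc_title_s", False),
    "creator": ("dc_creator_sm", True),
    "description": ("dc_description_s", False),
    "subject": ("dc_subject_sm", True),
    "publisher": ("dc_publisher_s", False),
    "identifier": ("dc_identifier_s", False),
}


def to_geoblacklight(work: Dict[str, Any], provenance: str) -> Dict[str, Any]:
    """Build a GeoBlacklight-style Solr document from a work's metadata dict (sketch)."""
    doc: Dict[str, Any] = {
        "geoblacklight_version": "1.0",
        "layer_slug_s": work["id"],  # assumption: reuse the repository id as the slug
        "dct_provenance_s": provenance,
        # GeoBlacklight records access as Public/Restricted; derive it from visibility.
        "dc_rights_s": "Public" if work.get("visibility") == "open" else "Restricted",
    }
    for source, (target, multi) in FIELD_MAP.items():
        values: List[str] = list(work.get(source, []))  # assumes list-valued properties
        if not values:
            continue
        # Single-valued (_s) fields keep only the first value in this sketch.
        doc[target] = values if multi else values[0]
    return doc


work = {
    "id": "abc123",
    "title": ["Sanborn atlas, volume 1"],
    "creator": ["Sanborn Map Company"],
    "visibility": "open",
}
print(to_geoblacklight(work, "Example Institution"))
```

Multi-valued sources feeding single-valued _s fields simply keep the first value here; a real exporter would need an explicit rule for each such field.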

For (2b), we have this functionality implemented, but it’s very difficult to find in the UI right now. See the “Populate metadata from FGDC” section here:

http://geoconcerns.github.io/tutorial/2016/06/07/create-a-raster-work.html

Maybe this is what you had in mind?
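For the auto-populate idea in (2b), a minimal standard-library sketch of pulling a few form values out of an FGDC record. The element paths (idinfo/citation/citeinfo/title, idinfo/descript/abstract, idinfo/spdom/bounding/westbc and its siblings) are standard FGDC CSDGM elements; fgdc_to_form and the keys it returns are hypothetical and do not describe how GeoConcerns implements "Populate metadata from FGDC".

```python
import xml.etree.ElementTree as ET
from typing import Any, Dict, Optional

BOUNDING_TAGS = ("westbc", "eastbc", "northbc", "southbc")


def _text(root: ET.Element, path: str) -> Optional[str]:
    """Stripped text of the first element at an FGDC path, or None if absent or empty."""
    node = root.find(path)
    if node is None or node.text is None:
        return None
    return node.text.strip() or None


def fgdc_to_form(xml_string: str) -> Dict[str, Any]:
    """Extract a few form values from an FGDC CSDGM record whose root is <metadata>."""
    root = ET.fromstring(xml_string)
    form: Dict[str, Any] = {}
    title = _text(root, "idinfo/citation/citeinfo/title")
    if title:
        form["title"] = [title]
    abstract = _text(root, "idinfo/descript/abstract")
    if abstract:
        form["description"] = [abstract]
    bounds = [_text(root, f"idinfo/spdom/bounding/{tag}") for tag in BOUNDING_TAGS]
    if all(b is not None for b in bounds):
        west, east, north, south = (float(b) for b in bounds)
        form["bbox"] = {"west": west, "east": east, "north": north, "south": south}
    return form


# Hypothetical, heavily trimmed FGDC record for illustration only.
SAMPLE = """<metadata><idinfo>
  <citation><citeinfo><title>Sample scanned map</title></citeinfo></citation>
  <descript><abstract>A scanned map sheet.</abstract></descript>
  <spdom><bounding><westbc>-73.0</westbc><eastbc>-72.9</eastbc>
    <northbc>41.4</northbc><southbc>41.2</southbc></bounding></spdom>
</idinfo></metadata>"""
print(fgdc_to_form(SAMPLE))
```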

For (3), the discovery interface questions about relating child maps back to the parent are open right now and are being discussed in a few places:

https://github.com/geoblacklight/geoblacklight/issues/459
https://github.com/geoblacklight/geoblacklight/issues/412
https://github.com/geoblacklight/geoblacklight/issues/406
https://github.com/geoblacklight/geoblacklight/issues/405

There’s also a GeoBlacklight development Slack channel here: https://geoblacklight.slack.com/

Hope that helps some,
-Darren
> --
> You received this message because you are subscribed to the Google Groups "Hydra GIS Working Group" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to hydra-gis-working...@googlegroups.com.
> To post to this group, send email to hydra-gis-w...@googlegroups.com.
> To view this discussion on the web visit https://groups.google.com/d/msgid/hydra-gis-working-group/SN1PR08MB187002DB7052C43ECF28003BF2150%40SN1PR08MB1870.namprd08.prod.outlook.com.
> For more options, visit https://groups.google.com/d/optout.


John Huck

Aug 18, 2016, 6:50:40 PM
to James, Eric, hydra-gis-working-group
Hi Eric,

Thank you for your questions! I will not be able to answer all of them, but I did want to comment that your case with the Sanborn map volumes is very interesting, and possibly outside the current scope of the model (others may disagree with me, I don't know). I think we talked about this kind of case as something we might need to tackle in a subsequent iteration of the model (if one is undertaken).

Essentially, it's an atlas problem: you have multiple separate maps (multi-file, it sounds like) that don't constitute parts of a single whole work in the way that the sheets in a map series make up a single 'map'.

As a cataloguer, when I catalogue an atlas, the description doesn't go below the level of the aggregation. In the wonderful world of digitization, though, we can now provide direct access to the individual maps inside, and that presents a new problem of how to represent them: if you give 'article-level' access (by way of analogy), you need some kind of description for each map. And the scale of your project makes it pretty clear that this is not a small question: 400 volumes, with, I would imagine, hundreds of maps in each volume.

As I understand PCDM, if you were modelling a multi-page book, each page would be a PCDM work with a fileset that includes the various files (low-res scans, hi-res scans, OCR, etc.), and then you would have a PCDM work at a higher level that has all the page works as members. That higher-level work is the piece that I think we are missing from our model.
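A minimal sketch of that structure applied to a Sanborn volume: each sheet is a child work with its own fileset, and the volume is a higher-level work that lists the sheets as members (standing in for pcdm:hasMember). The Work/FileSet classes, file names, and coordinates are hypothetical. Deriving the volume's footprint as the union of its sheets' bounding boxes, formatted as a GeoBlacklight-style ENVELOPE(W, E, N, S) string, is just one illustrative option for discovery; it assumes all boxes share one lat/long coordinate system and none crosses the antimeridian.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

# (west, east, north, south) in decimal degrees
BBox = Tuple[float, float, float, float]


@dataclass
class FileSet:
    files: List[str]  # e.g. master TIFF, derivative JPG, OCR text


@dataclass
class Work:
    title: str
    file_sets: List[FileSet] = field(default_factory=list)
    members: List["Work"] = field(default_factory=list)  # stands in for pcdm:hasMember
    bbox: Optional[BBox] = None


def union_bbox(works: List[Work]) -> Optional[BBox]:
    """Smallest box covering every member that has a bbox (no antimeridian handling)."""
    boxes = [w.bbox for w in works if w.bbox is not None]
    if not boxes:
        return None
    return (
        min(b[0] for b in boxes),  # westernmost edge
        max(b[1] for b in boxes),  # easternmost edge
        max(b[2] for b in boxes),  # northernmost edge
        min(b[3] for b in boxes),  # southernmost edge
    )


def solr_envelope(b: BBox) -> str:
    """Format a bbox in the ENVELOPE(W, E, N, S) order GeoBlacklight uses for solr_geom."""
    west, east, north, south = b
    return f"ENVELOPE({west}, {east}, {north}, {south})"


# Hypothetical volume: PDF and index image at the volume level, two sheet works as members.
sheet1 = Work("Sheet 1", [FileSet(["sheet1.tif"])], bbox=(-73.0, -72.95, 41.32, 41.30))
sheet2 = Work("Sheet 2", [FileSet(["sheet2.tif"])], bbox=(-72.95, -72.90, 41.32, 41.30))
volume = Work("Volume 1", [FileSet(["volume1.pdf"]), FileSet(["index.tif"])], members=[sheet1, sheet2])
volume.bbox = union_bbox(volume.members)
print(solr_envelope(volume.bbox))  # ENVELOPE(-73.0, -72.9, 41.32, 41.3)
```

Keeping a bbox on each child is what would let a discovery view highlight a sheet's footprint when hovering over it in the volume's member list.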

To use our model as it is now and group the maps by volume, it seems to me that you would need to put all the files in the fileset for the volume 'work' and then figure out how to identify images that needed to be put together to form a single image.

Those are just some of my initial thoughts about the question.

John

--
John Huck
Metadata & Cataloguing Librarian
University of Alberta Libraries
5-25E Cameron Library

James, Eric

Aug 24, 2016, 12:33:05 PM
to hydra-gis-working-group

Thank you, John and Darren, this is very helpful. Some follow-up:


This example is great; I am interested in doing something similar:

https://earthworks.stanford.edu/catalog/stanford-vj008bs4183


Along with this, in some cases there are actually scanned index maps that would be nice to display. So, along with the vector index, I would like to show this image (maybe through a link, on the show page itself, or by clicking on the map outside the designated squares). I guess that's all configurable with GeoBlacklight.


And, again, I apologize for being skeptical, but I noticed the following when working through GeoConcerns:

create a work = 5 records
add an image to the work = 17 records
add a MODS file to the work = 26 records


So one described image, not even getting into adding raster or vector data, results in 26 docs. Does that scale (say 1,000,000 images x 26 records/image = 26,000,000 records)? I would just like confirmation that yes, that's how PCDM works, and there shouldn't be any scaling issues.
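Reading the counts above as running totals (26 documents once the work, image, and MODS file are all in), a quick check of what each step adds and of the projected index size. The linear projection assumes every described image costs the same 26 documents, which is itself the open question here.

```python
from typing import List


def increments(running_totals: List[int]) -> List[int]:
    """Convert cumulative document counts into per-step additions."""
    previous = 0
    steps = []
    for total in running_totals:
        steps.append(total - previous)
        previous = total
    return steps


def projected_docs(items: int, docs_per_item: int) -> int:
    """Total Solr documents if every item costs the same number of documents."""
    return items * docs_per_item


counts = [5, 17, 26]  # create work, add image, add MODS (from the counts above)
print(increments(counts))             # [5, 12, 9]
print(projected_docs(1_000_000, 26))  # 26000000
```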


Furthermore, regarding the GeoBlacklight integration with GeoConcerns, is the plan for GeoBlacklight to run on the Solr index created by GeoConcerns? I.e., the Solr index that will have these 26 PCDM records plus what I suspect would also be the one GeoBlacklight schema doc? In other words, is GeoBlacklight integrated to use the PCDM-generated documents (access/permissions, filesets, lists, etc.), or does it only care about the GeoBlacklight schema document? If the latter, what predicate would be used for linking (say, a type:Image Work to the GeoBlacklight schema doc)?

Also, once integrated with the geoBL schema, is the direction for the metadata in the form to be replaced completely by the geoBL schema, or will it remain in the Hydra Works Work as is (with the geoBL schema doc being its own doc, generated by mapping this basic + additional metadata to it)?


Thanks,

Eric

