Eliot, James, Darren, John, all,
Sorry I was not able to attend the last meeting. But I was able to check things out a little bit and wanted to share some concerns:
1) I ran into a dependency issue with the Rails version pinned by the geo_concerns gem in the vagrant app. It's just a temporary version mismatch, but here's a pull request with a patch that gets it running:
https://github.com/geoconcerns/geo-concerns-vagrant/pull/2
2) I was looking at the sprint schedule:
https://github.com/projecthydra-labs/geo_concerns/milestone/4
I'm definitely interested in the "Export to Geoblacklight" ticket and have a few questions. a) Would this involve a replacement or mapping of the forms in geoConcerns to the geoblacklight schema? b) Would it make sense, or be possible, to upload serialized metadata (MODS, FGDC) and create a hook that auto-populates these forms where applicable?
3) The primary use case I'm focused on is Sanborn maps. We have ~400 volumes of multi-image maps. On the geoConcerns end of things, how would that look in terms of PCDM? There is that nice diagram relating scanned maps, raster, and vector. Given a works:image representing the volume, how would the individual images relate to that? We have a PDF of all the images in a volume, sometimes there is an "index" image for the volume, and then there are 1..n images in that volume. It seems to me the ideal representation in the discovery interface would be to search on the volume and, within the volume result, return a list of all the child maps, alongside a zoomed-in map for the volume on which a child's bounding box appears when hovering over that child in the volume list. I guess once these are related effectively in the ontology (hasMember maybe), this is more of a geoBL view thing.
4) Also, it would be tedious to upload ~400 volumes with dozens of images each through the interface, so a CSV or other form of automated upload would probably be a requirement, ideally one that could also map metadata serializations to the geoBL schema.
5) Some general CurationConcerns issues:
a) Uploading JPGs worked, but uploading a large PDF (80 MB) resulted in a "failed to allocate memory" error.
b) I guess this is just the way PCDM works, but loading one scanned map with 2 images and a metadata file resulted in 42 solr documents, which seems like a lot of overhead. See:
https://gist.github.com/yulgit1/a202eeb87e20ca078bc488c2f264c75c
Most of them relate to access control documents that don't seem to control anything (everything I uploaded is open). Once this gets integrated with the geoBL schema, I'm not sure how the PCDM overhead would relate to the geoBL schema doc. I guess there would be a link from the PCDM work to the schema doc? Or maybe the PCDM work doc would be the schema doc?
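Going back to points 3 and 4, here is a rough sketch of the kind of CSV-driven mapping I have in mind. It is only an illustration, not anything geoConcerns provides: the CSV column names, the sample identifiers, and the institution name are all made up, and the field names follow my understanding of the GeoBlacklight 1.0 schema, so they should be checked against whatever schema version is actually in use. Each child sheet points back at its volume through dc_source_sm, which is one possible way to let the discovery interface list the sheets under a volume.

```python
import csv
import io

# Hypothetical CSV layout: one row per scanned sheet, with its parent volume
# and a bounding box in decimal degrees. Column names are assumptions.
SAMPLE_CSV = """identifier,title,volume,west,south,east,north
sanborn-v1-s1,Sheet 1,sanborn-v1,-72.95,41.25,-72.85,41.35
sanborn-v1-s2,Sheet 2,sanborn-v1,-72.85,41.25,-72.75,41.35
"""


def row_to_geobl(row, institution="Example University"):
    """Map one CSV row to a GeoBlacklight 1.0-style Solr document (sketch)."""
    west, south, east, north = (
        float(row[key]) for key in ("west", "south", "east", "north")
    )
    doc = {
        "geoblacklight_version": "1.0",
        "dc_identifier_s": row["identifier"],
        "layer_slug_s": row["identifier"],
        "dc_title_s": row["title"],
        "dc_rights_s": "Public",          # assumption: everything is open
        "dct_provenance_s": institution,  # assumption: placeholder name
        # Solr envelope order is (minX, maxX, maxY, minY), i.e. W, E, N, S.
        "solr_geom": f"ENVELOPE({west}, {east}, {north}, {south})",
    }
    if row.get("volume"):
        # Point the child sheet at its parent volume record.
        doc["dc_source_sm"] = [row["volume"]]
    return doc


def rows_to_docs(csv_text):
    """Convert CSV text into a list of GeoBlacklight-style documents."""
    return [row_to_geobl(row) for row in csv.DictReader(io.StringIO(csv_text))]


docs = rows_to_docs(SAMPLE_CSV)
```

The same loop could read MODS or FGDC instead of CSV; only the row-to-document mapping would change.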
Anyway, sorry I don't have more time to devote to this, but I wanted to raise these cases with the group. Do you share these concerns? If it helps, maybe I can break these down into some tickets.
Thanks,
Eric
--
Thank you John and Darren, this is very helpful. Some follow up:
This example is great, I am interested in doing something similar:
https://earthworks.stanford.edu/catalog/stanford-vj008bs4183
Along with this, in some cases there are actually scanned index maps that would be nice to display. So along with the vector index, I would like to show this image (maybe through a link, or on the show page itself, or maybe by clicking on the map outside of the designated squares). I guess that's all configurable with geoblacklight.
And, again, I apologize for being skeptical, but noticed this generated through geoconcerns:
So one described image, without even getting into adding raster or vector, results in 26 docs. Does that scale (say 1,000,000 images x 26 records/image)? I guess I would just like confirmation that yes, that's how PCDM works and there shouldn't be any scaling issues.
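To put a number on that, here is the back-of-the-envelope arithmetic as a tiny script. It assumes the document count grows linearly at the 26 docs per image I observed, which is my assumption and may not hold (shared documents could make the real growth lower).

```python
def estimated_solr_docs(image_count, docs_per_image=26):
    """Total Solr documents, assuming every image costs the same number of docs."""
    return image_count * docs_per_image


total = estimated_solr_docs(1_000_000)
print(f"{total:,} Solr documents")  # prints "26,000,000 Solr documents"
```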
Furthermore, regarding the geoblacklight integration with geoconcerns, is the plan for geoblacklight to run on the solr index created by geoconcerns, i.e. the solr index that will have these 26 PCDM records plus what I'd suspect would also be the 1 geoblacklight schema doc? In other words, is geoblacklight integrated to use the PCDM-generated documents (access/permissions, filesets, lists, etc.), or does it just care about the geoblacklight schema document? If the latter, what predicate would be used for linking (of, say, a type:Image Work to the geoblacklight schema doc)? Also, once integrated with the geobl schema, is the direction for the metadata in the form to be replaced by the geobl schema completely, or will it remain in the hydraworks Work as is (with the geobl schema just being its own doc, generated by a mapping of this Basic+additional metadata to it)?
Thanks,
Eric