PostgreSQL Queries for DSpace 6.2

628 views
Skip to first unread message

Stacy Pennington

unread,
Nov 16, 2017, 3:03:07 PM11/16/17
to DSpace Technical Support
Does anyone out there have some useful PostgreSQL queries already written for DSpace 6.2?

I've found several sets of PostgreSQL queries that were useful for earlier versions of DSpace, but most of them appear to be written between the versions 1.8 and 4.x, and the database changes from the past few years, particularly the UUID use, make many of them not work today. My attempts to repair them and make them work for 6.2 haven't gone well.

What I'm actually looking for is a query that would provide a hierarchical list of communities, sub-communities, and collections in simple list, probably with just 2-space character indention at each successive sub-level, include the number of items at each level. Basically, think of something like what you get when you click Browse > Communities & Collections in the JSPUI interface but without the images and intro text of each community/collection and with the number of items for each level.

(Basically, we need this because we are trying to flatten and uncomplicate our current DSpace topology, and we have over 500 collections, so getting an overview or monitoring our progress is really difficult.)

Before I dive into this and try to create one of these SQL queries on my own, it seems like a good idea to see if anyone has already done it.

Thanks.

Stacy

Terry Brady

unread,
Nov 16, 2017, 6:04:44 PM11/16/17
to DSpace Technical Support
Do you have the REST API enabled?  The following endpoint will provide a dump of the hierarchy that is easy to parse.


If that does not help, here are some SQL that might help.

with recursive r_comm2coll as (
  select 
    community_id, 
    collection_id
  from 
    community2collection
  union
  select 
    cm2cm.parent_comm_id as community_id, 
    r.collection_id
  from 
    r_comm2coll r
  inner join
    community2community cm2cm
  on
    cm2cm.child_comm_id = r.community_id
)

select distinct
   commh.handle as CommHandle,
   collh.handle as CollHandle,
   (
    select text_value 
from metadatavalue mv 
where 
  mv.resource_id=commh.resource_id and mv.resource_type_id=4
  and metadata_field_id = (
    select metadata_field_id
from metadatafieldregistry where element='title' and qualifier is null
  )
  ) as CommName, 
  (
    select text_value 
from metadatavalue mv 
where 
  mv.resource_id=collh.resource_id and mv.resource_type_id=3
  and metadata_field_id = (
    select metadata_field_id
from metadatafieldregistry where element='title' and qualifier is null
  )
  ) as CollName 

from
  handle commh
inner join r_comm2coll
  on r_comm2coll.community_id=commh.resource_id and commh.resource_type_id=4
inner join handle collh
  on r_comm2coll.collection_id=collh.resource_id and collh.resource_type_id=3
order by CommHandle,CollHandle
  
  
 





--
You received this message because you are subscribed to the Google Groups "DSpace Technical Support" group.
To unsubscribe from this group and stop receiving emails from it, send an email to dspace-tech+unsubscribe@googlegroups.com.
To post to this group, send email to dspac...@googlegroups.com.
Visit this group at https://groups.google.com/group/dspace-tech.
For more options, visit https://groups.google.com/d/optout.



--
Terry Brady
Applications Programmer Analyst
Georgetown University Library Information Technology
425-298-5498 (Seattle, WA)

Stacy Pennington

unread,
Nov 17, 2017, 3:12:50 PM11/17/17
to DSpace Technical Support
Terry,

Thanks for the excellent advice regarding using /rest/hierarchy. I was able to get a (long) list that could be easily cleaned up in Excel and used to track the flattening of the hierarchy.

I only started using the REST API a few months ago, but I'm really excited about what it can do. We're successfully using it with Omeka S to feed items in DSpace communities and collections to Omeka S. It is just so handy, especially with authenticated REST use in version 6.

As for the query you sent, I get this error in PostgreSQL when I run it:

ERROR:  column mv.resource_id does not exist
LINE 26:    mv.resource_id=commh.resource_id and mv.resource_type_id=...
            ^
********** Error **********

ERROR: column mv.resource_id does not exist
SQL state: 42703
Character: 478

It looks like the metadatavalue in my 6.2 doesn't have a resource_id column (or resource_type column, for that matter).

Do you think something is wrong with my metadatavalue table? Or is your metadatavalue table from an earlier version of DSpace?

Is there another way to look up these CommName and CollName values?

Thanks.

Stacy
To unsubscribe from this group and stop receiving emails from it, send an email to dspace-tech...@googlegroups.com.

To post to this group, send email to dspac...@googlegroups.com.
Visit this group at https://groups.google.com/group/dspace-tech.
For more options, visit https://groups.google.com/d/optout.

Terry Brady

unread,
Nov 17, 2017, 3:54:25 PM11/17/17
to DSpace Technical Support
Stacy,

I just realized that I tested the query in my DSpace 5 instance.  Try the following instead.

with recursive r_comm2coll as (
  select 
    community_id, 
    collection_id
  from 
    community2collection
  union
  select 
    cm2cm.parent_comm_id as community_id, 
    r.collection_id
  from 
    r_comm2coll r
  inner join
    community2community cm2cm
  on
    cm2cm.child_comm_id = r.community_id
)

select distinct
   commh.handle as CommHandle,
   collh.handle as CollHandle,
   (
    select text_value 
from metadatavalue mv 
where 
  mv.dspace_object_id=r_comm2coll.community_id
  and metadata_field_id = (
    select metadata_field_id
from metadatafieldregistry where element='title' and qualifier is null
  )
  ) as CommName, 
  (
    select text_value 
from metadatavalue mv 
where 
  mv.dspace_object_id=r_comm2coll.collection_id
  and metadata_field_id = (
    select metadata_field_id
from metadatafieldregistry where element='title' and qualifier is null
  )
  ) as CollName 

from
  handle commh
inner join r_comm2coll
  on r_comm2coll.community_id=commh.resource_id 
inner join handle collh
  on r_comm2coll.collection_id=collh.resource_id 
order by CommHandle,CollHandle

To unsubscribe from this group and stop receiving emails from it, send an email to dspace-tech+unsubscribe@googlegroups.com.

To post to this group, send email to dspac...@googlegroups.com.
Visit this group at https://groups.google.com/group/dspace-tech.
For more options, visit https://groups.google.com/d/optout.

Stacy Pennington

unread,
Nov 21, 2017, 6:01:01 PM11/21/17
to DSpace Technical Support
Terry, sorry for the delayed reply.

I'm getting an error with that new query:

ERROR:  more than one row returned by a subquery used as an expression
********** Error **********

ERROR: more than one row returned by a subquery used as an expression
SQL state: 21000

Do you think it is likely that my DSpace database has some extraneous rows that could be causing this problem?

Stacy

Terry Brady

unread,
Nov 21, 2017, 6:22:39 PM11/21/17
to DSpace Technical Support
I imagine that there is one of 2 reasons why this did not work.  My query did not address these scenarios.

1. You have 2 schemas loaded that each have an element of title
2. You have titles in multiple languages


The following query has a placeholder for language commented out if that is the scenario.

If you still have trouble, you can uncomment the "limit 1" portions of the query to force the subqueries to return 1 row.

with recursive r_comm2coll as (
  select 
    community_id, 
    collection_id
  from 
    community2collection
  union
  select 
    cm2cm.parent_comm_id as community_id, 
    r.collection_id
  from 
    r_comm2coll r
  inner join
    community2community cm2cm
  on
    cm2cm.child_comm_id = r.community_id
)

select distinct
   commh.handle as CommHandle,
   collh.handle as CollHandle,
   (
    select text_value 
    from metadatavalue mv 
    where 
      /*text_lang like 'en%' and*/
      mv.dspace_object_id=r_comm2coll.community_id
      and metadata_field_id = (
        select metadata_field_id
        from metadatafieldregistry where element='title' and qualifier is null
        and metadata_schema_id = (
          select metadata_schema_id 
          from metadataschemaregistry 
          where short_id='dc'
        )
      )
    /*limit 1*/
  ) as CommName, 
  (
    select text_value 
    from metadatavalue mv 
    where 
      /*text_lang like 'en%' and*/
      mv.dspace_object_id=r_comm2coll.collection_id
      and metadata_field_id = (
        select metadata_field_id
        from metadatafieldregistry where element='title' and qualifier is null
        and metadata_schema_id = (
          select metadata_schema_id 
          from metadataschemaregistry 
          where short_id='dc'
        )
      )
    /*limit 1*/
  ) as CollName 

from
  handle commh
inner join r_comm2coll
  on r_comm2coll.community_id=commh.resource_id 
inner join handle collh
  on r_comm2coll.collection_id=collh.resource_id 
order by CommHandle,CollHandle
To unsubscribe from this group and stop receiving emails from it, send an email to dspace-tech+unsubscribe@googlegroups.com.

To post to this group, send email to dspac...@googlegroups.com.
Visit this group at https://groups.google.com/group/dspace-tech.
For more options, visit https://groups.google.com/d/optout.

Tim Donohue

unread,
Nov 22, 2017, 11:11:52 AM11/22/17
to Stacy Pennington, DSpace Technical Support
Hi Stacy,

Just as a quick FYI, some "helper SQL functions" have been gathered on the wiki at: https://wiki.duraspace.org/display/DSPACE/Helper+SQL+functions+for+DSpace+6

These functions make querying the DSpace 6 database a bit easier (see examples on that page).

- Tim

--
You received this message because you are subscribed to the Google Groups "DSpace Technical Support" group.
To unsubscribe from this group and stop receiving emails from it, send an email to dspace-tech...@googlegroups.com.
To post to this group, send email to dspac...@googlegroups.com.
Visit this group at https://groups.google.com/group/dspace-tech.
For more options, visit https://groups.google.com/d/optout.
--
Tim Donohue
Technical Lead for DSpace & DSpaceDirect
DuraSpace.org | DSpace.org | DSpaceDirect.org

Stacy Pennington

unread,
Nov 22, 2017, 3:27:34 PM11/22/17
to DSpace Technical Support
Thanks, Terry! That altered query does work and provides a good, flattened list that can be used to compare topologies, as we go through the process of simplifying our overall structure.

Yes, we've been suffering with the en_us DC issue with mass metadata editing via CSV. We are in the process of cleaning that up and standardizing it.

Stacy

Stacy Pennington

unread,
Nov 22, 2017, 3:30:14 PM11/22/17
to DSpace Technical Support
Thanks, Tim. Those are some very useful DB functions for making it much easier to query DSpace's very normalized and efficient table definitions.

I'll definitely build my custom queries around those helper functions.

Stacy
Reply all
Reply to author
Forward
0 new messages