Extra Author Affiliations when browsing a dataverse (trailing spaces, etc) What to do?


Nick Lauland

Sep 25, 2017, 4:52:17 PM
to Dataverse Users Community
Hey guys, I was unable to find this in docs, sorry if I missed it...

We've had some users copy and paste extra spaces when submitting, which seems to cause the same affiliation to be shown multiple times.

We corrected the metadata in question and now the extra affiliation is shown in parentheses. (See screenshots; note the extra space isn't visible.)

The correct behavior when making a change seems to be creating a new version, which it does.  But the display seems odd.

I'd like to be able to fix this typo as if it never occurred, but I'm guessing that opens a whole ball of wax as far as preserving all changes, etc.


Thanks for any hints!

utexas_example.png
txstate_example.png

julian...@g.harvard.edu

Sep 25, 2017, 6:07:00 PM
to Dataverse Users Community
Hi Nick,

I tested this on Demo Dataverse, where leading and trailing spaces in the author affiliation field on the dataset create page are being trimmed from what I can tell.

But since people are copying and pasting, maybe they're pasting other invisible characters, which aren't being trimmed. I tried with the Unicode character U+3000 (ideographic space), which wasn't trimmed in Demo Dataverse, so
 Harvard University
is treated as one value, while  
 Harvard University
and 
Harvard University
are both treated as one value that is different from the first.

Maybe Dataverse should trim those characters as well?
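
To illustrate the kind of trimming suggested here, this is a minimal Python sketch, not Dataverse's actual code. The function name `normalize_affiliation` and the `INVISIBLE` character list are hypothetical and chosen only for this example. It relies on the fact that Python's `str.strip()` with no arguments removes Unicode whitespace such as U+3000 and U+00A0, and it additionally removes a few zero-width characters that are not classified as whitespace.

```python
# Zero-width characters that str.strip() does not treat as whitespace.
# This list is an illustrative assumption and is not exhaustive.
INVISIBLE = "\u200b\u200c\u200d\u2060\ufeff"


def normalize_affiliation(value: str) -> str:
    """Remove leading/trailing whitespace and zero-width characters."""
    previous = None
    # Repeat until nothing changes, because whitespace and zero-width
    # characters can be interleaved at either end of the string.
    while previous != value:
        previous = value
        value = value.strip()           # Unicode whitespace, incl. U+3000
        value = value.strip(INVISIBLE)  # zero-width characters
    return value


# Three pasted variants that look identical on screen.
variants = [
    "\u3000Harvard University",   # leading ideographic space
    " Harvard University",        # leading ASCII space
    "Harvard University\u200b",   # trailing zero-width space
]
distinct = {normalize_affiliation(v) for v in variants}
```

After normalization the three variants collapse to one value, so they would no longer appear as separate affiliations. Only the ends of the string are touched; characters inside the affiliation are left alone.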

Could you write any more about why you'd like to be able to fix this typo as if it never occurred?

Sherry Lake

Sep 26, 2017, 9:05:11 AM
to Dataverse Users Community
I don't think this is a space issue. It is a change in how Dataverse represents affiliations. I noticed parentheses around some of my affiliations, but not around other, older ones. I'm not sure which upgrade made this change, but now Dataverse considers affiliations without the parens NOT the same as those with the parens (as related to the facets).

Here is an example: you will see both "University of Virginia Library" and "(University of Virginia Library)" in the facets. In the search results, the top affiliation has parens and the bottom ones do not:

We can talk more about this on the call today.

--
Sherry Lake
Screen Shot 2017-09-26 at 8.57.05 AM.png

Sherry Lake

Sep 26, 2017, 9:07:35 AM
to Dataverse Users Community
Oh, I forgot to add this one screenshot. It's from Harvard's Dataverse with my search: authorAffiliation:"University of Virginia"
You can see affiliations with and without parens in the search results:
Screen Shot 2017-09-26 at 9.05.30 AM.png
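
For anyone who wants to run the same query outside the web interface, here is a hedged Python sketch that builds a Search API URL for it. It assumes the Dataverse Search API at `/api/search` with its `q` and `type` parameters; the base URL `https://dataverse.harvard.edu` and the helper name `affiliation_search_url` are assumptions made for this example and do not come from the thread.

```python
from urllib.parse import urlencode


def affiliation_search_url(affiliation: str,
                           base_url: str = "https://dataverse.harvard.edu") -> str:
    """Build a Search API URL for an authorAffiliation phrase query."""
    # Quote the value so it is searched as a phrase, as in the post above.
    query = 'authorAffiliation:"{}"'.format(affiliation)
    # urlencode percent-encodes the colon, quotes and spaces.
    params = urlencode({"q": query, "type": "dataset"})
    return base_url.rstrip("/") + "/api/search?" + params


url = affiliation_search_url("University of Virginia")
```

Fetching that URL should return JSON search results, which makes it easier to compare affiliation values across datasets than reading them off a screenshot.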

Philip Durbin

Sep 26, 2017, 9:40:19 AM
to dataverse...@googlegroups.com
I'm pretty sure this is a bug that was introduced* in Dataverse 4.7.1. Is the pattern that only datasets created since you deployed 4.7.1 are affected?

Nick or Sherry, can you please open a GitHub issue about this? I'm actually having trouble reproducing the bug on the latest code in the "develop" branch, so if you can include the steps in the GitHub issue, it would be most appreciated. I doubt it was magically fixed in the latest code. I probably just don't know how to trigger the bug.

Thanks!

Phil

* Perhaps the change from dsfv.getValue() to dsfv.getDisplayValue() in https://github.com/IQSS/dataverse/pull/3973 but I'm really not sure.

--
You received this message because you are subscribed to the Google Groups "Dataverse Users Community" group.
To unsubscribe from this group and stop receiving emails from it, send an email to dataverse-community+unsub...@googlegroups.com.
To post to this group, send email to dataverse-community@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/dataverse-community/d7cc6602-7f2c-4d41-9803-8a1f3a0fcee9%40googlegroups.com.

For more options, visit https://groups.google.com/d/optout.



--

Sherry Lake

Sep 26, 2017, 3:04:18 PM
to Dataverse Users Community
Phil,

I haven't opened a github issue yet, but will. I tried to replicate this on phoenix, but the affiliation looks okay there.

Here's an observation of what we are seeing with a search on Harvard, go to this URL on Harvard:


Notice in the public search results display, datasets created since Aug 7th have parens around author affiliation AND contact affiliation. 
Datasets (in this list) created on Mar 19, 2017 and earlier do not have parens. 

UVa just deployed 4.7.1 this week, so the bug was not introduced with that version. UVa went from 4.6.1 to 4.7.1 last week (up to 4.7 and then 4.7.1 yesterday). We have been seeing the parens since, I think, 4.6.1.

--
Sherry

Kraffmiller, Stephen E

Sep 26, 2017, 3:24:36 PM
to dataverse...@googlegroups.com
Hi Sherry,

I saw the issue in my local environment because I don’t drop my database as often as Phil does.  The adding of parens to facets is not happening in the just released version 4.8.  In order to completely fix the view of the facets you’ll need to upgrade to 4.8, drop your indexes and re-index.  

If that’s a satisfactory solution please let us know.

Thanks

Steve




Sherry Lake

Sep 26, 2017, 3:49:54 PM
to Dataverse Users Community
This is OK for me at UVa. But Nick at TDL was the one who started this thread, so we should ask him.



Nick Lauland

Sep 27, 2017, 4:19:19 PM
to Dataverse Users Community
Ok, I dropped and recreated the database on a test machine...is that how you're rebuilding the indices?
No help, but it got me thinking... What are those parentheses actually meant for?
I haven't been able to find anything, but I might be searching for the wrong terms.
If they're useful I can just educate the users.

Kraffmiller, Stephen E

Sep 27, 2017, 4:25:30 PM
to dataverse...@googlegroups.com
Hi Nick,

 It involves refreshing Solr, not the Dataverse database.

The parentheses are for display of the dataset metadata on the Dataset Page. We didn’t intend to add them to the search facets.
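
As a rough sketch of what refreshing Solr can look like, the Python below lists the admin index calls and sends them in order. The endpoint paths (`/api/admin/index/timestamps`, `/api/admin/index/clear`, `/api/admin/index`) are taken from memory of the "Solr Search Index" section of the Dataverse Admin Guide and should be checked against the guide for your version. The base URL `http://localhost:8080` and the function names are assumptions for this example.

```python
from urllib.request import Request, urlopen


def reindex_requests(base_url: str = "http://localhost:8080"):
    """Return the (method, url) pairs for a clear-and-reindex, in order."""
    # Assumed admin endpoint prefix; verify against the Admin Guide.
    admin = base_url.rstrip("/") + "/api/admin/index"
    return [
        ("DELETE", admin + "/timestamps"),  # forget when objects were last indexed
        ("GET", admin + "/clear"),          # remove existing documents from Solr
        ("GET", admin),                     # start re-indexing from the database
    ]


def run_reindex(base_url: str = "http://localhost:8080", timeout: int = 30):
    """Send each request in turn and print the HTTP status."""
    for method, url in reindex_requests(base_url):
        with urlopen(Request(url, method=method), timeout=timeout) as response:
            print(method, url, response.status)
```

Calling `run_reindex()` on the machine that hosts Dataverse would issue the three requests. The Dataverse database itself is untouched; only the Solr index is rebuilt from it.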

Steve





Nick Lauland

Sep 28, 2017, 5:14:12 PM
to Dataverse Users Community
Some progress: after the re-index, the "extras" are all collapsed into a single entry per affiliation. The only downside is that they now ALL have parentheses, but consistency is the main thing, and that's mostly cosmetic (see attached from the test server). FYI, here at TDL we use the affiliation to indicate whom the item belongs to (U.T. Texas, Baylor, A&M, etc.), in addition to their own top-level Dataverses.
tdl-test-affiliations.png