Bug in Dataset search vs. Dataverse search?


Mullai at UM

Sep 9, 2016, 3:46:11 PM
to Dataverse Users Community

We are using Dataverse version 4.3.1.

A search (Find) run from a Dataset page returns “There are no files in this dataset.” For the search to work on the Dataset page, the search term must exactly match the file's metadata; even a comma or a blank space must match.

For example:
- The Description field of a File has “Alberta, semi-urban, blackfoot, female”. 
- From the Dataset page (Files tab), searching for the exact term returns results. But if we omit even a comma or a blank space from the search term above, we get no results.

The same search run from the Dataverse level gives good results: it returns files matching the words of the search term, whether they appear together or separately.
Why is this happening? Is it a feature or a bug?


Philip Durbin

Sep 12, 2016, 8:44:47 AM
to dataverse...@googlegroups.com
The reason you are seeing different behavior from "Search" (or "Advanced Search") at the dataverse level vs. "Find" at the dataset level is that the former is powered by Solr, which, as a search engine, is friendlier and more forgiving about user input, while the latter is powered by a SQL query*, which is less forgiving.

To give another example, the Solr-powered search is able to find https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/B0NY7Y based on the following queries:

- "Dataset - How to survey about turnout stata 12.tab"
- "Dataset How to survey about turnout stata 12.tab" (missing hyphen)

(I didn't actually type the quotes above.)

However, the SQL query doesn't find the file/dataset if the hyphen is missing. (That server is running 4.5, by the way.)
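To make the difference concrete, here is a minimal sketch contrasting the two matching styles on the example above. It assumes, purely for illustration, that the query string is the file's label, and it uses a toy tokenizer (lowercase, split on non-word characters) in place of Solr's real analysis chain, which is configured on the server and may differ. The function names `tokenize`, `term_match`, and `substring_match` are hypothetical.

```python
import re


def tokenize(text: str) -> list[str]:
    # Toy analyzer (an assumption, not Solr's real configuration):
    # lowercase, then keep runs of letters/digits, so "-" and "." are dropped.
    return re.findall(r"\w+", text.lower())


def term_match(document: str, query: str) -> bool:
    # Search-engine style (assumed AND semantics): every query token must
    # appear somewhere in the document's tokens, ignoring punctuation and case.
    doc_tokens = set(tokenize(document))
    query_tokens = tokenize(query)
    return bool(query_tokens) and all(t in doc_tokens for t in query_tokens)


def substring_match(document: str, query: str) -> bool:
    # "LIKE %searchTerm%" style: the query must occur verbatim (case-sensitive here).
    return query in document


label = "Dataset - How to survey about turnout stata 12.tab"
queries = [
    "Dataset - How to survey about turnout stata 12.tab",
    "Dataset How to survey about turnout stata 12.tab",  # missing hyphen
]
for q in queries:
    print(f"{q!r}: term_match={term_match(label, q)}, substring_match={substring_match(label, q)}")
```

The first query matches under both rules; the second matches only under term matching, because the tokenizer discards the hyphen before comparing.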

Perhaps the takeaway or workaround is to be aware that if you're having trouble finding a file at the dataset level, you can always back up to the dataverse level and see if you have better luck searching from there.

I suspect that most users would prefer the more forgiving search behavior (you can always make it stricter by putting quotes around your query). There's an existing issue called "Use Solr for file listing on dataset page" at https://github.com/IQSS/dataverse/issues/2455, but it's a significant amount of work. Anyway, you're welcome to open a more targeted GitHub issue if you'd like to capture the behavior above as a bug.
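The "put quotes around your query" tip can be sketched roughly like this. It is a toy model, not Solr's query parser: it assumes unquoted queries require every term (in any order), and quoted queries require the terms to appear consecutively after the same punctuation-stripping tokenization. Solr's actual default operator and analysis depend on the server's configuration, and the helper names here are hypothetical.

```python
import re


def tokenize(text: str) -> list[str]:
    # Toy analyzer (assumption): lowercase, keep runs of letters/digits.
    return re.findall(r"\w+", text.lower())


def matches(document: str, query: str) -> bool:
    doc = tokenize(document)
    q = query.strip()
    if len(q) >= 2 and q.startswith('"') and q.endswith('"'):
        # Quoted: the tokens must appear consecutively, in this order.
        phrase = tokenize(q[1:-1])
        n = len(phrase)
        if n == 0:
            return False
        return any(doc[i:i + n] == phrase for i in range(len(doc) - n + 1))
    # Unquoted: every term must appear somewhere, in any order (assumed AND).
    terms = tokenize(q)
    return bool(terms) and all(t in doc for t in terms)


description = "Alberta, semi-urban, blackfoot, female"
print(matches(description, "blackfoot alberta"))        # any order: True
print(matches(description, '"blackfoot alberta"'))      # wrong order as a phrase: False
print(matches(description, '"semi urban blackfoot"'))   # consecutive tokens: True
```

Even the quoted form stays more forgiving than a raw substring match, since punctuation such as the hyphen in "semi-urban" is removed before the phrase is compared.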

I hope this helps!


* I think the method in the code is called "findFileMetadataByDatasetVersionIdLabelSearchTerm", and it uses "LIKE %searchTerm%" syntax.
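A minimal sketch of why a "LIKE %searchTerm%" query behaves the way the original post describes, using Python's built-in `sqlite3` module as a stand-in database. The table name `filemetadata`, the `description` column, and the `like_search` helper are hypothetical; this is not Dataverse's actual schema or query, only the substring pattern the footnote mentions.

```python
import sqlite3
from contextlib import closing


def like_search(descriptions: list[str], search_term: str) -> list[str]:
    # Hypothetical table and column names; not Dataverse's real schema or query.
    with closing(sqlite3.connect(":memory:")) as conn:
        conn.execute("CREATE TABLE filemetadata (id INTEGER PRIMARY KEY, description TEXT)")
        conn.executemany(
            "INSERT INTO filemetadata (description) VALUES (?)",
            [(d,) for d in descriptions],
        )
        # Wrap the term in % wildcards: a row matches only if the whole term
        # occurs as one contiguous substring of the stored text.
        rows = conn.execute(
            "SELECT description FROM filemetadata WHERE description LIKE ? ORDER BY id",
            (f"%{search_term}%",),
        ).fetchall()
    return [row[0] for row in rows]


descriptions = ["Alberta, semi-urban, blackfoot, female"]
print(like_search(descriptions, "Alberta, semi-urban, blackfoot, female"))  # exact text: 1 row
print(like_search(descriptions, "Alberta semi-urban blackfoot female"))     # commas dropped: []
print(like_search(descriptions, "blackfoot"))                               # contiguous fragment: 1 row
```

Dropping a single comma breaks the contiguous substring, so nothing matches, while a fragment that appears verbatim still does. Note that SQLite's LIKE is case-insensitive for ASCII characters by default, unlike PostgreSQL's LIKE, so case handling in this sketch may differ from a real server.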

