Dataverse / dataset search

Erin MacPherson

Sep 26, 2018, 2:20:25 PM
to Dataverse Users Community
Hi all,

What fields does Dataverse query when someone does a basic search? Is it all fields in the record, or just the browse/search facets that have been selected? I realize you can do a more specific search in the advanced search, but I want to know which fields are queried in the basic search.
Thank you,
Erin


Philip Durbin

Sep 26, 2018, 3:52:28 PM
to dataverse...@googlegroups.com
Hi Erin,

Basic search covers most fields: 236 of them based on the latest code on my laptop, though that count includes custom metadata blocks used by Harvard Dataverse.

Basic search is powered by a "catch all" field in Solr (the search engine software used by Dataverse) called "_text_". You don't search on this field directly; a basic search simply looks in this "catch all" field.

The way that Dataverse populates this "catch all" field is by using Solr's "copyField" feature to copy lots and lots of Dataverse metadata fields into the "catch all" field.

To see the list of fields searched by basic search, you could look in the Solr config[1] for lines containing both "copyField" and "_text_", excluding comments (<!-- -->), like this:

grep copyField conf/solr/7.3.0/schema.xml | grep _text_ | grep -v '<!--'

I'll include a list of the fields covered by basic search below[2] using this "grep" (and some cleanup).
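
For anyone who would rather script the "grep" and the cleanup in one step, here is a minimal Python sketch. It is only an illustration: the sample schema lines (including "oldField" and "title_s") and the function name basic_search_fields are made up for this example, and it assumes each copyField element sits on a single line, as the grep above does.

```python
import re

# Hypothetical excerpt written in the style of a Solr schema.xml;
# the field names and attributes here are illustrative assumptions.
SAMPLE_SCHEMA = '''\
<copyField source="title" dest="_text_" maxChars="3000"/>
<copyField source="authorName" dest="_text_" maxChars="3000"/>
<!-- <copyField source="oldField" dest="_text_" maxChars="3000"/> -->
<copyField source="title" dest="title_s"/>
<field name="title" type="text_en" multiValued="false" stored="true" indexed="true"/>
'''

# Captures the value of the source="..." attribute on a line.
SOURCE_ATTR = re.compile(r'source="([^"]+)"')


def basic_search_fields(schema_text):
    """Mimic: grep copyField | grep _text_ | grep -v '<!--', then keep only the source names."""
    fields = []
    for line in schema_text.splitlines():
        # Same three line-based filters as the grep pipeline.
        if "copyField" not in line or "_text_" not in line or "<!--" in line:
            continue
        match = SOURCE_ATTR.search(line)
        if match:
            fields.append(match.group(1))
    return fields


if __name__ == "__main__":
    for name in basic_search_fields(SAMPLE_SCHEMA):
        print(f"- {name}")
```

To run it against a real checkout, you could pass in the contents of conf/solr/7.3.0/schema.xml instead of the sample string. Like the grep, it filters line by line, so a multi-line comment or a copyField element split across lines would not be handled.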

There are more details I could go into but I hope this helps. If this isn't clear, please let me know!

Thanks,

Phil


[2] List of fields covered by basic search (includes custom metadata blocks used by Harvard Dataverse)

- description
- variableName
- variableLabel
- dvSubject
- dvAffiliation
- dsPersistentId
- name
- fileType
- fileNameWithoutExtension
- ARCS1
- ARCS2
- ARCS3
- ARCS4
- ARCS5
- PSRI1
- PSRI10
- PSRI11
- PSRI2
- PSRI3
- PSRI4
- PSRI5
- PSRI6
- PSRI7
- PSRI8
- PSRI9
- accessToSources
- actionsToMinimizeLoss
- alternativeTitle
- alternativeURL
- astroFacility
- astroInstrument
- astroObject
- astroType
- author
- authorAffiliation
- authorIdentifier
- authorIdentifierScheme
- authorName
- characteristicOfSources
- city
- classificationSchemaCHIA
- cleaningOperations
- collectionMode
- collectorTraining
- contributor
- contributorName
- contributorType
- controlOperations
- country
- coverage.Depth
- coverage.ObjectCount
- coverage.ObjectDensity
- coverage.Polarization
- coverage.Redshift.MaximumValue
- coverage.Redshift.MinimumValue
- coverage.RedshiftValue
- coverage.SkyFraction
- coverage.Spatial
- coverage.Spectral.Bandpass
- coverage.Spectral.CentralWavelength
- coverage.Spectral.MaximumWavelength
- coverage.Spectral.MinimumWavelength
- coverage.Spectral.Wavelength
- coverage.Temporal
- coverage.Temporal.StartTime
- coverage.Temporal.StopTime
- dataCollectionSituation
- dataCollector
- dataSources
- datadePublicao
- datasetContact
- datasetContactAffiliation
- datasetContactEmail
- datasetContactName
- datasetLevelErrorNotes
- dateOfCollection
- dateOfCollectionEnd
- dateOfCollectionStart
- dateOfDeposit
- datesAdditionalInformationCHIA
- depositor
- deviationsFromSampleDesign
- distributionDate
- distributor
- distributorAbbreviation
- distributorAffiliation
- distributorLogoURL
- distributorName
- distributorURL
- dsDescription
- dsDescriptionDate
- dsDescriptionValue
- eastLongitude
- frequencyOfDataCollection
- geographicBoundingBox
- geographicCoverage
- geographicUnit
- grantNumber
- grantNumberAgency
- grantNumberValue
- gsdAccreditation
- gsdCoordinator
- gsdCourseName
- gsdFacultyName
- gsdPrizes
- gsdProgramBrief
- gsdRecommendation
- gsdSemester
- gsdSiteType
- gsdStudentName
- gsdStudentProgram
- gsdTags
- gsdTypes
- hbgdkiAnthropometry
- hbgdkiBiosampleType
- hbgdkiBirthWeight
- hbgdkiFeedingCare
- hbgdkiGestationalAge
- hbgdkiImmunizations
- hbgdkiInfantChildhoodMorbidity
- hbgdkiIntervention
- hbgdkiLowerLimitAge
- hbgdkiMaternalChar
- hbgdkiNeurocognitiveDev
- hbgdkiOther
- hbgdkiPregnancyBirth
- hbgdkiSocioeconomicChar
- hbgdkiStudyName
- hbgdkiStudyRegistry
- hbgdkiStudyRegistryNumber
- hbgdkiStudyRegistryType
- hbgdkiStudyType
- hbgdkiUnitsLowerLimitAge
- hbgdkiUnitsUpperLimitAge
- hbgdkiUpperLimitAge
- hbgdkiWaterSanHygiene
- journalArticleType
- journalIssue
- journalPubDate
- journalVolume
- journalVolumeIssue
- keyword
- keywordValue
- keywordVocabulary
- keywordVocabularyURI
- kindOfData
- language
- localdePublicao
- mraCollection
- northLongitude
- notesText
- numero
- originOfSources
- otherDataAppraisal
- otherGeographicCoverage
- otherId
- otherIdAgency
- otherIdValue
- otherReferences
- producer
- producerAbbreviation
- producerAffiliation
- producerLogoURL
- producerName
- producerURL
- productionDate
- productionPlace
- proprietrio
- provenanceCHIA
- psiBehavior
- psiDonor
- psiHealthArea
- psiIntervention
- psiPopulation
- psiProductsServices
- psiStudyDesignElement
- psiStudyType
- publication
- publicationCitation
- publicationIDNumber
- publicationIDType
- publicationURL
- redshiftType
- relatedDatasets
- relatedMaterial
- researchInstrument
- resolution.Redshift
- resolution.Spatial
- resolution.Spectral
- resolution.Temporal
- responseRate
- rightsAvailabilityCHIA
- samplingErrorEstimates
- samplingProcedure
- series
- seriesInformation
- seriesName
- socialScienceNotes
- socialScienceNotesSubject
- socialScienceNotesText
- socialScienceNotesType
- software
- softwareName
- softwareVersion
- sourceCHIA
- southLongitude
- state
- studyAssayCellType
- studyAssayMeasurementType
- studyAssayOrganism
- studyAssayOtherMeasurmentType
- studyAssayOtherOrganism
- studyAssayPlatform
- studyAssayTechnologyType
- studyDesignType
- studyFactorType
- subject
- subtitle
- targetSampleActualSize
- targetSampleSize
- targetSampleSizeFormula
- timeMethod
- timePeriodCovered
- timePeriodCoveredEnd
- timePeriodCoveredStart
- title
- titulo
- topicClassValue
- topicClassVocab
- topicClassVocabURI
- topicClassification
- unitOfAnalysis
- universe
- variablesCHIA
- weighting
- westLongitude
