Dataverse statistical calculations

Pedro Luis

May 7, 2026, 9:12:34 AM
to Dataverse Users Community
Ladies and Gentlemen,

I have a question regarding the process of calculating variables during the ingestion of SAV files. The ingest process performs this calculation, correct? Is it possible to suppress the calculation of certain variables, such as the YEAR? Is this suppression possible within SPSS or does it depend solely on Dataverse?

Thank you for your support.

Philip Durbin

May 28, 2026, 4:23:11 PM
to dataverse...@googlegroups.com
Hi Pedro Luis,

No, it is not possible to tell Dataverse to handle any individual variables in SAV files in any special way.

Can you please clarify what you mean by calculating variables? Are you talking about calculating summary statistics (min, max, etc.)?

Also, what is your goal? In the DDI, do you want summary statistics to be omitted for your YEAR variable? Or do you have something else in mind?

Thanks,

Phil



Pedro Luis

Jun 5, 2026, 2:46:58 PM
to Dataverse Users Community
Hi Philip,

I'm working on extracting DDI from tabular files, hence the question. During file ingestion, Dataverse calculates summary statistics for all numeric variables. Is this correct? When I load a CSV file, it performs this calculation on every variable it identifies as numeric. Is this also correct? When I prepare a structured file in SPSS (SAV file), I can set each variable's measurement level (the "Measure" setting) to "Scale," "Nominal," or "Ordinal." This makes a difference for SPSS but apparently not for Dataverse, which treats everything as numeric unless it is a character type. Is this also correct?

My question was whether these variable settings in SPSS make any difference to Dataverse. For example, I expected that setting a variable to "Nominal" would make Dataverse stop calculating summary statistics for it, but I tried these settings this week without success. Dataverse skipped summary statistics only for character-type variables.

Example with a numeric variable:
<var ID="v6852" name="CO_CEP" intrvl="discrete">
<location fileid="f276"/>
<labl level="variable">CEP</labl>
<sumStat type="mode">.</sumStat>
<sumStat type="min">7.685E7</sumStat>
<sumStat type="max">7.6997E7</sumStat>
<sumStat type="vald">501.0</sumStat>
<sumStat type="invd">0.0</sumStat>
<sumStat type="mean">7.691397609580839E7</sumStat>
<sumStat type="medn">7.6907648E7</sumStat>
<sumStat type="stdev">47741.80343756191</sumStat>
<varFormat type="numeric"/>

Example with the same variable, but changing to character type:
<var ID="v6901" name="CO_CEP" intrvl="discrete">
<location fileid="f277"/>
<labl level="variable">CEP</labl>
<varFormat type="character"/>
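
To check this across a whole codebook rather than by eye, something like the following rough sketch could list, for each variable, its format type and which summary statistics are present. It uses only Python's standard library. The enclosing <dataDscr> element and the closing </var> tags are added here so the fragment parses, the list of statistics is shortened, and summarize_vars is a hypothetical helper name, not part of Dataverse.

```python
import xml.etree.ElementTree as ET

# Fragment adapted from the two examples above. The <dataDscr> wrapper and
# the closing </var> tags are assumptions added so the snippet is well-formed;
# only three of the sumStat lines are kept for brevity.
SAMPLE = """<dataDscr>
<var ID="v6852" name="CO_CEP" intrvl="discrete">
<location fileid="f276"/>
<labl level="variable">CEP</labl>
<sumStat type="mode">.</sumStat>
<sumStat type="min">7.685E7</sumStat>
<sumStat type="max">7.6997E7</sumStat>
<varFormat type="numeric"/>
</var>
<var ID="v6901" name="CO_CEP" intrvl="discrete">
<location fileid="f277"/>
<labl level="variable">CEP</labl>
<varFormat type="character"/>
</var>
</dataDscr>"""


def local_name(tag):
    """Return the tag without a '{namespace}' prefix (a real export may be namespaced)."""
    return tag.rsplit("}", 1)[-1] if isinstance(tag, str) else ""


def summarize_vars(ddi_xml):
    """Map each <var> ID to (name, varFormat type, list of sumStat types)."""
    root = ET.fromstring(ddi_xml)
    report = {}
    for el in root.iter():
        if local_name(el.tag) != "var":
            continue
        fmt = None
        stats = []
        for child in el:
            kind = local_name(child.tag)
            if kind == "varFormat":
                fmt = child.get("type")
            elif kind == "sumStat":
                stats.append(child.get("type"))
        report[el.get("ID")] = (el.get("name"), fmt, stats)
    return report


for var_id, (name, fmt, stats) in summarize_vars(SAMPLE).items():
    print(var_id, name, fmt, stats)
```

With the fragment above, the numeric version of CO_CEP reports its statistics and the character version reports an empty list, which matches what I observed.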

The variable ANO (year) is another case where I don't need summary statistics. So I would like to know whether there is a way to omit the calculation for specific variables.
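
If the calculation cannot be suppressed at ingest, one possible workaround on my side would be to post-process the exported DDI and drop the <sumStat> elements for selected variables. The sketch below is only an illustration of that idea with Python's standard library: strip_sum_stats is a hypothetical helper, the <dataDscr> wrapper and closing tags are added so the fragment parses, and the ANO entry uses placeholder values, not values from a real file. It does not change anything stored in Dataverse.

```python
import xml.etree.ElementTree as ET

# Illustrative fragment. The ANO <var> (ID and values) is a made-up
# placeholder; the CO_CEP values are taken from the example above.
SAMPLE = """<dataDscr>
<var ID="v1" name="ANO" intrvl="discrete">
<labl level="variable">Ano</labl>
<sumStat type="min">2020.0</sumStat>
<sumStat type="max">2020.0</sumStat>
<varFormat type="numeric"/>
</var>
<var ID="v6852" name="CO_CEP" intrvl="discrete">
<labl level="variable">CEP</labl>
<sumStat type="min">7.685E7</sumStat>
<sumStat type="max">7.6997E7</sumStat>
<varFormat type="numeric"/>
</var>
</dataDscr>"""


def local_name(tag):
    """Return the tag without a '{namespace}' prefix."""
    return tag.rsplit("}", 1)[-1] if isinstance(tag, str) else ""


def strip_sum_stats(ddi_xml, variable_names):
    """Return the XML with <sumStat> removed from the named <var> elements."""
    root = ET.fromstring(ddi_xml)
    # Collect the targets first so the tree is not modified while iterating.
    targets = [
        el for el in root.iter()
        if local_name(el.tag) == "var" and el.get("name") in variable_names
    ]
    for var in targets:
        for child in list(var):
            if local_name(child.tag) == "sumStat":
                var.remove(child)
    return ET.tostring(root, encoding="unicode")


cleaned = strip_sum_stats(SAMPLE, {"ANO"})
print(cleaned)
```

Here only ANO loses its statistics and CO_CEP is left untouched. On a namespaced export, ElementTree may rewrite the namespace prefixes on output unless they are registered first, so the result should be checked before reuse.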

Thanks,