Is there a maximum dataset size for the QCA package?

Hannah Cohoon

Jul 10, 2020, 11:26:24 AM
to QCA with R
Version and session info at bottom.

I have a 114x52 dataset that I'm trying to run a QCA analysis on, but I'm getting errors that make me wonder whether the dataset is too big or I have too many causal variables.

When running superSubset() on some of the 52 variables, I got the following error: "INTEGER() can only be applied to a 'integer', not a 'double'". If I lower the incl.cut or remove enough variables, the error doesn't appear.

When running minimize(), I get the following: "NAs introduced by coercion to integer range. Error: Conditions '0,1,2,3,4,5,6,7,8,9,X' do not match the set names from "snames" argument."

I've included sample code below to show the issues. Someone on Stack Overflow noted that they can't reproduce the minimize() error with version 3.0 of QCA but can with the latest.

Is my dataset too big for QCA or is there something else going on?

library(QCA)
library(tidyverse)  # provides %>%

little_sample_data <- sample(c(0, 1), 50 * 10, replace = TRUE)
little_example_df <- matrix(little_sample_data, nrow = 50, ncol = 10) %>%
  as.data.frame()

#works
superSubset(little_example_df, outcome="V1", incl.cut = .9)

#works
tt <- truthTable(little_example_df, outcome = "V1", complete = TRUE, show.cases = TRUE, sort.by = "incl")
minimize(tt)

big_sample_data <- sample(c(0, 1), 52 * 114, replace = TRUE)
big_example_df <- matrix(big_sample_data, nrow = 114, ncol = 52) %>%
  as.data.frame()

#yields INTEGER() error
superSubset(big_example_df, outcome="V1", incl.cut = .9)

#yields Conditions error
tt <- truthTable(big_example_df, outcome = "V1", complete = TRUE, show.cases = TRUE, sort.by = "incl")
minimize(tt)

> print(version)
               _                           
platform       x86_64-apple-darwin17.0     
arch           x86_64                      
os             darwin17.0                  
system         x86_64, darwin17.0          
status                                     
major          4                           
minor          0.2                         
year           2020                        
month          06                          
day            22                          
svn rev        78730                       
language       R                           
version.string R version 4.0.2 (2020-06-22)
nickname       Taking Off Again            
> sessionInfo()
R version 4.0.2 (2020-06-22)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Catalina 10.15.5

Matrix products: default
BLAS:   /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] forcats_0.5.0   stringr_1.4.0   dplyr_1.0.0     purrr_0.3.4     readr_1.3.1     tidyr_1.1.0     tibble_3.0.2    ggplot2_3.3.2  
 [9] tidyverse_1.3.0 QCA_3.8.2       admisc_0.8     

loaded via a namespace (and not attached):
 [1] tidyselect_1.1.0 haven_2.3.1      lattice_0.20-41  colorspace_1.4-1 vctrs_0.3.1      generics_0.0.2   htmltools_0.5.0  yaml_2.2.1      
 [9] blob_1.2.1       rlang_0.4.6      later_1.1.0.1    pillar_1.4.4     withr_2.2.0      glue_1.4.1       DBI_1.1.0        dbplyr_1.4.4    
[17] modelr_0.1.8     readxl_1.3.1     lifecycle_0.2.0  munsell_0.5.0    gtable_0.3.0     cellranger_1.1.0 rvest_0.3.5      fastmap_1.0.1   
[25] httpuv_1.5.4     fansi_0.4.1      broom_0.5.6      Rcpp_1.0.5       xtable_1.8-4     promises_1.1.1   backports_1.1.8  scales_1.1.1    
[33] jsonlite_1.7.0   mime_0.9         fs_1.4.2         hms_0.5.3        digest_0.6.25    stringi_1.4.6    shiny_1.5.0      grid_4.0.2      
[41] cli_2.0.2        tools_4.0.2      magrittr_1.5     venn_1.9         crayon_1.3.4     pkgconfig_2.0.3  ellipsis_0.3.1   xml2_1.3.2      
[49] reprex_0.3.0     lubridate_1.7.9  assertthat_0.2.1 httr_1.4.1       rstudioapi_0.11  R6_2.4.1         nlme_3.1-148     compiler_4.0.2 

Adrian Dușa

Jul 10, 2020, 12:48:49 PM
to Hannah Cohoon, QCA with R
Hi Hannah,

The short answer is yes, there is definitely a limit to what the QCA algorithms can process.
This is because the complexity of the problem grows exponentially, in powers of 3: each new causal condition included in the analysis makes the problem three times more complex.
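
To make that growth rate concrete, here is a small base R sketch. It assumes crisp binary conditions, each of which can be present, absent, or left out of an expression (hence three states). The helper name `search_space` is hypothetical and is not part of the QCA package.

```r
# Hypothetical helper (not part of QCA): size of the search space
# for k binary conditions, assuming three states per condition
# (present, absent, or left out of the expression).
search_space <- function(k) 3^k

k <- c(4, 10, 25, 30, 51)
growth <- data.frame(conditions = k, search_space = search_space(k))
print(growth)

# Adding one condition multiplies the search space by three
search_space(11) / search_space(10)
```

The last line returns 3, whichever pair of consecutive values is compared.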
In practical terms, the truth table is constructed by assigning each row a number, calculated as the decimal representation of (most of the time) the binary representation of the presence/absence of each causal condition. For 4 causal conditions, a configuration where all are present is the binary 1111, whose decimal equivalent is 15 (that is, the 16th row).
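
The row numbering described above can be sketched in a few lines of base R. This is only an illustration for binary (crisp) conditions; `row_number` is a hypothetical helper, not a function from the QCA package, and the package's internal implementation may differ.

```r
# Hypothetical helper (not part of QCA): truth table row number for a
# configuration of binary conditions, read as a binary number with the
# first condition as the most significant digit.
row_number <- function(config) {
  k <- length(config)
  weights <- 2^((k - 1):0)      # e.g. 8, 4, 2, 1 for four conditions
  sum(config * weights) + 1     # rows are counted from 1, not 0
}

row_number(c(1, 1, 1, 1))  # binary 1111 = decimal 15, so row 16
row_number(c(0, 0, 0, 0))  # all conditions absent: row 1
```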
There is a physical limit to the numbers that can be represented in memory: for instance, 2^32 - 1 is the largest unsigned integer that fits in 32 bits, and R still stores integers in 32 bits despite moving to the 64-bit architecture (one bit holds the sign, so the largest R integer is 2^31 - 1).

So even if a computer could process 52 conditions, it would still be impossible to represent everything in memory, and this is most likely the cause of your error. The C-level workhorse expects a valid truth table, and yours cannot be valid because its row numbers cannot even be represented.
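
The overflow can be checked directly in base R, without the QCA package. The sketch below assumes that all 51 non-outcome columns of the 52-column dataset are used as binary conditions, so the complete truth table would have 2^51 rows. R reports its own integer ceiling in `.Machine$integer.max`.

```r
# Largest value R's integer type can hold
.Machine$integer.max

# Assumption: 52 columns = 1 outcome + 51 binary conditions
n_conditions <- 51
n_rows <- 2^n_conditions              # held as a double, not an integer

n_rows > .Machine$integer.max         # TRUE: too large for an integer

# Coercing to integer overflows and yields NA with a warning
overflowed <- suppressWarnings(as.integer(n_rows))
is.na(overflowed)                     # TRUE

# Capture the warning text instead of suppressing it
msg <- tryCatch(as.integer(n_rows), warning = function(w) conditionMessage(w))
msg
```

The captured message should be the same coercion warning that appears at the start of the minimize() output quoted in the original post.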

In practical terms, the most advanced QCA algorithm can probably process up to 25-30 conditions, but it also depends on the number of observed truth table configurations. There is a huge, exponentially growing number of combinations of causal conditions that need to be checked, and the more such configurations there are, the more iterations the computer needs to perform. It might run for days without returning a result, and even if it did, the result would be practically meaningless because of the extremely high model complexity: if it returned millions of possible solutions, how would that help anyone?

Please see this article for further technical information:

I hope it helps,
Adrian

--
Adrian Dusa
University of Bucharest
Romanian Social Data Archive
Soseaua Panduri nr. 90-92
050663 Bucharest sector 5
Romania