Issue with QCA's "exclude" argument in the "minimization" function

Nicolai Schulz

Aug 13, 2021, 5:31:09 AM

to qcaw...@googlegroups.com, Adrian Dușa

Dear QCA Group,

I am facing issues related (but not identical) to those I posted about in this group in June 2020. I had also discussed similar issues with Adrian by email prior to that.

Specifically, I am facing issues with the "exclude" argument of the "minimization" function. Using the "findRows" function I have identified simultaneous subset relations in both observed configurations (Type 3) and logical remainders (Type 2). I "saved" and abbreviated these as vectors "SSROC" and "SSRLR" respectively. Following an enhanced standard analysis approach (for the sake of simplicity, only using an enhanced parsimonious approach in the shared script), I decided to exclude these from the minimization process. I tried to do so first only excluding SSROC, then SSRLR, and then a joint vector SSR. My issue is that it seems the exclusion is not working for me. I notice that in two ways:

1. The observed configurations that should have been excluded are still considered/shown in the prime implicant chart.
2. The logical remainders that should have been excluded are still included in the simplifying assumptions.

I have also struggled to exclude SSROC and SSRLR simultaneously (the joint vector SSR of the two is not recognized) and would be grateful for guidance on how to do so, once issues 1. and 2. are clarified.

For comparison, I also re-did the analysis though this time manually excluding the relevant rows, e.g. by using this function "TT_MAN_SSROC$tt['1', 'OUT'] <- 0". I agree with Adrian that ideally manual manipulation of the data should be avoided (even though it is still taught in QCA courses), but for the sake of comparison, I thought it is justifiable here. Excluding the data manually led to the expected changes: no more rows identified as problematic/simultaneous subset relation affected are included in the PI chart or SA list. Of course, the solution is also different.

Thus, I would be extremely grateful if you could take a look at the attached script and dataset (in .dta format - code to import is included in the script) to see where I'm going wrong. The key section is under point "6.". Please find the results of print(version) and sessionInfo() below the signature.

One last related question. I noticed that "findRows(obj = , type = 2)" delivers a different list of SSRLR when run after eliminating the SSROC identified by "findRows(obj = , type = 3)". This makes sense, I assume, as the logical remainders are affected by the observed configurations included/excluded. And, I assume, it also makes methodological sense to exclude only those SSRLR that are identified after SSROC are excluded, i.e. to sequence (for the above reason). I was wondering: (a) whether my assumption makes sense and (b) how/whether the exclude argument does this?

Many thanks for your time and support!

All the best,

Nicolai

platform x86_64-w64-mingw32

arch x86_64

os mingw32

system x86_64, mingw32

status

major 4

minor 1.0

year 2021

month 05

day 18

svn rev 80317

language R

version.string R version 4.1.0 (2021-05-18)

nickname Camp Pontanezen

R version 4.1.0 (2021-05-18)

Platform: x86_64-w64-mingw32/x64 (64-bit)

Running under: Windows 10 x64 (build 19042)

Matrix products: default

locale:

[1] LC_COLLATE=German_Germany.1252 LC_CTYPE=German_Germany.1252 LC_MONETARY=German_Germany.1252 LC_NUMERIC=C LC_TIME=German_Germany.1252

attached base packages:

[1] stats graphics grDevices utils datasets methods base

other attached packages:

[1] haven_2.4.3 SetMethods_2.6 stargazer_5.2.2 ggrepel_0.9.1 ggplot2_3.3.5 QCA_3.12 admisc_0.17 dplyr_1.0.7 raster_3.4-13 rgdal_1.5-23

[11] sp_1.4-5

loaded via a namespace (and not attached):

[1] zoo_1.8-9 modeltools_0.2-23 tidyselect_1.1.1 purrr_0.3.4 lattice_0.20-44 colorspace_2.0-2 vctrs_0.3.8

[8] generics_0.1.0 stats4_4.1.0 htmltools_0.5.1.1 utf8_1.2.2 rlang_0.4.11 later_1.2.0 pillar_1.6.2

[15] glue_1.4.2 withr_2.4.2 DBI_1.1.1 fmsb_0.7.1 lifecycle_1.0.0 munsell_0.5.0 gtable_0.3.0

[22] codetools_0.2-18 forcats_0.5.1 tzdb_0.1.2 fastmap_1.1.0 httpuv_1.6.1 lmtest_0.9-38 flexmix_2.3-17

[29] fansi_0.5.0 Rcpp_1.0.7 readr_2.0.0 xtable_1.8-4 promises_1.2.0.1 scales_1.1.1 scatterplot3d_0.3-41

[36] mime_0.11 hms_1.1.0 digest_0.6.27 shiny_1.6.0 grid_4.1.0 tools_4.1.0 sandwich_3.0-1

[43] magrittr_2.0.1 tibble_3.1.3 Formula_1.2-4 crayon_1.4.1 venn_1.10 pkgconfig_2.0.3 ellipsis_0.3.2

[50] betareg_3.1-4 assertthat_0.2.1 R6_2.5.0 nnet_7.3-16 compiler_4.1.0

Schulz_Replication Script_V1.R

ResilienceOrientationData_NoNamibia.dta

Adrian Dușa

Aug 14, 2021, 4:45:22 AM

to Nicolai Schulz, qcaw...@googlegroups.com

Dear Nicolai,

I am currently away and will only return to my computer on Tuesday next week.

What I do see in your script, however, is that you're using the 'exclude' argument in the minimize() function.

That argument is no longer available there: since version 3.7 it has been a formal argument of the function truthTable() instead:

https://cran.r-project.org/web/packages/QCA/ChangeLog

Before the minimization, try creating a new version of your truth table using the exclude argument, and only then feed it to the minimization function.

Hope this helps,

Adrian

--
You received this message because you are subscribed to the Google Groups "QCA with R" group.
To unsubscribe from this group and stop receiving emails from it, send an email to qcawithr+u...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/qcawithr/CALUqdYd6r7o4uEdv9WYy9V_UBKm-e2pvUxU9n9Emy0h1aRmRuw%40mail.gmail.com.

Adrian Dușa

Aug 14, 2021, 7:17:46 AM

to Nicolai Schulz, QCA with R

That is something different. Remember, you can only exclude remainders, not observed configurations.

If your SSROC object contains rows having an outcome equal to 1, it means these are observed positive configurations, and such inconsistencies need to be solved at the calibration level, or by checking the original raw data.

Something is definitely off prior to the QCA analysis, and it doesn't make much sense to exclude something that is empirically observed as positive.

Otherwise, if you want to exclude rows from multiple objects, something like this will do:

exclude = unique(c(SSROC, SSRLR))

The unique() part might not even be necessary, but I don't remember exactly how it works without my computer.

Best,

Adrian

On Sat, 14 Aug 2021 at 12:01, Nicolai Schulz <d.nicola...@gmail.com> wrote:

Dear Adrian,

many thanks for your quick reply and the update. I will be on leave for the next week myself, so no hurry at all in replying.

I just gave this a first shot (just for SSROC), but can't say I was successful (with the targeted rows still showing in the truth table). This is what I tried:

#Generate a truth table to identify SSROC.
TT <- truthTable(data=ROD_NN, outcome = "RES", conditions = "DEM, ARA, TRD, OIL, EXP, FOR", incl.cut=0.81, n.cut=2, sort.by="incl, n", complete=TRUE, show.cases=TRUE, dcc = FALSE)

#6.1.2 Find simultaneous subset relations in the empirical observed configurations
findRows(obj = TT, type = 3)
# --> 1 7 60
SSROC <- findRows(obj = TT, type = 3)

TT_SSROC <- truthTable(data=ROD_NN, outcome = "RES", conditions = "DEM, ARA, TRD, OIL, EXP, FOR", incl.cut=0.81, n.cut=2, sort.by="incl, n", complete=TRUE, show.cases=TRUE, dcc = FALSE, exclude = SSROC)
TT_SSROC
# The rows 1, 7, 60 still show in the truth table as Outcome = 1.

Also, I'd still be unsure how to simultaneously exclude SSRLR. Would "exclude = c("SSROC", "SSRLR")" work?

Once again, many thanks and please no hurry in replying!

All the best,
Nicolai
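
On the question of combining the two vectors, a minimal base-R sketch may help. It assumes that findRows() returns plain row numbers (the values below are the ones reported in this thread) and does not call the QCA package at all. Quoting the object names produces two strings, whereas concatenating the objects produces the row numbers themselves:

```r
# Row numbers as reported in this thread for findRows(type = 3) and (type = 2)
SSROC <- c(1, 7, 60)
SSRLR <- c(2, 5, 6, 9, 13, 15, 17, 18, 20, 21, 22, 24, 26, 28, 30, 32,
           37, 38, 39, 44, 46, 47, 48, 52, 53, 54, 55, 56, 62, 64)

# Quoting the names yields a character vector of two strings, not row numbers
by_name <- c("SSROC", "SSRLR")
print(by_name)

# Concatenating the objects yields one numeric vector of row numbers;
# unique() only matters if the two vectors share a row
SSR <- unique(c(SSROC, SSRLR))
print(length(SSR))
```

With these particular values the two vectors do not overlap, so unique() changes nothing here; it is simply a cheap safeguard.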

Adrian Dușa

Aug 14, 2021, 7:40:58 AM

to Nicolai Schulz, QCA with R

Also, it should be noted that function findRows() has a Boolean argument called "remainders", active by default, to limit the resulting row numbers to unobserved configurations only.

If your script returns observed positive configurations despite that argument, then I need to further inspect the function and patch it.

Best,

Adrian

Adrian Dușa

Aug 17, 2021, 3:12:10 AM

to Nicolai Schulz, QCA with R

And now the more extended answer: the argument "remainders" has effect only for the subset relations, where type = 1.

Of course, it has no effect over the simultaneous subset relations (SSR), since those are by definition observed.

However, there is a distinction between finding:

- untenable assumptions (referring to the remainders), and

- simultaneous subset relations (referring to the observed positive configurations).

Only remainders can be excluded from the analysis, by assigning a value of 0 for the output.

Observed configurations are empirical evidence, and as such they are already allocated a value for the output.

SSRs should therefore be dealt with differently than by simply excluding them from the analysis. Since they are observed empirical evidence, if that evidence is ambiguous then one needs to restart the dialogue with the cases, possibly revisiting the calibration procedure or even the raw data itself.

I agree however this is not sufficiently made explicit, and I will amend the documentation accordingly.

Thanks for making this point,

Adrian
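
The rule described above (only remainders are recoded to 0, observed rows keep their output) can be illustrated with a toy data frame. This is only a sketch of the logic, not the QCA package's implementation: the data frame, the row numbers and the helper exclude_remainders() are all hypothetical.

```r
# Toy truth table: NOT a QCA object, just a data frame with an output column.
# "1" and "0" stand for observed configurations, "?" for remainders.
tt <- data.frame(OUT = c("1", "0", "?", "?", "1", "?"),
                 stringsAsFactors = FALSE)

# Hypothetical rows to exclude; row 1 is observed, rows 3 and 4 are remainders
exclude <- c(1, 3, 4)

# Hypothetical helper: recode to "0" only those excluded rows that are remainders
exclude_remainders <- function(tt, rows) {
  is_remainder <- tt$OUT[rows] == "?"
  tt$OUT[rows[is_remainder]] <- "0"
  tt
}

result <- exclude_remainders(tt, exclude)
print(result$OUT)
```

In this sketch row 1 keeps its observed output, rows 3 and 4 are recoded to "0", and row 6 remains an ordinary remainder.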

Nicolai Schulz

Aug 22, 2021, 3:47:26 PM

to Adrian Dușa, QCA with R

Dear Adrian,

Many thanks for your quick and detailed reply. A couple of follow-ups, if I may. My sincere apologies for their length.

1. First, leaving SSROC (type 3) aside and focusing on CSA/SSRLR. Following our conversation, I have proceeded as follows in R:

#6.1.3 Find simultaneous subset relations in logical remainders
findRows(obj = TT, type = 2)

#Identified rows --> 2 5 6 9 13 15 17 18 20 21 22 24 26 28 30 32 37 38 39 44 46 47 48 52 53 54 55 56 62 64

SSRLR <- findRows(obj = TT, type = 2)
TT_SSRLR <- truthTable(data=ROD_NN, outcome = "RES", conditions = "DEM, ARA, TRD, OIL, EXP, FOR", incl.cut=0.81, n.cut=2, sort.by="incl, n", complete=TRUE, show.cases=TRUE, dcc = FALSE, exclude = SSRLR)
TT_SSRLR

I come across the following problem. Many of the CSAs identified using findRows are still included in the truth table as possible logical remainders ("?") and are not excluded ("0") as I understand they should be. Examples are: 2, 22, and 53. These rows, however, are coded as "?" because they have fewer cases than the specified cut-off (n.cut=2). Is it a problem that they are now not coded as 0?

2. With regard to the theory and practice of excluding SSROC (observed configurations with simultaneous subset relations). Many thanks for clarifying your position on this. Admittedly, this wasn't entirely clear to me in your book where on page 193 you wrote:

"For this particular dataset, there are no such simultaneous subset relations (of type = 3 in the command above). Similar to the untenable assumptions, where some remainders are excluded from the minimization, the same is possible about observed configurations running the minimization process excluding the simultaneous subset relations. The function minimize() doesn’t care if it is an observed configuration or a remainder, everything supplied via the argument exclude is equally excluded."

Thus, I had read this section as saying excluding SSROC is both theoretically and practically (using the QCA package) possible. I understand you to be saying now that it is (a) no longer practically possible since version 3.7 and (b) in your view not theoretically and methodologically advisable. This, I understand, seems to be a fairly complex debate, one Carsten Schneider and you also had with regard to an older post of mine in the QCA Facebook group, where you advised against tampering with the observed configuration outcomes, while Carsten felt there are cases where it could be done. A QCA teaching script that also guided my work wrote:

"Therefore a decision must be made [about SSROC, after comparative truth table and xy plot analyses]: i) should it be included into the further analysis of the outcome, ii) included in the analysis of the non-outcome, or iii) neither in the analysis of Y nor ~Y. What should not happen is that the rows are used in both analysis."

And to a related question of mine in the QCA Facebook group Nena Oană had written the following:

"However, remember that sometimes you might not want to simply exclude all SSR from both the truth table for Y and ~Y and you can actually base that decision on PRI and on plotting TT rows and seeing in which truth table you have more typical/deviant cases consistency in kind for that small truth table row that ended up being a consistent enough subset of both Y and ~Y."

These two sources seem to imply that exclusion of SSROC is possible, after careful analysis of the affected configurations (to which, admittedly, simply excluding the findRows-identified configurations does not do justice). Now I'm a bit stuck with what to do. Clearly, the suggestion of re-questioning all my calibrations and potential conditions isn't super attractive. But maybe what I don't fully understand yet is why the existence of SSROC is necessarily due to faulty calibrations, etc.? Is it not possible that the calibration of the conditions is fine BUT that a specific configuration X of well-calibrated conditions nevertheless always has rather low values, which makes it easy that X --> Y/~Y?

For additional clarity (I hope), let's perhaps take a look at my truth table (excluding logical remainders for the sake of brevity). The findRows Type 3 function identified 60, 1, and 7 as SSROC. Each of them consists of two cases, one of the two being a DCC, and PRI values being low to very low. Looking at the truth table I also feel that 58 could be excluded, with more DCC (3) than non-DCC (2). Practically, 7 and 58 could be excluded by setting the consistency threshold higher (which to me raises the question of whether this isn't also "tampering" with the data?). But 60 and 1 are still "in". Would it be justified to kick them "out" given their low PRI and 50%-DCC-share? If so, is there perhaps even an option to exclude configurations with PRI-values lower than, say, 0.6?

Apologies, if any of my points are unclear. I will also link this Google group topic in my recent Facebook QCA group post for transparency and potential fruitful interaction, if that is ok.

Many thanks again for your quick and thorough replies - I really appreciate it.

All the best,

Nicolai

P.S. What is the best way to export Truth Tables and Minimization Tables to Word?

Adrian Dușa

Aug 23, 2021, 1:30:29 AM

to Nicolai Schulz, QCA with R

Dear Nicolai,

I now get a distinct feeling of not being consistent with my own text :)

You are quite right, and I have just checked the code in version 3.1 of the package (at the time the book was written) and indeed the exclude argument did not care about observed configurations, everything being equally excluded.

In between, many things happened. The package evolved, the argument exclude was moved from function minimize() to function truthTable() and for some reason, when I re-wrote the code, it left the observed configurations untouched. I still very much think the observed evidence should be re-examined before a mechanical exclusion, but on the other hand I've always strived for backwards compatibility and this change is not backwards compatible.

Your code is very detailed and demonstrates a lot of care for best-practice procedures, that is obvious. However, you seem to end up with a lot of excluded rows, both observed and unobserved, and despite all this very thorough work, the solution will be close to the conservative one. Time will tell if my hunch is right, but in your place I would invest an equal amount of attention in the observed cases: they are rather many for 6 causal conditions, by the way, it seems like a large-N study rather than a normal QCA. And I think the problematic aspect of your research is the input data, not your QCA technique.

Re the TT_SSRLR example, you've stumbled upon a bug, precisely because of the above explanation: it tries to not touch what it "observes" in the truth table (the bug here being that it observes everything, including remainders, when activating complete = TRUE).

The code works as expected without activating the argument "complete = TRUE". To get you past this until patching the code, try this and you will notice the offending rows being correctly coded as 0:

TT_SSRLR <- truthTable(data=ROD_NN, outcome = "RES",
    conditions = "DEM, ARA, TRD, OIL, EXP, FOR", incl.cut=0.81, n.cut=2,
    sort.by="incl, n", show.cases=TRUE, dcc = FALSE, exclude = SSRLR)

Your email is greatly appreciated, and your detailed explanations even more. I will create a patch, which will also solve the coding of the observed cases, and release it soon.

Best wishes,

Adrian

Nicolai Schulz

Aug 23, 2021, 9:44:27 AM

to QCA with R

Dear Adrian,

many thanks for your quick reply - that is very helpful!

Should I have any follow-ups I know who to ask :)

All the best,

Nicolai

Nicolai Schulz

Aug 23, 2021, 3:26:47 PM

to QCA with R, Adrian Dușa

Dear Adrian,

I now tried implementing your code that does not include "complete = TRUE". However, all logical remainders are now excluded completely from the table (no more "?") and the SSRLR (including those with n<2) are not coded "0". For comparison, please see the screenshot below. Did I do something wrong?

A semi-related question: is it advisable to use pri.cut = 0.5 in the truthTable function? Put differently: is there a reason not to?

Many thanks and all the best,

Nicolai


Adrian Dușa

Aug 23, 2021, 4:55:20 PM

to Nicolai Schulz, QCA with R

Hello Nicolai,

Actually, that does not seems to be the case:

> TT_SSRLR2$tt$OUT[SSRLR]

[1] "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0"

[26] "0" "0" "0" "0" "0"

Which means that all configuration rows in SSRLR are in fact coded as "0" in the OUTput column.

And indeed, you only "see" 1s and 0s in the OUTput column, simply because you did not ask for "complete = TRUE".

If you roughly try to count the rows in your print screen, it's easy to notice there are not 64 rows but fewer. The rest are not printed, but are still part of the "tt" component of the TT_SSRLR2 object, and those are the remaining remainders, not printed on the screen.

I forgot to answer about the best way to port such a truth table into Word / email.

What I do is a simple copy/paste from the R terminal, and format the code using a monospaced (fixed width) font. Easy trick, immediate result.

About the pri.cut, yes it is advisable to use it, that's why it is implemented in the truthTable() command. I don't know about the threshold value however, what I do know is "the higher the better". Likely, the QCA group on Facebook will have more pertinent advice on this matter.

Best,

Adrian
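
The point that the printout and the stored table can differ may be easier to see on a toy object. The sketch below does not use the QCA package: TT_toy, its "n" column and the row numbers are all hypothetical, and the only idea borrowed from the thread is that the full table lives in a "tt" component that can be indexed directly.

```r
# Toy stand-in for a truth table object (NOT created by QCA):
# a list whose "tt" component holds every row, printed or not.
TT_toy <- list(tt = data.frame(
  OUT = c("1", "0", "0", "?", "0", "1", "?", "0"),
  n   = c(3, 0, 2, 0, 0, 4, 0, 0),
  stringsAsFactors = FALSE
))

# Hypothetical excluded remainder rows
excluded <- c(2, 5, 8)

# Rows with cases: roughly what a non-complete printout would display
observed_rows <- which(TT_toy$tt$n > 0)

# Indexing the full component reveals the coding of rows that are not printed
codes <- TT_toy$tt$OUT[excluded]
all_excluded <- all(codes == "0")
print(all_excluded)
```

A check of the form all(TT$tt$OUT[rows] == "0") is therefore a more reliable test than reading the printed table, since none of the excluded rows in this toy example would appear on screen.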

Nicolai Schulz

Aug 24, 2021, 2:40:08 AM

to Adrian Dușa, QCA with R

Dear Adrian,

many thanks for the quick clarification and answers. The visualization is very helpful and reassuring, thank you.