R in condition selection?

Sarah Wang

Jun 18, 2025, 11:44:52 PM

to QCA with R

Hi everyone,

I am writing to inquire if there is something in R that can help with the selection of conditions.

I have 40 cases and 16 potential conditions. I am trying to narrow these down to the main conditions that I should include in the analysis. Apart from consulting the literature, is there a way in R to help with this process?

Any literature related to this would also be great.

Thank you

Kind regards

Sarah

Pedro Carmona

Jun 19, 2025, 2:09:41 AM

to QCA with R

Sarah,

I suggest taking a look at this paper: The Impact of Environmental Risk on Business Failure: A Fuzzy-Set Qualitative Comparative Analysis Approach with Extreme Gradient Boosting Feature Selection

https://doi.org/10.3390/a18040225

In this paper, the authors combine fsQCA with feature selection to identify the most important features.

Regards,

Pedro

Patrick A. Mello

Jun 19, 2025, 3:08:43 AM

to QCA with R

Hi Sarah and Pedro,

Thanks for the question about narrowing down conditions in QCA, and for the suggested paper.

The question comes up frequently, but in my view there is no “shortcut” to theoretical reasoning. If you have 16 candidate conditions, then you should narrow that down to 6-7 conditions at most for an individual QCA. Of course, you can group conditions and form different models; that would be part of the process. You should also take prior studies’ results into account.
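To put rough numbers on why 16 conditions are too many for 40 cases: with binary (crisp or fuzzy) conditions, a truth table has 2^k rows for k conditions, so the number of cases caps how many rows can be observed at all. The following is only a minimal sketch of that arithmetic; the function names are illustrative and do not come from any QCA package, and multi-value conditions would need a different count.

```python
def truth_table_size(n_conditions: int) -> int:
    """Number of logically possible configurations for binary conditions."""
    return 2 ** n_conditions


def max_share_observed(n_conditions: int, n_cases: int) -> float:
    """Upper bound on the share of truth table rows that can hold at least one case.

    The bound is reached only if every case falls into a different row,
    which is rarely true in practice.
    """
    rows = truth_table_size(n_conditions)
    return min(n_cases, rows) / rows


if __name__ == "__main__":
    n_cases = 40  # number of cases mentioned in the question
    for k in (16, 7, 6):
        share = max_share_observed(k, n_cases)
        print(f"{k} conditions: {truth_table_size(k)} rows, "
              f"at most {share:.2%} of rows observed")
```

With 40 cases, 16 conditions imply 65,536 rows of which at most 40 can be populated, whereas 6 conditions imply 64 rows, so a majority of rows could in principle be observed.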

The “algorithmic” solution is very interesting, Pedro. I had not heard about XGBoost before but I will look into it.

That said, I’m skeptical: even though this approach was used to narrow down 25 conditions to 7 for the QCA, the analysis only yields a coverage of 0.223. This is not “reasonable” (p. 14) but simply too low for a meaningful QCA. Admittedly, this is apparently a large-N QCA, but I would still feel uncomfortable reasoning about sufficient configurations for my outcome if my solution only covered 22% of the set-membership scores, especially under a “causal” interpretation of the results.
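For readers less familiar with the coverage figure: in fuzzy-set QCA, the standard sufficiency measures are consistency = Σ min(x, y) / Σ x and coverage = Σ min(x, y) / Σ y, where x is membership in the solution and y is membership in the outcome. A minimal sketch follows; the function name and the membership scores are invented for illustration and are not taken from the paper.

```python
def sufficiency_fit(x, y):
    """Return (consistency, coverage) of X as sufficient for Y.

    x: fuzzy membership scores in the solution (or a configuration)
    y: fuzzy membership scores in the outcome
    """
    if len(x) != len(y):
        raise ValueError("x and y must have the same number of cases")
    if sum(x) == 0 or sum(y) == 0:
        raise ValueError("membership scores must not sum to zero")
    overlap = sum(min(xi, yi) for xi, yi in zip(x, y))
    consistency = overlap / sum(x)  # how much of X lies inside Y
    coverage = overlap / sum(y)     # how much of Y is accounted for by X
    return consistency, coverage


if __name__ == "__main__":
    # Hypothetical scores for five cases (made up for this example).
    solution = [0.9, 0.8, 0.1, 0.2, 0.0]
    outcome = [1.0, 0.9, 0.8, 0.7, 0.6]
    cons, cov = sufficiency_fit(solution, outcome)
    print(f"consistency={cons:.3f}, coverage={cov:.3f}")
```

In this made-up example the solution is fully consistent yet accounts for only about half of the outcome membership; a coverage of 0.223 leaves far more of the outcome unexplained.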

Best,

Patrick

dr. Patrick A. Mello
Assistant Professor of International Security, Department of Political Science and Public Administration, Faculty of Social Sciences and Humanities

--
You received this message because you are subscribed to the Google Groups "QCA with R" group.
To unsubscribe from this group and stop receiving emails from it, send an email to qcawithr+u...@googlegroups.com.
To view this discussion visit https://groups.google.com/d/msgid/qcawithr/35992791-4120-4458-8146-dfee5f9fd0e0n%40googlegroups.com.

Ingo Rohlfing

Jun 19, 2025, 3:33:32 AM

to QCA with R

I am with Patrick here. As the paper states, it uses XGBoost to select variables that optimize predictive accuracy. It is not clear to me that these are necessarily the variables / sets that are causal. Maybe there is some causal theory behind this machine-learning approach, but it does not read like it. In essence, the paper uses one of the variable selection approaches discussed in the following paper:

Amenta, Edwin, and Jane D. Poulsen. 1994. "Where to Begin: A Survey of Five Approaches to Selecting Independent Variables for Qualitative Comparative Analysis." Sociological Methods & Research 23(1): 22–53.

Machine learning is more sophisticated than what was available and considered in the 1990s. Still, in my view, there is a friction between preselecting variables on statistical grounds and then using them in a set-relational analysis. Theory, theoretical importance, substantive importance, and aggregation of variables to higher-order variables, which is a matter of concept formation, seem preferable to me.

Regards

Ingo

Stefano Assanti

Jun 19, 2025, 3:44:21 AM

to QCA with R

Hi all,

Thank you, Pedro, for sharing the article. Very interesting read!

I don’t know of a specific R tool to assist directly with the theoretical selection of conditions, but I’d like to offer a possible strategy for reducing the number of conditions based on a conceptual distinction.

If you're able to conceptually classify your 16 conditions in terms of their "proximity" to your outcome - some being more "distant" (structural, contextual) and others more "proximate" (mechanism-related, actor-driven) - you might consider using a two-step QCA approach. This involves conducting sequential QCA analyses, typically beginning with more distal conditions and then incorporating proximate ones in a second stage.

This can be helpful in structuring the selection process and clarifying how different layers of causality interact. It’s also a strategy that integrates theoretical reasoning into a more manageable analytical design.

Here is a paper that discusses this approach in detail, which you might find helpful: https://link.springer.com/article/10.1007/s11135-018-0805-7

Warm regards,
Stefano

Sarah Wang

Jun 19, 2025, 4:18:23 AM

to Stefano Assanti, Ingo Rohlfing, pere...@gmail.com, QCA with R

Thank you everyone for all the suggestions and resources. I came across this paper recently and it also talks about XGBoost. https://link.springer.com/article/10.1007/s11135-025-02146-2

I will check all the resources in detail and find a way for my project.

Kind regards

Sarah


Adrian Dușa

Jun 19, 2025, 12:29:08 PM

to Ingo Rohlfing, QCA with R

Totally agree.

Despite recent advances for large-N, I still believe that QCA is essentially a case-based method. As such, statistically selecting from a larger set of candidate conditions cannot really hold, especially with a medium-sized number of cases.

Machine learning has other purposes, with really big data (much larger than large-N), so there really is no shortcut around theoretical reasoning.

If no other theory exists, at least selecting the conditions with the largest face validity should help.

Running multiple QCA analyses is also a strategy, possibly combined with a two step procedure.

Bottom line, the researcher should be responsible for the decision, not delegate it to a machine.

Best,

Adrian
