Is there something new with "pimplot" in QCA 3.7?

Max Netherworlds

Mar 16, 2020, 8:14:34 AM

to QCA with R

Dear all,

Today I updated the QCA package to version 3.7. Thank you, Adrian, for the improvements! (I still have to get used to the new tilde notation in the solutions, though.)

Apart from deleting all "use.tilde" arguments, I did not change anything in my script.

However, when I try to plot my intermediate solution with pimplot(), I get an error message that did not occur with the old version. See the pimplot() call and the error message below.

Does anyone else have the same problem? Could it be due to the update?

Bests and thank you

Max

Code:

pimplot(data = Setdata,
        results = isGROW,
        outcome = "GROW",
        neg.out = FALSE,
        sol = 1,
        all_labels = TRUE,
        jitter = FALSE)

Error message:

Error in `[.data.frame`(data, , x) : undefined columns selected

Adrian Dușa

Mar 16, 2020, 9:30:22 AM

to Max Netherworlds, QCA with R

Hi Max,

The function pimplot() is not part of the QCA package but of SetMethods. Its maintainers still have to follow up and release a new version that is compatible with QCA 3.7.

In the meantime, is there anything wrong with using the XYplot() function from the base package QCA?

Something like:

library(QCA)

ttLF <- truthTable(LF, "SURV", incl.cut = 0.8, sort.by = "incl", show.cases = TRUE)

minimize(ttLF, include = "?")

# DEV*~IND + URB*STB => SURV

Then:

XYplot("DEV*~IND + URB*STB => SURV", data = LF)

or adding the case labels:

XYplot("DEV*~IND + URB*STB => SURV", data = LF, clabels = rownames(LF))

or the enhanced version:

XYplot("DEV*~IND + URB*STB => SURV", data = LF, enhance = TRUE)

You can look up the various parameters with:

?XYplot

Hope this helps,

Adrian

--
You received this message because you are subscribed to the Google Groups "QCA with R" group.
To unsubscribe from this group and stop receiving emails from it, send an email to qcawithr+u...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/qcawithr/45e61f05-4764-4320-8c44-889f857d57a0%40googlegroups.com.

—
Adrian Dusa
University of Bucharest
Romanian Social Data Archive
Soseaua Panduri nr. 90-92
050663 Bucharest sector 5
Romania
https://adriandusa.eu

Max Netherworlds

Mar 16, 2020, 1:56:06 PM

to QCA with R

Hi Adrian,

thank you for the prompt reply.

I think XYplot() is a good alternative. However, pimplot() does a few things "automatically" that my limited R skills cannot yet replicate with XYplot().

Do you know whether it is possible to ...

1) ... use the solution object (i.e. "isSOL") as a placeholder, so that the same command can be reused for different analyses, something like:

XYplot(isGROW[["i.sol"]][["C1P1"]][["solution"]][[1]], GROW,
data = Setdata,
clabels = rownames(Setdata), jitter = TRUE, enhance = TRUE)

I am still looking for the right object.

2) ... plot the whole solution AND each of its terms with a single command

3) ... jitter the cases in the plot so that the case labels can be read properly instead of forming murky clusters

Let me know if you have ideas for these challenges.

Bests,

Max

Adrian Dușa

Mar 16, 2020, 2:48:18 PM

to Max Netherworlds, QCA with R

Yes, it works like that too, including with intermediate solutions. Something like:

qcais <- minimize(ttLF, include = "?", details = TRUE, dir.exp = "DEV, STB, ~LIT*IND")

XYplot(qcais$i.sol$C1P1$solution[[1]], outcome = "SURV", data = LF,
       jitter = TRUE, enhance = TRUE)

But since the intermediate solution is DEV*URB*STB + DEV*~IND*STB, I find it cumbersome to extract it via qcais$i.sol$C1P1$solution[[1]] when the same plot can be obtained with:

XYplot("DEV*URB*STB + DEV*~IND*STB => SURV", data = LF,
       jitter = TRUE, enhance = TRUE)

Hope this helps,

Adrian

Brent Hutto

Mar 16, 2020, 3:58:48 PM

to QCA with R

Max N,

I've been working on some R functions implementing the alternative measures of consistency and coverage recently proposed by F. Veri. Along the way I ended up with a wrapper function for Adrian's XYplot() that creates plots for a model solution, including separate plots for each term (as pimplot does).

I'd be glad to share my R code with you if you think it would be helpful. It's not professional-coder quality, but you are certainly welcome to it.

The one drawback is that it doesn't pass through the "clabels" parameter (my data tend to have so many cases that labeling them is useless), but I could probably add it easily enough. I do pass through XYplot's "enhance" and "model" parameters, though.
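Max's second question (plotting the whole solution and each of its terms with one command) can also be approximated with a plain loop over XYplot() calls. This is a minimal sketch, assuming XYplot() accepts a single term in the same "EXPR => OUTCOME" string form Adrian used earlier in this thread; solution_terms() is a hypothetical helper, not part of QCA, SetMethods or VeriCov, and the plotting part only runs when QCA is installed.

```r
# Hypothetical helper (not part of QCA, SetMethods or VeriCov):
# split a sum-of-products solution such as "A*B + ~C" into its terms.
solution_terms <- function(solution) {
  trimws(strsplit(solution, "+", fixed = TRUE)[[1]])
}

sol <- "DEV*URB*STB + DEV*~IND*STB"    # intermediate solution quoted in this thread
sol_terms <- solution_terms(sol)       # "DEV*URB*STB" "DEV*~IND*STB"

# One XY plot per term, then one for the whole solution, using the
# "EXPR => OUTCOME" string syntax; skipped entirely if QCA is not installed.
if (requireNamespace("QCA", quietly = TRUE)) {
  library(QCA)
  data("LF", package = "QCA")
  for (expr in c(sol_terms, sol)) {
    XYplot(paste(expr, "=> SURV"), data = LF, jitter = TRUE, enhance = TRUE)
  }
}
```

Splitting on "+" works because QCA solutions are written as sums of products; each term is then plotted against the same outcome, which is roughly what pimplot() and Brent's wrapper automate.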

Max Netherworlds

Mar 17, 2020, 5:32:19 AM

to QCA with R

Ok, thank you. I will try what works best for me.

Max Netherworlds

Mar 17, 2020, 5:33:21 AM

to QCA with R

Hey Brent Hutto,

If you're willing to share, I would be interested in your code.

Bests,

Max

Brent Hutto

Mar 17, 2020, 9:43:17 AM

to QCA with R

OK, for what it's worth here it is...

I just created a GitHub account and uploaded an R package with my functions. You can install them with the following two lines in R (assuming I understand the GitHub thing correctly).

library(devtools)

install_github("BrentHutto/VeriCov")

The first line loads the devtools package, which provides the install_github() function (install it first with install.packages("devtools") if necessary). Once you have installed my VeriCov package, you can just use

library(VeriCov)

and you'll be able to use my functions. The main function, VeriCov(), lets you specify a QCA model object and creates plots of each term in the model plus the overall model solution. For instance, if you minimized a model like this:

modelKA <- minimize(fsKA, type='fs', conditions="GENP, CONF, REGIO", outcome="SUPRA", incl.cut=0.8)

you could then pass that model object to my VeriCov() function:

VeriCov(expr=modelKA, out="SUPRA", dat=fsKA, plot=TRUE)

Or something like that. The "fsKA" dataset is included in the package, by the way. There are also a few notes in the package's help file.

All this is very crude code; I'm just an R programming beginner. But maybe there's something in there that will help you.

Brent Hutto

Mar 17, 2020, 9:53:44 AM

to QCA with R

And here's a link to the actual R code of my three functions, if you just want the code and not the package.

https://github.com/BrentHutto/VeriCov/tree/master/R

Max Netherworlds

Mar 19, 2020, 7:23:17 AM

to QCA with R

Thank you for sharing! It works and gives me exactly what I need.

Max Netherworlds

Mar 20, 2020, 5:39:20 AM

to QCA with R

Hey,

Just another question; perhaps some of you have an idea:

With XYplot() the case labelling differs from pimplot(), and VeriCov (as I understand it) does not label cases at all.

Now I have plots like the one in the attached picture: the case labels cannot be told apart. This was to some extent more convenient with pimplot().

Of course, this is also a consequence of the large N and of calibrating with only four values, but perhaps you know a way to cope with these "clouds".

Otherwise I will have to find a command that identifies the cases in the respective areas of the plot (e.g. deviant cases for consistency or coverage) for my post-QCA analysis, or downgrade to 3.6 until pimplot is updated.
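Identifying the cases in each area of the plot does not require a plot at all; it can be read off the two membership scores. A minimal sketch, assuming X is the membership in the solution and Y the membership in the outcome, with region names roughly following the typical/deviant case typology discussed by Schneider and Wagemann (2012). case_regions() is a hypothetical helper and the data are made up.

```r
# Hypothetical helper: assign each case to a region of the XY plot for a
# sufficiency claim X -> Y (X = solution membership, Y = outcome membership).
case_regions <- function(X, Y, labels = seq_along(X)) {
  region <- rep("ambiguous (0.5)", length(X))   # cases sitting exactly at 0.5
  region[X > 0.5 & Y > 0.5 & X <= Y] <- "typical"
  region[X > 0.5 & Y > 0.5 & X > Y]  <- "deviant consistency in degree"
  region[X > 0.5 & Y < 0.5]          <- "deviant consistency in kind"
  region[X < 0.5 & Y > 0.5]          <- "deviant coverage"
  region[X < 0.5 & Y < 0.5]          <- "irrelevant"
  data.frame(case = labels, X = X, Y = Y, region = region,
             stringsAsFactors = FALSE)
}

# Made-up data (not from the thread): conditions A, B, C and outcome Y
dat <- data.frame(A = c(1, 0.7, 0.3, 1, 0),
                  B = c(0.7, 1, 0.7, 0.3, 0.3),
                  C = c(1, 0.3, 0.7, 0.7, 0),
                  Y = c(1, 0.3, 0.7, 0.7, 0.3),
                  row.names = paste0("case", 1:5))

# Membership in the solution A*B + ~A*C:
# fuzzy AND = pmin(), negation = 1 - x, fuzzy OR = pmax()
X <- with(dat, pmax(pmin(A, B), pmin(1 - A, C)))

regions <- case_regions(X, dat$Y, labels = rownames(dat))
regions[regions$region == "deviant consistency in kind", ]   # case2
```

The resulting table can be filtered by region for the post-QCA case selection, independently of how readable the labels are in the plot.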

By the way, I read the paper by Veri, and he makes some good points. However, I have not seen any paper citing his considerations regarding coverage. Do you know of a source where Veri's coverage measure is discussed?

Bests,

Max

Screenshot_Plot.png

Brent Hutto

Mar 20, 2020, 6:57:29 AM

to QCA with R

My data also have a large N, so labeling cases isn't really practical. I can't imagine any way to produce a set of labels in these situations that would be readable.

And on the second question: no, I do not think anyone is citing Veri's fit metrics yet.

Adrian Dușa

Mar 20, 2020, 6:00:04 PM

to Brent Hutto, QCA with R

I agree with Brent: with a large N, labelling the cases is not very practical.
It would actually be possible by using a very small "cex" for the points and very small letters, but only when the points are fairly spread out.

In your graph, however, Max, the data really don't seem to be truly fuzzy. It looks like both conditions have only four values: 0, 0.3, 0.7 and 1.
Despite having values between 0 and 1, in my humble opinion this does not really qualify as fuzzy...

It should perhaps be underlined that XYplot() can do a whole lot more than its formal arguments suggest. In fact, it accepts anything one can ask of a regular plot(), via the three-dots argument "...".

Simply look at the long list of graphical parameters in ?par, which XYplot() passes through to the underlying plot.
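The small-"cex" idea can be tried in plain base graphics. This is a minimal sketch that does not use QCA's XYplot(): it draws its own XY plot, jitters the points and labels them with very small text, so that cases sharing the same four-value scores do not print their labels on top of each other. xy_labelled() is a hypothetical helper, and the data are made up.

```r
# Hypothetical helper (plain base graphics, not QCA's XYplot): jittered XY plot
# with very small case labels placed just above each point.
xy_labelled <- function(X, Y, labels, amount = 0.04, cex = 0.5, seed = 1) {
  set.seed(seed)                      # reproducible jitter (changes the global RNG state)
  jx <- jitter(X, amount = amount)    # adds uniform noise in [-amount, amount]
  jy <- jitter(Y, amount = amount)
  plot(jx, jy, xlim = c(-0.05, 1.05), ylim = c(-0.05, 1.05),
       pch = 19, cex = cex, xlab = "X", ylab = "Y")
  abline(a = 0, b = 1, lty = 2)           # diagonal: sufficiency expects points on or above it
  abline(h = 0.5, v = 0.5, col = "grey")  # 0.5 crossover lines
  text(jx, jy, labels = labels, cex = cex, pos = 3, offset = 0.2)
  invisible(data.frame(label = labels, x = jx, y = jy))
}

# Made-up four-value scores for 12 cases (not Max's data)
X <- rep(c(0.3, 0.7, 1), each = 4)
Y <- rep(c(0.7, 1), times = 6)
pos <- xy_labelled(X, Y, labels = paste0("c", 1:12))
```

Jitter only helps while the number of cases per cell stays small; with hundreds of cases on a 4 x 4 grid the labels will still overlap, which is the limitation Brent and Adrian point out.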

Best,
Adrian

Max Netherworlds

Mar 26, 2020, 5:41:38 AM

to QCA with R

Hi all,

I would like to come back to this topic, as I think it is worth discussing.

First, I attach an example made with pimplot() that dealt very well with labelling a large N. I wonder whether XYplot() can do this as well.

Second, I want to come back to Adrian's objection that the data seem "not truly fuzzy". Indeed, I read in your book "QCA with R", Adrian, that you do not recommend the "direct assignment" method (Verkuilen 2005), except perhaps for Likert-type data (p. 83). My data come from a survey with 5-point Likert-type scales, so I have to transform them into sets in some way.

You propose a more suitable alternative for Likert-type scales: the TFR method of Cheli and Lemmi (1995) (Dusa 2019, p. 108).

This method seems tempting, as it calculates a fuzzy score based on the distribution of the data. However, I have the following questions:

1) You recommend using it for Likert scales that are large enough (p. 107), i.e. at least 7 points, but I have only 5! What are the disadvantages of too small a scale?

2) The TFR method is data-driven, more precisely driven by the distribution of the data. Doesn't this violate the rule of "substantive external theoretical knowledge", which all QCA scholars strongly emphasize? If I calculate the TFR scores using my whole sample of 800 cases and apply them to the subset of 213 cases, would this count as a kind of "external" knowledge?

3) Does it really matter that a set is calibrated as 0, 0.3, 0.7, 1? I have seen several studies doing exactly that, and, if I understand them correctly, both Ragin (2009) and Schneider/Wagemann (2012) mention this approach without criticizing it.
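For reference, the TFR arithmetic can be written out in a few lines. This is a minimal sketch, assuming the formulation commonly attributed to Cheli and Lemmi (1995), in which membership is the empirical cumulative distribution function rescaled so that the lowest category gets 0. tfr() is a hypothetical helper (QCA's own calibrate() is the tool to use in practice), and the answers are made up. It also makes question 2 concrete: the score of a given answer depends on which sample supplies the distribution.

```r
# Hypothetical helper: TFR ("totally fuzzy and relative") membership for an
# ordinal item, assuming the formulation
#   mu(x) = max(0, (Fn(x) - Fn(x_min)) / (1 - Fn(x_min)))
# where Fn is the empirical CDF of a reference sample and x_min its lowest category.
# (Undefined if every reference case gives the same answer.)
tfr <- function(x, reference = x) {
  Fn <- ecdf(reference)
  F_min <- Fn(min(reference))
  pmax(0, (Fn(x) - F_min) / (1 - F_min))
}

# Made-up 5-point Likert answers (not from the thread)
full_sample <- c(1, 2, 2, 3, 3, 3, 4, 4, 5, 5)
subsample   <- c(3, 3, 4, 5)

tfr(1:5, reference = full_sample)   # 0, 2/9, 5/9, 7/9, 1
tfr(3:5, reference = subsample)     # 0, 0.5, 1: answer "3" drops from 5/9 to 0
```

With only five categories the function can produce at most five distinct membership values, which is one concrete way a short scale limits the resulting fuzzy set.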

I am open to a discussion about this here.

Bests,

Max

20-03-13-GROW.pdf

Adrian Dușa

Mar 26, 2020, 6:39:14 AM

to Max Netherworlds, QCA with R

Hi Max,

This is uncharted territory, as there is no methodological study that pinpoints the differences, but IMHO values like 0, 0.3, 0.7 and 1 look to me like 4 levels of the same condition rather than fuzzy values.

If you really want to use a Likert-type response scale, I would use a multi-value condition with 4 values (0, 1, 2 and 3) instead.
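Recoding such a four-level condition into a multi-value one is a one-liner. This is a minimal sketch, assuming the fuzzy codes are exactly 0, 0.3, 0.7 and 1; to_multivalue() is a hypothetical helper, and cutting halfway between adjacent levels (rather than matching exact values) guards against small floating-point differences.

```r
# Hypothetical helper: turn a four-level condition coded 0, 0.3, 0.7, 1
# into a multi-value condition coded 0, 1, 2, 3.
to_multivalue <- function(x, levels = c(0, 0.3, 0.7, 1)) {
  cuts <- head(levels, -1) + diff(levels) / 2   # halfway points: 0.15, 0.5, 0.85
  findInterval(x, cuts)                         # number of cut points <= x
}

to_multivalue(c(0, 0.3, 0.7, 1, 0.3))   # 0 1 2 3 1
```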

The other argument I bring to the table is that conditions in QCA (should) refer to concepts. It feels very strange to me that an unobserved concept is measured using a single item with a response scale from 1 to 5. Usually, such concepts are operationalized (via their dimensions) into more than one item.

There is nothing wrong with using a 1 to 5 response scale for each item, but the overall concept consists of all (aggregated) items, not just one. Suppose we have 10 items and create a summative score: we obtain a composite index with values ranging from 10 to 50. That is more than enough variation to use Ragin's direct method instead of direct assignment on a single item...

Even with 3 items, you get a range from 3 to 15 and could use TFR, or perhaps even the direct method with proper (theoretically justified) thresholds.
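To make the summative-score route concrete, here is a minimal sketch of the log-odds arithmetic behind Ragin's direct method, applied to a made-up 3-item index. It assumes the usual convention that the full-inclusion and full-exclusion thresholds map to log-odds of +3 and -3 (memberships of roughly 0.95 and 0.05). direct_calibrate() is a hypothetical helper that only spells out the arithmetic; QCA's calibrate() is the function to use in practice, and the thresholds below are illustrative, not theoretically justified.

```r
# Hypothetical helper: log-odds form of Ragin's direct method.
# excl, cross, incl = thresholds for full exclusion, crossover, full inclusion.
direct_calibrate <- function(x, excl, cross, incl) {
  # scale deviations from the crossover so that incl maps to log-odds +3
  # and excl maps to log-odds -3
  log_odds <- ifelse(x >= cross,
                     3 * (x - cross) / (incl - cross),
                     3 * (x - cross) / (cross - excl))
  plogis(log_odds)   # inverse logit: exp(log_odds) / (1 + exp(log_odds))
}

# Made-up answers of 4 respondents to 3 items on a 1-5 scale (not from the thread)
items <- matrix(c(1, 2, 1,
                  3, 3, 4,
                  4, 5, 5,
                  5, 5, 5), ncol = 3, byrow = TRUE)
raw <- rowSums(items)   # summative index ranging from 3 to 15; here 4, 10, 14, 15

# Illustrative thresholds only; a real study has to justify them theoretically
round(direct_calibrate(raw, excl = 5, cross = 9, incl = 13), 3)
```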

Is a single item ever useful?
Could you or anyone really argue in a serious paper that a single item ranging from 1 to 5 accurately measures an abstract concept...?

My 2 cents, and food for thought,
Adrian

Brent Hutto

Mar 26, 2020, 6:43:03 AM

to qcaw...@googlegroups.com

Max (and the group),

My first, and so far only, application of QCA in a (would-be) publishable manuscript had a binary outcome and a bunch of four-item scales as conditions. I went with a very hard-core "external theoretical knowledge" approach and dichotomized each response as either agree or disagree (i.e. Strongly Agree and Agree combined, and Strongly Disagree and Disagree combined).

I've made a bit of a start on analyzing a second dataset, which has a fuzzy outcome, so in that analysis I decided to treat those Likert-type scales as fuzzy as well. I did the direct mapping of 0, 0.3, 0.7, 1.0 for Strongly Disagree through Strongly Agree, respectively. So I get plots a lot like your clumpy ones from this not-quite-fuzzy data. I share your reservation about an empirical mapping peculiar to each dataset: it seems to lose one of the main benefits (to my thinking) of the QCA approach, which is staying as true as possible to the a priori meaning of the set memberships.
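Both codings Brent describes are simple lookups on an ordered response. A minimal sketch with made-up responses, assuming a four-level factor ordered from Strongly Disagree to Strongly Agree:

```r
# Made-up four-level responses (not Brent's data), levels in substantive order
resp <- factor(c("Strongly Disagree", "Agree", "Disagree", "Strongly Agree"),
               levels = c("Strongly Disagree", "Disagree", "Agree", "Strongly Agree"))

# Direct assignment: 0, 0.3, 0.7, 1 for the four levels, in order
fuzzy <- c(0, 0.3, 0.7, 1)[as.integer(resp)]

# Crisp alternative: agree (1) versus disagree (0)
crisp <- as.integer(resp %in% c("Agree", "Strongly Agree"))
```

Either way the set can only take as many values as the item has response levels, which is why the resulting XY plots look clumpy.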

Adrian Dușa

Mar 26, 2020, 7:11:47 AM

to Max Netherworlds, QCA with R

I forgot to reply about the case labelling.

The difference between pimplot() and XYplot() is that pimplot() (from package SetMethods) is built with ggplot2 graphics, while XYplot() (from package QCA) uses base graphics.

It would be feasible (although not trivial) to support both ggplot2 and base graphics in XYplot(). I've done something similar with the function venn() in package venn, which (in the current version 1.9) gained an argument called "ggplot" to switch from base to ggplot graphics.

The real added value of XYplot() is its flexibility. What is perhaps less understood is that XY plots are extensible: one can use any of the graphical parameters documented in ?par, and the plot can be extended even after it has been produced.

Consider this example:

library(QCA)

XYplot(CVF[, 5:6])

One can add labels to the plot using the base function text():

text(0.6, 0.6, "some text")

Both XYplot() and venn() have the special three-dots argument "...", which gives access to all the graphical parameters in ?par; venn() additionally gives access to the graphical parameters specific to ggplots.

Whatever package draws the case labels with those connecting lines could equally be used with XYplot(). Package SetMethods uses ggrepel, and XYplot() could use it too if and when it gains support for ggplots. How clear such a plot is remains a matter of personal taste: there are so many lines and so few points that (personally) I wonder what the plot is really about, the lines or the points?

Best,

Adrian

Max Netherworlds

Mar 26, 2020, 11:44:32 AM

to QCA with R

Hey Adrian and Brent,

I agree with Adrian that a single item cannot represent the abstract concept. I forgot to mention that I do aggregate 4 or 5 Likert-type items to calibrate the set memberships. To keep "conceptual control" over the calibration, I decided to directly assign the values 0, 0.3, 0.7 and 1 to disjunctions and conjunctions of these items. For example, if a case answers "does not apply" on just one of the items (and "does apply" on all the others), I argue on theoretical grounds that it can never be in the set, although aggregation by average or by sum would give it set membership.
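The difference between averaging and the logical combination Max describes shows up as soon as one item is at 0. A minimal sketch with made-up item-level memberships: the fuzzy AND (intersection) is the row minimum and the fuzzy OR (union) is the row maximum.

```r
# Made-up item-level memberships for 3 cases on 4 items (not Max's data);
# case1 answers "does not apply" (0) on exactly one item.
items <- rbind(case1 = c(1, 1, 1, 0),
               case2 = c(0.7, 1, 0.7, 1),
               case3 = c(0.3, 0.3, 0.7, 0.3))

rowMeans(items)        # averaging: 0.75, 0.85, 0.4 (case1 ends up "in" the set)
apply(items, 1, min)   # logical AND (fuzzy intersection): 0, 0.7, 0.3
apply(items, 1, max)   # logical OR (fuzzy union): 1, 1, 0.7
```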

Adrian's suggestion to apply the direct or indirect method to the sum of all items is a good idea and seems worth trying. Until now I hesitated to do so, because it treats the sum of several ordinal scales as a kind of interval scale, which in my view implies false precision and linear thinking for discrete values. But I should give it a try.

If I understand correctly, the direct and indirect methods do not depend on the distribution of the data in the sample, but TFR does. That is why I would classify TFR as the "last resort": it does not rest on external knowledge. Here I totally agree with Brent.

Bests,

Max

Adrian Dușa

Mar 26, 2020, 12:05:04 PM

to Max Netherworlds, QCA with R

Yes, of course TFR should be the last resort, but IMHO it is still better than direct assignment on a single item.
If several items are involved, I don't see a problem. Aggregating by sum, or by taking the mean, leads to a numeric score that can easily be calibrated with the theoretically driven direct or indirect method.

That multiple ordinal items generate an aggregated interval scale is nothing new; in fact, it is the norm in regular statistics. Psychological tests, for instance, do exactly that, and Cronbach's alpha is used specifically to check whether such items hang together as one scale. There is absolutely no problem with this approach.
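For readers who have not computed it by hand, Cronbach's alpha follows directly from its standard formula, alpha = k / (k - 1) * (1 - sum of item variances / variance of the total score). A minimal base-R sketch with made-up answers; cronbach_alpha() is a hypothetical helper, and packages such as psych offer a fuller alpha() with diagnostics.

```r
# Minimal sketch of Cronbach's alpha from its standard formula
cronbach_alpha <- function(items) {
  items <- as.matrix(items)
  k <- ncol(items)
  item_var  <- apply(items, 2, var)    # variance of each item
  total_var <- var(rowSums(items))     # variance of the summative score
  k / (k - 1) * (1 - sum(item_var) / total_var)
}

# Made-up 5-point answers of 6 respondents to 3 items (not from the thread)
likert <- cbind(q1 = c(1, 2, 3, 4, 5, 4),
                q2 = c(2, 2, 3, 5, 5, 4),
                q3 = c(1, 3, 3, 4, 4, 5))
cronbach_alpha(likert)   # about 0.944
```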

Best,
Adrian

Max Netherworlds

Apr 2, 2020, 6:47:28 AM

to QCA with R

Hey Adrian,

I agree with you that averaging or summing scales is established practice and works for a lot of set-theoretic questions. My concerns are based on a working paper by Emmenegger et al. (2014) (Emmenegger, Patrick; Schraff, Dominik; Walter, André (2014): QCA, the Truth Table Analysis and Large-N Survey Data: The Benefits of Calibration and the Importance of Robustness Tests. COMPASSS Working Paper 2014-79), which I would like to quote here:

"Set-theoretic approaches also have an untapped potential with regard to the combination of multiple survey items. When a concept cannot be measured by a single indicator, it is common procedure in studies using regression-based approaches (but oddly enough also in studies using set-theoretical methods) to simply use the average of different variables or to conduct a factor analysis of variables that are expected to be in a causal relationship with the latent concept (for QCA studies using such inductive procedures see Berg-Schlosser 2008; Cárdenas 2012; Cheng et al. 2013; Crowley 2013; Engeli 2012; Grendstad 2007; Vaisey 2007). However, using such an inductive approach to capture sets is problematic for at least two reasons.
First, while indicators are typically numeric, concepts are constructed in terms of necessary and sufficient conditions (Goertz 2006). For instance, Canada is not a member of the category European democracies because Canada, although democratic, is not a European country. Hence, Canada’s set membership is zero and not 0.5 as the average of the variables ‘democratic’ and ‘European’ might imply. Hence, the calibration of sets by means of linear algebra is highly susceptible to misclassification, while conceptual thinking implies that variables are combined in a logical fashion using AND/OR operations. If the conceptual structure of necessary and sufficient conditions is not reflected in the measurement process, the result is concept-measure inconsistency. In our empirical example, scholars typically relied on inductive approaches, thus leaving conceptualization underdeveloped and resulting in large number of empirical work which is conceptually only loosely connected.
Second, averaging different variables to capture a concept is based on the assumption that all indicators are equally important for a concept. For instance, opposing immigration from poor Asian or African countries has the same weight as the opposition towards immigration from rich, neighbouring European countries. However, this line of argumentation is hardly justifiable for a number of reasons, which we discuss below." (pp. 9–10)

Accordingly, there seem to be different "schools" of set-theoretic thinkers when it comes to data collection and subsequent calibration. Some rely on established scales, shown to be reliable with correlation-based measures such as Cronbach's alpha, and on their linear combination. Others (such as Emmenegger et al.) seem to challenge this, arguing that it stems from correlational thinking that is at odds with set-theoretic thinking; they rely on conceptually informed calibration and aggregation. Perhaps one could call the former "quantitative" and the latter "qualitative".

What is your opinion on this debate? Do you think both approaches are legitimate?

Best,

Max

Adrian Dușa

Apr 2, 2020, 8:26:55 AM

to Max Netherworlds, QCA with R

Hi Max,

I have read that paper (now even more closely) and confess that I still do not fully understand what it refers to. Perhaps something escapes me, but let us take the example of European democracies.

First of all, I simply do not believe there is any scale of democracy whatsoever that would include an item on geographical location. An operational model of democracy potentially includes multiple items, but none of them is location. I am not an expert and would be delighted to stand corrected, but as far as I am aware, such a thing simply does not exist.

Hence their statement that "Canada’s set membership is zero and not 0.5 as the average" strikes me as odd, as it would never cross my mind to attempt such a location-based aggregation.

Instead:
- I would first compute a (raw) democracy score for Canada, using summation, the mean or whatever other aggregation method seems theoretically justified (thus obtaining an interval-level, numeric score), then
- compute the set membership score for Canada (in the set of democratic countries) using the direct or indirect method, given that the raw score is interval-level numeric, and only then
- compute the set intersection of the membership in the set of European countries and the membership in the set of democratic countries, which correctly results in zero (and not 0.5, as they seem to suggest).
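The arithmetic of the last step, assuming for illustration that Canada has full membership (1) in the set of democratic countries D and zero membership in the set of European countries E: fuzzy intersection takes the minimum, whereas averaging the two memberships gives the 0.5 criticized in the paper.

```latex
\mu_{E \cap D}(\text{Canada}) = \min\bigl(\mu_E(\text{Canada}),\, \mu_D(\text{Canada})\bigr) = \min(0,\, 1) = 0,
\qquad
\tfrac{1}{2}\bigl(\mu_E(\text{Canada}) + \mu_D(\text{Canada})\bigr) = \tfrac{1}{2}(0 + 1) = 0.5
```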

Their second point that "...averaging different variables to capture a concept is based on the assumption that all indicators are equally important for a concept" is absolutely valid.

But yet again, as far as I am aware, there is no methodological indication that the average is the best aggregation method. Quite the contrary: there are numerous examples (for instance the Human Development Index) where different dimensions of the concept have different weights in the final aggregated score.

Whatever the aggregation, linear or weighted, it produces an interval-level raw score, and it is perfectly possible to compute set memberships from that score using Ragin's methods.
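A small illustration of why the choice of weights matters even though both aggregations yield an ordinary numeric raw score: with made-up answers and hypothetical weights, equal weighting and theory-driven weighting can rank the same two respondents in opposite order. Either score could then be calibrated with the direct or indirect method.

```r
# Made-up answers of 2 respondents to 3 items on a 1-5 scale (not from the thread)
items <- matrix(c(2, 4, 5,
                  5, 3, 1), ncol = 3, byrow = TRUE)

w <- c(0.5, 0.3, 0.2)   # hypothetical, theory-driven weights summing to 1

equal_weight <- rowMeans(items)          # 3.667, 3.000: respondent 1 scores higher
weighted     <- as.vector(items %*% w)   # 3.2, 3.6: respondent 2 scores higher
```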

I really, really see no problem at all with this approach, but I might have completely misunderstood Emmenegger, Schraff and Walter (2014), and I apologize in advance if so.

But if I am right, then everything I mentioned in the previous messages should still be valid. I don't think there are different schools of set-theoretical thinkers, just those following (proper?) methodology... or something else. Using established scales does not make some researchers more "quantitative" than others, for set-theoretical thinking makes us all comparativists.

Best,
Adrian