Hello Nicolai,
The truth table object in the QCA package was designed so that no manual intervention should be needed. The function has so many arguments, allowing such a fine-grained array of possible options, that manually changing the output values should never be necessary.
Manual intervention is of course possible, but at the cost of breaking a number of features; if I remember correctly, the esa() function in the SetMethods package has a problem with the case labels after such an intervention.
Now, as to why the argument exclude = c(2) does not seem to work, I believe it is explained in the help file for that argument (at least in the newest versions of the package on CRAN):
exclude = A vector of (remainder) row numbers from the truth table, to code as negative output configurations.
So, it excludes remainders only. This is a methodological, rather than a programming, choice: I believe observed evidence should not be touched in this way. The argument "exclude" is useful for (and was introduced specifically for) obtaining the so-called enhanced parsimonious solution, which is produced by excluding remainders from the minimization process.
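A minimal sketch of that workflow, assuming a calibrated data frame "dat" with an outcome "OUT" (all names and the inclusion cut-off are hypothetical, and row 2 is assumed to be a remainder):

library(QCA)

# code remainder row 2 as a negative output configuration
tt <- truthTable(dat, outcome = "OUT", incl.cut = 0.8,
                 show.cases = TRUE, exclude = c(2))

# enhanced parsimonious solution: the excluded remainder is barred
# from the minimization, while observed rows are left untouched
eps <- minimize(tt, include = "?", details = TRUE)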
Observed data, on the other hand, can be dealt with in more detail by analysing the calibration process. If those cases are problematic, as they seem to be, I would first check whether something went wrong during calibration.
It would also be immensely interesting to provide information about how (specifically) you assigned set membership scores to those cases. What were the raw scores? Which calibration method did you apply?
You might find some unexpected answers when following these questions...
I hope this helps,
Adrian
> On 10 Jun 2020, at 17:52, Nicolai Schulz <d.nicol...@gmail.com> wrote:
>
> Hello everyone,
>
> I'm fairly new to using QCA in R and have the following issue, which can be replicated with the attached script and data.
>
> The issue I face is that whereas I can effectively exclude truth table rows manually in R, using the QCA package's "exclude" argument during the minimization process does not always work.
>
> The example data will show that I:
> • Identify a row (#2) in the truth table which is fully made up of a deviant case in kind (dcc), and which I therefore want to exclude from the minimization (whether that is in itself methodologically justifiable is something I just asked in the QCA Facebook group here, but I am glad for any input here as well).
> • I then use "exclude=c(2)," to exclude that row during the minimization process.
> • Yet, that row/case shows up in the final solution-case table as its own path (although it is factually represented only by one deviant case).
> • When manually excluding the row from the truth table (i.e. "ttGDP_VPHP$tt['2', 'OUT'] <- 0") and then minimizing, the "deviant path" indeed disappears (as I hoped/thought it would).
> In conclusion, my question is: why does my attempt at using the "exclude" argument not have the intended effect of actually excluding said row from the minimization process? Where am I going wrong?
>
> Many thanks for your advice!
>
> All the best,
> Nicolai
>
Hi Nicolai,

First, let us do a bit of housekeeping to minimize the code and make it quick to follow. Just as for variables, I would use very short names for the objects as well; for example, instead of "SchulzQCA2020RepDataRaw" I'd use something like "srdr" (a sort of acronym).

Then, I see your data was most likely exported from R, as it has the "rownames" of the original R data frame in the first column. At the same time, the real row names (the case names) are in the last column, called "CountCodeP". You can assign that column as row names directly:

srdr <- read.csv2("SchulzQCA2020RepDataRaw.csv", row.names = "CountCodeP")[, -1]

The [, -1] part eliminates the irrelevant first column. If you want to save this new data frame, I recommend using the function export() from my package, which automatically assigns the name "Cases" to the first column containing the row names.
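For example, a one-line sketch (the output file name is just a placeholder):

export(srdr, file = "srdr.csv")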
Next, I am looking at your calibration commands. First of all, I don't really understand why you insist on rounding your set membership scores, and why to two decimals only. This only introduces more imprecision: for instance, on the condition EPOS the second and third values are both 0.5032453678. Rounding these numbers to two decimals produces the value 0.5, which is very bad, since a membership of exactly 0.5 is maximally ambiguous (and you do get a warning in the truth table procedure because of that).
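A minimal sketch of what I mean, with a hypothetical raw variable and thresholds, just to show that the result of calibrate() can be used as-is, with no round() call:

# EPOS_raw and the e/c/i thresholds below are placeholders
srdr$EPOS <- calibrate(srdr$EPOS_raw, type = "fuzzy",
                       thresholds = c(e = -2, c = 0, i = 2))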
But more importantly, I don't really buy the direct calibration of the raw condition invQ2_OB. That looks like a Likert-type response scale from 1 to 5, and direct calibration cannot be applied to this type of raw data: it is a categorical (ordinal) variable, while direct calibration expects a numerical (interval) one. That is surely problematic, and I would warmly suggest reading section 4.3 of my book (freely available on bookdown.org).
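One possible alternative, sketched here with purely hypothetical anchor values, is to assign a fuzzy score to each of the five categories on theoretical grounds instead of calibrating directly:

# the 0/0.2/0.4/0.8/1 anchors are placeholders, to be justified by theory
anchors <- c("1" = 0, "2" = 0.2, "3" = 0.4, "4" = 0.8, "5" = 1)
srdr$OB <- unname(anchors[as.character(srdr$invQ2_OB)])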
I really cannot say much about the choice of thresholds for the other conditions, short of more informed theoretical information, but they do strike me as odd too. For instance, the raw variable QuinPeRelAnEcGrFlex ranges from -6.172504 to +4.3759. Setting the full exclusion point to -2.366304 only excludes the outlier value -6.172504, as the second lowest value is -2.275266.
Besides being unnecessary, such precision for the full exclusion threshold raises eyebrows anyway.

[Nicolai: The precise values are simply empirical numbers from means and percentiles (although I accept the challenge that one should consider whether these are good reference points).]

Then the full inclusion threshold of 2.060684 (again, very precise) does not seem to be near the far end of the sorted values.

[Nicolai: Similar issue as above: this is the 75th percentile in the full sample, which there also looks like a meaningful threshold.]

This being a subset of your actual data, there might be more information that I am missing, but in any case threshold values should attempt to differentiate between qualitatively different cases (using theory), rather than mechanically using empirical distribution points like the mean or the quartiles (data driven, and definitely not recommended).

My suggestion would be to use the interactive threshold setter in the Graphical User Interface, or at least make use of the function Xplot() to visually inspect the raw distributions (see the short sketch after this exchange).

[Nicolai: Thank you for the suggestions. Will do (though that resembles the approach I took... yet in the full sample).]

Methodologically, I wonder why you have variables containing negative numbers: are those scores obtained through some sort of factor analysis, perhaps?

[Nicolai: Vdem_e_polity2 is the PolityIV regime type score, ranging from -10 to 10. And QuinPeRelAnEcGrFlex is the annual economic growth of a country relative to countries at similar levels of development during the same period. Thus, it can be negative (slower than other comparable countries) or positive (faster).]
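To illustrate the Xplot() suggestion above (assuming the srdr data frame from the earlier housekeeping step):

# plot the raw distribution to eyeball candidate thresholds
Xplot(srdr$QuinPeRelAnEcGrFlex, jitter = TRUE)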
I continue to believe your analysis (including the discussion about the deviant cases) should not continue until you have sorted out the calibration phase. I don't mean to say your calibration is completely wrong, but it seems to me somewhat subjective and/or data driven.
I hope this helps,
Adrian