Laurence,
[1] Thank you for your replies. I do appreciate the time that you have taken to address my comments. Herein, I also will address your comments on my lack of word boundaries in my examples of regexes.
[2]
A Question: Unfortunately, I am not certain about the "token level" that you mention in your first response. Does this refer to a word as a whole?
[3] I am sorry to say that I am still encountering serious difficulties with regexes in AntConc v4. My experience with regex creation is based on various books, including one that I consult often:
Regular Expressions Cookbook, 2nd Edition, by Jan Goyvaerts & Steven Levithan (Sebastopol, CA: O'Reilly Media, 2012). Moreover, I test my regexes with the software RegexBuddy (coded by Goyvaerts) and they work in numerous regex engines. As a consequence, I am not certain what to do. In the spirit of improving, allow me to provide several examples of regexes that I have used.
Regex for a Single Word with Different Spellings
[4] In my corpus of Du Bois texts, "cooperation" has 3 correct spellings: "cooperation", "co-operation", and the older "coöperation". In v3.5.9 this regex finds all 3 spellings:
{rgx-1} (?i)co-?[oö]peration
But in v4, using {rgx-1}, only "cooperation" and "coöperation" are matched, not "co-operation".
[5] In v4 I tried a regex using alternation:
{rgx-2} co-operation|cooperation|coöperation
which in v4 still does not locate "co-operation" with the dash. In my Du Bois corpus there are over 160 instances of "co-operation".
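For reference, a minimal sketch that checks {rgx-1} and {rgx-2} against the three spellings outside AntConc. It assumes Python's ``re`` module as a stand-in for a generic regex engine; it says nothing about how AntConc 4 itself applies a pattern to the corpus.

```python
import re

# The two patterns exactly as given in {rgx-1} and {rgx-2}.
RGX_1 = re.compile(r"(?i)co-?[oö]peration")
RGX_2 = re.compile(r"co-operation|cooperation|coöperation")

# The three spellings named in paragraph [4].
spellings = ["cooperation", "co-operation", "coöperation"]

# True where the pattern matches the whole spelling.
rgx1_ok = [bool(RGX_1.fullmatch(s)) for s in spellings]
rgx2_ok = [bool(RGX_2.fullmatch(s)) for s in spellings]

print(rgx1_ok, rgx2_ok)
```

If both lists come back all true, the patterns themselves are sound on raw text, which would suggest that the v4 behaviour arises from how the text is prepared before the regex is applied.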
[6] The only regex that worked in v4 was
{rgx-3} co operation
without the dash, which was able to locate "co-operation" with the dash. As I expected, {rgx-3} did not locate "cooperation" or "coöperation".
[7] A regex that works in AntConc v4 is:
{rgx-4} self conscious
This regex without the dash will find "self-conscious" and "self-consciousness", both with the dash, as well as "self conscious" itself.
[8]
A Question: Is that what you mean by token in the regex implementation in AntConc 4? Namely, that the two search terms [node words] are matched regardless of the punctuation or blank space that may lie between them?
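To make the question concrete, here is a sketch of one hypothetical "token level" model. It is only an assumption for discussion, not AntConc 4's documented behaviour: the text is first reduced to letter-only tokens joined by single spaces, and the regex is then run on that token stream. The helper ``token_stream`` and the sample sentence are invented for illustration.

```python
import re

# ASSUMPTION: a token is a run of letters; hyphens, punctuation and
# whitespace only separate tokens. This is a guess, not AntConc's code.
TOKEN = re.compile(r"[^\W\d_]+")

def token_stream(text: str) -> str:
    """Join the letter-only tokens of `text` with single spaces (hypothetical model)."""
    return " ".join(TOKEN.findall(text))

# Invented sample sentence containing all three spellings.
raw = "Co-operation, cooperation and coöperation breed self-consciousness."
stream = token_stream(raw)

# {rgx-1} run on the token stream: the hyphen is gone, so the hyphenated form is missed.
hyphen_hits = re.findall(r"(?i)co-?[oö]peration", stream)

# {rgx-3} (with (?i) added for the capitalised sample) finds the hyphenated form instead.
space_hits = re.findall(r"(?i)co operation", stream)

print(stream)
print(hyphen_hits, space_hits)
```

Under this assumed model the results mirror what I see in v4: {rgx-1} finds only the unhyphenated spellings, while a space in the pattern stands in for the hyphen.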
Proximity-Oriented Regular Expressions
[9] I now move to proximity-oriented regexes, which are a necessary component of my political-theoretical projects. With proximity regexes I am seeking the manifold variation of Du Bois's ideas as he potentially ramified them across the hundreds of his writings. I am not seeking collocations.
[10]
A Request: In AntConc 4, how would I locate, for example, "self" and its variants ("ourselves", "himself") in relation to "conscious" and its variants ("unconscious", "consciously") over a gap of 20-100 words or of 1-400 characters? Might you please provide a sample regex?
[11] Any regex would need to find the search terms designated above in the following passages from my corpus (as I can do in v3.5.9):
{quot-1} "a world which yields him no true
self-consciousness"
{quot-2} "more deeply
self-critical, more
conscious of its power"
{quot-3}
{¶ } [....] If you count
yourselves as something more than your money, why may not I? {¶} To induce, then, in men a
consciousness of the humanity of all men,[....]
I can provide other examples from my interpretive research.
[12] The passage presented in {quot-3}, which crosses both sentence and paragraph boundaries, is salient to my research: who has consciousness and what is the content of that consciousness. It could not be matched by boundaries set to locate only "self" or "self-conscious". The "then" connects the two paragraphs, and thereby relates the essay's audience to the idea of the humanity embodied in all people, which is (I would argue) Du Bois's goal of the essay.
[13] The following proximity regexes work in v3.5.9 and will locate those quotations listed above:
{rgx-5} (?i)sel(?:.){0,100}?conscious
{rgx-6} (?i)sel[\w]*\W+(?:\w+\W+){0,20}?conscious[\w]*
These regexes appeared in my initial posting.
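As a sketch of what these two patterns do on raw text, the following runs them against the three quotations using Python's ``re`` module (an assumption: a generic engine, not AntConc itself). The line breaks are kept as ``\n``; in {quot-3} the paragraph mark is represented by a blank line and the elided portions are simply omitted, which is also an assumption about how the corpus file stores them.

```python
import re

RGX_5 = r"(?i)sel(?:.){0,100}?conscious"
RGX_6 = re.compile(r"(?i)sel[\w]*\W+(?:\w+\W+){0,20}?conscious[\w]*")

quot_1 = "a world which yields him no true\nself-consciousness"
quot_2 = "more deeply\nself-critical, more\nconscious of its power"
quot_3 = ("If you count\nyourselves as something more than your money, "
          "why may not I?\n\nTo induce, then, in men a\n"
          "consciousness of the humanity of all men,")

# {rgx-6}: \W+ absorbs hyphens, punctuation and line breaks between words.
rgx6_hits = [RGX_6.search(q).group(0) for q in (quot_1, quot_2, quot_3)]

# {rgx-5}: in Python, "." does not match a newline unless DOTALL is set,
# so the line break inside quot_2 blocks the match by default.
rgx5_plain = re.search(RGX_5, quot_2)
rgx5_dotall = re.search(RGX_5, quot_2, flags=re.DOTALL)

for hit in rgx6_hits:
    print(repr(hit))
print(rgx5_plain, repr(rgx5_dotall.group(0)))
```

In this engine {rgx-6} spans the sentence and paragraph boundaries unaided, whereas {rgx-5} depends on whether the dot is allowed to match line breaks.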
Word Boundaries
[14] Herein arises the importance of the strategic presence and absence of boundary markers in my regexes. In my iterative research process, the early steps involve capacious search terms. I will KWICly (but maybe not so quickly [sorry!]) examine the lists of matches to understand the range of possible words by which Du Bois expressed himself. I also believe in research serendipity.
[15] With regard to "sel" in relation to "conscious", I am not only looking for "self" but also for other possibilities, such as "ourselves", "himself", "selfless", etc. In the proximity-oriented regex {rgx-6}, ``sel``, which is to be located within 20 words of ``conscious``, seeks to match "self", "itself", "yourselves", "himself", etc. Thus, ``sel`` or even ``sel[\w]*`` is an intentional part of my research strategy involving proximity regexes.
[16] Moreover, in order to examine what Du Bois wrote in relation to "conscious" or "consciousness", I do not want initially to exclude "unconscious", "half-conscious", or "consciously".
[17] As a next step in my research flow, I may then refine my new regex searches accordingly -- perhaps to include boundary markers to frame the search \bword\b. In short, I include, or not, word boundaries as a part of a larger interpretive strategy to understand the plenitude and nuances of Du Bois's voices in the words and ideas of his writings.
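The contrast between the capacious and the bounded search can be sketched as follows, again assuming Python's ``re`` as a generic engine. The word list is an illustrative sample assembled from the words mentioned above, plus "counsel" as an example of the incidental matches that the capacious pattern also admits.

```python
import re

# Illustrative sample; "counsel" is added only to show an incidental match.
words = ["self", "itself", "yourselves", "himself",
         "selfless", "self-conscious", "counsel"]

broad = re.compile(r"(?i)sel[\w]*")    # capacious: no boundary markers
narrow = re.compile(r"(?i)\bself\b")   # refined: framed by word boundaries

broad_hits = [w for w in words if broad.search(w)]
narrow_hits = [w for w in words if narrow.search(w)]

print(broad_hits)
print(narrow_hits)
```

The broad pattern keeps every word containing "sel" for review in the KWIC list, while the bounded pattern keeps only "self" standing alone or before a hyphen.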
Matching a String within Sentences
[18] As an example of another regex useful for my research: sometimes I wish to identify the entire sentence in which a match is found. To that end the following regex works in v3.5.9, but there are "No hits found!" in v4.
{rgx-7} (?i)[^\.?!]+self[\s-]conscious[\w]*.*?[\.?!]
I do not specify word boundaries because this regex, when applied to my corpus, only matches the desired phrases. I would like to perform this regex search in v4.
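A minimal sketch of {rgx-7} on raw text, assuming Python's ``re`` as a generic engine. The sample text is made up for illustration and is not from the corpus.

```python
import re

RGX_7 = re.compile(r"(?i)[^\.?!]+self[\s-]conscious[\w]*.*?[\.?!]")

# Made-up sample text: three sentences, only the second contains a hit.
text = ("The first clause ends here. "
        "A self-conscious people may act! "
        "Nothing follows.")

# Each match runs from just after the previous sentence-ending mark
# to the next one; strip() removes the leading space.
sentences = [m.group(0).strip() for m in RGX_7.finditer(text)]

print(sentences)
```

The leading ``[^\.?!]+`` cannot cross a sentence-ending mark, so the match begins right after the previous sentence and stops at the first ".", "?", or "!" after the hit.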
In Closing
[19]
Another Request: Perhaps you could point me to documentation on the regex flavor used in AntConc 4? In addition, sample regexes greatly aid my comprehension and would help me in crafting regexes appropriate to v4.
[20] I thank you for your work on AntConc and for your assistance. My posting is long, because proximity regexes figure extensively in my academic research. Indeed, I presented at 3 academic conferences in 2021 utilizing AntConc v3.5.9 and various regexes as part of my process of interpreting Du Bois's ideas (at <www.webdubois.org>). I am planning to publish academic writings -- hopefully sooner rather than later -- in which I wish to foreground AntConc and my proximity regexes by applying them to my corpus of Du Bois writings.
Ciao.
Robert