What am I missing in this regular expression?


Neil

Apr 16, 2026, 3:34:03 PM
to NVDA Screen Reader Discussion
I'm trying to write a speech dictionary regexp to filter out lists of multiple in-line citations, replacing them with the word "citations".  If this can be refined into something robust, I highly recommend including it in the manual as a one-sentence example of what regular expressions can do; right now, users are left replacing words and phrases with "frog." My working pattern is already extraordinarily helpful to me in academic research, and I'm kicking myself for not exploring the feature years ago. Students would benefit tremendously.
What I have gotten working so far is fine for my purposes, but I'm also trying to learn regexp. So, any tutelage would be welcome.

The working pattern:
\([^)]*(19|20)[0-9]{2}[^)]+\)
...successfully filters things like "(Calvin and Hobbs, 1991; Swindle and Crouch 2007, spaghetti)".
As desired, single citations pass through: (Ragamuffin 2026) or (Atilla the Hun 453).
A shortcoming is that "(The 1973 Rehabilitation Act)" matches the pattern.
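A minimal sketch of how the working pattern behaves on the examples above, run in plain Python's re module rather than inside NVDA (so whatever NVDA does to the text before matching is not modelled). The variable names are illustrative only.

```python
import re

# The working pattern from the post, unchanged.
WORKING = re.compile(r"\([^)]*(19|20)[0-9]{2}[^)]+\)")

samples = [
    "(Calvin and Hobbs, 1991; Swindle and Crouch 2007, spaghetti)",
    "(Ragamuffin 2026)",
    "(Atilla the Hun 453)",
    "(The 1973 Rehabilitation Act)",
]

# Record whether each sample would be replaced by the dictionary entry.
results = {s: WORKING.search(s) is not None for s in samples}

for text, matched in results.items():
    print("match" if matched else "pass ", text)
```

The last sample matches because the pattern only asks for one year followed by at least one more character before the closing parenthesis.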

Nothing I try to do to refine it following the first date match works.
Copilot's suggestion:
\([^)]*(19|20)[0-9]{2}[^)]*(19|20)[0-9]{2}[^)]*\)

I've also tried "\([^)]*(19|20)[0-9]{2}[\];,]" followed by the rest of the working pattern, since multiple citations would be separated by a semicolon or comma (I'm throwing in the recognition that the date might be in brackets).
These and many other variants break. Regexps aren't always evaluated left to right, so maybe I'm missing something in there.
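For comparison, a sketch that runs Copilot's two-date suggestion in plain Python (the re module, outside NVDA). In that setting the pattern does seem to separate the examples from this message, but this says nothing about how it behaves inside a speech dictionary. Names are illustrative.

```python
import re

# Copilot's suggestion from the post, unchanged: two years inside one
# pair of parentheses.
TWO_DATES = re.compile(
    r"\([^)]*(19|20)[0-9]{2}[^)]*(19|20)[0-9]{2}[^)]*\)"
)

samples = [
    "(Calvin and Hobbs, 1991; Swindle and Crouch 2007, spaghetti)",
    "(Ragamuffin 2026)",
    "(The 1973 Rehabilitation Act)",
]

# True means the text would be replaced, False means it passes through.
two_date_results = {s: TWO_DATES.search(s) is not None for s in samples}

for text, matched in two_date_results.items():
    print("match" if matched else "pass ", text)
```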

Andrew Downie

Apr 16, 2026, 10:08:56 PM
to nvda-...@nvaccess.org
Hi Neil
A colleague and I joke that regular expressions are wonderful in that one can spend an hour getting one to work and save oneself five minutes.  But, despite the frustration that can occur, they are a great resource.
I have written a series so that NVDA announces numbers more eloquently, including putting "dollar" after the number.  One thing I found was that they only work if put into the specific voice dictionary rather than the default.
I learned most of my regex from the help file in NoteTab.  I have NoteTab Pro, which costs money, but I am pretty sure the free version also includes the regex stuff.  One difference is that their flavour uses $ for subpatterns, whereas NVDA needs \.
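To illustrate the $ versus \ difference in Python's re module, here is a small sketch. The pattern and sample sentence are hypothetical, not the actual dictionary entries described above.

```python
import re

# Hypothetical entry: move the dollar sign after the number as a word.
pattern = re.compile(r"\$([0-9]+)")
text = "It costs $25 today"

# In Python's re, \1 in the replacement refers to the first group.
with_backslash = pattern.sub(r"\1 dollars", text)

# A $1 in the replacement is not a group reference here; it is inserted
# as the literal characters "$1".
with_dollar = pattern.sub("$1 dollars", text)

print(with_backslash)
print(with_dollar)
```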
If you want to drop a sample of the offending text into a reply, I would be happy to have a play with it, but with no guarantees that I would improve on what you have done already.

Andrew



Neil

Apr 17, 2026, 10:33:08 AM
to NVDA Screen Reader Discussion, Andrew Downie

Hi Andrew,

Props for NoteTab Pro. I used it twenty years ago, but I've long since lost the license, and it appears to largely have been abandonware for well over a decade. I never got too far into its scripting.

NVDA uses Python's re engine. There's a different, more limited engine type called PCE or something like that used by, for example, Notepad++. I'll bet NoteTab does, too.

The "offending text" was what I mentioned in that example in my original message; any text after a first date will match the pattern, but I can't find any way to test for a second date in order to make it much more likely that what's being matched is a list of two or more citations. Beyond that, the most robust pattern ready for widespread distribution would really nail it down by checking for semicolons or commas between citations other than the final match, as well as accounting for citations enclosed in brackets, which happens when listing a prior original publication date (Andrew 2026 [2023]) or when there's a parenthetical phrase accompanied by a citation (I ran into this the other day [Vivian, 2015]; Le Blanc [2021]). My rudimentary working pattern provided earlier catches the first case but not the second, which is itself baffling. Either I'm missing something basic, or else NVDA is doing something unexpected in how it sends strings to the engine. It'd be quite some pattern to catch all that, but I know it's possible. Poses an interesting master class quiz question.

joseph....@gmail.com

Apr 17, 2026, 10:38:03 AM
to nvda-...@nvaccess.org

Hi,

To reduce confusion for others: this is for citation style (I believe for APA (American Psychological Association) style). People will encounter text like what we are talking about in research papers and other academic (and sometimes non-academic) settings.

Cheers,

Joseph

Neil

Apr 17, 2026, 6:59:22 PM
to NVDA Screen Reader Discussion, joseph....@gmail.com
So, I was right about missing something, and I knew just what page to review, and now my brain has been permanently rearranged. The problem was [greedy quantifiers](https://www.regular-expressions.info/repeat.html#lazy). I can't say that what I've come up with won't break, or that it's really doing what I think it is, but I'm quitting while I'm ahead.
If you need a multiple inline citations filter in your speech dictionary, you know it. As an academic, it's a life-saver: I was in the middle of a book where every other sentence had strings of 3-10 citations. There are iOS e-reader apps that do much the same filtering. It'd be great if something like this landed in the user manual as an example.

Here is the pattern (should work with APA, Chicago, MLA, etc.):

'''
\([^)]*?(19|20)[0-9]{2}[;\],a-z]{1,2}[^)]+?(19|20)[0-9]{2}.*?\)
'''

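This looks for two years between 1900 and 2099 (20th or 21st century) within parentheses. The first date must be followed by one or two characters that are each a letter (a-z), right bracket, comma, or semicolon, and at least one more character must come before the second date. Any text can precede the dates or come after the last date (including more dates).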

Here are examples of what it will and won't match, at least in my test file.

| Sample matches | Things that pass through |
| --- | --- |
| (Akbar 1920, z 1921) | (Einstein 2008 [1905]) |
| (Calvin and Hobbs, 1991; Swindle and Crouch 2007, spaghetti) | (Gillis1920, x 1921, zoinks) |
| (2021a, 2021b) | (it happened in 1929 or 1930) |

So why on the second row does the right-hand example not match? A word NVDA marks as misspelled will break it during reading. Interestingly, when I selected the right-hand example to copy, it did match and NVDA said "citations".
Also, this might be obvious, but this filter works best when reading by paragraph or sentence, because citation lists spanning two lines won't get caught when arrowing up/down.
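A sketch that checks the pattern against the rows of the table in plain Python (the re module, outside NVDA). The helper name `speak` is hypothetical; it only imitates a dictionary entry whose replacement is "citations". Note that in this setting the "Gillis1920" example does match, which is consistent with it matching when the text was selected rather than read.

```python
import re

# Final pattern from the post, unchanged.
CITATIONS = re.compile(
    r"\([^)]*?(19|20)[0-9]{2}[;\],a-z]{1,2}[^)]+?(19|20)[0-9]{2}.*?\)"
)

def speak(text: str) -> str:
    """Hypothetical stand-in for a speech dictionary entry."""
    return CITATIONS.sub("citations", text)

table_examples = [
    "(Akbar 1920, z 1921)",
    "(Calvin and Hobbs, 1991; Swindle and Crouch 2007, spaghetti)",
    "(2021a, 2021b)",
    "(Einstein 2008 [1905])",
    "(Gillis1920, x 1921, zoinks)",
    "(it happened in 1929 or 1930)",
]

# True means the whole parenthetical would be spoken as "citations".
matched = {s: CITATIONS.search(s) is not None for s in table_examples}

for text, hit in matched.items():
    print("match" if hit else "pass ", text)

print(speak("See (Akbar 1920, z 1921) for details."))
```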

Quentin Christensen

Apr 19, 2026, 11:58:53 PM
to nvda-...@nvaccess.org, joseph....@gmail.com
Neil,

Thanks for sharing the update!

While undoubtedly a useful example, it might be a little complex for the user guide, given we aren't trying to teach Regex there.  I just had a look to see what we do say about regular expressions, and the most relevant section is here: https://download.nvaccess.org/documentation/userGuide.html#SpeechDictionaries


Though perhaps a link to https://www.regular-expressions.info/ might be worthwhile

I've also used https://regex101.com/ for testing expressions.  I had a good link to a series of lessons, but I can't find it offhand.



--

Quentin Christensen
Training and Support Manager

NV Access

Subscribe to email updates (blog, new versions, etc): https://nvaccess.org/newsletter

Kavein Thran

Apr 20, 2026, 10:36:53 PM
to nvda-...@nvaccess.org, joseph....@gmail.com
Hi,
Linking some useful info on regular expressions in the user guide would be really useful. A series of lessons on regular expressions sounds fun too; if you find the link by any chance, please share it here.

Regards,
Kavein
Kaveinthran (He/Him)
Curious, Native Blind

Disabled independent Human Rights Advocate
email: kavei...@gmail.com


Neil

Apr 22, 2026, 11:36:11 AM
to NVDA Screen Reader Discussion, Kavein Thran, joseph....@gmail.com
Kavein, Joseph linked several resources above. There's a link to the Python regexp tutorial in the manual already. However, I used the regular-expressions.info tutorial he linked to, because the one NVDA links to quickly goes into coding examples that would be more relevant to addon developers.
The tricky part for me learning them has been what the tutorial refers to as "greedy" quantifiers. As I understand it, the asterisk grabs as many characters as it can, swallowing the character patterns in between because they match it too, and only gives characters back if the rest of the pattern then fails to match. Put a question mark after it, and it becomes "lazy": it matches as little as possible and starts looking for the next part of the pattern immediately. HTH.
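A small sketch of the greedy versus lazy difference in plain Python's re module. The sample text and the capturing groups are made up for illustration; they only expose which part of the text each quantifier ends up consuming.

```python
import re

text = "(a 1991; b 2007)"

# Greedy: [^)]* takes as much as it can, then backs off until a year fits.
greedy = re.search(r"\(([^)]*)((?:19|20)[0-9]{2})", text)

# Lazy: [^)]*? takes as little as it can, growing only when needed.
lazy = re.search(r"\(([^)]*?)((?:19|20)[0-9]{2})", text)

print(repr(greedy.group(1)), greedy.group(2))
print(repr(lazy.group(1)), lazy.group(2))
```

Both forms find a match; they differ in which year is reached first. The greedy version lands on the last year in the parentheses, the lazy version on the first.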
For a great use of regular expressions, see Cary Mansien's Markdown Navigator addon.
I don't think an NVDA user will understand what regular expressions can really do for them just by reading the speech dictionary section of the manual. They'd have to follow the provided link and read it for themselves, which is a barrier (the opposite of what a user guide is for). I think a complicated example of some kind would tell them immediately if it's something they want to explore. What the manual says on the topic is: "A regular expression is a pattern containing special symbols that allow you to match on more than one character at a time, or match on just numbers, or just letters, as a few examples. Regular expressions are not covered in this user guide." Meanwhile, what I used it for in NVDA has literally been life-altering, because I'm an academic. I had no idea speech dictionaries could be that powerful, or I would have benefited from it many years ago. But I never seem to align with the priorities of developers, bless their hearts.

Andrew Downie

Apr 22, 2026, 8:44:54 PM
to nvda-...@nvaccess.org, Kavein Thran, joseph....@gmail.com
Neil, I fully appreciate your enthusiasm.  My doubt about having more details about regex in the NVDA manual is that it's hard enough to get people to read basic information.  Another resource like those dealing with NVDA and Word etc may be a more viable option.  While this discussion is around using it in the NVDA speech dictionaries, regex has wider applications.  For example, I have just used it in a JavaScript resource and use it in several AutoHotkey scripts.  I don't see linking to an external resource as a barrier - just one press of Enter.  Just some food for thought.


Andrew

