Hi everyone,

While working on the RISC-V ISA Explorer challenge, I scanned the AsciiDoc source files of the ISA manual and extracted extension names with regex patterns. One issue I ran into was false positives: author surnames like Zhang or Zabrocki, and prose words like Scalar or Scatter, match the same patterns as extension names (for example, the Z-extensions). I handled this with a curated stopword list, but that feels fragile as the manual evolves. Is there a more robust approach that the community uses when extracting structured data from AsciiDoc sources, such as parsing only section headers or extension definition blocks?
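To make the "parse only section headers" idea concrete, here is a minimal sketch. It assumes, without having checked every chapter, that an extension's section title leads with the name, either in double quotes or followed by a colon, and that the candidate pattern also covers S-prefixed names (the post implies this but does not state it). The function name, the regex, and the sample text are all hypothetical illustrations, not something taken from the manual's tooling.

```python
import re

# Hypothetical pattern: an AsciiDoc section title (1-6 '=' then whitespace)
# whose first token is an extension-like name, either "Quoted" or Bare:.
# Assumption: real titles in the manual may use other shapes.
HEADER_EXT = re.compile(
    r'^={1,6}\s+'
    r'(?:"(?P<quoted>[ZS][a-z][a-z0-9]*)"'
    r'|(?P<bare>[ZS][a-z][a-z0-9]*):)'
)


def extensions_from_headers(asciidoc_text):
    """Return extension-like names found at the start of section titles."""
    found = []
    for line in asciidoc_text.splitlines():
        match = HEADER_EXT.match(line)
        if not match:
            continue  # body prose and non-matching titles are skipped
        name = match.group("quoted") or match.group("bare")
        if name not in found:
            found.append(name)  # keep first-seen order, no duplicates
    return found


# Made-up sample text, only to exercise the function.
SAMPLE = """\
== "Zicsr" Extension for Control and Status Register Instructions
Contributors include Zhang and Zabrocki.
=== Zba: Address generation
=== Scalar Move Instructions
Scatter stores are discussed in prose.
== Supervisor-Level ISA
=== "Svinval" Extension for Fine-Grained Address-Translation Cache Invalidation
"""

print(extensions_from_headers(SAMPLE))  # prints ['Zicsr', 'Zba', 'Svinval']
```

Anchoring on the title's leading token means surnames and prose words in the body are never scanned, and titles such as "Scalar Move Instructions" or "Supervisor-Level ISA" are rejected because the first word is neither quoted nor followed by a colon. It would still miss any extension whose title uses a different shape, so this narrows the stopword problem rather than removing it.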
@Andrew Waterman Do I infer correctly from the plural that such macros
will be used systematically in future updates of the manual? I think that
will enable automatic extraction of as much semantics as we care to define
macros for! (That is in addition to this specific case, where the task was “to extract
extension names” and they “happen to help out…”.)
Thanks,
Ajit
====
From: 'Andrew Waterman' via RISC-V ISA Dev <isa...@groups.riscv.org>
Sent: Monday, May 18, 2026 2:06 PM
To: Yash Kaushik <yash005...@gmail.com>
Cc: RISC-V ISA Dev <isa...@groups.riscv.org>
Subject: Re: [isa-dev] Handling false positives in regex-based extension scanning of AsciiDoc sources