I am writing to share the results of an automated code review of the Harbour core tree, carried out by Grok Build (xAI’s agentic coding environment in Cursor).
The review covered the compiler, VM, RTL, RDD drivers, hbmk2, CI configuration, bundled third-party libraries, and the manual test corpus. It was read-only: no pull requests or GitHub issues have been opened yet. I am posting here first to check whether these findings match known issues and what priority the project would assign before any patches are proposed.
───
About the review
Grok Build explored the repository systematically: source search for FIXME/TODO markers, inspection of critical subsystems (src/vm/, src/rtl/, src/rdd/, src/compiler/), CI workflows, contrib build logic, and bundled dependency versions. Findings were grouped by severity and area. This message is a summary; a longer write-up with detailed citations can be shared if useful.
───
Critical runtime issues
1. GC objects may be reused while other threads are still running (MT)
In src/vm/garbage.c, hb_gcCollectAll() calls hb_vmResumeThreads() before the destructor pass (clear()) and before freed blocks are released (HB_GARBAGE_FREE()).
hb_vmSuspendThreads() stops other threads so collection can proceed safely; the MTNOTE in the same function states that s_bCollecting and related state are only safe to change while other threads are stopped. Yet other threads are resumed while deleted blocks are still being torn down.
In practice, another thread can allocate or touch memory that is still being finalized — objects may be reused or referenced before cleanup has finished. That creates a race between GC teardown and normal VM execution: use-after-free, double-free, or corrupted object graphs in MT builds.
hb_vmResumeThreads() should run only after destructors and block freeing complete (still after hb_xclean()), so the full cleanup stays inside the window opened by a successful hb_vmSuspendThreads().
2. HRB loader — out-of-bounds read
In src/vm/runner.c, hb_hrbReadId() uses if (*pnBodyOffset > nBodySize) before reading. When *pnBodyOffset == nBodySize, the loop can still read one byte past the buffer end. Malformed or hostile .hrb files may trigger an out-of-bounds read. The analysis suggests using >= before each access, with consistent bounds checks throughout the loader.
3. HRB format — symbol scope truncated to 8 bits
In src/compiler/genhrb.c, symbol scope is written as (HB_BYTE) pSym->cScope (with an existing FIXME). src/vm/runner.c loads it into a 16-bit field. Upper scope flags are lost unless partially restored at link time. A versioned HRB format with at least 16-bit scope may be needed for correct linking and INIT/EXIT behaviour.
4. hb_dynsymEval() — callback outside dynsym lock
In src/vm/dynsym.c, the dynamic symbol pointer is passed to the user callback after HB_DYNSYM_UNLOCK(). In MT builds, another thread can resize the symbol table during the callback. hb_dynsymProtectEval() keeps the lock for the full walk — the two APIs appear inconsistent.
5. File I/O races (MT / shared RDD)
• src/rtl/filebuf.c — explicit FIXMEs on share-lock and close() races in MT mode.
• src/rtl/filesys.c — on platforms without pread/pwrite, seek+read/write are not atomic. Aliased work areas sharing one OS handle may interleave seeks and corrupt DBF/index I/O.
6. ASORT — signed index engine
In src/vm/asort.c, quicksort uses HB_ISIZ (signed) indices while arrays use HB_SIZE. Very large arrays risk index wrap and out-of-bounds access in pBaseArray->pItems.
───
Medium-severity runtime issues (summary)
• RDD: CDX update race around shared/network databases (dbfcdx1.c); NTX large-file open affecting unlocking (dbfntx1.c); incorrect DBOI_RECNO vs DBOI_KEYNORAW semantics (dbfcdx1.c); incomplete NSX cleanup on create failure; DBF PutRec allowing invalid deleted/encrypted markers; __dbArrange missing NULL/same-area validation (dbcmd.c); incomplete SKIPSCOPE() migration (workarea.c, dbf1.c).
• Threading: condition waits without while loops on some non-pthread paths; hb_threadQuitRequest() not guaranteed.
• Memory / GC: HB_PARANOID_MEM_CHECK off by default; hb_xquery() returning placeholder values on some Unix paths; hb_gcRefFree() edge cases around half-destroyed blocks.
• RTL: hb_fsCurDir() not thread-safe; broken keyboard handling in gtsln; extrap.c still using IsBadReadPtr on Windows.
• VM: virtual-memory APIs in vm.c are stubs returning 0/NULL.
doc/todo.txt also lists an open item: "Clean RDD code to be safe for return from RT errors" — several RDD findings may relate to that.
───
CI, build system, and dependencies
• CI checkout: all GitHub workflows use ref: ${{ github.event.client_payload.branch }}; for normal push/pull_request events this may be empty, risking builds of the wrong commit.
• Silent contrib skips: contrib/make.hb treats hbmk2 exit codes 10, 20, and 50 as non-fatal, so CI can pass while contribs are skipped.
• Legacy CI: .travis.yml remains with obsolete package names; GitHub Actions is the active pipeline.
• Workflow bugs: macos-ci.yml cache key references matrix.cpu but the matrix has no cpu dimension; HB_BUILD_TEST=strict is barely used outside contrib/hbssl.
• Test gap: CI runs mainly hbtest (plus one SQLite test on Linux). Large manual suites under tests/, tests/mt/, tests/rddtest/, and contrib/hbct/tests/ are not automated.
• Stale bundled libs: SQLite 3.44.2 (while sqlite3.hbp metadata still says 3.8.2), expat 2.2.5, PCRE 8.45, old minizip, libhpdf RC.
• Build security: hbmk2 backtick substitution in .hbc/.hbm can execute shell commands at build time — worth documenting or restricting.
• Documentation drift: README references contrib/hbqt which is not in the tree; outdated Debian package names in README vs current CI.
───
Suggested priorities (from the Grok Build report)
┌──────────┬────────────────────────────────────────────────────────────────────┐
│ Priority │ Area │
├──────────┼────────────────────────────────────────────────────────────────────┤
│ P0 │ MT GC: resume threads only after destructor pass and block freeing │
├──────────┼────────────────────────────────────────────────────────────────────┤
│ P0 │ HRB bounds checks and 16-bit symbol scope │
├──────────┼────────────────────────────────────────────────────────────────────┤
│ P1 │ dynsym eval locking; atomic RDD file I/O on non-Unix paths │
├──────────┼────────────────────────────────────────────────────────────────────┤
│ P1 │ CI checkout ref; contrib failure policy in CI │
├──────────┼────────────────────────────────────────────────────────────────────┤
│ P2 │ Automate hbct / rddtest / MT tests; refresh bundled deps and docs │
└──────────┴────────────────────────────────────────────────────────────────────┘
───
Questions for the list
1. Is the GC/thread ordering issue in garbage.c already known? Are there destructor or hb_xclean() paths that require other threads to run during cleanup?
2. How many of the remaining items are already tracked internally?
3. Is a new HRB format version acceptable, or should fixes stay within the current on-disk layout?
4. Which RDD drivers (CDX, NTX, NSX) are still actively maintained for multi-user deployments?
5. What is the preferred contribution path for Grok Build–assisted fixes: individual PRs per issue, or a focused stack in priority order?
If the list finds this useful, I can follow up with GitHub issues and/or PRs for agreed items. The complete Grok Build report (with file:line references) is available on request.
Thank you for maintaining Harbour.
Best regards,
Antonio