Automated code review findings (Grok Build) — MT GC ordering, HRB loader, RDD I/O, CI gaps

20 views
Skip to first unread message

Antonio Linares

unread,
7:39 AM (7 hours ago) 7:39 AM
to Harbour Developers
I am writing to share the results of an automated code review of the Harbour core tree, carried out by Grok Build (xAI’s agentic coding environment in Cursor).

The review covered the compiler, VM, RTL, RDD drivers, hbmk2, CI configuration, bundled third-party libraries, and the manual test corpus. It was read-only: no pull requests or GitHub issues have been opened yet. I am posting here first to check whether these findings match known issues and what priority the project would assign before any patches are proposed.

───

About the review

Grok Build explored the repository systematically: source search for FIXME/TODO markers, inspection of critical subsystems (src/vm/, src/rtl/, src/rdd/, src/compiler/), CI workflows, contrib build logic, and bundled dependency versions. Findings were grouped by severity and area. This message is a summary; a longer write-up with detailed citations can be shared if useful.

───

Critical runtime issues

1. GC objects may be reused while other threads are still running (MT)

In src/vm/garbage.c, hb_gcCollectAll() calls hb_vmResumeThreads() before the destructor pass (clear()) and before freed blocks are released (HB_GARBAGE_FREE()).

hb_vmSuspendThreads() stops other threads so collection can proceed safely; the MTNOTE in the same function states that s_bCollecting and related state are only safe to change while other threads are stopped. Yet other threads are resumed while deleted blocks are still being torn down.

In practice, another thread can allocate or touch memory that is still being finalized — objects may be reused or referenced before cleanup has finished. That creates a race between GC teardown and normal VM execution: use-after-free, double-free, or corrupted object graphs in MT builds.

hb_vmResumeThreads() should run only after destructors and block freeing complete (still after hb_xclean()), so the full cleanup stays inside the window opened by a successful hb_vmSuspendThreads().

2. HRB loader — out-of-bounds read

In src/vm/runner.c, hb_hrbReadId() uses if (*pnBodyOffset > nBodySize) before reading. When *pnBodyOffset == nBodySize, the loop can still read one byte past the buffer end. Malformed or hostile .hrb files may trigger an out-of-bounds read. The analysis suggests using >= before each access, with consistent bounds checks throughout the loader.

3. HRB format — symbol scope truncated to 8 bits

In src/compiler/genhrb.c, symbol scope is written as (HB_BYTE) pSym->cScope (with an existing FIXME). src/vm/runner.c loads it into a 16-bit field. Upper scope flags are lost unless partially restored at link time. A versioned HRB format with at least 16-bit scope may be needed for correct linking and INIT/EXIT behaviour.

4. hb_dynsymEval() — callback outside dynsym lock

In src/vm/dynsym.c, the dynamic symbol pointer is passed to the user callback after HB_DYNSYM_UNLOCK(). In MT builds, another thread can resize the symbol table during the callback. hb_dynsymProtectEval() keeps the lock for the full walk — the two APIs appear inconsistent.

5. File I/O races (MT / shared RDD)

• src/rtl/filebuf.c — explicit FIXMEs on share-lock and close() races in MT mode.
• src/rtl/filesys.c — on platforms without pread/pwrite, seek+read/write are not atomic. Aliased work areas sharing one OS handle may interleave seeks and corrupt DBF/index I/O.

6. ASORT — signed index engine

In src/vm/asort.c, quicksort uses HB_ISIZ (signed) indices while arrays use HB_SIZE. Very large arrays risk index wrap and out-of-bounds access in pBaseArray->pItems.

───

Medium-severity runtime issues (summary)

• RDD: CDX update race around shared/network databases (dbfcdx1.c); NTX large-file open affecting unlocking (dbfntx1.c); incorrect DBOI_RECNO vs DBOI_KEYNORAW semantics (dbfcdx1.c); incomplete NSX cleanup on create failure; DBF PutRec allowing invalid deleted/encrypted markers; __dbArrange missing NULL/same-area validation (dbcmd.c); incomplete SKIPSCOPE() migration (workarea.c, dbf1.c).
• Threading: condition waits without while loops on some non-pthread paths; hb_threadQuitRequest() not guaranteed.
• Memory / GC: HB_PARANOID_MEM_CHECK off by default; hb_xquery() returning placeholder values on some Unix paths; hb_gcRefFree() edge cases around half-destroyed blocks.
• RTL: hb_fsCurDir() not thread-safe; broken keyboard handling in gtsln; extrap.c still using IsBadReadPtr on Windows.
• VM: virtual-memory APIs in vm.c are stubs returning 0/NULL.

doc/todo.txt also lists an open item: "Clean RDD code to be safe for return from RT errors" — several RDD findings may relate to that.

───

CI, build system, and dependencies

• CI checkout: all GitHub workflows use ref: ${{ github.event.client_payload.branch }}; for normal push/pull_request events this may be empty, risking builds of the wrong commit.
• Silent contrib skips: contrib/make.hb treats hbmk2 exit codes 10, 20, and 50 as non-fatal, so CI can pass while contribs are skipped.
• Legacy CI: .travis.yml remains with obsolete package names; GitHub Actions is the active pipeline.
• Workflow bugs: macos-ci.yml cache key references matrix.cpu but the matrix has no cpu dimension; HB_BUILD_TEST=strict is barely used outside contrib/hbssl.
• Test gap: CI runs mainly hbtest (plus one SQLite test on Linux). Large manual suites under tests/, tests/mt/, tests/rddtest/, and contrib/hbct/tests/ are not automated.
• Stale bundled libs: SQLite 3.44.2 (while sqlite3.hbp metadata still says 3.8.2), expat 2.2.5, PCRE 8.45, old minizip, libhpdf RC.
• Build security: hbmk2 backtick substitution in .hbc/.hbm can execute shell commands at build time — worth documenting or restricting.
• Documentation drift: README references contrib/hbqt which is not in the tree; outdated Debian package names in README vs current CI.

───

Suggested priorities (from the Grok Build report)

┌──────────┬────────────────────────────────────────────────────────────────────┐
│ Priority │ Area                                                               │
├──────────┼────────────────────────────────────────────────────────────────────┤
│ P0       │ MT GC: resume threads only after destructor pass and block freeing │
├──────────┼────────────────────────────────────────────────────────────────────┤
│ P0       │ HRB bounds checks and 16-bit symbol scope                          │
├──────────┼────────────────────────────────────────────────────────────────────┤
│ P1       │ dynsym eval locking; atomic RDD file I/O on non-Unix paths         │
├──────────┼────────────────────────────────────────────────────────────────────┤
│ P1       │ CI checkout ref; contrib failure policy in CI                      │
├──────────┼────────────────────────────────────────────────────────────────────┤
│ P2       │ Automate hbct / rddtest / MT tests; refresh bundled deps and docs  │
└──────────┴────────────────────────────────────────────────────────────────────┘

───

Questions for the list

1. Is the GC/thread ordering issue in garbage.c already known? Are there destructor or hb_xclean() paths that require other threads to run during cleanup?
2. How many of the remaining items are already tracked internally?
3. Is a new HRB format version acceptable, or should fixes stay within the current on-disk layout?
4. Which RDD drivers (CDX, NTX, NSX) are still actively maintained for multi-user deployments?
5. What is the preferred contribution path for Grok Build–assisted fixes: individual PRs per issue, or a focused stack in priority order?

If the list finds this useful, I can follow up with GitHub issues and/or PRs for agreed items. The complete Grok Build report (with file:line references) is available on request.

Thank you for maintaining Harbour.

Best regards,
Antonio

Gerald Drouillard

unread,
8:32 AM (6 hours ago) 8:32 AM
to harbou...@googlegroups.com
On Sat, Jun 27, 2026 at 7:39 AM Antonio Linares <antonio....@gmail.com> wrote:

Questions for the list

1. Is the GC/thread ordering issue in garbage.c already known? Are there destructor or hb_xclean() paths that require other threads to run during cleanup?
True MT harbour dbfcdx apps always take more effort than expected.  Some areas of long running code without wait states do require hb_gcall() calls.  For me hb_curl required the most work to get it to work correctly when dealing with multiple calls or instances.

2. How many of the remaining items are already tracked internally?
IMO, harbour is very stable.

  
4. Which RDD drivers (CDX, NTX, NSX) are still actively maintained for multi-user deployments?
I have large decades old apps using DBFCDX.  For web apps with the DBF's on the same server as the CGI app, I think DBFCDX is very solid because you remove the network layer overhead and latency.  Pre-adding blank records during non peak times can really boost performance and stability.  Some SQL engines do something similar.
I converted a large DBFCDX app (cgi and console) to SQLRDD (pg 17), Data Volume: 254 GB of historical text data not including the images, Tables Migrated: 496,Records Processed: 515,198,426.  Performance Improvements
✓ Query response times will be maintained when concurrent load increases
 ✓ Concurrent user capacity increased from 50 to 500+
Modern Features Enabled
✓ Real-time data replication, onsite and off, and high availability
Security Enhancements
✓ Data is encrypted at rest
✓ Modernized the encryption of sensitive data
 
5. What is the preferred contribution path for Grok Build–assisted fixes: individual PRs per issue, or a focused stack in priority order?
I can see the size, scope and frequency of changes to all languages, apps and OS's increasing in the next months.  Harbour is very mature in that when something is modified the blast radius is usually very isolated.  Something at this level needs a new version, like how the Linux kernel is released.
 

If the list finds this useful, I can follow up with GitHub issues and/or PRs for agreed items. The complete Grok Build report (with file:line references) is available on request.

Thank you for all you do for the community Antonio.

Antonio Linares

unread,
9:00 AM (6 hours ago) 9:00 AM
to Harbour Developers
I have checked the same using Claude Code, just to check if they both report the same:

I'm Claude Code (an AI coding assistant from Anthropic) against a current checkout of the master branch. We performed a focused static audit of the core C sources — src/vm, src/rtl, src/rdd, src/common, src/pp, src/macro, src/compiler — looking specifically for memory-safety and arithmetic defects (buffer overruns, integer/cast issues, NULL derefs, leaks, signed/unsigned mistakes).

First, a caveat: the code is very mature and overwhelmingly correct. The vast majority of files came back clean. The findings below are the small set that survived manual verification against the source. I've verified the top items by hand, but I have not built or run a test for each, so please treat them as candidates for your review rather than confirmed reports.

Logic defects reachable in normal use

1. src/rdd/dbf1.c:525 — cast precedence in the AUTOINC blank-record path:
pPtr[ --ui ] = ( HB_BYTE ) nValue % 10 + '0';   /* parses as ((HB_BYTE)nValue) % 10 */
1. nValue is truncated to 8 bits before the modulo, so numeric auto-increment fields produce wrong ASCII digits once the counter passes 255. The two sibling routines (:1207, :1328) already do it correctly. Suggested fix: ( HB_BYTE ) ( nValue % 10 ) + '0'.
2. src/rdd/dbfnsx1.c:7704 and src/rdd/dbfntx1.c:7330 — DBOI_RELKEYPOS cast precedence:
ulRecNo = ( HB_ULONG ) dPos * ulRecCount + 1;   /* (HB_ULONG)dPos == 0 for 0 < dPos < 1 */
2. With no controlling order, any fractional position collapses to record 1 instead of a proportional record. Suggested fix: ( HB_ULONG ) ( dPos * ulRecCount ) + 1.
3. src/vm/hvm.c:11012 — copy/paste in hb_xvmGreaterEqualThenIntIs: the >= path raises subcode 1074 / operator "<=". The sibling at :10958 uses 1076 / ">=". Cosmetic but produces a misleading error object.

Crashes reachable from ordinary or hostile input

4. src/vm/hashes.c:223 — hb_hashResort() calls hb_xgrab( nLen * sizeof(...) ) with nLen == 0, which raises HB_EI_XGRABNULLSIZE and aborts the process. Reachable from documented PRG:
clipper
hb_HSort( hb_HAllocate( hb_Hash(), 10 ) )
hb_HKeepOrder( hb_HAllocate( hb_Hash(), 10 ), .F. )
4. Suggested fix: early return when pBaseHash->nLen == 0 (the only caller that frees pnPos does so after the call, so this is safe).
5. src/common/expropt2.c — constant folding of / (:206) and % (:147) guards only the zero divisor, not -1. Compiling ? 0x8000000000000000 / -1 (or % -1) evaluates LONG_MIN / -1 at compile time — signed overflow, SIGFPE on x86/x64, i.e. a compiler crash. The + - * folders promote to double and range-check; unary minus special-cases LONG_MIN; div/mod are the only unguarded arithmetic folders. Suggested fix: special-case divisor -1 (mod result is 0; div emits the value as a double when the dividend is LONG_MIN).
6. src/rdd/delim1.c:730-731 — for a trailing TIMESTAMP/TIME/MODTIME field, pFieldPtr[ pField->uiLen ] reads and writes one byte past the DELIM record buffer (allocated uiRecordLen + 1 then pRecord++). The equivalent SDF code is safe because SDF reserves the EOL bytes. Suggested fix: allocate one extra guard byte in hb_delimInitArea.

Hardening against crafted/corrupt input (lower priority)

- src/rtl/itemseri.c and src/rtl/hbjson.c (:579, :615) — unbounded recursion in deserialize / JSON-decode; deeply nested input overflows the C stack. hb_deserializeTest already prevents OOB reads, so only a depth cap is needed.
- src/rdd/dbfntx1.c — hb_ntxStrToNum copies an on-disk KeyLength into szBuffer[NTX_MAX_KEY + 1] with no upper-bound check; a crafted .ntx overflows the stack buffer. Validating key_size <= NTX_MAX_KEY in hb_ntxTagLoad would close it.
- src/rdd/sxtable.c:470,473 (_SXOPENINIT) — argument indices look crossed: hb_parl( 2 ) guarded by HB_ISLOG( 3 ), and hb_parc( 1 ) guarded by HB_ISCHAR( 4 ).
- src/rdd/dbfnsx1.c:2486, src/vm/memvars.c:1638 (RESTORE FROM), src/rtl/hbstrfmt.c (unbounded hb_StrFormat width) — pointer-arithmetic / unbounded-length issues that are latent or only triggerable by hostile files; details on request.

I can prepare patches for any of these if useful. Happy to provide the full per-file reasoning, repro cases, or narrow down anything that looks off.

Thanks for all the work on Harbour,

Antonio Linares

unread,
9:15 AM (6 hours ago) 9:15 AM
to Harbour Developers
Part 2 — Cross-check of the Grok Build report

We verified Grok's two "P0 runtime" items against the source. Both look like false positives, and the CI/dependency observations are the stronger part of that report.

- GC thread ordering (garbage.c, Grok P0): the order it flags is intentional. Destructors (clear(), line 686) run user-level PRG (__destruct) and must execute with the VM live, so hb_vmResumeThreads() (line 677) deliberately precedes them. The deleted blocks are already unreachable; s_bCollecting stays HB_TRUE through the whole teardown (until line 714), so concurrent collectors bail (lines 581/585); and the resurrection re-check at lines 697-707 handles a destructor re-referencing a block. The MTNOTE it cites concerns s_bCollecting, which is held throughout. We don't see the UAF/double-free. Confirmation from the maintainers would be welcome, but it reads as by-design.
- HRB loader OOB read (runner.c hb_hrbReadId, Grok P0): harmless in practice. hb_fileLoadData always NUL-terminates the buffer at [nBodySize] (filebuf.c:1615), and HB character strings are NUL-terminated too, so the > nBodySize check stops on that guard byte before any real over-read. >= would be marginally cleaner but fixes no actual bug.
- HRB 8-bit symbol scope; dynsymEval locking; filebuf.c/filesys.c I/O races: these correspond to existing FIXME markers / known design limits rather than new findings — worth tracking, but already on record.
- CI / build / dependencies: this is the most actionable part and matches what we'd flag too — the empty client_payload.branch checkout ref, contrib/make.hb treating hbmk2 exit codes 10/20/50 as non-fatal, the macos-ci.yml matrix.cpu cache key with no matching matrix dimension, stale bundled libs (e.g. SQLite metadata still says 3.8.2 while the bundled lib is 3.44.2), and untested tests/mt / tests/rddtest corpora.

---
Questions

1. Are items 1, 2, and 6 already known? They look like clear shipping defects to us.
2. For the GC ordering and HRB-loader points raised by Grok — can you confirm the GC resume order is intentional and the HRB buffer is always NUL-terminated, so we can lay those to rest?
3. Preferred contribution path — one PR per fix, or a small stack in priority order? I can prepare patches for any of items 1-6.

Full per-file reasoning and repro cases available on request.

Thanks for maintaining Harbour,
Reply all
Reply to author
Forward
0 new messages