Seeking feedback on a semantic fragmentation approach as a post-decryption reconstruction barrier for HNDL

Aiden Tejada

Apr 9, 2026, 11:26:14 AM

to pqc-forum

Hi everyone,

I'm a first-year CS and Applied Mathematics student at Trinity College, and I'm working on something I'd genuinely love feedback on from people who actually know this field.

Quick background: I have no formal cryptography training. I've been self-studying, reading NIST documentation, and building independently. I'm sharing this here because I want to know if the direction is worth pursuing or if I'm missing something fundamental before I go further.

Most PQC solutions address Harvest Now Decrypt Later by making encryption harder to break. My question was: what happens if the encryption gets broken anyway, ten or twenty years from now? The data is still sitting there, decryptable.

I started thinking about whether I could add a layer that makes data structurally "useless": unrecoverable even after decryption. Not just harder to read, but impossible to reconstruct without a specific key that was never stored anywhere.

That's what I've been building. I call it SCRMBL.

How it works at a high level:

Data gets semantically split at meaningful contextual boundaries by a lightweight ML model rather than arbitrarily. Each fragment gets encrypted with a unique AES-256 key derived from a context key. Fragments are distributed across separate storage locations. Storage locations are also derived from the context key. Ideally, if a hacker were to steal the entire database, they'd be left with no map, no index, no metadata to connect the fragments to each other in storage.

The context key is a sequence of 12 words drawn from a library of approximately 300,000 English words. The same sequence always produces the same fragmentation map and the same per-fragment encryption keys. A different sequence, or the same words in a different order, produces a completely different map and set of keys.

Per-fragment key derivation works like this:

Fragment_key_n = SHA-256(context_key_string + fragment_index_n)
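A minimal Python sketch of that derivation, exactly as the formula is written. The encoding details are assumptions, since the formula does not fix them: the 12 words are joined with single spaces, the fragment index is appended as a decimal string, and everything is encoded as UTF-8. The function name is illustrative.

```python
import hashlib


def derive_fragment_key(context_key_string: str, fragment_index: int) -> bytes:
    """Return SHA-256(context_key_string + fragment_index) as 32 raw bytes.

    Assumption: the index is appended as decimal text before hashing.
    """
    material = f"{context_key_string}{fragment_index}".encode("utf-8")
    # A SHA-256 digest is 32 bytes, which matches the AES-256 key length.
    return hashlib.sha256(material).digest()


# Placeholder words only, not a real context key.
example_key = "alpha bravo charlie"
key_0 = derive_fragment_key(example_key, 0)
key_1 = derive_fragment_key(example_key, 1)
```

The same context key and index always give the same 32-byte key, and changing either one gives an unrelated key.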

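The key space of 300,000^12 ordered word sequences (roughly 2^218) makes brute force infeasible, even accounting for Grover's algorithm, which reduces the effective search cost to about the square root of the space (roughly 2^109 evaluations).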
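As a quick check on that estimate, here is a short Python calculation. It assumes the 12 words are drawn uniformly and independently, with repeats allowed and order significant, and it uses the usual idealized square-root model for Grover's algorithm. The constant and function names are illustrative.

```python
import math

WORDLIST_SIZE = 300_000  # approximate library size stated in the post
WORDS_PER_KEY = 12


def context_key_bits(wordlist_size: int, words_per_key: int) -> float:
    """Bits of entropy in a uniformly random ordered sequence of words."""
    return words_per_key * math.log2(wordlist_size)


classical_bits = context_key_bits(WORDLIST_SIZE, WORDS_PER_KEY)
# Idealized Grover search needs about sqrt(N) evaluations, i.e. half the bits.
grover_bits = classical_bits / 2

print(f"classical search space: about 2^{classical_bits:.1f}")
print(f"Grover-style search cost: about 2^{grover_bits:.1f} evaluations")
```

Under those assumptions the context key carries a little over 218 bits, and the Grover-adjusted figure is a little over 109 bits.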

What I've actually built so far

I want to be clear, I'm at an early stage. What exists as working code right now is the key management infrastructure:

Cryptographically secure 12-word context key generation using Python's secrets module
AES-256-CBC encryption of the context key with unique IV per encryption
Shamir's Secret Sharing implementation (3-of-5 threshold) as a disaster recovery mechanism for the context key only. The shares are designed for physical distribution to separate trusted parties, not stored on the same system
A secure portal that holds the context key. The key is never in the hands of the end user. When a user authenticates biometrically, the portal releases the key to the local system for the duration of the reconstruction process only — then wipes it from memory
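For concreteness, the first item in the list above could look roughly like the following. This is a sketch, not the actual SCRMBL code: the word list here is a tiny placeholder standing in for the ~300,000-word library, and the function name and the space-separated output format are assumptions.

```python
import secrets

# Placeholder list for illustration; the real library is far larger.
PLACEHOLDER_WORDLIST = [
    "anchor", "birch", "copper", "delta", "ember", "fjord",
    "garnet", "harbor", "indigo", "juniper", "kelp", "lantern",
]


def generate_context_key(wordlist: list[str], num_words: int = 12) -> str:
    """Pick num_words words uniformly at random using the OS CSPRNG."""
    if len(set(wordlist)) != len(wordlist):
        # Duplicate entries would bias selection and lower the entropy.
        raise ValueError("word list must not contain duplicates")
    # secrets.choice draws from a cryptographically secure source,
    # unlike random.choice, which is not suitable for key material.
    return " ".join(secrets.choice(wordlist) for _ in range(num_words))


context_key = generate_context_key(PLACEHOLDER_WORDLIST)
```

Each word is drawn independently, so repeated words are possible and the order of the words is part of the key.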

The fragmentation engine, per-fragment encryption, and distributed storage pipeline are in active development.

The questions I'm stuck on

Per-fragment key derivation uses SHA-256, which is fast by design. Even with roughly 218 bits of entropy in the context key (300,000^12 possible sequences), is this a meaningful vulnerability? Should I be using HKDF instead?
The reconstruction barrier claim depends on semantically fragmented plaintext being hard to reassemble without the map. ML models have reconstructed shredded documents. Is semantic fragmentation meaningfully harder to attack, or am I overstating the barrier?
The boundary detector must produce identical output years after the original fragmentation. Model versioning is my identified critical dependency. Is there prior work on immutable model deployment for long-lived data systems I should be studying?
Is there existing literature on using semantic fragmentation specifically as a security primitive? The closest I've found is in document reconstruction research, but I haven't found this combination addressed formally.
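For comparison with the plain SHA-256 derivation in the first question, here is a sketch of what an HKDF-based per-fragment derivation might look like. It follows the extract-then-expand structure of RFC 5869 using only Python's standard hmac and hashlib modules. The salt handling, the "fragment-key|" info label, the 8-byte index encoding, and the function names are all assumptions made for illustration, not part of SCRMBL as described.

```python
import hashlib
import hmac

HASH_LEN = hashlib.sha256().digest_size  # 32 bytes


def hkdf_extract(salt: bytes, ikm: bytes) -> bytes:
    """HKDF-Extract: condense input keying material into a pseudorandom key."""
    if not salt:
        salt = b"\x00" * HASH_LEN  # RFC 5869 default when no salt is given
    return hmac.new(salt, ikm, hashlib.sha256).digest()


def hkdf_expand(prk: bytes, info: bytes, length: int = 32) -> bytes:
    """HKDF-Expand: stretch the pseudorandom key to `length` output bytes."""
    if length > 255 * HASH_LEN:
        raise ValueError("requested output is too long for HKDF-SHA256")
    okm, block, counter = b"", b"", 1
    while len(okm) < length:
        block = hmac.new(prk, block + info + bytes([counter]), hashlib.sha256).digest()
        okm += block
        counter += 1
    return okm[:length]


def derive_fragment_key_hkdf(context_key_string: str, fragment_index: int,
                             salt: bytes = b"") -> bytes:
    """Derive a 32-byte key for one fragment; the index goes in `info`."""
    prk = hkdf_extract(salt, context_key_string.encode("utf-8"))
    # Hypothetical label plus a fixed-width index for domain separation.
    info = b"fragment-key|" + fragment_index.to_bytes(8, "big")
    return hkdf_expand(prk, info, 32)


hkdf_key_0 = derive_fragment_key_hkdf("alpha bravo charlie", 0)
hkdf_key_1 = derive_fragment_key_hkdf("alpha bravo charlie", 1)
```

Note that HKDF is a key-derivation construction rather than a password-stretching function, so this sketch changes how the inputs are separated and bound to a purpose, not how expensive each guess is.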

I have a meeting coming up with Trinity's CS Department Chair, a cryptography researcher with NSF funding, and I want to walk in with as honest a picture of the gaps as possible.

I'm not here to pitch anything. I'm here because I want to know what I'm missing.