Remzar: Live ML-DSA / ML-KEM L1 Ledger for Verified Data.

rem

Jun 27, 2026, 7:36:55 AM

to pqc-forum

Dear PQC Forum,

I wanted to share a concrete implementation report from a live blockchain system, since recent discussions have touched on post-quantum signatures, ML-DSA / ML-KEM deployment, blockchain throughput, larger message sizes, Merkle commitments, and whether these topics are still theoretical or already being implemented.

I am the developer of Remzar, a Rust Layer-1 blockchain. The node chain is live. This is not a proposal, not just a whitepaper, and not a toy demo. I implemented the post-quantum pieces directly in the wallet layer, keypair layer, transaction batch layer, block metadata layer, block validation layer, and P2P handshake/sync layer.

Source code:

https://github.com/remzarchain/remzar

The implementation uses:

FIPS 204 / ML-DSA-65 through fips204::ml_dsa_65

FIPS 203 / ML-KEM-768 through fips203::ml_kem_768

BLAKE3 XOF with 64-byte output as the canonical chain hash / commitment width

ML-DSA-65 for wallet, guardian, block, and batch/Merkle-root signatures

ML-KEM-768 for post-quantum P2P session establishment

Merkle-root batching to reduce repeated ML-DSA-65 signature overhead

RocksDB-backed chain/state storage

bounded in-memory state retention

bounded P2P queues and pending request tables

defensive serialization, size, timestamp, replay, and panic-containment guardrails

I am not claiming that Remzar itself is NIST-certified, FIPS-validated, or endorsed by NIST. The accurate claim is that Remzar is a live Rust Layer-1 blockchain implementation that integrates implementations of the NIST-standardized ML-DSA and ML-KEM algorithm families into a working node chain.

1. Why this matters

The practical problem I ran into is that post-quantum cryptography is not just an algorithm-selection issue. In a blockchain/node system, the real problems are:

large signatures

large public keys

large private keys

large P2P handshake material

larger serialized blocks/batches

untrusted peer input

resource exhaustion

malformed wire messages

state growth

sync queues

pending request leaks

replay attacks

timestamp skew

node reboot repair

chain identity validation

deterministic consensus validation

most importantly: unbounded memory growth (logical memory leaks)

The core cryptographic primitive can be correct, but the node can still fail if the surrounding infrastructure is not bounded.

So the implementation work became two things at once:

1. Integrate ML-DSA-65 and ML-KEM-768.

2. Build guardrails so live nodes can survive malformed data, large PQC objects, network churn, sync retries, and long-running state growth.

2. ML-DSA-65 concrete byte sizes

The code uses fips204::ml_dsa_65.

The concrete ML-DSA-65 byte sizes used in the Remzar wallet and signature layers are:

ML-DSA-65 secret key: 4032 bytes

ML-DSA-65 public key: 1952 bytes

ML-DSA-65 signature: 3309 bytes

This matters because these are not small fields in a blockchain. A 3309-byte signature is very different from a classical 64-byte Ed25519-style signature when placed into transactions, blocks, P2P messages, or archived state.

The wallet file treats these values as consensus-relevant engineering facts, not just as comments:

secret key bytes = ml_dsa_65::SK_LEN

public key bytes = ml_dsa_65::PK_LEN

signature bytes = ml_dsa_65::SIG_LEN

3. Wallet identity: compact address from a large PQ public key

A full ML-DSA-65 public key is 1952 bytes. I did not use the raw public key as the wallet address. Instead, the wallet address is a 64-byte commitment to the public key:

wallet_address = "r" || hex(BLAKE3-XOF(64)(ML-DSA-65 public_key_bytes))

This gives:

1 prefix character: "r"

128 lowercase hex characters

129 total characters

The math is:

BLAKE3-XOF(64) output = 64 bytes = 512 bits

64 bytes encoded as hex = 128 hex characters

"r" + 128 chars = 129-character Remzar wallet address

The design goals were:

Keep wallet identifiers compact and fixed-format.

Keep the full ML-DSA-65 public key available for verification.

Bind the wallet address to the public key.

Reject malformed or mismatched wallet/public-key pairs.

Use one canonical wallet format across the chain.

The validator enforces:

address length == 129

prefix == "r"

body length == 128

body characters are lowercase hex only

address matches BLAKE3-XOF(64)(public_key_bytes)

public key bytes parse as ML-DSA-65 public key bytes

This avoids placing a 1952-byte public key everywhere an address is needed, while still keeping a cryptographic binding between the compact address and the actual ML-DSA-65 public key.
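
The validator rules above can be sketched as follows. This is a minimal illustration, not the Remzar source: `validate_address`, `AddressError`, and `to_lower_hex` are hypothetical names, and the `commit` parameter stands in for BLAKE3-XOF(64) so the sketch needs no external crates. The final rule (parsing the bytes as an ML-DSA-65 public key) is left out because it depends on the fips204 crate.

```rust
/// 64-byte commitment width, as described in the post.
type Hash64 = [u8; 64];

#[derive(Debug, PartialEq)]
enum AddressError {
    BadLength,
    BadPrefix,
    BadHex,
    CommitmentMismatch,
}

/// Lowercase hex encoding of a byte slice.
fn to_lower_hex(bytes: &[u8]) -> String {
    bytes.iter().map(|b| format!("{:02x}", b)).collect()
}

/// Checks the cheap format rules first, then the binding between the
/// address and the public key. `commit` is an assumed stand-in for
/// BLAKE3-XOF(64).
fn validate_address(
    address: &str,
    public_key: &[u8],
    commit: impl Fn(&[u8]) -> Hash64,
) -> Result<(), AddressError> {
    // "r" + 128 hex characters = 129 bytes.
    if address.len() != 129 {
        return Err(AddressError::BadLength);
    }
    let body = match address.strip_prefix('r') {
        Some(b) => b,
        None => return Err(AddressError::BadPrefix),
    };
    // Lowercase hex only: digits and a-f. This also rejects non-ASCII input.
    if !body.bytes().all(|c| matches!(c, b'0'..=b'9' | b'a'..=b'f')) {
        return Err(AddressError::BadHex);
    }
    // The body must equal the hex form of the commitment to the public key.
    if body != to_lower_hex(&commit(public_key)) {
        return Err(AddressError::CommitmentMismatch);
    }
    Ok(())
}
```

Taking the commitment function as a parameter keeps the format rules testable on their own; in a real node it would be fixed to the chain's canonical hash.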

4. ML-DSA-65 keypair hardening

The ML-DSA-65 keypair wrapper stores:

secret_bytes: [u8; ml_dsa_65::SK_LEN],

public_bytes: [u8; ml_dsa_65::PK_LEN],

The wrapper does several things beyond just calling the library:

redacts key material in Debug output

zeroizes secret and public bytes on drop

validates public key parsing

validates secret key parsing

derives public key bytes from secret key bytes

checks that stored public key == derived public key

checks that parsed public key serializes canonically

rejects malformed secret/public material

contains panics around key parsing and public-key derivation

uses a global panic-hook lock while suppressing panic noise during rejected key parsing

enforces a hard validation budget of 750 ms

supports fault-injection hooks through REMZAR_FAIL_* environment variables

The keypair validation flow is approximately:

start timer

parse public key bytes

reject if budget exceeded

parse secret key bytes

reject if budget exceeded

derive public key from secret key

reject if budget exceeded

compare stored public key to derived public key

compare parsed public key serialization to derived public key

reject if mismatch

This was important because untrusted key material should not be able to crash or stall a live node. The code treats malformed key bytes as invalid input, not as a reason to panic the process.
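
A rough sketch of the budgeted, panic-contained flow above, under stated assumptions: `run_step`, `validate_keypair`, and `KeyError` are hypothetical names, and the two closures stand in for the real fips204 parsing and derivation calls (secret-key parsing and public-key derivation are merged into one step here). Note that `catch_unwind` only contains panics when the binary is built with unwinding panics, and that this budget check runs after each step returns, so it detects an overrun rather than interrupting a stalled step.

```rust
use std::panic::{catch_unwind, AssertUnwindSafe};
use std::time::{Duration, Instant};

#[derive(Debug, PartialEq)]
enum KeyError {
    Malformed,      // a parser rejected the bytes
    Panicked,       // a parser panicked; contained and reported as an error
    BudgetExceeded, // validation ran past the allowed time budget
    Mismatch,       // stored public key != public key derived from the secret
}

/// Runs one validation step, turning a panic into an error and checking
/// the overall time budget once the step has returned.
fn run_step<T>(
    started: Instant,
    budget: Duration,
    step: impl FnOnce() -> Option<T>,
) -> Result<T, KeyError> {
    let out = catch_unwind(AssertUnwindSafe(step)).map_err(|_| KeyError::Panicked)?;
    if started.elapsed() > budget {
        return Err(KeyError::BudgetExceeded);
    }
    out.ok_or(KeyError::Malformed)
}

/// `parse_public` returns the canonical serialization of the parsed key;
/// `derive_public` parses the secret and derives its public key bytes.
fn validate_keypair(
    stored_public: &[u8],
    secret: &[u8],
    budget: Duration,
    parse_public: impl Fn(&[u8]) -> Option<Vec<u8>>,
    derive_public: impl Fn(&[u8]) -> Option<Vec<u8>>,
) -> Result<(), KeyError> {
    let started = Instant::now();
    let parsed = run_step(started, budget, || parse_public(stored_public))?;
    let derived = run_step(started, budget, || derive_public(secret))?;
    // Both comparisons from the flow: canonical form and stored bytes.
    if parsed != derived || stored_public != derived.as_slice() {
        return Err(KeyError::Mismatch);
    }
    Ok(())
}
```

With a 750 ms budget, a caller would pass `Duration::from_millis(750)`; a closure that panics on malformed bytes surfaces as `KeyError::Panicked` instead of taking down the process.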

5. Secret-key storage: Argon2id + AES-256-GCM

The wallet stores encrypted raw ML-DSA-65 secret bytes, not plaintext private keys.

The storage pattern is:

plaintext = raw ML-DSA-65 secret key bytes

plaintext length = 4032 bytes

key = Argon2id(passphrase, salt)

encrypted_blob = salt || nonce || ciphertext || tag, where (ciphertext || tag) = AES-256-GCM(key, nonce, plaintext)

The constants used are:

AES-256 key: 32 bytes

salt: 16 bytes

nonce: 12 bytes

GCM tag: 16 bytes

raw ML-DSA-65 secret: 4032 bytes

minimum encrypted blob for raw ML-DSA-65 secret: 4076 bytes

I added storage and input guardrails:

passphrase must not be empty

passphrase cap: 16 KiB

plaintext cap: 1 MiB

encrypted blob cap: 16 MiB

config must support at least 4032-byte raw ML-DSA-65 secrets

config must support the encrypted blob minimum

encrypted input must be at least salt + nonce + GCM tag

encrypted input is split only after layout validation

This is one of the areas where PQC changes normal assumptions. A 4032-byte private key is large enough that secret handling, storage layout, and configuration bounds must be explicit.
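
The layout arithmetic and the "split only after layout validation" rule can be illustrated with a short sketch. The constants are the ones listed above; `BlobParts` and `split_blob` are hypothetical names, and no decryption is performed here.

```rust
const SALT_LEN: usize = 16;
const NONCE_LEN: usize = 12;
const TAG_LEN: usize = 16;
const ML_DSA_65_SK_LEN: usize = 4032;
const MAX_BLOB_LEN: usize = 16 * 1024 * 1024; // 16 MiB encrypted blob cap

/// Smallest blob that can hold an encrypted raw ML-DSA-65 secret:
/// 16 + 12 + 4032 + 16 = 4076 bytes.
const MIN_SECRET_BLOB_LEN: usize = SALT_LEN + NONCE_LEN + ML_DSA_65_SK_LEN + TAG_LEN;

/// Borrowed views into a validated blob.
struct BlobParts<'a> {
    salt: &'a [u8],
    nonce: &'a [u8],
    ciphertext_and_tag: &'a [u8],
}

/// Validates the overall layout before slicing, so the split itself
/// can never index out of bounds.
fn split_blob<'a>(blob: &'a [u8]) -> Result<BlobParts<'a>, &'static str> {
    if blob.len() > MAX_BLOB_LEN {
        return Err("encrypted blob exceeds cap");
    }
    if blob.len() < SALT_LEN + NONCE_LEN + TAG_LEN {
        return Err("encrypted blob shorter than salt + nonce + tag");
    }
    let (salt, rest) = blob.split_at(SALT_LEN);
    let (nonce, ciphertext_and_tag) = rest.split_at(NONCE_LEN);
    Ok(BlobParts { salt, nonce, ciphertext_and_tag })
}
```

A caller that expects a raw ML-DSA-65 secret would additionally require `blob.len() >= MIN_SECRET_BLOB_LEN` before attempting decryption.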

6. The ML-DSA-65 signature-size problem

The biggest practical blockchain issue is signature size.

For ML-DSA-65:

signature length = 3309 bytes

If a blockchain signs every transaction at a guardian/batch layer with one ML-DSA-65 signature each, the signature overhead is:

overhead(n) = n * 3309 bytes

For example:

100 transactions -> 330,900 bytes of signatures

1,000 transactions -> 3,309,000 bytes of signatures

10,000 transactions -> 33,090,000 bytes of signatures

That is before considering transaction payloads, public keys, P2P framing, storage, indexes, or archival overhead.

So my design does not try to “compress ML-DSA.” Instead, it moves the guardian/batch signature boundary to a compact commitment.

7. Merkle-root batch signing

For the batch/guardian layer, Remzar signs one 64-byte Merkle root instead of signing each transaction separately at that layer.

The flow is:

1. Serialize each transaction canonically.

2. Hash each serialized transaction with BLAKE3-XOF(64).

3. Build a Merkle tree from the 64-byte leaves.

4. Compute one 64-byte Merkle root.

5. Sign that Merkle root once with ML-DSA-65.

6. Store the 3309-byte guardian signature in the batch/block metadata.

Mathematically:

leaf_i = BLAKE3-XOF(64)(serialize(tx_i))

parent = BLAKE3-XOF(64)(left_64 || right_64)

root = MerkleRoot(leaf_0, leaf_1, ..., leaf_{n-1})

guardian_signature = ML-DSA-65.Sign(sk, root)

Signature overhead becomes:

overhead_batch = 3309 bytes

instead of:

overhead_naive = n * 3309 bytes

The saving is:

saving(n) = (n - 1) * 3309 bytes

Examples:

n = 100:

saving = 99 * 3309 = 327,591 bytes

n = 1,000:

saving = 999 * 3309 = 3,305,691 bytes

n = 10,000:

saving = 9,999 * 3309 = 33,086,691 bytes
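
The arithmetic above is easy to check mechanically. A small sketch with hypothetical function names, using checked arithmetic so that an absurd transaction count cannot wrap:

```rust
/// ML-DSA-65 signature length in bytes.
const SIG_LEN: u64 = 3309;

/// Signature bytes if each of `n` transactions carried its own
/// guardian signature. Returns None on u64 overflow.
fn naive_overhead(n: u64) -> Option<u64> {
    n.checked_mul(SIG_LEN)
}

/// Bytes avoided by signing one Merkle root instead: (n - 1) * 3309.
/// Returns None for n == 0, where the formula does not apply.
fn saving(n: u64) -> Option<u64> {
    naive_overhead(n)?.checked_sub(SIG_LEN)
}
```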

This is not a claim that Merkle-root signing replaces all transaction authorization semantics. The security boundary is:

The guardian/block signer signs a commitment to the exact serialized transaction set.

The Merkle root commits to the transaction ordering and contents.

Transaction validity, account authorization, double-spend detection, reward rules, committee rules, and block validation are handled by the validation layers.

8. Merkle tree details

Remzar uses 64-byte BLAKE3-XOF outputs as canonical hashes.

Leaf:

leaf = BLAKE3-XOF(64)(transaction_bytes)

Parent:

parent = BLAKE3-XOF(64)(left_64 || right_64)

Odd node count:

If a level has an odd number of nodes, duplicate the last node.

Empty batch:

If the batch is empty, inject a deterministic dummy leaf marker.
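
The construction rules above (pairwise parent hashing, last-node duplication on odd levels, a dummy leaf for the empty batch) can be sketched as follows. Everything here is illustrative: `toy_hash64` is a deliberately simple stand-in and is NOT BLAKE3, and `EMPTY_BATCH_MARKER` is a made-up value, since the post does not give the real marker bytes.

```rust
type Hash64 = [u8; 64];

/// Toy 64-byte hash for illustration only. NOT BLAKE3 and not secure.
fn toy_hash64(data: &[u8]) -> Hash64 {
    let mut out = [0u8; 64];
    let mut state: u64 = 0xcbf2_9ce4_8422_2325;
    for (i, slot) in out.iter_mut().enumerate() {
        state ^= i as u64 + 1;
        for &b in data {
            state = (state ^ u64::from(b)).wrapping_mul(0x0000_0100_0000_01b3);
        }
        *slot = (state >> 32) as u8;
    }
    out
}

/// Hypothetical marker bytes for the empty-batch dummy leaf.
const EMPTY_BATCH_MARKER: &[u8] = b"hypothetical-empty-batch-marker";

/// Computes the root iteratively, level by level.
fn merkle_root(leaves: &[Hash64]) -> Hash64 {
    let mut level: Vec<Hash64> = if leaves.is_empty() {
        vec![toy_hash64(EMPTY_BATCH_MARKER)]
    } else {
        leaves.to_vec()
    };
    while level.len() > 1 {
        // Odd node count: duplicate the last node.
        if level.len() % 2 == 1 {
            let last = level[level.len() - 1];
            level.push(last);
        }
        // parent = H(left_64 || right_64)
        level = level
            .chunks(2)
            .map(|pair| {
                let mut buf = [0u8; 128];
                buf[..64].copy_from_slice(&pair[0]);
                buf[64..].copy_from_slice(&pair[1]);
                toy_hash64(&buf)
            })
            .collect();
    }
    level[0]
}
```

One consequence of the duplication rule is visible directly: the leaf lists `[a, b, c]` and `[a, b, c, c]` produce the same root, so batch item counts need to be validated alongside the root.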

Merkle proof structure:

MerkleProof {

transaction_hash: Hash64,

sibling_hashes: Vec<Hash64>,

path: Vec<bool>,

merkle_root: Hash64,

}

Merkle proof guardrails:

encoded proof cap: 256 KiB

absolute proof depth cap: 4096

derived proof depth cap based on MAX_BATCH_ITEMS

sibling_hashes.len() must equal path.len()

hash count must not exceed MAX_BATCH_ITEMS

tree levels must not be empty

final Merkle level must contain exactly one root

This is important because Merkle proofs are externally supplied data in many systems. A proof verifier should not be an unbounded allocation or deep-recursion attack surface.
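
A bounded, non-recursive proof check along these lines might look like the sketch below. It is an assumption-laden illustration: `toy_hash64` again stands in for BLAKE3-XOF(64), `verify_proof` is a hypothetical name, and the meaning of `path` (true means the sibling sits on the right) is assumed, since the post does not define it.

```rust
type Hash64 = [u8; 64];

/// Toy 64-byte hash for illustration only. NOT BLAKE3 and not secure.
fn toy_hash64(data: &[u8]) -> Hash64 {
    let mut out = [0u8; 64];
    let mut state: u64 = 0xcbf2_9ce4_8422_2325;
    for (i, slot) in out.iter_mut().enumerate() {
        state ^= i as u64 + 1;
        for &b in data {
            state = (state ^ u64::from(b)).wrapping_mul(0x0000_0100_0000_01b3);
        }
        *slot = (state >> 32) as u8;
    }
    out
}

/// Absolute proof depth cap from the guardrail list.
const MAX_PROOF_DEPTH: usize = 4096;

struct MerkleProof {
    transaction_hash: Hash64,
    sibling_hashes: Vec<Hash64>,
    path: Vec<bool>, // assumed: true = sibling is the right-hand node
    merkle_root: Hash64,
}

/// Structural checks run first; the walk is a loop, not recursion,
/// so proof depth cannot exhaust the stack.
fn verify_proof(proof: &MerkleProof) -> Result<(), &'static str> {
    if proof.sibling_hashes.len() != proof.path.len() {
        return Err("sibling/path length mismatch");
    }
    if proof.path.len() > MAX_PROOF_DEPTH {
        return Err("proof too deep");
    }
    let mut node = proof.transaction_hash;
    for (sibling, &sibling_on_right) in proof.sibling_hashes.iter().zip(&proof.path) {
        let (left, right) = if sibling_on_right {
            (&node, sibling)
        } else {
            (sibling, &node)
        };
        let mut buf = [0u8; 128];
        buf[..64].copy_from_slice(left);
        buf[64..].copy_from_slice(right);
        node = toy_hash64(&buf);
    }
    if node == proof.merkle_root {
        Ok(())
    } else {
        Err("root mismatch")
    }
}
```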

9. Batch-signature verification order

One subtle implementation detail: the batch verifier checks cheap invalid conditions before doing expensive hashing/Merkle work.

The verification order is:

1. Check signature length.

2. Check batch item count and byte bounds.

3. Hash each transaction.

4. Compute the Merkle root.

5. Convert signature bytes to fixed-size ML-DSA-65 signature array.

6. Verify ML-DSA-65 signature over the root.

The reason is simple:

malformed signature bytes should not force expensive batch hashing

oversized batch input should not force Merkle construction

invalid structure should fail before expensive cryptographic work

This is one of the practical lessons from deploying large PQC objects in node software: validation order matters.
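
The ordering can be made concrete with a sketch that follows the six steps above. The names are hypothetical, the two caps are placeholder values (the post does not state them in this section), and the hashing and signature checks are passed in as closures rather than calling the real BLAKE3 or fips204 APIs.

```rust
use std::convert::TryFrom;

const SIG_LEN: usize = 3309;
const MAX_BATCH_ITEMS: usize = 50_000; // assumed cap, for illustration
const MAX_BATCH_BYTES: usize = 4 * 1024 * 1024; // assumed cap, for illustration

#[derive(Debug, PartialEq)]
enum VerifyError {
    BadSignatureLength,
    TooManyItems,
    BatchTooLarge,
    BadSignature,
}

fn verify_batch(
    txs: &[Vec<u8>],
    signature: &[u8],
    merkle_root_of: impl Fn(&[Vec<u8>]) -> [u8; 64],
    verify_sig: impl Fn(&[u8; 64], &[u8; SIG_LEN]) -> bool,
) -> Result<(), VerifyError> {
    // 1. Cheapest check first: signature length.
    if signature.len() != SIG_LEN {
        return Err(VerifyError::BadSignatureLength);
    }
    // 2. Item count and byte bounds, still without any hashing.
    if txs.len() > MAX_BATCH_ITEMS {
        return Err(VerifyError::TooManyItems);
    }
    let total_bytes: usize = txs.iter().map(|t| t.len()).sum();
    if total_bytes > MAX_BATCH_BYTES {
        return Err(VerifyError::BatchTooLarge);
    }
    // 3-4. Only now pay for hashing and Merkle construction.
    let root = merkle_root_of(txs);
    // 5. Convert to the fixed-size signature array.
    let sig = <&[u8; SIG_LEN]>::try_from(signature)
        .map_err(|_| VerifyError::BadSignatureLength)?;
    // 6. The expensive signature verification comes last.
    if verify_sig(&root, sig) {
        Ok(())
    } else {
        Err(VerifyError::BadSignature)
    }
}
```

With a counting closure in place of `merkle_root_of`, it is straightforward to assert in a test that a wrong-length signature is rejected without a single hash being computed.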

10. Guardian signatures and block signatures

The guardian signature system uses the same model:

batch data -> transaction hashes -> Merkle root -> ML-DSA-65 signature

A block can also be signed over its serialized metadata and batch key data. The block signing path is:

1. Serialize metadata + batch key.

2. Sign the serialized signing payload through GuardianSignature.

3. Enforce signature length == ml_dsa_65::SIG_LEN.

4. Copy into fixed [u8; ml_dsa_65::SIG_LEN].

5. Embed into metadata.

6. Recompute the block hash after embedding the signature.

This makes the block hash commit to the guardian signature and metadata together.

Block signature verification:

1. Serialize the same signing payload.

2. Verify GuardianSignature over that payload.

3. Recompute and compare block hash.

11. Block metadata structure

Block metadata contains:

index: u64

timestamp: u64

previous_hash: [u8; 64]

merkle_root: [u8; 64]

guardian_signature: [u8; ml_dsa_65::SIG_LEN]

puzzle_proof: Option<BlockPuzzleProof>

size: u64

The important thing here is that the guardian signature is a fixed-size ML-DSA-65 signature array, not an unbounded vector in metadata.

Structural metadata validation rejects:

genesis metadata with nonzero previous_hash

non-genesis metadata with zero previous_hash

zero Merkle root

all-0xFF Merkle root

all-0xFF previous_hash

all-zero guardian signature on non-genesis metadata

all-0xFF guardian signature

metadata size < 64

metadata size > MAX_BLOCK_SIZE

non-genesis merkle_root == previous_hash

timestamps before the project lower bound

genesis metadata with puzzle proof

puzzle proof height mismatch

puzzle proof previous-hash mismatch

This prevents obvious corrupt/sentinel values from entering consensus state.
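
A subset of these structural rules, sketched with hypothetical names. Two things here are assumptions rather than facts from the post: genesis is identified by `index == 0`, and `MAX_BLOCK_SIZE` is given a placeholder value. The timestamp and puzzle-proof rules are omitted.

```rust
const SIG_LEN: usize = 3309;
const MAX_BLOCK_SIZE: u64 = 8 * 1024 * 1024; // placeholder value, not from the post

struct BlockMetadata {
    index: u64,
    previous_hash: [u8; 64],
    merkle_root: [u8; 64],
    guardian_signature: [u8; SIG_LEN],
    size: u64,
}

/// True if every byte equals `value` (used for all-zero / all-0xFF sentinels).
fn is_all(bytes: &[u8], value: u8) -> bool {
    bytes.iter().all(|&b| b == value)
}

fn validate_structure(m: &BlockMetadata) -> Result<(), &'static str> {
    let genesis = m.index == 0; // assumption: genesis is height 0
    let prev_zero = is_all(&m.previous_hash, 0x00);
    if genesis && !prev_zero {
        return Err("genesis with nonzero previous_hash");
    }
    if !genesis && prev_zero {
        return Err("non-genesis with zero previous_hash");
    }
    if is_all(&m.previous_hash, 0xFF) {
        return Err("sentinel previous_hash");
    }
    if is_all(&m.merkle_root, 0x00) || is_all(&m.merkle_root, 0xFF) {
        return Err("sentinel merkle_root");
    }
    if !genesis && is_all(&m.guardian_signature, 0x00) {
        return Err("zero guardian signature on non-genesis");
    }
    if is_all(&m.guardian_signature, 0xFF) {
        return Err("sentinel guardian signature");
    }
    if m.size < 64 || m.size > MAX_BLOCK_SIZE {
        return Err("metadata size out of range");
    }
    if !genesis && m.merkle_root == m.previous_hash {
        return Err("merkle_root equals previous_hash");
    }
    Ok(())
}
```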

12. Canonical 64-byte hash design

Remzar uses BLAKE3-XOF(64) as the canonical hash/commitment width.

That gives:

64 bytes = 512 bits

hex encoding = 128 lowercase hex characters

This same width is used for:

block hash

previous hash

Merkle root

genesis hash

wallet commitment

batch key digest

puzzle commitment

The benefit is consistency. The node does not mix 32-byte and 64-byte consensus identifiers internally except where explicit legacy compatibility exists.

13. Genesis and chain identity

The genesis block uses 64-byte canonical hashes.

Genesis validation requires:

prev_hash == [0u8; 64]

prev_hash != [0xFFu8; 64]

merkle_root != [0u8; 64]

merkle_root != [0xFFu8; 64]

The genesis preimage includes a zeroed guardian-signature-sized field:

ZERO_GUARDIAN_SIGNATURE: [u8; ml_dsa_65::SIG_LEN]

This keeps the genesis preimage structurally aligned with normal block hashing while remaining deterministic.

The P2P version handshake also carries the expected genesis hash. Peers are not admitted only because they speak the protocol; they must match the expected protocol version and genesis identity.

14. Transaction batch finalization

The batch finalization path does two size checks:

1. Sum of transaction serialized sizes.

2. Actual serialized bytes that will be stored/transmitted.

The second check matters. Logical size estimates can be wrong if serialization overhead or optional fields grow. The implementation verifies the actual postcard-serialized batch bytes before finalization.

The finalization path:

validate MAX_BLOCK_SIZE conversion

reject if total_size() > MAX_BLOCK_SIZE

serialize actual batch for storage

reject if actual serialized batch bytes > MAX_BLOCK_SIZE

sign batch with ML-DSA-65 guardian signature

compute 64-byte Merkle root

construct BlockMetadata

This prevents an attacker or bug from slipping oversized serialized data through a logical-size-only check.

15. Block serialization and padding safety

The block storage path also checks actual serialized bytes:

serialize block with postcard

reject if serialized block bytes > MAX_BLOCK_SIZE

On deserialization, the code:

validates storage length first

tries strict postcard decode

falls back to padded postcard decode only if needed

rejects non-zero trailing bytes

normalizes and validates after decode

can return both actual_size_bytes and stored_size_bytes

rejects actual_size_bytes == 0

The reason this exists is that long-running chain software eventually encounters old data layouts, padded records, migrations, or corrupted bytes. The node should distinguish:

actual serialized payload size

stored RocksDB byte length

trailing padding

non-zero trailing corruption
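
The distinction between actual payload size, stored length, zero padding, and trailing corruption can be sketched generically. This is not the postcard-based Remzar code: `decode_padded` and its types are hypothetical, the strict and padded decode attempts are merged into one pass, and `decode_prefix` is an assumed callback that returns the decoded value plus the number of bytes it consumed.

```rust
#[derive(Debug, PartialEq)]
enum DecodeError {
    Empty,
    Malformed,
    NonZeroTrailingBytes,
}

#[derive(Debug, PartialEq)]
struct Decoded<T> {
    value: T,
    actual_size_bytes: usize, // bytes consumed by the decoder
    stored_size_bytes: usize, // bytes read from storage, including padding
}

fn decode_padded<T>(
    stored: &[u8],
    decode_prefix: impl Fn(&[u8]) -> Option<(T, usize)>,
) -> Result<Decoded<T>, DecodeError> {
    // Validate the storage length before handing bytes to the decoder.
    if stored.is_empty() {
        return Err(DecodeError::Empty);
    }
    let (value, used) = decode_prefix(stored).ok_or(DecodeError::Malformed)?;
    // A decoder that consumed nothing, or claims more than exists, is rejected.
    if used == 0 || used > stored.len() {
        return Err(DecodeError::Malformed);
    }
    // Zero padding is tolerated; anything else after the payload is corruption.
    if stored[used..].iter().any(|&b| b != 0) {
        return Err(DecodeError::NonZeroTrailingBytes);
    }
    Ok(Decoded {
        value,
        actual_size_bytes: used,
        stored_size_bytes: stored.len(),
    })
}
```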

16. ML-KEM-768 P2P handshake

For P2P session establishment, I implemented an ML-KEM-768 handshake using fips203::ml_kem_768.

The suite is identified as:

suite_id: 0x0301

suite_name: "ML-KEM-768/FIPS203-0.4.3"

The P2P PQ constants include:

shared secret length: 32 bytes

maximum PQ wire payload: 16 KiB

nonce length: 32 bytes

default replay filter capacity: 4096

minimum replay filter capacity: 16

maximum replay filter capacity: 65,536

default message age window: 120 seconds

maximum tolerated future clock skew: 10 seconds

hard maximum configured message age: 10 minutes

The offer message is:

PqKemOffer {

suite_id: u16,

created_at_unix_secs: u64,

nonce: Vec<u8>,

ek: Vec<u8>,

}

The accept message is:

PqKemAccept {

suite_id: u16,

offer_nonce: Vec<u8>,

created_at_unix_secs: u64,

ct: Vec<u8>,

}

Offer validation checks:

suite id matches expected suite

timestamp is fresh

timestamp is not too far in the future

nonce length == 32

nonce is not all zero

encapsulation key length == EK_LEN

encapsulation key is not all zero

encapsulation key parses as ML-KEM-768 encapsulation key

Accept validation checks:

suite id matches expected suite

timestamp is fresh

timestamp is not too far in the future

offer_nonce length == 32

offer_nonce equals the expected original offer nonce

ciphertext length == CT_LEN

ciphertext is not all zero

ciphertext parses as ML-KEM-768 ciphertext

The replay filter is:

HashSet<[u8; 32]> for seen nonces

VecDeque<[u8; 32]> for eviction order

capacity clamped to [16, 65,536]

duplicate nonce => ReplayDetected

After a session key is used to mark the peer PQ-ready, the code zeroizes the temporary session key.

This matters because ML-KEM by itself is key establishment, not identity. I treat it as one part of a broader P2P admission flow that also includes protocol version, genesis hash, peer admission, pending request tracking, and sync state.
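
The replay filter and the timestamp window described above fit in a few lines of std-only Rust. This is a sketch using the constants from the list; the type and method names (`ReplayFilter`, `check_and_insert`, `check_freshness`) are hypothetical, and the ML-KEM parsing checks are not shown.

```rust
use std::collections::{HashSet, VecDeque};

const MIN_CAPACITY: usize = 16;
const MAX_CAPACITY: usize = 65_536;
const MAX_FUTURE_SKEW_SECS: u64 = 10;

#[derive(Debug, PartialEq)]
enum PqError {
    Stale,
    FromFuture,
    ReplayDetected,
}

/// Bounded set of seen nonces with FIFO eviction.
struct ReplayFilter {
    seen: HashSet<[u8; 32]>,
    order: VecDeque<[u8; 32]>,
    capacity: usize,
}

impl ReplayFilter {
    fn new(capacity: usize) -> Self {
        let capacity = capacity.clamp(MIN_CAPACITY, MAX_CAPACITY);
        ReplayFilter {
            seen: HashSet::new(),
            order: VecDeque::new(),
            capacity,
        }
    }

    fn check_and_insert(&mut self, nonce: [u8; 32]) -> Result<(), PqError> {
        // HashSet::insert returns false if the nonce was already present.
        if !self.seen.insert(nonce) {
            return Err(PqError::ReplayDetected);
        }
        self.order.push_back(nonce);
        // Evict the oldest nonce once the cap is exceeded, so memory stays bounded.
        if self.order.len() > self.capacity {
            if let Some(oldest) = self.order.pop_front() {
                self.seen.remove(&oldest);
            }
        }
        Ok(())
    }
}

/// Rejects messages too far in the future or older than the age window.
fn check_freshness(created_at: u64, now: u64, max_age_secs: u64) -> Result<(), PqError> {
    if created_at > now.saturating_add(MAX_FUTURE_SKEW_SECS) {
        return Err(PqError::FromFuture);
    }
    if now.saturating_sub(created_at) > max_age_secs {
        return Err(PqError::Stale);
    }
    Ok(())
}
```

The two checks complement each other: once a nonce has been evicted from the bounded filter, the age window is what still rejects a late replay of that message.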

17. P2P admission and sync after PQ readiness

The node does not immediately sync from a peer just because the peer connects.

The high-level flow is:

1. Peer connects.

2. Version handshake validates protocol version, services, user agent shape, chain height, and genesis hash.

3. Connection/admission guards run.

4. PQ ML-KEM handshake runs.

5. Peer is marked PQ-ready.

6. Queued sync target resumes only after PQ readiness.

If the peer fails a protocol rule, the node:

reports misbehavior

cleans pending requests for that peer

clears PQ peer state

updates peerbook failure state

disconnects the peer

The implementation also checks response/request peer mismatch. If a response arrives from a peer different from the peer associated with the request ID, the node treats it as a protocol violation.

18. P2P memory/resource guardrails

This is an area I had to address directly because live P2P nodes can slowly leak memory through queues, pending request maps, retry tables, and peer side tables.

The sync builder defines caps such as:

MAX_PENDING_VERSIONS = 512

MAX_PENDING_PQ = 512

MAX_PENDING_BLOCKS = 512

MAX_PENDING_BATCHES = 512

MAX_BLOCK_QUEUE = 1024

MAX_BATCH_QUEUE = 1024

MAX_HEIGHT_POLL_PEERS = 128

MAX_AUTODIAL_PEERS_PER_TICK = 32

MAX_AUTODIAL_ADDRS_PER_PEER = 3

MAX_MULTIADDR_BYTES = 256

MAX_PEERBOOK_KAD_SEED_PEERS = 256

MAX_KAD_ADDRS_PER_PEER = 8

MAX_TRACKED_DIAL_ATTEMPTS = 4096

MAX_RUNTIME_PEER_SIDE_TABLES = 8192

There is housekeeping to:

trim block_queue to MAX_BLOCK_QUEUE

trim batch_queue to MAX_BATCH_QUEUE

prune old dial-attempt timestamps

clear advisory dial tracking if still too large

clear pq_ready_peers if the side table exceeds cap

clear admitted_peers if the side table exceeds cap

clear peer_ip if the side table exceeds cap

The PQ state cleanup also removes:

pq_initiators[peer]

pq_ready_peers[peer]

pending_pq entries for that peer

This directly addresses a common live-node issue: stale handshake state and pending request maps can grow forever if failed peers are not cleaned up.
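
A minimal sketch of the pattern, assuming simplified types: `PeerId` and `RequestId` are plain integers here, `SyncState` is a hypothetical struct holding only two of the tables, and the choice to drop from the back of the queue when trimming is an assumption (the post does not say which end is trimmed).

```rust
use std::collections::{HashMap, VecDeque};

const MAX_PENDING_PQ: usize = 512;
const MAX_BLOCK_QUEUE: usize = 1024;

type PeerId = u64; // hypothetical simplification of a real peer identifier
type RequestId = u64;

struct SyncState {
    pending_pq: HashMap<RequestId, PeerId>,
    block_queue: VecDeque<u64>,
}

impl SyncState {
    fn new() -> Self {
        SyncState {
            pending_pq: HashMap::new(),
            block_queue: VecDeque::new(),
        }
    }

    /// Refuses to track new requests once the table is at its cap.
    fn track_pq_request(&mut self, req: RequestId, peer: PeerId) -> bool {
        if self.pending_pq.len() >= MAX_PENDING_PQ {
            return false;
        }
        self.pending_pq.insert(req, peer);
        true
    }

    /// Removes every pending entry for a peer that failed or disconnected.
    fn forget_peer(&mut self, peer: PeerId) {
        self.pending_pq.retain(|_, p| *p != peer);
    }

    /// Housekeeping: keeps the queue within its cap.
    fn trim_queues(&mut self) {
        self.block_queue.truncate(MAX_BLOCK_QUEUE);
    }
}
```

The key property is that every insertion path has a cap and every failure path has a matching cleanup, so no table can grow for the lifetime of the process.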

19. Account-state memory growth and compact snapshots

I also addressed memory/state growth in the account model.

It is easy for an account-tree structure to retain too much historical block data in memory or to serialize too much historical state. The current design separates compact chain state from historical blocks.

The in-memory InnerTree tracks:

balances

tip_height

tip_hash

prev_tip_hash

has_tip

recent blocks cache

total_issued_micro

rewards_issued_micro

reserved_issued_micro

The compact persisted AccountStateSnapshot tracks:

version

balances

tip_height

tip_hash

prev_tip_hash

has_tip

total_issued_micro

rewards_issued_micro

reserved_issued_micro

The important part:

blocks are skipped during serialization/deserialization in the current InnerTree

compact snapshots do not persist the full block history

legacy states that contain block history are migrated

legacy block history is drained down to the bounded recent cache

The relevant caps are:

MAX_RECENT_BLOCKS_IN_RAM = 512

MAX_PENDING_BLOCKS = 4096

MAX_PENDING_BLOCK_DISTANCE = 1024

MAX_ACCOUNT_STATE_SNAPSHOT_BYTES = 512 MiB

So instead of allowing the account model to keep every canonical block forever in RAM, the node keeps a bounded recent-block cache and relies on RocksDB for historical reads.

This is not a Rust memory-safety issue. It is about preventing unbounded logical memory retention in a long-running node.

20. Recent block cache

When a block is remembered in state:

update prev_tip_hash

update tip_height

update tip_hash

set has_tip = true

push block into recent block cache

if recent block cache > 512, drain oldest blocks

The compact-state invariant checker rejects:

recent block cache len > 512

non-contiguous recent block cache

recent block previous_hash linkage failure

recent cache tip mismatch

balance sum overflow

balance sum > MAX_SUPPLY

total_issued_micro > MAX_SUPPLY

rewards_issued_micro > MAX_REWARD_SUPPLY

So the bounded cache is not only trimmed; it is also checked for structural consistency.
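
The trim-then-check behaviour can be sketched as follows. `BlockRef` and `RecentCache` are hypothetical, reduced types carrying only the fields needed for the linkage checks; the supply-related invariants are not included here.

```rust
use std::collections::VecDeque;

const MAX_RECENT_BLOCKS_IN_RAM: usize = 512;

struct BlockRef {
    height: u64,
    hash: [u8; 64],
    previous_hash: [u8; 64],
}

struct RecentCache {
    blocks: VecDeque<BlockRef>,
}

impl RecentCache {
    fn new() -> Self {
        RecentCache { blocks: VecDeque::new() }
    }

    /// Appends a block and drains the oldest entries above the cap.
    fn remember(&mut self, block: BlockRef) {
        self.blocks.push_back(block);
        while self.blocks.len() > MAX_RECENT_BLOCKS_IN_RAM {
            self.blocks.pop_front();
        }
    }

    /// Checks the cap, height contiguity, and previous_hash linkage.
    fn check_invariants(&self) -> Result<(), &'static str> {
        if self.blocks.len() > MAX_RECENT_BLOCKS_IN_RAM {
            return Err("recent block cache over cap");
        }
        for (prev, next) in self.blocks.iter().zip(self.blocks.iter().skip(1)) {
            if Some(next.height) != prev.height.checked_add(1) {
                return Err("non-contiguous heights");
            }
            if next.previous_hash != prev.hash {
                return Err("broken previous_hash linkage");
            }
        }
        Ok(())
    }
}
```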

21. RSS threshold logging

The account-state code has RSS threshold guards for long-running node visibility:

RSS warning threshold: 512 MB

RSS critical threshold: 1024 MB

RSS emergency threshold: 2048 MB

The current action is observability/logging rather than killing the node. The purpose is to detect growth trends in live operation:

[RESOURCE][MEMORY][RSS_GUARD]

severity=warning

severity=critical

severity=emergency

The resource log also reports:

block height

balances length

recent block cache length

recent block cache cap

pending block count

pending block cap

tip height

has_tip

state_serializes_blocks=false

This helped make memory growth visible while keeping consensus behavior deterministic.

22. Rollback-on-failure for state application

When applying a block, the account tree snapshots live state before making changes:

snapshot_inner = current inner state clone

snapshot_pending = current pending blocks clone

Then it validates the block, checks idempotency, checks linkage, reads the batch, deserializes the batch, checks batch height, dry-runs the block and batch, applies NFT mint/transfer effects, commits compact state, flushes touched balances, and verifies the account column family against state.

If the operation fails:

restore snapshot_inner

restore snapshot_pending

This is important for live nodes because partial application is dangerous. A failed block apply should not leave balances, pending blocks, or NFT side effects half-advanced.
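
The snapshot-and-restore pattern, reduced to its core. The types below are hypothetical stand-ins (the real state has more fields), and the whole multi-step apply is represented by a single closure.

```rust
use std::collections::HashMap;

#[derive(Clone, Debug, PartialEq)]
struct Inner {
    balances: HashMap<String, u64>,
    tip_height: u64,
}

struct AccountTree {
    inner: Inner,
    pending: Vec<u64>,
}

impl AccountTree {
    /// Runs `apply` against live state and restores both snapshots if it fails,
    /// so a failed block never leaves state half-advanced.
    fn apply_with_rollback(
        &mut self,
        apply: impl FnOnce(&mut Inner, &mut Vec<u64>) -> Result<(), String>,
    ) -> Result<(), String> {
        let snapshot_inner = self.inner.clone();
        let snapshot_pending = self.pending.clone();
        let result = apply(&mut self.inner, &mut self.pending);
        if result.is_err() {
            self.inner = snapshot_inner;
            self.pending = snapshot_pending;
        }
        result
    }
}
```

The trade-off is that cloning costs time and memory proportional to the in-memory state on every block apply, which is one more reason to keep that state compact.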

23. Canonical replay and pending block cleanup

During replay to a height, the implementation rebuilds state from canonical blocks/batches and then:

verifies compact state invariants

sets inner state to rebuilt state

retains only pending blocks above the replay height

That prevents old pending blocks at or below the canonical tip from remaining in memory after replay/reorg-style repair.

24. Amount parsing and overflow avoidance

The amount system uses micro-units:

1 Remzar = 100,000,000 micro-units

The UI float conversion is explicitly marked UI-only. Consensus construction uses string parsing.

The string parser rejects:

empty input

input longer than 64 characters

leading + or -

whitespace

scientific notation

multiple decimal points

fractional precision greater than 8 decimals

non-digit characters

u64 overflow

The arithmetic uses checked multiplication and checked addition when scaling.

This matters because blockchain amount parsing bugs are consensus bugs.
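
A float-free parser that enforces the rejection list above might look like this sketch. `parse_amount` is a hypothetical name, and two behaviours are assumptions not stated in the post: inputs with no digits before the decimal point (".5") and inputs with nothing after it ("1.") are both rejected.

```rust
const UNITS_PER_COIN: u64 = 100_000_000;
const MAX_DECIMALS: usize = 8;
const MAX_INPUT_LEN: usize = 64;

fn parse_amount(s: &str) -> Result<u64, &'static str> {
    if s.is_empty() {
        return Err("empty input");
    }
    if s.len() > MAX_INPUT_LEN {
        return Err("input too long");
    }
    // Only ASCII digits and '.' pass; this rejects signs, whitespace,
    // exponents such as "1e5", and any other non-digit character.
    if !s.bytes().all(|c| c.is_ascii_digit() || c == b'.') {
        return Err("invalid character");
    }
    let (whole, frac) = match s.find('.') {
        Some(i) => (&s[..i], Some(&s[i + 1..])),
        None => (s, None),
    };
    let frac = match frac {
        Some(f) if f.contains('.') => return Err("multiple decimal points"),
        Some(f) if f.is_empty() => return Err("missing fractional digits"), // assumption
        Some(f) => f,
        None => "",
    };
    if whole.is_empty() {
        return Err("missing integer part"); // assumption
    }
    if frac.len() > MAX_DECIMALS {
        return Err("more than 8 decimals");
    }
    // `whole` is all digits here, so parse can only fail on overflow.
    let whole_units = whole
        .parse::<u64>()
        .map_err(|_| "u64 overflow")?
        .checked_mul(UNITS_PER_COIN)
        .ok_or("u64 overflow")?;
    // Right-pad the fraction to 8 digits: "5" becomes 50_000_000.
    let mut frac_units: u64 = 0;
    for i in 0..MAX_DECIMALS {
        let digit = frac.as_bytes().get(i).map_or(0, |&c| u64::from(c - b'0'));
        frac_units = frac_units * 10 + digit;
    }
    whole_units.checked_add(frac_units).ok_or("u64 overflow")
}
```

For example, "12.5" parses to 1,250,000,000 units, while "184467440737.09551616" is rejected because it is one unit above `u64::MAX`.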

25. Supply invariants

The account-state invariant checker validates supply constraints:

sum(balances) must not overflow u64

sum(balances) <= MAX_SUPPLY

total_issued_micro <= MAX_SUPPLY

rewards_issued_micro <= MAX_REWARD_SUPPLY

That makes issuance accounting part of state validation, not only wallet/UI logic.
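
These invariants reduce to a checked sum plus three comparisons. In the sketch below, `check_supply` is a hypothetical name and the two caps are passed as parameters, because the post does not state the actual MAX_SUPPLY and MAX_REWARD_SUPPLY values.

```rust
/// Returns the balance sum if all supply invariants hold.
fn check_supply(
    balances: &[u64],
    total_issued_micro: u64,
    rewards_issued_micro: u64,
    max_supply: u64,
    max_reward_supply: u64,
) -> Result<u64, &'static str> {
    // Checked addition: overflow is an invariant violation, never a wraparound.
    let sum = balances
        .iter()
        .try_fold(0u64, |acc, &b| acc.checked_add(b))
        .ok_or("balance sum overflows u64")?;
    if sum > max_supply {
        return Err("sum(balances) > MAX_SUPPLY");
    }
    if total_issued_micro > max_supply {
        return Err("total_issued_micro > MAX_SUPPLY");
    }
    if rewards_issued_micro > max_reward_supply {
        return Err("rewards_issued_micro > MAX_REWARD_SUPPLY");
    }
    Ok(sum)
}
```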

26. Fault-injection hooks

Many modules include runtime fault-injection hooks:

REMZAR_FAIL_<OPERATION>

Examples include:

keypair generation

keypair validation

Merkle computation

batch hashing

batch signing

batch verification

guardian signing

wallet validation

PQ operations

The point is to test failure paths intentionally. It is easy to write success-path crypto code; the harder part is making sure failure paths clean up state, reject safely, and do not leave stale pending data.

27. Panic containment

Several crypto and parsing operations are wrapped in catch_unwind.

The motivation is:

untrusted bytes should become validation errors

untrusted bytes should not become process-level panics

malformed crypto material should not crash the node

malformed verification material should not bypass cleanup

This is used around:

ML-DSA secret-key parsing

ML-DSA public-key derivation

ML-DSA wallet public-key parsing

batch hashing

batch verification

ML-KEM key/ciphertext parsing

Panic containment is not a substitute for correct libraries, but in a live node it is a useful defensive layer around untrusted input.

28. Why I used Merkle-root signing instead of per-transaction guardian signatures

The main tradeoff is:

Per-transaction guardian signature:

proof is local to each transaction

but overhead = n * 3309 bytes

Merkle-root guardian signature:

one signature commits to the full canonical batch

overhead = 3309 bytes

individual inclusion can be proven with Merkle proofs

transaction semantics still validated separately

In my case, the Merkle-root model was the practical solution because the guardian/block layer needs to commit to the batch, not necessarily duplicate the same signature object per transaction.

This is similar to the general blockchain idea that consensus signs/commits to blocks, while transactions have their own validity rules.

29. What I think I solved

I would describe the implementation results this way:

I did not solve post-quantum signature size by making ML-DSA smaller.

I solved the block/batch-level overhead problem by changing where the ML-DSA signature is placed.

I did not rely on one ML-DSA-65 signature per transaction at the guardian/block layer.

I use one ML-DSA-65 signature over a 64-byte Merkle commitment for the full batch/block.

I did not rely on unbounded state structures.

I added caps, compaction, recent-cache limits, replay cleanup, queue trimming, bounded pending maps, and RSS visibility.

I did not treat ML-KEM as identity.

I used ML-KEM-768 as a PQ shared-secret establishment layer inside a broader version/genesis/admission/sync flow.

I did not trust serialized sizes by estimation only.

I check the actual bytes that will be stored or transmitted.

I did not assume malformed crypto input is harmless.

I added exact length checks, parse checks, panic containment, all-zero checks, all-0xFF checks, validation budgets, and failure cleanup.

I did not treat the wallet address as a raw public key.

I bound a compact 64-byte BLAKE3-XOF commitment to the ML-DSA-65 public key.

I did not keep all historical blocks in account-state memory.

I moved to compact persisted state and bounded recent-block caching.

I did not treat TPS as a signature-only benchmark.

I measured hashing, transaction construction, serialization, Merkle aggregation, state application, block encode/decode, ML-DSA signing, ML-DSA verification, wallet signing, and wallet generation separately.

The overall result is that the post-quantum signature layer is no longer the linear per-transaction bottleneck in the block/guardian path. Signature work becomes fixed per block:

traditional guardian-per-transaction model:

signature_work = O(N)

signature_bytes = N * 3309

Remzar batch-signed model:

signature_work = O(1) per block

signature_bytes = 3309 per block

per_transaction_signature_cost = 3309 / N

For example, at 10,000 transactions per block:

naive guardian-per-transaction ML-DSA-65 signature bytes:

10,000 * 3309 = 33,090,000 bytes

Remzar batch-root ML-DSA-65 signature bytes:

3309 bytes

signature bytes avoided at guardian/block layer:

33,090,000 - 3309 = 33,086,691 bytes

So I did not reduce ML-DSA-65 itself. I changed the blockchain placement of the signature so one post-quantum signature commits to the full canonical transaction batch through a 64-byte Merkle root.

30. Conservative TPS and block-capacity results

I also ran a local TPS benchmark suite to measure the pipeline pieces separately.

Important qualification: these are conservative local benchmark results from the Rust test harness. They are not presented as a final, release-optimized (cargo build --release) network benchmark, and they were not measured across a full distributed public network with real propagation, mempool contention, RocksDB write pressure, peer churn, and production hardware variation. Because of that, I treat these numbers as conservative CPU-side estimates, not as network throughput figures.

The block model tested is the actual batch-signing model:

1. Build transactions.

2. Serialize transactions.

3. Hash transactions into 64-byte transaction IDs.

4. Compute a 64-byte Merkle root over the transaction IDs.

5. Sign the Merkle root once with ML-DSA-65.

6. Verify the block with one ML-DSA-65 verification.

The consequence is:

signature cost per block = O(1)

signature cost per transaction = O(1/N)

So ML-DSA-65 signing is not measured as one signature per transaction. TPS is governed by the full transaction pipeline: build, serialize, hash, Merkle aggregation, block encode/decode, state application, storage, networking, and consensus settings.

Hashing and core primitives

The local benchmark measured:

Raw BLAKE3-XOF(64): 1,890,688 ops/sec

~56,720,640 ops / 30s

Data hash via postcard: 189,876 ops/sec

~5,696,280 ops / 30s

Batch hash structs: 831,031 ops/sec

~24,930,930 ops / 30s

Header hash: 669,052 ops/sec

~20,071,560 ops / 30s

Truncated hash: 379,217 ops/sec

~11,376,510 ops / 30s

This shows that the 64-byte hash/commitment layer has large headroom compared with the conservative public target of 250+ TPS.

Merkle and block-root pipeline

The active Merkle consensus cap in the test was:

requested: 200,000

measured: 50,000

cap: 50,000

The important safety behavior is that above-cap Merkle input is rejected. It does not hang, and it does not panic.

Measured Merkle results:

Merkle root 64B:

50,000 txids in 0.065s

770,131 tx/sec

~23,103,930 tx / 30s

Merkle sweep:

50,000 txids in 0.026s

1,921,333 tx/sec

~57,639,990 tx / 30s

This means Merkle aggregation is not the limiting factor for a practical 30-second block target.

Transaction pipeline

The transaction pipeline measured:

Tx Build + Serialize:

131,595 tx/sec

~3,947,850 tx / 30s

Tx ID Hash hex:

345,387 tx/sec

~10,361,610 tx / 30s

Build Tx:

376,319 tx/sec

~11,289,570 tx / 30s

Serialize:

237,670 tx/sec

~7,130,100 tx / 30s

Tx Hash 64B:

1,827,198 tx/sec

~54,815,940 tx / 30s

State Apply:

614,052 tx/sec

~18,421,560 tx / 30s

These are local implementation measurements, but they show that the basic CPU-side transaction construction, serialization, hashing, and state-apply paths are far above a conservative 250+ TPS public target.

Block encode/decode

The block encode/decode test used 10,000 transaction IDs per block.

Block Encode:

77 blocks/sec

10,000 txids/block

~770,000 txids/sec

~23,100,000 txids / 30s

Block Decode:

19 blocks/sec

10,000 txids/block

~190,000 txids/sec

~5,700,000 txids / 30s

The encoded block-like object in the test was:

10,000 txids/block

640,131 bytes/block

That means the measured encode/decode path has substantial headroom relative to a practical 30-second block interval.

Batch-signed block assembly model

The most important benchmark is the batch-signed block assembly model, because it matches the design: many transactions, one Merkle root, one ML-DSA-65 block signature.

Measured with 10,000 transactions per block:

Block tx serialize + hash:

10,000 txs in 0.059s

168,870 tx/sec

~5,066,100 tx / 30s

Block Merkle root:

10,000 txids in 0.006s

1,683,162 tx/sec

~50,494,860 tx / 30s

One ML-DSA-65 block signature:

1 sign in 0.007s

One ML-DSA-65 block verification:

1 verify in 0.005s

Effective block assembly before fixed one-signature cost:

10,000 txs / 0.065s

153,472 tx/sec

~4,604,160 tx / 30s

This is the key result:

The local block assembly path is far above the conservative public TPS target.

The ML-DSA-65 block signature is a fixed per-block cost.

The signature cost is constant per block; it does not grow with transaction count.
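
The block model can be written down as a small cost calculation using the rounded timings reported above. This is a sketch, not the benchmark code: the struct and method names are hypothetical, and because the inputs are rounded to milliseconds the results differ slightly from the reported 153,472 tx/sec.

```rust
/// Rounded timings for one block, in microseconds.
struct BlockTimings {
    tx_count: u64,
    serialize_hash_us: u64,
    merkle_root_us: u64,
    sign_us: u64,
}

impl BlockTimings {
    /// Per-transaction work: grows with the number of transactions.
    fn assembly_us(&self) -> u64 {
        self.serialize_hash_us + self.merkle_root_us
    }

    /// Throughput over the assembly work only.
    fn tx_per_sec_before_signature(&self) -> u64 {
        self.tx_count * 1_000_000 / self.assembly_us()
    }

    /// Throughput once the single block signature is included.
    fn tx_per_sec_with_signature(&self) -> u64 {
        self.tx_count * 1_000_000 / (self.assembly_us() + self.sign_us)
    }

    /// Signature time amortized over every transaction, in nanoseconds.
    fn sign_ns_per_tx(&self) -> u64 {
        self.sign_us * 1_000 / self.tx_count
    }
}

/// The rounded figures reported above for a 10,000-transaction block.
fn reported_block() -> BlockTimings {
    BlockTimings {
        tx_count: 10_000,
        serialize_hash_us: 59_000,
        merkle_root_us: 6_000,
        sign_us: 7_000,
    }
}
```

With these inputs the single signature adds about 700 ns per transaction, and including it lowers the rounded assembly rate from about 153,846 to about 138,888 tx/sec.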

Signature layer

The signature layer was measured separately.

The required signing rate for the block model is:

1 ML-DSA-65 block signature per 30-second block

Measured primitive path:

ML-DSA-65 sign microbench:

45 signatures/sec

~1,350 signatures / 30s

ML-DSA-65 verify microbench:

136 verifications/sec

~4,080 verifications / 30s

Measured wallet path:

Wallet sign path:

~0.50 signatures/sec

~15 signatures / 30s

Wallet verify path:

139 verifications/sec

~4,170 verifications / 30s

The wallet signing path is slower because it includes wallet secret decryption and validation on every signature. That is expected; it is not the same as the raw keypair primitive path.

The important point is:

Required block signing rate:

1 signature / 30s

Measured raw ML-DSA-65 signing capacity:

~1,350 signatures / 30s

Measured wallet signing path:

~15 signatures / 30s

Both are above the required block-signing rate for a one-signature-per-block model, by roughly 1,350x for the raw primitive and 15x for the wallet path. Therefore, ML-DSA-65 signing is not the active TPS bottleneck in the batch-signed block model.
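
Another way to look at the same numbers is the share of one block interval that a single signature consumes. A minimal sketch, with hypothetical function names:

```rust
/// Wall-clock seconds for one signature at a sustained signing rate.
fn seconds_per_signature(signatures_per_sec: f64) -> f64 {
    1.0 / signatures_per_sec
}

/// Fraction of the block interval spent producing one signature.
fn interval_fraction_used(signatures_per_sec: f64, block_interval_secs: f64) -> f64 {
    seconds_per_signature(signatures_per_sec) / block_interval_secs
}
```

At the measured wallet rate of ~0.50 signatures/sec, one signature takes about 2 seconds, or under 7% of a 30-second interval; at the raw primitive rate of 45 signatures/sec the share is below 0.1%.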

Conservative public target

The conservative public target I would state is:

~250+ TPS sustained target

30-second block interval

~2 MB practical block target

one ML-DSA-65 post-quantum signature per block

batch-signed Merkle-root block verification

A 30-second block window gives:

7,500 tx / 30s = 250 TPS

8,000 tx / 30s ≈ 267 TPS

10,000 tx / 30s ≈ 333 TPS

15,000 tx / 30s = 500 TPS
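
The relationship between block contents and TPS is plain division, sketched below. The byte-budget helper additionally assumes that "~2 MB" means 2,000,000 bytes and that the whole block is available for transactions; both are assumptions for illustration, and the function names are hypothetical.

```rust
/// Transactions per block needed to sustain a target TPS.
fn tx_per_block_for(target_tps: u64, interval_secs: u64) -> u64 {
    target_tps * interval_secs
}

/// Sustained TPS implied by a block's transaction count.
fn tps(tx_per_block: u64, interval_secs: u64) -> f64 {
    tx_per_block as f64 / interval_secs as f64
}

/// Average bytes available per transaction within a block byte budget.
/// Returns `None` for an empty block instead of dividing by zero.
fn avg_bytes_per_tx(block_bytes: u64, tx_per_block: u64) -> Option<u64> {
    block_bytes.checked_div(tx_per_block)
}
```

Under those assumptions, 7,500 transactions in a 2,000,000-byte block leave an average of 266 bytes per transaction, and 15,000 transactions leave 133 bytes.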

So depending on configured block size, transaction payload size, mempool policy, network propagation, RocksDB write path, and consensus parameters, the practical target range can be discussed as:

conservative public target: ~250+ TPS

higher configured target: ~333–500 TPS

I would describe this carefully, though. The local CPU-side benchmark capacity is much higher than these targets, but production TPS will be determined by configured block size, block interval, mempool rules, RocksDB write path, network propagation, validation policy, consensus settings, and release hardware.

Why the TPS result matters for PQC

This matters because the post-quantum signature layer is usually assumed to be the bottleneck.

In a per-transaction signature model:

N transactions require N signature verifications.

Signature cost grows linearly with transaction count.

In Remzar’s batch-signed model:

N transactions are committed into one Merkle root.

The Merkle root is signed once.

The block requires one ML-DSA-65 verification.

Transaction inclusion is protected by the Merkle commitment.

Transaction validity is handled by the validation layers.

So for the guardian/block layer:

signature work per block = 1

signature verification per block = 1

signature bytes per block = 3309

That is the practical reason the system can target hundreds of TPS while still using ML-DSA-65 at the block/guardian layer.
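
The two models can be compared side by side at the block/guardian layer. The sketch below uses the 3309-byte ML-DSA-65 signature size; the type and function names are hypothetical and only count signature work, not transaction validation.

```rust
/// ML-DSA-65 signature size in bytes (FIPS 204 parameter set).
const ML_DSA_65_SIG_BYTES: u64 = 3309;

/// Signature work attributable to one block.
#[derive(Debug, PartialEq, Eq)]
struct SignatureCost {
    verifications: u64,
    signature_bytes: u64,
}

/// One signature per transaction: cost grows linearly with `tx_count`.
fn per_transaction_model(tx_count: u64) -> SignatureCost {
    SignatureCost {
        verifications: tx_count,
        signature_bytes: tx_count * ML_DSA_65_SIG_BYTES,
    }
}

/// One signature over the Merkle root: cost is fixed per block.
fn batch_signed_model(_tx_count: u64) -> SignatureCost {
    SignatureCost {
        verifications: 1,
        signature_bytes: ML_DSA_65_SIG_BYTES,
    }
}
```

For a 10,000-transaction block, the per-transaction model implies 33,090,000 signature bytes and 10,000 verifications, against 3,309 bytes and one verification in the batch-signed model.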

31. Open question for the PQC community

The main area where I would appreciate technical feedback is long-running memory/resource behavior in a live PQC blockchain node.

I have been actively hunting down possible memory leaks and unbounded retention paths in the Remzar codebase. I believe I have addressed the major issues by adding bounded queues, capped pending request maps, compact account-state snapshots, bounded recent-block caching, replay cleanup, PQ peer-state cleanup, serialized-size limits, and RSS threshold visibility. However, I do not want to overclaim that every possible leak or retention edge case is solved.
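
To make the "capped pending request map" idea concrete, here is a generic sketch of one common pattern: a table with a hard entry cap and a time-to-live, which refuses new entries when full instead of growing. It is not taken from the Remzar source; the type, field names, and the refuse-when-full policy are all assumptions for illustration.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// Hypothetical pending-request table with a hard cap and a TTL.
struct PendingRequests {
    max_entries: usize,
    ttl: Duration,
    entries: HashMap<u64, Instant>,
}

impl PendingRequests {
    fn new(max_entries: usize, ttl: Duration) -> Self {
        Self { max_entries, ttl, entries: HashMap::new() }
    }

    /// Drops every entry that has been pending for at least `ttl`.
    fn evict_expired(&mut self, now: Instant) {
        let ttl = self.ttl;
        self.entries
            .retain(|_, inserted| now.saturating_duration_since(*inserted) < ttl);
    }

    /// Tracks a request, refusing (returning `false`) rather than
    /// growing past the cap. Re-inserting a known id refreshes it.
    fn try_insert(&mut self, request_id: u64, now: Instant) -> bool {
        self.evict_expired(now);
        let is_new = !self.entries.contains_key(&request_id);
        if is_new && self.entries.len() >= self.max_entries {
            return false;
        }
        self.entries.insert(request_id, now);
        true
    }

    /// Removes a request once its response arrives.
    fn complete(&mut self, request_id: u64) -> bool {
        self.entries.remove(&request_id).is_some()
    }

    fn len(&self) -> usize {
        self.entries.len()
    }
}
```

Passing `now` explicitly keeps the eviction logic deterministic under test, which helps when checking that the table stays bounded under simulated peer churn.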

My question to the community is:

For a live Rust blockchain node using ML-DSA-65, ML-KEM-768, Merkle batching, RocksDB storage, P2P sync, and long-running validator/miner processes, what memory-retention or resource-exhaustion patterns should I be most careful to audit further?

Areas I am especially interested in reviewing are:

stale P2P pending requests

stale PQ handshake/session state

peer side-table growth

mempool growth

block/batch sync queues

Merkle proof allocation bounds

RocksDB read/write buffering behavior

account-state snapshot size

recent-block cache retention

wallet/key material lifetime

panic/failure paths that may skip cleanup

reorg/replay paths that may leave stale state

long-running RSS growth under peer churn

I would welcome feedback on practical audit strategies, Rust tooling, benchmark patterns, or design changes that the PQC/blockchain community would recommend for proving that the node remains memory-bounded over long runtimes.

32. Summary

Remzar is now a live Rust Layer-1 blockchain node chain with post-quantum components integrated across the stack.

The implementation uses:

ML-DSA-65 for wallet/keypair/guardian/batch/block signatures

ML-KEM-768 for P2P post-quantum key establishment

BLAKE3-XOF(64) for 64-byte canonical hashes and commitments

Merkle-root batch signing to avoid repeated 3309-byte guardian signatures

Argon2id + AES-256-GCM for encrypted ML-DSA-65 secret storage

strict wallet/public-key/address binding

bounded Merkle proof sizes and proof depth

bounded batch sizes and serialized storage sizes

bounded P2P pending requests and sync queues

bounded account-state recent block cache

compact account-state snapshots instead of serializing all block history

RSS threshold logging for live memory observability

rollback-on-failure for block application

version/genesis/PQ readiness gates before sync

replay protection for PQ nonces

panic containment and exact byte-length checks around cryptographic parsing

conservative local TPS benchmarking across hashing, Merkle aggregation, block assembly, state apply, encoding/decoding, and signature paths

The key TPS finding is:

Remzar does not require one ML-DSA-65 signature per transaction at the block/guardian layer.

Remzar signs one 64-byte Merkle root per block/batch.

The measured local block assembly model reached ~153,472 tx/sec before the fixed one-signature block cost.

That equals ~4,604,160 tx / 30s of local CPU-side assembly capacity.

The conservative public target remains ~250+ TPS, with higher configured targets depending on block size, transaction size, network propagation, RocksDB write path, and release hardware.

The signature-layer finding is:

Required block signing rate:

1 ML-DSA-65 signature per 30 seconds

Measured ML-DSA-65 primitive signing:

~45 signatures/sec

~1,350 signatures / 30s

Measured ML-DSA-65 primitive verification:

~136 verifications/sec

~4,080 verifications / 30s

Measured wallet signing path:

~0.50 signatures/sec

~15 signatures / 30s

Measured wallet verification path:

~139 verifications/sec

~4,170 verifications / 30s

That means the ML-DSA-65 signature layer has enough headroom for the one-signature-per-block model. In this design, the active throughput constraints move to practical blockchain engineering: block size, mempool policy, storage, network propagation, validation rules, consensus settings, and release hardware.

My practical takeaway is that FIPS 203/204 integration into blockchain software is possible, but the hard part is not just calling ML-DSA or ML-KEM. The hard part is making the whole node survive the operational consequences of larger keys, larger signatures, larger wire objects, untrusted peers, sync churn, reboot repair, storage migration, long-running memory growth, and honest TPS reporting.

I am sharing this as an implementation data point and would welcome technical review from the PQC community.

Project links:

https://www.remzar.com/
https://github.com/remzarchain/remzar
