Dear PQC Forum,
I wanted to share a concrete implementation report from a live blockchain system, since recent discussions have touched on post-quantum signatures, ML-DSA / ML-KEM deployment, blockchain throughput, larger message sizes, Merkle commitments, and whether these topics are still theoretical or already being implemented.
I am the developer of Remzar, a Rust Layer-1 blockchain. The node chain is live. This is not a proposal, not only a whitepaper, and not only a toy demo. I implemented the post-quantum pieces directly into the wallet layer, keypair layer, transaction batch layer, block metadata layer, block validation layer, and P2P handshake/sync layer.
Source code:
https://github.com/remzarchain/remzar
The implementation uses:
FIPS 204 / ML-DSA-65 through fips204::ml_dsa_65
FIPS 203 / ML-KEM-768 through fips203::ml_kem_768
BLAKE3 XOF with 64-byte output as the canonical chain hash / commitment width
ML-DSA-65 for wallet, guardian, block, and batch/Merkle-root signatures
ML-KEM-768 for post-quantum P2P session establishment
Merkle-root batching to reduce repeated ML-DSA-65 signature overhead
RocksDB-backed chain/state storage
bounded in-memory state retention
bounded P2P queues and pending request tables
defensive serialization, size, timestamp, replay, and panic-containment guardrails
I am not claiming that Remzar itself is NIST-certified, FIPS-validated, or endorsed by NIST. The accurate claim is that Remzar is a live Rust Layer-1 blockchain implementation that integrates implementations of the NIST-standardized ML-DSA and ML-KEM algorithm families into a working node chain.
1. Why this matters
The practical problem I ran into is that post-quantum cryptography is not just an algorithm-selection issue. In a blockchain/node system, the real problems are:
large signatures
large public keys
large private keys
large P2P handshake material
larger serialized blocks/batches
untrusted peer input
resource exhaustion
malformed wire messages
state growth
sync queues
pending request leaks
replay attacks
timestamp skew
node reboot repair
chain identity validation
deterministic consensus validation
memory leaks (in practice, the most important item on this list)
The core cryptographic primitive can be correct, but the node can still fail if the surrounding infrastructure is not bounded.
So the implementation work became two things at once:
1. Integrate ML-DSA-65 and ML-KEM-768.
2. Build guardrails so live nodes can survive malformed data, large PQC objects, network churn, sync retries, and long-running state growth.
2. ML-DSA-65 concrete byte sizes
The code uses fips204::ml_dsa_65.
The concrete ML-DSA-65 byte sizes used in the Remzar wallet and signature layers are:
ML-DSA-65 secret key: 4032 bytes
ML-DSA-65 public key: 1952 bytes
ML-DSA-65 signature: 3309 bytes
This matters because these are not small fields in a blockchain. A 3309-byte signature is very different from a classical 64-byte Ed25519-style signature when placed into transactions, blocks, P2P messages, or archived state.
The wallet file treats these values as consensus-relevant engineering facts, not merely as comments:
secret key bytes = ml_dsa_65::SK_LEN
public key bytes = ml_dsa_65::PK_LEN
signature bytes = ml_dsa_65::SIG_LEN
3. Wallet identity: compact address from a large PQ public key
A full ML-DSA-65 public key is 1952 bytes. I did not use the raw public key as the wallet address. Instead, the wallet address is a 64-byte commitment to the public key:
wallet_address = "r" || hex(BLAKE3-XOF(64)(ML-DSA-65 public_key_bytes))
This gives:
1 prefix character: "r"
128 lowercase hex characters
129 total characters
The math is:
BLAKE3-XOF(64) output = 64 bytes = 512 bits
64 bytes encoded as hex = 128 hex characters
"r" + 128 chars = 129-character Remzar wallet address
The design goal was:
Keep wallet identifiers compact and fixed-format.
Keep the full ML-DSA-65 public key available for verification.
Bind the wallet address to the public key.
Reject malformed or mismatched wallet/public-key pairs.
Use one canonical wallet format across the chain.
The validator enforces:
address length == 129
prefix == "r"
body length == 128
body characters are lowercase hex only
address matches BLAKE3-XOF(64)(public_key_bytes)
public key bytes parse as ML-DSA-65 public key bytes
This avoids placing a 1952-byte public key everywhere an address is needed, while still keeping a cryptographic binding between the compact address and the actual ML-DSA-65 public key.
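The address checks above can be sketched in plain Rust. This is a minimal, std-only sketch: `validate_wallet_address`, `AddressError`, and `to_lower_hex` are hypothetical names, and the commitment function is passed in as a closure because BLAKE3-XOF(64) is not in the standard library (in the node it would be BLAKE3-XOF(64) over the ML-DSA-65 public key bytes). The final check, parsing the key as an ML-DSA-65 public key, needs the fips204 crate and is left out.

```rust
/// Rejection reasons for the structural wallet-address checks.
#[derive(Debug, PartialEq, Eq)]
pub enum AddressError {
    BadLength,
    BadPrefix,
    NonLowercaseHex,
    CommitmentMismatch,
}

/// Lowercase hex encoding without external crates.
pub fn to_lower_hex(bytes: &[u8]) -> String {
    const DIGITS: &[u8; 16] = b"0123456789abcdef";
    let mut out = String::with_capacity(bytes.len() * 2);
    for &b in bytes {
        out.push(DIGITS[(b >> 4) as usize] as char);
        out.push(DIGITS[(b & 0x0f) as usize] as char);
    }
    out
}

/// Checks "r" + 128 lowercase hex characters, then compares the body with
/// `commit(public_key)`. `commit` is a stand-in for BLAKE3-XOF(64).
pub fn validate_wallet_address<F>(
    address: &str,
    public_key: &[u8],
    commit: F,
) -> Result<(), AddressError>
where
    F: Fn(&[u8]) -> [u8; 64],
{
    // Cheap format checks first: total length, prefix, body alphabet.
    if address.len() != 129 {
        return Err(AddressError::BadLength);
    }
    let body = address.strip_prefix('r').ok_or(AddressError::BadPrefix)?;
    if body.len() != 128 || !body.bytes().all(|c| matches!(c, b'0'..=b'9' | b'a'..=b'f')) {
        return Err(AddressError::NonLowercaseHex);
    }
    // Only then compute the commitment and compare it with the body.
    if body != to_lower_hex(&commit(public_key)).as_str() {
        return Err(AddressError::CommitmentMismatch);
    }
    Ok(())
}
```

Running the cheap format rules before the commitment means an obviously malformed address never costs a hash.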
4. ML-DSA-65 keypair hardening
The ML-DSA-65 keypair wrapper stores:
secret_bytes: [u8; ml_dsa_65::SK_LEN],
public_bytes: [u8; ml_dsa_65::PK_LEN],
The wrapper does several things beyond just calling the library:
redacts key material in Debug output
zeroizes secret and public bytes on drop
validates public key parsing
validates secret key parsing
derives public key bytes from secret key bytes
checks that stored public key == derived public key
checks that parsed public key serializes canonically
rejects malformed secret/public material
contains panics around key parsing and public-key derivation
uses a global panic-hook lock while suppressing panic noise during rejected key parsing
enforces a hard validation budget of 750 ms
supports fault-injection hooks through REMZAR_FAIL_* environment variables
The keypair validation flow is approximately:
start timer
parse public key bytes
reject if budget exceeded
parse secret key bytes
reject if budget exceeded
derive public key from secret key
reject if budget exceeded
compare stored public key to derived public key
compare parsed public key serialization to derived public key
reject if mismatch
This was important because untrusted key material should not be able to crash or stall a live node. The code treats malformed key bytes as invalid input, not as a reason to panic the process.
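Here is a std-only sketch of the budgeted validation flow. The three closures are hypothetical stand-ins for the fips204 parse and derive calls, whose real signatures differ, and `validate_keypair` / `KeyCheckError` are illustrative names. Checking the clock between steps cannot preempt a step that is already running. It only stops the flow from continuing after an overrun.

```rust
use std::time::{Duration, Instant};

#[derive(Debug, PartialEq, Eq)]
pub enum KeyCheckError {
    Malformed(&'static str),
    BudgetExceeded(&'static str),
    Mismatch,
}

/// Validates a keypair in the order described above, checking the time
/// budget after each step. `parse_public` returns the canonical
/// re-serialization of the public key; `derive_public` derives the public
/// key bytes from the secret key bytes. Both are hypothetical stand-ins.
pub fn validate_keypair(
    public_bytes: &[u8],
    secret_bytes: &[u8],
    budget: Duration,
    parse_public: impl Fn(&[u8]) -> Option<Vec<u8>>,
    parse_secret: impl Fn(&[u8]) -> bool,
    derive_public: impl Fn(&[u8]) -> Option<Vec<u8>>,
) -> Result<(), KeyCheckError> {
    let start = Instant::now();
    let within_budget = |stage: &'static str| {
        if start.elapsed() > budget {
            Err(KeyCheckError::BudgetExceeded(stage))
        } else {
            Ok(())
        }
    };

    let canonical_public =
        parse_public(public_bytes).ok_or(KeyCheckError::Malformed("public key"))?;
    within_budget("parse public")?;

    if !parse_secret(secret_bytes) {
        return Err(KeyCheckError::Malformed("secret key"));
    }
    within_budget("parse secret")?;

    let derived = derive_public(secret_bytes).ok_or(KeyCheckError::Malformed("derivation"))?;
    within_budget("derive public")?;

    // Stored bytes and the canonical re-serialization must both equal the derived key.
    if public_bytes != derived.as_slice() || canonical_public != derived {
        return Err(KeyCheckError::Mismatch);
    }
    Ok(())
}
```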
5. Secret-key storage: Argon2id + AES-256-GCM
The wallet stores encrypted raw ML-DSA-65 secret bytes, not plaintext private keys.
The storage pattern is:
plaintext = raw ML-DSA-65 secret key bytes
plaintext length = 4032 bytes
key = Argon2id(passphrase, salt)
encrypted_blob = salt || nonce || ciphertext || tag, where ciphertext || tag = AES-256-GCM(key, nonce, plaintext)
The constants used are:
AES-256 key: 32 bytes
salt: 16 bytes
nonce: 12 bytes
GCM tag: 16 bytes
raw ML-DSA-65 secret: 4032 bytes
minimum encrypted blob for raw ML-DSA-65 secret: 4076 bytes
I added storage and input guardrails:
passphrase must not be empty
passphrase cap: 16 KiB
plaintext cap: 1 MiB
encrypted blob cap: 16 MiB
config must support at least 4032-byte raw ML-DSA-65 secrets
config must support the encrypted blob minimum
encrypted input must be at least salt + nonce + GCM tag
encrypted input is split only after layout validation
This is one of the areas where PQC changes normal assumptions. A 4032-byte private key is large enough that secret handling, storage layout, and configuration bounds must be explicit.
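The blob layout and its guardrails can be checked before any decryption happens. This sketch only validates and splits the layout. Argon2id and AES-256-GCM are not in std and are omitted, and the function and type names are hypothetical. The constants also confirm the 4076-byte minimum: 16 + 12 + 4032 + 16.

```rust
pub const SALT_LEN: usize = 16;
pub const NONCE_LEN: usize = 12;
pub const TAG_LEN: usize = 16;
pub const ML_DSA_65_SK_LEN: usize = 4032;
pub const MAX_BLOB_LEN: usize = 16 * 1024 * 1024; // 16 MiB cap

/// Smallest blob that can hold an encrypted raw ML-DSA-65 secret key.
pub const MIN_RAW_SECRET_BLOB: usize = SALT_LEN + NONCE_LEN + ML_DSA_65_SK_LEN + TAG_LEN;

pub struct BlobParts<'a> {
    pub salt: [u8; SALT_LEN],
    pub nonce: [u8; NONCE_LEN],
    /// AES-256-GCM output: ciphertext followed by the 16-byte tag.
    pub ciphertext_and_tag: &'a [u8],
}

/// Validates the layout first, and only then splits the blob.
pub fn split_encrypted_blob<'a>(blob: &'a [u8]) -> Result<BlobParts<'a>, &'static str> {
    if blob.len() > MAX_BLOB_LEN {
        return Err("encrypted blob exceeds 16 MiB cap");
    }
    if blob.len() < SALT_LEN + NONCE_LEN + TAG_LEN {
        return Err("encrypted blob shorter than salt + nonce + tag");
    }
    let mut salt = [0u8; SALT_LEN];
    salt.copy_from_slice(&blob[..SALT_LEN]);
    let mut nonce = [0u8; NONCE_LEN];
    nonce.copy_from_slice(&blob[SALT_LEN..SALT_LEN + NONCE_LEN]);
    Ok(BlobParts {
        salt,
        nonce,
        ciphertext_and_tag: &blob[SALT_LEN + NONCE_LEN..],
    })
}
```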
6. The ML-DSA-65 signature-size problem
The biggest practical blockchain issue is signature size.
For ML-DSA-65:
signature length = 3309 bytes
If the guardian/batch layer attaches one ML-DSA-65 signature to every transaction, the signature overhead is:
overhead(n) = n * 3309 bytes
For example:
100 transactions -> 330,900 bytes of signatures
1,000 transactions -> 3,309,000 bytes of signatures
10,000 transactions -> 33,090,000 bytes of signatures
That is before considering transaction payloads, public keys, P2P framing, storage, indexes, or archival overhead.
So my design does not try to “compress ML-DSA.” Instead, it moves the guardian/batch signature boundary to a compact commitment.
7. Merkle-root batch signing
For the batch/guardian layer, Remzar signs one 64-byte Merkle root instead of signing each transaction separately at that layer.
The flow is:
1. Serialize each transaction canonically.
2. Hash each serialized transaction with BLAKE3-XOF(64).
3. Build a Merkle tree from the 64-byte leaves.
4. Compute one 64-byte Merkle root.
5. Sign that Merkle root once with ML-DSA-65.
6. Store the 3309-byte guardian signature in the batch/block metadata.
Mathematically:
leaf_i = BLAKE3-XOF(64)(serialize(tx_i))
parent = BLAKE3-XOF(64)(left_64 || right_64)
root = MerkleRoot(leaf_0, leaf_1, ..., leaf_{n-1})
guardian_signature = ML-DSA-65.Sign(sk, root)
Signature overhead becomes:
overhead_batch = 3309 bytes
instead of:
overhead_naive = n * 3309 bytes
The saving is:
saving(n) = (n - 1) * 3309 bytes
Examples:
n = 100:
saving = 99 * 3309 = 327,591 bytes
n = 1,000:
saving = 999 * 3309 = 3,305,691 bytes
n = 10,000:
saving = 9,999 * 3309 = 33,086,691 bytes
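The overhead and saving formulas from sections 6 and 7 reduce to checked multiplications. The small sketch below uses hypothetical function names and reproduces the numbers above.

```rust
/// ML-DSA-65 signature length in bytes (FIPS 204).
pub const ML_DSA_65_SIG_LEN: u64 = 3309;

/// One guardian signature per transaction: overhead(n) = n * 3309.
pub fn per_tx_overhead(n: u64) -> Option<u64> {
    n.checked_mul(ML_DSA_65_SIG_LEN)
}

/// One signature over the Merkle root instead of n: saving(n) = (n - 1) * 3309.
/// Returns None for n = 0 (nothing is saved) or on overflow.
pub fn batch_saving(n: u64) -> Option<u64> {
    n.checked_sub(1)?.checked_mul(ML_DSA_65_SIG_LEN)
}
```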
This is not a claim that Merkle-root signing replaces all transaction authorization semantics. The security boundary is:
The guardian/block signer signs a commitment to the exact serialized transaction set.
The Merkle root commits to the transaction ordering and contents.
Transaction validity, account authorization, double-spend detection, reward rules, committee rules, and block validation are handled by the validation layers.
8. Merkle tree details
Remzar uses 64-byte BLAKE3-XOF outputs as canonical hashes.
Leaf:
leaf = BLAKE3-XOF(64)(transaction_bytes)
Parent:
parent = BLAKE3-XOF(64)(left_64 || right_64)
Odd node count:
If a level has an odd number of nodes, duplicate the last node.
Empty batch:
If the batch is empty, inject a deterministic dummy leaf marker.
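The root-building rules can be sketched as follows. The sketch assumes that `hash` is a stand-in for BLAKE3-XOF(64) supplied by the caller and that transactions arrive already canonically serialized. `EMPTY_BATCH_MARKER` is a hypothetical placeholder, because the post does not give the real dummy-leaf bytes.

```rust
pub type Hash64 = [u8; 64];

/// Hypothetical placeholder: the actual Remzar dummy-leaf marker is not given here.
pub const EMPTY_BATCH_MARKER: &[u8] = b"hypothetical-empty-batch-marker";

/// Bottom-up root: leaf = H(tx), parent = H(left || right), and the last node
/// is duplicated on odd levels. `hash` stands in for BLAKE3-XOF(64).
pub fn merkle_root<H>(serialized_txs: &[Vec<u8>], hash: H) -> Hash64
where
    H: Fn(&[u8]) -> Hash64,
{
    let mut level: Vec<Hash64> = if serialized_txs.is_empty() {
        vec![hash(EMPTY_BATCH_MARKER)]
    } else {
        serialized_txs.iter().map(|tx| hash(tx.as_slice())).collect()
    };
    let mut buf = [0u8; 128];
    while level.len() > 1 {
        if level.len() % 2 == 1 {
            let last = level[level.len() - 1];
            level.push(last);
        }
        let mut next = Vec::with_capacity(level.len() / 2);
        for pair in level.chunks(2) {
            buf[..64].copy_from_slice(&pair[0]);
            buf[64..].copy_from_slice(&pair[1]);
            next.push(hash(&buf[..]));
        }
        level = next;
    }
    level[0]
}
```

Duplicating the last node has one consequence worth keeping in mind: a batch [a, b, c] and a batch [a, b, c, c] produce the same root. Bitcoin tracked the same class of issue as CVE-2012-2459. The guarantee that the root commits to the exact transaction set therefore depends on the validation layer rejecting duplicate transactions (for example through double-spend detection) or on the leaf count being committed elsewhere.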
Merkle proof structure:
MerkleProof {
transaction_hash: Hash64,
sibling_hashes: Vec<Hash64>,
path: Vec<bool>,
merkle_root: Hash64,
}
Merkle proof guardrails:
encoded proof cap: 256 KiB
absolute proof depth cap: 4096
derived proof depth cap based on MAX_BATCH_ITEMS
sibling_hashes.len() must equal path.len()
hash count must not exceed MAX_BATCH_ITEMS
tree levels must not be empty
final Merkle level must contain exactly one root
This is important because Merkle proofs are externally supplied data in many systems. A proof verifier should not be an unbounded allocation or deep-recursion attack surface.
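The following is a bounded, iterative proof verifier that follows the guardrails above. It is only a sketch. The `path` convention (`true` means the current node is the right child) is an assumption, since the post does not define it. The 256 KiB encoded-proof cap belongs to the decoder and is not shown. Because the loop is iterative, each proof level costs one hash and no recursion.

```rust
pub type Hash64 = [u8; 64];

pub struct MerkleProof {
    pub transaction_hash: Hash64,
    pub sibling_hashes: Vec<Hash64>,
    pub path: Vec<bool>,
    pub merkle_root: Hash64,
}

pub const ABSOLUTE_MAX_PROOF_DEPTH: usize = 4096;

/// Depth of a tree holding `max_batch_items` leaves: ceil(log2(max_batch_items)),
/// clamped to the absolute cap.
pub fn derived_max_depth(max_batch_items: usize) -> usize {
    let mut depth = 0usize;
    let mut width = 1usize;
    while width < max_batch_items {
        width = width.saturating_mul(2);
        depth += 1;
    }
    depth.min(ABSOLUTE_MAX_PROOF_DEPTH)
}

/// Structural checks first, then one hash per level.
pub fn verify_proof<H>(proof: &MerkleProof, max_batch_items: usize, hash: H) -> Result<bool, &'static str>
where
    H: Fn(&[u8]) -> Hash64,
{
    if proof.sibling_hashes.len() != proof.path.len() {
        return Err("sibling_hashes and path lengths differ");
    }
    if proof.path.len() > derived_max_depth(max_batch_items) {
        return Err("proof deeper than the batch size allows");
    }
    let mut node = proof.transaction_hash;
    let mut buf = [0u8; 128];
    for (sibling, &is_right) in proof.sibling_hashes.iter().zip(&proof.path) {
        // Assumed convention: `is_right == true` means the current node is the right child.
        let (left, right) = if is_right { (sibling, &node) } else { (&node, sibling) };
        buf[..64].copy_from_slice(left);
        buf[64..].copy_from_slice(right);
        node = hash(&buf[..]);
    }
    Ok(node == proof.merkle_root)
}
```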
9. Batch-signature verification order
One subtle implementation detail: the batch verifier checks cheap invalid conditions before doing expensive hashing/Merkle work.
The verification order is:
1. Check signature length.
2. Check batch item count and byte bounds.
3. Hash each transaction.
4. Compute the Merkle root.
5. Convert signature bytes to fixed-size ML-DSA-65 signature array.
6. Verify ML-DSA-65 signature over the root.
The reason is simple:
malformed signature bytes should not force expensive batch hashing
oversized batch input should not force Merkle construction
invalid structure should fail before expensive cryptographic work
This is one of the practical lessons from deploying large PQC objects in node software: validation order matters.
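The ordering can be expressed as early returns, with the two expensive steps passed in as closures. This is an illustrative sketch: `verify_batch`, `merkle_root_of`, and `verify_root` are hypothetical names standing in for the node's hashing/Merkle code and the fips204 verification call.

```rust
pub const SIG_LEN: usize = 3309; // ML-DSA-65 signature length

#[derive(Debug, PartialEq, Eq)]
pub enum BatchVerifyError {
    BadSignatureLength,
    TooManyItems,
    TooManyBytes,
    BadSignature,
}

pub fn verify_batch(
    txs: &[Vec<u8>],
    signature: &[u8],
    max_items: usize,
    max_bytes: usize,
    merkle_root_of: impl Fn(&[Vec<u8>]) -> [u8; 64],
    verify_root: impl Fn(&[u8; SIG_LEN], &[u8; 64]) -> bool,
) -> Result<(), BatchVerifyError> {
    // 1. Cheapest check: exact ML-DSA-65 signature length.
    if signature.len() != SIG_LEN {
        return Err(BatchVerifyError::BadSignatureLength);
    }
    // 2. Item count and byte bounds, still before any hashing.
    if txs.len() > max_items {
        return Err(BatchVerifyError::TooManyItems);
    }
    let total = txs.iter().try_fold(0usize, |acc, tx| acc.checked_add(tx.len()));
    if total.map_or(true, |t| t > max_bytes) {
        return Err(BatchVerifyError::TooManyBytes);
    }
    // 3-4. Expensive: hash each transaction and compute the Merkle root.
    let root = merkle_root_of(txs);
    // 5. Copy into a fixed-size array (length was validated in step 1).
    let mut sig = [0u8; SIG_LEN];
    sig.copy_from_slice(signature);
    // 6. One ML-DSA-65 verification over the 64-byte root.
    if verify_root(&sig, &root) {
        Ok(())
    } else {
        Err(BatchVerifyError::BadSignature)
    }
}
```

A counter on `merkle_root_of` is an easy way to assert in tests that rejected inputs never reach the hashing step.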
10. Guardian signatures and block signatures
The guardian signature system uses the same model:
batch data -> transaction hashes -> Merkle root -> ML-DSA-65 signature
The block path can also sign serialized metadata together with batch key data. The block signing path:
1. Serialize metadata + batch key.
2. Sign the serialized signing payload through GuardianSignature.
3. Enforce signature length == ml_dsa_65::SIG_LEN.
4. Copy into fixed [u8; ml_dsa_65::SIG_LEN].
5. Embed into metadata.
6. Recompute the block hash after embedding the signature.
This makes the block hash commit to the guardian signature and metadata together.
Block signature verification:
1. Serialize the same signing payload.
2. Verify GuardianSignature over that payload.
3. Recompute and compare block hash.
11. Block metadata structure
Block metadata contains:
index: u64
timestamp: u64
previous_hash: [u8; 64]
merkle_root: [u8; 64]
guardian_signature: [u8; ml_dsa_65::SIG_LEN]
puzzle_proof: Option<BlockPuzzleProof>
size: u64
The important thing here is that the guardian signature is a fixed-size ML-DSA-65 signature array, not an unbounded vector in metadata.
Structural metadata validation rejects:
genesis metadata with nonzero previous_hash
non-genesis metadata with zero previous_hash
zero Merkle root
all-0xFF Merkle root
all-0xFF previous_hash
all-zero guardian signature on non-genesis metadata
all-0xFF guardian signature
metadata size < 64
metadata size > MAX_BLOCK_SIZE
non-genesis merkle_root == previous_hash
timestamps before the project lower bound
genesis metadata with puzzle proof
puzzle proof height mismatch
puzzle proof previous-hash mismatch
This prevents obvious corrupt/sentinel values from entering consensus state.
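Below is a sketch of the sentinel checks on a simplified metadata struct. It makes three assumptions: genesis is identified by `index == 0`, `min_timestamp` stands in for the project lower bound (whose value is not given here), and the puzzle-proof checks are omitted.

```rust
pub const SIG_LEN: usize = 3309; // ML-DSA-65 signature length

pub struct BlockMetadata {
    pub index: u64,
    pub timestamp: u64,
    pub previous_hash: [u8; 64],
    pub merkle_root: [u8; 64],
    pub guardian_signature: [u8; SIG_LEN],
    pub size: u64,
}

fn is_all(bytes: &[u8], value: u8) -> bool {
    bytes.iter().all(|&b| b == value)
}

pub fn check_metadata(m: &BlockMetadata, max_block_size: u64, min_timestamp: u64) -> Result<(), &'static str> {
    let genesis = m.index == 0; // assumption: genesis is height 0
    if genesis && !is_all(&m.previous_hash, 0) {
        return Err("genesis previous_hash must be zero");
    }
    if !genesis && is_all(&m.previous_hash, 0) {
        return Err("non-genesis previous_hash is zero");
    }
    if is_all(&m.previous_hash, 0xFF) {
        return Err("previous_hash is all 0xFF");
    }
    if is_all(&m.merkle_root, 0) || is_all(&m.merkle_root, 0xFF) {
        return Err("sentinel merkle_root");
    }
    if !genesis && is_all(&m.guardian_signature, 0) {
        return Err("zero guardian signature");
    }
    if is_all(&m.guardian_signature, 0xFF) {
        return Err("all-0xFF guardian signature");
    }
    if m.size < 64 || m.size > max_block_size {
        return Err("metadata size out of range");
    }
    if !genesis && m.merkle_root == m.previous_hash {
        return Err("merkle_root equals previous_hash");
    }
    if m.timestamp < min_timestamp {
        return Err("timestamp before project lower bound");
    }
    Ok(())
}
```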
12. Canonical 64-byte hash design
Remzar uses BLAKE3-XOF(64) as the canonical hash/commitment width.
That gives:
64 bytes = 512 bits
hex encoding = 128 lowercase hex characters
This same width is used for:
block hash
previous hash
Merkle root
genesis hash
wallet commitment
batch key digest
puzzle commitment
The benefit is consistency. The node does not mix 32-byte and 64-byte consensus identifiers internally except where explicit legacy compatibility exists.
13. Genesis and chain identity
The genesis block uses 64-byte canonical hashes.
Genesis validation requires:
prev_hash == [0u8; 64]
prev_hash != [0xFFu8; 64]
merkle_root != [0u8; 64]
merkle_root != [0xFFu8; 64]
The genesis preimage includes a zeroed guardian-signature-sized field:
ZERO_GUARDIAN_SIGNATURE: [u8; ml_dsa_65::SIG_LEN]
This keeps the genesis preimage structurally aligned with normal block hashing while remaining deterministic.
The P2P version handshake also carries the expected genesis hash. Speaking the protocol is not enough for admission: a peer must also match the expected protocol version and genesis identity.
14. Transaction batch finalization
The batch finalization path does two size checks:
1. Sum of transaction serialized sizes.
2. Actual serialized bytes that will be stored/transmitted.
The second check matters. Logical size estimates can be wrong if serialization overhead or optional fields grow. The implementation verifies the actual postcard-serialized batch bytes before finalization.
The finalization path:
validate MAX_BLOCK_SIZE conversion
reject if total_size() > MAX_BLOCK_SIZE
serialize actual batch for storage
reject if actual serialized batch bytes > MAX_BLOCK_SIZE
sign batch with ML-DSA-65 guardian signature
compute 64-byte Merkle root
construct BlockMetadata
This prevents an attacker or bug from slipping oversized serialized data through a logical-size-only check.
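The two-stage size gate can be sketched generically. `serialize` stands in for the postcard encoding, and the names are hypothetical. Signing, Merkle-root computation, and metadata construction would follow only after both checks pass.

```rust
/// Check 1 uses the logical size; check 2 uses the bytes that will actually be
/// stored or transmitted. Returns those bytes on success.
pub fn finalize_size_gate<T>(
    batch: &T,
    tx_serialized_sizes: &[u64],
    max_block_size: u64,
    serialize: impl Fn(&T) -> Vec<u8>,
) -> Result<Vec<u8>, &'static str> {
    // Check 1: sum of per-transaction serialized sizes, with overflow detection.
    let logical = tx_serialized_sizes
        .iter()
        .try_fold(0u64, |acc, &s| acc.checked_add(s))
        .ok_or("logical size overflow")?;
    if logical > max_block_size {
        return Err("logical batch size exceeds MAX_BLOCK_SIZE");
    }
    // Check 2: the real encoded batch, including framing the estimate may miss.
    let bytes = serialize(batch);
    if (bytes.len() as u64) > max_block_size {
        return Err("serialized batch exceeds MAX_BLOCK_SIZE");
    }
    Ok(bytes)
}
```

With a toy encoder that adds a 4-byte length prefix per transaction, three 100-byte transactions have a logical size of 300 bytes but a serialized size of 312 bytes, so a 310-byte cap passes check 1 and fails check 2.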
15. Block serialization and padding safety
The block storage path also checks actual serialized bytes:
serialize block with postcard
reject if serialized block bytes > MAX_BLOCK_SIZE
On deserialization, the code:
validates storage length first
tries strict postcard decode
falls back to padded postcard decode only if needed
rejects non-zero trailing bytes
normalizes and validates after decode
can return both actual_size_bytes and stored_size_bytes
rejects actual_size_bytes == 0
The reason this exists is that long-running chain software eventually encounters old data layouts, padded records, migrations, or corrupted bytes. The node should distinguish:
actual serialized payload size
stored RocksDB byte length
trailing padding
non-zero trailing corruption
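Here is a sketch of the trailing-byte rule. It collapses the strict and padded attempts into a single prefix decode: `decode_prefix` (hypothetical) returns the value plus the number of bytes it consumed, a role that postcard's `take_from_bytes` could fill. A strict decode is the case where every byte was consumed. Anything left over must be zero padding.

```rust
#[derive(Debug)]
pub struct Decoded<T> {
    pub value: T,
    pub actual_size_bytes: usize,
    pub stored_size_bytes: usize,
}

pub fn decode_stored<T>(
    stored: &[u8],
    max_len: usize,
    decode_prefix: impl Fn(&[u8]) -> Option<(T, usize)>,
) -> Result<Decoded<T>, &'static str> {
    // Validate the stored length before decoding anything.
    if stored.is_empty() || stored.len() > max_len {
        return Err("stored length out of range");
    }
    let (value, used) = decode_prefix(stored).ok_or("decode failed")?;
    if used == 0 || used > stored.len() {
        return Err("invalid actual size");
    }
    // Trailing bytes are tolerated only as zero padding.
    if stored[used..].iter().any(|&b| b != 0) {
        return Err("non-zero trailing bytes");
    }
    Ok(Decoded { value, actual_size_bytes: used, stored_size_bytes: stored.len() })
}
```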
16. ML-KEM-768 P2P handshake
For P2P session establishment, I implemented an ML-KEM-768 handshake using fips203::ml_kem_768.
The suite is identified as:
suite_id: 0x0301
suite_name: "ML-KEM-768/FIPS203-0.4.3"
The P2P PQ constants include:
shared secret length: 32 bytes
maximum PQ wire payload: 16 KiB
nonce length: 32 bytes
default replay filter capacity: 4096
minimum replay filter capacity: 16
maximum replay filter capacity: 65,536
default message age window: 120 seconds
maximum tolerated future clock skew: 10 seconds
hard maximum configured message age: 10 minutes
The offer message is:
PqKemOffer {
suite_id: u16,
created_at_unix_secs: u64,
nonce: Vec<u8>,
ek: Vec<u8>,
}
The accept message is:
PqKemAccept {
suite_id: u16,
offer_nonce: Vec<u8>,
created_at_unix_secs: u64,
ct: Vec<u8>,
}
Offer validation checks:
suite id matches expected suite
timestamp is fresh
timestamp is not too far in the future
nonce length == 32
nonce is not all zero
encapsulation key length == EK_LEN
encapsulation key is not all zero
encapsulation key parses as ML-KEM-768 encapsulation key
Accept validation checks:
suite id matches expected suite
timestamp is fresh
timestamp is not too far in the future
offer_nonce length == 32
offer_nonce equals the expected original offer nonce
ciphertext length == CT_LEN
ciphertext is not all zero
ciphertext parses as ML-KEM-768 ciphertext
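The shape and freshness checks for an offer can be sketched with std only. EK_LEN = 1184 is the ML-KEM-768 encapsulation-key size from FIPS 203. The function names are hypothetical, and the final step (parsing `ek` with fips203) is omitted. The accept message follows the same pattern, using CT_LEN = 1088 and an equality check against the stored offer nonce.

```rust
pub const SUITE_ID: u16 = 0x0301;
pub const NONCE_LEN: usize = 32;
pub const EK_LEN: usize = 1184; // ML-KEM-768 encapsulation key (FIPS 203)
pub const MAX_AGE_SECS: u64 = 120;
pub const MAX_FUTURE_SKEW_SECS: u64 = 10;

pub struct PqKemOffer {
    pub suite_id: u16,
    pub created_at_unix_secs: u64,
    pub nonce: Vec<u8>,
    pub ek: Vec<u8>,
}

/// Rejects messages from too far in the future or older than the age window.
pub fn check_freshness(created: u64, now: u64) -> Result<(), &'static str> {
    if created > now.saturating_add(MAX_FUTURE_SKEW_SECS) {
        return Err("too far in the future");
    }
    if now.saturating_sub(created) > MAX_AGE_SECS {
        return Err("stale");
    }
    Ok(())
}

pub fn check_offer_shape(offer: &PqKemOffer, now: u64) -> Result<(), &'static str> {
    if offer.suite_id != SUITE_ID {
        return Err("unexpected suite");
    }
    check_freshness(offer.created_at_unix_secs, now)?;
    if offer.nonce.len() != NONCE_LEN || offer.nonce.iter().all(|&b| b == 0) {
        return Err("bad nonce");
    }
    if offer.ek.len() != EK_LEN || offer.ek.iter().all(|&b| b == 0) {
        return Err("bad encapsulation key");
    }
    // Parsing `ek` as an ML-KEM-768 key would follow here (fips203, omitted).
    Ok(())
}
```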
The replay filter is:
HashSet<[u8; 32]> for seen nonces
VecDeque<[u8; 32]> for eviction order
capacity clamped to [16, 65,536]
duplicate nonce => ReplayDetected
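The replay filter described above maps directly onto std collections. Here is a sketch with hypothetical names:

```rust
use std::collections::{HashSet, VecDeque};

pub const MIN_CAPACITY: usize = 16;
pub const MAX_CAPACITY: usize = 65_536;

#[derive(Debug, PartialEq, Eq)]
pub struct ReplayDetected;

pub struct ReplayFilter {
    seen: HashSet<[u8; 32]>,   // fast membership test
    order: VecDeque<[u8; 32]>, // insertion order for FIFO eviction
    capacity: usize,
}

impl ReplayFilter {
    pub fn new(requested: usize) -> Self {
        let capacity = requested.clamp(MIN_CAPACITY, MAX_CAPACITY);
        ReplayFilter {
            seen: HashSet::with_capacity(capacity),
            order: VecDeque::with_capacity(capacity),
            capacity,
        }
    }

    pub fn capacity(&self) -> usize {
        self.capacity
    }

    /// Rejects a nonce already in the window; otherwise records it,
    /// evicting the oldest nonce when the filter is full.
    pub fn check_and_insert(&mut self, nonce: [u8; 32]) -> Result<(), ReplayDetected> {
        if self.seen.contains(&nonce) {
            return Err(ReplayDetected);
        }
        if self.order.len() == self.capacity {
            if let Some(oldest) = self.order.pop_front() {
                self.seen.remove(&oldest);
            }
        }
        self.seen.insert(nonce);
        self.order.push_back(nonce);
        Ok(())
    }
}
```

After a nonce is evicted, the filter would accept it again. The filter therefore relies on the 120-second age window to reject old replays, and its capacity should exceed the number of offers a node expects to see within that window.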
After a session key is used to mark the peer PQ-ready, the code zeroizes the temporary session key.
This matters because ML-KEM by itself is key establishment, not identity. I treat it as one part of a broader P2P admission flow that also includes protocol version, genesis hash, peer admission, pending request tracking, and sync state.
17. P2P admission and sync after PQ readiness
The node does not immediately sync from a peer just because the peer connects.
The high-level flow is:
1. Peer connects.
2. Version handshake validates protocol version, services, user agent shape, chain height, and genesis hash.
3. Connection/admission guards run.
4. PQ ML-KEM handshake runs.
5. Peer is marked PQ-ready.
6. Queued sync target resumes only after PQ readiness.
If the peer fails a protocol rule, the node:
reports misbehavior
cleans pending requests for that peer
clears PQ peer state
updates peerbook failure state
disconnects the peer
The implementation also checks response/request peer mismatch. If a response arrives from a peer different from the peer associated with the request ID, the node treats it as a protocol violation.
18. P2P memory/resource guardrails
This is an area I had to address directly because live P2P nodes can slowly leak memory through queues, pending request maps, retry tables, and peer side tables.
The sync builder defines caps such as:
MAX_PENDING_VERSIONS = 512
MAX_PENDING_PQ = 512
MAX_PENDING_BLOCKS = 512
MAX_PENDING_BATCHES = 512
MAX_BLOCK_QUEUE = 1024
MAX_BATCH_QUEUE = 1024
MAX_HEIGHT_POLL_PEERS = 128
MAX_AUTODIAL_PEERS_PER_TICK = 32
MAX_AUTODIAL_ADDRS_PER_PEER = 3
MAX_MULTIADDR_BYTES = 256
MAX_PEERBOOK_KAD_SEED_PEERS = 256
MAX_KAD_ADDRS_PER_PEER = 8
MAX_TRACKED_DIAL_ATTEMPTS = 4096
MAX_RUNTIME_PEER_SIDE_TABLES = 8192
There is housekeeping to:
trim block_queue to MAX_BLOCK_QUEUE
trim batch_queue to MAX_BATCH_QUEUE
prune old dial-attempt timestamps
clear advisory dial tracking if still too large
clear pq_ready_peers if the side table exceeds cap
clear admitted_peers if the side table exceeds cap
clear peer_ip if the side table exceeds cap
The PQ state cleanup also removes:
pq_initiators[peer]
pq_ready_peers[peer]
pending_pq entries for that peer
This directly addresses a common live-node issue: stale handshake state and pending request maps can grow forever if failed peers are not cleaned up.
19. Account-state memory growth and compact snapshots
I also addressed memory/state growth in the account model.
Originally, the account-tree structure could retain too much historical block data in memory and serialize too much historical state. The current design separates compact chain state from historical blocks.
The in-memory InnerTree tracks:
balances
tip_height
tip_hash
prev_tip_hash
has_tip
recent blocks cache
total_issued_micro
rewards_issued_micro
reserved_issued_micro
The compact persisted AccountStateSnapshot tracks:
version
balances
tip_height
tip_hash
prev_tip_hash
has_tip
total_issued_micro
rewards_issued_micro
reserved_issued_micro
The important part:
blocks are skipped during serialization/deserialization in the current InnerTree
compact snapshots do not persist the full block history
legacy states that contain block history are migrated
legacy block history is drained down to the bounded recent cache
The relevant caps are:
MAX_RECENT_BLOCKS_IN_RAM = 512
MAX_PENDING_BLOCKS = 4096
MAX_PENDING_BLOCK_DISTANCE = 1024
MAX_ACCOUNT_STATE_SNAPSHOT_BYTES = 512 MiB
So instead of allowing the account model to keep every canonical block forever in RAM, the node keeps a bounded recent-block cache and relies on RocksDB for historical reads.
This is not about Rust memory unsafety. It is about preventing unbounded logical memory retention in a long-running node.
20. Recent block cache
When a block is remembered in state:
update prev_tip_hash
update tip_height
update tip_hash
set has_tip = true
push block into recent block cache
if recent block cache > 512, drain oldest blocks
The compact-state invariant checker rejects:
recent block cache len > 512
non-contiguous recent block cache
recent block previous_hash linkage failure
recent cache tip mismatch
balance sum overflow
balance sum > MAX_SUPPLY
total_issued_micro > MAX_SUPPLY
rewards_issued_micro > MAX_REWARD_SUPPLY
So the bounded cache is not only trimmed; it is also checked for structural consistency.
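Below is a sketch of the bounded cache and the invariant checks, using simplified types (the real block type carries far more). The names are hypothetical. MAX_SUPPLY and MAX_REWARD_SUPPLY are passed in as parameters because their values are not given here.

```rust
use std::collections::VecDeque;

pub const MAX_RECENT_BLOCKS_IN_RAM: usize = 512;

#[derive(Clone, Debug)]
pub struct RecentBlock {
    pub height: u64,
    pub hash: [u8; 64],
    pub previous_hash: [u8; 64],
}

pub struct TipState {
    pub recent: VecDeque<RecentBlock>,
    pub tip_height: u64,
    pub tip_hash: [u8; 64],
    pub prev_tip_hash: [u8; 64],
    pub has_tip: bool,
}

impl TipState {
    /// Updates the tip fields, then pushes the block and drains the oldest entries.
    pub fn remember(&mut self, block: RecentBlock) {
        self.prev_tip_hash = self.tip_hash;
        self.tip_height = block.height;
        self.tip_hash = block.hash;
        self.has_tip = true;
        self.recent.push_back(block);
        while self.recent.len() > MAX_RECENT_BLOCKS_IN_RAM {
            self.recent.pop_front();
        }
    }

    pub fn check_invariants(&self) -> Result<(), &'static str> {
        if self.recent.len() > MAX_RECENT_BLOCKS_IN_RAM {
            return Err("recent block cache over cap");
        }
        for (prev, next) in self.recent.iter().zip(self.recent.iter().skip(1)) {
            if prev.height.checked_add(1) != Some(next.height) {
                return Err("non-contiguous recent block cache");
            }
            if next.previous_hash != prev.hash {
                return Err("recent block previous_hash linkage failure");
            }
        }
        if let Some(last) = self.recent.back() {
            if !self.has_tip || last.height != self.tip_height || last.hash != self.tip_hash {
                return Err("recent cache tip mismatch");
            }
        }
        Ok(())
    }
}

/// Supply invariants: checked balance sum and issuance caps. Returns the sum.
pub fn check_supply(balances: &[u64], total_issued: u64, rewards_issued: u64, max_supply: u64, max_reward_supply: u64) -> Result<u64, &'static str> {
    let sum = balances.iter().try_fold(0u64, |acc, &b| acc.checked_add(b)).ok_or("balance sum overflow")?;
    if sum > max_supply {
        return Err("balance sum exceeds MAX_SUPPLY");
    }
    if total_issued > max_supply {
        return Err("total_issued_micro exceeds MAX_SUPPLY");
    }
    if rewards_issued > max_reward_supply {
        return Err("rewards_issued_micro exceeds MAX_REWARD_SUPPLY");
    }
    Ok(sum)
}
```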
21. RSS threshold logging
The account-state code has RSS threshold guards for long-running node visibility:
RSS warning threshold: 512 MB
RSS critical threshold: 1024 MB
RSS emergency threshold: 2048 MB
The current action is observability/logging rather than killing the node. The purpose is to detect growth trends in live operation:
[RESOURCE][MEMORY][RSS_GUARD]
severity=warning
severity=critical
severity=emergency
The resource log also reports:
block height
balances length
recent block cache length
recent block cache cap
pending block count
pending block cap
tip height
has_tip
state_serializes_blocks=false
This helped make memory growth visible while keeping consensus behavior deterministic.
22. Rollback-on-failure for state application
When applying a block, the account tree snapshots live state before making changes:
snapshot_inner = current inner state clone
snapshot_pending = current pending blocks clone
Then it:
1. Validates the block.
2. Checks idempotency.
3. Checks linkage.
4. Reads the batch.
5. Deserializes the batch.
6. Checks batch height.
7. Dry-runs the block and batch.
8. Applies NFT mint/transfer effects.
9. Commits compact state.
10. Flushes touched balances.
11. Verifies the account column family against state.
If the operation fails:
restore snapshot_inner
restore snapshot_pending
This is important for live nodes because partial application is dangerous. A failed block apply should not leave balances, pending blocks, or NFT side effects half-advanced.
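The snapshot/restore pattern can be written generically over any `Clone` state, for example the `(inner, pending)` pair. This sketch uses a hypothetical helper name. It restores in-memory state only. Writes already sent to RocksDB would need their own atomicity, such as a single write batch committed at the end.

```rust
/// Runs `apply` on `state`; if it fails, restores the pre-apply snapshot so
/// no partial changes survive. Only in-memory state is covered.
pub fn apply_with_rollback<S: Clone, E>(
    state: &mut S,
    apply: impl FnOnce(&mut S) -> Result<(), E>,
) -> Result<(), E> {
    let snapshot = state.clone();
    let result = apply(state);
    if result.is_err() {
        *state = snapshot;
    }
    result
}
```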
23. Canonical replay and pending block cleanup
During replay to a height, the implementation rebuilds state from canonical blocks/batches and then:
verifies compact state invariants
sets inner state to rebuilt state
retains only pending blocks above the replay height
That prevents old pending blocks at or below the canonical tip from remaining in memory after replay/reorg-style repair.
24. Amount parsing and overflow avoidance
The amount system uses micro-units:
1 Remzar = 100,000,000 micro-units
The UI float conversion is explicitly marked UI-only. Consensus construction uses string parsing.
The string parser rejects:
empty input
input longer than 64 characters
leading + or -
whitespace
scientific notation
multiple decimal points
fractional precision greater than 8 decimals
non-digit characters
u64 overflow
The arithmetic uses checked multiplication and checked addition when scaling.
This matters because blockchain amount parsing bugs are consensus bugs.
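Here is a std-only sketch of a parser with these rejection rules. The names are hypothetical. Requiring a digit on both sides of the decimal point ("0.5", not ".5" or "5.") is an assumption, because the post does not say how those forms are handled.

```rust
pub const MICRO_PER_REMZAR: u64 = 100_000_000;
pub const MAX_DECIMALS: usize = 8;
pub const MAX_INPUT_LEN: usize = 64;

pub fn parse_amount_micro(input: &str) -> Result<u64, &'static str> {
    if input.is_empty() {
        return Err("empty input");
    }
    if input.len() > MAX_INPUT_LEN {
        return Err("input longer than 64 characters");
    }
    // Only ASCII digits and '.' are accepted, which rejects +/- signs,
    // whitespace, exponent markers such as 'e', and any other character.
    if !input.bytes().all(|c| c.is_ascii_digit() || c == b'.') {
        return Err("invalid character");
    }
    let mut parts = input.split('.');
    let whole = parts.next().unwrap_or("");
    let frac = parts.next();
    if parts.next().is_some() {
        return Err("multiple decimal points");
    }
    // Assumption: a whole-number part is required ("0.5", not ".5").
    if whole.is_empty() {
        return Err("missing whole-number digits");
    }
    // Digits only at this point, so parse can fail only on u64 overflow.
    let whole_val: u64 = whole.parse().map_err(|_| "u64 overflow")?;
    let frac_micro: u64 = match frac {
        None => 0,
        Some("") => return Err("missing fractional digits"),
        Some(f) if f.len() > MAX_DECIMALS => return Err("more than 8 decimal places"),
        Some(f) => {
            // At most 8 digits, so the scaled value stays below 10^8.
            let v: u64 = f.parse().map_err(|_| "invalid fraction")?;
            v * 10u64.pow((MAX_DECIMALS - f.len()) as u32)
        }
    };
    whole_val
        .checked_mul(MICRO_PER_REMZAR)
        .and_then(|w| w.checked_add(frac_micro))
        .ok_or("u64 overflow")
}
```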
25. Supply invariants
The account-state invariant checker validates supply constraints:
sum(balances) must not overflow u64
sum(balances) <= MAX_SUPPLY
total_issued_micro <= MAX_SUPPLY
rewards_issued_micro <= MAX_REWARD_SUPPLY
That makes issuance accounting part of state validation, not only wallet/UI logic.
26. Fault-injection hooks
Many modules include runtime fault-injection hooks:
REMZAR_FAIL_<OPERATION>
Examples include:
keypair generation
keypair validation
Merkle computation
batch hashing
batch signing
batch verification
guardian signing
wallet validation
PQ operations
The point is to test failure paths intentionally. It is easy to write success-path crypto code; the harder part is making sure failure paths clean up state, reject safely, and do not leave stale pending data.
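Here is one way such a hook could look. The environment-variable lookup is injected so the check is testable. The value convention ("1" means fail) and the helper names are assumptions, not Remzar's documented behavior.

```rust
/// True when REMZAR_FAIL_<OPERATION> resolves to "1" through `lookup`.
/// The "1" convention is an assumption for this sketch.
pub fn fault_injected_with(operation: &str, lookup: impl Fn(&str) -> Option<String>) -> bool {
    let key = format!("REMZAR_FAIL_{}", operation.to_ascii_uppercase());
    matches!(lookup(key.as_str()).as_deref(), Some("1"))
}

/// Same check against the real process environment.
pub fn fault_injected(operation: &str) -> bool {
    fault_injected_with(operation, |k| std::env::var(k).ok())
}
```

A failure-path test would set the variable, call the operation, and then assert that pending state was cleaned up rather than only that an error was returned.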
27. Panic containment
Several crypto and parsing operations are wrapped in catch_unwind.
The motivation is:
untrusted bytes should become validation errors
untrusted bytes should not become process-level panics
malformed crypto material should not crash the node
malformed verification material should not bypass cleanup
This is used around:
ML-DSA secret-key parsing
ML-DSA public-key derivation
ML-DSA wallet public-key parsing
batch hashing
batch verification
ML-KEM key/ciphertext parsing
Panic containment is not a substitute for correct libraries, but in a live node it is a useful defensive layer around untrusted input.
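Below is a minimal sketch of such a containment wrapper, using a hypothetical helper name. Two caveats apply to any wrapper like this. First, `catch_unwind` only catches unwinding panics, so it has no effect if the crate is built with `panic = "abort"`. Second, the default panic hook still prints the panic message, which is the noise the panic-hook lock described above is meant to suppress.

```rust
use std::panic::{self, AssertUnwindSafe};

/// Runs a parser over untrusted input. `None` and panics both become errors,
/// so malformed bytes surface as validation failures, not process crashes.
pub fn contained<T>(label: &'static str, f: impl FnOnce() -> Option<T>) -> Result<T, String> {
    match panic::catch_unwind(AssertUnwindSafe(f)) {
        Ok(Some(value)) => Ok(value),
        Ok(None) => Err(format!("{}: rejected", label)),
        Err(_) => Err(format!("{}: panicked, treated as invalid input", label)),
    }
}
```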
28. Why I used Merkle-root signing instead of per-transaction guardian signatures
The main tradeoff is:
Per-transaction guardian signature:
proof is local to each transaction
but overhead = n * 3309 bytes
Merkle-root guardian signature:
one signature commits to the full canonical batch
overhead = 3309 bytes
individual inclusion can be proven with Merkle proofs
transaction semantics still validated separately
In my case, the Merkle-root model was the practical solution because the guardian/block layer needs to commit to the batch, not necessarily duplicate the same signature object per transaction.
This is similar to the general blockchain idea that consensus signs/commits to blocks, while transactions have their own validity rules.
29. What I think I solved
I would describe the implementation results this way:
I did not solve post-quantum signature size by making ML-DSA smaller.
I solved the block/batch-level overhead problem by changing where the ML-DSA signature is placed.
I did not rely on one ML-DSA-65 signature per transaction at the guardian/block layer.
I use one ML-DSA-65 signature over a 64-byte Merkle commitment for the full batch/block.
I did not rely on unbounded state structures.
I added caps, compaction, recent-cache limits, replay cleanup, queue trimming, bounded pending maps, and RSS visibility.
I did not treat ML-KEM as identity.
I used ML-KEM-768 as a PQ shared-secret establishment layer inside a broader version/genesis/admission/sync flow.
I did not trust serialized sizes by estimation only.
I check the actual bytes that will be stored or transmitted.
I did not assume malformed crypto input is harmless.
I added exact length checks, parse checks, panic containment, all-zero checks, all-0xFF checks, validation budgets, and failure cleanup.
I did not treat the wallet address as a raw public key.
I bound a compact 64-byte BLAKE3-XOF commitment to the ML-DSA-65 public key.
I did not keep all historical blocks in account-state memory.
I moved to compact persisted state and bounded recent-block caching.
I did not treat TPS as a signature-only benchmark.
I measured hashing, transaction construction, serialization, Merkle aggregation, state application, block encode/decode, ML-DSA signing, ML-DSA verification, wallet signing, and wallet generation separately.
The overall result is that the post-quantum signature layer is no longer the linear per-transaction bottleneck in the block/guardian path. Signature work becomes fixed per block:
traditional guardian-per-transaction model:
signature_work = O(N)
signature_bytes = N * 3309
Remzar batch-signed model:
signature_work = O(1) per block
signature_bytes = 3309 per block
per_transaction_signature_cost = 3309 / N
For example, at 10,000 transactions per block:
naive guardian-per-transaction ML-DSA-65 signature bytes:
10,000 * 3309 = 33,090,000 bytes
Remzar batch-root ML-DSA-65 signature bytes:
3309 bytes
signature bytes avoided at guardian/block layer:
33,090,000 - 3309 = 33,086,691 bytes
So I did not reduce ML-DSA-65 itself. I changed the blockchain placement of the signature so one post-quantum signature commits to the full canonical transaction batch through a 64-byte Merkle root.
30. Conservative TPS and block-capacity results
I also ran a local TPS benchmark suite to measure the pipeline pieces separately.
Important qualification: these are conservative local benchmark results from the Rust test harness. They were not produced as a release-optimized (cargo build --release) network benchmark, and they were not measured across a full distributed public network with real propagation, mempool contention, RocksDB write pressure, peer churn, and production hardware variation. I therefore treat these numbers as conservative CPU-side measurements, not as end-to-end network throughput.
The block model tested is the actual batch-signing model:
1. Build transactions.
2. Serialize transactions.
3. Hash transactions into 64-byte transaction IDs.
4. Compute a 64-byte Merkle root over the transaction IDs.
5. Sign the Merkle root once with ML-DSA-65.
6. Verify the block with one ML-DSA-65 verification.
The consequence is:
signature cost per block = O(1)
signature cost per transaction = O(1/N)
So ML-DSA-65 signing is not measured as one signature per transaction. TPS is governed by the full transaction pipeline: build, serialize, hash, Merkle aggregation, block encode/decode, state application, storage, networking, and consensus settings.
Hashing and core primitives
The local benchmark measured:
Raw BLAKE3-XOF(64): 1,890,688 ops/sec
~56,720,640 ops / 30s
Data hash via postcard: 189,876 ops/sec
~5,696,280 ops / 30s
Batch hash structs: 831,031 ops/sec
~24,930,930 ops / 30s
Header hash: 669,052 ops/sec
~20,071,560 ops / 30s
Truncated hash: 379,217 ops/sec
~11,376,510 ops / 30s
This shows that the 64-byte hash/commitment layer has large headroom compared with the conservative public TPS target.
Merkle and block-root pipeline
The active Merkle consensus cap in the test was:
requested: 200,000
measured: 50,000
cap: 50,000
The important safety behavior is that above-cap Merkle input is rejected. It does not hang, and it does not panic.
Measured Merkle results:
Merkle root 64B:
50,000 txids in 0.065s
770,131 tx/sec
~23,103,930 tx / 30s
Merkle sweep:
50,000 txids in 0.026s
1,921,333 tx/sec
~57,639,990 tx / 30s
This means Merkle aggregation is not the limiting factor for a practical 30-second block target.
Transaction pipeline
The transaction pipeline measured:
Tx Build + Serialize:
131,595 tx/sec
~3,947,850 tx / 30s
Tx ID Hash hex:
345,387 tx/sec
~10,361,610 tx / 30s
Build Tx:
376,319 tx/sec
~11,289,570 tx / 30s
Serialize:
237,670 tx/sec
~7,130,100 tx / 30s
Tx Hash 64B:
1,827,198 tx/sec
~54,815,940 tx / 30s
State Apply:
614,052 tx/sec
~18,421,560 tx / 30s
These are local implementation measurements, but they show that the basic CPU-side transaction construction, serialization, hashing, and state-apply paths are far above a conservative 250+ TPS public target.
Block encode/decode
The block encode/decode test used 10,000 transaction IDs per block.
Block Encode:
77 blocks/sec
10,000 txids/block
~770,000 txids/sec
~23,100,000 txids / 30s
Block Decode:
19 blocks/sec
10,000 txids/block
~190,000 txids/sec
~5,700,000 txids / 30s
The encoded block-like object in the test was:
10,000 txids/block
640,131 bytes/block
That means the measured encode/decode path has substantial headroom relative to a practical 30-second block interval.
Batch-signed block assembly model
The most important benchmark is the batch-signed block assembly model, because it matches the design: many transactions, one Merkle root, one ML-DSA-65 block signature.
Measured with 10,000 transactions per block:
Block tx serialize + hash:
10,000 txs in 0.059s
168,870 tx/sec
~5,066,100 tx / 30s
Block Merkle root:
10,000 txids in 0.006s
1,683,162 tx/sec
~50,494,860 tx / 30s
One ML-DSA-65 block signature:
1 sign in 0.007s
One ML-DSA-65 block verification:
1 verify in 0.005s
Effective block assembly before fixed one-signature cost:
10,000 txs / 0.065s
153,472 tx/sec
~4,604,160 tx / 30s
This is the key result:
The local block assembly path is far above the conservative public TPS target.
The ML-DSA-65 block signature is a fixed per-block cost.
The signature cost does not scale linearly with transaction count.
Signature layer
The signature layer was measured separately.
The required signing rate for the block model is:
1 ML-DSA-65 block signature per 30-second block
Measured primitive path:
ML-DSA-65 sign microbench: 45 signatures/sec (~1,350 signatures / 30s)
ML-DSA-65 verify microbench: 136 verifications/sec (~4,080 verifications / 30s)
Measured wallet path:
Wallet sign path: ~0.50 signatures/sec (~15 signatures / 30s)
Wallet verify path: 139 verifications/sec (~4,170 verifications / 30s)
The wallet signing path is slower because each wallet signature also includes decrypting and validating the stored wallet secret, which is protected with Argon2id + AES-256-GCM. That is expected, and it is a different path from the raw keypair primitive.
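Comparing the raw primitive path with the wallet path is most meaningful when both go through the same timing harness. A minimal std-only sketch; `ops_per_sec` is a hypothetical helper, and the closure passed in stands for whichever sign or verify call is being measured (the ML-DSA-65 calls themselves are not shown):

```rust
use std::hint::black_box;
use std::time::Instant;

/// Runs `op` `warmup` times untimed, then `iters` times timed; returns operations/sec.
fn ops_per_sec<R, F: FnMut() -> R>(warmup: u32, iters: u32, mut op: F) -> f64 {
    for _ in 0..warmup {
        black_box(op());
    }
    let start = Instant::now();
    for _ in 0..iters {
        black_box(op()); // keep the result observable so the call is not optimized away
    }
    // Guard against a zero reading on coarse timers.
    let secs = start.elapsed().as_secs_f64().max(1e-9);
    iters as f64 / secs
}
```

Passing the keypair-level sign call in one run and the full wallet sign call (including secret decryption) in another keeps the 45 vs ~0.50 signatures/sec comparison like-for-like.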
The important point is:
Required block signing rate:
1 signature / 30s
Measured raw ML-DSA-65 signing capacity:
~1,350 signatures / 30s
Measured wallet signing path:
~15 signatures / 30s
Both are above the required block-signing rate for a one-signature-per-block model. Therefore, ML-DSA-65 signing is not the active TPS bottleneck in the batch-signed block model.
Conservative public target
The conservative public target I would state is:
~250+ TPS sustained target
30-second block interval
~2 MB practical block target
one ML-DSA-65 post-quantum signature per block
batch-signed Merkle-root block verification
A 30-second block window gives:
7,500 tx / 30s = 250 TPS
8,000 tx / 30s ≈ 267 TPS
10,000 tx / 30s ≈ 333 TPS
15,000 tx / 30s = 500 TPS
So depending on configured block size, transaction payload size, mempool policy, network propagation, RocksDB write path, and consensus parameters, the practical target range can be discussed as:
conservative public target: ~250+ TPS
higher configured target: ~333–500 TPS
I would still describe this carefully. The local CPU-side benchmark capacity is much higher, but production TPS should be determined by configured block size, block interval, mempool rules, RocksDB write path, network propagation, validation policy, consensus settings, and release hardware.
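The TPS figures above are transactions per block divided by the block interval, and the transactions-per-block ceiling follows from the block byte budget divided by the average serialized transaction size. A sketch of both; the average transaction size is an assumed input because it is not stated here, and "2 MB" is taken as 2,000,000 bytes:

```rust
/// Sustained TPS for a given number of transactions per block and block interval.
fn tps(tx_per_block: u64, block_interval_secs: u64) -> f64 {
    tx_per_block as f64 / block_interval_secs as f64
}

/// Maximum transactions that fit in a block byte budget at an assumed average tx size.
fn tx_capacity(block_budget_bytes: u64, avg_tx_bytes: u64) -> u64 {
    block_budget_bytes / avg_tx_bytes
}
```

For example, if the average serialized transaction were 250 bytes (an assumed figure), a 2,000,000-byte block would hold 8,000 transactions, or about 267 TPS at a 30-second interval.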
Why the TPS result matters for PQC
This matters because the post-quantum signature layer is usually assumed to be the bottleneck.
In a per-transaction signature model:
N transactions require N signature verifications.
Signature cost grows linearly with transaction count.
In Remzar’s batch-signed model:
N transactions are committed into one Merkle root.
The Merkle root is signed once.
The block requires one ML-DSA-65 verification.
Transaction inclusion is protected by the Merkle commitment.
Transaction validity is handled by the validation layers.
So for the guardian/block layer:
signature work per block = 1
signature verification per block = 1
signature bytes per block = 3309
That is the practical reason the system can target hundreds of TPS while still using ML-DSA-65 at the block/guardian layer.
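Since inclusion is protected by the Merkle commitment, a party holding a transaction ID's leaf hash, its proof, and the signed root can check inclusion without the rest of the block. A sketch of bounded-depth proof verification. The proof shape (sibling hash plus a left/right flag per level), `MAX_PROOF_DEPTH`, and the pluggable `node_hash` closure are assumptions; in the chain itself the node hash would be BLAKE3-XOF(64):

```rust
const MAX_PROOF_DEPTH: usize = 32; // hypothetical cap; 2^32 leaves far exceeds any block

/// One proof step: the sibling hash and whether it sits left of the running hash.
struct ProofStep {
    sibling: [u8; 64],
    sibling_is_left: bool,
}

/// Recomputes the root from a leaf hash and its proof. Over-long proofs are
/// rejected before any hashing, bounding the work an untrusted peer can force.
fn verify_inclusion<H>(leaf: [u8; 64], proof: &[ProofStep], root: &[u8; 64], node_hash: H) -> bool
where
    H: Fn(&[u8; 64], &[u8; 64]) -> [u8; 64],
{
    if proof.len() > MAX_PROOF_DEPTH {
        return false;
    }
    let mut acc = leaf;
    for step in proof {
        acc = if step.sibling_is_left {
            node_hash(&step.sibling, &acc)
        } else {
            node_hash(&acc, &step.sibling)
        };
    }
    acc == *root
}
```

Taking the node hash as a parameter keeps the depth and ordering logic testable independently of the hash function.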
31. Open question for the PQC community
The main area where I would appreciate technical feedback is long-running memory/resource behavior in a live PQC blockchain node.
I have been actively hunting down possible memory leaks and unbounded retention paths in the Remzar codebase. I believe I have addressed the major issues by adding bounded queues, capped pending request maps, compact account-state snapshots, bounded recent-block caching, replay cleanup, PQ peer-state cleanup, serialized-size limits, and RSS threshold visibility. However, I do not want to overclaim that every possible leak or retention edge case is solved.
My question to the community is:
For a live Rust blockchain node using ML-DSA-65, ML-KEM-768, Merkle batching, RocksDB storage, P2P sync, and long-running validator/miner processes, what memory-retention or resource-exhaustion patterns should I be most careful to audit further?
Areas I am especially interested in reviewing are:
stale P2P pending requests
stale PQ handshake/session state
peer side-table growth
mempool growth
block/batch sync queues
Merkle proof allocation bounds
RocksDB read/write buffering behavior
account-state snapshot size
recent-block cache retention
wallet/key material lifetime
panic/failure paths that may skip cleanup
reorg/replay paths that may leave stale state
long-running RSS growth under peer churn
I would welcome feedback on practical audit strategies, Rust tooling, benchmark patterns, or design changes that the PQC/blockchain community would recommend for proving that the node remains memory-bounded over long runtimes.
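For the first few items (stale pending requests, peer side-table growth), one common pattern is a table with both a hard capacity and a time-to-live, swept on every insert so that cleanup does not depend on a success path running. A std-only sketch; `PendingTable`, the `u64` request IDs, and any capacity/TTL values are hypothetical and not taken from the Remzar code:

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// Pending-request table with a hard entry cap and per-entry expiry.
struct PendingTable<V> {
    entries: HashMap<u64, (Instant, V)>,
    cap: usize,
    ttl: Duration,
}

impl<V> PendingTable<V> {
    fn new(cap: usize, ttl: Duration) -> Self {
        Self { entries: HashMap::new(), cap, ttl }
    }

    /// Drops every entry whose age at `now` is at least the TTL.
    fn sweep(&mut self, now: Instant) {
        let ttl = self.ttl;
        self.entries.retain(|_, (t, _)| now.saturating_duration_since(*t) < ttl);
    }

    /// Inserts or refreshes a request. Returns false, storing nothing, if the
    /// table is still full after expired entries are removed.
    fn insert(&mut self, id: u64, value: V, now: Instant) -> bool {
        self.sweep(now);
        if self.entries.len() >= self.cap && !self.entries.contains_key(&id) {
            return false;
        }
        self.entries.insert(id, (now, value));
        true
    }

    /// Removes and returns a pending request when its response arrives.
    fn complete(&mut self, id: u64) -> Option<V> {
        self.entries.remove(&id).map(|(_, v)| v)
    }

    fn len(&self) -> usize {
        self.entries.len()
    }
}
```

Because the cap bounds the map, the O(n) sweep per insert stays cheap, and a peer that never answers can hold an entry for at most one TTL. Exporting `len()` alongside the RSS threshold logging gives a direct signal if a table stops draining under peer churn.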
32. Summary
Remzar is now a live Rust Layer-1 blockchain node chain with post-quantum components integrated across the stack.
The implementation uses:
ML-DSA-65 for wallet/keypair/guardian/batch/block signatures
ML-KEM-768 for P2P post-quantum key establishment
BLAKE3-XOF(64) for 64-byte canonical hashes and commitments
Merkle-root batch signing to avoid repeated 3309-byte guardian signatures
Argon2id + AES-256-GCM for encrypted ML-DSA-65 secret storage
strict wallet/public-key/address binding
bounded Merkle proof sizes and proof depth
bounded batch sizes and serialized storage sizes
bounded P2P pending requests and sync queues
bounded account-state recent block cache
compact account-state snapshots instead of serializing all block history
RSS threshold logging for live memory observability
rollback-on-failure for block application
version/genesis/PQ readiness gates before sync
replay protection for PQ nonces
panic containment and exact byte-length checks around cryptographic parsing
conservative local TPS benchmarking across hashing, Merkle aggregation, block assembly, state apply, encoding/decoding, and signature paths
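The replay protection for PQ nonces listed above must itself be bounded, or the replay cache becomes another unbounded table. A sketch of a fixed-capacity seen-nonce cache with oldest-first eviction. The 32-byte nonce type and FIFO eviction are assumptions. Eviction alone would let a nonce be replayed once it falls out of the cache, so this assumes the handshake also rejects messages outside a timestamp window:

```rust
use std::collections::{HashSet, VecDeque};

/// Fixed-capacity set of recently seen nonces with FIFO eviction.
struct ReplayCache {
    seen: HashSet<[u8; 32]>,
    order: VecDeque<[u8; 32]>,
    cap: usize,
}

impl ReplayCache {
    fn new(cap: usize) -> Self {
        assert!(cap > 0, "capacity must be non-zero");
        Self { seen: HashSet::with_capacity(cap), order: VecDeque::with_capacity(cap), cap }
    }

    /// Returns true and records the nonce if it is fresh; false if it is a replay.
    fn check_and_insert(&mut self, nonce: [u8; 32]) -> bool {
        if self.seen.contains(&nonce) {
            return false;
        }
        if self.order.len() >= self.cap {
            if let Some(oldest) = self.order.pop_front() {
                self.seen.remove(&oldest);
            }
        }
        self.seen.insert(nonce);
        self.order.push_back(nonce);
        true
    }

    fn len(&self) -> usize {
        self.order.len()
    }
}
```

The capacity should cover at least the number of nonces a peer population can present within the timestamp window; otherwise a replay can slip through after eviction.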
The key TPS finding is:
Remzar does not require one ML-DSA-65 signature per transaction at the block/guardian layer.
Remzar signs one 64-byte Merkle root per block/batch.
The measured local block assembly model reached ~153,472 tx/sec before the fixed one-signature block cost.
That equals ~4,604,160 tx / 30s of local CPU-side assembly capacity.
The conservative public target remains ~250+ TPS, with higher configured targets depending on block size, transaction size, network propagation, RocksDB write path, and release hardware.
The signature-layer finding is:
Required block signing rate:
1 ML-DSA-65 signature per 30 seconds
Measured ML-DSA-65 primitive signing:
~45 signatures/sec
~1,350 signatures / 30s
Measured ML-DSA-65 primitive verification:
~136 verifications/sec
~4,080 verifications / 30s
Measured wallet signing path:
~0.50 signatures/sec
~15 signatures / 30s
Measured wallet verification path:
~139 verifications/sec
~4,170 verifications / 30s
That means the ML-DSA-65 signature layer has enough headroom for the one-signature-per-block model. In this design, the active throughput constraints move to practical blockchain engineering: block size, mempool policy, storage, network propagation, validation rules, consensus settings, and release hardware.
My practical takeaway is that FIPS 203/204 integration into blockchain software is possible, but the hard part is not just calling ML-DSA or ML-KEM. The hard part is making the whole node survive the operational consequences of larger keys, larger signatures, larger wire objects, untrusted peers, sync churn, reboot repair, storage migration, long-running memory growth, and honest TPS reporting.
I am sharing this as an implementation data point and would welcome technical review from the PQC community.
Project links:
https://www.remzar.com/
https://github.com/remzarchain/remzar