BIP-? : Free seed mnemonics for steganography and attack-resistance

116 views
Skip to first unread message

Aneesh Karve

unread,
Jun 8, 2024, 1:55:49 PMJun 8
to Bitcoin Development Mailing List
By adding a BIP-39 language called "Free" we can have nice things from PBKDF2(), including offline and steganographic seeds, in a way that's fully compatible with BIP-39. For instance users can store strong seeds in an easy-to-conceal deck of cards.

Proposal inline below, with link to a reference implementation in Python. Online spec here https://github.com/akarve/bip-keychain/blob/main/pre-bips/steganographic-seeds.md.

I could use your help on a proper seed complexity measure.

Let me know your thoughts.

# Free seed mnemonics for steganography and attack-resistance

> "Anyone who considers arithmetical methods of producing  random digits is,  
> of course, in a state of sin." —John Von Neumann

# Abstract

Bitcoin seed mnemonics face attackers ranging from casual thieves to state actors.
We propose backward-compatible, non-breaking changes to BIP-39 master seed derivation
to unlock broader mnemonic options

from PBKDF2().
Users of this BIP can generate and store seeds offline with common physical objects
that are plausibly deniable: playing cards, chess boards, and paper napkins to name
a few. As a result seed mnemonics gain greater portability, memorability, entropy,
and steganography as compared to BIP-39 mnemonics.


# Motivation

BIP-39 mnemonic seed phrases have the following shortcomings:

1. BIP-39 seed words, if found by an attacker, are instantly recognizable
as a Bitcoin passphrase to be scrutinized or seized outright.

1. BIP-39 "mnemonics" are not particularly easy to remember.

1. Computing BIP-39's checksum bits necessitates a computer, making offline
seeds impossible.

1. Some hardware vendors support independent sources of entropy such as die rolls
but, unfortunately for the security, convenience, and trust of the users, vary
in how they convert user entropy to the proper binary seed.

    1. Users are required to run their own vendor-specific code to verify that the
    vendor has actually used their provided entropy. Said verification often _still_
    requires blind trust (how does the user know that entropy produced the right
    binary seed?) and is prohibitively technical for many users.

    1. It is cumbersome to manually enter the results of 100 six-sided die rolls,
    the minimum number of rolls to surpass 256 bits of entropy.
   
    1. Die rolls are poor for storing secrets since there are usually fewer dice
    than rolls and since dice are easily mixed up.
   
        > Compare the effort and portability of these 100 rolls
        > to the far easier and more portable shuffled deck of cards.

The above weaknesses cause all Bitcoin wallets to lose,
**due to no fundamental limitation whatsoever**,
the following benefits:

1. Portability in physical media.

1. Portability and memorability in the human brain.

1. Repudiation in high-risk situations where individuals or the Bitcoin protocol
are under attack.

1. The ability to generate, with no electronics, a cryptographically
strong seed that will later work with many hardware wallets.

1. Moore's-law-resistant mnemonics that encode far more than 256 bits of entropy.

    > Although more than 256 bits of entropy exceeds today's ECDSA private keys,
    > it rightfully leaves the door open for stronger keys in the future
    > and further permits today's mnemonics to be repurposed, without change,
    > for tomorrow's larger keys.

**As result, Bitcoin users seeking to evade oppressive governments and other attackers
have fewer defenses than the protocol naturally affords.**

Importantly, the above weaknesses can be remedied with
_little to no change to the BIP-39 seed phrase implementations_ already ubiquitous
in hardware and software wallets.


# Risks, remedies, and alternatives

## Risks

1. Giving users more sources and options for mnemonics increases the
risk that these users provide weak inputs that contain too little entropy or weak
entropy (e.g. common phrases).

    > Implementers must mitigate this risk with an easy-to-implement entropy
    > measure and message to the user.

1. BIP-39 includes checksum bits in the final word, offering some protection
against erroneous entry. The present proposal eliminates both the advantages (integrity)
and disadvantages (cannot be computed by hand) of the BIP-39 checksum bits
in favor of a much broader set of steganographic mnemonics that can be stored,
generated, and carried in situations of urgency and scarcity.

    > BIP-39 checksum validation _shall remain in place_ (unchanged) for BIP-39
    > mnemonics.

    > Advanced users might choose to implement their own checksums or error-correcting
    > codes.


# Specification

**The present spec is fully backwards-compatible with BIP-39 mnemonics**.
We introduce a new input "language" `Free` that admits a superset of BIP-39 mnemonics.
Wallets can and should continue to validate BIP-39 mnemonics as in the past.
`Free` should be treated as a new input language, similar to English, French, or
any of the BIP-39 languages.

`Free` should allow at a minimum the ASCII printable characters, minus capital
letters.


BIP-39 derives binary seeds by applying `PBKDF2()` to a mnemonic and passhprase
as follows:

```
PKBDF2(
    password=bin(nfkd(mnemonic)),
    salt=bin(nfkd("mnemonic" || passphrase)),
    hash_name="HMAC-SHA512",
    iterations=2048,
)
```

`nfkd()` converts the input string to Unicode normalization form KD.

Fortunately `PBKDF2()` does not limit the domain either the `password` or `salt`
argument. Existing implementations are therefore easy to update.
Applications must _not change_ passphrase entry, validation, normaliztion, or application.

Applications should regard the _mnemonic_ as a raw string to permit the widest
possible array of input characters. (See the following section for details.)

In the interest of backward compatibility we propose that existing BIP-39 implementations
treat `Free` as the tenth input "language" with the following differences from the 9
BIP-39 languages:

1. If they do not already, lower the case of all input letters.
Strangely BIP-39 is silent on the subject of case.

1. If they do not already, `strip` the `Free` mnemonic of surrounding whitespace
and split it on the regular expression `\s+`.

1. Apply `nfkd()` to the `Free` mnemonic.


```
if language == "Free":
    norm_mnemonic = lower(nfkd(split("\s+", mnemonic)))
    validate(norm_mnemonic)
    PKBDF2(
        password=bin(norm_mnemonic),
        salt=bin(nfkd("mnemonic" || passphrase))),
        hash_name="HMAC-SHA512",
        iterations=2048,
    )
```


The output of PKBDF2 is converted to a master seed as in BIP-39.


## `validate()`

`validate()` must at a minimum estimate the complexity `complexity()` of
the user proposed mnemonic and must refuse the mnemonic if the entropy is less
than a threshold (we recommend a threshold of 0.5).

Implementations must know the cardinality `C` of the mnemonic character set.
Applications must support at a bare minium an input cardinality of **74**
(the number of printable ASCII characters) but higher values for `C` are both
permissible and recommended.  As suggested below, the higher the cardinality of
the input set, the greater the steganographic potential.

    0123456789abcdefghijklmnopqrstuvwxyz!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~ \t\n\r\x0b\x0c

The Shannon Entropy `SE` of a string is as follows:

$$ se(X) := - \sum_{i} p(x_i) \log_2 p(x_i) $$

As an optimization for fixed `X` with all unique entries and cardinality `C`:

$$ SE(X) := log_2(C)$$

> Intuitively the above is the (fractional) number of bits needed to represent all
> characters in the `universe`.

The  relative entropy is simply the following:

$$ re(mnemonic\_tokens, universe) := SE(mnemonic) / SE(universe)$$

`universe` is the list of all possible input tokens. `mnemonic_tokens` is a tokenized
list of the inputs. Tokens may vary in length per the universe and application,
though applications can start withe one token per ASCII printable character.

`re()` ranges from 0 to 1 when `mnemonic_tokens` are all unique. An `re()` of 0.5
reflects that the user has provided enough information as providing one instance
of half of all input tokens.

`re()` alone is not a complete measure of password complexity since it does
not take order into account. For instance the string `"abc...xyz"` and its
reverse both have hight relative entropy but are highly predictable.

To correct for this we can use the Hamming Distance, `hd()`, which counts the
number of characters that are not in sorted order:

$$
\text{Hamming Distance} = \sum_{i=1}^{n} (s_i \neq t_i)
$$

Since undesirable order might be forwards of backwards we take the Relative
Absolute Hamming Distance `rahd()`:

```
rahd() := min(hd(norm_mnemonic), hd(norm_mnemonic.reverse())) / len(norm_mnemonic)
```

As with `re()`, `rahd()` ranges from 0 to 1.

```
validate := F(re(norm_mnemonic) + rahd(norm_mnemonic)) / 2
```

If `validate()` returns a complexity less than a given threshold (TBD) the wallet
should warn the user.

### TODO: examples of `validate()` for representative inputs

* [ ] Dice, cards, chess, bad + good text passwords
* [ ] Show how this leads to standard dice verification and fingerprinting
across all hardware vendors (phew)


# Reference implementation

* https://github.com/akarve/bipsea

## Example

```sh
bipsea validate -f free -m "$(cat input.txt)" | bipsea xprv
```


# Example steganographic mnemonics in `Free`


## Playing cards

A common deck of 52 cards encodes approximately 225 bits of entropy,
more entropy than 21 BIP-39 words. Such decks can be carried on one's person
without raising an eyebrow.

Users might enter cards in deck order as follows:

```
2S 10S kC 10H 5S ...
```


## Chess, three different ways


### Fictional move order

A chess board contains 64 totally ordered squares each of which can be addressed
in algebraic notation of the form `{a-h}{1-8}`. A move specifies one of five piece
types (`R, N, B, Q, K`) followed by a square.
`Nf1` is an example of a single knight move.
A series of 42 chess moves written on an easy-to-repudiate, easy-to-obfuscate,
piece of paper encodes at least as much entropy as 24 BIP-39 seed words.

$$ \log_2(69^{42}) \approx 256 $$

Ensuring that such moves comprise a valid chess game (and thus greater steganography)
is a hard problem and is neither required nor recommended in the context of this BIP.
It is not recommended since it constrains the potential entropy in unclear and
hard-to-reckon ways.


### PGN files

Nevertheless, high steganography in a chess game can be achieved with file formats
that support comments, such as the common PGN format. Observe the following snippet
from a PGN file and note the opportunities for arbitrary comments.


```
[Event "Third Rosenwald Trophy"]
[Site "New York, NY USA"]
[Date "1956.10.17"]
[EventDate "1956.10.07"]
[Round "8"]
[Result "0-1"]
[White "Donald Byrne"]
[Black "Robert James Fischer"]
[ECO "D92"]
[WhiteElo "?"]
[BlackElo "?"]
[PlyCount "82"]

1. Nf3 Nf6 2. c4 g6 3. Nc3 Bg7 4. d4 O-O 5. Bf4 d5 6. Qb3 dxc4
7. Qxc4 c6 8. e4 Nbd7 9. Rd1 Nb6 10. Qc5 Bg4 11. Bg5 {11. Be2
followed by 12. O-O would have been more prudent. The bishop
move played allows a sudden crescendo of tactical points to be
uncovered by Fischer. -- Wade} Na4 {!}
```


### Marked game boards

Alternatively a user might choose to subtly mark the 64 squares of two chess boards
to represent a 1 or 0 in each of 128 unique positions, storing 128 bits of entropy
(equivalent to 12 BIP-39 seed words). Random bits can be generated with a coin.


## Any board game

One can imagine steganographic secrets similar to chess for Monopoly, Go, or any
board game.


## Dice, but different

We noted above that if the user were to roll and then store 100 dice that it would
be impractical to retain the original order.
We observe that there are 21 small writing surfaces (the solid dots on each face)
on a six-sided die. If the user were to inscribe a single random digit into each dot
he would obtain approximately 70 bits of ordered entropy. Three such dice would be
easy to retain and order and provide greater entropy than 18 BIP-39 seed words.


## A paper napkin

In addition to a literal napkin sketch (with phone numbers, measurements,
or harmless notes) users without access to coins, dice, game boards or electronics
could generate "poor man's entropy" by dropping a stone onto a napkin divided into
equal-sized quadrants to generate entropy.

Said "poor man's entropy" is not recommended but nevertheless illustrates
the vast expansion in capability and steganography that obtains from this BIP.
Reply all
Reply to author
Forward
0 new messages