Some questions about schematic v9 format

Salvador E. Tropea


Jan 15, 2025, 8:03:43 AM

to KiCad Developers

Hi All!

Given the lack of documentation, I have some questions:

1) Why is the data for embedded files so different from the data for
images? Images are Base64 encoded and stored as (data STRING), with the
string split into chunks. Before KiCad 8 it wasn't a string, so it looked
like keywords; KiCad 8 fixed that. Now embedded files are stored as
(data |KEYWORDS|) ... Why the |? Why not strings? Can someone explain it?

2) The checksum seems to be an unusual one. Is it MurMur Hash 3 with
seed 0xABBA2345? I can't find a popular command-line tool to verify it.
I tried the "mmh3" Python module using `mmh3.hash128(c,
seed=0xABBA2345)` (with c being the bytes from the file, decoded and
decompressed) and couldn't reproduce the checksum. I guess this unusual
hash is fast, but is it worse? We have various robust and popular
hashes, so why this one?

BTW: I find it strange that after choosing to embed fonts the dialog
doesn't immediately show them. They are there when I save, but I think
they should appear before that.

Regards, SET

Seth Hillbrand


Jan 15, 2025, 1:06:47 PM

to dev...@kicad.org

The data for embedded files follows the SEXPR format (https://datatracker.ietf.org/doc/draft-rivest-sexp/). Base64 is supposed to be bracketed by the pipes. This allows third-party sexpr parsers to more easily handle our data format when we follow conventions. We did not do this for the images and that was an oversight. Eventually, images will be added to the embedded files format and the distinction will go away.
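
To make the two encodings concrete, here is a minimal Python sketch of a tolerant reader, assuming only parentheses, double-quoted strings, |pipe-bracketed| Base64 and bare atoms occur in the input. It keeps pipe-bracketed text as a `B64` string subclass so that image-style (data "chunk" "chunk") nodes and embedded-file-style (data |...|) nodes can be decoded by the same `data_bytes` function. `B64`, `parse` and `data_bytes` are hypothetical names, and the node names in the examples are illustrative, not taken from KiCad's documentation. For embedded files the decoded bytes are still Zstandard-compressed; decompressing them needs a third-party zstd library that this sketch does not use.

```python
import base64
import re

# Token alternatives tried at the current position (the caller skips whitespace).
# Assumption: no other token kinds appear in the input.
TOKEN_RE = re.compile(
    r'(?P<open>\()'
    r'|(?P<close>\))'
    r'|"(?P<string>(?:[^"\\]|\\.)*)"'
    r'|\|(?P<b64>[A-Za-z0-9+/=\s]*)\|'
    r'|(?P<atom>[^\s()"|]+)',
    re.DOTALL,
)


class B64(str):
    """Text that appeared between |pipes|, with internal whitespace removed."""


def parse(text):
    """Parse one S-expression into nested lists of str / B64 items."""
    stack = [[]]
    pos = 0
    while pos < len(text):
        if text[pos].isspace():
            pos += 1
            continue
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected input at offset {pos}: {text[pos]!r}")
        pos = m.end()
        if m.group("open"):
            stack.append([])
        elif m.group("close"):
            if len(stack) < 2:
                raise ValueError("unbalanced ')'")
            finished = stack.pop()
            stack[-1].append(finished)
        elif m.group("string") is not None:
            # Undo simple backslash escapes such as \" and \\.
            stack[-1].append(re.sub(r"\\(.)", r"\1", m.group("string"), flags=re.DOTALL))
        elif m.group("b64") is not None:
            stack[-1].append(B64("".join(m.group("b64").split())))
        else:
            stack[-1].append(m.group("atom"))
    if len(stack) != 1 or len(stack[0]) != 1:
        raise ValueError("expected exactly one complete top-level expression")
    return stack[0][0]


def data_bytes(node):
    """Concatenate the Base64 chunks of a (data ...) node and decode them."""
    if not isinstance(node, list) or not node or node[0] != "data":
        raise ValueError("not a (data ...) node")
    return base64.b64decode("".join(node[1:]), validate=True)


# Both styles decode to the same bytes.
image_style = parse('(data "aGVs" "bG8=")')
embedded_style = parse('(data |aGVs\n  bG8=|)')
assert data_bytes(image_style) == data_bytes(embedded_style) == b"hello"
```

Keeping the pipe form as a distinct type costs nothing in the reader, while quoted strings stay plain `str`; a generic reader that follows the Rivest draft is expected to turn |...| into an octet string directly.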

We use MurMur3 hash -- unmodified from the source at https://github.com/aappleby/smhasher. You might look at things like https://stackoverflow.com/questions/75921577/murmur3-hash-compatibility-between-go-and-python to determine why your method is different. Yes, it is fast. No, it is not worse. We do have robust and popular hashes. This is one of them. That is why we use it.
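
The thread does not show how the checksum is computed end to end, so the following is only a sketch: a pure-Python port of MurmurHash3_x64_128 as published in smhasher, useful for cross-checking a library result bit by bit. Three things are assumptions not confirmed here: that KiCad uses the x64 128-bit variant, that the seed is 0xABBA2345 (Salvador's guess), and which bytes are hashed (compressed or decompressed). The hypothetical `hex_variants` helper shows why two correct implementations can still print different strings: the 128-bit result is two 64-bit words, and tools differ in word order, byte order and signedness.

```python
MASK64 = (1 << 64) - 1
C1 = 0x87C37B91114253D5
C2 = 0x4CF5AD432745937F


def rotl64(x, r):
    return ((x << r) | (x >> (64 - r))) & MASK64


def fmix64(k):
    """Final avalanche step of MurmurHash3."""
    k ^= k >> 33
    k = (k * 0xFF51AFD7ED558CCD) & MASK64
    k ^= k >> 33
    k = (k * 0xC4CEB9FE1A85EC53) & MASK64
    k ^= k >> 33
    return k


def mix_k1(k1):
    k1 = (k1 * C1) & MASK64
    k1 = rotl64(k1, 31)
    return (k1 * C2) & MASK64


def mix_k2(k2):
    k2 = (k2 * C2) & MASK64
    k2 = rotl64(k2, 33)
    return (k2 * C1) & MASK64


def murmur3_x64_128(data: bytes, seed: int = 0) -> tuple[int, int]:
    """Return (h1, h2), the two 64-bit halves of MurmurHash3_x64_128."""
    length = len(data)
    h1 = h2 = seed & 0xFFFFFFFF  # the reference implementation takes a 32-bit seed
    nblocks = length // 16

    # Body: 16-byte blocks read as two little-endian 64-bit words.
    for i in range(nblocks):
        block = data[16 * i:16 * i + 16]
        k1 = int.from_bytes(block[:8], "little")
        k2 = int.from_bytes(block[8:], "little")

        h1 ^= mix_k1(k1)
        h1 = rotl64(h1, 27)
        h1 = (h1 + h2) & MASK64
        h1 = (h1 * 5 + 0x52DCE729) & MASK64

        h2 ^= mix_k2(k2)
        h2 = rotl64(h2, 31)
        h2 = (h2 + h1) & MASK64
        h2 = (h2 * 5 + 0x38495AB5) & MASK64

    # Tail: up to 15 leftover bytes; bytes 8..14 feed k2, bytes 0..7 feed k1.
    tail = data[16 * nblocks:]
    if len(tail) > 8:
        h2 ^= mix_k2(int.from_bytes(tail[8:], "little"))
    if tail:
        h1 ^= mix_k1(int.from_bytes(tail[:8], "little"))

    # Finalization.
    h1 ^= length
    h2 ^= length
    h1 = (h1 + h2) & MASK64
    h2 = (h2 + h1) & MASK64
    h1 = fmix64(h1)
    h2 = fmix64(h2)
    h1 = (h1 + h2) & MASK64
    h2 = (h2 + h1) & MASK64
    return h1, h2


def hex_variants(h1, h2):
    """Two plausible textual renderings of the same 128-bit result (hypothetical helper)."""
    return {
        "words_big_endian": f"{h1:016x}{h2:016x}",
        "bytes_little_endian": (h1.to_bytes(8, "little") + h2.to_bytes(8, "little")).hex(),
    }
```

If a library returns one 128-bit integer instead of a pair, comparing its value against both renderings is a quick way to tell an encoding mismatch from a real hash mismatch.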

Bug reports for preferred behavior are great to receive at GitLab.

Seth

Seth Hillbrand
Lead Developer
+1-530-302-5483
Long Beach, CA
www.kipro-pcb.com in...@kipro-pcb.com

--
You received this message because you are subscribed to the Google Groups "KiCad Developers" group.
To unsubscribe from this group and stop receiving emails from it, send an email to devlist+u...@kicad.org.
To view this discussion visit https://groups.google.com/a/kicad.org/d/msgid/devlist/b4479a17-148b-490d-8058-4c82225e4e11%40inti.gob.ar.

Salvador E. Tropea


Jan 17, 2025, 4:58:29 AM

to dev...@kicad.org

Hi Seth:

On 15/1/25 15:06, 'Seth Hillbrand' via KiCad Developers wrote:

> The data for embedded files follows the SEXPR format (https://datatracker.ietf.org/doc/draft-rivest-sexp/). Base64 is supposed to be bracketed by the pipes. This allows third-party sexpr parsers to more easily handle our data format when we follow conventions. We did not do this for the images and that was an oversight. Eventually, images will be added to the embedded files format and the distinction will go away.

Ok, note that images already changed in the past; it's a pity they didn't get the correct format then. (Note: data was xxxx and changed to "xxxx".)

This is not the only thing that keeps changing: things like "hide -> (hide yes)" or "(uuid xxxx) -> (uuid "xxxx")" pop up quite often. I guess somebody should be in charge of approving how things are implemented in the file formats. Not to mention documenting them before a release, and I mean documenting the new release, not the previous one.
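
Changes like these can be absorbed by a reader that normalizes after parsing. The sketch below assumes the file has already been parsed into nested Python lists of strings by some S-expression reader (not shown) that strips quotes, so (uuid xxxx) and (uuid "xxxx") already compare equal. It then rewrites a legacy bare `hide` atom into the newer (hide yes) form. `LEGACY_FLAGS` and `normalize` are hypothetical names, the flag set holds only the example named in this thread, and the (effects ...) node is illustrative.

```python
# Flags written as bare atoms in older files and as (flag yes) in newer ones.
# Assumption: only "hide" is listed because it is the example from this thread.
LEGACY_FLAGS = {"hide"}


def normalize(node):
    """Return a copy of a parsed node with bare legacy flags turned into (flag yes).

    Limitation: a quoted string whose text equals a flag name looks the same here,
    because the assumed reader strips quotes; a real reader should keep that detail.
    """
    if not isinstance(node, list):
        return node
    out = []
    for i, child in enumerate(node):
        # Index 0 is the node's own keyword, e.g. the "hide" in (hide yes).
        if i > 0 and isinstance(child, str) and child in LEGACY_FLAGS:
            out.append([child, "yes"])
        else:
            out.append(normalize(child))
    return out


old = ["effects", ["font", ["size", "1.27", "1.27"]], "hide"]
new = ["effects", ["font", ["size", "1.27", "1.27"]], ["hide", "yes"]]
assert normalize(old) == normalize(new) == new
```

Normalizing to the newest spelling keeps the rest of a tool's code independent of which KiCad version wrote the file.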

BTW: This is related to the popularity issue. If the format had changed from the custom one to JSON (IMHO far more popular than Sexp) instead of to Sexp, these errors would not have happened. There are plenty of libs and tools to implement and verify JSON.

> We use MurMur3 hash -- unmodified from the source at https://github.com/aappleby/smhasher. You might look at things like https://stackoverflow.com/questions/75921577/murmur3-hash-compatibility-between-go-and-python to determine why your method is different. Yes, it is fast. No, it is not worse. We do have robust and popular hashes. This is one of them. That is why we use it.

I see we have quite different ideas of what is popular. Let me clarify: if you take a minimal Linux core, let's say the Docker image "debian:bookworm-slim" (a slim version of Debian Bookworm intended as the base for other Docker images), you'll find MD5, SHA256, SHA512, SHA224, SHA384 and a few more hashes available as command-line tools. If you take a language like Python (included with KiCad) and look at the standard hashlib module, you'll find SHA1, SHA224, SHA256, SHA384, SHA512, SHA-3 and MD5. These are popular hash algorithms.

Now if you look at MMH3 ... even a command-line tool is rare and hard to find! It's not supported by core Python, there is more than one competing module on PyPI, and the most popular one implements MMH2, not MMH3. The one that implements MMH3 isn't popular enough to be part of Debian. For me, this isn't a popular hash.

The compression used (Zstandard) is gaining popularity, but it isn't widespread yet. If you used Base64 + GZip + MD5, your data could be processed by a shell script on most (if not all) modern Unix-style OSs, and you wouldn't need extra dependencies for Python.
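
For comparison, a minimal sketch of the alternative proposed here (not KiCad's format): Base64 + gzip + MD5 using only the Python standard library. `pack` and `unpack` are hypothetical names, and hashing the uncompressed payload is a choice made for this example. `usedforsecurity=False` (Python 3.9+) marks MD5 as a plain checksum, which some FIPS-restricted builds require before they allow it.

```python
import base64
import gzip
import hashlib


def pack(payload: bytes) -> tuple[str, str]:
    """Compress with gzip, Base64-encode, and return (text, MD5 hex of payload)."""
    text = base64.b64encode(gzip.compress(payload)).decode("ascii")
    digest = hashlib.md5(payload, usedforsecurity=False).hexdigest()
    return text, digest


def unpack(text: str, expected_md5: str) -> bytes:
    """Reverse pack() and verify the checksum of the recovered payload."""
    payload = gzip.decompress(base64.b64decode(text))
    if hashlib.md5(payload, usedforsecurity=False).hexdigest() != expected_md5:
        raise ValueError("checksum mismatch")
    return payload
```

The shell counterpart would be along the lines of `base64 -d blob.txt | gunzip | md5sum` with GNU coreutils and gzip (blob.txt being a hypothetical file holding the Base64 text), and `hashlib.algorithms_guaranteed` lists the hashes every Python build provides.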

> Bug reports for preferred behavior are great to receive at GitLab.

You mean the image data vs embedded file inconsistency?

Regards, SET


Seth Hillbrand


Jan 17, 2025, 1:07:01 PM

to dev...@kicad.org

We started with SHA256 (see https://gitlab.com/kicad/code/kicad/-/blob/master/include/embedded_files.h?ref_type=heads#L67).

The problem with cryptographic hashes is that they are slow. By design. This affected the load time for KiCad files, so we decided to use a known-good, fast hash function. It is used internally by libstdc++, nginx, npm, Elasticsearch and a number of other projects. The lack of a command-line tool was not a consideration, and even if it had been, I do not think that trading slower load times for everyone against easy command-line decoding for the limited number of people who would want to do that would have been a good choice. Similarly with zstd: Gzip output was about 40% larger for many files. There's just not a good reason to make everyone deal with larger files in order to accommodate a niche use case.

You seem to be annoyed by the updating KiCad file format. The changes that you called out were specifically made in order to standardize the format. This is well-documented on the dev-docs page.

The choice of SEXPR was made in 2011 and committed in 2012. JSON was standardized in 2013. JSON libraries for C++ weren't released until 2015. SEXPR had been well-established for many years, including usage in other EDA tools. It was the right choice for the time and it remains a decent platform. Everyone likes to Monday-morning quarterback old decisions, but it shows a lack of historical understanding of the project.

Seth


