RE: cdf-use-cases CVR Comment: Deprecate XML, make JSON the recommended format.


David RR Webber (XML)

May 24, 2021, 11:12:06 PM
to Ray Lutz, John Dziurlaj, cdf-ball...@list.nist.gov, cdf-us...@list.nist.gov
Ray,

With respect, XML can convey far more in schema and metadata than the simple JSON syntax allows.

Yes, JSON can be swapped in for storage-structure purposes, "on the wire" (XSLT v3 has a function to do that, BTW), but only so far: if you really need to know what is going on, you can refer back to the XML to find out.
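The idea of deriving JSON from the XML "on the wire" can be illustrated with a short conversion. This is a minimal sketch using Python's standard library rather than XSLT 3.0, and the CVR element names are hypothetical, not taken from the NIST schema. It maps leaf elements to strings and repeated child tags to JSON arrays, and it ignores attributes.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical CVR fragment; element names are illustrative only.
CVR_XML = """
<CVR>
  <BallotStyleId>bs-1</BallotStyleId>
  <Contest>
    <ContestId>c-1</ContestId>
    <Selection>cand-3</Selection>
  </Contest>
  <Contest>
    <ContestId>c-2</ContestId>
    <Selection>yes</Selection>
  </Contest>
</CVR>
"""

def element_to_obj(elem):
    """Convert an Element to plain Python data: leaf text becomes a string,
    repeated child tags become lists. Attributes are ignored in this sketch."""
    children = list(elem)
    if not children:
        return (elem.text or "").strip()
    obj = {}
    for child in children:
        value = element_to_obj(child)
        if child.tag in obj:
            # Second occurrence of the same tag: promote the entry to a list.
            if not isinstance(obj[child.tag], list):
                obj[child.tag] = [obj[child.tag]]
            obj[child.tag].append(value)
        else:
            obj[child.tag] = value
    return obj

def xml_to_json(xml_text):
    """Parse the XML and emit the equivalent JSON text."""
    root = ET.fromstring(xml_text)
    return json.dumps({root.tag: element_to_obj(root)}, indent=2)

print(xml_to_json(CVR_XML))
```

Note that a converter working without the schema cannot tell whether a single `Contest` should still be emitted as an array; that is the kind of structural information a schema supplies.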

Consider how computer languages work in theory: you have a Backus-Naur Form (BNF) grammar that defines how things can be coded.

Without the formal syntax that XML provides, you have too many unknown unknowns.

Duncan Buell has already pointed out several examples where a lack of rigour led to actual faults and failures in operational vendor products. Using JSON alone leaves far more room for such failures.

Thanks, David


-------- Original Message --------
Subject: cdf-use-cases CVR Comment: Deprecate XML, make JSON the recommended format.
From: Ray Lutz <ray...@citizensoversight.org>
Date: Wed, May 19, 2021 8:03 pm
To: John Dziurlaj <jo...@turnout.rocks>,
"cdf-ball...@list.nist.gov" <cdf-ball...@list.nist.gov>,
"cdf-us...@list.nist.gov" <cdf-us...@list.nist.gov>

Submitted comment:
https://github.com/usnistgov/CastVoteRecords/issues/29

Deprecate XML, make JSON the recommended format.

The two formats, XML and JSON, are not very different in structure. JSON has now become the industry leader, while XML is no longer recommended. It is senseless to support two formats; writers of CVRs should have one format. If they are currently writing XML, it can easily be converted. At this stage of the standard's adoption curve, there is no rationale for supporting two formats.

--Ray Lutz

On 5/19/2021 7:28 AM, 'John Dziurlaj' via cdf-ballot-styles wrote:
The Common Data Format Research Group is requesting feedback by July 1st on the four published Common Data Formats (CDFs) for Election Data. The CDFs cover key areas of interoperability such as Election Results Reporting, Cast Vote Records, Election Event Logging and Voter Records Interchange.
NIST is using GitHub as a collaboration platform for CDF development. Each CDF specification has its own GitHub “repository”.
Each Common Data Format has its own set of “issues” associated with its repository. Make sure issues are opened in the appropriate repository. Comments about CDF development in general should use the “Voting” repository, but only if the issue falls outside the other CDFs.
Here are links to each of the CDF repositories:
Use the “Issues” tab to view current issues. Review existing issues to determine if your issue has already been raised. If not, you may use the green “New issue” button to create a new one. Feedback should be actionable and refer to one or more files in the repository. A free GitHub account is required to open issues and comment.
Please provide feedback by July 1.
Regards,
John Dziurlaj
CDF Research Group Chair
 
--
To unsubscribe from this group, send email to cdf-ballot-sty...@list.nist.gov
 
View this group at https://list.nist.gov/cdf-ballot-styles

--
Ray Lutz
Citizens' Oversight Projects (COPs)
http://www.citizensoversight.org
619-820-5321

--
To unsubscribe from this group, send email to cdf-use-case...@list.nist.gov
 
View this group at https://list.nist.gov/cdf-use-cases

Ray Lutz

May 25, 2021, 12:38:29 AM
to cdf-us...@list.nist.gov
Hi David:

I understand that opinion. But the richness of XML is also its downside. I have many other concerns with the existing NIST standard, but my position on this stands: XML provides no more formal syntax than JSON does. I have worked with BNF grammars, ASN.1, and the like for years. The deep nesting that comes from unrestricted use of UML and similar models produces overly complex structures that are not compatible with current trends in NoSQL databases and leads to "anti-patterns" that are difficult to process. The highly encapsulated realization, a drawback of both XML and JSON unless their use is rigorously restricted, results in a data tree that must be flattened for any reasonable implementation.

But it is foolish to have two different formats for a new standard. Choose just one. The trend is very strong in favor of JSON.

This is a pretty good comparison article:
https://www.toptal.com/web/json-vs-xml-part-1#:~:text=JSON%20is%20more%20popular%20than,any%20other%20data%20interchange%20format.

Developer communities insist that JSON became more popular than XML because of its concise declarative scope and simple semantics. Douglas Crockford himself summarizes some of JSON’s advantages on JSON.org: “JSON is easier for both humans and machines to understand, since its syntax is minimal and its structure is predictable.” [20] Other critics of XML have focused on XML’s verbosity as “the angle bracket tax.” [21] The XML format requires each opening tag to be matched with a closing tag, resulting in redundant information that can make an XML document significantly larger than an equivalent JSON document when uncompressed. And, perhaps more importantly, developers say: “it also makes an XML document harder to read.”
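To make the "angle bracket tax" concrete, this sketch serializes the same flat record both ways and compares uncompressed byte counts. The field names echo identifiers mentioned in this thread, but the record, the `CVRContest` wrapper tag, and the `to_xml` helper are hypothetical.

```python
import json

# Hypothetical flat record; field names echo identifiers mentioned in the thread.
record = {"ContestId": "c-1", "CandidateIds": "cand-3", "PartyId": "REP"}

def to_xml(tag, fields):
    """Hypothetical helper: serialize a flat dict as child elements,
    each with a matching opening and closing tag."""
    inner = "".join(f"<{k}>{v}</{k}>" for k, v in fields.items())
    return f"<{tag}>{inner}</{tag}>"

xml_text = to_xml("CVRContest", record)
json_text = json.dumps(record, separators=(",", ":"))  # compact JSON, no spaces

xml_size = len(xml_text.encode("utf-8"))
json_size = len(json_text.encode("utf-8"))
print("XML bytes:", xml_size, "JSON bytes:", json_size)
```

This measures only the uncompressed case the quoted article describes; it says nothing about compressed sizes.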

I believe that if we are to have TWO formats, then a very strong justification will have to be made for that.
As it is, the NIST CVR standard is wasteful, using terms like "ManifestationId" and "OutstackConditionIds" (the latter almost always empty), while also trying to be specifically adapted for use by relational databases by using ContestId, CandidateIds, and PartyID to link to the other tables. This is asking for trouble, as the current conversation on the list about how the definitions of such indexes can differ between users shows. It also makes it harder for humans to read and understand the data in order to proof it. PartyId is a number. Why not REP, DEM, GRE, LIB, etc., which are already the normal abbreviations? Short character abbreviations are easier to read and also promote standard written designations. Using numbers is a mistake.

Neither XML nor JSON fixes such things; the design of the data does. Right now the structure of the data does not do enough to avoid such failures, and it actually invites those mistakes. But I want to focus only on the encoding here, and it is simply silly to have two supported formats.

In my opinion, some of XML's "flexibility", such as the use of attributes within tags, should not be used.

My recommendation would be to un-nest the records as much as possible. For example, the NIST CVR proposes that the tabulator encompass the session and cards. Another approach is to treat the tabulator, along with the session, as metadata values in a single flat record, boiling it down to as flat a record as possible. I also suggest including more redundancy in the linkages (not just integers that are easily confused) and encapsulating the set of files, which is done but is not part of the standard.
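A minimal sketch of this kind of un-nesting, assuming a hypothetical tabulator/session/card layout (the names are illustrative, not NIST's). Each output record carries the tabulator and session as plain metadata columns and repeats the contest name next to the choice, as a redundant, human-readable linkage rather than a bare integer.

```python
# Hypothetical nested layout: tabulator -> sessions -> cards -> selections.
# Names and values are illustrative only.
nested = {
    "tabulator": "TAB-07",
    "sessions": [
        {
            "session": "S-1",
            "cards": [
                {"card": "K-100", "votes": [{"contest": "Mayor", "choice": "Smith"}]},
                {"card": "K-101", "votes": [{"contest": "Mayor", "choice": "Jones"},
                                             {"contest": "Measure A", "choice": "Yes"}]},
            ],
        }
    ],
}

def flatten(doc):
    """Yield one flat record per selection, repeating tabulator, session and
    card as metadata columns instead of expressing them through nesting."""
    for session in doc["sessions"]:
        for card in session["cards"]:
            for vote in card["votes"]:
                yield {
                    "tabulator": doc["tabulator"],
                    "session": session["session"],
                    "card": card["card"],
                    "contest": vote["contest"],
                    "choice": vote["choice"],
                }

rows = list(flatten(nested))
for row in rows:
    print(row)
```

The flat rows load directly into a table or document store, at the cost of repeating the metadata on every row.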

One of the more difficult issues is the naming of contests and ballot options. Naming varies widely between jurisdictions, and even within a jurisdiction there can be confusing differences. Contest names like "Constitutional Amendment I" and "Constitutional Amendment II" have many characters but almost no difference, and it is easy to mix them up. Assigning numbers to them does little good unless we can also equate the names.
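Near-identical contest names can at least be flagged automatically. This sketch uses Python's `difflib.SequenceMatcher` to score every pair of names; the 0.9 threshold and the contest list are hypothetical values chosen for illustration.

```python
from difflib import SequenceMatcher
from itertools import combinations

def confusable_pairs(names, threshold=0.9):
    """Return (name_a, name_b, ratio) for every pair of names whose
    similarity ratio meets the (arbitrary, illustrative) threshold."""
    flagged = []
    for a, b in combinations(names, 2):
        ratio = SequenceMatcher(None, a, b).ratio()
        if ratio >= threshold:
            flagged.append((a, b, round(ratio, 3)))
    return flagged

# Hypothetical contest list.
contests = [
    "Constitutional Amendment I",
    "Constitutional Amendment II",
    "Mayor",
    "City Council District 3",
]
for pair in confusable_pairs(contests):
    print(pair)
```

A check like this only surfaces pairs for human review; it cannot equate names across jurisdictions, which still requires an agreed mapping.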

So I am curious, what is it about this application that makes XML essential?

--Ray Lutz

Jared Marcotte

May 25, 2021, 11:02:55 AM
to Ray Lutz, cdf-us...@list.nist.gov

Hi all,

I hope this finds everyone well!

It’s been great to see some lively discussions on the mailing list. However, for the sake of transparency and continuity, please keep discussions of open issues on the specific GitHub issue (this one, for this particular discussion). This makes the conversation visible to those outside this email list and helps maintainers track the discussions. If you prefer email to commenting directly on GitHub, you can comment on an issue you are subscribed to by replying to the email GitHub sends (NB: you’ll still need a free GitHub account associated with the email address you plan to use).

@Ray, the second part of your email seems to be a different issue, i.e., the development of the CDFs in general. Can you open a separate issue for this under the Voting repo? Thanks!

Cheers.

- jkm

Ray Lutz

May 25, 2021, 12:19:06 PM
to cdf-us...@list.nist.gov
Regarding the use of GitHub for discussions of this nature:
1. I use GitHub a lot for development, and even then I find that issues tend to focus people too closely on a given problem rather than the big picture. If this continues for a while, people fix bugs and make corrections to resolve the local issue but never give the larger picture much thought. Therefore, I tend not to use the issues system much, even when working on software, unless the code is well past the development phase. Instead, I have found Google Docs a valuable way to work, although documents get unwieldy once they exceed 100 pages.

2. The email list is poorly coupled with GitHub Issues. I see posts there that do not appear on the list. For example, JDziurlaj added the labels "breaking" and "enhancement" with no discussion. Those labels are incorrect. Deprecating something, such as XML, does not break anything; it normally means that the thing still exists but is not recommended for new implementations.

3. As another example, jungshadow made a comment on GitHub, but it was not reflected on the list. Perhaps there is a way to subscribe the mailing list to the GitHub issues so that any updates to them are at least announced on the list.

I suggest that we follow the example of the IETF: the list is where all discussions are held. If you want to do something on GitHub or elsewhere, it should be reflected on the list, and archives can be provided for anyone to review. Votes, if any are held, take place on the list, not at in-person meetings, which are hard to attend.

You will find that I have a fairly long list of concerns regarding the CVR format, and I am posting just this first one to see how it goes. Unless we solve the problem of coupling GitHub issue submissions with the lists, it appears I will have to post everything twice.

--Ray

David RR Webber (XML)

May 25, 2021, 2:02:18 PM
to cdf-us...@list.nist.gov
Ray,

That Toptal piece is not exactly objective, and the structure of JSON is anything but predictable! And the old chestnut that JSON is more compact was debunked many years ago.

Case in point: I just implemented a REST interface for a high-profile US government project where we had to embed XML inside the outer JSON wrapper because the middleware could not generate the JSON needed from the data source.
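The arrangement described here, XML carried inside a JSON wrapper, can be sketched as follows. The payload and the envelope fields `contentType` and `payload` are hypothetical; the point is that the XML travels as an opaque string that the receiver has to parse a second time.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical XML produced by a data source that can only emit XML.
xml_payload = "<CVR><ContestId>c-1</ContestId><Selection>cand-3</Selection></CVR>"

# Outer JSON wrapper; the field names are illustrative only.
envelope = json.dumps({"contentType": "application/xml", "payload": xml_payload})

# Receiver side: parse the JSON wrapper first, then the embedded XML string.
received = json.loads(envelope)
root = ET.fromstring(received["payload"])
print(root.find("ContestId").text)
```

JSON tooling sees only a string field here; any structure or validation has to come from the XML side.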

Regarding your point on XML object complexity: there are standards out there with 15+ levels of indirection (I won't name names), but ours is definitely not one of them! Sensible reuse of components aids consistency.

On data content consistency: XML provides ways to enforce it, and mature open-source tooling is available that checks for it quickly.
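As one illustration of a content-consistency check: XML Schema can declare referential constraints with `xs:key`/`xs:keyref`, but Python's standard library does not validate against XSD, so this sketch performs an equivalent check by hand. It reports `ContestId` values in the CVRs that match no contest in the election definition. Both documents and their element and attribute names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical election definition and CVR report; names are illustrative only.
ELECTION_XML = """<Election>
  <Contest ObjectId="c-1"><Name>Mayor</Name></Contest>
  <Contest ObjectId="c-2"><Name>Measure A</Name></Contest>
</Election>"""

CVR_XML = """<CVRReport>
  <CVR><ContestId>c-1</ContestId></CVR>
  <CVR><ContestId>c-9</ContestId></CVR>
</CVRReport>"""

def dangling_contest_refs(election_xml, cvr_xml):
    """Return ContestId values in the CVRs that match no Contest ObjectId
    in the election definition (a hand-rolled key/keyref check)."""
    defined = {c.get("ObjectId") for c in ET.fromstring(election_xml).iter("Contest")}
    referenced = [e.text.strip() for e in ET.fromstring(cvr_xml).iter("ContestId")]
    return [ref for ref in referenced if ref not in defined]

print(dangling_contest_refs(ELECTION_XML, CVR_XML))
```

With a schema-aware validator, the same rule would be declared once in the schema instead of coded per application.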

Since you are curious: nothing is ever essential. However, from the perspective of creating reliable standards that can be implemented consistently and predictably and whose content can be verified (key needs for election systems), one has to give the nod to XML over JSON at this point as the reference syntax of choice, especially for archival and audit purposes. And, as previously noted, the JSON can easily be rendered from the XML with a simple transformation as needed.