Good Morning,
I wanted to inform you all that several resources have been placed onto the NIST GitHub repository for the Ballot Definition CDF. BallotDefinition/develop holds the work in progress for the NIST Ballot Definition CDF and contains the machine readable schemas (XSD, JSON Schema) and human readable documentation of the UML Model. JSON and XML examples, and NIST publication documents will eventually reside here as well. We will discuss these in more detail on next week’s call.
Have a great weekend!
John Dziurłaj /d͡ʑurwaj/
Sr. Solutions Architect, The Turnout
Cell 330-714-8935
--
To unsubscribe from this group, send email to cdf-ballot-sty...@list.nist.gov
View this group at https://list.nist.gov/cdf-ballot-styles
---
To unsubscribe from this group and stop receiving emails from it, send an email to cdf-ballot-sty...@list.nist.gov.
-- ------- Ray Lutz Citizens' Oversight Projects (COPs) http://www.citizensoversight.org 619-820-5321
Good Morning Ray,
There are currently no example JSON files available.
However, the mapping of the mCDF CSC message to the CVR UML
Model is given in Appendix D of the draft mCDF format (refer to the CDF mapping
columns). I am reattaching the draft here.
The UML can then map forward to JSON using the CVR CDF JSON schema.
Regards,
John Dziurłaj /d͡ʑurwaj/
Sr. Solutions Architect, The Turnout
SEL|1GO|1AEF^^^1~1AAR^^^2;
SEL|3AS|1CDY;
SEL|4SS|1DNT;
SEL|5TS|1EJM;
SEL|6RC|1FMZ;
SEL|8SR34|1GCB;
SEL|9CC|1HDW~1HSK;
SEL|10SB|1IMC;
SEL|11JS1|1JSK;
SEL|11JS2|1KJO;
SEL|12CA9|1LTL;
SEL|13CP1|1MTO;
SEL|18CP6|1RRM;
It appears that the three-letter prefix SEL| introduces each selection.
Then comes a number, which appears to be the contest number. I imagine this is the same as the index from the ContestManifest.
Then there is a two-character indicator, like
'GO', 'AS', 'SS' etc. I don't know what this is.
Then an optional number.
Then another vertical bar, followed by a number and three characters. The segment usually ends with a semicolon, but it might be a lot longer.
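The structure guessed at above can be written down as a minimal Python sketch. It assumes that `|` separates fields, `~` separates repeated selections, `^` separates components within a selection, and `;` ends the segment. These delimiter roles are read off the sample lines only, not taken from the draft specification, and `parse_sel` is a hypothetical name. The whole second field (for example `1GO`) is treated as one opaque contest code, and escape characters are not handled.

```python
def parse_sel(segment):
    """Split one SEL segment into a contest code and its selections.

    The delimiter roles (| ~ ^ ;) are assumptions inferred from the samples.
    """
    body = segment.strip().rstrip(";")
    fields = body.split("|")
    if len(fields) != 3 or fields[0] != "SEL":
        raise ValueError(f"not a SEL segment: {segment!r}")
    contest_code = fields[1]
    # Each repetition is one selection; each selection is a list of components.
    selections = [rep.split("^") for rep in fields[2].split("~")]
    return {"contest": contest_code, "selections": selections}


sample = "SEL|1GO|1AEF^^^1~1AAR^^^2;"
print(parse_sel(sample))
# {'contest': '1GO', 'selections': [['1AEF', '', '', '1'], ['1AAR', '', '', '2']]}
```

The meaning of the individual components after the first one is left uninterpreted here, since the samples alone do not say what they are.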
You say:
The SEL segments convey contest option selections made by the voter during a voting session. Each SEL segment represents a single contest, but potentially multiple selections. Contest::Code is optional, however, if omitted, the ContestSelection::Code must be unique across all contests.
You provide this in the text:
Can you please explain how the values map to this list? I am guessing here.
The first number after SEL| is "CVR::Contest::Code." This may be the ContestID from the ContestManifest.
But then I am lost.
What does 'GO' indicate? And the rest??
--Ray
Thanks VERY much for digging in on these important questions Ray and Carl.
John
John McCarthy, Volunteer Advisor (he/him)
jo...@verifiedvoting.org • 510.666.5309
verifiedvoting.org
See responses below.
John Dziurłaj
-----Original Message-----
From: cdf-ball...@list.nist.gov <cdf-ball...@list.nist.gov> On Behalf Of Carl Hage
Sent: Wednesday, July 20, 2022 6:53 PM
To: cdf-ball...@list.nist.gov
Subject: cdf-ballot-styles JSON version of mCDF
On 7/20/22 7:26 AM, 'John Dziurłaj' via cdf-ballot-styles wrote:
> There is not currently any example JSON files available.
I was looking into the mCDF, so I wrote a quick Perl program to decode the proposed mCDF and emit JSON. I was unable to create an mCDF example until I purchased a new computer that could run the latest Adobe. I used the example PDF file to create an mCDF and an mCDF encoded as a pair of QR codes. [I used Perl since it's the easiest to write and this is not intended for production or wide reuse.]
It would be useful to post sample mCDF, not just posting hidden software within a PDF that only runs with new versions of a proprietary app.
JD: The CDF Prototype does not require any version of Acrobat greater than X and should work with DC as well. Code examples will be provided in the mCDF GitHub repository soon.
Attached is a sample mCDF I created (arbitrary choices). Since the example PDF did not handle RCV, I guessed at the format and manually entered it. There is a converted JSON and the perl script I used to read the mCDF.
JD: There is a version that does handle RCV, available here.
My quickie program omits some features, such as handling escape characters in the mCDF and encoding \ and " in JSON strings. It also does not handle an mCDF split into multiple files (for QR encoding).
Also, I now realize Code is a repeatable ExternalIdentifier, but I only emit it as a string (by default). I added options to allow a string array and an option for full ExternalIdentifiers.
I attached a separate file mcdfrc.json that has Code emitted as an array and also with full indent (other than code array). Probably, only the mcdfrx.json is (sort-of) compliant with the spec.
Attached files:
- mcdf.txt (generated using the mCDFPrototype.pdf)
- mcdfr.txt (manually extended with the RCV contest)
- mcdfr.json (mcdfr.txt converted to simplified json)
- mcdfrc.json (mcdfr.txt in indented JSON with repeatable codes)
- mcdfrx.json (mcdfr.txt with Code as ExternalIdentifier array)
- parsemcdf.pl - Script to process mCDF
> However, the mapping of the mCDF CSC message to the CVR UML Model is
> given in Appendix D of the draft mCDF format (refer to the CDF mapping
> columns). I am reattaching the draft here.
Note: In my program I assumed the crossed out items are deleted, e.g. there is no CVR::CVRContestSelection::OptionPosition. Thus my RCV addition doesn't match the example in the .docx (with extra ^).
> The UML can then map forward to JSON using the CVR CDF JSON schema
> <https://github.com/usnistgov/CastVoteRecords>.
The JSON schema on git is for the wrong version of the CVR CDF. It took a while for me to figure this out. DO NOT USE IT!
It might be useful to make a separate json schema for the mCDF.
----
Here are some comments I have while decoding the mCDF:
Since there is no json schema, I had to sort of invent some names that didn't match.
JD: There are no plans to create a separate JSON Schema for the mCDF Profile. The purpose of the mCDF Profiles is to map a fragment of content to a larger CDF instance. So an mCDF CSC message is expected to map to and become part of a larger CVR instance (as currently defined in SP 1500-103). I will work on some documentation to further clarify this.
Code is repeatable, meaning it's an array in JSON. But this makes no sense to me within an mCDF that is tied to a ballot definition: it should be 1..1, with the Code matching the corresponding ID assigned in the ballot definition CDF (referenced via URI) for an election, contest, selection, etc. I added a -c option to emit an array.
JD: The codes are meant to be flexible in case systems at different levels need to interoperate, i.e. there could be a local level and state level coding scheme. I do not think this will be used in most cases in the mCDF, but since we’re mapping existing constructs of the CDFs, we must honor their structure.
The use of repeatable ExternalIdentifier objects seems excessively verbose. The attached mcdfrx.json shows this (even worse with standard indent).
I did not add an option to emit {"Value":"1GO","Type":"local"} in lieu of "1GO". I suggest not using repeatable external identifiers in mCDF.
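To make the verbosity difference concrete, here is a small Python sketch of the three shapes discussed: a plain string, a string array, and an array of ExternalIdentifier-style objects. The `Value` and `Type` property names come from the example above; `encode_code` and its `style` argument are hypothetical and do not reproduce the options of parsemcdf.pl or the exact structure required by the CVR JSON schema.

```python
import json


def encode_code(code, style="string", id_type="local"):
    """Return one of three illustrative JSON shapes for a Code value."""
    if style == "string":
        return code
    if style == "array":
        return [code]
    if style == "external":
        # Object shape taken from the {"Value": ..., "Type": ...} example.
        return [{"Value": code, "Type": id_type}]
    raise ValueError(f"unknown style: {style!r}")


for style in ("string", "array", "external"):
    print(style, json.dumps({"Code": encode_code("1GO", style)}))
```

Run as written, the last line printed is several times longer than the first for the same single code, which is the cost being objected to.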
The ElectionScope is really the Code (ExternalIdentifier) of a GpUnit, not a GpUnit, since the CVR::GpUnit has a Code that matches an ID in the definition (e.g. ElectionScopeID). The ElectionScope is a BallotDefinition.ReportingUnit that has the GpUnit::Code (ExternalIdentifier), which I presume is the intent in the ELE. The heading 4.13.1 has subcomponents of Code, but applies to the CVR::Election::ElectionScope field 1, not Code field 2.
In all the mCDF examples, a single Code string is supplied with implied type "local" (where local is meaningless).
It would be better to identify which ID the Code matches rather than using some ambiguous coding of external identifiers.
A persistent ID (across files and across versions within some collection of election admin data) is not precisely defined in most CDFs. It seems that the JSON schema has a reserved "@id" and XML has the "id" attribute, but it has restricted content (letter/underscore, followed by letters and digits), so it might not be able to represent all IDs.
JD: XML has an ObjectId attribute. They are not meant to represent durable data points (but exist to wire the file together). Thus, ExternalIdentifiers are used.
The ID in CDF files are scoped only to that file. But since the mCDF references a URI, we can assume Code applies to the IDs in that file.
A persistent ID is a required feature for interpreting election data files, and its absence is one of the flaws in CDF. It is possible to represent a persistent ID using an ExternalIdentifier, but there are no semantics for creating the ID used by an election admin EMS. The type "local" is ambiguous. Type "Other" could be used, but this is extremely verbose, its usage is arbitrary, and it is inherently not standardized.
JD: I will investigate this in more detail.
The whole model of ExternalIdentifier in VIP etc. is a poor choice. A better approach is what all other domains use-- a prefix, e.g. f:06085 for fips, ds:0025 for Dominion Democracy Suite used by the election admin. A section of the CDF would be needed to define the prefixes used in all IDs. XML has IDREFS, a space separated list, which is a better model than verbose array.
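As a rough illustration of the prefix idea, the following Python sketch parses a space-separated list of prefixed IDs against a prefix table. The table contents and the names `PREFIXES` and `parse_id_list` are hypothetical: no CDF currently defines such a section, and the two prefixes are simply the examples given above.

```python
# Hypothetical prefix table; in the proposal this would be declared in the CDF.
PREFIXES = {
    "f": "FIPS",
    "ds": "Dominion Democracy Suite",
}


def parse_id_list(text, prefixes=PREFIXES):
    """Parse a space-separated list of prefix:value IDs (IDREFS-like)."""
    result = []
    for token in text.split():
        prefix, sep, value = token.partition(":")
        if not sep or prefix not in prefixes:
            raise ValueError(f"unknown or missing prefix in {token!r}")
        result.append((prefixes[prefix], value))
    return result


print(parse_id_list("f:06085 ds:0025"))
# [('FIPS', '06085'), ('Dominion Democracy Suite', '0025')]
```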
JD: We are building the CDFs so that they work together well. A change like this would require modifications to all CDFs, which is not feasible at this point. If you’d like this to be considered for future “generations” of CDFs, please open it on the Voting GitHub repo.
Note that IDs used within an EMS are usually unique only to a Class, e.g. ContestId, and for a Selection may only be unique within a contest, but the xml id attribute must be unique within the file. It is not clear to me what the restrictions on JSON @id are.
JD: They should be the same, as there may be a need to map from JSON to XML.
Good Afternoon Carl,
Please see the responses below.
-----Original Message-----
From: cdf-ball...@list.nist.gov <cdf-ball...@list.nist.gov> On Behalf Of Carl Hage
Sent: Tuesday, August 2, 2022 10:58 PM
To: cdf-ball...@list.nist.gov
Subject: Re: cdf-ballot-styles Draft Ballot Definition CDF resources
On 7/8/22 5:54 AM, 'John Dziurlaj' via cdf-ballot-styles wrote:
> I wanted to inform you all that several resources have been placed
> onto the NIST GitHub repository for the Ballot Definition CDF.
> BallotDefinition/develop
> <https://github.com/usnistgov/BallotDefinition/tree/develop> holds the
> work in progress for the NIST Ballot Definition CDF and contains the
> machine readable schemas (XSD, JSON Schema) and human readable
> documentation
> <https://github.com/usnistgov/BallotDefinition/blob/develop/BallotDefinition_UML_Model_Documentation.md>
> of the UML Model.
John, Thanks for creating the BallotDefinition github repository. It's a great start!
Here are a few quick comments:
It would be useful to indicate the area(s) on a page for a header, contest, and contest selection. The BoundedObject seems useful for that but doesn't seem to be represented. A Boundary could be added for a PhysicalContestOption, perhaps this is the intent.
Is Geometry supposed to be a BoundedObject? E.g. for an OptionPosition?
JD: The current design is that a Geometry provides additional details about an area identified by a BoundedObject (e.g. OptionPosition, FiducialMark). It is referenced by those objects.
For a contest, we might have a single rectangle enclosing the contest title, subtitle, etc. and all options. But sometimes there are too many candidates and the selections need to be split across multiple columns (or sides). Each split might have a separate rectangular area with a continuation subheader. A set of BoundedObjects could be given to specify the contest areas, or alternatively we could invent a ContestOptionGroup object that is a BoundedObject to represent a collection of Options within a rectangular area, and might have a continuation heading. If there are multiple areas given for a contest, the options contained would need to be determined by geometric intersection.
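A minimal Python sketch of assigning options to contest areas geometrically, assuming axis-aligned rectangles given as x, y, width, and height in one shared page coordinate system. The `Rect` class and `options_in_areas` function are hypothetical and are not part of the Ballot Definition model. For simplicity the sketch uses full containment, the strictest form of intersection.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    x: float
    y: float
    width: float
    height: float

    def contains(self, other):
        """True if `other` lies entirely inside this rectangle."""
        return (self.x <= other.x
                and self.y <= other.y
                and other.x + other.width <= self.x + self.width
                and other.y + other.height <= self.y + self.height)


def options_in_areas(contest_areas, option_positions):
    """Map each option name to the index of the first area containing it."""
    return {
        name: next((i for i, area in enumerate(contest_areas)
                    if area.contains(rect)), None)
        for name, rect in option_positions.items()
    }


areas = [Rect(0, 0, 100, 200), Rect(110, 0, 100, 200)]  # two columns
options = {
    "A": Rect(10, 20, 8, 5),
    "B": Rect(120, 20, 8, 5),
    "C": Rect(300, 20, 8, 5),  # outside both areas
}
print(options_in_areas(areas, options))
# {'A': 0, 'B': 1, 'C': None}
```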
In some cases the contest or contest selection areas have a border, so we could use a geometry with border width to represent the bounding area.
We need the areas occupied for a contest so scanners can associate extra markings in the contest or contest selection areas (outside the marked option positions). When adjudicating with scanned image presentations, we need to be able to cut the applicable areas.
JD: We can add this as an optional property (0..*) from PhysicalContest to BoundedObject.
Besides the shape for an option position, it would be useful to have an enum that defines the marking style, e.g. filled area, a horizontal line (e.g. connecting arrow-style fiducials), or perhaps an X (pair of diagonal lines).
JD: How would scanners use such information?
---
If there is a printable (text representation) of the contest selections vs the full face ballot, it would be useful to be able to define the locations for the selections made. In this representation we have a contest title, but then only list the selection(s) made, not all options. To be able to scan the printable text, we need to identify the locations on the printout for the contest title and an area for each selection allowed. For a contest option, we could define an area on a master ballot with the response text to be inserted (source area) into a response position (one or more target areas).
JD: Are you suggesting a ballot definition be created for the text appearing on ballot summary cards?
An OCR or QR code verifier could match option selections by a bitmap comparison with a reference as long as the reference and target positions are defined.
A ballot scanner could perform a bitmap comparison with the imageURL on a full face ballot and identify extraneous marks.
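The bitmap comparison could look roughly like the Python sketch below. It assumes the scanned image has already been aligned to the reference and both are binarized to rows of 0/1 pixels of equal size. Real scans would need registration and a noise tolerance, which are omitted. `extraneous_pixels` and its `ignore` parameter (for example, pixels inside option positions) are hypothetical names.

```python
def extraneous_pixels(reference, scanned, ignore=()):
    """Return (row, col) of pixels dark in the scan but not in the reference."""
    if len(reference) != len(scanned) or any(
            len(r) != len(s) for r, s in zip(reference, scanned)):
        raise ValueError("bitmaps must have the same dimensions")
    skip = set(ignore)
    return [
        (row, col)
        for row, (ref_row, scan_row) in enumerate(zip(reference, scanned))
        for col, (ref, scan) in enumerate(zip(ref_row, scan_row))
        if scan and not ref and (row, col) not in skip
    ]


reference = [[0, 1, 0],
             [0, 0, 0]]
scanned = [[0, 1, 1],
           [1, 0, 0]]
print(extraneous_pixels(reference, scanned))                   # [(0, 2), (1, 0)]
print(extraneous_pixels(reference, scanned, ignore={(1, 0)}))  # [(0, 2)]
```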
---
Eventually we need to be able to define fiducial marks that code for ballot style, precinct, sheet/side, etc. There could be bar codes inserted with some referenced set of standards, e.g. the UPC and ISBN style bar codes. A common option may be a set of locations with a mark (rectangle) present or not present that represents a binary code associated with a definition.
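For the present/not-present scheme, decoding is just reading the marks as bits. A small Python sketch follows, assuming the mark locations are read in a fixed order with the most significant bit first. The bit order and the function name `decode_marks` are assumptions, not something any CDF defines.

```python
def decode_marks(marks):
    """Interpret a sequence of mark-present flags as a binary number.

    The first location is treated as the most significant bit (an assumption).
    """
    value = 0
    for present in marks:
        value = (value << 1) | int(bool(present))
    return value


# Three mark locations: present, absent, present -> binary 101
print(decode_marks([True, False, True]))  # 5
```

The resulting integer would then be looked up in whatever definition associates codes with ballot style, precinct, or sheet/side.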
JD: We will use mCDF for ballot style identification in all cases (including OMR), the difference is which segments get output. An OMR ballot will not need the mCDF segment for representation of contest selections, for example.