Draft Ballot Definition CDF resources

John Dziurlaj

Jul 8, 2022, 8:55:00 AM
to cdf-ball...@list.nist.gov

Good Morning,

 

I wanted to inform you all that several resources have been placed onto the NIST GitHub repository for the Ballot Definition CDF. BallotDefinition/develop holds the work in progress for the NIST Ballot Definition CDF and contains the machine readable schemas (XSD, JSON Schema) and human readable documentation of the UML Model. JSON and XML examples, and NIST publication documents will eventually reside here as well. We will discuss these in more detail on next week’s call.

 

Have a great weekend!

 

John Dziurłaj /d͡ʑurwaj/

 

Sr. Solutions Architect, The Turnout
Cell 330-714-8935

 

Ray Lutz

Jul 12, 2022, 12:12:02 PM
to cdf-ball...@list.nist.gov
John:
I am still looking for the JSON equivalent of the example you provided for the micro CDF (mCDF) proposal to encode the voter's selections in a QR code.
If I can get that, I can prepare a comparison between your proposal and the one using the JSON/CBOR/COSE/Zlib/Base45 pipeline.
--Ray
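
For readers unfamiliar with the pipeline named above, here is a minimal Python sketch of its compression and text-encoding stages only. It assumes a made-up selections dictionary, skips the CBOR and COSE steps (which need third-party libraries), and implements Base45 by hand following RFC 9285. The function names are hypothetical and this is not the encoding used in either proposal.

```python
import json
import zlib

# Base45 alphabet as defined in RFC 9285.
B45 = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ $%*+-./:"

def b45encode(data: bytes) -> str:
    """Encode bytes as Base45: 2 bytes -> 3 chars, trailing byte -> 2 chars."""
    out = []
    for i in range(0, len(data) - 1, 2):
        n = data[i] * 256 + data[i + 1]
        out += [B45[n % 45], B45[n // 45 % 45], B45[n // 2025]]
    if len(data) % 2:
        n = data[-1]
        out += [B45[n % 45], B45[n // 45]]
    return "".join(out)

def b45decode(text: str) -> bytes:
    """Inverse of b45encode; raises on characters outside the alphabet."""
    vals = [B45.index(c) for c in text]
    out = bytearray()
    for i in range(0, len(vals), 3):
        chunk = vals[i:i + 3]
        if len(chunk) == 1:
            raise ValueError("dangling Base45 character")
        if len(chunk) == 3:
            n = chunk[0] + chunk[1] * 45 + chunk[2] * 2025
            out += bytes([n // 256, n % 256])
        else:
            out.append(chunk[0] + chunk[1] * 45)
    return bytes(out)

def pack(selections: dict) -> str:
    """JSON -> zlib -> Base45 (CBOR and COSE are deliberately omitted)."""
    raw = json.dumps(selections, separators=(",", ":")).encode("utf-8")
    return b45encode(zlib.compress(raw, 9))

def unpack(text: str) -> dict:
    """Reverse of pack."""
    return json.loads(zlib.decompress(b45decode(text)))
```

Base45 is used because its alphabet matches the QR alphanumeric mode, so the encoded text can be stored without falling back to byte mode.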

-- 
-------
Ray Lutz
Citizens' Oversight Projects (COPs)
http://www.citizensoversight.org
619-820-5321

Ray Lutz

Jul 19, 2022, 10:37:09 AM
to cdf-ball...@list.nist.gov
[REPOSTED]



John Dziurłaj

Jul 20, 2022, 10:26:14 AM
to cdf-ballot-styles, ray...@citizensoversight.org

Good Morning Ray,

There are currently no example JSON files available.

However, the mapping of the mCDF CSC message to the CVR UML Model is given in Appendix D of the draft mCDF format (refer to the CDF mapping columns). I am reattaching the draft here. 

The UML can then map forward to JSON using the CVR CDF JSON schema.

 Regards,

 John Dziurłaj /d͡ʑurwaj/

 Sr. Solutions Architect, The Turnout

mCDF Special Publication Draft - WG.docx

Ray Lutz

Jul 20, 2022, 11:43:01 AM
to cdf-ball...@list.nist.gov
Hi John:

Sure, I'm willing to try to work with that. Perhaps you can explain the syntax a bit more. For this, I will pull out the example you use in your draft document, and let's concentrate on just the selections:

SEL|1GO|1AEF^^^1~1AAR^^^2;
SEL|3AS|1CDY;
SEL|4SS|1DNT;
SEL|5TS|1EJM;
SEL|6RC|1FMZ;
SEL|8SR34|1GCB;
SEL|9CC|1HDW~1HSK;
SEL|10SB|1IMC;
SEL|11JS1|1JSK;
SEL|11JS2|1KJO;
SEL|12CA9|1LTL;
SEL|13CP1|1MTO;
SEL|18CP6|1RRM;

It appears that the three-letter SEL| introduces each selection.

Then comes a number, which appears to be the contest number. I imagine this is the same as the index from the ContestManifest.

Then there is a two-character indicator, like 'GO', 'AS', 'SS', etc. I don't know what this is.

Then an optional number.

Then another vertical bar, followed by a number and three characters. It usually ends with a semicolon, but it might be a lot longer.

You say:

The SEL segments convey contest option selections made by the voter during a voting session. Each SEL segment represents a single contest, but potentially multiple selections. Contest::Code is optional, however, if omitted, the ContestSelection::Code must be unique across all contests.


You provide this in the text:


Can you please explain how the values map to this list? I am guessing here.
The first number after SEL| is "CVR::Contest::Code."
This may be the ContestID from the ContestManifest.

But then I am lost. What does 'GO' indicate? And the rest??

--Ray
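
A minimal Python sketch of one way to tokenize the SEL lines quoted above. The delimiter roles (| between fields, ~ between repeated selections, ^ between components, ; as terminator) are assumptions read off the example, not taken from the draft specification, and the function name parse_sel is hypothetical. The sketch treats the second field as one opaque contest code, does not interpret the components after each selection code, and ignores escape characters.

```python
def parse_sel(segment: str) -> dict:
    """Split one SEL segment into a contest code and its selections.

    Assumed layout (not confirmed by the draft): SEL|<contest>|<sel>~<sel>...;
    where each <sel> is <code>^<component>^<component>...
    """
    body = segment.strip().rstrip(";")
    fields = body.split("|")
    if fields[0] != "SEL" or len(fields) < 3:
        raise ValueError(f"not a SEL segment: {segment!r}")
    selections = []
    for repetition in fields[2].split("~"):
        components = repetition.split("^")
        # First component is taken as the selection code; the rest are
        # kept verbatim because their meaning is not established here.
        selections.append({"code": components[0], "components": components[1:]})
    return {"contest_code": fields[1], "selections": selections}
```

For example, parse_sel("SEL|9CC|1HDW~1HSK;") yields contest code "9CC" with two selections, "1HDW" and "1HSK", each with an empty component list.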

Carl Hage

Jul 20, 2022, 7:53:24 PM
to cdf-ball...@list.nist.gov
On 7/20/22 7:26 AM, 'John Dziurłaj' via cdf-ballot-styles wrote:
> There are currently no example JSON files available.

I was looking into the mCDF, so I wrote a quick Perl program to decode
the proposed mCDF and emit JSON. I was unable to create an mCDF example
until I purchased a new computer that could run the latest Adobe. I used
the example PDF file to create an mCDF, and the mCDF encoded as a pair of QR
codes. [I used Perl since it's the easiest to write, and this is not
intended for production or wide reuse.]

It would be useful to post sample mCDF, not just posting hidden software
within a PDF that only runs with new versions of a proprietary app.

Attached is a sample mCDF I created (arbitrary choices). Since the
example PDF did not handle RCV, I guessed at the format and manually
entered it. There is a converted JSON and the perl script I used to read
the mCDF.

My quickie program omits some features, like handling escape characters
in the mCDF and encoding \ and " in JSON strings. It also does not handle
an mCDF split into multiple files (for QR encoding).

Also, I now realize Code is a repeatable ExternalIdentifier, but I only
emit as a string (by default). I added options to allow a string array
and an option for full ExternalIdentifiers.

I attached a separate file, mcdfrc.json, that has Code emitted as an array
and is fully indented (other than the Code array). Probably only
mcdfrx.json is (sort of) compliant with the spec.
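
To make the three output shapes concrete, here is a minimal Python sketch. It is not the Perl script attached to this message; the function name render_code and the style names are hypothetical, and the "Value"/"Type" keys with type "local" simply follow the object form quoted later in this message.

```python
import json

def render_code(code: str, style: str = "string"):
    """Return a Code value in one of three JSON shapes (illustrative only)."""
    if style == "string":    # simplified: a bare string
        return code
    if style == "array":     # repeatable: an array of strings
        return [code]
    if style == "external":  # repeatable ExternalIdentifier-like objects
        return [{"Value": code, "Type": "local"}]
    raise ValueError(f"unknown style: {style}")

for style in ("string", "array", "external"):
    print(style, json.dumps({"Code": render_code("1GO", style)}))
```

The third form is the most verbose, which is the trade-off discussed in the comments further down.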

Attached files:
- mcdf.txt (generated using the mCDFPrototype.pdf)
- mcdfr.txt (manually extended with the RCV contest)
- mcdfr.json (mcdfr.txt converted to simplified json)
- mcdfrc.json (mcdfr.txt in indented JSON with repeatable codes)
- mcdfrx.json (mcdfr.txt with Code as ExternalIdentifier array)
- parsemcdf.pl - Script to process mCDF

> However, the mapping of the mCDF CSC message to the CVR UML Model is
> given in Appendix D of the draft mCDF format (refer to the CDF mapping
> columns). I am reattaching the draft here.

Note: In my program I assumed the crossed out items are deleted, e.g.
there is no CVR::CVRContestSelection::OptionPosition. Thus my RCV
addition doesn't match the example in the .docx (with extra ^).

> The UML can then map forward to JSON using the CVR CDF JSON schema
> <https://github.com/usnistgov/CastVoteRecords>.

The JSON schema on git is for the wrong version of the CVR CDF. It took
a while for me to figure this out. DO NOT USE IT!

It might be useful to make a separate json schema for the mCDF.

----

Here are some comments I have while decoding the mCDF:

Since there is no json schema, I had to sort of invent some names that
didn't match.

Code is repeatable, meaning it's an array in JSON. But this makes
no sense to me within an mCDF that is tied to a ballot definition; it
should be 1..1, with the Code matching the corresponding ID assigned in the
ballot definition CDF (referenced via URI) for an election, contest,
selection, etc. I added a -c option to emit an array.

The use of repeatable ExternalIdentifier objects seems excessively
verbose. The attached mcdfrx.json shows this (even worse with standard
indent).

I did not add an option to emit {"Value":"1GO","Type":"local"} in lieu
of "1GO". I suggest not using repeatable external identifiers in mCDF.

The ElectionScope is really the Code (ExternalIdentifier) of a GpUnit,
not a GpUnit, since the CVR::GpUnit has a Code that matches an ID in
the definition (e.g. ElectionScopeID). The ElectionScope is a
BallotDefinition.ReportingUnit that has the GpUnit::Code
(ExternalIdentifier), which I presume is the intent in the ELE. The
heading 4.13.1 has subcomponents of Code, but applies to the
CVR::Election::ElectionScope field 1, not Code field 2.

In all the mCDF examples, a single Code string is supplied with implied
type "local" (where local is meaningless).

It would be better to identify which ID the Code matches rather than
using some ambiguous coding of external identifiers.

A persistent ID (across files and across versions within some collection
of election admin data) is not precisely defined in most CDFs. It
seems that the JSON schema has a reserved "@id" and XML has the "id"
attribute, but the latter has restricted content (letter/underscore,
followed by letters and digits) so might not be able to represent all IDs.

IDs in CDF files are scoped only to that file. But since the mCDF
references a URI, we can assume Code applies to the IDs in that file.

A persistent ID is a required feature for interpreting election data
files, and its absence is one of the flaws in CDF. It is possible to
represent a persistent ID using an ExternalIdentifier, but there are no
defined semantics for creating the ID used by an election admin EMS. The
type "local" is ambiguous. Type "Other" could be used, but this is
extremely verbose, its usage is arbitrary, and it is inherently not
standardized.

The whole model of ExternalIdentifier in VIP etc. is a poor choice. A
better approach is what all other domains use: a prefix, e.g. f:06085
for FIPS, ds:0025 for the Dominion Democracy Suite used by the election
admin. A section of the CDF would be needed to define the prefixes used
in all IDs. XML has IDREFS, a space-separated list, which is a better
model than a verbose array.
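
The prefix idea can be sketched in a few lines of Python. The prefix table below is hypothetical (no CDF defines one today); only the two sample values, f:06085 and ds:0025, come from the paragraph above, and the function names are made up for illustration.

```python
# Hypothetical prefix declarations; a real CDF section would define these.
PREFIXES = {
    "f": "FIPS",
    "ds": "Dominion Democracy Suite",
}

def split_id(prefixed: str) -> tuple[str, str]:
    """Split 'prefix:value' into (declared scheme name, local value)."""
    prefix, sep, local = prefixed.partition(":")
    if not sep or prefix not in PREFIXES:
        raise ValueError(f"undeclared or missing prefix in {prefixed!r}")
    return PREFIXES[prefix], local

def parse_idrefs(text: str) -> list[tuple[str, str]]:
    """Parse an IDREFS-style, space-separated list of prefixed IDs."""
    return [split_id(token) for token in text.split()]

print(parse_idrefs("f:06085 ds:0025"))
```

A single attribute holding "f:06085 ds:0025" carries the same information as two ExternalIdentifier objects, at the cost of requiring the prefix declarations to travel with the file.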

Note that IDs used within an EMS are usually unique only to a Class, e.g.
ContestId, and for a Selection may only be unique within a contest, but
the XML id attribute must be unique within the file. It is not clear to
me what the restrictions on JSON @id are.


mcdf.txt
mcdfr.txt
mcdfr.json
mcdfrc.json
mcdfrx.json
parsemcdf.pl

John McCarthy

Jul 20, 2022, 10:29:42 PM
to Carl Hage, Ray Lutz, cdf-ball...@list.nist.gov

Thanks VERY much for digging in on these important questions Ray and Carl.

John

--
John McCarthy Volunteer Advisor (he/him)
jo...@verifiedvoting.org 510.666.5309
verifiedvoting.org

John Dziurlaj

Aug 2, 2022, 8:20:49 AM
to cdf-ball...@list.nist.gov

See responses below.

 

John Dziurłaj

 

-----Original Message-----
From: cdf-ball...@list.nist.gov <cdf-ball...@list.nist.gov> On Behalf Of Carl Hage
Sent: Wednesday, July 20, 2022 6:53 PM
To: cdf-ball...@list.nist.gov
Subject: cdf-ballot-styles JSON version of mCDF

 

On 7/20/22 7:26 AM, 'John Dziurłaj' via cdf-ballot-styles wrote:

> There are currently no example JSON files available.

 

I was looking into the mCDF, so I wrote a quick Perl program to decode the proposed mCDF and emit JSON. I was unable to create an mCDF example until I purchased a new computer that could run the latest Adobe. I used the example PDF file to create an mCDF, and the mCDF encoded as a pair of QR codes. [I used Perl since it's the easiest to write, and this is not intended for production or wide reuse.]

 

It would be useful to post sample mCDF, not just posting hidden software within a PDF that only runs with new versions of a proprietary app.

 

JD: The CDF Prototype does not require any version of Acrobat greater than X and should work with DC as well. Code examples will be provided in the mCDF GitHub repository soon.

 

Attached is a sample mCDF I created (arbitrary choices). Since the example PDF did not handle RCV, I guessed at the format and manually entered it. There is a converted JSON and the perl script I used to read the mCDF.

 

JD: There is a version that does handle RCV, available here.

 

My quickie program omits some features, like handling escape characters in the mCDF and encoding \ and " in JSON strings. It also does not handle an mCDF split into multiple files (for QR encoding).

 

Also, I now realize Code is a repeatable ExternalIdentifier, but I only emit as a string (by default). I added options to allow a string array and an option for full ExternalIdentifiers.

 

I attached a separate file, mcdfrc.json, that has Code emitted as an array and is fully indented (other than the Code array). Probably only mcdfrx.json is (sort of) compliant with the spec.

 

Attached files:

  - mcdf.txt (generated using the mCDFPrototype.pdf)

  - mcdfr.txt (manually extended with the RCV contest)

  - mcdfr.json (mcdfr.txt converted to simplified json)

  - mcdfrc.json (mcdfr.txt in indented JSON with repeatable codes)

  - mcdfrx.json (mcdfr.txt with Code as ExternalIdentifier array)

  - parsemcdf.pl - Script to process mCDF

 

> However, the mapping of the mCDF CSC message to the CVR UML Model is

> given in Appendix D of the draft mCDF format (refer to the CDF mapping

> columns). I am reattaching the draft here.

 

Note: In my program I assumed the crossed out items are deleted, e.g. there is no CVR::CVRContestSelection::OptionPosition. Thus my RCV addition doesn't match the example in the .docx (with extra ^).

 

> The UML can then map forward to JSON using the CVR CDF JSON schema

> <https://github.com/usnistgov/CastVoteRecords>.

 

The JSON schema on git is for the wrong version of the CVR CDF. It took a while for me to figure this out. DO NOT USE IT!

 

It might be useful to make a separate json schema for the mCDF.

 

----

 

Here are some comments I have while decoding the mCDF:

 

Since there is no json schema, I had to sort of invent some names that didn't match.

 

JD: There are no plans to create a separate JSON Schema for the mCDF Profile. The purpose of the mCDF Profiles is to map a fragment of content to a larger CDF instance. So an mCDF CSC message is expected to map to and become part of a larger CVR instance (as currently defined in SP 1500-103). I will work on some documentation to further clarify this.

 

Code is repeatable, meaning it's an array in JSON. But this makes no sense to me within an mCDF that is tied to a ballot definition; it should be 1..1, with the Code matching the corresponding ID assigned in the ballot definition CDF (referenced via URI) for an election, contest, selection, etc. I added a -c option to emit an array.

 

JD: The codes are meant to be flexible in case systems at different levels need to interoperate, i.e. there could be a local level and state level coding scheme. I do not think this will be used in most cases in the mCDF, but since we’re mapping existing constructs of the CDFs, we must honor their structure.

 

The use of repeatable ExternalIdentifier objects seems excessively verbose. The attached mcdfrx.json shows this (even worse with standard indent).

 

I did not add an option to emit {"Value":"1GO","Type":"local"} in lieu of "1GO". I suggest not using repeatable external identifiers in mCDF.

 

The ElectionScope is really the Code (ExternalIdentifier) of a GpUnit, not a GpUnit, since the CVR::GpUnit has a Code that matches an ID in the definition (e.g. ElectionScopeID). The ElectionScope is a BallotDefinition.ReportingUnit that has the GpUnit::Code (ExternalIdentifier), which I presume is the intent in the ELE. The heading 4.13.1 has subcomponents of Code, but applies to the CVR::Election::ElectionScope field 1, not Code field 2.

 

In all the mCDF examples, a single Code string is supplied with implied type "local" (where local is meaningless).

 

It would be better to identify which ID the Code matches rather than using some ambiguous coding of external identifiers.

 

A persistent ID (across files and across versions within some collection of election admin data) is not precisely defined in most CDFs. It seems that the JSON schema has a reserved "@id" and XML has the "id" attribute, but the latter has restricted content (letter/underscore, followed by letters and digits) so might not be able to represent all IDs.

 

JD: XML has an ObjectId attribute. They are not meant to represent durable data points (but exist to wire the file together). Thus, ExternalIdentifiers are used.

 

IDs in CDF files are scoped only to that file. But since the mCDF references a URI, we can assume Code applies to the IDs in that file.

 

A persistent ID is a required feature for interpreting election data files, and its absence is one of the flaws in CDF. It is possible to represent a persistent ID using an ExternalIdentifier, but there are no defined semantics for creating the ID used by an election admin EMS. The type "local" is ambiguous. Type "Other" could be used, but this is extremely verbose, its usage is arbitrary, and it is inherently not standardized.

 

JD: I will investigate this in more detail.

 

The whole model of ExternalIdentifier in VIP etc. is a poor choice. A better approach is what all other domains use: a prefix, e.g. f:06085 for FIPS, ds:0025 for the Dominion Democracy Suite used by the election admin. A section of the CDF would be needed to define the prefixes used in all IDs. XML has IDREFS, a space-separated list, which is a better model than a verbose array.

 

JD: We are building the CDFs so that they work together well. A change like this would require modifications to all CDFs, which is not feasible at this point. If you’d like this to be considered for future “generations” of CDFs, please open it on the Voting GitHub repo.

 

Note that IDs used within an EMS are usually unique only to a Class, e.g. ContestId, and for a Selection may only be unique within a contest, but the XML id attribute must be unique within the file. It is not clear to me what the restrictions on JSON @id are.

 

JD: They should be the same, as there may be a need to map from JSON to XML.

Carl Hage

Aug 2, 2022, 7:15:36 PM
to cdf-ball...@list.nist.gov
On 8/2/22 5:20 AM, 'John Dziurlaj' via cdf-ballot-styles wrote:
> See responses below.

Thanks for your comments....

> JD: There are no plans to create a separate JSON Schema for the mCDF
> Profile. The purpose of the mCDF Profiles is to map a fragment of
> content to a larger CDF instance. So an mCDF CSC message is expected to
> map to and become part of a larger CVR instance (as currently defined
> in SP 1500-103). I will work on some documentation to further clarify this.

I made it into JSON in response to Ray Lutz, and also as a way to view
the decoded mCDF. Yes, I understand this could be mapped to a full CVR.

> Code is repeatable, meaning it's an array in JSON. But this makes
> no sense to me within an mCDF that is tied to a ballot definition-- ...
...
> A persistent ID is a required feature for interpreting election data
> files and is one of the flaws in CDF. It is possible to represent a
> persistent ID using an ExternalIdentifier, but there are no defined semantics for
> creating the ID used by an election admin EMS. The type "local" is
> ambiguous. Type "Other" could be used, but this is extremely verbose,
> usage arbitrary and inherently not standardized.
>
> JD: I will investigate this in more detail.

The mCDF needs to be interpreted in conjunction with a separate full
declaration of the ballot. That related definition CDF can have multiple
external identifiers, but there needs to be a defined way to link the
mCDF Code to the separate full ballot CDF. I suppose it could
alternatively be a partial CVR CDF, but I was assuming it's a ballot
definition CDF.

It might be reasonable to use the "local-level" external identifier type
as the ID matched between ballot or CVR CDF and the mCDF, but then that
association must be explicitly defined. Also, there should be a
requirement that this ID is persistent across an election cycle.

The point is that an mCDF QR code should be readable by external
software if supplied with a ballot definition CDF. There needs to be a
defined required association of IDs. If you have a defined required
association, then one is sufficient, since the associated ballot
definition file can contain the remaining external identifiers.
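
A minimal Python sketch of the kind of defined association argued for above: exactly one identifier type is designated as the link, and an mCDF Code is resolved through it. The dictionary layout, the contest name, and the "state-level" value are invented for illustration; only the "local-level" type name and the sample code 1GO come from this thread.

```python
# Hypothetical, heavily simplified ballot definition.
BALLOT_DEFINITION = {
    "contests": [
        {
            "name": "Governor",  # made-up label for sample code 1GO
            "external_identifiers": [
                {"Type": "local-level", "Value": "1GO"},
                {"Type": "state-level", "Value": "GOV-2022"},  # invented
            ],
        },
    ]
}

def build_code_index(definition: dict, link_type: str = "local-level") -> dict:
    """Index contests by the single identifier type designated as the link."""
    index = {}
    for contest in definition["contests"]:
        for ext in contest["external_identifiers"]:
            if ext["Type"] != link_type:
                continue
            if ext["Value"] in index:
                raise ValueError(f"duplicate {link_type} code: {ext['Value']}")
            index[ext["Value"]] = contest
    return index

index = build_code_index(BALLOT_DEFINITION)
print(index["1GO"]["name"])
```

Once the lookup succeeds, any other identifiers for the contest are available from the definition, which is why one Code in the mCDF would be enough.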

Carl Hage

Aug 2, 2022, 10:57:37 PM
to cdf-ball...@list.nist.gov
On 7/8/22 5:54 AM, 'John Dziurlaj' via cdf-ballot-styles wrote:
> I wanted to inform you all that several resources have been placed onto
> the NIST GitHub repository for the Ballot Definition CDF.
> BallotDefinition/develop
> <https://github.com/usnistgov/BallotDefinition/tree/develop> holds the
> work in progress for the NIST Ballot Definition CDF and contains the
> machine readable schemas (XSD, JSON Schema) and human readable
> documentation
> <https://github.com/usnistgov/BallotDefinition/blob/develop/BallotDefinition_UML_Model_Documentation.md>
> of the UML Model.

John, Thanks for creating the BallotDefinition github repository. It's a
great start!

Here are a few quick comments:

It would be useful to indicate the area(s) on a page for a header,
contest, and contest selection. The BoundedObject seems useful for that
but doesn't seem to be represented. A Boundary could be added for a
PhysicalContestOption, perhaps this is the intent.

Is Geometry supposed to be a BoundedObject? E.g. for an OptionPosition?

For a contest, we might have a single rectangle enclosing the contest
title, subtitle, etc. and all options. But sometimes there are too many
candidates and the selections need to be split across multiple columns
(or sides). Each split might have a separate rectangular area with
continuation subheader. A set of BoundedObjects could be given to
specify the contest areas, or alternatively we could invent a
ContestOptionGroup object that is a BoundedObject to represent a
collection of Options within a rectangular area, and might have a
continuation heading. If there are multiple areas given for a contest,
the options contained would need to be determined by geometric intersection.
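
The geometric-intersection step can be sketched as follows in Python. The Rect class, the coordinate units, and the sample option names are all hypothetical; the Ballot Definition CDF model is not consulted here, and real BoundedObject geometry may be expressed differently.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Rect:
    """Axis-aligned rectangle; x, y is the top-left corner (units assumed)."""
    x: float
    y: float
    w: float
    h: float

    def intersects(self, other: "Rect") -> bool:
        # Two rectangles overlap when they overlap on both axes.
        return (self.x < other.x + other.w and other.x < self.x + self.w
                and self.y < other.y + other.h and other.y < self.y + self.h)

def options_in_contest(contest_areas: list[Rect], options: dict[str, Rect]) -> list[str]:
    """Return the options whose box overlaps any of the contest's areas."""
    return [name for name, box in options.items()
            if any(box.intersects(area) for area in contest_areas)]

# A contest split across two columns, plus one option from another contest.
areas = [Rect(0, 0, 100, 200), Rect(110, 0, 100, 80)]
options = {
    "opt-a": Rect(5, 40, 10, 6),
    "opt-b": Rect(115, 40, 10, 6),
    "other": Rect(115, 150, 10, 6),
}
print(options_in_contest(areas, options))
```

Here "other" sits below the second column's area and is excluded, which is the behavior a scanner would need when a contest continues into a shorter second column.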

In some cases the contest or contest selection areas have a border, so
we could use a geometry with border width to represent the bounding area.

We need the areas occupied for a contest so scanners can associate extra
markings in the contest or contest selection areas (outside the marked
option positions). When adjudicating with scanned image presentations,
we need to be able to cut the applicable areas.

Besides the shape for an option position, it would be useful to have an
enum that defines the marking style, e.g. filled area, a horizontal line
(e.g. connecting arrow-style fiducials), or perhaps an X (pair of
diagonal lines).

---

If there is a printable (text representation) of the contest selections
vs the full face ballot, it would be useful to be able to define the
locations for the selections made. In this representation we have a
contest title, but then only list the selection(s) made, not all
options. To be able to scan the printable text, we need to identify the
locations on the printout for the contest title and an area for each
selection allowed. For a contest option, we could define an area on a
master ballot with the response text to be inserted (source area) into a
response position (one or more target areas).

An OCR or QR code verifier could match option selections by a bitmap
comparison with a reference as long as the reference and target
positions are defined.

A ballot scanner could perform a bitmap comparison with the imageURL on
a full face ballot and identify extraneous marks.

---

Eventually we need to be able to define fiducial marks that code for
ballot style, precinct, sheet/side, etc. There could be bar codes
inserted with some referenced set of standards, e.g. the UPC and ISBN
style bar codes. A common option may be a set of locations with a mark
(rectangle) present or not present that represents a binary code
associated with a definition.

John Dziurlaj

Aug 10, 2022, 1:00:41 PM
to Carl Hage, cdf-ball...@list.nist.gov

Good Afternoon Carl,

 

Please see the responses below.

 

-----Original Message-----
From: cdf-ball...@list.nist.gov <cdf-ball...@list.nist.gov> On Behalf Of Carl Hage

Sent: Tuesday, August 2, 2022 10:58 PM
To: cdf-ball...@list.nist.gov
Subject: Re: cdf-ballot-styles Draft Ballot Definition CDF resources

 

On 7/8/22 5:54 AM, 'John Dziurlaj' via cdf-ballot-styles wrote:

> I wanted to inform you all that several resources have been placed
> onto the NIST GitHub repository for the Ballot Definition CDF.
> BallotDefinition/develop
> <https://github.com/usnistgov/BallotDefinition/tree/develop> holds the
> work in progress for the NIST Ballot Definition CDF and contains the
> machine readable schemas (XSD, JSON Schema) and human readable
> documentation
> <https://github.com/usnistgov/BallotDefinition/blob/develop/BallotDefinition_UML_Model_Documentation.md>
> of the UML Model.

 

John, Thanks for creating the BallotDefinition github repository. It's a great start!

 

Here are a few quick comments:

 

It would be useful to indicate the area(s) on a page for a header, contest, and contest selection. The BoundedObject seems useful for that but doesn't seem to be represented. A Boundary could be added for a PhysicalContestOption, perhaps this is the intent.

 

Is Geometry supposed to be a BoundedObject? E.g. for an OptionPosition?

 

JD: The current design is that a Geometry provides additional details about an area identified by a BoundedObject (e.g. OptionPosition, FiducialMark). It is referenced by those objects.

 

For a contest, we might have a single rectangle enclosing the contest title, subtitle, etc. and all options. But sometimes there are too many candidates and the selections need to be split across multiple columns (or sides). Each split might have a separate rectangular area with continuation subheader. A set of BoundedObjects could be given to specify the contest areas, or alternatively we could invent a ContestOptionGroup object that is a BoundedObject to represent a collection of Options within a rectangular area, and might have a continuation heading. If there are multiple areas given for a contest, the options contained would need to be determined by geometric intersection.

 

In some cases the contest or contest selection areas have a border, so we could use a geometry with border width to represent the bounding area.

 

We need the areas occupied for a contest so scanners can associate extra markings in the contest or contest selection areas (outside the marked option positions). When adjudicating with scanned image presentations, we need to be able to cut the applicable areas.

 

JD: We can add this as an optional property (0..*) from PhysicalContest to BoundedObject.

 

Besides the shape for an option position, it would be useful to have an enum that defines the marking style, e.g. filled area, a horizontal line (e.g. connecting arrow-style fiducials), or perhaps an X (pair of diagonal lines).

 

JD: How would scanners use such information?

 

---

 

If there is a printable (text representation) of the contest selections vs the full face ballot, it would be useful to be able to define the locations for the selections made. In this representation we have a contest title, but then only list the selection(s) made, not all options. To be able to scan the printable text, we need to identify the locations on the printout for the contest title and an area for each selection allowed. For a contest option, we could define an area on a master ballot with the response text to be inserted (source area) into a response position (one or more target areas).

 

JD: Are you suggesting a ballot definition be created for the text appearing on ballot summary cards?

 

An OCR or QR code verifier could match option selections by a bitmap comparison with a reference as long as the reference and target positions are defined.

 

A ballot scanner could perform a bitmap comparison with the imageURL on a full face ballot and identify extraneous marks.

 

---

 

Eventually we need to be able to define fiducial marks that code for ballot style, precinct, sheet/side, etc. There could be bar codes inserted with some referenced set of standards, e.g. the UPC and ISBN style bar codes. A common option may be a set of locations with a mark (rectangle) present or not present that represents a binary code associated with a definition.

 

JD: We will use mCDF for ballot style identification in all cases (including OMR), the difference is which segments get output. An OMR ballot will not need the mCDF segment for representation of contest selections, for example.

Carl Hage

Aug 10, 2022, 5:44:59 PM
to cdf-ball...@list.nist.gov
On 8/10/22 10:00 AM, 'John Dziurlaj' via cdf-ballot-styles wrote:
> -----Original Message-----
> From: cdf-ball...@list.nist.gov <cdf-ball...@list.nist.gov>
> On Behalf Of Carl Hage
> Sent: Tuesday, August 2, 2022 10:58 PM
...
> We need the areas occupied for a contest so scanners can associate extra
> markings in the contest or contest selection areas ...
>
> JD: We can add this as an optional property (0..*) from PhysicalContest
> to BoundedObject.

Sounds good to me.

> Besides the shape for an option position, it would be useful to have an
> enum that defines the marking style, e.g. filled area, a horizontal line
> (e.g. connecting arrow-style fiducials), or perhaps an X (pair of
> diagonal lines).
>
> JD: How would scanners use such information?

I think I've only seen 2 styles-- fill in circle/oval and the arrow
fiducials connected by a line. Instructions say to complete the arrow
with a _single_ line, e.g. with ball point pen. The image shows a single
width line connecting the left and right.

With the oval, the scanner can measure the fill percentage (e.g. the
Dominion scanner CVR has the mark percent). With the line, the fill
percent doesn't matter, but we can check each horizontal pixel column
for a minimum-width line somewhere in the vertical range of the mark
area. Percentage might be a percentage of the horizontal width that has
an intersecting line.
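
The column check described above can be sketched in Python. This is an illustration only, not any vendor's algorithm: the mark area is assumed to be already cropped and thresholded into rows of 0 (light) and 1 (dark), and the function name and min_thickness parameter are hypothetical.

```python
def line_coverage(mark_area: list[list[int]], min_thickness: int = 1) -> float:
    """Fraction of pixel columns crossed by a dark vertical run of at least
    min_thickness pixels, anywhere in the vertical range of the mark area."""
    if not mark_area or not mark_area[0]:
        return 0.0
    width = len(mark_area[0])
    covered = 0
    for x in range(width):
        run = best = 0
        for row in mark_area:
            run = run + 1 if row[x] else 0  # length of current dark run
            best = max(best, run)
        if best >= min_thickness:
            covered += 1
    return covered / width

# A one-pixel-thick line that stops one column short of the right edge.
area = [
    [0, 0, 0, 0],
    [1, 1, 1, 0],
    [0, 0, 0, 0],
]
print(line_coverage(area))
```

Unlike a fill percentage, this measure stays high for a thin pen stroke that spans the gap and low for a blob that covers only part of it.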

An intelligent scanner might detect an X or circle and flag as ambiguous
mark. Also improper cross-out.

> JD: Are you suggesting a ballot definition be created for the text
> appearing on ballot summary cards?

Yes. (Somehow.) Since the voter-verified ballot is the text, not the QR
code, it would be useful for the scanner to recognize or validate the text.

One way might be to define the area (in points or whatever) within a
reference PDF or other document converted to a bitmap image (print to
PNG or TIFF). For a full-face ballot, the location on the reference is
the same as the aligned scan. For a ballot summary, the target location
would normally hold one selection per vote. (A ballot measure might have
the Yes/No to the right of the contest title, elected office, candidate
name would be below.)

The target location would be supplied one per allowed vote on a contest.
Each selection would need to have a source location in a
template/reference document, could be a full face ballot, but maybe the
summary is a different font.

I think scanners can use bitmap comparisons rather than general OCR with
font definitions etc. since the possible selections are limited, and the
source is a bitmap. There are various algorithms, but generally you
would verify edges match within a few pixels.
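
As a rough illustration of a tolerant bitmap comparison, the Python sketch below accepts a scan if every dark pixel has a dark counterpart within tol pixels in the reference, and vice versa. It compares all dark pixels rather than extracted edges, assumes both bitmaps are already aligned and thresholded to 0/1, and uses hypothetical function names; production scanners may work quite differently.

```python
def dark_points(bitmap: list[list[int]]) -> set[tuple[int, int]]:
    """Coordinates (x, y) of all dark pixels in a 0/1 bitmap."""
    return {(x, y) for y, row in enumerate(bitmap)
            for x, value in enumerate(row) if value}

def has_neighbor(point, points, tol: int) -> bool:
    """True if any pixel in points lies within tol of point on both axes."""
    px, py = point
    return any((px + dx, py + dy) in points
               for dx in range(-tol, tol + 1)
               for dy in range(-tol, tol + 1))

def bitmaps_match(reference, scan, tol: int = 1) -> bool:
    """Symmetric check: nothing missing from, and nothing extra in, the scan."""
    ref, got = dark_points(reference), dark_points(scan)
    return (all(has_neighbor(p, got, tol) for p in ref)
            and all(has_neighbor(p, ref, tol) for p in got))

reference = [[0, 1, 0, 0], [0, 1, 0, 0]]
shifted = [[0, 0, 1, 0], [0, 0, 1, 0]]     # same stroke, one pixel right
with_extra = [[0, 1, 0, 1], [0, 1, 0, 1]]  # extraneous stroke at x=3
print(bitmaps_match(reference, shifted), bitmaps_match(reference, with_extra))
```

Because the candidate selections for a contest are few, a scanner could run this against each candidate's reference bitmap and accept only an unambiguous single match.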

> JD: We will use mCDF for ballot style identification in all cases
> (including OMR), the difference is which segments get output. An OMR
> ballot will not need the mCDF segment for representation of contest
> selections, for example.

Normally the precinct and ballot style are included in the preprinted full
face ballot (mail ballot). There is human readable text, but scanners
usually use a code in the fiducials (aka timing marks, but they aren't
the old style timing marks, just alignment marks).

If we want to write generic ballot scanning/auditing software, then the
coding of the marks should be defined. I think in all the cases I have seen
(personally), a mark (rectangle) at a location is present or not present
to represent a code. Maybe some ballots have line width or 2D bar codes.

I presume the old-fashioned timing-mark scanners could use marks
adjacent to a timing mark to read a code, so maybe that is the
legacy/origin.

One possible implementation would be to assign a code bit (e.g. integer)
to a fiducial mark. Then a property of ballot style or precinct could
include a code value. Common across all ballot styles would be a defined
bit range associated with the mark type, e.g. enum for ballot-style,
precinct, sheet, party, check-digit, etc.
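
One way to picture this, purely as an illustration: a layout shared by all ballot styles maps each field to a bit range, and each code mark that is present sets one bit. The field widths, the name CODE_LAYOUT and the two helper functions below are hypothetical and are not taken from any schema or vendor format.

```python
# Hypothetical layout common to all ballot styles:
# field name -> (first bit index, number of bits).
CODE_LAYOUT = {
    "ballot_style": (0, 6),
    "precinct": (6, 8),
    "sheet": (14, 2),
}


def encode_marks(values, layout=CODE_LAYOUT):
    """Return the set of bit indexes whose code mark should be printed."""
    present = set()
    for field, (first, count) in layout.items():
        value = values[field]
        if not 0 <= value < (1 << count):
            raise ValueError(f"{field}={value} does not fit in {count} bits")
        for i in range(count):
            if (value >> i) & 1:
                present.add(first + i)
    return present


def decode_marks(present_bits, layout=CODE_LAYOUT):
    """Rebuild field values from the set of bit indexes detected as present."""
    values = {}
    for field, (first, count) in layout.items():
        value = 0
        for i in range(count):
            if first + i in present_bits:
                value |= 1 << i
        values[field] = value
    return values


example = {"ballot_style": 5, "precinct": 130, "sheet": 1}
bits = encode_marks(example)
```

A check-digit field could be added to the same layout so a misread mark is detected rather than silently decoded as a different precinct.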

A 1D or 2D bar code might reference an external standard, but many codes
map decimal digits, so the common definition of the code would contain
the character or digit range associated with a property such as precinct.

Carl Hage

unread,
Aug 10, 2022, 6:06:09 PM8/10/22
to cdf-ball...@list.nist.gov
On 8/10/22 2:44 PM, Carl Hage wrote:
> Yes. (Somehow.) Since what the voter verifies on the ballot is the text, not
> the QR code, it would be useful for the scanner to recognize or validate the text.

I forgot to mention that if the scanner does not read the text, then a
human-read RLA could validate the codes. But it seems that it should not
be hard to add definitions that enable the scanner to validate the text.

Some people criticize bar codes on ballots because they aren't
voter-verified, and when used to count ballots the counting is not using
the voter-verified text. Enabling the scanner to validate the text
addresses this criticism.

If the sample bitmap for text on a ballot or ballot-summary is defined,
then the scanner can also detect extraneous markings, possibly requiring
attention. Hopefully the alignment/fiducial marks alone can detect a
ripped or folded ballot, but bitmap mismatches might also identify
coffee spills, etc.

Duncan Buell

unread,
Aug 10, 2022, 10:34:34 PM8/10/22
to Carl Hage, cdf-ball...@list.nist.gov
Enabling the scanners does not address the issue of the voter being able to “verify”.  As long as the barcodes are used to tally, there is no semantic understanding of the word “verify” that says that that’s what voters can do.  That’s just a basic reality.

If the barcodes are used to tally and produce the “official” results, then the barcodes are by definition the “official” data, and the text is irrelevant and voters are being lied to if they are told it matters that they read the text.

If the barcodes are used to tally, but the text that voters can read is the “official” ballot, then the results are official results from unofficial data, and that should be viewed as a serious problem.

I have recently been pointed to the federal law that says that the states/jurisdictions are required to state what is the “official” vote.  Apparently most states using BMDs have chosen not to do this.  (One wonders whether the above argument is because they have chosen to try to avoid complying with federal law.)

This is a problem, and it isn’t going away.

Duncan Buell (he/his)
dunca...@gmail.com
(All emails eventually go to the same place)

(For informational purposes)
Chair Emeritus — NCR Chair in Computer Science and Engineering
Dept. of Computer Science and E
University of South Carolina
Columbia SC 29208




Herb Deutsch

unread,
Aug 15, 2022, 2:34:28 AM8/15/22
to Carl Hage, cdf-ball...@list.nist.gov
Trying to define the marking style is problematic.  Each vendor has different rules for detection of a valid mark, no mark or a marginal mark.  The ES&S equipment even uses a patented pattern recognition approach to be able to detect valid marks that do not satisfy threshold requirements.  

Carl Hage

unread,
Aug 15, 2022, 6:58:56 PM8/15/22
to cdf-ball...@list.nist.gov
On 8/14/22 11:34 PM, Herb Deutsch wrote:
> Trying to define the marking style is problematic.  Each vendor has
> different rules for detection of a valid mark, no mark or a marginal
> mark.  The ES&S equipment even uses a patented pattern recognition
> approach to be able to detect valid marks that do not satisfy threshold
> requirements.

Note the definition of mark style is separate from the algorithm used to
identify "mark, no mark, ambiguous mark", and rules for detection. There
are various patents for dealing with ambiguous marks I've heard of, but
that is also independent of the mark style.

So far, I have only heard of 2 styles in use:

1) fill in the circle/oval (or possibly rectangle)
2) draw a line inside the mark area connecting the left and right sides
of the mark area. For this, "line" means a single width pen stroke.

The second style might create ambiguous marks for a scanner algorithm
that assumed the first style. A vendor-supplied scanner would likely
implement only one of these styles, the one used by that vendor.

Generic scan software (e.g. used by an audit or external verification)
would need to be configured to select style 1 or 2. I suppose this could
be done manually per election rather than be specified in the ballot
definition.
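
To make the difference between the two styles concrete, here is a deliberately simple sketch. It assumes the mark area has already been cropped and thresholded to rows of 0/1 pixels (1 = dark); the style names "fill" and "connect", the threshold and the function names are illustrative assumptions. Real scanners apply vendor-specific rules for marginal marks, which this does not attempt.

```python
from collections import deque


def fill_fraction(area):
    """Fraction of pixels in the mark area that are dark."""
    total = sum(len(row) for row in area)
    return sum(sum(row) for row in area) / total


def connects_left_to_right(area):
    """True if dark pixels form a path from the left edge to the right edge
    (breadth-first search, diagonal neighbours allowed)."""
    h, w = len(area), len(area[0])
    queue = deque((y, 0) for y in range(h) if area[y][0])
    seen = set(queue)
    while queue:
        y, x = queue.popleft()
        if x == w - 1:
            return True
        for dy in (-1, 0, 1):
            for dx in (-1, 0, 1):
                ny, nx = y + dy, x + dx
                if (0 <= ny < h and 0 <= nx < w and area[ny][nx]
                        and (ny, nx) not in seen):
                    seen.add((ny, nx))
                    queue.append((ny, nx))
    return False


def is_marked(area, style, fill_threshold=0.5):
    """style 'fill' = style 1 (filled oval); 'connect' = style 2 (thin line)."""
    if style == "fill":
        return fill_fraction(area) >= fill_threshold
    if style == "connect":
        return connects_left_to_right(area)
    raise ValueError(f"unknown mark style: {style}")


thin_line = [[0] * 6, [1] * 6, [0] * 6]
broken_line = [[0] * 6, [1, 1, 1, 0, 1, 1], [0] * 6]
filled = [[1] * 6 for _ in range(3)]
```

The thin line counts as a vote under "connect" but not under "fill", which is the ambiguity described above when a scanner assumes the wrong style.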

I don't know if there are other mark styles in use, e.g. X in a box.

John Dziurlaj

unread,
Aug 16, 2022, 7:33:31 AM8/16/22
to Carl Hage, cdf-ball...@list.nist.gov
Do you know of any currently sold machines that use the "complete the arrow" style ballots, or if this marking method is required in state statute? My understanding is this was more of a technical nuance (of BRC era machines) than a desired marking method.

I would imagine the specification / configuration of the mark detection algorithms would lie with the scanner, not the ballot definition.

John.


Edwin Smith

unread,
Aug 16, 2022, 10:45:38 AM8/16/22
to John Dziurlaj, Carl Hage, cdf-ball...@list.nist.gov

Carl Hage

unread,
Aug 16, 2022, 10:33:55 PM8/16/22
to cdf-ball...@list.nist.gov
On 8/16/22 4:33 AM, 'John Dziurlaj' via cdf-ballot-styles wrote:
> Do you know of any currently sold machines that use the "complete the arrow" style ballots, or if this marking method is required in state statute? My understanding is this was more of a technical nuance (of BRC era machines) than a desired marking method.

My county, Santa Clara CA, used to have that style a couple of years
ago, but they have since changed vendors and now it uses Dominion
Imagecast with oval fills.

I think the prior machines were Sequoia (acquired by Dominion).

I looked at the vendors for California counties and don't see a county
that seems to use the arrows this November.

Arrow style or oval style is not mandated-- the state mandate is on
approved vendors.

Attached is an example blank ballot (for my city, Sunnyvale). Note the
instructions at the top that say in bold "using one thin line."

I don't know how they code the ballot style and precinct, but I think
this is not included on the attached sample ballot PDF. I think the
fiducials in the margin are missing.

I am attaching a San Francisco ballot that has some of the fiducials-- I
think it's the same vendor as Santa Clara, now replaced with newer
Dominion in 2019 and later. RCV was a pain with the arrow style (only 3
rankings allowed), but now (2020+) with ovals there is a grid and more
rankings.

With the demise of Sequoia, maybe there aren't any arrow-style marks
left. The old scanners were custom machines, the new Dominion Imagecast
uses COTS scanners and software.
scl-159b.pdf
BT_01_S.pdf

John Dziurlaj

unread,
Aug 17, 2022, 10:12:21 AM8/17/22
to Carl Hage, cdf-ball...@list.nist.gov
Hi Carl,

On these kinds of ballots the sides of the arrows doubled as the timing marks to find the contest option. I added local fiducials at the contest level, but these exist at the contest option level. If desired, I can add fiducials there as well.

John.


Carl Hage

unread,
Aug 17, 2022, 7:00:34 PM8/17/22
to cdf-ball...@list.nist.gov
On 8/17/22 7:12 AM, 'John Dziurlaj' via cdf-ballot-styles wrote:
> On these kinds of ballots the sides of the arrows doubled as the timing marks to find the contest option. I added local fiducials at the contest level, but these exist at the contest option level. If desired, I can add fiducials there as well.

Yes, I think the arrow ends are just fiducials, so your schema covers
it. Though they are at the contest selection level, I don't think it's
necessary to store them there vs just on the page at an (x,y,w,h). If we
have an (x,y,w,h) for a contest selection and one or more (x,y,w,h) for
a contest, then we could easily enough filter the fiducials or parts by
(x,y).
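
A minimal sketch of that filtering, assuming every item is stored as a page-level rectangle in the same units with the origin at the top left; the Rect type and function names are made up for the example and are not part of the draft schema.

```python
from typing import NamedTuple


class Rect(NamedTuple):
    x: float
    y: float
    w: float
    h: float


def contains(outer, inner):
    """True if `inner` lies entirely within `outer`."""
    return (outer.x <= inner.x
            and outer.y <= inner.y
            and inner.x + inner.w <= outer.x + outer.w
            and inner.y + inner.h <= outer.y + outer.h)


def fiducials_in(region, fiducials):
    """Select the page-level fiducials that fall inside a contest
    (or contest selection) rectangle."""
    return [f for f in fiducials if contains(region, f)]


contest = Rect(100, 200, 300, 120)
page_fiducials = [
    Rect(110, 210, 10, 10),  # inside the contest
    Rect(380, 300, 10, 10),  # inside, near the lower-right corner
    Rect(20, 20, 10, 10),    # page corner mark, outside
]
local = fiducials_in(contest, page_fiducials)
```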

If I were writing scan software I would make a loop over the Y range,
then X. When we have a fiducial, locate the edges and use it to make a
delta-x, delta-y adjustment (actually a line dy=ax+b). Then when the
scan reaches a mark area, use the adjusted coordinates. A border line
can also function as a fiducial, horizontal for Y and vertical or end for X.
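
The dy=ax+b adjustment could be estimated along these lines. This is only a sketch: it assumes each located fiducial yields a pair (x, dy), where dy is the observed minus expected vertical position, and it uses an ordinary least-squares fit, which is one possible choice rather than anything stated above. The function names are hypothetical.

```python
def fit_line(points):
    """Least-squares fit of dy = a*x + b to a list of (x, dy) pairs."""
    n = len(points)
    sx = sum(x for x, _ in points)
    sy = sum(dy for _, dy in points)
    sxx = sum(x * x for x, _ in points)
    sxy = sum(x * dy for x, dy in points)
    denom = n * sxx - sx * sx
    if denom == 0:
        raise ValueError("need fiducials at two or more distinct x positions")
    a = (n * sxy - sx * sy) / denom
    b = (sy - a * sx) / n
    return a, b


def adjusted_y(x, y, a, b):
    """Where a reference point (x, y) is expected to appear in the scan."""
    return y + a * x + b


# Three fiducials across the page: the offset grows with x, i.e. a small skew.
offsets = [(0, 2.0), (100, 4.0), (200, 6.0)]
a, b = fit_line(offsets)
mark_y = adjusted_y(150, 500, a, b)
```

Here the fit gives a slope of about 0.02 and an intercept of about 2, so a mark area expected at y=500, x=150 is looked for about 5 units lower. A separate fit on vertical edges would give the matching correction for x.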

Fiducials with a COTS page scanner aren't used to trigger a mark as in
old scanners using a timing mark, rather they compensate for page skew
and feed slip.

With possible extraneous marks, the fiducial edges might be obscured, so
the matching algorithm needs some clever logic.

Mark areas on the opposite side of the page can bleed through,
especially with a Sharpie, so the scanner needs to compute bleed-through
locations and exclude them as extraneous marks.
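
A sketch of computing those locations, under the assumption that the sheet is turned over about its vertical axis and that each side's coordinates start at that side's top-left corner, so a back-side area at x appears on the front at page_width - x - w. The Rect type, the function names and the 612-unit page width are illustrative only.

```python
from typing import NamedTuple


class Rect(NamedTuple):
    x: float
    y: float
    w: float
    h: float


def mirror_to_front(back_area, page_width):
    """Front-side rectangle lying directly behind a back-side mark area,
    assuming the page is flipped about its vertical axis."""
    return Rect(page_width - back_area.x - back_area.w,
                back_area.y, back_area.w, back_area.h)


def overlaps(a, b):
    """True if two rectangles share any area."""
    return (a.x < b.x + b.w and b.x < a.x + a.w
            and a.y < b.y + b.h and b.y < a.y + a.h)


def is_possible_bleed(stray, back_mark_areas, page_width):
    """True if a stray dark region on the front coincides with a
    mark area printed on the back."""
    return any(overlaps(stray, mirror_to_front(r, page_width))
               for r in back_mark_areas)


back_areas = [Rect(500, 100, 20, 12)]
behind = mirror_to_front(back_areas[0], 612)
```

A stray region that coincides with a mirrored area would still need a check of whether the back-side oval was actually filled before it is dismissed as bleed-through.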

