
Check Digit Algorithm

785 views

Randy Gress

May 9, 2017, 11:24:30 AM
I have an AlphaNumeric Scanline that we currently use a MOD 10 check digit on. We recently switched to a new company that scans and verifies these Scanlines, and because of some issues with their scanning equipment we discovered that our check digit routine is not very robust (in certain cases, 5 and S in the same position would evaluate to the same check digit).

So I have been asked to research better check digit algorithms for our AlphaNumeric Scanline. My first thought was to use a two-digit numeric Check Digit using MOD 100, but in all my searches I can't find any references to doing it this way. I have found some references to using a MOD 97 check digit because 97 is a prime number, but I can't find anything that states why that is good.

I also found references to MOD 25, MOD 30 and MOD 36 check digit routines that utilize just one character, with both alphas and numerics used in the check digit.

So does anyone know of a good 2 numeric Check Digit algorithm for an AlphaNumeric Scanline?

Thanks
Randy

robin....@gmail.com

May 9, 2017, 12:06:51 PM
1. What encoding system are you using for alphabetic characters?
2. Have you checked out the wiki?
3. Check out the Verhoeff and Luhn algorithms. There might be some modification
for alphabetic data that will work.

robin....@gmail.com

May 9, 2017, 12:14:49 PM
On Wednesday, May 10, 2017 at 1:24:30 AM UTC+10, Randy Gress wrote:
> So does anyone know of a good 2 numeric Check Digit algorithm for an AlphaNumeric Scanline?

You will find a PL/I implementation of the Luhn algorithm in Rosetta Code.
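For readers without the Rosetta Code page handy, here is a minimal sketch of the standard Luhn mod-10 scheme (in Python rather than PL/I, purely for brevity; the logic translates directly):

```python
def luhn_check_digit(digits: str) -> str:
    """Standard Luhn: double every second digit starting from the right,
    reduce products above 9 by subtracting 9 (equivalent to summing the
    product's digits), then take the mod-10 complement of the total."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 0:      # rightmost payload digit gets doubled
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return str((10 - total % 10) % 10)

def luhn_valid(number: str) -> bool:
    """A full number (payload plus check digit) is valid when recomputing
    the check digit over the payload reproduces the last character."""
    return luhn_check_digit(number[:-1]) == number[-1]
```

For example, the classic test vector 7992739871 yields check digit 3.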

Randy Gress

May 9, 2017, 1:04:53 PM
On Tuesday, May 9, 2017 at 11:06:51 AM UTC-5, robin....@gmail.com wrote:
> On Wednesday, May 10, 2017 at 1:24:30 AM UTC+10, Randy Gress wrote:
> > I have an AlphaNumeric Scanline that currently we use a MOD 10 check digit on. We recently switched companies that scans and verify these Scanlines and because of some issues with there scanning equipment we discovered that our check digit routine is not very robust (in certain cases 5 and S in the same position would evaluate to the same check digit).
> >
> > So I have been asked to research better check digit algorithms for our AlphaNumeric Scanline. My first thought is to use a two numeric Check Digit using MOD 100. But in all my searches I can't find any references to doing it this way. I have found some references to using a MOD 97 check digit because it is a Prime number. But I can't find anything that states why that is good.
> >
> > I also found references to MOD25, MOD30 and MOD36 check digit routines that utilize just one character with both Alpha and Numerics used in the check digit.
> >
> > So does anyone know of a good 2 numeric Check Digit algorithm for an AlphaNumeric Scanline?
>
> 1. What encoding system are you using for alphabetic characters?

A = 10 through Z = 35.

> 2. Have you checked out wiki?
Yes. What I found was sufficiently vague concerning alphanumerics; that's why I came here.

> 3. check out Verhoeff and Luhn algorithms. Might be some modification
> for alphabetic data that will work.

Currently our existing check digit is based on the Luhn algorithm with modifications, and it has issues.


I was hoping someone has a better encoding system for the alphas, and also to find out if there is better weighting than the 12 12 12 system for Luhn. I have found references to 5327 5327 5327 and 371 371 371 weightings but don't know if they are more reliable than the 12 12 12 weighting.

I also wanted to know if anyone has any info on MOD 97 vs MOD 100 two-character check digits.
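One way to answer the "which weighting is more reliable" question empirically is to enumerate error cases against a candidate scheme. As a sketch (Python for convenience, using a simplified weighted-sum check rather than full Luhn doubling, and `alphabet_size=36` assuming the A = 10 through Z = 35 encoding above), this counts the adjacent transpositions that a given weight pair fails to detect:

```python
def undetected_adjacent_transpositions(w0: int, w1: int, modulus: int,
                                       alphabet_size: int = 36) -> int:
    """Count ordered symbol pairs (a, b), a != b, whose swap leaves a
    weighted-sum check value unchanged, i.e.
    (a*w0 + b*w1) % modulus == (b*w0 + a*w1) % modulus.
    That happens exactly when (a - b)*(w0 - w1) is a multiple of modulus."""
    missed = 0
    for a in range(alphabet_size):
        for b in range(alphabet_size):
            if a != b and (a * w0 + b * w1) % modulus == (b * w0 + a * w1) % modulus:
                missed += 1
    return missed
```

With a 36-symbol alphabet, weights 1,2 and modulus 10 miss every pair differing by a multiple of 10 (96 ordered pairs), while modulus 97 misses none; that is one concrete reason the prime-modulus schemes come up in the literature.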


David W Noon

May 9, 2017, 1:09:57 PM
On Tue, 9 May 2017 08:24:29 -0700 (PDT), in message
<a6126129-9f36-4069...@googlegroups.com>, Randy Gress
(randal...@gmail.com) wrote:

> I have an AlphaNumeric Scanline that currently we use a MOD 10 check digit on.

Modulo 10 is a poor choice. You get much better cycle length if your
congruence class is of prime order.

> We recently switched companies that scans and verify these Scanlines and
> because of some issues with there scanning equipment we discovered that
> our check digit routine is not very robust (in certain cases 5 and S in
> the same position would evaluate to the same check digit).
>
> So I have been asked to research better check digit algorithms for our
> AlphaNumeric Scanline. My first thought is to use a two numeric Check
> Digit using MOD 100. But in all my searches I can't find any references
> to doing it this way. I have found some references to using a MOD 97
> check digit because it is a Prime number. But I can't find anything
> that states why that is good.

This comes from Group Theory. A cyclic group -- typically a congruence
class -- is guaranteed to have a maximum cycle length if the group is of
prime cardinality. This property is used in pseudo-random number
generation, to maximise cycle length in the stream of PRNs.
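For a concrete two-digit example exploiting this property, ISO 7064 MOD 97-10 (the scheme behind IBAN check digits) is the usual reference. A minimal sketch in Python, assuming the A = 10 through Z = 35 mapping for letters:

```python
def mod97_10_check_digits(payload: str) -> str:
    """ISO 7064 MOD 97-10 style: expand A..Z to 10..35 (digits map to
    themselves), concatenate the decimal values, then compute
    98 - (N * 100 mod 97), zero-padded to two digits."""
    n = int("".join(str(int(c, 36)) for c in payload))
    return f"{98 - (n * 100) % 97:02d}"

def mod97_10_valid(payload: str, check: str) -> bool:
    """The payload with its check digits appended is congruent to 1 mod 97."""
    n = int("".join(str(int(c, 36)) for c in payload) + check)
    return n % 97 == 1
```

Because 97 is prime, no two distinct single-character values collide modulo 97; note this sketch checksums the digit-expanded string, so letters occupy two decimal positions.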

> I also found references to MOD25, MOD30 and MOD36 check digit routines
> that utilize just one character with both Alpha and Numerics used in the
> check digit.
>
> So does anyone know of a good 2 numeric Check Digit algorithm for an
> AlphaNumeric Scanline?

Here are a couple of functions I have just knocked up to generate a
single alphanumeric check digit, modulo 11, or a 2-digit check digit,
modulo 67.

=========================================================================
/* Function to generate a Modulo 11 checksum. */
(NOSIZE,NOFOFL):
MOD11:
   PROC(str) RETURNS(CHAR(1)) REORDER;
   DCL str CHAR(*) NONASGN INONLY;

   DCL (p,check_sum) BIN FIXED(32,0) UNSIGNED INIT(0),
       check_digits CHAR(11) STATIC NONASGN INIT('0123456789X'),
       (LENGTH,MOD,RANK,SUBSTR) BUILTIN;

   DO p = 1 UPTHRU LENGTH(str);
      check_sum += RANK(SUBSTR(str,p,1));
   END;

   RETURN(SUBSTR(check_digits,MOD(check_sum,11)+1,1));
END MOD11;

/* Function to generate a Modulo 67 checksum. */
(NOSIZE,NOFOFL):
MOD67:
   PROC(str) RETURNS(CHAR(2)) REORDER;
   DCL str CHAR(*) NONASGN INONLY;

   DCL (p,check_sum) BIN FIXED(32,0) UNSIGNED INIT(0),
       check_digits PIC '99',
       (LENGTH,MOD,RANK,SUBSTR) BUILTIN;

   DO p = 1 UPTHRU LENGTH(str);
      check_sum += RANK(SUBSTR(str,p,1));
   END;

   check_digits = MOD(check_sum,67);
   RETURN(check_digits);
END MOD67;
=========================================================================

Note that these are untested, but should get you on your way to an
improved solution.

Note also that I am assuming you are using IBM Enterprise PL/I. If you
are using an older compiler you might need to adjust the syntax to cope.
--
Regards,

Dave [RLU #314465]
*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*
dwn...@spamtrap.ntlworld.com (David W Noon)
Remove spam trap to reply by e-mail.
*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*



robin....@gmail.com

May 10, 2017, 1:44:43 AM
On Wednesday, May 10, 2017 at 3:09:57 AM UTC+10, David W Noon wrote:
> On Tue, 9 May 2017 08:24:29 -0700 (PDT), in message
> <a6126129-9f36-4069...@googlegroups.com>, Randy Gress
Dave,
Algorithms for such checking give different weights to characters in
various positions, in order to detect transpositions of characters, etc.
The Luhn, Verhoeff, and others do precisely this.

David W Noon

May 10, 2017, 9:05:38 AM
On Tue, 9 May 2017 22:44:42 -0700 (PDT), in message
<c2754d37-1f62-4913...@googlegroups.com>, Robin Vowels
(robin....@gmail.com) wrote:

[snip]
> Algorithms for such checking give different weights to characters in
> various positions, in order to detect transpositions of characters, etc.
> The Luhn, Verhoeff, and others do precisely this.

For custom encodings, including weighting, simply replace the call to
RANK() with a call to a custom function that applies weights and/or
checks for restrictions in character set (e.g. ensure every character is
alphanumeric or even numeric).

The code I posted will checksum arbitrary strings using the old Modulo
11 (or any other low prime) technique from the 1960s. But the RANK()
calls are easily replaced to exploit more modern algorithms.

John W Kennedy

May 10, 2017, 12:57:27 PM
On 5/10/17 9:05 AM, David W Noon wrote:
> On Tue, 9 May 2017 22:44:42 -0700 (PDT), in message
> <c2754d37-1f62-4913...@googlegroups.com>, Robin Vowels
> (robin....@gmail.com) wrote:
>
> [snip]
>> Algorithms for such checking give different weights to characters in
>> various positions, in order to detect transpositions of characters, etc.
>> The Luhn, Verhoeff, and others do precisely this.
>
> For custom encodings, including weighting, simply replace the call to
> RANK() with a call to a custom function that applies weights and/or
> checks for restrictions in character set (e.g. ensure every character is
> alphanumeric or even numeric).
>
> The code I posted will checksum arbitrary strings using the old Modulo
> 11 (or any other low prime) technique from the 1960s.

Not really. Even the built-in check-digit features on IBM keypunches of
the era applied different weights to catch transpositions, in either
modulus 10 or modulus 11 (the modulus-11 feature excluded all numbers
with a check "digit" of 10, rather than the modern practice of
substituting the letter X).

> But the RANK()
> calls are easily replaced to exploit more modern algorithms.

Well, yes, but it’s rather messy. Indeed, even with a trivial,
unweighted algorithm, you should really have RANK(x)-RANK('0') to get a
portable result.

Also, many weighting algorithms are more easily handled if you do the
calculation from right to left.

--
John W. Kennedy
"The blind rulers of Logres
Nourished the land on a fallacy of rational virtue."
-- Charles Williams. "Taliessin through Logres: Prelude"

David W Noon

May 10, 2017, 1:30:02 PM
On Wed, 10 May 2017 12:57:00 -0400, in message
<NfednUIZGbrv247E...@giganews.com>, John W Kennedy
(john.w....@gmail.com) wrote:
> On 5/10/17 9:05 AM, David W Noon wrote:
[snip]
>> The code I posted will checksum arbitrary strings using the old Modulo
>> 11 (or any other low prime) technique from the 1960s.
>
> Not really. Even the built-in check-digit features on IBM keypunches of
> the era applied different weights to catch transpositions, in either
> modulus 10 or modulus 11 (the modulus-11 feature excluded all numbers
> with a check "digit" of 10, rather than the modern practice of
> substituting the letter X).

That would be an early version of the Luhn algorithm, I guess. After
all, Luhn was an IBM researcher.

>> But the RANK() calls are easily replaced to exploit more modern
>> algorithms.
>
> Well, yes, but it’s rather messy. Indeed, even with a trivial,
> unweighted algorithm, you should really have RANK(x)-RANK('0') to get a
> portable result.

Subtracting RANK('0') would be fine for all-numeric strings, but when
you allow alphabetics it will give negative values on an EBCDIC platform.

> Also, many weighting algorithms are more easily handled if you do the
> calculation from right to left.

Some of the modern algorithms separate odd/even character positions, so
one needs to do 2 passes, each of half length. Checksumming is a
non-trivial problem and the algorithms are many and varied.

The most interesting algorithm I've seen is the Damm algorithm:

<https://en.wikipedia.org/wiki/Damm_algorithm>
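For reference, the Damm scheme is a single pass over a 10x10 weak totally anti-symmetric quasigroup table: no weights, no second pass, and it detects all single substitutions and all adjacent transpositions. A sketch in Python, using the order-10 table from that article:

```python
# Weak totally anti-symmetric quasigroup of order 10 (table from the article).
DAMM_TABLE = [
    [0, 3, 1, 7, 5, 9, 8, 6, 4, 2],
    [7, 0, 9, 2, 1, 5, 4, 8, 6, 3],
    [4, 2, 0, 6, 8, 7, 1, 3, 5, 9],
    [1, 7, 5, 0, 9, 8, 3, 4, 2, 6],
    [6, 1, 2, 3, 0, 4, 5, 9, 7, 8],
    [3, 6, 7, 4, 2, 0, 9, 5, 8, 1],
    [5, 8, 6, 9, 7, 2, 0, 1, 3, 4],
    [8, 9, 4, 5, 3, 6, 2, 0, 1, 7],
    [9, 4, 3, 8, 6, 1, 7, 2, 0, 5],
    [2, 5, 8, 1, 4, 3, 6, 7, 9, 0],
]

def damm_check_digit(digits: str) -> str:
    """Fold the digits through the table, starting from an interim digit
    of 0; the final interim digit is the check digit."""
    interim = 0
    for ch in digits:
        interim = DAMM_TABLE[interim][int(ch)]
    return str(interim)

def damm_valid(number: str) -> bool:
    """A number with its check digit appended folds to 0."""
    return damm_check_digit(number) == "0"
```

This is the numeric version only; an alphanumeric variant would need a quasigroup table of order 36 with the same anti-symmetry property.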

robin....@gmail.com

May 10, 2017, 10:36:46 PM
On Thursday, May 11, 2017 at 3:30:02 AM UTC+10, David W Noon wrote:
> On Wed, 10 May 2017 12:57:00 -0400, in message
> <NfednUIZGbrv247E...@giganews.com>, John W Kennedy
> (john.....@gmail.com) wrote:
> > On 5/10/17 9:05 AM, David W Noon wrote:
> [snip]
> >> The code I posted will checksum arbitrary strings using the old Modulo
> >> 11 (or any other low prime) technique from the 1960s.
> >
> > Not really. Even the built-in check-digit features on IBM keypunches of
> > the era applied different weights to catch transpositions, in either
> > modulus 10 or modulus 11 (the modulus-11 feature excluded all numbers
> > with a check "digit" of 10, rather than the modern practice of
> > substituting the letter X).
>
> That would be an early version of the Luhn algorithm, I guess. After
> all, Luhn was an IBM researcher.
>
> >> But the RANK() calls are easily replaced to exploit more modern
> >> algorithms.
> >
> > Well, yes, but it’s rather messy. Indeed, even with a trivial,
> > unweighted algorithm, you should really have RANK(x)-RANK('0') to get a
> > portable result.
>
> Subtracting RANK('0') would be fine for all-numeric strings, but when
> you allow alphabetics it will give negative values on an EBCDIC platform.

Transforming the 36 alphanumeric characters to 1-36 is trivially done
in PL/I if speed is not an issue; but for speed, some lengthier code
is needed.

robin....@gmail.com

May 10, 2017, 10:39:52 PM
On Thursday, May 11, 2017 at 2:57:27 AM UTC+10, John W. Kennedy wrote:
> On 5/10/17 9:05 AM, David W Noon wrote:
> > On Tue, 9 May 2017 22:44:42 -0700 (PDT), in message
> > <c2754d37-1f62-4913...@googlegroups.com>, Robin Vowels
> > (r....@gmail.com) wrote:
> >
> > [snip]
> >> Algorithms for such checking give different weights to characters in
> >> various positions, in order to detect transpositions of characters, etc.
> >> The Luhn, Verhoeff, and others do precisely this.
> >
> > For custom encodings, including weighting, simply replace the call to
> > RANK() with a call to a custom function that applies weights and/or
> > checks for restrictions in character set (e.g. ensure every character is
> > alphanumeric or even numeric).
> >
> > The code I posted will checksum arbitrary strings using the old Modulo
> > 11 (or any other low prime) technique from the 1960s.
>
> Not really. Even the built-in check-digit features on IBM keypunches of
> the era applied different weights to catch transpositions, in either
> modulus 10 or modulus 11 (the modulus-11 feature excluded all numbers
> with a check "digit" of 10, rather than the modern practice of
> substituting the letter X).

Are you sure? Didn't exist on any of the IBM keypunches we had.

robin....@gmail.com

May 10, 2017, 10:45:24 PM
On Thursday, May 11, 2017 at 3:30:02 AM UTC+10, David W Noon wrote:
>
> The most interesting algorithm I've seen is the Damm algorithm:
>
> <https://en.wikipedia.org/wiki/Damm_algorithm>

The table looks like it could be extended to alphanumerics,
but the article does not indicate how the entries are
generated, even for the numeric version.

John W Kennedy

May 11, 2017, 10:22:34 AM
On 5/10/17 1:30 PM, David W Noon wrote:
> On Wed, 10 May 2017 12:57:00 -0400, in message
> <NfednUIZGbrv247E...@giganews.com>, John W Kennedy
> (john.w....@gmail.com) wrote:
>> On 5/10/17 9:05 AM, David W Noon wrote:
> [snip]
>>> The code I posted will checksum arbitrary strings using the old Modulo
>>> 11 (or any other low prime) technique from the 1960s.
>>
>> Not really. Even the built-in check-digit features on IBM keypunches of
>> the era applied different weights to catch transpositions, in either
>> modulus 10 or modulus 11 (the modulus-11 feature excluded all numbers
>> with a check "digit" of 10, rather than the modern practice of
>> substituting the letter X).
>
> That would be an early version of the Luhn algorithm, I guess. After
> all, Luhn was an IBM researcher.
>
>>> But the RANK() calls are easily replaced to exploit more modern
>>> algorithms.
>>
>> Well, yes, but it’s rather messy. Indeed, even with a trivial,
>> unweighted algorithm, you should really have RANK(x)-RANK('0') to get a
>> portable result.
>
> Subtracting RANK('0') would be fine for all-numeric strings, but when
> you allow alphabetics it will give negative values on an EBCDIC platform.

It won’t be portable if you don’t normalize it somehow.

>> Also, many weighting algorithms are more easily handled if you do the
>> calculation from right to left.
>
> Some of the modern algorithms separate odd/even character positions, so
> one needs to do 2 passes, each of half length. Checksumming is a
> non-trivial problem and the algorithms are many and varied.
>
> The most interesting algorithm I've seen is the Damm algorithm:
>
> <https://en.wikipedia.org/wiki/Damm_algorithm>



John W Kennedy

May 11, 2017, 10:40:32 AM
It was an optional feature (or, rather, there were two optional
features, mutually exclusive, one for mod-10 and one for mod-11). It’s
right there in the manuals for both the 029 and the 129.

David W Noon

May 11, 2017, 12:34:44 PM
On Wed, 10 May 2017 19:45:23 -0700 (PDT), in message
<0f56fa27-9ace-4b4e...@googlegroups.com>, Robin Vowels
(robin....@gmail.com) wrote:
I thought about reading Damm's original paper, but since it's his Ph.D.
thesis it could be a bit hefty.

Randy Gress

May 11, 2017, 3:28:06 PM
After more searching we found there is an ISO Specification for Check digits, ISO/IEC 7064:2003
http://webstore.ansi.org/RecordDetail.aspx?sku=ISO%2FIEC+7064%3A2003

I think we are going to use the MOD 37,36 check digit routine, which will give us a single alphanumeric check digit.

From what I understand it is similar to the Verhoeff and Damm algorithms. My only concern is communicating the specification to our vendors. The code isn't very long, only about 25 lines, and it isn't very complicated. The issue is understanding what it is doing and why.





Peter Flass

May 11, 2017, 5:05:57 PM
Just refer them to the standard...

--
Pete

robin....@gmail.com

May 11, 2017, 8:44:39 PM
Are you able to post it?

Randy Gress

May 12, 2017, 11:43:45 AM
Here is the procedure to calculate the check digit.

ISO7064Mod37_36_CheckDigit: procedure (CheckStr) returns(CHAR(1));
   DCL CheckStr CHAR(*) VAR;
   DCL ScanLine CHAR(132) VAR;
   DCL ScanLinePos FIXED BIN(31) INIT (0);
   DCL CurChar CHAR(1) INIT ('');
   DCL CharSet CHAR(36) INIT ('0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ');
   DCL CharSetPos FIXED BIN(31) INIT (0);
   DCL Radix FIXED BIN(31) INIT (36);
   DCL CalcNbr FIXED BIN(31) INIT (36);
   /* Case-translation tables used by TRANSLATE below */
   DCL upcase CHAR(26) INIT ('ABCDEFGHIJKLMNOPQRSTUVWXYZ');
   DCL lrcase CHAR(26) INIT ('abcdefghijklmnopqrstuvwxyz');

   ScanLine = CheckStr;

   If (TRIM(ScanLine) = '') Then Return(''); /* ScanLine is blank */

   ScanLine = translate(trim(ScanLine), upcase, lrcase);

   do ScanLinePos = 1 to length(ScanLine);
      CurChar = substr(ScanLine, ScanLinePos, 1);
      if (CurChar ^= ' ') then /* Ignore blanks */
         do;
            CharSetPos = index(CharSet, CurChar);
            if (CharSetPos = 0) Then Return(''); /* Char in ScanLine is not valid */
            CharSetPos = CharSetPos - 1; /* We need zero based, not one based */
            CalcNbr = CalcNbr + CharSetPos;
            If (CalcNbr > Radix) Then
               CalcNbr = CalcNbr - Radix;
            CalcNbr = CalcNbr * 2;
            If CalcNbr >= Radix + 1 Then
               CalcNbr = CalcNbr - (Radix + 1);
         end;
   end;
   CalcNbr = (Radix + 1) - CalcNbr;
   If (CalcNbr = Radix) Then
      CalcNbr = 0;

   Return(substr(CharSet, CalcNbr+1, 1));
End ISO7064Mod37_36_CheckDigit;
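For anyone who wants to experiment without a PL/I compiler, here is a rough Python transliteration of the routine above (same charset, same blank-skipping and invalid-character handling; behavior should match apart from the CHAR(132) VAR length limit):

```python
CHARSET = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"

def iso7064_mod37_36_check(scanline: str) -> str:
    """ISO 7064 MOD 37,36 hybrid system, transliterated from the PL/I
    procedure above. Returns one alphanumeric check character, or ''
    for blank input or an invalid character."""
    radix = 36
    calc = radix
    s = scanline.strip().upper()
    if not s:
        return ""
    for ch in s:
        if ch == " ":                 # ignore embedded blanks
            continue
        pos = CHARSET.find(ch)
        if pos < 0:
            return ""                 # character is not valid
        calc += pos
        if calc > radix:              # reduce mod 36, keeping 36 in place of 0
            calc -= radix
        calc *= 2
        if calc >= radix + 1:         # reduce mod 37
            calc -= radix + 1
    calc = (radix + 1) - calc
    if calc == radix:
        calc = 0
    return CHARSET[calc]
```

Notably, this routine gives different check characters for 5 and S in the same position, which is exactly the collision the original MOD 10 scheme missed.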

robin....@gmail.com

May 13, 2017, 10:19:01 AM
Interesting. Thanks for posting.

Were any claims made for the algorithm in the standard?

Have you done any tests on it to see how effective it is
on detecting transpositions, wrong digits, etc, compared to Lunh?

Randy Gress

May 19, 2017, 12:46:37 PM
There is a table in the Standards Doc for the different algorithms in the standard. Here is what they have for the algorithm we are using:

1. Single substitution — substitution of a single character for another: 0.0%
2. Single transposition — the transposition of single characters, adjacent (d = 1): 0.16%
3. Single transposition — the transposition of single characters, with one character in between (d = 2): 1.7%
4. Double substitution — two separate single substitution errors in the same string: 2.8%
5. Circular shift — circular shift of the strings to the left or right [the detection rate shown is for circular shifts of moderate distances only (d < 10)]: 0.0%
6. Other — all errors not defined above: 2.8%
7. Residual error — the residual error gives a typical range of undetected errors of all types per 100,000 errors: 180–740

Randy Gress

May 19, 2017, 12:49:11 PM
On Saturday, May 13, 2017 at 9:19:01 AM UTC-5, robin....@gmail.com wrote:
By the way. Here is the introduction to the standard:

Introduction

The need for standardization of check character systems was determined by the following considerations:
a) of the multitude of systems in use, many have very similar characteristics, and much of the variety fails to provide any significant benefit;
b) few of the existing systems have been thoroughly verified mathematically and several have serious defects;
c) the variety of systems undermines the economics of products which generate or validate check characters, and frequently prevents the checking of interchanged data.
Therefore a small set of compatible systems were selected to cope with various application needs; they were validated, and within the constraints of each application, offer high protection against typical transcription and keying errors.
Existing check character systems as specified in ISO 2108, ISO 2894 and ISO 6166 are used in special application fields (ISO 2894 has been withdrawn). These do not, however, achieve the error detection rate of the systems specified in this International Standard.
Annex A summarizes the criteria to be considered when selecting a check character system specified in this International Standard for a particular application.
Annex B provides an example of a method by which this standard may be applied to an alphabet that has more than 26 characters.

I would post the entire standard but it states, "Single user license only. Copying and networking prohibited."