Issues with Harry Leo Bell, Jr. card


Mark Aubrey

Dec 11, 2019, 11:54:30 AM
to SABR_TSN_Cards
The BBRef link in https://digital.la84.org/digital/collection/p17103coll3/id/8419/rec/5 is incorrect.

It should point to https://www.baseball-reference.com/register/player.fcgi?id=bell--001har

Also, the OCR transcription of the card is wrong in several places. What caught my eye was "Knoxwille" instead of "Knoxville".

There are other OCR/transcription errors, but I don't know the proper way to submit proposed corrections.

Published:
1. Irish
2. Name Position Bats Throws
3. Bell, Harry Leo,Jr. 2B R R
4. Born-Place Date Married
5. Pasadena,Calif. March 12,1928
6. Addess o20 Baverford Road Hoish Weich
7. Haverford,Pa. 6'l 19
8. Teams Ployed Wirh W.Chester State TeactottrPrste
9. Trenton 5/31/50rel.to Sunbury 10/6/50- fes. for 51
10. Rel.to Knoxwille 5/15/51 Rel.to Sunbury 6/5/51-
11. eleto Concord 6/5/51,REs roR 9s2- Rel.to Raleigh
12. 1/17/52-Nctr. list W/289/52- 1953- 1954- Rel.to
13. Fayettevl 12/26/53-54-
1. U.S.Army March 19,1945-Dec.9,1946

Suggested:
1. Irish
2. Name Position Bats Throws
3. Bell, Harry Leo,Jr. 2B R R
4. Born-Place Date Married
5. Pasadena,Calif. March 12,1928
6. Address 626 Haverford Road Height Weight
7. Haverford, Pa. 6'1 190
8. Teams Played With W.Chester State Teach.Coll.B.S.De.
9. Trenton 5/31/50 rel.to Sunbury 10/6/50- res. for 51
10. Rel.to Knoxville 5/15/51 Rel.to Sunbury 6/5/51-
11. Rel. to Concord 6/5/51,RES. FOR 1952- Rel. to Raleigh
12. 1/17/52 -Restr. list 4/28/52- 1953- 1954- Rel. to
13. Fayetteville 12/26/53-54-
1. U.S.Army March 19,1945-Dec.9,1946.

Also, the Published Career is incorrect. Those teams belong to the other Harry Bell (https://www.baseball-reference.com/register/player.fcgi?id=bell--002har).
It should read:
1950 1. Trenton | Interstate
1951 1. Knoxville | Tri-State
1951 2. Sunbury | Interstate
1951 3. Concord | North Carolina State

Thank you,
Mark Aubrey

F. X. Flinn

Dec 11, 2019, 12:36:56 PM
to Mark Aubrey, SABR_TSN_Cards
Mark, thanks for these notes. As you know from the collection description, 98% of the matchups between the card-scan data and the data in the SABR player biographical database (seen publicly on BBRef) were made algorithmically. The best results came when the player name and birthdate matched exactly. The most common bad results came when we matched using only the last name and initials. The metadata would be built, and then a process would run to identify duplicate player register matches and compare the text of the card career against the text of the published career, using algorithms designed to catch plagiarism in college term papers. Where the texts didn't match, the links were thrown out, and any remaining duplicates were examined manually. Of course, this still leaves plenty of room for errors; we've already had a handful identified and fixed in the first week.
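For anyone curious about the text-comparison step, here is a minimal Python sketch. It is not the collection's actual algorithm (which isn't specified here); it swaps in a simple word-containment score, the share of the card's career words that also appear in the published career. The stop-word list, function names, and 0.5 threshold are all hypothetical.

```python
import re

# Hypothetical: transaction abbreviations on the cards that never appear in a published career.
STOP_WORDS = {"rel", "to", "res", "for"}

def career_words(text: str) -> set[str]:
    """Lowercase alphabetic words minus stop words; dates and punctuation are dropped."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS}

def containment(card_career: str, published_career: str) -> float:
    """Fraction (0.0-1.0) of the card's career words found in the published career."""
    card = career_words(card_career)
    if not card:
        return 0.0
    return len(card & career_words(published_career)) / len(card)

def keep_link(card_career: str, published_career: str, threshold: float = 0.5) -> bool:
    # Hypothetical cutoff: a score below it counts as "no match" and the link is dropped.
    return containment(card_career, published_career) >= threshold

card = "Trenton 5/31/50 rel.to Sunbury 10/6/50- res. for 51 Rel.to Knoxville 5/15/51"
published = "1950 Trenton | Interstate 1951 Knoxville | Tri-State 1951 Sunbury | Interstate"
print(containment(card, published))  # 1.0
```

With the OCR misreading "Knoxwille" on the card, the same comparison drops to 2/3, which shows how transcription noise can push a correct match toward the cutoff and why a manual check of the leftovers still matters.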

The process of fixing these records is straightforward: I have access to two different tools that let me edit the metadata, one online and one a Windows desktop application. Once I make the edits, they queue up and are processed each day at 1 am EST (06:00 UTC), and they become available at some point during an indexing process that takes 2 1/4 to 6 1/2 hours. Assuming I get to these edits sometime today, they will show up tomorrow morning. Generally speaking, I'm not looking to clean up the OCR results of the player career scans, but since you have provided corrections, I'll make them while I'm in the record.
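As a quick check on "tomorrow morning", here is a small Python sketch of the timing arithmetic. It assumes indexing begins as soon as the 1 am EST batch runs; the function name and the example date are only for illustration.

```python
from datetime import datetime, timedelta, timezone

EST = timezone(timedelta(hours=-5), "EST")  # fixed offset; December falls in standard time

def availability_window(batch_start: datetime,
                        min_hours: float = 2.25,
                        max_hours: float = 6.5) -> tuple[datetime, datetime]:
    """Earliest and latest times queued edits could become visible,
    assuming indexing starts when the nightly batch runs."""
    return (batch_start + timedelta(hours=min_hours),
            batch_start + timedelta(hours=max_hours))

# The 1 am EST run following edits made on Dec 11, 2019.
earliest, latest = availability_window(datetime(2019, 12, 12, 1, 0, tzinfo=EST))
print(earliest.strftime("%H:%M %Z"), "to", latest.strftime("%H:%M %Z"))  # 03:15 EST to 07:30 EST
print(earliest.astimezone(timezone.utc).strftime("%H:%M %Z"))  # 08:15 UTC
```

So edits made during the day should be searchable somewhere between 3:15 and 7:30 am Eastern the next morning.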

In case you haven't already tried, I'd search for Knoxville players under a lot of Knoxville variants: I have seen many o's show up as c's, x's as y's, v's as u's (and as w's, as you found), and l's as 1's and i's. In advanced search, if you check only the TSN collection, you can specify the player career field and set up searches such as field contains Kncx, Knoy, Kncy, etc. Also try the wildcard characters * and ?. I haven't done enough testing to be sure what really works; sometimes they seem to work correctly and other times not. I need to ask my OCLC folks about that and report back.
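The misreadings listed above can be turned into a list of search terms to paste into the "contains" search by hand. A minimal Python sketch, assuming a confusion table limited to the substitutions named in this message; the table and function name are hypothetical, and nothing here calls the LA84 or OCLC search itself.

```python
from itertools import product

# Hypothetical confusion table, limited to the OCR misreadings described above.
CONFUSIONS = {
    "o": ["c"],
    "x": ["y"],
    "v": ["u", "w"],
    "l": ["1", "i"],
}

def ocr_variants(word: str) -> list[str]:
    """Every spelling of `word` reachable via CONFUSIONS, excluding the word itself."""
    word = word.lower()
    options = [[ch] + CONFUSIONS.get(ch, []) for ch in word]
    variants = {"".join(combo) for combo in product(*options)}
    variants.discard(word)
    return sorted(variants)

print(ocr_variants("Knox"))  # ['kncx', 'kncy', 'knoy']
print(len(ocr_variants("Knoxville")))  # 107
```

Searching on a short prefix like "Knox" keeps the list to the three terms above (Kncx, Kncy, Knoy), while the full word yields over a hundred, which is where working wildcards would save effort.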

F. X. Flinn
FXFlinn@gmail.com | c: 802-369-0069



Mark Aubrey

Dec 12, 2019, 11:46:38 AM
to SABR_TSN_Cards
F.X.,

Thank you for the description. I haven't used all the wildcards yet, as most of what I'm looking for can be easily found.

We could spend hours and days correcting all the OCR nits and looking in the weeds for a "c" that should be an "o". 

What type of higher level issues are you wanting to focus on?  Incorrect names?  Missing or incorrect links?

Thanks.
Mark

F. X. Flinn

Dec 12, 2019, 11:53:09 AM
to SABR_TSN_Cards
On Thu, Dec 12, 2019 at 11:46 AM Mark Aubrey <mark....@gmail.com> wrote:
F.X.,

Thank you for the description.  I haven't used all the wild cards yet, as most of what I'm looking for can be easily found.

We could spend hours and days correcting all the OCR nits and looking in the weeds for a "c" that should be an "o". 

Yeah, we don't want to spend any time on that at all.
 

What type of higher level issues are you wanting to focus on?  Incorrect names?  Missing or incorrect links?

Those are the areas that need attention. I need to post a form or spreadsheet to this group to minimize the effort for those seeking to make corrections. I also need to clear the assignment of trusted editors with LA84 and provide them with accounts for item-level editing of the collection.