Parsing Behaviour in Source Citations


David Crouch

Oct 16, 2023, 3:05:19 PM
to TMG to GEDCOM
Hi,

I have a source template, as follows:

<[AUTHOR], >"[RECORD TITLE]," [PAGES], [ENTRY]; imaged, "Medway Council Heritage Services catalogues," [RECORD TYPE], [ITAL:]Medway Council Heritage Services[:ITAL] ([URL] : accessed [ACCESS DATE]), [FILE REFERENCE]<; citing  <[REPOSITORY REFERENCE] >< held by the [REPOSITORY]>>.

And here is an example of how I used it:
[AUTHOR]
St. Margaret of Antioch Parish (Darenth)
[RECORD TITLE]
Register of Burials [1941-1992]

N.B. As I interpret the guidance from Elizabeth Shown Mills, I put the dates in round brackets when they are actually part of the title, and in square brackets when I am annotating the title to help the reader select the correct register from a series of registers.

The TMG Preview of the citation is:

 St. Margaret of Antioch Parish (Darenth), "Register of Burials [1941-1992]," p. 52, entry no. 416, George Odel Levings, 03 Jan 1959; imaged, "Medway Council Heritage Services catalogues," database and images, Medway Council Heritage Services (https://cityark.medway.gov.uk/Details/archive/110003192 : accessed 22 Aug 2022), PDF file P109-1E-01.pdf, p. 28 of 54; citing  P109/1E/1  held by the Medway Archives Centre.

When I produce a website using Second Site, the source entry is:

[S2249] St. Margaret of Antioch Parish (Darenth), "Register of Burials [1941-1992]," p. 52, entry no. 416, George Odel Levings, 03 Jan 1959; imaged, "Medway Council Heritage Services catalogues," database and images, Medway Council Heritage Services (https://cityark.medway.gov.uk/Details/archive/110003192 : accessed 22 Aug 2022), PDF file P109-1E-01.pdf, p. 28 of 54; citing P109/1E/1 held by the Medway Archives Centre.

which looks perfectly reasonable.

When I convert the same database to GEDCOM, I get the following source entry:

0 @S2249@ SOUR
1 REFN 2249
1 TITL St. Margaret of Antioch Parish (Darenth), "Register of Burials unknown 1941-1992," p. 52, entry no. 416, George Odel Levings, 03 Jan 1959; imaged, "Medway Council Heritage Services catalogues," database and images, <i>Medway Council Heritag
2 CONC e Services</i> (https://cityark.medway.gov.uk/Details/archive/110003192 : accessed 22 Aug 2022), PDF file P109-1E-01.pdf, p. 28 of 54; citing  P109/1E/1 held by the Medway Archives Centre.
1 REPO @R42@
2 CALN P109/1E/1
1 OBJE @O9195@
1 OBJE @O9196@
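
For anyone reading the record above: the TITL value is split across a CONC line, and GEDCOM 5.5 joins CONC text to the previous line with no added space (CONT starts a new line instead). Below is a minimal Python sketch of that joining rule, not code from TMG to GEDCOM. The function name `join_gedcom_value` is hypothetical, and the sample lines are shortened from the record above.

```python
def join_gedcom_value(lines):
    """Rebuild one logical value from a tag line plus its CONC/CONT lines."""
    value = ""
    for index, line in enumerate(lines):
        parts = line.split(" ", 2)          # level, tag, remaining text
        tag = parts[1]
        text = parts[2] if len(parts) > 2 else ""
        if index == 0:
            value = text                    # the TITL line itself
        elif tag == "CONC":
            value += text                   # concatenate with no space added
        elif tag == "CONT":
            value += "\n" + text            # continue on a new line
        else:
            break                           # next tag: the value has ended
    return value


# Shortened sample, split at the same point as the record above.
sample = [
    "1 TITL database and images, <i>Medway Council Heritag",
    "2 CONC e Services</i>",
    "1 REPO @R42@",
]
print(join_gedcom_value(sample))
# database and images, <i>Medway Council Heritage Services</i>
```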

I am having two problems:
1. The square brackets around the dates in the register title have been replaced with "unknown 1941-1992", and

2. Despite having the Printer Code for Italics set to "Remove printer code", I am getting <i> </i> around the website name.

I have tried to figure out both of these behaviours without success so I am hoping someone can provide some guidance.

Thanks!

David

John Cardinal

Oct 16, 2023, 3:20:44 PM
to t2...@googlegroups.com

David,

 

In TMG, [ and ] have a special meaning. TMG is ignoring it in this case, but it won’t always do so, and other programs may or may not respond the same way. I suggest you escape the special characters, i.e., replace “Register of Burials [1941-1992]” with “Register of Burials \[1941-1992\]”. The backslashes tell the parser to ignore the special meaning of [ and ].
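
To illustrate the escaping on a single value, here is a minimal Python sketch. It is not how TMG or TMG to GEDCOM works internally; the function name `escape_brackets` is hypothetical, and the negative lookbehind is just one way to avoid escaping a bracket twice.

```python
import re

# Match [ or ] only when it is not already preceded by a backslash.
UNESCAPED_BRACKET = re.compile(r"(?<!\\)[\[\]]")


def escape_brackets(value: str) -> str:
    """Put a backslash in front of every unescaped square bracket."""
    return UNESCAPED_BRACKET.sub(lambda m: "\\" + m.group(0), value)


title = "Register of Burials [1941-1992]"
escaped = escape_brackets(title)
print(escaped)                              # Register of Burials \[1941-1992\]
print(escape_brackets(escaped) == escaped)  # True: a second pass changes nothing
```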

 

Regarding the removal of the italic codes, I will look into it.

 

John Cardinal

 

Products: ORA, GedSite, Gedcom Publisher, Second Site, TMG Utility, TMG to GEDCOM

Web Hosting for Genealogists: Family History Hosting

David Crouch

Oct 16, 2023, 4:33:26 PM
to TMG to GEDCOM
Hi John,

Thank you for the quick reply.  Is there an easy way to bulk search and replace "[" with "\[" in the Record Title field of the Source Definition?  Otherwise, I will have to manually trawl through several thousand source definitions.  Is this something that the TMG Utility could perform?

Thanks!

David

John Cardinal

Oct 16, 2023, 4:56:29 PM
to t2...@googlegroups.com

David,

 

You can probably use the Find and Replace feature in TMG Utility. I strongly suggest that you do a dry run first with the “Log Only” option checked. If you use “[:CR:]” or some other printer code in a source element value, the replacement will break it. So, review the list of possible changes to see if there are any issues. If there are, you can either:

  1. Use the Find and Replace feature and then manually clean up the sources that it breaks,

or

  2. Do all the edits manually, but use the Find and Replace log as a guide to which sources need changing.

 

This should get you started:

 

TMG Utility v7.5.0

Copyright (C) 1999-2023 by John Cardinal. All Rights Reserved. Do not redistribute.

 

Function=Find and Replace

Project=test_ (C:\TMG Projects\test\test_)

Dataset=test / My Data Set

Log Only=True

Started at 16:46:18

 

Options:

  Field: Source Subfields

  Find what: (^|[^\\])(\[|\])

  Replace with: $1\$2

  Contents must match Find What exactly: Unchecked

  Use Pattern Matching: Checked

  Match case: Unchecked

  Find Whole Word Only: <na>

 

-------------

Source#, Info

14, 'Mackenzie, P. [Paul] W.', 'Mackenzie, P. \[Paul\] W.'

 

Finished at 16:46:18

Elapsed Time=0 seconds

Records=22

Changed fields=1

 

Copy/paste the “Find what” and “Replace with” values. They are not for the faint of heart.
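
For readers who want to experiment with the pattern outside TMG Utility, here is a rough Python sketch of the same find-and-replace run as a log-only dry run. It is not TMG Utility code. The names `replace_brackets`, `dry_run`, and `PRINTER_CODE` are hypothetical, and the printer-code pattern is only an assumed shape (brackets around capital letters with a leading or trailing colon). Python writes the replacement differently from `$1\$2`, so the sketch builds it from the two captured groups.

```python
import re

# Same pattern as the "Find what" value above: a bracket at the start of the
# value, or a bracket preceded by any character other than a backslash.
FIND_WHAT = re.compile(r"(^|[^\\])(\[|\])")

# Assumed shape of a printer code such as [:CR:], [ITAL:] or [:ITAL].
PRINTER_CODE = re.compile(r"\[(:[A-Z]+:?|[A-Z]+:)\]")


def replace_brackets(value):
    # Equivalent of "Replace with: $1\$2": group 1, a backslash, group 2.
    return FIND_WHAT.sub(lambda m: m.group(1) + "\\" + m.group(2), value)


def dry_run(values):
    """Return (old, new, needs_review) for each value that would change."""
    report = []
    for old in values:
        new = replace_brackets(old)
        if new != old:
            # Flag values containing a printer code, which the change breaks.
            report.append((old, new, bool(PRINTER_CODE.search(old))))
    return report


values = [
    "Register of Burials [1941-1992]",
    r"Register of Burials \[1941-1992\]",   # already escaped: left alone
    "Line one[:CR:]Line two",               # printer code: gets broken
]
for old, new, needs_review in dry_run(values):
    print(old, "->", new, "(REVIEW)" if needs_review else "")
```

Nothing is written back here; like “Log Only”, the sketch only reports what would change, and the review flag marks values that probably need manual attention.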

David Crouch

Oct 16, 2023, 7:56:19 PM
to TMG to GEDCOM
Hi John,

Thank you for your e-mail and helpful suggestions.  I am glad that you provided the template for the find and replace.  It took me a long time to understand what you were doing.  I have run the changes in "Log Only" mode and generated 15 pages of output.  Everything looks fine but I will look it over again tomorrow morning with fresh eyes before removing the "Log Only" mode.

Thank you for all your assistance.

Be Well,

David

John Cardinal

Oct 16, 2023, 10:36:02 PM
to t2...@googlegroups.com

David,

 

A handy trick for understanding a regular expression (regex) written by someone else is to copy and paste it into https://regex101.com, which provides an explanation pane.

For a regex used by TMG Utility, you should set the “Flavor” to “.NET (C#)”. TMGU predates .NET, so that’s a bit of a stretch, but the regex used by TMGU follows the same rules as in .NET (C#).

 

For a regex used in JavaScript, set the flavor to “ECMAScript (JavaScript)”. That’s the right choice for working with a regex used in ORA.

 

https://regex101.com does more than just explain, of course.

 

John Cardinal

 


 
