Duplicate detection and statutes


Frank Bennett

Feb 14, 2010, 6:19:07 PM
to zotero-dev
I've just realized something about statutory material that leads
directly to a rather tall order for duplicate detection.

The US Code site at Cornell is one of the most commonly used statutory
sources in that jurisdiction. For research purposes, each section of
the Code should be stored as a separate Zotero item, but that raises a
problem with duplicates. It's quite a challenge, but it seems to me
that handling legislative material in any volume is going to be pretty
cumbersome without a solid means of duplicate detection.

A common workflow will begin with specific provisions. Someone
writing on, say, arbitration, might start off by grabbing the
provision that ratifies the 1958 Convention on the Recognition and
Enforcement of Foreign Arbitral Awards:

http://www.law.cornell.edu/uscode/html/uscode09/usc_sec_09_00000201----000-.html

They might later decide to grab all of Title 9, Chapter 2, which deals
with the Convention:

http://www.law.cornell.edu/uscode/html/uscode09/usc_sup_01_9_10_2.html

For things to run smoothly in this workflow, when the whole Chapter
(or the enclosing Title) is grabbed, any existing items (section 201,
in the example above) should be passed over in the grab, and
transparently linked into the current target collection.

I don't have a solution, but there's the problem in a nutshell, at
least.

Frank Bennett

Feb 14, 2010, 7:33:23 PM
to zotero-dev
Here is a thought for how to solve this. If the fields to be checked
for duplicate detection are specified in the translator, you would
get some interesting benefits.

Statutes change over time, and a properly maintained statutory site
(which certainly describes the Cornell LII) will include a note of the
most recent revision to each provision in the archive. For the US
Code, the data items used for duplicate detection could be set to the
archive name (Cornell LII), the name of the code (US Code), the title
(Title 9), the section number, and the most recent revision date.
With these narrow constraints, it can be known with certainty whether
or not an item is a duplicate.
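
To make the idea concrete, here is a rough sketch in Python (not Zotero's translator API, just an illustration). The field names, the STATUTE_KEY_FIELDS tuple and both function names are my own assumptions; the point is only that a translator-declared list of fields can be turned into a composite key and compared exactly.

```python
from typing import Iterable, Mapping, Optional, Sequence, Tuple

# Hypothetical field names that a statute translator might declare as
# identifying an item: archive, code, title, section, last revision date.
STATUTE_KEY_FIELDS = ("archive", "code", "title", "section", "revised")


def duplicate_key(
    item: Mapping[str, str],
    fields: Sequence[str] = STATUTE_KEY_FIELDS,
) -> Optional[Tuple[str, ...]]:
    """Build a composite key from the declared fields.

    Returns None if any declared field is missing or empty, because an
    incomplete key cannot establish identity with certainty.
    """
    values = []
    for name in fields:
        value = str(item.get(name, "")).strip().lower()
        if not value:
            return None
        values.append(value)
    return tuple(values)


def is_duplicate(
    item: Mapping[str, str],
    library: Iterable[Mapping[str, str]],
    fields: Sequence[str] = STATUTE_KEY_FIELDS,
) -> bool:
    """True if some library item has exactly the same composite key."""
    key = duplicate_key(item, fields)
    if key is None:
        return False
    return any(duplicate_key(existing, fields) == key for existing in library)
```

Because the revision date is part of the key, a section that has been amended since it was last grabbed would not count as a duplicate of the older stored copy.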

One further enhancement in the UI would be to provide a tick-box in
the folder icon pop-up, to allow the user to request that duplicate
items be either silently passed over or linked into the target
collection. The latter would yield entries in the collection for all
items in the current index page. The former would have an RSS-like
effect, grabbing only items that are unknown to the user's database.
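
The two behaviours of that tick-box could be sketched roughly as follows. This is Python for illustration only, not Zotero code; grab(), make_key(), the link_duplicates flag and the dict/set stand-ins for the database and the target collection are all hypothetical.

```python
from typing import Dict, Iterable, List, Mapping, Set, Tuple

# Hypothetical identifying fields, as a translator might declare them.
KEY_FIELDS = ("archive", "code", "title", "section", "revised")

Key = Tuple[str, ...]


def make_key(item: Mapping[str, str]) -> Key:
    """Composite key built from the declared identifying fields."""
    return tuple(item[name] for name in KEY_FIELDS)


def grab(
    candidates: Iterable[Mapping[str, str]],
    library: Dict[Key, Mapping[str, str]],
    collection: Set[Key],
    link_duplicates: bool,
) -> List[Key]:
    """Save items from an index page into the library and a collection.

    Items already in the library are never saved twice. If
    link_duplicates is True they are linked into the collection;
    otherwise they are silently passed over.
    Returns the keys of the items that were newly saved.
    """
    added: List[Key] = []
    for item in candidates:
        key = make_key(item)
        if key in library:
            if link_duplicates:
                collection.add(key)  # link the existing item, no new copy
            continue
        library[key] = item
        collection.add(key)
        added.append(key)
    return added
```

With link_duplicates switched off, the returned list is exactly the set of items the database had not seen before, which is the RSS-like behaviour; with it switched on, the collection ends up mirroring the whole index page.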

Not sure how feasible it is to implement this, but it would vastly
enhance the utility of statutory resources. The same strategy for
duplicate detection might be used for common archives like PubMed
that are known to offer a unique ID for each item they contain.
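
As a sketch of how the same mechanism might cover both kinds of archive, each source could declare its own identifying fields. Again this is illustrative Python; the table, the source_key() function and the field names (including "pmid") are assumptions of mine, not anything Zotero currently defines.

```python
from typing import Dict, Mapping, Optional, Tuple

# Hypothetical per-archive declarations of the fields that identify an item.
KEY_FIELDS_BY_ARCHIVE: Dict[str, Tuple[str, ...]] = {
    "Cornell LII": ("code", "title", "section", "revised"),
    "PubMed": ("pmid",),  # a single unique ID is enough here
}


def source_key(item: Mapping[str, str]) -> Optional[Tuple[str, ...]]:
    """Return (archive, field values...) or None if no exact key exists.

    None means the archive has not declared key fields, or the item is
    missing one of them, so exact duplicate detection is not possible.
    """
    archive = item.get("archive")
    fields = KEY_FIELDS_BY_ARCHIVE.get(archive) if archive else None
    if fields is None:
        return None
    try:
        return (archive,) + tuple(item[name] for name in fields)
    except KeyError:
        return None
```

Items from archives with no declaration simply get no key, so they would fall through to whatever looser duplicate handling exists.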
