Duplicated patents in invpat dataset


Rav Chi

Mar 2, 2018, 5:39:03 AM
to Dataverse Users Community


I am using patent data as a proxy for innovation in my thesis project. I have noticed that the invpat dataset lists some patents more than once, with a different inventor each time. For example, patent 03858241 appears in two separate observations, one for each of two inventors, both from Massachusetts. Why are these duplicates there? If I use this dataset to build my patent count variable for innovation, won't my sample suffer from double counting?

I have already compared this dataset with the NBER dataset, and it is about 4 times larger. Could this duplication be the cause of the inflated number of patent observations?
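One way to check for this kind of double counting is to count distinct patent numbers instead of rows. The sketch below is only an illustration, assuming each invpat observation is one patent-inventor pair. The field names (`patent`, `inventor`, `state`), the inventor labels, and every patent number except 03858241 are hypothetical and not taken from the file.

```python
from collections import defaultdict

# Hypothetical rows, one per (patent, inventor) pair. Field names, inventor
# labels, and the patent numbers other than 03858241 are made up.
rows = [
    {"patent": "03858241", "inventor": "A", "state": "MA"},
    {"patent": "03858241", "inventor": "B", "state": "MA"},
    {"patent": "00000001", "inventor": "C", "state": "MA"},
    {"patent": "00000002", "inventor": "D", "state": "CA"},
    {"patent": "00000002", "inventor": "E", "state": "MA"},
]


def count_patents_by_state(rows):
    """Count distinct patents per state so co-inventors do not inflate the total."""
    patents = defaultdict(set)
    for row in rows:
        # A set keeps each patent number once per state.
        patents[row["state"]].add(row["patent"])
    return {state: len(ids) for state, ids in patents.items()}


naive_rows = len(rows)                                  # counts every inventor row
unique_patents = len({row["patent"] for row in rows})   # counts each patent once
by_state = count_patents_by_state(rows)

print(naive_rows, unique_patents, by_state)
```

Note that in this sketch a patent with inventors in two states is counted once in each state. Whether that is appropriate, or whether the patent should be assigned to a single state or split fractionally, depends on how the innovation variable is defined.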

Gautier, Julian

Mar 2, 2018, 11:36:39 AM
to dataverse...@googlegroups.com
Hi,

Thanks for the question. Could you post the link to the datasets you're referring to? I think it might be the datasets here: https://dataverse.harvard.edu/dataverse/patent. Is that right?


Mercy Mhuru

Mar 2, 2018, 1:04:46 PM
to dataverse...@googlegroups.com


julian...@g.harvard.edu

Mar 2, 2018, 2:26:36 PM
to Dataverse Users Community
Ah, thank you, Rav Chi. That's a file in the dataverse I linked to. Have you had a chance to contact the dataset owner? The dataset is at https://dataverse.harvard.edu/dataset.xhtml?persistentId=hdl:1902.1/15705. The Contact link at the top right of that page lets you send an email to the person who uploaded the dataset, who may be more familiar with the details of the file and of the NBER dataset, or who can help put you in touch with someone who is. Alternatively, the NBER community members listed on this page (https://sites.google.com/site/patentdataproject/Home/community-members) might be able to help.

I hope this helps!

Julian