Worked Example


Kevin Campbell

Apr 20, 2012, 12:55:19 PM
to total-im...@googlegroups.com
Heather, Jason,

Do you have a worked example of us processing an Item and the resulting aliases and metrics? I'm interested in an item where we are getting a good number of results, and the alias chaining from one provider to another.

If I'm writing notes as I go here, it's just simpler to have a real world example in mind.

Regards,
Kevin

Jason Priem

Apr 20, 2012, 4:32:09 PM
to total-im...@googlegroups.com
kevin, you mean just a description of how it would go, bit by bit
through the system using real providers?


--
Jason Priem
UNC Royster Scholar
School of Information and Library Science
University of North Carolina at Chapel Hill

Kevin Campbell

Apr 21, 2012, 10:00:16 AM
to total-im...@googlegroups.com
On Fri, Apr 20, 2012 at 9:32 PM, Jason Priem <j...@jasonpriem.org> wrote:
kevin, you mean just a description of how it would go, bit by bit
through the system using real providers?

I think what I'm looking for is somewhere between the summary details at http://total-impact.org/about#toc_2_6 and the level of detail in test/unit_tests/provider_dryad/test_dryad.py

Running get_aliases on this provider for an example item, we would get:

'doi', '10.5061/dryad.7898' -->
  ('url', 'http://hdl.handle.net/10255/dryad.7898')
  ('doi', '10.5061/dryad.7898')
  ('title', 'data from: can clone size serve as a proxy for clone age? an exploration using microsatellite divergence in populus tremuloides')

       
And get_metrics would return the following details:

'doi', '10.5061/dryad.7898' -->

   [('dryad:most_downloaded_file', 63), ('dryad:package_views', '149'), ('dryad:total_downloads', 169)]

Again, I think this is more that I'm lacking domain knowledge here, but possibly that's a good thing.

Doing this on a provider level is fine, but I think an example where an alias which is found by one provider and then used by another would really help. I could then use this when describing examples in any docs.

K


Jason Priem

Apr 22, 2012, 5:54:19 PM
to total-im...@googlegroups.com
Sure thing, Kevin, it's a good question (and sorry for not getting back to you sooner). I'll take a crack at it, but Heather has spent more time thinking about this part than I have, so hopefully she'll weigh in as well.

CrossRef is one of the most important providers because it generates a lot of aliases that other providers need. For example, I might start with a DOI, then ask CrossRef and get the URL, which Topsy (our Twitter intermediary) needs.
<doi> -> CrossRef -> <doi> 
                     <url> -> Topsy -> 14 tweets 
                      ...

Facebook also uses the URL. Some providers may also use the title later, which CrossRef returns as well.

Another example is PubMed, which we use to resolve PMIDs. These resolve to DOIs, and then you need to feed them into CrossRef to get the urls: 
<pmid> -> PubMed -> <doi> -> CrossRef -> <doi> 
                    <pmid>               <pmid>
                                         <url> -> Topsy -> 14 tweets 
                                          ...
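
The chaining in the diagrams above can be sketched in a few lines of Python. This is a minimal illustration, not total-impact's actual code: the lookup tables stand in for real PubMed and CrossRef calls, `expand_aliases` is a hypothetical helper, and the PMID, DOI and URL are borrowed from the Mendeley example later in this thread.

```python
# Stub providers: each maps one alias (namespace, id) to further aliases.
# The tables stand in for real API calls and are purely illustrative.
PUBMED = {("pmid", "21179507"): [("doi", "10.1371/journal.pone.0014279")]}
CROSSREF = {
    ("doi", "10.1371/journal.pone.0014279"): [
        ("url", "http://dx.plos.org/10.1371/journal.pone.0014279"),
    ]
}
PROVIDERS = [PUBMED, CROSSREF]


def expand_aliases(start):
    """Ask every provider about every known alias until no new alias appears."""
    known = [start]
    queue = [start]
    while queue:
        alias = queue.pop(0)
        for table in PROVIDERS:
            for found in table.get(alias, []):
                if found not in known:
                    known.append(found)
                    queue.append(found)
    return known


aliases = expand_aliases(("pmid", "21179507"))
```

The loop stops once no provider contributes an alias that is not already known. Only then would the metrics providers (Topsy, for instance) be called with the full set.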
 
Is that what you had in mind? Let me know if that's not clear.
j

Hilmar Lapp

Apr 22, 2012, 9:29:04 PM
to total-im...@googlegroups.com, total-im...@googlegroups.com
Just in case you weren't aware, NCBI Entrez will also resolve DOIs to PMIDs.

-hilmar

Sent with a tap.

Heather Piwowar

Apr 23, 2012, 2:55:23 AM
to total-im...@googlegroups.com
Hi Kevin, sorry for the delay.

Here's a worked real-life example to illustrate what Jason has described.

The currently-deployed total-impact takes a Mendeley Group ID and imports the associated Mendeley paper IDs.  

For example, for the exemplar Mendeley group 530031 (http://www.mendeley.com/groups/530031/future-of-science/), one of the papers is identified as Mendeley UUID 

1d77b4a1-0a84-11e0-9e7a-0024e8453de6


By using the Mendeley APIs the Mendeley plugin would figure out that this UUID corresponds to this record http://www.mendeley.com/research/collocation-inform-impact-collaboration/

The Mendeley plugin would return the PubMed ID (PMID) 21179507
It may also return the DOI, but sometimes the Mendeley records don't have the DOI properly recorded (see, in contrast to this record, how nicely the DOI is shown in another record http://www.mendeley.com/research/biotorrents-file-sharing-service-scientific-data/)

So let's say that Mendeley only returned the PMID.  Then (as Hilmar suggested) we use the PubMed plugin to get the DOI: 10.1371/journal.pone.0014279

Then the Crossref plugin takes this DOI and returns the full text landing page (I think in this case the one registered with Crossref is http://dx.plos.org/10.1371/journal.pone.0014279)

With all of those aliases we then call the metrics.  This lets us find tweets that use the full text landing page as the ID:  http://topsy.com/dx.plos.org/10.1371/journal.pone.0014279?utm_source=otter
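
As a rough sketch of how that lookup URL relates to the landing page: the pattern below is inferred only from the single example URL above, not from any Topsy documentation, and `topsy_lookup_url` is a hypothetical helper name.

```python
from urllib.parse import urlsplit


def topsy_lookup_url(landing_page):
    """Build a Topsy-style lookup URL from a landing-page URL (assumed pattern)."""
    parts = urlsplit(landing_page)
    # Drop the scheme, keep host and path, then append the query string
    # that appears in the example URL.
    return "http://topsy.com/" + parts.netloc + parts.path + "?utm_source=otter"


lookup = topsy_lookup_url("http://dx.plos.org/10.1371/journal.pone.0014279")
```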

Does that help make it more concrete?

Heather

Kevin Campbell

Apr 23, 2012, 4:45:21 PM
to total-im...@googlegroups.com
On Mon, Apr 23, 2012 at 7:55 AM, Heather Piwowar <hpiw...@gmail.com> wrote:
Hi Kevin, sorry for the delay.

Here's a worked real-life example to illustrate what Jason has described.

...

 
Heather,

This is very useful, thanks!

Can I check that the following would then be correct?

Mendeley.get_members(530031) ==>
   [..., 1d77b4a1-0a84-11e0-9e7a-0024e8453de6, ...]

Mendeley.get_aliases(('mendeley', '1d77b4a1-0a84-11e0-9e7a-0024e8453de6')) ==>
   [('url', 'http://www.mendeley.com/research/collocation-inform-impact-collaboration/'), ('pubmed', '21179507')]
Pubmed.get_aliases(('pubmed', '21179507')) ==>
  [('doi', '10.1371/journal.pone.0014279')]
Crossref.get_aliases(('doi','10.1371/journal.pone.0014279')) ==>
  [('url', 'http://dx.plos.org/10.1371/journal.pone.0014279'), ('title', 'Does Collocation Inform the Impact of Collaboration?')]

Twitter.get_metrics([..., ('url','http://topsy.com/dx.plos.org/10.1371/journal.pone.0014279?utm_source=otter'), ...]) ==>
    [('twitter.tweets', 9)]


I've treated get_aliases as being non-recursive in this case, and ignored aliases being sent to providers for namespaces they don't support.
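
To make those two assumptions concrete, here is a minimal sketch using the values from the listing above. `StubProvider` and its lookup tables are hypothetical stand-ins for the real provider classes, which may well be structured differently.

```python
class StubProvider:
    """Hypothetical provider that answers only for the namespaces it supports."""

    def __init__(self, name, namespaces, table):
        self.name = name
        self.namespaces = namespaces
        self.table = table  # stands in for a real API call

    def get_aliases(self, alias):
        namespace, _ = alias
        if namespace not in self.namespaces:
            return []  # unsupported namespace: ignored
        # One lookup only; the returned aliases are not followed any further.
        return list(self.table.get(alias, []))


MENDELEY_ID = ("mendeley", "1d77b4a1-0a84-11e0-9e7a-0024e8453de6")

mendeley = StubProvider("mendeley", {"mendeley"}, {
    MENDELEY_ID: [
        ("url", "http://www.mendeley.com/research/collocation-inform-impact-collaboration/"),
        ("pubmed", "21179507"),
    ],
})
pubmed = StubProvider("pubmed", {"pubmed"}, {
    ("pubmed", "21179507"): [("doi", "10.1371/journal.pone.0014279")],
})
crossref = StubProvider("crossref", {"doi"}, {
    ("doi", "10.1371/journal.pone.0014279"): [
        ("url", "http://dx.plos.org/10.1371/journal.pone.0014279"),
        ("title", "Does Collocation Inform the Impact of Collaboration?"),
    ],
})

# A Mendeley alias sent to CrossRef is outside its namespaces, so nothing comes back.
ignored = crossref.get_aliases(MENDELEY_ID)

# Each call is a single step; chaining the steps is left to the caller.
step1 = mendeley.get_aliases(MENDELEY_ID)
step2 = pubmed.get_aliases(("pubmed", "21179507"))
step3 = crossref.get_aliases(step2[0])
```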

Regards,
Kevin


Heather Piwowar

Apr 23, 2012, 5:05:08 PM
to total-im...@googlegroups.com


Can I check that the following would then be correct?

Mendeley.get_members(530031) ==>
   [..., 1d77b4a1-0a84-11e0-9e7a-0024e8453de6, ...]

Mendeley.get_aliases(('mendeley', '1d77b4a1-0a84-11e0-9e7a-0024e8453de6')) ==>
   [('url', 'http://www.mendeley.com/research/collocation-inform-impact-collaboration/'), ('pubmed', '21179507')]
Pubmed.get_aliases(('pubmed', '21179507')) ==>
  [('doi', '10.1371/journal.pone.0014279')]
Crossref.get_aliases(('doi','10.1371/journal.pone.0014279')) ==>
  [('url', 'http://dx.plos.org/10.1371/journal.pone.0014279'), ('title', 'Does Collocation Inform the Impact of Collaboration?')]

Twitter.get_metrics([..., ('url','http://topsy.com/dx.plos.org/10.1371/journal.pone.0014279?utm_source=otter'), ...]) ==>
    [('twitter.tweets', 9)]


Some of the providers would return additional aliases.  For example, Mendeley would ideally return the title. PubMed would return a title, and probably also a PubMed Central ID (since this paper would have one) and maybe some URLs to the PMC full text. Crossref might return additional URLs, and Topsy returns more than one metric.

But in general, yes :)

Also, I think we are actually removing support for get_members from Mendeley for this version... I used it because it was a good way to show the long chain from the previous version.  Also get_members takes a type.

Heather
