web scraping + text mining + visualisation for COI (conflict of interest) in medicine idea

Carl Reynolds

Feb 11, 2012, 3:32:40 PM
to oss-uk-health
I've a hunch that in certain areas of medical practice where there are
very large financial stakes, e.g. thromboprophylaxis, the medical
literature is awash with industry-sponsored research and authors with
industry links.

It would be interesting to attempt to investigate this at scale.

I see it as 3 sub-problems:
1. List of who publishes on what
PubMed is the gold-standard database of who published on what.
Unfortunately who = name, so authors are not uniquely identified.
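To make the name problem concrete, here is a minimal Python sketch (not a real pipeline) that keys PubMed-style author records on "Last Initials" and shows different people collapsing into one key. The `Authorship` record, its fields and the affiliation-based split are illustrative assumptions; affiliation is a crude tie-breaker that will also wrongly split one person who has moved institutions.

```python
from collections import defaultdict
from typing import NamedTuple


class Authorship(NamedTuple):
    """One author on one paper (hypothetical record layout)."""
    pmid: str         # PubMed ID of the article
    last_name: str
    initials: str
    affiliation: str  # free-text affiliation as printed on the paper


def name_key(a: Authorship) -> str:
    """PubMed-style display name, e.g. 'Cohen AT'. NOT a unique person ID."""
    return f"{a.last_name} {a.initials}"


def group_by_name(records):
    """Naive grouping: every record with the same display name is 'one author'."""
    groups = defaultdict(set)
    for r in records:
        groups[name_key(r)].add(r.pmid)
    return dict(groups)


def group_by_name_and_affiliation(records):
    """Crude disambiguation: treat (display name, normalised affiliation) as one person."""
    groups = defaultdict(set)
    for r in records:
        aff = " ".join(r.affiliation.lower().split())
        groups[(name_key(r), aff)].add(r.pmid)
    return dict(groups)
```

With three "Smith J" records from two institutions, `group_by_name` lumps all three PMIDs together, while the affiliation-aware version splits them into two candidate people.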

http://biostar.stackexchange.com/questions/12353/examples-of-full-text-mining-on-the-pubmed-central-open-access-subset
http://biostar.stackexchange.com/questions/10049/how-do-people-go-about-pubmed-text-mining
http://biostar.stackexchange.com/questions/11566/local-copy-of-pubmed
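For pulling who-published-what straight from PubMed, NCBI's E-utilities `esearch` endpoint returns the PMIDs matching a query. A minimal sketch, assuming JSON output (`retmode=json`) and the standard PubMed field tags `[tiab]` (title/abstract) and `[au]` (author). NCBI asks heavy users to throttle requests, so check their usage guidelines before running anything like this at scale. Only the URL building and parsing are exercised offline; `fetch_pmids` makes a live network call.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def esearch_url(term: str, retmax: int = 100) -> str:
    """Build an esearch query URL for PubMed with JSON output."""
    params = {"db": "pubmed", "term": term, "retmode": "json", "retmax": retmax}
    return f"{ESEARCH}?{urlencode(params)}"


def parse_esearch(body: str) -> list[str]:
    """Pull the list of PMIDs out of an esearch JSON response."""
    return json.loads(body)["esearchresult"]["idlist"]


def fetch_pmids(term: str, retmax: int = 100) -> list[str]:
    """Live call to NCBI (network access required)."""
    with urlopen(esearch_url(term, retmax)) as resp:
        return parse_esearch(resp.read().decode("utf-8"))


# e.g. fetch_pmids("thromboprophylaxis[tiab] AND Cohen AT[au]")
```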

It might be better to scrape www.biomedexperts.com, e.g.
http://www.biomedexperts.com/Profile.bme/252895/Alexander_T_Cohen...

2. List of each author's conflicts of interest
I'm not sure how best to automagically get conflict-of-interest or
competing-interest statements/declarations...

Google throws up:

Alexander T. Cohen, MD
Disclosure: Alexander T. Cohen, MD, has disclosed the following
relevant financial relationships:
Served as an advisor or consultant for: Astellas Pharma, Inc.;
AstraZeneca Pharmaceuticals LP; Bayer HealthCare Pharmaceuticals;
Boehringer Ingelheim Pharmaceuticals, Inc.; Bristol-Myers Squibb
Company; Daiichi Sankyo, Inc.; GlaxoSmithKline; Johnson & Johnson
Pharmaceutical Research & Development, L.L.C.; Mitsubishi Pharma
America, Inc.; Pfizer Inc.; sanofi-aventis; Schering-Plough
Corporation; Takeda Pharmaceuticals North America, Inc.
Served as a speaker or a member of a speakers bureau for: Bayer
HealthCare Pharmaceuticals; Boehringer Ingelheim Pharmaceuticals,
Inc.; Bristol-Myers Squibb Company; Daiichi Sankyo, Inc.;
GlaxoSmithKline; Johnson & Johnson Pharmaceutical Research &
Development, L.L.C.; Mitsubishi Pharma America, Inc.; Pfizer Inc.;
sanofi-aventis
Received grants for clinical research from: AstraZeneca
Pharmaceuticals LP; Bayer HealthCare Pharmaceuticals; Boehringer
Ingelheim Pharmaceuticals, Inc.; Bristol-Myers Squibb Company; Daiichi
Sankyo, Inc.; GlaxoSmithKline; Johnson & Johnson Pharmaceutical
Research & Development, L.L.C.; Pfizer Inc.; sanofi-aventis; Schering-
Plough Corporation
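Disclosures in this format look regular enough to parse: a heading ending in "for:" or "from:", followed by a semicolon-separated list of companies. A minimal sketch, assuming every relationship heading starts with "Served" or "Received" as in the example above (other sites will word them differently) and that the text may be hard-wrapped, with hyphenated names split across lines.

```python
import re

# Assumed heading shape: "Served ... for:" / "Received ... from:"
HEADING = re.compile(r"((?:Served|Received)[^:;]*):\s*")


def parse_disclosure(text: str) -> dict[str, list[str]]:
    """Map each relationship heading to the list of companies after it."""
    flat = " ".join(text.split())                   # undo hard line wrapping
    flat = re.sub(r"(\w)- (\w)", r"\1-\2", flat)    # re-join "Schering- Plough"
    # split() with one capture group gives [preamble, head1, body1, head2, body2, ...]
    parts = HEADING.split(flat)
    result = {}
    for head, body in zip(parts[1::2], parts[2::2]):
        result[head] = [c.strip() for c in body.split(";") if c.strip()]
    return result
```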

whereas this statement
Funding: Sanofi-Synthelabo (France) and NV Organon (Netherlands)
sponsored the study and carried out on-site monitoring of all
participants. The steering committee had the final responsibility for
the study protocol, case report forms, statistical analysis plan,
progress of the study and analysis, as well as the reporting of the
data. The sponsors had an opportunity to comment on the manuscripts
before submission, but the final version was the sole responsibility
of the authors.
Competing interests: ATC, BLD, ASG, MRL, WT, and AGGT participated as
investigators, consultants, or both for NV Organon and Sanofi-
Synthelabo. JFME and AWAL are employees of NV Organon. BLD has served
as an investigator or consultant for AstraZeneca, Bristol-Myers
Squibb, Boehringer Ingelheim, and Pharmacia. ASG has served as an
investigator, consultant, or advisory board member for Bristol-Myers
Squibb, AstraZeneca, Aventis, Bayer, and Progen. MRL has served as an
investigator and advisory board member for AstraZeneca, Britol-Myers
Squibb, Mitsubishi Pharma Europe, Yamanouchi Pharma, and Bayer. AGGT
is a consultant for Bristol-Myers Squibb

can be found in a free full-text paper from PubMed Central:
http://www.bmj.com/content/332/7537/325
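Statements like this one name authors by initials, so a first pass could map each set of initials to the companies mentioned in the same sentence. A minimal sketch, assuming that initials open each sentence and that the caller supplies a list of known company names (e.g. from sub-problem 3); the sentence split is naive and will misfire on abbreviations such as "Inc.". Exact matching is brittle: the "Britol-Myers Squibb" misspelling in the quoted statement would not match "Bristol-Myers Squibb".

```python
import re
from collections import defaultdict


def _flatten(text: str) -> str:
    """Undo hard line wrapping, re-joining words hyphen-split across lines."""
    flat = " ".join(text.split())
    return re.sub(r"(\w)- (\w)", r"\1-\2", flat)


def leading_initials(sentence: str) -> list[str]:
    """Author initials at the start of a sentence, e.g. 'ATC, BLD, and WT ...'."""
    found = []
    for tok in sentence.replace(",", " ").split():
        if tok == "and":
            continue
        if re.fullmatch(r"[A-Z]{2,5}", tok):
            found.append(tok)
        else:
            break
    return found


def coi_by_initials(statement: str, companies: list[str]) -> dict[str, set[str]]:
    """Map each author's initials to the known companies named alongside them."""
    body = re.sub(r"^Competing interests:\s*", "", _flatten(statement))
    links = defaultdict(set)
    # Naive sentence split on ". "
    for sentence in re.split(r"(?<=\.)\s+", body):
        named = {c for c in companies if c in sentence}
        for initials in leading_initials(sentence):
            links[initials] |= named
    return dict(links)
```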

So maybe limit the dataset to PubMed Central. Fortunately a
nice man called Lars Juhl Jensen made this available in a nice format
here: http://biostar.stackexchange.com/questions/2077/full-text-retrieval-from-pubmedcentral/2082#2082
Unfortunately it's down, but I've dropped him a mail.
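PMC open access articles come as NLM/JATS XML (.nxml), where competing-interests text is often, but not always, tagged as a footnote such as `<fn fn-type="conflict">` or placed in a `<sec>` whose title mentions competing interests. A minimal sketch using the standard library's ElementTree, assuming just those two conventions; publishers differ, so expect to add patterns after looking at real files. The fn-type values listed are assumptions to be checked against the data.

```python
import re
import xml.etree.ElementTree as ET

# fn-type values assumed to mark conflict-of-interest footnotes (publishers vary)
COI_FN_TYPES = {"conflict", "coi-statement", "COI-statement"}
COI_TITLE = re.compile(r"competing interests?|conflicts? of interest", re.IGNORECASE)


def _clean(text: str) -> str:
    """Collapse runs of whitespace left over from XML indentation."""
    return " ".join(text.split())


def coi_statements(nxml: str) -> list[str]:
    """Return competing-interest statements found in one JATS/NLM article."""
    root = ET.fromstring(nxml)
    found = []
    # Convention 1: <fn fn-type="conflict"> style footnotes
    for fn in root.iter("fn"):
        if fn.get("fn-type") in COI_FN_TYPES:
            found.append(_clean("".join(fn.itertext())))
    # Convention 2: a <sec> whose <title> mentions competing interests
    for sec in root.iter("sec"):
        if COI_TITLE.search(sec.findtext("title", default="")):
            paras = ("".join(p.itertext()) for p in sec.findall("p"))
            found.append(_clean(" ".join(paras)))
    return found
```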

3. List of products/product areas for each drug company
e.g. http://en.wikipedia.org/wiki/Sanofi#Products
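Once the three lists exist, the interesting bit is the join: for a paper in a given area, which authors disclose links to companies that sell products in that area? A minimal sketch, assuming COI data already keyed by author and a set of companies for the topic. The alias table is a hand-made illustration of why company names need normalising (Sanofi-Synthelabo and Sanofi-Aventis are earlier names of what is now Sanofi); "ExamplePharma" is a placeholder.

```python
# Hypothetical alias table mapping historical/variant names to one canonical key
ALIASES = {
    "sanofi-synthelabo": "sanofi",
    "sanofi-aventis": "sanofi",
}


def normalise(name: str, aliases: dict[str, str] = ALIASES) -> str:
    """Lower-case, collapse whitespace, then map known aliases to a canonical name."""
    key = " ".join(name.lower().split())
    return aliases.get(key, key)


def flag_conflicts(author_cois: dict[str, set[str]], topic_companies: set[str],
                   aliases: dict[str, str] = ALIASES) -> dict[str, set[str]]:
    """For each author, return disclosed companies that also sell products in the topic area."""
    topic = {normalise(c, aliases) for c in topic_companies}
    flagged = {}
    for author, companies in author_cois.items():
        hits = {normalise(c, aliases) for c in companies} & topic
        if hits:
            flagged[author] = hits
    return flagged
```

Set intersection keeps the join cheap; the hard part in practice is the alias table, since the same company appears under many names across decades of papers.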

Carl

Carl Reynolds

Feb 13, 2012, 2:38:24 AM
to oss-uk-health
Lars rebooted his server for me; downloading the PMC OA subset now :-)

ben bray

Feb 13, 2012, 5:24:32 AM
to oss-uk...@googlegroups.com
Hi Carl

I like this idea very much ("They Work for Them"?! Apologies to the creators of the original phrase.)

Another source of information would be the ClinicalTrials.gov database; it can be linked to PubMed citations through the trial's unique identifier (the NCT number). As well as information on pharma backing, it also links to a variety of other information that might be useful, e.g. product details from the FDA.
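ClinicalTrials.gov identifiers have a fixed shape, "NCT" followed by eight digits, so a regular expression is enough to pull them out of abstracts or full text and use them as the join key between the two databases. A minimal sketch; the IDs in the example are placeholders, not real trials.

```python
import re

# ClinicalTrials.gov registry number: "NCT" + 8 digits
NCT_ID = re.compile(r"\bNCT\d{8}\b")


def trial_ids(text: str) -> list[str]:
    """Return NCT numbers mentioned in text, de-duplicated, in first-seen order."""
    return list(dict.fromkeys(NCT_ID.findall(text)))
```

For example, `trial_ids("Registered as NCT00000000; see also NCT12345678 and NCT00000000.")` gives each ID once, in order of first mention.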

BW

Ben

Carl Reynolds

Feb 13, 2012, 12:19:45 PM
to oss-uk...@googlegroups.com
good idea, thanks!

tkrohn

Feb 14, 2012, 6:40:26 PM
to oss-uk-health
Allow me a quick intro: my name is Tom Krohn and I lead a team working
on Clinical Open Innovation, with the goal of transforming and accelerating
clinical development. Our effort is part of Eli Lilly, a US-based
pharmaceutical firm, and all our work is released to the public under a
Creative Commons Zero (CC0) license (no restrictions or constraints).

Our focus is on adding value to public data and bringing the power of an
engaged crowd to clinical information. You can learn more about our
initiative on the www.tbcommons.org website. In particular, take a
look at www.tbcommons.org/trials, where you will see that we have created
a powerful app to search, filter, view mashups of and share data from
clinicaltrials.gov. This is our first data domain and example
application. Our API is open, albeit fairly simple for now: it serves JSON
files for the clinical collections application. Watch for more to
come in the months ahead.

I'd be curious to hear others' perspectives on our work and on where we
can collaborate and "divide and conquer" on some public data
enhancement.

Give it a look and let us know what you think.

Cheers,
Tom