Hi,
Is anyone interested in helping to scrape doctors' details from the GMC website?
If you are interested, please let me know and I’ll assign you a range of numbers to scan (to avoid duplicating efforts).
Or feel free to scan outside the ranges below, though I doubt you'll find anything, except maybe in the 7600000+ range.
As soon as my scans are complete I’ll
GMC numbers contain 7 digits giving 10M possibilities, but a brute force scan of all possibly existing GMC numbers on the GMC website would require a scan of only about 1610k numbers.
From experience I’d expect that about 20-25% of these numbers actually (still) exist.
Running two PowerShell processes at once I can do about 5k numbers an hour, so that would be about 15 days without interruption.
The two sessions occasionally peak at 10 Mbps of network traffic and about 150 MB of memory, which leaves more than enough bandwidth and memory for other stuff.
Scan extracts (for existing numbers):
Extraction datetime
Full Doctor details
Full Doctor history
Example PowerShell script available at:
https://github.com/DutchHarry/GMC/tree/master
Numbers to be scanned:
Old check digit numbers:
000000-499999 (plus check digit) : 500k
Former LR numbers:
5900000-5999999 : 100k
5000000-5209999 : 210k
Former FPR numbers:
6000000-6179999 : 180k
New numbers:
(since abandoning check digit)
7000000-7600000 : 600k
Total : 1610k
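For anyone picking up a range, the arithmetic behind the list above can be sketched in a few lines of Python. This is only an illustration: the range boundaries and the 5k-per-hour rate are copied from this message, the helper names (`range_size`, `total_candidates`, `scan_days`, `split_range`) are made up for the example, and the old check-digit numbers are counted by their 6-digit stem. Summed as listed, the ranges come to about 1590k rather than 1610k, so one of the ranges or the total may be slightly off, and at the quoted rate the full scan works out at a little over 13 days.

```python
# Ranges as listed in the post; both ends are inclusive.
RANGES = {
    "old check-digit numbers (6-digit stem)": (0, 499_999),
    "former LR numbers, upper block": (5_900_000, 5_999_999),
    "former LR numbers, lower block": (5_000_000, 5_209_999),
    "former FPR numbers": (6_000_000, 6_179_999),
    "new numbers": (7_000_000, 7_600_000),
}

RATE_PER_HOUR = 5_000  # quoted throughput for two parallel sessions


def range_size(start, end):
    """Number of candidates in an inclusive range."""
    return end - start + 1


def total_candidates(ranges):
    """Total number of candidates across all listed ranges."""
    return sum(range_size(start, end) for start, end in ranges.values())


def scan_days(count, rate_per_hour=RATE_PER_HOUR):
    """Days of uninterrupted scanning needed for `count` numbers."""
    return count / rate_per_hour / 24


def split_range(start, end, chunk):
    """Yield inclusive (start, end) sub-ranges of at most `chunk` numbers."""
    for lo in range(start, end + 1, chunk):
        yield lo, min(lo + chunk - 1, end)


if __name__ == "__main__":
    total = total_candidates(RANGES)
    print(f"candidates: {total:,}")
    print(f"days at {RATE_PER_HOUR:,}/hour: {scan_days(total):.1f}")
    print(f"expected hits at 20-25%: {total * 0.20:,.0f} to {total * 0.25:,.0f}")
    # Example: cut the former FPR block into 50k chunks to hand out.
    for lo, hi in split_range(6_000_000, 6_179_999, 50_000):
        print(f"chunk {lo}-{hi}")
```

`split_range` is just one way of carving a block into non-overlapping pieces so that volunteers do not duplicate each other's work; the chunk size is arbitrary.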
I have already done the first 500k and am making headway on the remainder.
Purpose:
First of all, the quarterly ODS consultant files (econcur+wconcur) are not that up to date, and the same applies to the weekly egmcmem file with the GMC numbers for GPs.
So for certain analytics you may want to know the existing GMC numbers, so you can identify 'made-up' codes which the hospital may use to signal different activity (they are not supposed to do it that way, but alas).
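As a rough sketch of that check in Python: once you have a set of GMC numbers known to exist, codes in provider data can be sorted into known, unknown and malformed. The function name `classify_codes`, the bucket names and the sample values are all made up for illustration, and the sketch assumes the codes arrive as bare 7-digit strings, which real extracts may not.

```python
import re

# The post says GMC numbers have 7 digits; anything else is treated as malformed.
GMC_PATTERN = re.compile(r"[0-9]{7}")


def classify_codes(provider_codes, known_gmc_numbers):
    """Sort provider codes into 'known', 'unknown_number' and 'malformed'.

    known_gmc_numbers: iterable of 7-digit strings found to exist (e.g. by the scan).
    provider_codes: iterable of code strings as they appear in provider data.
    """
    known = set(known_gmc_numbers)
    result = {"known": [], "unknown_number": [], "malformed": []}
    for code in provider_codes:
        cleaned = code.strip()
        if not GMC_PATTERN.fullmatch(cleaned):
            result["malformed"].append(code)       # not 7 digits at all
        elif cleaned in known:
            result["known"].append(code)           # matches an existing number
        else:
            result["unknown_number"].append(code)  # right shape, no such doctor
    return result


if __name__ == "__main__":
    # Illustrative values only, not real GMC numbers.
    register = {"7000001", "6000002"}
    codes = ["7000001", " 6000002 ", "9999999", "XX12345"]
    for bucket, members in classify_codes(codes, register).items():
        print(bucket, members)
```

The 'unknown_number' and 'malformed' buckets are where any made-up codes would show up; deciding what they actually signal still needs the provider's own business rules.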
And if you're really paranoid or cynical and have access to clear data you might want to check if there are hospitals who are still using codes of long retired (even deceased) consultants, who are still treating long removed (emigrated or deceased) patients. After all it's not only Virgin Care that tries to screw its commissioners, in the NHS this type of activity, apparently considered fraud elsewhere (even in Nigeria), is 'business as usual'.
Also interesting is the percentage of foreign-trained doctors on the specialist register and of foreign-trained GPs, just to see whether a couple of £100M for additional medical school places to reduce ‘dependency on foreigners’ is more than just 'gesture-politics'.
Cheers
--
You received this message because you are subscribed to the Google Groups "nhshackday" group.
To unsubscribe from this group and stop receiving emails from it, send an email to nhshackday+...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
Yes, most of the ODS/SDS is available via the Spine Directory Service. You would use LDAP queries to interrogate it from software. We used LDAPAdmin (http://www.ldapadmin.org/) to browse the data and help construct our queries.
My immediate question is what you intend to use the GMC data for, and what information is it you require (over and above the GMC identifier, obviously)?
The files we publish come from a variety of sources and use the GMC to identify an individual within a particular context. Econcur and Wconcur are hospital consultants, taken from the NHS Electronic Staff Record system. Egmcmem is a mapping of GMC to GP Prescriber codes. Spine does indeed hold some GMC codes but only those that we match to users that already hold a smart card (and therefore have a Spine record).
So – none of our listings that contain GMC somewhere will be ‘complete’ i.e. not everyone registered with the GMC goes on to become a hospital consultant, or a GP prescriber, or be allocated a smart card. In addition, despite the fact we are provided a copy of the register by the GMC, this is under the terms of a license agreement that restricts what we are able to share (essentially the code and name only) – so I can’t give you a dump of the GMC data in full anyway.
You are able to obtain a full extract of the GMC register direct from GMC, although you will have to pay…
Can you provide a monthly extract of the full GMC register, free of charge, for analytical purposes related to NHS clients?
(Or even better, create an API that anyone can use to automate data requests.)
This would save us scraping this data periodically from your website, which is a bit of a pain as obviously we don’t know which numbers exist, and it takes a lot of valuable time.
Kindest Regards
Hi Mike,
Thanks for the swift response, that answers my question.
To answer your question:
I use the GMC numbers to identify consultant codes providers use in their data for contracting, commissioning and pathway analytics for my clients.
Providers use more numbers than the ones available in econcur and wconcur, and occasionally even numbers that no longer exist (retired or deceased consultants).
I don’t think even NHS Digital actually enforces much (if any) data quality in SUS on consultant codes and referrer codes, or on many other codes for that matter; even the data dictionary got watered down a bit when ‘mandatory’ was changed to ‘mandatory, where available’ :>(
The odd provider uses ‘made up’ codes to flag certain activity for their commissioners. Unfortunately, the ‘business rules/data dictionaries’ for these ‘flags’ are not always shared by the so-called ‘lead-commissioner’, so sometimes we have to figure them out by guesswork.
Cheers
Harry