Data validation


Hack G

Feb 6, 2014, 1:55:28 PM
to np...@googlegroups.com


I'm not too knowledgeable about how all the data is initially generated, but it seems obvious that the data gets into the final data set through various methods or forms.

See the provider_credentials field, and you will see values such as:
MD
M.D. 
MD.
DO
D.O.

What are the reasons this data is not uniform in the first place? Is this simply due to different data entry clerks typing it in however they wish, or however they think it should be entered? Is it simply copied from different states' internal databases, where states have different validation rules?

It would help if every field had a validation standard to follow at the points of entry where the data is initially generated. This would maintain data quality and remove some of the headaches of working with the data set. I know I'm not the first one to wish the data was "clean" to begin with.

Can some validation rules be implemented to ensure only clean data makes its way into the final data set?

http://goo.gl/fMhkWR



Thanks,
Greg
2014-02-06_1206.png

Hack G

Feb 7, 2014, 11:27:59 AM
to np...@googlegroups.com

See also the provider first name and last name fields, where some first or last names have a "." character at the beginning.



2014-02-07_1122.png

Alan Viars

Feb 7, 2014, 1:22:22 PM
to np...@googlegroups.com
Hello Greg:

The errors you are pointing out are a result of free-text fields. So non-letters should be removed from the beginning of name fields. Is that your request? What about capitalization? When I look at the data, I think we should put names and addresses in all caps.

Best,

Alan
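A minimal sketch of the cleanup Alan describes (drop non-letters from the start of a name, then put it in all caps), written in Python purely for illustration. The function name `clean_name` is hypothetical and not part of NPPES; whether whitespace should also be collapsed is an assumption.

```python
def clean_name(raw: str) -> str:
    """Drop leading non-letter characters, collapse whitespace, upper-case.

    Hypothetical helper for illustration only, not an NPPES rule.
    """
    text = raw.strip()
    start = 0
    # Skip everything before the first alphabetic character (e.g. a stray ".").
    while start < len(text) and not text[start].isalpha():
        start += 1
    # Collapse runs of whitespace (an assumption), then put the name in all caps.
    return " ".join(text[start:].split()).upper()
```

For example, `clean_name(".Smith")` gives `"SMITH"`. Characters inside the name, such as the apostrophe in O'Brien, are left alone.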

Hack G

Feb 10, 2014, 10:22:03 AM
to np...@googlegroups.com

Free-text fields are fine, but if this data is entered online in a web-based form, there should be JavaScript validation that prevents a user from putting non-letter characters at the beginning of the name field, and pops up an error so they can correct their typing before being allowed to submit the information.

If the source where data is originally entered is web-based, I would think that all fields can have validation rules implemented using regular expressions and JS.

Take provider_credentials, for example: if a data entry clerk typed in "MD" without the "." abbreviation characters, the form could pop up an error saying "Please use proper abbreviation before continuing...", or the system could auto-correct "MD" to "M.D." upon form submission. That way there would not be some values where it's "MD" and some where it's "M.D."
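The auto-correct idea can be sketched roughly as follows. This is Python rather than the browser-side JS discussed above, just to show the rule. The mapping `CANONICAL_CREDENTIALS` and the function `normalize_credential` are hypothetical names, and the canonical spellings are assumptions taken from the examples in this thread, not an official NPPES list.

```python
import re
from typing import Optional

# Assumed canonical spellings, keyed by the letters-only upper-case form.
CANONICAL_CREDENTIALS = {"MD": "M.D.", "DO": "D.O."}

def normalize_credential(raw: str) -> Optional[str]:
    """Return the canonical spelling, or None if the value is not recognized."""
    # Ignore dots, spaces and case so "MD", "M.D. " and "md." compare equal.
    key = re.sub(r"[^A-Za-z]", "", raw).upper()
    return CANONICAL_CREDENTIALS.get(key)
```

A form handler could store the returned value when it is not `None`, and show an error (or fall back to free text) when it is `None`.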

Lowercase, uppercase, or proper capitalization is not much of a concern to me, because the data can still be queried in whatever database system you choose (you can still match FN = Jane to FN = JANE).

Transforming the case of text fields is something we can do after the fact if needed, but I certainly don't see any harm in capitalizing name and address fields. Since uniformity in the data is what I seek here, I'm not against any idea that follows this concept.

Alan Viars

Feb 11, 2014, 10:44:11 AM
to np...@googlegroups.com
Hi Greg:

Some validations were not in place at the beginning of NPPES, resulting in some cruft.

Some database technology is case sensitive and the capitalization has caused problems for me.

I'll be adding validation in the new NPPES redesigned prototype.

Assuming the redesign were on a public GitHub repo, would you want to contribute?

Best,

Alan

Hack G

Feb 11, 2014, 12:20:35 PM
to np...@googlegroups.com
Most SQL platforms have this covered, as do many NoSQL technologies.

Example: using the lower() function in PostgreSQL
http://stackoverflow.com/questions/7005302/postgresql-how-to-make-not-case-sensitive-queries

select address
from npi
where lower(address) =

But this does and will affect performance, so it would certainly be a plus to standardize the casing of text.
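To make the case-insensitive lookup and its performance cost concrete, here is a small self-contained sketch. It uses SQLite through Python's standard library as a stand-in for PostgreSQL. The table layout, the NPI values and the addresses are made up for illustration. The expression index on lower(address) is one common way to reduce the cost of the lower() comparison (PostgreSQL supports expression indexes as well); storing the text in one case to begin with avoids the issue entirely.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE npi (npi TEXT, address TEXT)")
conn.executemany(
    "INSERT INTO npi VALUES (?, ?)",
    [
        ("0000000001", "123 Main St"),   # made-up rows with mixed casing
        ("0000000002", "123 MAIN ST"),
        ("0000000003", "9 Elm Ave"),
    ],
)
# Index the lower-cased value so the engine can use an index for
# lower(address) lookups instead of scanning every row.
conn.execute("CREATE INDEX npi_address_lower ON npi (lower(address))")

def find_by_address(address):
    """Case-insensitive match on address; returns matching NPI values."""
    rows = conn.execute(
        "SELECT npi FROM npi WHERE lower(address) = lower(?) ORDER BY npi",
        (address,),
    )
    return [row[0] for row in rows]
```

Calling `find_by_address("123 main st")` matches both spellings of the Main St address.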


> I'll be adding validation in the new NPPES redesigned prototype.

Thanks! Validation for data entered is fantastic news - very glad to hear that will be included

I was actually going to suggest putting the entire project of "modernizing NPPES/NPI data" on GitHub for everyone to see, contribute to, and comment on. That would be ideal, I think, and I would certainly contribute where possible.

- Greg

eric_...@jsi.com

Mar 7, 2014, 1:43:54 PM
to np...@googlegroups.com
Provider Credential Text – should be limited to a defined drop-down list rather than free-text entry. It is OK to have an Other category with a description. See the related point on tying the Taxonomy codes to the current credentials listed; that would only be possible if there is a defined list of credentials for the most common categories.
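One way to picture the "defined list plus Other with a description" rule on the server side is the sketch below. The names `ALLOWED_CREDENTIALS` and `CredentialEntry` are hypothetical, and the three-item list is only a placeholder; the real list would have to come from NPPES.

```python
from dataclasses import dataclass
from typing import Optional

# Placeholder list for illustration; not an official set of credentials.
ALLOWED_CREDENTIALS = frozenset({"M.D.", "D.O.", "Other"})

@dataclass(frozen=True)
class CredentialEntry:
    credential: str
    other_description: Optional[str] = None

    def __post_init__(self):
        # Only values from the defined list are accepted.
        if self.credential not in ALLOWED_CREDENTIALS:
            raise ValueError(f"unknown credential: {self.credential!r}")
        # "Other" must be explained with a non-empty description.
        if self.credential == "Other" and not (self.other_description or "").strip():
            raise ValueError("'Other' requires a description")
        # A description makes no sense alongside a listed credential.
        if self.credential != "Other" and self.other_description is not None:
            raise ValueError("a description is only allowed with 'Other'")
```

With this shape, `CredentialEntry("M.D.")` is accepted, while `CredentialEntry("MD")` and `CredentialEntry("Other")` both raise `ValueError`.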



Alan Viars

Mar 7, 2014, 2:21:06 PM
to np...@googlegroups.com
Should it correspond to a license in all cases?

Eric Turer

Mar 7, 2014, 3:27:24 PM
to Viars, Alan, np...@googlegroups.com
Hi again... Did you mean that the taxonomy and credential should always match, or that the license should always be found in a lookup? If it's the first, I'm not sure I know all the taxonomies, or whether there would always be a 'credential' in the list. Maybe the `ProviderLicenseUniverseFeb2014.csv` file would help. On the second point, I think it would depend heavily on how quickly and consistently all license authorities could provide updated data. From working with them, I'm not too hopeful that you could always expect to find a corresponding license; maybe in the future. Did I understand that correctly?
 
ET

 
 
Eric S. Turer
Senior Health Services Consultant
John Snow, Inc. (JSI)
501 South St.
Bow, NH 03304
JSI Web Site

Darrell DeVeaux

May 28, 2014, 9:49:24 AM
to np...@googlegroups.com
My $0.02: a bunch of standardizations and pop-up errors is not what's needed, as that is a turnoff to users and will likely not accomplish what you are seeking in many cases. Maybe a couple are worthwhile, but the backend database should clean up most of this itself! I completely agree with Hack G's comment that "most SQL platforms have this covered"... exactly.

Whether a provider enters MD, M.d, M.D., md, etc. could be controlled somewhat at the front end, but to be honest, I think you'd find in user acceptance testing that you have a mess there. If you put up a message like "please use proper format", what is that? Is it M.D. or M.D or MD? Just clean this up on the backend. Credentials, address, phone, name and license number (based on state) can all be handled on the backend or with a little validation. The state license field can say something like "please use #####-## format for this state". For other fields, probably skip the validation.
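The "little validation" for state license numbers might look something like the sketch below. The state code "XX", its #####-## pattern, and the names `LICENSE_RULES` and `check_license` are all hypothetical; real formats would have to come from each licensing authority. States without a rule are simply not validated, in line with the suggestion to skip validation elsewhere.

```python
import re
from typing import Optional

# Hypothetical rules: state code -> (compiled pattern, human-readable hint).
LICENSE_RULES = {
    "XX": (re.compile(r"\d{5}-\d{2}"), "#####-##"),
}

def check_license(state: str, number: str) -> Optional[str]:
    """Return None if the number is acceptable, else a message for the user."""
    rule = LICENSE_RULES.get(state.upper())
    if rule is None:
        return None  # no known format for this state: skip validation
    pattern, hint = rule
    if pattern.fullmatch(number.strip()):
        return None
    # Tell the user exactly which format is expected.
    return f"Please use {hint} format for this state"
```

The message names the expected format explicitly, which addresses the "what is proper format?" problem raised above.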

Cleaning up on the backend only affects performance if the entry table is the same as the table used to store and query the data... which isn't the case now.

Again, for us, the above is based on experience. We normalize the NPPES file and do all of the above using MySQL and it's not a super heavy lift.

