need to mask patient info in HL7 messages
Newsgroups: comp.unix.shell
From: harryooopot...@hotmail.com
Date: Wed, 26 Sep 2012 18:40:24 -0700 (PDT)
Local: Wed, Sep 26 2012 9:40 pm
Subject: need to mask patient info in HL7 messages
I need a script, awk preferred, to mask sensitive patient info in HL7
messages.

The line numbers below do not belong to the HL7 messages; I just added
them for the sake of clarity in this posting.

I have some log files containing thousands of HL7 messages, separated by
blank lines, with real patient data. I need to mask that sensitive
patient info before I can send these files to a third party (a Lab
Report Repository) for them to use.

A) Sample HL7 message :
     1  MSH|^~\&|OPEN ENGINE|CLS|Egate|8832253|20120926150049||ORU^R01|Q521477659T517738211|P|2.3
     2  PID|1|123456789^^^AB|123456789^^^8832253|777888999^^^ULI~2444690^^^PSID|Name,Masked||19010131|F|||123 Random St^^Calgary^AB^A1B 2D3^CA^H^^83|83|(123)222-3333||ENG|S||100033344555^^^8832253|789030200|||||||||||N
     3  PV1|1|D|01362^^^8832253|UR|||112233^Attending, Doctor|||||||||||D|32112345|ab||||||||||||||||||||||||20120925170500|20120925213000
     4  OBR|1|001TKPWNZ|0589313008^101MA|2922077^URINE BACTERIAL CULTURE^L01N^M URINE^^MI|||20120925192000|||^CONTRIBUTOR_SYSTEM^SCMLAB^^^^^^^Personnel||||20120925211100|URINE^^^Midstream|10882^Khorrami, Katayoun^004406||||UR-12-1234567||20120926150043||MA|F||1^^^20120925191900^^ST~^^^^^ST|||||||||20120925192000
     5  OBX|1|TX|4384297^URINE BACTERIAL CULTURE^L01N^M URINE^^MI||*****Microbiology Urine*****|||A|||F|||20120926150043
     6  OBX|2|TX|4384297^URINE BACTERIAL CULTURE^L01N^M URINE^^MI|||||A|||F|||20120926150043
     7  OBX|3|TX|4384297^URINE BACTERIAL CULTURE^L01N^M URINE^^MI||  TEST: Urine Culture|||A|||F|||20120926150043

B) The same message, with the fields that need masking marked _Need_Mask :
     1  MSH|^~\&|OPEN ENGINE|CLS|Egate|8832253|20120926150049||ORU^R01|Q521477659T517738211|P|2.3
     2  PID|1|PID-2.1_Need_Mask^^^AB|PID-3.1_Need_Mask^^^8832253|PID-4.1_Need_Mask^^^ULI~2444690^^^PSID|PID-5_Need_Mask||PID-7_Need_Mask|F|||PID-11.1_Need_Mask^^Calgary^AB^PID-11.5_Need_Mask^CA^H^^83|83|PID-11.13_Need_Mask||ENG|S||100037307051^^^8832253|789030200|||||||||||N
     3  PV1|1|D|01362^^^8832253|UR|||PV1-7.1_Need_Mask^PV1-7.2_Need_Mask|||||||||||D|PV1-19_Need_Mask|ab||||||||||||||||||||||||20120925170500|20120925213000
     4  OBR|1|001TKPWNZ|0589313008^101MA|2922077^URINE BACTERIAL CULTURE^L01N^M URINE^^MI|||20120925192000|||^CONTRIBUTOR_SYSTEM^SCMLAB^^^^^^^Personnel||||20120925211100|URINE^^^Midstream|10882^Khorrami, Katayoun^004406||||UR-12-0178975||20120926150043||MA|F||1^^^20120925191900^^ST~^^^^^ST|||||||||20120925192000
     5  OBX|1|TX|4384297^URINE BACTERIAL CULTURE^L01N^M URINE^^MI||*****Microbiology Urine*****|||A|||F|||20120926150043
     6  OBX|2|TX|4384297^URINE BACTERIAL CULTURE^L01N^M URINE^^MI|||||A|||F|||20120926150043
     7  OBX|3|TX|4384297^URINE BACTERIAL CULTURE^L01N^M URINE^^MI||  TEST: Urine Culture|||A|||F|||20120926150043

If I have another file that tells what-to-mask-to-what, would it be easier?
Like ...

Format: <mask_to>, <HL7_Header>, <field> [,<subfield>]  // Comment

-- mask_spec.txt--
123456789;PID;2;1  // Patient ID
123456789;PID;3;1  // Patient ID
123456789;PID;4;1  // Patient ID
Name,Masked;PID;5  // Patient Name
19010131;PID;7     // Date of Birth
123 Random Street;PID;11;1  // Street Address
A1B 2D3;PID;11;5  // Postal Code
(123)222-3333;PID;11;13 // Phone Number
112233;PV1;7;1  // Physician ID
Attending,Doctor;PV1;7;2 // Attending Doctor Name
-- mask_spec.txt--

Any help appreciated.
TIA
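
A spec-driven version along the lines asked about above could look like the following. This is only a sketch under stated assumptions: the function name mask_hl7 is made up for illustration, the spec file uses the semicolon layout of mask_spec.txt with an optional trailing // comment, and HL7 field n is taken to be awk field n+1 (true for PID and PV1, not for MSH, where the separator itself counts as field 1). Repetitions separated by "~" get no special handling.

```shell
# mask_hl7 SPEC_FILE < hl7_log > masked_log   (hypothetical helper name)
mask_hl7() {
  awk -v spec="$1" '
    BEGIN {
      FS = OFS = "|"
      # Load rules of the form  value;SEGMENT;field[;component]  // comment
      while ((getline line < spec) > 0) {
        sub(/[ \t]*\/\/.*$/, "", line)          # drop the trailing comment
        n = split(line, r, ";")
        if (n < 3) continue                     # skip blank or malformed lines
        i = ++cnt[r[2]]                         # rule counter per segment
        val[r[2], i] = r[1]                     # replacement value
        fld[r[2], i] = r[3] + 1                 # HL7 field n is awk field n+1
        cmp[r[2], i] = (n >= 4 ? r[4] + 0 : 0)  # 0 means "whole field"
      }
      close(spec)
    }
    ($1 in cnt) {
      for (i = 1; i <= cnt[$1]; i++) {
        f = fld[$1, i]; c = cmp[$1, i]
        if (f > NF) continue                    # field not present in record
        if (c == 0) { $f = val[$1, i]; continue }
        m = split($f, part, /\^/)
        if (c > m) continue                     # component not present
        part[c] = val[$1, i]
        s = part[1]
        for (j = 2; j <= m; j++) s = s "^" part[j]
        $f = s                                  # rebuild field, keep the rest
      }
    }
    { print }                                   # every record, masked or not
  '
}
```

Usage would be something like `mask_hl7 mask_spec.txt < hl7.log > hl7_masked.log` (file names are placeholders). Keeping the rules in a file only pays off if the set of masked fields changes between runs.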


 
Newsgroups: comp.unix.shell
From: Janis Papanagnou <janis_papanag...@hotmail.com>
Date: Thu, 27 Sep 2012 07:42:32 +0200
Local: Thurs, Sep 27 2012 1:42 am
Subject: Re: need to mask patient info in HL7 messages
On 27.09.2012 03:40, harryooopot...@hotmail.com wrote:

Your sample data are quite confusing and not well suited to showing what
you want. Also, the partial field substitutions are not clear.

> If I have another file that tell what-to-mask-to-what, would it be easier?

It depends.

If you have to mask just specific fields in specific records I'd choose
another approach; mask those fields and, if necessary, save the mapping
in an independent file. Something like

  awk 'BEGIN { FS=OFS="|" }
    $1=="PID" {
       old_field3 = $3
       new_field3 = "mask3-" (++mask3count)
       $3 = new_field3
       print old_field3, new_field3 >"mapping-file"  # if necessary

       old_field11 = $11
       new_field11 = "mask11-" (++mask11count)
       $11 = new_field11
       print old_field11, new_field11 >"mapping-file"  # if necessary

       # etc. for other fields
    }

    $1=="PV1" {
       # similar as above for other record types
    }

    # etc. for more record types

    { print }  # print every record once, masked or not

  ' in_data  > out_data

That outlined approach can be made more concise by introducing a function
where the field number is a parameter.
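
A minimal sketch of that function-based variant, under assumptions: the names mask_fields, mask and count are made up here, PV1 field 8 merely stands in for "other record types", and the mapping file is passed as the first argument (defaulting to /dev/null when no mapping is wanted).

```shell
# mask_fields [MAPPING_FILE] < in_data > out_data   (hypothetical helper name)
mask_fields() {
  awk -v map="${1:-/dev/null}" '
    # Replace field f of the current record with a numbered placeholder
    # and write the old/new pair to the mapping file.
    function mask(f,   old) {
      old = $f
      $f = "mask" f "-" (++count[f])
      print old, $f > map
    }
    BEGIN { FS = OFS = "|" }
    $1 == "PID" { mask(3); mask(11) }
    $1 == "PV1" { mask(8) }
    { print }
  '
}
```

The mapping file holds the real values, so it has to stay on the sending side.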

Your fields also seem to be only partly substituted (in some cases?), so
an actual substitution would have to use the match() and substr() functions
(or alternatively the sub()/gsub() functions) to replace only the relevant
part and keep the rest.
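
As an illustration of the match()/substr() route, here is a sketch that replaces only the leading part of PID field 3 (everything before the first "^") and keeps the rest. The function name mask_first_component and the choice of field are assumptions made for the example.

```shell
# mask_first_component NEW_VALUE < in_data > out_data   (hypothetical name)
mask_first_component() {
  awk -v new="$1" '
    BEGIN { FS = OFS = "|" }
    $1 == "PID" {
      # match() sets RLENGTH to the length of the leading run of
      # characters that are not "^"; substr() keeps what follows it.
      if (match($3, /^[^^]+/))
        $3 = new substr($3, RLENGTH + 1)
    }
    { print }
  '
}
```

Compared with sub(), this avoids the special meaning of "&" in a replacement string, at the cost of one more line.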

If you provide more concise sample data and an accurate description of
your substitution rules we can help you in more detail, and likely provide
an even simpler solution.

Janis


 
Newsgroups: comp.unix.shell
From: Ed Morton <mortons...@gmail.com>
Date: Thu, 27 Sep 2012 07:42:39 -0500
Local: Thurs, Sep 27 2012 8:42 am
Subject: Re: need to mask patient info in HL7 messages
On 9/26/2012 8:40 PM, harryooopot...@hotmail.com wrote:

> I need a script, awk preferred, to mask sensitive patient info in HL7
> messages.

> The line numbers below do not belong to the HL7 messages; I just added
> them for the sake of clarity in this posting.

> I have some log files containing thousands of HL7 messages, separated by
> blank lines, with real patient data. I need to mask out those sensitive
> patient info before I could send these files to a third party (a Lab
> Report Repostory) for them to use.

It's good that you provided some sample input but couldn't you come up with
something much briefer that REPRESENTS your input instead of something which
presumably IS your in/out and is so lengthy with all those non-alpha-numeric
characters and wrapping lines? If you could put a little effort into that it'd
save everyone reading your post from having to put that effort into
understanding your data and so make it much more likely we'd take that time and
come up with the best answer for you.

The expected output for that input would be very useful too.

> If I have another file that tells what-to-mask-to-what, would it be easier?

Not unless the fields you want to "mask" vary for different input files or
something.

> Like ...

> Format: <mask_to>, <HL7_Header>, <field> [,<subfield>]  // Comment

Your format above says comma-separated fields, but your data has semi-colon
separated fields.

It looks like your fields are separated by "|"s and sub-fields by "^"s, and you
are numbering your fields starting at 0 (while awk starts them at 1) and your
sub-fields at 1. I THINK all you need to do is something like:

awk '
BEGIN { FS=OFS="|" }
$1 == "PID" {
    sub(/^[^^]+\^/,"123456789,",$3)
    sub(/^[^^]+\^/,"123456789,",$4)
    sub(/^[^^]+\^/,"123456789,",$5)

    $6 = "Name,Masked"
    $8 = "19010131"

    n = split($12,sf,/\^/)
    sf[1]  = "123 Random Street"
    sf[5]  = "A1B 2D3"
    sf[13] = "(123)222-3333"
    $12 = sep = ""
    for (i=1;i<=n;i++) {
       $12 = $12 sep sf[i]
       sep = "^"
    }

}

$1 == "PV1" {
    n = split($8,sf,/\^/)
    sf[1]  = "112233"
    sf[2]  = "Attending,Doctor"
    $8 = sep = ""
    for (i=1;i<=n;i++) {
       $8 = $8 sep sf[i]
       sep = "^"
    }
}

{ print }
' input_file

but without a simpler input file and the expected output it's hard to tell.

Regards,

     Ed.


 
Newsgroups: comp.unix.shell
From: harryooopot...@hotmail.com
Date: Thu, 27 Sep 2012 07:10:49 -0700 (PDT)
Local: Thurs, Sep 27 2012 10:10 am
Subject: Re: need to mask patient info in HL7 messages
Ed,

Your advice and solution are much appreciated.

Your code works well with the following simplified input.

$ fold -60 infile.txt
PID|1|777777777^^^AB|888888888^^^8832253|999999999^^^ULI~244
4690^^^PSID|Name,Orig||20010131|F|||123 Orig St^^Calgary^AB^
A9B 9D9^CA^H^^83|83|(666)666-666||ENG|S||100033344555^^^8832
253|789030200|||||||||||N
PV1|1|D|01362^^^8832253|UR|||555555^Doctor, Original||||||||
|||D|32112345|ab||||||||||||||||||||||||20120925170500|20120
925213000

$ ./mask.awk < infile.txt | fold -60
PID|1|123456789,^^AB|123456789,^^8832253|123456789,^^ULI~244
4690^^^PSID|Name,Masked||19010131|F|||123 Random Street^^Cal
gary^AB^A1B 2D3^CA^H^^83|83|(666)666-666||ENG|S||10003334455
5^^^8832253|789030200|||||||||||N
PV1|1|D|01362^^^8832253|UR|||112233^Attending,Doctor||||||||
|||D|32112345|ab||||||||||||||||||||||||20120925170500|20120
925213000

Thanks
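
One way to double-check a run like the one above is to compare the original and masked files field by field. The sketch below assumes both files have the same number of lines; the function name changed_fields is made up. It prints only the masked value, not the original, and labels fields HL7-style (awk field n+1 is reported as field n).

```shell
# changed_fields ORIGINAL MASKED   (hypothetical helper name)
changed_fields() {
  awk -F'|' '
    NR == FNR { orig[FNR] = $0; next }    # first file: remember each line
    {
      n = split(orig[FNR], o, "|")
      max = (n > NF ? n : NF)
      for (i = 2; i <= max; i++)
        # append "" to force a string comparison, so that values such as
        # 010 and 10 are not treated as equal numbers
        if ((o[i] "") != ($i ""))
          printf "line %d %s-%d -> %s\n", FNR, $1, i - 1, $i
    }
  ' "$1" "$2"
}
```

Run against the input and output shown above, it would list PID-2 to PID-5, PID-7, PID-11 and PV1-7 as changed. It would not list the phone number in PID-13, which came through unmasked, and the PID-2 to PID-4 lines would show the "," that replaced the first "^".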


 
Newsgroups: comp.unix.shell
From: harryooopot...@hotmail.com
Date: Thu, 27 Sep 2012 07:41:56 -0700 (PDT)
Local: Thurs, Sep 27 2012 10:41 am
Subject: Re: need to mask patient info in HL7 messages
Janis,

I tried it out and your code works well.

$ cat mask.awk
awk 'BEGIN { FS=OFS="|" }
  $1=="PID" {
    old_field3 = $3
    new_field3 = "123456789" ++mask3count
    $3 = new_field3
    print $0
  }'

$ fold -60 infile.txt
PID|1|777777777^^^AB|888888888^^^8832253|999999999^^^ULI~244
4690^^^PSID|Name,Orig||20010131|F|||123 Orig St^^Calgary^AB^
A9B 9D9^CA^H^^83|83|(666)666-666||ENG|S||100033344555^^^8832
253|789030200|||||||||||N
PV1|1|D|01362^^^8832253|UR|||555555^Doctor, Original||||||||
|||D|32112345|ab||||||||||||||||||||||||20120925170500|20120
925213000

$ ./mask.awk < infile.txt | fold -60
PID|1|1234567891|888888888^^^8832253|999999999^^^ULI~2444690
^^^PSID|Name,Orig||20010131|F|||123 Orig St^^Calgary^AB^A9B
9D9^CA^H^^83|83|(666)666-666||ENG|S||100033344555^^^8832253|
789030200|||||||||||N

Thanks


 
Newsgroups: comp.unix.shell
From: harryooopot...@hotmail.com
Date: Thu, 27 Sep 2012 08:25:20 -0700 (PDT)
Local: Thurs, Sep 27 2012 11:25 am
Subject: Re: need to mask patient info in HL7 messages
P.S.

I made a mistake in the Phone Number line of mask_spec.txt. It should be
  (123)222-3333;PID;13;1 // Phone Number
instead of
  (123)222-3333;PID;11;13 // Phone Number
so the awk snippet should use
      $14 = "(123)222-3333"
instead of sf[13].


 
Newsgroups: comp.unix.shell
From: Ed Morton <mortons...@gmail.com>
Date: Fri, 28 Sep 2012 07:49:53 -0500
Local: Fri, Sep 28 2012 8:49 am
Subject: Re: need to mask patient info in HL7 messages
On 9/27/2012 9:10 AM, harryooopot...@hotmail.com wrote:

> Ed,

> Your advice and solution are much appreciated.

> Your codes work well with the following simplified input.

Then you could make it much more concise with a function, e.g. (untested):

# Return srcS with the "^"-separated sub-fields listed in deltasA replaced.
# deltasA is emptied afterwards so the caller can reuse the same array.
function updsfs(srcS,deltasA,   srcA,tgtS,sep,i,n) {
    n = split(srcS,srcA,/\^/)
    for (i=1;i<=n;i++) {
       tgtS = tgtS sep (i in deltasA ? deltasA[i] : srcA[i])
       sep = "^"
    }
    split("",deltasA)    # clear all entries, including indices beyond n
    return tgtS
}

BEGIN { FS=OFS="|" }

$1 == "PID" {
    sf[1] = "123456789"
    $3 = updsfs($3,sf)

    sf[1] = "123456789"
    $4 = updsfs($4,sf)

    sf[1] = "123456789"
    $5 = updsfs($5,sf)

    $6 = "Name,Masked"
    $8 = "19010131"

    sf[1] = "123 Random Street"
    sf[5] = "A1B 2D3"
    $12 = updsfs($12,sf)

    $14 = "(123)222-3333"    # PID-13, per the corrected mask_spec line
}

$1 == "PV1" {
    sf[1] = "112233"
    sf[2] = "Attending,Doctor"
    $8 = updsfs($8,sf)
}

{ print }

Regards,

    Ed.


 
Newsgroups: comp.unix.shell
From: Eric <e...@deptj.eu>
Date: Fri, 28 Sep 2012 19:37:37 +0100
Local: Fri, Sep 28 2012 2:37 pm
Subject: Re: need to mask patient info in HL7 messages
On 2012-09-27, harryooopot...@hotmail.com <harryooopot...@hotmail.com> wrote:

> I need a script, awk preferred, to mask sensitive patient info in HL7
> messages.

> The line numbers below do not belong to the HL7 messages; I just added
> them for the sake of clarity in this posting.

> I have some log files containing thousands of HL7 messages, separated by
> blank lines, with real patient data. I need to mask out those sensitive
> patient info before I could send these files to a third party (a Lab
> Report Repostory) for them to use.

Totally OT for this group, but are you sure you should be mapping
(say) all patient IDs to the same value? What about data for the same
patient? I would have said you needed a safe-haven list of real<->fake
IDs (and possibly some other fields).

Eric
--
ms fnd in a lbry
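
A sketch of what such a real<->fake list could look like in awk, so that the same real ID always maps to the same fake ID within a run. Assumptions: only the first component of PID-3 (awk field 4) is treated as the patient ID, fake IDs are just a zero-padded counter, and the names pseudonymize_ids, fake and seq are made up for the example.

```shell
# pseudonymize_ids [MAPPING_FILE] < in_data > out_data   (hypothetical name)
pseudonymize_ids() {
  awk -v map="${1:-/dev/null}" '
    BEGIN { FS = OFS = "|" }
    $1 == "PID" {
      n = split($4, part, /\^/)
      real = part[1]
      if (real != "") {
        if (!(real in fake)) {
          # first time this ID is seen: assign the next fake ID
          fake[real] = sprintf("%09d", ++seq)
          print real, fake[real] > map
        }
        part[1] = fake[real]
        s = part[1]
        for (i = 2; i <= n; i++) s = s "^" part[i]
        $4 = s
      }
    }
    { print }
  '
}
```

The mapping file is the sensitive part: it links fake IDs back to real ones and should not leave the sending organisation. To keep IDs stable across several runs, the list would also have to be read back in at startup, which this sketch does not do.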


 