Is I2B2DEMODATA.CONCEPT_DIMENSION.CONCEPT_CD supposed to be unique?

AFB

Sep 22, 2014, 7:52:16 PM
to i2b2-ins...@googlegroups.com
If I understand correctly, the only thing linking I2B2DEMODATA.OBSERVATION_FACT to I2B2DEMODATA.CONCEPT_DIMENSION is the CONCEPT_CD column. Is that correct?

Therefore, there must be a unique constraint on the CONCEPT_CD column in CONCEPT_DIMENSION; otherwise joins would explode, with each fact repeated once for every CONCEPT_DIMENSION row that shares its code. Is that also correct?
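To make the "exploding join" concrete, here is a minimal sketch. It assumes a cut-down schema that keeps only the CONCEPT_DIMENSION and OBSERVATION_FACT columns involved, and it uses invented codes (MED:ASA, DX:HTN) and paths. It runs against an in-memory SQLite database through Python's standard sqlite3 module, not against the i2b2 demo database.

```python
import sqlite3

# Hypothetical toy rows. Column names follow the i2b2 CRC tables; the data is invented.
CONCEPTS = [
    ("\\Meds\\Cardiac\\Aspirin\\", "MED:ASA", "Aspirin"),
    ("\\Meds\\AntiInflammatory\\Aspirin\\", "MED:ASA", "Aspirin"),
    ("\\Dx\\Hypertension\\", "DX:HTN", "Hypertension"),
]
FACTS = [(1, "MED:ASA"), (2, "DX:HTN")]


def build_db():
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE concept_dimension ("
                "concept_path TEXT PRIMARY KEY, concept_cd TEXT, name_char TEXT)")
    con.execute("CREATE TABLE observation_fact (encounter_num INTEGER, concept_cd TEXT)")
    con.executemany("INSERT INTO concept_dimension VALUES (?, ?, ?)", CONCEPTS)
    con.executemany("INSERT INTO observation_fact VALUES (?, ?)", FACTS)
    return con


def naive_join(con):
    # Join on CONCEPT_CD alone: a fact matches every row that shares its code.
    return con.execute(
        "SELECT f.encounter_num, c.concept_path "
        "FROM observation_fact f "
        "JOIN concept_dimension c ON c.concept_cd = f.concept_cd "
        "ORDER BY f.encounter_num, c.concept_path"
    ).fetchall()


rows = naive_join(build_db())
# Two facts in, three rows out: encounter 1 appears once per aspirin path.
print(len(rows))  # 3
```

With a unique CONCEPT_CD, the same join would return exactly one row per fact.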

Thanks.

AFB

Sep 22, 2014, 9:32:16 PM
to i2b2-ins...@googlegroups.com
Okay, it looks like CONCEPT_CD is not unique in the actual I2B2DEMODATA content:

select 'non unique' as kind, count(*) as n_codes from (
  select concept_cd, count(*) as n from concept_dimension
  group by concept_cd) dup
where n > 1
union all
select 'unique', count(*) from (
  select concept_cd, count(*) as n from concept_dimension
  group by concept_cd) uniq
where n = 1;

/*
non unique    1671
unique    70130
*/
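The same two counts can also be obtained in a single pass with conditional aggregation, and a HAVING clause lists which codes are duplicated. This is a sketch on invented rows in an in-memory SQLite database (via Python's sqlite3). The SQL itself is generic and should port to Oracle or PostgreSQL with little change, though that is an assumption and has not been checked against the demo data.

```python
import sqlite3


def build_db():
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE concept_dimension (concept_path TEXT PRIMARY KEY, concept_cd TEXT)")
    # Hypothetical rows: MED:ASA is filed under two paths, the others under one.
    con.executemany("INSERT INTO concept_dimension VALUES (?, ?)", [
        ("\\Meds\\Cardiac\\Aspirin\\", "MED:ASA"),
        ("\\Meds\\AntiInflammatory\\Aspirin\\", "MED:ASA"),
        ("\\Dx\\Hypertension\\", "DX:HTN"),
        ("\\Labs\\Glucose\\", "LAB:GLU"),
    ])
    return con


# One pass over the per-code counts instead of two UNIONed subqueries.
COUNT_SQL = """
SELECT SUM(CASE WHEN n > 1 THEN 1 ELSE 0 END) AS non_unique,
       SUM(CASE WHEN n = 1 THEN 1 ELSE 0 END) AS unique_codes
FROM (SELECT concept_cd, COUNT(*) AS n
      FROM concept_dimension
      GROUP BY concept_cd) per_code
"""

# Which codes are shared by more than one path, and how many paths each has.
DUPLICATES_SQL = """
SELECT concept_cd, COUNT(*) AS n
FROM concept_dimension
GROUP BY concept_cd
HAVING COUNT(*) > 1
ORDER BY concept_cd
"""

con = build_db()
counts = con.execute(COUNT_SQL).fetchone()
duplicates = con.execute(DUPLICATES_SQL).fetchall()
print(counts, duplicates)  # (1, 2) [('MED:ASA', 2)]
```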

...but then, I2B2 devs, what's the point of even having a CONCEPT_DIMENSION? The many-to-many relationship between paths and the actual physical, non-redundant observations is already served by the ontology tables in I2B2METADATA. CONCEPT_DIMENSION could instead be a mapping to a set of reference paths and descriptions, chosen so that when starting from a set of observations (e.g. "show me the most frequent diagnoses between these dates for these patients") a human-readable set of annotations is always available. But when CONCEPT_CD is allowed to be non-unique, such joins will return redundant observations.

So, my question becomes this: what, if anything, in I2B2 actually relies on there being this many-to-many relationship between entries in CONCEPT_DIMENSION and OBSERVATION_FACT?

If nothing, then I should be able to work around this problem by altering CONCEPT_DIMENSION so that it only has unique CONCEPT_CDs, without impacting anything else. Thoughts?
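A less invasive variant of that workaround is to leave CONCEPT_DIMENSION untouched and join facts through a view that keeps one row per CONCEPT_CD. Below is a minimal sketch in SQLite via Python's sqlite3. The view name `concept_dimension_unique` is hypothetical, the rows are invented, and picking the smallest CONCEPT_PATH per code is an arbitrary tie-breaker chosen only for illustration.

```python
import sqlite3


def build_db():
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE concept_dimension ("
                "concept_path TEXT PRIMARY KEY, concept_cd TEXT, name_char TEXT)")
    con.execute("CREATE TABLE observation_fact (encounter_num INTEGER, concept_cd TEXT)")
    con.executemany("INSERT INTO concept_dimension VALUES (?, ?, ?)", [
        ("\\Meds\\Cardiac\\Aspirin\\", "MED:ASA", "Aspirin"),
        ("\\Meds\\AntiInflammatory\\Aspirin\\", "MED:ASA", "Aspirin"),
        ("\\Dx\\Hypertension\\", "DX:HTN", "Hypertension"),
    ])
    con.executemany("INSERT INTO observation_fact VALUES (?, ?)",
                    [(1, "MED:ASA"), (2, "DX:HTN")])
    # Hypothetical view: keep the row with the smallest CONCEPT_PATH per code.
    # CONCEPT_PATH is the primary key, so exactly one row survives per code.
    con.execute("""
        CREATE VIEW concept_dimension_unique AS
        SELECT c.concept_cd, c.concept_path, c.name_char
        FROM concept_dimension c
        WHERE c.concept_path = (SELECT MIN(c2.concept_path)
                                FROM concept_dimension c2
                                WHERE c2.concept_cd = c.concept_cd)
    """)
    return con


con = build_db()
# Joining through the view returns one row per fact, with a readable name.
rows = con.execute("""
    SELECT f.encounter_num, u.name_char, u.concept_path
    FROM observation_fact f
    JOIN concept_dimension_unique u ON u.concept_cd = f.concept_cd
    ORDER BY f.encounter_num
""").fetchall()
for row in rows:
    print(row)
```

Because the base table is unchanged, anything that does rely on multiple paths per code keeps working; only queries that opt into the view get the one-row-per-code behaviour.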

Murphy, Shawn N.

Sep 23, 2014, 7:15:53 AM
to i2b2-ins...@googlegroups.com

The primary key of the concept dimension is the concept_path. The logical interpretation is that a single coded item (such as the medication “aspirin”) could exist under multiple classification paths, such as an anti-inflammatory medication and a cardiac medication.
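A sketch of what that key choice permits, assuming a cut-down CONCEPT_DIMENSION in an in-memory SQLite database (Python's sqlite3) and an invented code MED:ASA for aspirin. Filing one code under two paths is accepted, while reusing an existing path is rejected by the primary key.

```python
import sqlite3


def build_db():
    con = sqlite3.connect(":memory:")
    # CONCEPT_PATH is the key; CONCEPT_CD is deliberately not unique.
    con.execute("CREATE TABLE concept_dimension ("
                "concept_path TEXT PRIMARY KEY, concept_cd TEXT, name_char TEXT)")
    return con


con = build_db()
# One coded item (hypothetical code MED:ASA) under two classification paths: allowed.
con.executemany("INSERT INTO concept_dimension VALUES (?, ?, ?)", [
    ("\\Meds\\Cardiac\\Aspirin\\", "MED:ASA", "Aspirin"),
    ("\\Meds\\AntiInflammatory\\Aspirin\\", "MED:ASA", "Aspirin"),
])
paths_for_code = con.execute(
    "SELECT COUNT(*) FROM concept_dimension WHERE concept_cd = ?", ("MED:ASA",)
).fetchone()[0]

# Reusing an existing path is rejected by the primary key.
try:
    con.execute("INSERT INTO concept_dimension VALUES (?, ?, ?)",
                ("\\Meds\\Cardiac\\Aspirin\\", "MED:OTHER", "Other"))
    duplicate_path_rejected = False
except sqlite3.IntegrityError:
    duplicate_path_rejected = True

print(paths_for_code, duplicate_path_rejected)  # 2 True
```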


Thanks,

Shawn.

--
You received this message because you are subscribed to the Google Groups "i2b2 Install Help" group.
To unsubscribe from this group and stop receiving emails from it, send an email to i2b2-install-h...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


AFB

Sep 23, 2014, 10:47:00 AM
to i2b2-ins...@googlegroups.com
Hi, Shawn, long time no see and thanks for your response.

Aren't multiple concept paths already provided by the ontologies in the I2B2METADATA schema?

For the aspirin example, if you use CONCEPT_PATH to search for encounters where aspirin was prescribed as a cardiac medication, you will also get all visits associated with anti-inflammatory prescriptions, and vice versa.
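i2b2 ontology rows typically point a query at CONCEPT_DIMENSION.CONCEPT_PATH (through metadata columns such as C_TABLENAME, C_COLUMNNAME and C_DIMCODE): the path is resolved to a set of concept codes, and facts are then matched on CONCEPT_CD alone. The sketch below approximates that query shape in an in-memory SQLite database via Python's sqlite3. It is not the exact SQL the CRC cell generates, and the codes, paths and encounters are invented.

```python
import sqlite3


def build_db():
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE concept_dimension (concept_path TEXT PRIMARY KEY, concept_cd TEXT)")
    con.execute("CREATE TABLE observation_fact (encounter_num INTEGER, concept_cd TEXT)")
    con.executemany("INSERT INTO concept_dimension VALUES (?, ?)", [
        ("\\Meds\\Cardiac\\Aspirin\\", "MED:ASA"),
        ("\\Meds\\AntiInflammatory\\Aspirin\\", "MED:ASA"),
        ("\\Dx\\Hypertension\\", "DX:HTN"),
    ])
    # Encounter 1: aspirin given as a cardiac med; encounter 2: as an
    # anti-inflammatory. The fact table only records the code, so that
    # intent is not stored anywhere.
    con.executemany("INSERT INTO observation_fact VALUES (?, ?)",
                    [(1, "MED:ASA"), (2, "MED:ASA"), (3, "DX:HTN")])
    return con


def encounters_under(con, path_prefix):
    # Resolve the path to codes via CONCEPT_DIMENSION, then match facts by code.
    # SQLite's LIKE has no escape character by default, so backslashes are literal.
    rows = con.execute("""
        SELECT DISTINCT encounter_num FROM observation_fact
        WHERE concept_cd IN (SELECT concept_cd FROM concept_dimension
                             WHERE concept_path LIKE ? || '%')
        ORDER BY encounter_num
    """, (path_prefix,)).fetchall()
    return [r[0] for r in rows]


con = build_db()
cardiac = encounters_under(con, "\\Meds\\Cardiac\\")
anti_inflammatory = encounters_under(con, "\\Meds\\AntiInflammatory\\")
print(cardiac, anti_inflammatory)  # [1, 2] [1, 2]
```

Both searches return the same encounters, which is the ambiguity described above.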

If CONCEPT_PATHs encode actual data rather than metadata, that information is lost once the observation is imported into OBSERVATION_FACT, because of this ambiguity in the mapping between CONCEPT_CD and CONCEPT_PATH. So it follows that either OBSERVATION_FACT should contain a CONCEPT_PATH column or a uniqueness constraint should be enforced on CONCEPT_CD.
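The uniqueness-constraint option could look roughly like this. This is a sketch on invented rows in an in-memory SQLite database via Python's sqlite3, with a hypothetical index name (`concept_cd_uq`). The UNIQUE index cannot be created while duplicate codes exist, so the sketch first deletes every row except the one with the smallest CONCEPT_PATH for each code. That tie-breaker is arbitrary, and it discards the alternative classification paths for those codes.

```python
import sqlite3


def build_db():
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE concept_dimension (concept_path TEXT PRIMARY KEY, concept_cd TEXT)")
    con.executemany("INSERT INTO concept_dimension VALUES (?, ?)", [
        ("\\Meds\\Cardiac\\Aspirin\\", "MED:ASA"),
        ("\\Meds\\AntiInflammatory\\Aspirin\\", "MED:ASA"),
        ("\\Dx\\Hypertension\\", "DX:HTN"),
    ])
    con.commit()
    return con


def enforce_unique_code(con):
    """Return (first_attempt_ok, rows_left) for adding a UNIQUE index on CONCEPT_CD."""
    create_index = "CREATE UNIQUE INDEX concept_cd_uq ON concept_dimension (concept_cd)"
    try:
        con.execute(create_index)
        first_attempt_ok = True
    except sqlite3.IntegrityError:
        # Existing duplicate codes block the constraint.
        first_attempt_ok = False
        # Keep only the smallest path per code (an arbitrary choice), then retry.
        con.execute("""
            DELETE FROM concept_dimension
            WHERE concept_path <> (SELECT MIN(c2.concept_path)
                                   FROM concept_dimension c2
                                   WHERE c2.concept_cd = concept_dimension.concept_cd)
        """)
        con.execute(create_index)
    rows_left = con.execute("SELECT COUNT(*) FROM concept_dimension").fetchone()[0]
    return first_attempt_ok, rows_left


print(enforce_unique_code(build_db()))  # (False, 2)
```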

I'm guessing it's easier to implement the latter fix on my end. I'm just wondering which components of I2B2 query CONCEPT_DIMENSION.CONCEPT_PATH to find concept codes instead of querying, e.g., I2B2METADATA.I2B2.C_FULLNAME.

Murphy, Shawn N.

Sep 23, 2014, 10:59:30 AM
to i2b2-ins...@googlegroups.com

If you want to go from a fact table code to identifying its concept, c_name should be unique and is probably what you are looking for. A real-world problem is that each item can have multiple classification paths, and that is what the concept path deals with.
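Going from a fact's code back to its classification paths therefore yields a list rather than a single path. Here is a minimal sketch that groups the paths per code instead of letting them multiply the fact rows. It assumes a cut-down CONCEPT_DIMENSION and OBSERVATION_FACT in an in-memory SQLite database (Python's sqlite3) with invented codes and paths.

```python
import sqlite3


def build_db():
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE concept_dimension (concept_path TEXT PRIMARY KEY, concept_cd TEXT)")
    con.execute("CREATE TABLE observation_fact (encounter_num INTEGER, concept_cd TEXT)")
    con.executemany("INSERT INTO concept_dimension VALUES (?, ?)", [
        ("\\Meds\\Cardiac\\Aspirin\\", "MED:ASA"),
        ("\\Meds\\AntiInflammatory\\Aspirin\\", "MED:ASA"),
        ("\\Dx\\Hypertension\\", "DX:HTN"),
    ])
    con.executemany("INSERT INTO observation_fact VALUES (?, ?)",
                    [(1, "MED:ASA"), (2, "MED:ASA"), (3, "DX:HTN")])
    return con


def classification_paths(con):
    """Map each code used in OBSERVATION_FACT to every CONCEPT_PATH it is filed under."""
    rows = con.execute("""
        SELECT DISTINCT f.concept_cd, c.concept_path
        FROM observation_fact f
        JOIN concept_dimension c ON c.concept_cd = f.concept_cd
        ORDER BY f.concept_cd, c.concept_path
    """).fetchall()
    paths = {}
    for code, path in rows:
        # Group instead of multiplying fact rows: one list entry per path.
        paths.setdefault(code, []).append(path)
    return paths


for code, paths in classification_paths(build_db()).items():
    print(code, paths)
```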


Thanks,

Shawn.

