Got a situation where our main, core data in our main tables consist of 15-20 normal columns (dates, integers, varchar, etc.) and then several "sets" of booleans. Example - health risk factors.....high blood pressure, diabetes, depression.....up to 20 risk factors, let's say. The user can choose none, all, or any combo in between in these sets of booleans. Well each of those fundamentally is just a bit column, with a zero or one, attached to the record concerning the individual person/event record. Well what if I have 8 or 9 "sets" of these boolean questions? This results in 150-200+ columns in my table. I know sql can handle up to 1024, and these data really belong on this record with this individual person/event.
I just wanted to see what others thought from the design perspective. Any suggestions???
On Jun 15, 1:05 am, CoreyB <unc27...@yahoo.com> wrote:
> Got a situation where our main, core data in our main tables consist > of 15-20 normal columns (dates, integers, varchar, etc.) and then > several "sets" of booleans. Example - health risk factors.....high > blood pressure, diabetes, depression.....up to 20 risk factors, let's > say. The user can choose none, all, or any combo in between in these > sets of booleans. Well each of those fundamentally is just a bit > column, with a zero or one, attached to the record concerning the > individual person/event record. Well what if I have 8 or 9 "sets" of > these boolean questions? This results in 150-200+ columns in my > table. I know sql can handle up to 1024, and these data really belong > on this record with this individual person/event.
> I just wanted to see what others thought from the design perspective. > Any suggestions???
That's quite easy. Let's say you have one column called 'Risk Factor' that can take 20 possible values. You can store values like 'Diabetes,High Blood Pressure,Depression'. This is one solution and I would prefer this. Another solution is you can have a varchar(20) with values like 10001000 etc where you set a bit for particular risk factor. You have to know the ordinal position for a particular risk and if that bit is set or not.
> On Jun 15, 1:05 am, CoreyB <unc27...@yahoo.com> wrote:
> > Got a situation where our main, core data in our main tables consist > > of 15-20 normal columns (dates, integers, varchar, etc.) and then > > several "sets" of booleans. Example - health risk factors.....high > > blood pressure, diabetes, depression.....up to 20 risk factors, let's > > say. The user can choose none, all, or any combo in between in these > > sets of booleans. Well each of those fundamentally is just a bit > > column, with a zero or one, attached to the record concerning the > > individual person/event record. Well what if I have 8 or 9 "sets" of > > these boolean questions? This results in 150-200+ columns in my > > table. I know sql can handle up to 1024, and these data really belong > > on this record with this individual person/event.
> > I just wanted to see what others thought from the design perspective. > > Any suggestions???
> That's quite easy. Let's say you have one column called 'Risk Factor' > that can take 20 possible values. You can store values like > 'Diabetes,High Blood Pressure,Depression'. This is one solution and I > would prefer this. Another solution is you can have a varchar(20) with > values like 10001000 etc where you set a bit for particular risk > factor. You have to know the ordinal position for a particular risk > and if that bit is set or not.
and of course if you want to be relational then you have a row for each 'Risk Factor'. So instead of growing the table horizontally you grow them vertically. So if a person has 'Diabetes' and 'High Blood Pressure' there are 2 rows for that person. Then you can group by a particular 'Risk Factor'. For example, find all patients where 'Risk Factor' is 'Diabetes' and sum how much they spent etc on medication etc.
I've seen that approach used in some other places in reading online, mainly with surveys & questionarres. But some issues I have with it are.....
1 - The relational form of a tall skinny table, with each answer as a row seems good if you may not have data for every question/element, so you save space on the questions that aren't answered. We will likely have data for each element. Which means we'll have billions of rows......per year. The size will start to become an issue after several years. 2 - Storage.....At first glance the wide tables seem like they'll be larger. But a lot of the columns are bits, which are optimized for storage, so if I have a wide table with 32 bit columns, they'll only take up 4 bytes of storage. But if I store the same answers (1 or 0) in another table as rows in a generic catch-all varchar column, and then have two IDs of int or bigint tying them back to the question & respondent, that's already 8 or 9 bytes per row minimum * 32 answers = 250+ bytes just for the one respondent. And there's way more than 32 - probably 100 or so. 3 - Data integrity. If everything has its own column in a wide table, then you can make sure that a bit is a bit, and a date is a datetime, and an integer is an int. But if everything is put in one generic column, then you give up a little bit of ground on the data integrity, and then I'm depending on the ETL process, or the developer to validate all data types.
As crappy as it sounds, the wide table looks better to me in this situation. Unless someone out here can talk me out of it.
On Jun 15, 12:29 am, SB <othell...@yahoo.com> wrote:
> On Jun 15, 10:09 am, SB <othell...@yahoo.com> wrote:
> > On Jun 15, 1:05 am, CoreyB <unc27...@yahoo.com> wrote:
> > > Got a situation where our main, core data in our main tables consist > > > of 15-20 normal columns (dates, integers, varchar, etc.) and then > > > several "sets" of booleans. Example - health risk factors.....high > > > blood pressure, diabetes, depression.....up to 20 risk factors, let's > > > say. The user can choose none, all, or any combo in between in these > > > sets of booleans. Well each of those fundamentally is just a bit > > > column, with a zero or one, attached to the record concerning the > > > individual person/event record. Well what if I have 8 or 9 "sets" of > > > these boolean questions? This results in 150-200+ columns in my > > > table. I know sql can handle up to 1024, and these data really belong > > > on this record with this individual person/event.
> > > I just wanted to see what others thought from the design perspective. > > > Any suggestions???
> > That's quite easy. Let's say you have one column called 'Risk Factor' > > that can take 20 possible values. You can store values like > > 'Diabetes,High Blood Pressure,Depression'. This is one solution and I > > would prefer this. Another solution is you can have a varchar(20) with > > values like 10001000 etc where you set a bit for particular risk > > factor. You have to know the ordinal position for a particular risk > > and if that bit is set or not.
> and of course if you want to be relational then you have a row for > each 'Risk Factor'. So instead of growing the table horizontally you > grow them vertically. So if a person has 'Diabetes' and 'High Blood > Pressure' there are 2 rows for that person. Then you can group by a > particular 'Risk Factor'. For example, find all patients where 'Risk > Factor' is 'Diabetes' and sum how much they spent etc on medication > etc.- Hide quoted text -
SB wrote: > On Jun 15, 1:05 am, CoreyB <unc27...@yahoo.com> wrote: >> Got a situation where our main, core data in our main tables consist >> of 15-20 normal columns (dates, integers, varchar, etc.) and then >> several "sets" of booleans. Example - health risk factors.....high >> blood pressure, diabetes, depression.....up to 20 risk factors, let's >> say. The user can choose none, all, or any combo in between in these >> sets of booleans. Well each of those fundamentally is just a bit >> column, with a zero or one, attached to the record concerning the >> individual person/event record. Well what if I have 8 or 9 "sets" of >> these boolean questions? This results in 150-200+ columns in my >> table. I know sql can handle up to 1024, and these data really belong >> on this record with this individual person/event.
>> I just wanted to see what others thought from the design perspective. >> Any suggestions???
> That's quite easy. Let's say you have one column called 'Risk Factor' > that can take 20 possible values. You can store values like > 'Diabetes,High Blood Pressure,Depression'. This is one solution and I > would prefer this. Another solution is you can have a varchar(20) with > values like 10001000 etc where you set a bit for particular risk > factor. You have to know the ordinal position for a particular risk > and if that bit is set or not.
Well, bit fields create a searching an filtering nightmares (ask me how I know :-).)
I don't believe there is a bullet-proof solution. I would consider even a de-normalized 1-to-1 relationship architecture option. For example, having a Patient table (PatientID + personal data), then RiskFactors table (PatientID + 20+ risk factor bit fields), etc. Since in most cases you search either for a singe patient or for group of patients that meet certain criteria, massive multi-table joins will not be required too often.
> I've seen that approach used in some other places in reading online, > mainly with surveys & questionarres. But some issues I have with it > are.....
> 1 - The relational form of a tall skinny table, with each answer as a > row seems good if you may not have data for every question/element, so > you save space on the questions that aren't answered. We will likely > have data for each element. Which means we'll have billions of > rows......per year. The size will start to become an issue after > several years. > 2 - Storage.....At first glance the wide tables seem like they'll be > larger. But a lot of the columns are bits, which are optimized for > storage, so if I have a wide table with 32 bit columns, they'll only > take up 4 bytes of storage. But if I store the same answers (1 or 0) > in another table as rows in a generic catch-all varchar column, and > then have two IDs of int or bigint tying them back to the question & > respondent, that's already 8 or 9 bytes per row minimum * 32 answers = > 250+ bytes just for the one respondent. And there's way more than 32 > - probably 100 or so. > 3 - Data integrity. If everything has its own column in a wide table, > then you can make sure that a bit is a bit, and a date is a datetime, > and an integer is an int. But if everything is put in one generic > column, then you give up a little bit of ground on the data integrity, > and then I'm depending on the ETL process, or the developer to > validate all data types.
> As crappy as it sounds, the wide table looks better to me in this > situation. Unless someone out here can talk me out of it.
> On Jun 15, 12:29 am, SB <othell...@yahoo.com> wrote:
> > On Jun 15, 10:09 am, SB <othell...@yahoo.com> wrote:
> > > On Jun 15, 1:05 am, CoreyB <unc27...@yahoo.com> wrote:
> > > > Got a situation where our main, core data in our main tables consist > > > > of 15-20 normal columns (dates, integers, varchar, etc.) and then > > > > several "sets" of booleans. Example - health risk factors.....high > > > > blood pressure, diabetes, depression.....up to 20 risk factors, let's > > > > say. The user can choose none, all, or any combo in between in these > > > > sets of booleans. Well each of those fundamentally is just a bit > > > > column, with a zero or one, attached to the record concerning the > > > > individual person/event record. Well what if I have 8 or 9 "sets" of > > > > these boolean questions? This results in 150-200+ columns in my > > > > table. I know sql can handle up to 1024, and these data really belong > > > > on this record with this individual person/event.
> > > > I just wanted to see what others thought from the design perspective. > > > > Any suggestions???
> > > That's quite easy. Let's say you have one column called 'Risk Factor' > > > that can take 20 possible values. You can store values like > > > 'Diabetes,High Blood Pressure,Depression'. This is one solution and I > > > would prefer this. Another solution is you can have a varchar(20) with > > > values like 10001000 etc where you set a bit for particular risk > > > factor. You have to know the ordinal position for a particular risk > > > and if that bit is set or not.
> > and of course if you want to be relational then you have a row for > > each 'Risk Factor'. So instead of growing the table horizontally you > > grow them vertically. So if a person has 'Diabetes' and 'High Blood > > Pressure' there are 2 rows for that person. Then you can group by a > > particular 'Risk Factor'. For example, find all patients where 'Risk > > Factor' is 'Diabetes' and sum how much they spent etc on medication > > etc.- Hide quoted text -
> > - Show quoted text -- Hide quoted text -
> - Show quoted text -
I think the first solution should work where you store the values as coma (or any other delimiter) separated values. For example for certain risk factor, for a customer values can be: diabetes,high blood pressure.