Parquet and columns that aren't always shared


db042188

May 9, 2018, 7:35:27 AM
to Hadoop Learners from Hadoop-skills.com
Hi. Someone suggested Parquet for my first dive into Hadoop, for an app that holds survey data. The data spans questions x answers x forms x studies x person being evaluated x person doing the evaluation x evaluation number, plus a host of other dimensions (attributes). Not all attributes are shared across all studies, which, as I understand it, means not all Parquet columns will be relevant to all evaluations. Can Parquet handle this? My goal is to aggregate the metrics contained in the data, but also to handle the comments contained in about 7% of the 100 million data points. An answer can be either a metric or a comment.
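
To make the shape of the data concrete, here is a minimal sketch in plain Python. Every field name and value in it (region, cohort, eval_no and so on) is made up for illustration and is not my real schema. It only shows what I mean by attributes that exist for some studies and not others, and an answer being either a metric or a comment. It says nothing about how Parquet itself would store this.

```python
from collections import defaultdict

# Hypothetical survey answers. Field names and values are illustrative only.
# "region" applies only to study A, "cohort" only to study B.
records = [
    {"study": "A", "form": "F1", "question": "Q1", "subject": "s1",
     "evaluator": "e1", "eval_no": 1, "region": "west", "metric": 4.0},
    {"study": "A", "form": "F1", "question": "Q2", "subject": "s1",
     "evaluator": "e1", "eval_no": 1, "region": "west", "metric": 2.0},
    {"study": "A", "form": "F1", "question": "Q3", "subject": "s1",
     "evaluator": "e1", "eval_no": 1, "region": "west",
     "comment": "Clear communicator."},
    {"study": "B", "form": "F2", "question": "Q1", "subject": "s2",
     "evaluator": "e2", "eval_no": 1, "cohort": "2017", "metric": 5.0},
]

# The union of all attributes is the "wide" column set I am asking about.
all_columns = sorted({key for rec in records for key in rec})

# One wide row per answer: attributes that do not apply are left as None.
wide = [{col: rec.get(col) for col in all_columns} for rec in records]

# Aggregate the metrics per study, ignoring rows that carry a comment instead.
metrics_by_study = defaultdict(list)
for row in wide:
    if row["metric"] is not None:
        metrics_by_study[row["study"]].append(row["metric"])
mean_by_study = {s: sum(v) / len(v) for s, v in metrics_by_study.items()}

# The comment rows need to stay reachable for separate handling.
comments = [row["comment"] for row in wide if row["comment"] is not None]
```

So every row ends up with the full column set, with None wherever an attribute does not apply or the answer is of the other kind, and that sparseness is what I am unsure about.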