to Hadoop Learners from Hadoop-skills.com
Hi. Someone suggested Parquet for my first dive into Hadoop, for an app that contains survey data. The survey data spans questions x answers x forms x studies x person being evaluated x person doing the evaluation x evaluation number, plus a host of other dimensions (attributes). Not all attributes are shared across all studies, which, as I understand it, means not every Parquet column will be relevant to every evaluation. Can Parquet handle this? My goal is to aggregate the metrics contained in the data, but also to handle the comments, which make up about 7% of the 100 million data points. An answer can be either a metric or a comment.
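To make the shape of the data concrete, here is a minimal plain-Python sketch of the layout in question. It does not use the Parquet API. It only illustrates the idea that attributes missing from some studies become nulls in a columnar layout, and that metrics can be aggregated while comments are kept separately. The field names (`study`, `question`, `evaluator`, `metric`, `comment`, `site`) and the sample values are assumptions made up for illustration.

```python
from statistics import mean

# Hypothetical survey records; every field name and value is illustrative.
# An answer carries either a "metric" or a "comment", and "site" exists
# only for study B.
records = [
    {"study": "A", "question": "q1", "evaluator": "e1", "metric": 4.0},
    {"study": "A", "question": "q2", "evaluator": "e1", "comment": "Clear communicator"},
    {"study": "B", "question": "q1", "evaluator": "e2", "metric": 2.0, "site": "north"},
]


def to_columns(rows):
    """Pivot row dicts into one list per column, filling gaps with None."""
    names = sorted({key for row in rows for key in row})
    return {name: [row.get(name) for row in rows] for name in names}


def mean_metric_by(columns, key):
    """Average the non-null metric values, grouped by the given column."""
    groups = {}
    for group, value in zip(columns[key], columns["metric"]):
        if value is not None:  # comment rows have no metric, so skip them
            groups.setdefault(group, []).append(value)
    return {group: mean(values) for group, values in groups.items()}


columns = to_columns(records)
metric_by_study = mean_metric_by(columns, "study")
comments = [text for text in columns["comment"] if text is not None]
```

In this sketch, `columns["site"]` holds `None` for the two study A rows, which is roughly how an optional column that applies to only some studies would look: present in the schema, null where it does not apply.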