Hi Stian and Alasdair,
There's a real use case along these lines in the Broad's TCGA Firehose pipeline. For each tumor type and each platform (gene expression, miRNA, methylation, etc.), the data is stored per subject. One step of the Broad pipeline 'merges' these per-subject files: for each column of the original files, it produces a single file holding that column's values for every subject. So for miRNA, there is one file that merges all the raw values, one that merges all the RPKM values, and one that merges the values from the cross-mapping column. The gene names are not duplicated; they are the row headers. No values are changed, it's just some modest reformatting and filtering.
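To make the merge step concrete, here is a minimal Python sketch of "one merged file per column". It is not the Broad's code and does not reproduce their file layout. It assumes each per-subject file is tab-separated with a gene-ID column plus value columns. The column names (`gene`, `raw`, `rpkm`, `cross_mapped`), the subject IDs, and the function names are hypothetical. It also assumes subjects become the columns of the merged table and that a gene missing from a subject is written as an empty cell. Values are copied through unchanged.

```python
import csv
import io
from typing import Dict, List


def read_subject_table(text: str, id_column: str) -> Dict[str, Dict[str, str]]:
    """Parse one tab-separated per-subject file into {gene_id: {column: value}}."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    return {row[id_column]: row for row in reader}


def merge_column(subject_texts: Dict[str, str], id_column: str, value_column: str) -> str:
    """Build one merged table for a single value column.

    Gene IDs are the row headers (each listed once) and each subject is one
    column (an assumption). Values are copied verbatim; a gene absent from a
    subject's file gets an empty cell (also an assumption).
    """
    tables = {s: read_subject_table(t, id_column) for s, t in subject_texts.items()}
    subjects = sorted(tables)

    # Union of gene IDs across all subjects, in first-seen order.
    gene_ids: List[str] = []
    seen = set()
    for subject in subjects:
        for gene in tables[subject]:
            if gene not in seen:
                seen.add(gene)
                gene_ids.append(gene)

    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow([id_column] + subjects)
    for gene in gene_ids:
        row = [gene]
        for subject in subjects:
            record = tables[subject].get(gene)
            row.append(record[value_column] if record is not None else "")
        writer.writerow(row)
    return out.getvalue()


if __name__ == "__main__":
    # Hypothetical per-subject miRNA files with made-up column names.
    subject_files = {
        "subject_A": "gene\traw\trpkm\tcross_mapped\nmir-1\t10\t2.5\tN\nmir-2\t3\t0.7\tY\n",
        "subject_B": "gene\traw\trpkm\tcross_mapped\nmir-1\t7\t1.9\tN\nmir-2\t0\t0.0\tN\n",
    }
    # One merged output per value column, mirroring the raw / RPKM / cross-mapping files.
    for column in ("raw", "rpkm", "cross_mapped"):
        print(merge_column(subject_files, "gene", column))
```

Run on the sample data, the `raw` merge has a header row `gene subject_A subject_B` followed by one row per gene. This illustrates the point that nothing is recomputed, only regrouped.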
Institute for Systems Biology
For an example, see http://gdac.broadinstitute.org/runs/stddata__2014_07_15/data/STAD/20140715/ and the file gdac.broadinstitute.org_STAD.Merge_mirnaseq__illuminaga_mirnaseq__bcgsc_ca__Level_3__miR_gene_expression__data.Level_3.2014071500.0.0.tar.gz