Hi Selva,
I was able to reproduce the issue locally:
In [1]: import modin.pandas as pd
In [2]: df = pd.read_csv("2e16x2e6.csv.gz")
In [3]: df
Out[3]:
Unnamed: 0 Unnamed: 0.1 col0 col1 col2 col3 col4 col5 col6 col7 col8 col53 col54 col55 col56 col57 col58 col59 col60 col61 col62 col63
0 0 0 90 32 83 38 31 31 95 72 55 67 2 2 74 22 11 68 60 64 84 52
1 1 1 60 14 0 53 76 56 74 24 54 29 57 20 35 73 28 6 96 6 75 77
2 2 2 38 94 7 11 42 50 38 30 11 2 86 2 66 47 97 24 35 30 61 66
3 3 3 76 60 49 57 79 21 26 69 30 38 59 89 99 23 65 48 2 73 79 30
4 4 4 12 90 71 50 0 43 70 17 64 3 57 39 77 68 45 91 6 59 25 55
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
65531 65531 65531 30 13 28 1 8 37 97 33 41 42 21 19 15 22 23 59 11 1 39 49
65532 65532 65532 34 25 5 22 19 8 13 45 63 95 72 16 95 22 96 71 80 94 3 30
65533 65533 65533 78 12 86 13 55 56 26 15 16 47 0 25 56 4 79 57 50 20 17 13
65534 65534 65534 69 38 55 35 21 67 35 9 28 64 27 99 37 39 42 7 97 74 82 65
65535 65535 65535 84 67 48 95 21 49 51 49 18 41 68 52 85 42 14 96 99 88 10 15
[65536 rows x 66 columns]
In [4]: df[df.columns.difference(df.filter(like='Unnamed').columns)]
Out[4]:
col0 col1 col10 col11 col12 col13 col14 col15 col16 col17 col18 col57 col58 col59 col6 col60 col61 col62 col63 col7 col8 col9
0 90 32 14 15 71 10 46 84 81 25 4 22 11 68 95 60 64 84 52 72 55 93
1 60 14 9 1 12 30 86 99 55 30 45 73 28 6 74 96 6 75 77 24 54 45
2 38 94 65 87 29 26 96 19 81 71 98 47 97 24 38 35 30 61 66 30 11 16
3 76 60 68 13 19 94 90 58 93 59 81 23 65 48 26 2 73 79 30 69 30 99
4 12 90 79 90 37 69 40 66 88 91 51 68 45 91 70 6 59 25 55 17 64 78
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
65531 30 13 48 29 53 37 55 12 66 50 59 22 23 59 97 11 1 39 49 33 41 47
65532 34 25 86 97 41 71 25 77 88 43 12 22 96 71 13 80 94 3 30 45 63 68
65533 78 12 8 31 0 22 22 64 77 5 52 4 79 57 26 50 20 17 13 15 16 20
65534 69 38 66 26 40 55 83 84 58 3 93 39 42 7 35 97 74 82 65 9 28 22
65535 84 67 6 0 0 24 88 90 74 58 15 42 14 96 51 99 88 10 15 49 18 53
[65536 rows x 64 columns]
In [5]: df.columns
Out[5]:
Index(['Unnamed: 0', 'Unnamed: 0.1', 'col0', 'col1', 'col2', 'col3', 'col4',
       'col5', 'col6', 'col7', 'col8', 'col9', 'col10', 'col11', 'col12',
       'col13', 'col14', 'col15', 'col16', 'col17', 'col18', 'col19', 'col20',
       'col21', 'col22', 'col23', 'col24', 'col25', 'col26', 'col27', 'col28',
       'col29', 'col30', 'col31', 'col32', 'col33', 'col34', 'col35', 'col36',
       'col37', 'col38', 'col39', 'col40', 'col41', 'col42', 'col43', 'col44',
       'col45', 'col46', 'col47', 'col48', 'col49', 'col50', 'col51', 'col52',
       'col53', 'col54', 'col55', 'col56', 'col57', 'col58', 'col59', 'col60',
       'col61', 'col62', 'col63'],
      dtype='object')
This is a metadata issue: we drop the columns from the data itself, but not from the column metadata, which is tracked separately. It should be a simple fix. Thanks for the report! I will open an issue on the GitHub repo to track this.
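To illustrate the kind of mismatch I mean, here is a toy sketch in plain Python. It is not Modin's actual code: the ToyFrame class, its attributes, and both drop methods are made up for illustration, and the real fix may look quite different. It only shows how a drop that updates the data but leaves separately cached column labels untouched ends up reporting stale columns.

```python
# Toy illustration only: ToyFrame is hypothetical and is NOT Modin's implementation.
class ToyFrame:
    def __init__(self, data):
        # data maps column name -> list of values
        self._data = dict(data)
        # Column labels are cached separately from the data itself.
        self._columns_cache = list(self._data)

    @property
    def columns(self):
        # Reported labels come from the cache, not from the data.
        return self._columns_cache

    @property
    def width(self):
        # Actual number of columns held in the data.
        return len(self._data)

    def drop_buggy(self, labels):
        # Removes the columns from the data but carries over the old labels.
        kept = {k: v for k, v in self._data.items() if k not in labels}
        out = ToyFrame(kept)
        out._columns_cache = list(self._columns_cache)  # stale metadata
        return out

    def drop_fixed(self, labels):
        # Removes the columns and lets the labels be rebuilt from the kept data.
        kept = {k: v for k, v in self._data.items() if k not in labels}
        return ToyFrame(kept)


frame = ToyFrame({"Unnamed: 0": [0, 1], "col0": [90, 60], "col1": [32, 14]})
buggy = frame.drop_buggy({"Unnamed: 0"})
fixed = frame.drop_fixed({"Unnamed: 0"})

print(buggy.width, buggy.columns)  # data and labels disagree
print(fixed.width, fixed.columns)  # data and labels agree
```

In the buggy variant the data has two columns while the cached labels still list three, which is the same flavor of inconsistency as above.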
Devin