Dear all,
I have a dataset with wide format data, and try to reshape it into long format. I encounter a problem while using pandas.wide_to_long of columns. A sample of data is attached. My code is
ins = pandas.read_excel('sampleData.xlsx', na_values=['--'])
ins.columns = ['stkcd', 'coname'] + [y + str(x) for y in ['revt', 'rev', 'ni', 'eps', 'cfo', 'beforeextraper'] for x in range(2003, 2015)]
ins_long = pandas.wide_to_long(ins, ['revt', 'rev', 'ni', 'eps', 'cfo', 'beforeextraper'], i = 'stkcd', j='year')
And then, I got a ValueError:
--------------------------------------------------------------------------- ValueError Traceback (most recent call last) <ipython-input-91-4c4c324681e7> in <module>() 3 for x in range(2003, 2015)] 4 ins_long = pandas.wide_to_long(ins, ['revt', 'rev', 'ni', 'eps', 'cfo', 'beforeextraper'], ----> 5 i = 'stkcd', j='year') /usr/lib/python3.4/site-packages/pandas/core/reshape.py in wide_to_long(df, stubnames, i, j) 938 for stub in stubnames[1:]: 939 new = melt_stub(df, stub, id_vars, j) --> 940 newdf = newdf.merge(new, how="outer", on=id_vars + [j], copy=False) 941 return newdf.set_index([i, j]) 942 /usr/lib/python3.4/site-packages/pandas/core/frame.py in merge(self, right, how, on, left_on, right_on, left_index, right_index, sort, suffixes, copy) 4137 left_on=left_on, right_on=right_on, 4138 left_index=left_index, right_index=right_index, sort=sort, -> 4139 suffixes=suffixes, copy=copy) 4140 4141 #---------------------------------------------------------------------- /usr/lib/python3.4/site-packages/pandas/tools/merge.py in merge(left, right, how, on, left_on, right_on, left_index, right_index, sort, suffixes, copy) 36 right_index=right_index, sort=sort, suffixes=suffixes, 37 copy=copy) ---> 38 return op.get_result() 39 if __debug__: 40 merge.__doc__ = _merge_doc % '\nleft : DataFrame' /usr/lib/python3.4/site-packages/pandas/tools/merge.py in get_result(self) 203 result = typ(result_data).__finalize__(self, method='merge') 204 --> 205 self._maybe_add_join_keys(result, left_indexer, right_indexer) 206 207 return result /usr/lib/python3.4/site-packages/pandas/tools/merge.py in _maybe_add_join_keys(self, result, left_indexer, right_indexer) 228 key_col.put( 229 na_indexer, com.take_1d(self.right_join_keys[i], --> 230 right_na_indexer)) 231 elif name in self.right: 232 na_indexer = (right_indexer == -1).nonzero()[0] /usr/lib/python3.4/site-packages/pandas/core/series.py in put(self, *args, **kwargs) 376 numpy.ndarray.put 377 """ --> 378 self.values.put(*args, **kwargs) 379 380 def __len__(self): ValueError: could not convert string to float: 't2014'
If I change the columns of DataFrame ins into this,
ins.columns = ['stkcd', 'coname'] + [y + str(x) for y in ['trev', 'rev', 'ni', 'eps', 'cfo', 'beforeextraper'] for x in range(2003, 2015)]
ins_long = pandas.wide_to_long(ins, ['trev', 'rev', 'ni', 'eps', 'cfo', 'beforeextraper'], i = 'stkcd', j='year')
They give me the right answers. It seems that the problem comes from the parsing names of column, “revt” and “rev”. Is this a bug?.
Peng Lian
P.S.
pandas.util.print_versions.show_versions()
INSTALLED VERSIONS ------------------ commit: None python: 3.4.3.final.0 python-bits: 64 OS: Linux OS-release: 4.1.4-1-ARCH machine: x86_64 processor: byteorder: little LC_ALL: None LANG: zh_CN.utf8 pandas: 0.16.2+278.g58ae9db nose: 1.3.7 Cython: 0.22.1 numpy: 1.9.2 scipy: 0.16.0 statsmodels: 0.7.0.dev0+32e739d IPython: 3.2.1 sphinx: None patsy: 0.4.0-dev dateutil: 2.4.2 pytz: 2015.4 bottleneck: None tables: None numexpr: 2.4 matplotlib: 1.4.3 xlrd: 0.9.4 xlwt: None