Out[26]:
0 int64
1 int64
2 float64
3 float64
dtype: object
Can't this just be made simpler, either by documenting it well (sorry, I can't help much here since I didn't build the conversion engine, but I would be glad to help),
or by adding keywords to control this behavior?
I would expect this to be as controllable as the behavior of to_dict(), which can be controlled fairly well.
I don't mean to be disrespectful about your tremendous and awesome work. I am just pointing out something that is fundamental in my opinion: a data structure built to handle data coming from other structures should itself be easy to handle and to set up. For this, it seems to me that keywords would be the best option (or maybe a list of dimensions, with wildcards for specific behaviors such as aligning on the shortest/longest line, splitting some nested structures, and so on).
Hello,
I have been working on topics that involve a lot of manipulation of data structures, and Pandas has been great for this.
That is, once you figure out how to feed your data into it so that it outputs a well-formed DataFrame.
This is especially the case when you want to handle nested data structures.
pd.DataFrame(pd.Series(l))
0
0 [1, 2]
1 [1, 2, 3]
2 [1, 2, 3, 4]
pd.DataFrame(l, input1d=True)
pd.DataFrame.from_input1d(l)
pd.DataFrame.from_dict(d, orient='index')
0 1 2 3
0 1 2 NaN NaN
1 1 2 3.0 NaN
2 1 2 3.0 4.0
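For readers following along, here is a minimal sketch of the two behaviors shown in the outputs above. It assumes `l` is the ragged list of lists implied by those outputs; the variable names `wrapped` and `expanded` are only illustrative.

```python
import pandas as pd

# Assumed input: the ragged list of lists implied by the outputs above.
l = [[1, 2], [1, 2, 3], [1, 2, 3, 4]]

# Wrapping in a Series first keeps each inner list as a single object cell.
wrapped = pd.DataFrame(pd.Series(l))

# Passing the list directly spreads each inner list across columns and
# pads the shorter rows with NaN.
expanded = pd.DataFrame(l)

print(wrapped.shape)    # one column holding list objects
print(expanded.shape)   # one column per position in the longest row
print(expanded.dtypes)  # padded columns are stored as floats because of NaN
```

The padded columns end up as floating point because NaN has no representation in a plain integer column.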
Thanks for your answer. Actually, I am less concerned about keystrokes than about processing time (having to build a Series before building the DataFrame could be costly for massive amounts of data and multiple such columns).
Also, my point is that for values of the same object type, the behavior is not consistent: for a dict, it is taken for granted that the keys become the index. For lists of lists, the index is auto-generated. Dict values could be considered a list. Yet when the values are a list of lists, the behavior is not the same: for a dict it crashes, for a list it works like a charm. Only pd.Series handles the data the same way in both cases.
pd.DataFrame(d.values(), index=d.keys())
Thanks for the info about NaN. Does this mean that even when forcing the data type with dtype=int, it will still give NaN?
You don't use lists to store massive amounts of data, do you? If you do, then the main CPU overhead won't be the instantiation of a temporary Series. Do you have a realistic use case where this is an actual issue?
Sorry, I do not understand this. Dictionaries and lists are different things: dictionaries have keys, and those are interpreted as index or column labels in a well-documented way, IIRC. If you mean this should work:
pd.DataFrame(d.values(), index=d.keys())
take into account that dict_values is not a sequence type (it does not have __getitem__) so no, it is not list-like. You can use list(d.values()) or DataFrame.from_dict(d, orient='index') instead.
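As a rough sketch of the two alternatives mentioned above: the dict `d` below is a hypothetical example (its keys and values are assumptions, not taken from the thread), and the check against `collections.abc.Sequence` is just one way to illustrate that a dict view is not a sequence.

```python
from collections.abc import Sequence

import pandas as pd

# Hypothetical dict of ragged lists; the keys are illustrative only.
d = {"a": [1, 2], "b": [1, 2, 3], "c": [1, 2, 3, 4]}

# A dict view is iterable but not a Sequence: it cannot be indexed.
values_view = d.values()
print(isinstance(values_view, Sequence))    # False
print(hasattr(values_view, "__getitem__"))  # False

# Option 1: materialize the views as lists before building the frame.
from_lists = pd.DataFrame(list(d.values()), index=list(d.keys()))

# Option 2: let from_dict treat each key as a row label.
from_dict = pd.DataFrame.from_dict(d, orient="index")

print(from_lists)
print(from_dict)
```

Both options should give one row per key, with shorter lists padded with NaN.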