Weird pandas behavior when using a large list

Kristian

May 31, 2016, 11:00:15 AM

to PyData

I have the following CSV file:

1/4/2010,08:01:00,5982.8
1/4/2010,08:02:00,5992.2
1/4/2010,08:03:00,5990.2
...
1/4/2010,22:00:00,6000.1

My code is as follows:

import pandas as pd

# Settings, Index & Columns Initialization
filename = 'Germany30-2.csv'

# Load data from CSV
dataFrame = pd.read_csv(filename, names=['date', 'time', 'price'], header=None)

# Set DateTime index
dataFrame.index = pd.to_datetime(dataFrame['date'], format='%m/%d/%Y')
# Pass axis by keyword; the positional form drop('date', 1) fails on pandas 2.0+
dataFrame = dataFrame.drop('date', axis=1)

# Pivot
data = dataFrame.pivot(columns='time', values='price')
print(data)

And the output is:

time        08:01:00  08:02:00  08:03:00  08:04:00  08:05:00  08:06:00  \
date                                                                     
2010-01-04    5982.8    5992.2    5990.2    5989.8    5993.7    5993.7

Everything is as expected. But when I use a larger CSV file with exactly the same structure, covering 3 years of data (480 data points per day), the output starts with "07:01:00" instead:

time        07:01:00  07:03:00  07:04:00  07:05:00  07:06:00  07:07:00  \
date                                                                     
2010-01-04       NaN       NaN       NaN       NaN       NaN       NaN   
2010-01-05       NaN       NaN       NaN       NaN       NaN       NaN   
2010-01-06       NaN       NaN       NaN       NaN       NaN       NaN   
2010-01-07       NaN       NaN       NaN       NaN       NaN       NaN

BUT there is NO 07:01:00 data in the CSV file; the data starts at 08:01:00.

What is the problem here? As you can see I am very new to pandas!
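
One way to narrow this down: pivot creates one column for every distinct 'time' value found anywhere in the input, so a column such as 07:01:00 that is NaN on the first days usually means that time does occur on some other day in the file. The sketch below reproduces that effect on a small made-up sample and then lists the rows with unexpectedly early times. The sample rows (including the 3/29/2010 entry) and the names csv_text, df, and early are hypothetical, not taken from the real file, and this is only one possible explanation.

```python
import io
import pandas as pd

# Hypothetical miniature of the larger file: one later day has a 07:01:00 row.
csv_text = """1/4/2010,08:01:00,5982.8
1/4/2010,08:02:00,5992.2
1/5/2010,08:01:00,6001.0
1/5/2010,08:02:00,6003.5
3/29/2010,07:01:00,6150.0
3/29/2010,08:01:00,6152.4
"""

df = pd.read_csv(io.StringIO(csv_text), names=['date', 'time', 'price'], header=None)
df.index = pd.to_datetime(df['date'], format='%m/%d/%Y')
df = df.drop('date', axis=1)

# pivot builds one column per distinct 'time' value seen anywhere in the data,
# so days without a 07:01:00 row get NaN in that column.
data = df.pivot(columns='time', values='price')
print(data)

# Diagnostic: which rows carry a time earlier than the expected session start?
# Plain string comparison works because the times are zero-padded HH:MM:SS.
early = df[df['time'] < '08:01:00']
print(early)
```

Running the same filter on the real DataFrame (dataFrame[dataFrame['time'] < '08:01:00']) should show whether, and on which dates, the larger file contains times before 08:01:00.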
