Issue 55 in rpostgresql: Problem handling timestamp columns (without timezone) in R when pulling data using RPostgreSQL

6 views
Skip to first unread message

rpost...@googlecode.com

unread,
Jul 22, 2013, 9:09:58 AM7/22/13
to rpostgr...@googlegroups.com
Status: New
Owner: ----
Labels: Type-Defect Priority-Medium

New issue 55 by xjosi...@gmail.com: Problem handling timestamp columns
(without timezone) in R when pulling data using RPostgreSQL
http://code.google.com/p/rpostgresql/issues/detail?id=55

I'm trying to pull data from a PostgreSQL database and the results for a
timestamp field are inconsistent. I'm not sure if I'm handling POSIXct
results properly. Otherwise, I think I found a bug in the RPostgreSQL
package. Here are the steps to reproduce the problem:

1. Create testing database and table:

CREATE DATABASE mydb;
CREATE TABLE test_table
(
"DateTime" timestamp without time zone NOT NULL,
CONSTRAINT "pk_test_table" PRIMARY KEY ("DateTime")
)
WITH (
OIDS=FALSE
);
ALTER TABLE test_table
OWNER TO postgres;

2. And let’s say there are a few hundred records. I will populate them in
R. Here is the code:

library(RPostgreSQL)

# Let's feed the table with some sequence of date/time values
date_values <- as.chron(seq(10000, 10500, 1/24))

format.chron <- function(z) {
sprintf("%04.0f-%02.0f-%02.0f %02.0f:%02.0f:00",
as.numeric(as.character(years(z))),
months(z),
as.numeric(as.character(days(z))),
as.numeric(as.character(hours(z))),
as.numeric(as.character(minutes(z))))
}

.generateInsertQuery <- function(date_values, field_name, table_name) {
insert_val <- paste(paste0("(", sQuote(format(date_values)), ")"),
collapse=',')
qry <- paste("INSERT INTO", dQuote(table_name), paste0("(",
dQuote(field_name), ")"), "VALUES", insert_val)
qry
}

drv <- dbDriver('PostgreSQL')
con <- dbConnect(drv, user='postgres', dbname='mydb')
qry <- .generateInsertQuery(date_values, "DateTime", "test_table")
dbSendQuery(con, qry)

3. Trying to get the values from postgres, the time component gets stripped
out of the resulting data

res <- dbGetQuery(con, "SELECT * FROM test_table")
res[1:20,1]

The class of the result, however, is POSIXct

class(res[,1])

If the result is fetched one record at a time, the values with hour:min
equal to 00:00 loose the time component:

rs <- dbSendQuery(con, "SELECT \"DateTime\" FROM test_table")
res_list <- list()
for(i in 1:100) res_list[i] <- fetch(rs,1)
res_list

===================================================================

The expected output is a data.frame with a POSIXct column with date and
time components.
What I see instead is a POSIXct column only with date components. The time
component is stripped out of the data. If I want to see at least partially
correct values, I need to fetch results one at a time.

===================================================================

My system specs are:
"PostgreSQL 9.2.4, compiled by Visual C++ build 1600, 64-bit"
Win 7 Enterprise 64-bit
R 3.0.1 64-bit
RPostgreSQL version 0.4
DBI version 0.2-7

--
You received this message because this project is configured to send all
issue notifications to this address.
You may adjust your notification preferences at:
https://code.google.com/hosting/settings

rpost...@googlecode.com

unread,
Jul 22, 2013, 11:04:48 PM7/22/13
to rpostgr...@googlegroups.com

Comment #1 on issue 55 by tomoa...@kenroku.kanazawa-u.ac.jp: Problem
handling timestamp columns (without timezone) in R when pulling data using
RPostgreSQL
http://code.google.com/p/rpostgresql/issues/detail?id=55

Please don't use sQuote, dQuote in constructing queries. That make
incompatible quotes, for left side of the quote.

Since POSIXct just hold the seconds since the epoch, nothing is stripped.
compare the following two:
as.POSIXct("1997-05-23 00:00:00")
as.POSIXct("1997-05-23 01:00:00")

You should see that the 00:00:00 are not printed. This does not mean
anything
is lost. The difference is just how POSIXct are printed and does not relate
to
the behavior of RPostgreSQL.

rpost...@googlecode.com

unread,
Jul 23, 2013, 4:18:30 AM7/23/13
to rpostgr...@googlegroups.com
Updates:
Labels: -Priority-Medium Priority-Low

Comment #2 on issue 55 by tomoa...@kenroku.kanazawa-u.ac.jp: Problem
handling timestamp columns (without timezone) in R when pulling data using
RPostgreSQL
http://code.google.com/p/rpostgresql/issues/detail?id=55

See also issue 44

To recover the string with the trailing 00:00:00,
you can try
format(as.POSIXct("1997-05-23"), "%Y-%m-%d %H:%M:%OS")

rpost...@googlecode.com

unread,
Jul 23, 2013, 11:45:41 AM7/23/13
to rpostgr...@googlegroups.com

Comment #3 on issue 55 by xjosi...@gmail.com: Problem handling timestamp
columns (without timezone) in R when pulling data using RPostgreSQL
http://code.google.com/p/rpostgresql/issues/detail?id=55

Thanks for the reply. I forgot to include:
options(useFancyQuotes=FALSE)
that makes dQuote/sQuote output compatible with the query.

Dirk replied to my previous post in StackOverflow:
http://stackoverflow.com/questions/17754635/is-there-a-specific-way-to-handle-timestamp-columns-in-r-when-pulling-data-using

Apparently RPostgreSQL does not handle timestamps without timezone
correctly. However, it does handle timestamps with time zone, which is
even better. I even found some gaps in my dataset which were making my
time series irregular.

I think not having timezone was a key part of the issue. When Postgres
field does not account for timezones, and RPostgreSQL reads the field into
R, it throws the timestamp field into a POSIXct vector, which apparently
always expects the values with timezones. It turns out that some values
from the database were incompatible with this expectation. For example,
one of the old db records had "2008-03-09 02:00:00", which is not a valid
time value when using daylight savings, as the clocks went 1 hour forward
at that time. POSIXct simply stripped out the time part of the value and
converted it to "2008-03-09" instead. Since the result from this field is
a vector in R, all the data in the vector was also transformed to
YYYY-MM-DD.

Jose A. Aponte

rpost...@googlecode.com

unread,
Jul 23, 2013, 11:48:21 AM7/23/13
to rpostgr...@googlegroups.com

Comment #4 on issue 55 by xjosi...@gmail.com: Problem handling timestamp
columns (without timezone) in R when pulling data using RPostgreSQL
http://code.google.com/p/rpostgresql/issues/detail?id=55

Thanks for the reply. I forgot to include: options(useFancyQuotes=FALSE)
which makes dQuote/sQuote output compatible with the query.

Dirk replied to my previous post in StackOverflow:
http://stackoverflow.com/questions/17754635/is-there-a-specific-way-to-handle-timestamp-columns-in-r-when-pulling-data-using

Apparently RPostgreSQL does not handle timestamps without timezone
correctly. However, it does handle timestamps with time zone, which is
even better. I even found some gaps in my dataset which were making my
time series irregular.

I think not having timezone was a key part of the issue. When Postgres
field does not account for timezones, and RPostgreSQL reads the field into
R, it throws the timestamp field into a POSIXct vector, which apparently
always expects the values with timezones. It turns out that some values
from the database were incompatible with this expectation. For example,
one of the old db records had "2008-03-09 02:00:00", which is not a valid
time value when using daylight savings, as the clocks went 1 hour forward
at that time. POSIXct simply stripped out the time part of the value and
converted it to "2008-03-09" instead. Since the result from this field is
a vector in R, and all the POSIXct data in the vector needs to have the
same format, the format of all the data was also turned into YYYY-MM-DD.

rpost...@googlecode.com

unread,
Jul 23, 2013, 11:07:18 PM7/23/13
to rpostgr...@googlegroups.com

Comment #5 on issue 55 by tomoa...@kenroku.kanazawa-u.ac.jp: Problem
handling timestamp columns (without timezone) in R when pulling data using
RPostgreSQL
http://code.google.com/p/rpostgresql/issues/detail?id=55

An important point is that timestamp without timezone is not defined
properly.
No one can determine what time it precisely refers to, without external
information.
So, I don't think there are any correct way.

If daylight savings is the problem, you may simply
try ignoring them.

> Sys.setenv(TZ='UTC')
and then pull the data.

By the way, what time zone combination are you using?
So far, I can not reproduce
"2008-03-09 02:00:00" be treated as invalid and just a day.

Please distinguish string and POSIXct data. POSIXct data is a numeric data
and do not contain the format you mentioned.

> as.POSIXct(c("2008-03-09 00:00:00", "2008-03-09 02:00:00","2008-03-09
> 03:00:00"),tz='CEST')
[1] "2008-03-09 00:00:00 CEST" "2008-03-09 02:00:00 CEST"
[3] "2008-03-09 03:00:00 CEST"
> as.POSIXct(c("2008-03-09 00:00:00", "2008-03-09 02:00:00","2008-03-09
> 03:00:00"),tz='CEST')[1]
[1] "2008-03-09 CEST"
> as.POSIXct(c("2008-03-09 00:00:00", "2008-03-09 02:00:00","2008-03-09
> 03:00:00"),tz='CEST')[2]
[1] "2008-03-09 02:00:00 CEST"

rpost...@googlecode.com

unread,
Jul 24, 2013, 2:53:31 PM7/24/13
to rpostgr...@googlegroups.com

Comment #6 on issue 55 by xjosi...@gmail.com: Problem handling timestamp
columns (without timezone) in R when pulling data using RPostgreSQL
http://code.google.com/p/rpostgresql/issues/detail?id=55

Hmmm... I guess that is an important detail I should have included. I'm in
NY, USA, so I'm using the EST/EDT time zone. I wasn't aware of the
Sys.env(TZ) option. Thanks for the tip. Maybe it would be helpful if
POSIXct threw an error for invalid times, instead of converting it to
absolute dates. Of course, this has nothing to do with RPostgreSQL.

rpost...@googlecode.com

unread,
Apr 24, 2015, 2:35:24 PM4/24/15
to rpostgr...@googlegroups.com

Comment #7 on issue 55 by howard.s...@formsdirect.net: Problem handling
timestamp columns (without timezone) in R when pulling data using
RPostgreSQL
https://code.google.com/p/rpostgresql/issues/detail?id=55

I have the same issue. The datetime column only returns the date portion
and not the time portion. :(
Reply all
Reply to author
Forward
0 new messages