Selecting date ranges?

780 views
Skip to first unread message

Aron Lindberg

unread,
Oct 21, 2013, 11:55:17 AM10/21/13
to sq...@googlegroups.com
Hi,

I'm having some problem selecting rows based on date ranges. I have this data, where the dates are in POSIXct format:

     id     event                time
1  1545 discussed 2013-09-14 01:23:55
2  1545    closed 2013-09-14 01:23:57
3  1545 discussed 2013-09-13 18:58:00
4  1544    closed 2013-09-14 01:36:57
5  1544 discussed 2013-09-14 01:36:55
6  1543    opened 2013-09-13 11:10:33
7  1542    opened 2013-09-12 20:19:11
8  1541 discussed 2013-09-12 20:11:48
9  1541 discussed 2013-09-12 20:27:55
10 1541 discussed 2013-09-12 20:35:54


Then I run this code:

sequences_subset <- sqldf("SELECT * FROM sequences WHERE time > 2013-09-13")
sequences_subset


Which does not generate an error message, but simply selects everything. How can I select based on a date range? I've been playing around with method = "Date", but don't know how to get it to apply to a single column. I know I should RTFM, which I have tried to do, but it is quite difficult to decipher.

Any assistance would be helpful and appreciated!

Best,
Aron

Gabor Grothendieck

unread,
Oct 21, 2013, 4:48:24 PM10/21/13
to sq...@googlegroups.com
2013-09-13 means 2013 minus 9 minus 13 which is not likely what you want.

We would need the output of 

   dput(sequences) 

to reproduce it.

Aron Lindberg

unread,
Oct 21, 2013, 8:18:13 PM10/21/13
to sq...@googlegroups.com
Here's is the dput. Thanks!

> dput(sequences)
structure
(list(id = c(1545L, 1545L, 1545L, 1544L, 1544L, 1543L,
1542L, 1541L, 1541L, 1541L), event = structure(c(3L, 2L, 3L,
2L, 3L, 8L, 8L, 3L, 3L, 3L), .Label = c("assigned", "closed",
"discussed", "head_ref_deleted", "head_ref_restored", "mentioned",
"merged", "opened", "referenced", "reopened", "reviewed", "subscribed",
"synchronize", "unsubscribed"), class = "factor"), time = structure(c(1379136235,
1379136237, 1379113080, 1379137017, 1379137015, 1379085033, 1379031551,
1379031108, 1379032075, 1379032554), class = c("POSIXct", "POSIXt"
))), .Names = c("id", "event", "time"), row.names = c(NA, 10L
), class = "data.frame")

Gabor Grothendieck

unread,
Oct 21, 2013, 9:11:32 PM10/21/13
to sq...@googlegroups.com
Using the dput output to recreate sequences, as str shows the column named "time" is of class POSIXct:

> str(sequences)
'data.frame':   10 obs. of  3 variables:
 $ id   : int  1545 1545 1545 1544 1544 1543 1542 1541 1541 1541
 $ event: Factor w/ 14 levels "assigned","closed",..: 3 2 3 2 3 8 8 3 3 3
 $ time : POSIXct, format: "2013-09-14 01:23:55" "2013-09-14 01:23:57" "2013-09-13 18:58:00" "2013-09-14 01:36:57" ..

SQLite has no date/time type so such columns are stored in SQLite as the number of seconds since 1907-01-01 00:00:00 GMT.  We can use the SQLite strftime function to convert such a number to text and then do the comparison using text strings:

> sqldf("select * from sequences
+ where strftime('%Y-%m-%d', time, 'unixepoch', 'localtime') > '2013-09-13' ")
    id     event                time
1 1545 discussed 2013-09-14 01:23:55
2 1545    closed 2013-09-14 01:23:57
3 1544    closed 2013-09-14 01:36:57
4 1544 discussed 2013-09-14 01:36:55

An alternative would be to use the H2 database backend to sqldf for this because the H2 database does support a date/time type.  In that case it would just be this:

> library(RH2) 
> library(sqldf)
> sqldf("select * from sequences where time > '2013-09-13 23:59:59' ")
    id     event                time
1 1545 discussed 2013-09-14 01:23:55
2 1545    closed 2013-09-14 01:23:57
3 1544    closed 2013-09-14 01:36:57
4 1544 discussed 2013-09-14 01:36:55

In the above as RH2 is loaded, sqldf will notice that and use RH2 instead of SQLite.  Note that RH2 is a java database and java must be installed for the above to work Java has an automated installer and is very easy to install.  Only need the java runtime is needed.  The java development environment is not needed.

Aron Lindberg

unread,
Oct 22, 2013, 9:34:55 AM10/22/13
to sq...@googlegroups.com
Thank you! That is very helpful. Another, less sophisticated answer that I came up with is to convert the desired date range into seconds-since-the-epoch and do the SQL query directly in that format:

date1 <- as.integer(as.POSIXct("2013-09-12 20:19:11"))
date2
<- as.integer(as.POSIXct("2013-09-13 18:58:00"))

sequences_subset
<- sqldf("SELECT * FROM sequences WHERE time >= 1379031551 AND time <= 1379113080")
sequences_subset

Gabor Grothendieck

unread,
Oct 22, 2013, 10:13:11 AM10/22/13
to sq...@googlegroups.com
In the same vein this also works:

> Bounds <- data.frame(from = as.POSIXct("2013-09-12 20:19:11"), to = as.POSIXct("2013-09-13 18:58:00"))
> sqldf('SELECT s.* FROM sequences s, Bounds b WHERE time between "from" AND "to"')
    id     event                time
1 1545 discussed 2013-09-13 18:58:00
2 1543    opened 2013-09-13 11:10:33
3 1542    opened 2013-09-12 20:19:11
4 1541 discussed 2013-09-12 20:27:55
5 1541 discussed 2013-09-12 20:35:54

Note that from is a keyword so we had to double quote it.
Reply all
Reply to author
Forward
0 new messages