Getting sequence of time and subset it

100 views
Skip to first unread message

akrun

unread,
May 30, 2016, 5:18:21 AM5/30/16
to julia-users
I have a dataset similar to the one below

   sdt1 = DataFrame(ID = 1:2, StartTime = DateTime(["4/13/2016 07:00", "4/13/2016 07:15"], "m/d/y H:M"),
                            EndTime = DateTime(["4/13/2016 12:00", "4/13/2016 12:15"], "m/d/y H:M"),
                            BreakTime1 = DateTime(["4/13/2016 10:00", "4/13/2016 10:15"], "m/d/y H:M"),
                            BreakTime2 = DateTime(["4/13/2016 12:00", "4/13/2016 12:15"], "m/d/y H:M")
                            )

I can get the sequence of date time using

   for i in 1:nrow(sdt1)
   si = Dates.format([sdt1[i,:StartTime]:Dates.Minute(30):sdt1[i,:EndTime]], "HH:MM")
    println(si)
  end

#Any["07:00","07:30","08:00","08:30","09:00","09:30","10:00","10:30","11:00","11:30","12:00"]
#Any["07:15","07:45","08:15","08:45","09:15","09:45","10:15","10:45","11:15","11:45","12:15"]


Suppose I want to subset the sequence to not include the times in BreakTime columns, how do I do it?  I have tried
  for i in 1:nrow(sdt1)
    si = Dates.format([sdt1[i,:StartTime]:Dates.Minute(30):sdt1[i,:EndTime]], "HH:MM")
     println(si in sdt1[i, [:BreakTime1, :BreakTime2]])
   end                    

I get an error message
MethodError: `start` has no method matching start(::DataFrames.DataFrame)
 in mapreduce_sc_impl at reduce.jl:194
 in in at reduce.jl:377
 
I can do this using data.table in R

    library(data.table)
    dt1 <- data.table(ID = 1:2, StartTime =as.POSIXct(c("4/13/2016 07:00", "4/13/2016 07:15"), "%m/%d/%Y %H:%M", tz = "GMT"),
                            EndTime =as.POSIXct(c("4/13/2016 14:00", "4/13/2016 14:15"), "%m/%d/%Y %H:%M", tz = "GMT"),
                            BreakTime1 = as.POSIXct(c("4/13/2016 10:00", "4/13/2016 10:15"), "%m/%d/%Y %H:%M", tz = "GMT"),
                            BreakTime2 = as.POSIXct(c("4/13/2016 12:00", "4/13/2016 12:15"), "%m/%d/%Y %H:%M", tz = "GMT"))
    dt1[, {
     s1 <- head(seq(StartTime, EndTime, by = "30 min"), -1)
     list(time= format(s1[!s1 %in% c(BreakTime1, BreakTime2)], "%H:%M"))},
      by = ID]
    ID  time
 1:  1 07:00
 2:  1 07:30
 3:  1 08:00
 4:  1 08:30
 5:  1 09:00
 6:  1 09:30
 7:  1 10:30
 8:  1 11:00
 9:  1 11:30
10:  1 12:30
11:  1 13:00
12:  1 13:30
13:  2 07:15
14:  2 07:45
15:  2 08:15
16:  2 08:45
17:  2 09:15
18:  2 09:45
19:  2 10:45
20:  2 11:15
21:  2 11:45
22:  2 12:45
23:  2 13:15
24:  2 13:45

Thanks.




  

Evan Fields

unread,
May 30, 2016, 11:52:24 AM5/30/16
to julia-users
I'm not 100% sure I understand your question, but let me give it a shot. 

First thing is to note why you're getting that MethodError. It's from the line

println(si in sdt1[i, [:BreakTime1, :BreakTime2]])

Define sdt1 as you do (I just copied into a Julia REPL) and set i = 1 to for the first iteration of the for loop.

Then:

julia> sdt1[i, [:BreakTime1, :BreakTime2]]
1×2 DataFrames.DataFrame
│ Row │ BreakTime1          │ BreakTime2          │
├─────┼─────────────────────┼─────────────────────┤
│ 1   │ 2016-04-13T10:00:00 │ 2016-04-13T12:00:00 │

julia> si = Dates.format([sdt1[i,:StartTime]:Dates.Minute(30):sdt1[i,:EndTime]], "HH:MM")
WARNING: [a] concatenation is deprecated; use collect(a) instead
 in depwarn at deprecated.jl:73
 in oldstyle_vcat_warning at abstractarray.jl:29
 in vect at abstractarray.jl:32
while loading no file, in expression starting on line 0
11-element Array{Any,1}:
 "07:15"
 "07:45"
 "08:15"
 "08:45"
 "09:15"
 "09:45"
 "10:15"
 "10:45"
 "11:15"
 "11:45"
 "12:15"

julia> si in sdt1[i, [:BreakTime1, :BreakTime2]]
ERROR: MethodError: `start` has no method matching start(::DataFrames.DataFrame)
 in mapreduce_sc_impl at reduce.jl:194
 in in at reduce.jl:377

So si is of type Vector{Any} but happens to hold strings. sdt1[i, [:BreakTime1, :BreakTime2]] is a DataFrame. What does it mean to ask if a vector is in a DataFrame? That's what's happening in your println; the `in` is just an infix operator of function in.

3-element Array{Int64,1}:
 1
 2
 3

julia> 1 in x
true

julia> in(1,x)
true

julia> in(5,x)
false

I assume you get the MethodError of no method start because in(a,b) iterates over b, checking each element of b for equality to a. But DataFrames are not iterable in this way. (This part just a guess; look it up in the manual before telling your friends.)

Anyway, to get output as from your R snippet, you could use a loop like the following:

julia> for i in 1:nrow(sdt1), t in sdt1[:StartTime][i]:Dates.Minute(30):sdt1[:EndTime][i]
           if !(t in [sdt1[:BreakTime1][i], sdt1[:BreakTime2][i]])
               println("$i $(Dates.format(t, "HH:MM"))")
           end
       end
1 07:00
1 07:30
1 08:00
1 08:30
1 09:00
1 09:30
1 10:30
1 11:00
1 11:30
2 07:15
2 07:45
2 08:15
2 08:45
2 09:15
2 09:45
2 10:45
2 11:15
2 11:45

I hope this helps!

akrun

unread,
May 31, 2016, 9:35:44 AM5/31/16
to julia-users
Many thanks for the solution and it works well with the example posted.   But, if I have missing values in one of the columns, it throws an error.

   sdt2 = DataFrame(ID = 1:2, StartTime = DateTime(["4/13/2016 07:00", "4/13/2016 07:15"], "m/d/y H:M"),

                           EndTime = DateTime(["4/13/2016 12:00", "4/13/2016 12:15"], "m/d/y H:M"),
                           BreakTime1 = DateTime([NA, "4/13/2016 10:15"], "m/d/y H:M"),

                           BreakTime2 = DateTime(["4/13/2016 12:00", "4/13/2016 12:15"], "m/d/y H:M")


  d1= DataFrame(id=Int64[], Time = AbstractString[])
for i in 1:nrow(sdt2), t in [sdt2[:StartTime][i]:Dates.Minute(30):sdt2[:EndTime][i]]
  if !(t in [sdt2[:BreakTime1][i], sdt2[:BreakTime2][i]])
      push!(d1, [i, Dates.format(t, "HH:MM")])
      end
   end

   MethodError: `convert` has no method matching convert(::Type{Int64}, ::Array{Any,1})
This may have arisen from a call to the constructor Int64(...),
since type constructors fall back to convert methods.
Closest candidates are:
  call{T}(::Type{T}, ::Any)
  convert(::Type{Int64}, !Matched::Int8)
  convert(::Type{Int64}, !Matched::UInt8)

Is there any way to correct this?

Thanks.

Evan Fields

unread,
Jun 1, 2016, 9:18:38 AM6/1/16
to julia-users
Of course there's a way :)

You can use the isna function to check if a value is NA or not. There's also the dropna function which takes a DataArray as input and returns a regular Vector with the NA elements removed.

You could try something like the following:

firstbreakcol = 4
lastbreakcol = 5
for i in 1:nrow(sdt2), t in sdt2[:StartTime][i]:Dates.Minute(30):sdt2[:EndTime][i]
    shouldprint = true
    for j in firstbreakcol:lastbreakcol

        if !isna(sdt2[i,j]) && sdt2[i,j] == t
            # t is in a break column
            shouldprint = false
            break
        end
    end
    if shouldprint
        println(t)
    end
end

It's a little verbose, true. I'm sure there's something cleaner with dropna, filter, etc.
Reply all
Reply to author
Forward
0 new messages