The issue is due to Month being a character: the as.yearmon() function appears to require the month as an integer. The code below joins Month to its integer and achieves what you are looking to do. However, another option would be to use regular dates of the form YYYY-mm-dd, fixing the day component at 01, as this may make date arithmetic easier.
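A minimal base R sketch, assuming a hypothetical data frame df with an integer Year column and a character Month column holding full English month names. match() against the built-in month.name vector gives the month integer, which is then used to build a first-of-month Date (the YYYY-mm-01 option).

```r
# Hypothetical data: integer year plus the month as a character month name
df <- data.frame(
  Year  = c(2019L, 2019L, 2020L),
  Month = c("November", "December", "January"),
  stringsAsFactors = FALSE
)

# Join each month name to its integer position (1-12) via base R's month.name
df$month_num <- match(df$Month, month.name)

# Build a regular Date with the day fixed at 01 (YYYY-mm-01)
df$month_start <- as.Date(sprintf("%d-%02d-01", df$Year, df$month_num))

# Date arithmetic now works, e.g. days between consecutive months
diff(df$month_start)
```

If you prefer the yearmon class, zoo's as.yearmon() also accepts a numeric value of the form year + (month - 1) / 12, so as.yearmon(df$Year + (df$month_num - 1) / 12) should give the same months; the sketch sticks to base R so it runs without zoo.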
Working with dates in R requires more attention than working with other object classes. Below, we offer some tools and examples to make this process less painful. Luckily, dates can be wrangled easily with practice and with a set of helpful packages such as lubridate.
Upon import of raw data, R often interprets dates as character objects - this means they cannot be used for general date operations such as making time series and calculating time intervals. To make matters more difficult, there are many ways a date can be formatted and you must help R know which part of a date represents what (month, day, hour, etc.).
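A small base R illustration, using hypothetical date strings in day/month/year order: the raw values are character until as.Date() is told which part is the day, month and year, after which interval arithmetic works.

```r
# A date imported from raw data often arrives as plain text
onset_raw <- c("15/01/2020", "03/02/2020")
class(onset_raw)   # "character": no date operations possible yet

# Tell as.Date() which part is the day, month and year
onset <- as.Date(onset_raw, format = "%d/%m/%Y")
class(onset)       # "Date"

# Time intervals can now be calculated
onset[2] - onset[1]
```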
This code chunk shows the loading of packages required for this page. In this handbook we emphasize p_load() from pacman, which installs the package if necessary and loads it for use. You can also load installed packages with library() from base R. See the page on R basics for more information on R packages.
We import the dataset of cases from a simulated Ebola epidemic. If you want to download the data to follow along step-by-step, see the instructions in the Download handbook and data page. We assume the file is in the working directory, so no sub-folders are specified in this file path.
Converting character objects to dates can be made easier by using the lubridate package. This is a tidyverse package designed to make working with dates and times simpler and more consistent than in base R. For these reasons, lubridate is often considered the gold-standard package for dates and times, and is recommended whenever working with them.
The lubridate package provides several helper functions designed to convert character objects to dates in a more intuitive and lenient way than specifying the format in as.Date(). Each function is specific to a rough date format (the order of year, month and day) but allows a variety of separators and synonyms for dates (e.g. 01 vs Jan vs January). The functions are named after abbreviations of these date formats.
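A short sketch of the lubridate helpers described above, on made-up strings that all represent 11 October 2020. Month-name parsing assumes an English locale.

```r
library(lubridate)

# Each helper is named after the order of the date components
ymd("2020-10-11")        # year-month-day
dmy("11 October 2020")   # day-month-year, month written out
mdy("Oct 11 2020")       # month-day-year, abbreviated month

# Different separators are tolerated within the same rough format
ymd(c("2020/10/11", "2020.10.11", "20201011"))
```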
To convert a 2-digit year into a 4-digit year (all in the same century), you can convert it to class character and then combine the existing digits with a prefix using str_glue() from the stringr package (see Characters and strings). Then convert to date.
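A base R sketch of the same idea, assuming a hypothetical numeric year_2digit vector where every year falls in 2000-2099. It uses paste0() and sprintf() in place of str_glue(); str_glue("20{...}") would build the same strings once single-digit years are zero-padded.

```r
# Hypothetical 2-digit years stored as integers
year_2digit <- c(5L, 19L, 20L)

# Convert to character with a leading zero (5 -> "05"), then add the "20" prefix
year_4digit <- paste0("20", sprintf("%02d", year_2digit))

# Then convert to a date, here fixing month and day at 1 January
as.Date(paste0(year_4digit, "-01-01"))
```

Without the zero-padding, a year stored as 5 would become "205" rather than "2005".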
You can use the lubridate functions make_date() and make_datetime() to combine multiple numeric columns into one date column. For example, if you have numeric columns onset_day, onset_month, and onset_year in the data frame linelist:
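A minimal sketch with a small hypothetical linelist containing the three numeric columns named above:

```r
library(lubridate)

# Hypothetical linelist with separate numeric day, month and year columns
linelist <- data.frame(
  onset_day   = c(3, 15),
  onset_month = c(1, 2),
  onset_year  = c(2020, 2020)
)

# make_date() combines the numeric parts into a single Date column
linelist$onset_date <- make_date(
  year  = linelist$onset_year,
  month = linelist$onset_month,
  day   = linelist$onset_day
)
```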
As previously mentioned, R also supports a datetime class - a column that contains date and time information. As with the Date class, these often need to be converted from character objects to datetime objects.
A standard datetime object is formatted with the date first, followed by a time component, for example 01 Jan 2020, 16:30. As with dates, there are many ways this can be formatted, and there are numerous levels of precision (hours, minutes, seconds) that can be supplied.
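A brief sketch of parsing datetimes at different levels of precision, using made-up strings. The lubridate helper names extend the date helpers with _hm or _hms; the base R line shows the equivalent explicit format string. UTC is assumed as the time zone throughout.

```r
library(lubridate)

# Helper names describe both the date order and the time precision
dt_min <- ymd_hm("2020-01-01 16:30")       # hours and minutes
dt_sec <- dmy_hms("01/01/2020 16:30:45")   # hours, minutes and seconds

# Base R equivalent with an explicit format string
dt_base <- as.POSIXct("01/01/2020 16:30", format = "%d/%m/%Y %H:%M", tz = "UTC")
```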
When working with a data frame, time and date columns can be combined to create a datetime column using str_glue() from the stringr package and an appropriate lubridate function. See the page on Characters and strings for details on stringr.
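A minimal sketch, assuming a hypothetical linelist with character columns date_onset (year-month-day) and time_onset (hours:minutes). Each row's date and time are glued into one string and parsed with ymd_hm(), which defaults to UTC.

```r
library(lubridate)
library(stringr)

# Hypothetical separate date and time columns stored as text
linelist <- data.frame(
  date_onset = c("2020-01-03", "2020-02-15"),
  time_onset = c("09:15", "18:40"),
  stringsAsFactors = FALSE
)

# Glue date and time into one string per row, then parse as a datetime
linelist$datetime_onset <- ymd_hm(
  as.character(str_glue("{linelist$date_onset} {linelist$time_onset}"))
)
```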
If your data contain only a character time (hours and minutes), you can convert and manipulate them as times using strptime() from base R. For example, to get the difference between two of these times:
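A base R sketch with two hypothetical clock times. strptime() parses each one (attaching today's date, since none is given) and difftime() returns the gap in the requested units.

```r
# Hypothetical character times with no date attached
time1 <- "13:45"
time2 <- "15:20"

# Parse as times; with no date supplied, today's date is assumed
t1 <- strptime(time1, format = "%H:%M")
t2 <- strptime(time2, format = "%H:%M")

# Difference between the two times, in minutes
difftime(t2, t1, units = "mins")
```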
Note, however, that when no date is provided, strptime() assumes the date is today. To combine a string date and a string time, see how to use stringr in the section just above. Read more about strptime() here.
In a data frame context, if either of the above dates is missing, the operation will fail for that row and return NA instead of a numeric value. When using this column for calculations, be sure to set the na.rm = argument to TRUE. For example:
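A small sketch with a hypothetical linelist where one onset date is missing. The interval for that row becomes NA, and the summary only works once na.rm = TRUE is set.

```r
# Hypothetical linelist: row 2 has no onset date
linelist <- data.frame(
  date_onset    = as.Date(c("2020-01-03", NA, "2020-02-15")),
  date_hospital = as.Date(c("2020-01-05", "2020-01-20", "2020-02-20"))
)

# Days from onset to hospitalisation; NA where a date is missing
linelist$days_onset_hosp <- as.numeric(linelist$date_hospital - linelist$date_onset)

mean(linelist$days_onset_hosp)                # NA
mean(linelist$days_onset_hosp, na.rm = TRUE)  # mean of the non-missing rows
```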
Note: using format() will convert the values to class character, so it is generally used towards the end of an analysis or for display purposes only! You can see the complete list of format codes by running ?strptime.
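A quick base R illustration of format() on a single example date, showing that the result is character rather than Date:

```r
d <- as.Date("2020-03-07")

format(d, "%d/%m/%Y")   # day/month/year for display
format(d, "%Y-%m")      # year-month only

# The output is text, so date arithmetic no longer applies to it
class(format(d, "%Y-%m"))
```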
Note that lubridate also has functions week(), epiweek(), and isoweek(), each of which has slightly different start dates and other nuances. Generally speaking though, floor_date() should be all that you need. Read the details for these functions by entering ?week into the console or reading the documentation here.
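A short sketch comparing floor_date() with the week-number helpers on three example dates from early January 2020 (1 January 2020 was a Wednesday). week_start = 1 is chosen here so weeks begin on Monday; the default start day may differ.

```r
library(lubridate)

d <- as.Date(c("2020-01-01", "2020-01-05", "2020-01-09"))

# Round each date down to the Monday that starts its week
floor_date(d, unit = "week", week_start = 1)

# Week numbers under different conventions
week(d)     # complete 7-day periods counted from 1 January
epiweek(d)  # epidemiological weeks, starting on Sunday
isoweek(d)  # ISO 8601 weeks, starting on Monday
```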
When data is present in different time zones, it can be important to standardise it to a single time zone. This presents a further challenge, as the time zone component of data must be coded manually in most cases.
In R, each datetime object has a time zone component. By default, all datetime objects carry the local time zone of the computer being used. This is generally specific to a location rather than a named time zone, because the time zone in effect at a location often changes with daylight saving time. It is not possible to accurately compensate for time zones when a date has no time component: the event a date column represents cannot be attributed to a specific time, so time shifts measured in hours cannot reasonably be accounted for.
To deal with time zones, lubridate has a number of helper functions that change the time zone of a datetime object from the local time zone to a different one. Time zones are set by attributing a valid tz database time zone to the datetime object. A list of these can be found here; if the location your data comes from is not on this list, nearby large cities in the same time zone are available and serve the same purpose.
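A minimal sketch of two lubridate time zone helpers, using a hypothetical datetime recorded in UTC and the tz database name "America/New_York". with_tz() keeps the same instant and changes only the displayed clock time; force_tz() keeps the clock time and reinterprets it in the new zone, which gives a different instant.

```r
library(lubridate)

# Hypothetical datetime recorded in UTC
dt_utc <- ymd_hm("2020-06-01 12:00", tz = "UTC")

# Same instant, displayed in New York time (UTC-4 in June)
dt_ny <- with_tz(dt_utc, tzone = "America/New_York")

# Same clock time (12:00), now interpreted as New York time
dt_forced <- force_tz(dt_utc, tzone = "America/New_York")
```

Use with_tz() when the recorded times are correct and only need to be shown in another zone; use force_tz() when the times were stored with the wrong zone attached.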
lead() and lag() are functions from the dplyr package which help find previous (lagged) or subsequent (leading) values in a vector - typically a numeric or date vector. This is useful when doing calculations of change/difference between time units.
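A small sketch with hypothetical weekly case counts: lag() pulls the previous week's value into each row so the week-on-week change can be computed, and lead() pulls the next week's value.

```r
library(dplyr)

# Hypothetical weekly case counts
counts <- data.frame(
  week  = as.Date(c("2020-01-06", "2020-01-13", "2020-01-20")),
  cases = c(10, 14, 9)
)

counts <- counts %>%
  arrange(week) %>%                      # lag/lead depend on row order
  mutate(
    cases_prev = lag(cases, n = 1),      # previous week's count (NA for the first row)
    cases_next = lead(cases, n = 1),     # next week's count (NA for the last row)
    change     = cases - cases_prev      # week-on-week difference
  )
```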
I have a date in my fact table in the format DD/MM/YYYY and I want to convert it to Year-Month format MMM-YYYY. I will be using this new Month-Year field to aggregate my fact table by Month-Year like below.
It does give the same output and will be a shorter expression, but I have seen some issues with using MonthName in set analysis when you have to do comparisons such as >= or <=.

This vignette covers time series class conversion to and from the many time series classes in R, including the general data frame (or tibble) and the various time series classes (xts, zoo, and ts).
The timetk package provides tools that solve the issues with conversion, maximizing attribute extensibility (the required data attributes are retained during the conversion to each of the primary time series classes). The following tools are available to coerce and retrieve key information:
Index function: tk_index returns the index. When the argument timetk_idx = TRUE, a time-based index (non-regularized index) of forecast objects, models, and ts objects is returned if present. Refer to tk_ts() to learn about non-regularized index persistence during the conversion process.
This vignette includes a brief case study on conversion issues and then a detailed explanation of timetk function conversion between time-based tbl objects and several primary time series classes (xts, zoo, zooreg and ts).
The ts object class has roots in the stats package, and many popular packages use this time series data structure, including the popular forecast package. With that said, the ts data structure is the most difficult to coerce back and forth because by default it does not contain a time-based index. Rather, it uses a regularized index computed using the start and frequency arguments. Conversion to ts is done using the ts() function from the stats library, which results in various problems.
We can get the index using the index() function from the zoo package. The index retained is a regular sequence of numeric values. In many cases, the regularized values cannot be coerced back to the original time base because the date and datetime data contain significantly more information (i.e. year-month-day, hour-minute-second, and timezone attributes) and the data may not be on a regularized interval (frequency).
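A base R sketch of the problem described above, using twelve hypothetical monthly values. Once the values are passed to ts(), the Date column has nowhere to go; time() (base R's counterpart to zoo's index()) only returns the regular numeric sequence rebuilt from start and frequency.

```r
# Hypothetical monthly series with a real Date index
dates  <- seq(as.Date("2020-01-01"), by = "month", length.out = 12)
values <- c(5, 7, 6, 8, 9, 11, 10, 12, 14, 13, 15, 16)

# ts() stores only the values; the index comes from start and frequency
x_ts <- ts(values, start = c(2020, 1), frequency = 12)

# The index is a regular numeric sequence (2020, 2020 + 1/12, ...), not Dates
idx <- as.numeric(time(x_ts))
```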
The timetk package contains a new function, tk_ts(), that maintains the original date index as an attribute. When we repeat the tbl to ts conversion process using the new function, tk_ts(), we can see a few differences.
First, only numeric columns get coerced, which prevents unintended consequences due to R conversion rules (e.g. dates getting unintentionally converted, or characters causing the homogeneous data structure to convert all numeric values to character). If a column is dropped, the user gets a warning.
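A minimal sketch with a hypothetical monthly data frame (not the vignette's case study data). tk_ts() drops the non-numeric date column from the ts values (silent = TRUE suppresses the warning) but keeps it as an index attribute, which tk_index() returns when timetk_idx = TRUE.

```r
library(timetk)

# Hypothetical monthly data with a Date column and one numeric column
df <- data.frame(
  date  = seq(as.Date("2020-01-01"), by = "month", length.out = 12),
  value = c(5, 7, 6, 8, 9, 11, 10, 12, 14, 13, 15, 16)
)

# Only numeric columns become ts data; the dates are kept as an attribute
x_ts <- tk_ts(df, start = c(2020, 1), frequency = 12, silent = TRUE)

tk_index(x_ts)                     # regularized numeric index
tk_index(x_ts, timetk_idx = TRUE)  # original Date index
```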
The function has_timetk_idx() can be used to test whether toggling the timetk_idx argument in the tk_index() and tk_tbl() functions will have an effect on the output. Here are several examples using the ten-year treasury data used in the case study:
The timetk_idx argument will only have an effect on objects that use regularized time series. Therefore, has_timetk_idx() returns FALSE for other object types (e.g. tbl, xts, zoo), since toggling the argument has no effect on these classes.
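A small sketch on hypothetical data rather than the treasury series: a ts built with stats::ts() carries no timetk index, while the same values converted with tk_ts() do, so only the second would respond to timetk_idx = TRUE.

```r
library(timetk)

# Hypothetical monthly series
dates  <- seq(as.Date("2020-01-01"), by = "month", length.out = 12)
values <- c(5, 7, 6, 8, 9, 11, 10, 12, 14, 13, 15, 16)

# Plain ts from the stats package: no original date index is stored
plain_ts <- ts(values, start = c(2020, 1), frequency = 12)

# ts built with tk_ts(): the date index is kept as an attribute
timetk_ts <- tk_ts(data.frame(date = dates, value = values),
                   start = c(2020, 1), frequency = 12, silent = TRUE)

has_timetk_idx(plain_ts)   # FALSE
has_timetk_idx(timetk_ts)  # TRUE
```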