Comprehensive Date-Time Handling for R

  r-lib

  Davis Vaughan

We’re thrilled to announce the first release of clock. clock is a new package providing a comprehensive set of tools for working with date-times. It is packed with features, including utilities for: parsing, formatting, arithmetic, rounding, and extraction/updating of individual components. In addition to these tools for manipulating date-times, clock provides entirely new date-time types which are structured to reduce the agony of working with time zones as much as possible. At a high-level, clock:

  • Provides a new family of date-time classes (durations, time points, zoned-times, and calendars) that partition responsibilities so that you only have to think about time zones when you need them.

  • Implements a high level API for Date and POSIXct classes that lets you get productive quickly without having to learn the details of clock’s new date-time types.

  • Requires explicit handling of invalid dates (e.g. what date is one month after January 31st?) and nonexistent or ambiguous times (caused by daylight saving time issues).

  • Is built on the C++ date library, which provides a correct and high-performance backend. In general, operations on Dates are much faster with clock than with lubridate. Currently, operations on POSIXct have roughly the same performance between clock and lubridate (clock’s performance with POSIXct will improve greatly in a future release, once a few upstream changes in date are accepted).

You can install it from CRAN with:

install.packages("clock")

This blog post will show off a few of clock’s unique features. To learn more, you’ll want to take a look at clock’s vignettes:

Thanks to Julie Jung, clock has an amazing logo:



What about lubridate?

If you’ve ever worked with dates or date-times in R, you’ve probably used lubridate. lubridate has powerful capabilities for working with this kind of data. So, why clock?

One of the primary motivations for creating clock was to improve on lubridate’s handling of invalid dates and daylight saving time. As you’ll see in the following sections, clock tries extremely hard to guard you from unexpected problems that can arise from these two complex concepts.

Additionally, clock provides a variety of new types for working with date-times. While lubridate is solely focused on working with R’s native Date and POSIXct classes, clock goes many steps further with types such as: date-times without an implied time zone, nanosecond precision date-times, built-in granular types such as year-month and year-quarter, and a type for representing a weekday.

lubridate will never go away, and is not being deprecated or superseded. As of now, we consider clock to be an alternative to lubridate. You can stick with one or the other, or use them together, as there are no name conflicts between the two. Keep in mind that clock is a young package, with plenty of room to grow. If you have any feedback about clock, or questions about its design, we’d love for you to open an issue.

First steps

The best place to start learning about clock is by checking out the High-Level API. This lists all of the utilities in clock that work with R’s native date (Date) and date-time (POSIXct) types. You’ll notice that all of these helpers start with one of the following prefixes:

  • get_*(): Get a component

  • set_*(): Set a component

  • add_*(): Add a unit of time

  • date_*(): General date manipulation

We’ll explore some of these with a trimmed down version of the flights dataset from the nycflights13 package.

flights
#> # A tibble: 100 x 5
#>     year month   day dep_time dep_delay
#>    <int> <int> <int>    <int>     <dbl>
#>  1  2013     1     6     1827        -3
#>  2  2013     1     8     1458        -2
#>  3  2013     1    17     1823        80
#>  4  2013     1    26     1052        13
#>  5  2013     1    29      448       -12
#>  6  2013     1    30        3       124
#>  7  2013     2     1      816        -9
#>  8  2013     2     4     1943         3
#>  9  2013     2    10     1508        36
#> 10  2013     2    13     2033        33
#> # … with 90 more rows

The flight departure date is separated into year, month, and day fields. We can combine these together into a Date with date_build().

flights <- flights %>%
  mutate(
    date = date_build(year, month, day), 
    .keep = "unused", 
    .before = 1
  )

flights
#> # A tibble: 100 x 3
#>    date       dep_time dep_delay
#>    <date>        <int>     <dbl>
#>  1 2013-01-06     1827        -3
#>  2 2013-01-08     1458        -2
#>  3 2013-01-17     1823        80
#>  4 2013-01-26     1052        13
#>  5 2013-01-29      448       -12
#>  6 2013-01-30        3       124
#>  7 2013-02-01      816        -9
#>  8 2013-02-04     1943         3
#>  9 2013-02-10     1508        36
#> 10 2013-02-13     2033        33
#> # … with 90 more rows

If you need to get those individual components back, extract them with the corresponding get_*() function.

mutate(flights, year = get_year(date), month = get_month(date))
#> # A tibble: 100 x 5
#>    date       dep_time dep_delay  year month
#>    <date>        <int>     <dbl> <int> <int>
#>  1 2013-01-06     1827        -3  2013     1
#>  2 2013-01-08     1458        -2  2013     1
#>  3 2013-01-17     1823        80  2013     1
#>  4 2013-01-26     1052        13  2013     1
#>  5 2013-01-29      448       -12  2013     1
#>  6 2013-01-30        3       124  2013     1
#>  7 2013-02-01      816        -9  2013     2
#>  8 2013-02-04     1943         3  2013     2
#>  9 2013-02-10     1508        36  2013     2
#> 10 2013-02-13     2033        33  2013     2
#> # … with 90 more rows

To summarize the average departure delay by month, one option is to use date_group() to group by the current month of the year. For Dates, this ends up setting every day of the month to 1.

flights %>%
  mutate(date = date_group(date, "month")) %>%
  group_by(date) %>%
  summarise(avg_delay = mean(dep_delay, na.rm = TRUE), .groups = "drop")
#> # A tibble: 12 x 2
#>    date       avg_delay
#>    <date>         <dbl>
#>  1 2013-01-01     33.3 
#>  2 2013-02-01     16   
#>  3 2013-03-01     41.8 
#>  4 2013-04-01     -3.14
#>  5 2013-05-01     -1.14
#>  6 2013-06-01     12.2 
#>  7 2013-07-01     15.2 
#>  8 2013-08-01     26.2 
#>  9 2013-09-01     10.2 
#> 10 2013-10-01     21   
#> 11 2013-11-01     -4.14
#> 12 2013-12-01      6.2

If you’ve used lubridate before, you would have probably used lubridate::floor_date() for this. In clock, date summarization is broken into three groups: grouping, shifting, and rounding. This separation leads to code that is both less surprising, and more powerful, giving you the ability to summarize in new ways, such as: flooring by multiple weeks, grouping by day of the quarter, and flooring by rolling sets of, say, 60 days.

Be sure to check out the many other high-level tools for working with dates, including powerful utilities for formatting ( date_format()) and parsing ( date_parse() and date_time_parse()).

As a lubridate user, none of the above should seem particularly revolutionary, and that’s the entire idea of the high-level API. We’ve tried to make transitioning over to clock as easy as possible. In the following sections, you’ll see some of the benefits you’ll get from doing so.

Invalid dates

Using our flights data, imagine we want to add 1 month to date, perhaps to set up some kind of forward looking variable. With lubridate, we can use + months(1).

mutate(flights, date2 = date + months(1))
#> # A tibble: 100 x 4
#>    date       dep_time dep_delay date2     
#>    <date>        <int>     <dbl> <date>    
#>  1 2013-01-06     1827        -3 2013-02-06
#>  2 2013-01-08     1458        -2 2013-02-08
#>  3 2013-01-17     1823        80 2013-02-17
#>  4 2013-01-26     1052        13 2013-02-26
#>  5 2013-01-29      448       -12 NA        
#>  6 2013-01-30        3       124 NA        
#>  7 2013-02-01      816        -9 2013-03-01
#>  8 2013-02-04     1943         3 2013-03-04
#>  9 2013-02-10     1508        36 2013-03-10
#> 10 2013-02-13     2033        33 2013-03-13
#> # … with 90 more rows

Huh, what’s up with those NA values? Let’s try with clock:

mutate(flights, date2 = add_months(date, 1))
#> Error: Problem with `mutate()` input `date2`.
#>  Invalid date found at location 5. Resolve invalid date issues by specifying the `invalid` argument.
#>  Input `date2` is `add_months(date, 1)`.

What’s this about an invalid date? Location 5? Taking a closer look, we can see that adding 1 month to 2013-01-29 theoretically results in 2013-02-29, which doesn’t exist. In clock, this is known as an invalid date. With lubridate, invalid dates result in a silent NA. With clock, an error is raised.

So, how do we handle this? Well, there are a number of things that you could do:

  • Return NA

  • Return the previous valid moment in time

  • Return the next valid moment in time

  • Overflow our invalid date into March by the number of days past the true end of February that it landed at

With lubridate, %m+% (i.e.  add_with_rollback()) can help with the second and third bullets. The hardest part about %m+% is just remembering to use it. It is a common bug to forget to use this helper until after you have been bitten by an invalid date issue with an unexpected NA.

With clock, the error message advised us to use the invalid argument to add_months(). This allows for explicitly specifying one of many invalid date resolution strategies.

problems <- flights %>%
  select(date) %>%
  slice(c(5, 6))

problems %>%
  mutate(
    date2 = add_months(date, 1, invalid = "previous"),
    date3 = add_months(date, 1, invalid = "next"),
    date4 = add_months(date, 1, invalid = "overflow"),
    date5 = add_months(date, 1, invalid = "NA")
  )
#> # A tibble: 2 x 5
#>   date       date2      date3      date4      date5     
#>   <date>     <date>     <date>     <date>     <date>    
#> 1 2013-01-29 2013-02-28 2013-03-01 2013-03-01 NA        
#> 2 2013-01-30 2013-02-28 2013-03-01 2013-03-02 NA

The overarching goal of clock is to protect you from issues like invalid dates by erroring early and often, rather than letting them slip through unnoticed, only to cause hard to debug issues down the line. If you’re thinking, “That would never happen to me!", consider that if you had a daily sequence of every date in a particular year, and added 1 month to each date in that sequence, you would immediately generate 7 invalid dates (6 if you chose a leap year).

Daylight saving time

The dep_time column of flights contains the hour and minute of the actual departure time, encoded together into a single integer. Let’s extract that.

flights_hm <- flights %>%
  select(date, dep_time) %>%
  mutate(
    hour = dep_time %/% 100L,
    minute = dep_time %% 100L,
    .keep = "unused"
  )

head(flights_hm, n = 2)
#> # A tibble: 2 x 3
#>   date        hour minute
#>   <date>     <int>  <int>
#> 1 2013-01-06    18     27
#> 2 2013-01-08    14     58

We’d like to be able to add this time of day information to our date column. This flight information was recorded in the America/New_York time zone, so our resulting date-time should have that time zone as well. However, converting Date -> POSIXct will always assume that Date starts as UTC, rather than being naive to any time zones, and the result will use your system’s local time zone. This can have unintended side effects:

# My local time zone is actually America/New_York.
# The conversion to POSIXct retains the underlying UTC instant, but
# the printed time changes unexpectedly, showing the equivalent time
# in the local time zone.
flights_hm %>%
  select(date) %>%
  mutate(
    datetime = as.POSIXct(date),
    datetime_utc = date_set_zone(datetime, "UTC")
  ) %>%
  head(n = 3)
#> # A tibble: 3 x 3
#>   date       datetime            datetime_utc       
#>   <date>     <dttm>              <dttm>             
#> 1 2013-01-06 2013-01-05 19:00:00 2013-01-06 00:00:00
#> 2 2013-01-08 2013-01-07 19:00:00 2013-01-08 00:00:00
#> 3 2013-01-17 2013-01-16 19:00:00 2013-01-17 00:00:00

To get what we want, we need to convince the date column to “forget” that it is UTC, then add on the America/New_York time zone. With clock, we’ll do this by going through a new intermediate type called naive-time, a date-time type with a yet-to-be-specified time zone. The ability to separate a date-time from its associated time zone is one of clock’s most powerful features, which we’ll explore more in the Time Points section below. For now, the important thing is that this retains the printed time as we expected.

flights_dt <- flights_hm %>%
  mutate(
    datetime = as.POSIXct(as_naive_time(date), "America/New_York"),
    .keep = "unused",
    .before = 1
  )

flights_dt
#> # A tibble: 100 x 3
#>    datetime             hour minute
#>    <dttm>              <int>  <int>
#>  1 2013-01-06 00:00:00    18     27
#>  2 2013-01-08 00:00:00    14     58
#>  3 2013-01-17 00:00:00    18     23
#>  4 2013-01-26 00:00:00    10     52
#>  5 2013-01-29 00:00:00     4     48
#>  6 2013-01-30 00:00:00     0      3
#>  7 2013-02-01 00:00:00     8     16
#>  8 2013-02-04 00:00:00    19     43
#>  9 2013-02-10 00:00:00    15      8
#> 10 2013-02-13 00:00:00    20     33
#> # … with 90 more rows

We can now add on our hours and minutes.

flights_dt <- flights_dt %>%
  mutate(
    datetime = datetime %>%
      add_hours(hour) %>%
      add_minutes(minute),
    .keep = "unused"
  )

flights_dt
#> # A tibble: 100 x 1
#>    datetime           
#>    <dttm>             
#>  1 2013-01-06 18:27:00
#>  2 2013-01-08 14:58:00
#>  3 2013-01-17 18:23:00
#>  4 2013-01-26 10:52:00
#>  5 2013-01-29 04:48:00
#>  6 2013-01-30 00:03:00
#>  7 2013-02-01 08:16:00
#>  8 2013-02-04 19:43:00
#>  9 2013-02-10 15:08:00
#> 10 2013-02-13 20:33:00
#> # … with 90 more rows

Now assume that we want to add two days to this datetime column, again to construct some forward looking variable.

flights_dt_lubridate <- flights_dt %>%
  mutate(datetime2 = datetime + days(2))

flights_dt_lubridate
#> # A tibble: 100 x 2
#>    datetime            datetime2          
#>    <dttm>              <dttm>             
#>  1 2013-01-06 18:27:00 2013-01-08 18:27:00
#>  2 2013-01-08 14:58:00 2013-01-10 14:58:00
#>  3 2013-01-17 18:23:00 2013-01-19 18:23:00
#>  4 2013-01-26 10:52:00 2013-01-28 10:52:00
#>  5 2013-01-29 04:48:00 2013-01-31 04:48:00
#>  6 2013-01-30 00:03:00 2013-02-01 00:03:00
#>  7 2013-02-01 08:16:00 2013-02-03 08:16:00
#>  8 2013-02-04 19:43:00 2013-02-06 19:43:00
#>  9 2013-02-10 15:08:00 2013-02-12 15:08:00
#> 10 2013-02-13 20:33:00 2013-02-15 20:33:00
#> # … with 90 more rows

Looks reasonable. Now with clock:

flights_dt %>%
  mutate(datetime2 = add_days(datetime, 2))
#> Error: Problem with `mutate()` input `datetime2`.
#>  Nonexistent time due to daylight saving time at location 18. Resolve nonexistent time issues by specifying the `nonexistent` argument.
#>  Input `datetime2` is `add_days(datetime, 2)`.

Another problem! This time a nonexistent time at row 18. Let’s investigate what lubridate gave us here:

flights_dt_lubridate[18,]
#> # A tibble: 1 x 2
#>   datetime            datetime2          
#>   <dttm>              <dttm>             
#> 1 2013-03-08 02:23:00 NA

An NA? But why?

As it turns out, in the America/New_York time zone, on 2013-03-10 the clocks jumped forward from 01:59:59 -> 03:00:00, creating a daylight saving time gap, and a nonexistent 2 o’clock hour. By adding two days, we’ve landed right in that gap (at 02:23:00). With nonexistent times like this, lubridate silently returns NA, while clock errors.

Like with invalid dates, clock tries to guard you from these issues by erroring as soon as they occur. You can resolve these particular issues with the nonexistent argument to add_days(). In this case, we could:

  • Roll forward to the next valid moment in time

  • Roll backward to the previous valid moment in time

  • Shift forward by the size of the gap

  • Shift backward by the size of the gap

  • Return NA

problem <- flights_dt$datetime[18]
problem
#> [1] "2013-03-08 02:23:00 EST"

# 02:23:00 -> 03:00:00
add_days(problem, 2, nonexistent = "roll-forward")
#> [1] "2013-03-10 03:00:00 EDT"

# 02:23:00 -> 01:59:59
add_days(problem, 2, nonexistent = "roll-backward")
#> [1] "2013-03-10 01:59:59 EST"

# 02:23:00 -> 03:23:00
add_days(problem, 2, nonexistent = "shift-forward")
#> [1] "2013-03-10 03:23:00 EDT"

# 02:23:00 -> 01:23:00
add_days(problem, 2, nonexistent = "shift-backward")
#> [1] "2013-03-10 01:23:00 EST"

# 02:23:00 -> NA
add_days(problem, 2, nonexistent = "NA")
#> [1] NA

I recommend "roll-forward" or "roll-backward", as these retain the relative ordering of datetime, an issue that you can read about here.

Unlike with invalid dates, lubridate does not provide any tools for resolving nonexistent times.

There are another class of daylight saving time issues related to ambiguous times. These generally result from daylight saving fallbacks, where your clock might show two 1 AM hours. You resolve them in a similar way to what was done with nonexistent times. If you’re interested, you can read more about ambiguous times here.

Nonexistent and ambiguous times are particularly nasty issues because they occur relatively infrequently. If your time zone uses daylight saving time, these issues each come up once per year, generally for a duration of 1 hour (but not always!). This can be incredibly frustrating in production, where an analysis that has been working fine suddenly crashes on new data due to a daylight saving time issue. Which brings me to…

Production

This new invalid date and daylight saving time behavior might sound great to you, but you might be wondering about usage of clock in production. What happens if add_months() worked in interactive development, but then you put your analysis into production, gathered new data, and all of the sudden it started failing?

dates <- flights$date

# All good! Ship it!
add_months(dates[1:4], 1) 
#> [1] "2013-02-06" "2013-02-08" "2013-02-17" "2013-02-26"

# Failed in production with new data! Oh no!
add_months(dates[1:10], 1)
#> Error: Invalid date found at location 5. Resolve invalid date issues by specifying the `invalid` argument.

To balance the usefulness of clock in interactive development with the strict requirements of production, you can set the clock.strict global option to TRUE to turn invalid, nonexistent, and ambiguous from optional arguments into required ones.

rlang::with_options(clock.strict = TRUE, .expr = {
  add_months(dates[1:4], 1)
})
#> Error: The global option, `clock.strict`, is currently set to `TRUE`. In this mode, `invalid` must be set and cannot be left as `NULL`.

Forcing yourself to specify these arguments up front during interactive development is a great way to explicitly document your assumptions about these possible issues, while also guarding against future problems in production.

Advanced features

This blog post has only scratched the surface of what clock can do. Up until now, we’ve only explored clock’s high-level API. There exists an entire world of more powerful utilities in the low-level API that powers clock. We’ll briefly explore a few of those in the next few sections, but I’d encourage checking out the rest of the reference page to get a bird’s-eye view of all that clock can do.

Calendars

Calendars allow you to represent a date using an alternative format. Rather than using a typical year, month, and day of the month format, you might want to specify the fiscal year, quarter, and day of the quarter. In the end, these point to the same moment in time, just in different ways. For example:

ymd <- year_month_day(2019, 2, 25)

# Fiscal year starting in January
as_year_quarter_day(ymd)
#> <year_quarter_day<January><day>[1]>
#> [1] "2019-Q1-56"

# Fiscal year starting in April
as_year_quarter_day(ymd, start = clock_months$april)
#> <year_quarter_day<April><day>[1]>
#> [1] "2019-Q4-56"

There are 5 calendars that come with clock. The neat part about these is that they have varying precision, from year to nanosecond. This provides built-in granular types like year-month and year-quarter.

# Gregorian year, month, and day of the month
year_month_day(2019, 1, 14)
#> <year_month_day<day>[1]>
#> [1] "2019-01-14"
year_month_day(2019, 2)
#> <year_month_day<month>[1]>
#> [1] "2019-02"
year_month_day(2019, 2, 14, 2, 30, 25, 12345, subsecond_precision = "nanosecond")
#> <year_month_day<nanosecond>[1]>
#> [1] "2019-02-14 02:30:25.000012345"

# Gregorian year, month, and indexed weekday of the month
# (i.e. the 2nd Wednesday)
year_month_weekday(2019, 2, day = clock_weekdays$wednesday, index = 2)
#> <year_month_weekday<day>[1]>
#> [1] "2019-02-Wed[2]"

# Gregorian year and day of the year
year_day(2019, 105)
#> <year_day<day>[1]>
#> [1] "2019-105"

# Fiscal year, quarter, and day of the quarter
year_quarter_day(2019, 1, 14)
#> <year_quarter_day<January><day>[1]>
#> [1] "2019-Q1-14"
year_quarter_day(2019, 1, 14, start = clock_months$april)
#> <year_quarter_day<April><day>[1]>
#> [1] "2019-Q1-14"
year_quarter_day(2019, 2:4)
#> <year_quarter_day<January><quarter>[3]>
#> [1] "2019-Q2" "2019-Q3" "2019-Q4"

# ISO year, week, and day of the week
iso_year_week_day(2019, 2, clock_iso_weekdays$friday)
#> <iso_year_week_day<day>[1]>
#> [1] "2019-W02-5"

As shown above, you can convert from one calendar to another with functions like as_year_quarter_day(), and to Date or POSIXct with the standard as.Date() and as.POSIXct() functions.

One of the most unique features of calendars is the ability to represent invalid dates directly. In a previous section, we added 1 month to a Date and used the invalid argument to resolve invalid date issues. Let’s swap to a year-month-day and try again. We can also use the cleaner + duration_months() syntax here, which we can’t use with Dates.

invalids <- flights %>%
  select(date) %>%
  mutate(
    ymd = as_year_month_day(date),
    ymd2 = ymd + duration_months(1)
  )

invalids
#> # A tibble: 100 x 3
#>    date       ymd        ymd2      
#>    <date>     <ymd<day>> <ymd<day>>
#>  1 2013-01-06 2013-01-06 2013-02-06
#>  2 2013-01-08 2013-01-08 2013-02-08
#>  3 2013-01-17 2013-01-17 2013-02-17
#>  4 2013-01-26 2013-01-26 2013-02-26
#>  5 2013-01-29 2013-01-29 2013-02-29
#>  6 2013-01-30 2013-01-30 2013-02-30
#>  7 2013-02-01 2013-02-01 2013-03-01
#>  8 2013-02-04 2013-02-04 2013-03-04
#>  9 2013-02-10 2013-02-10 2013-03-10
#> 10 2013-02-13 2013-02-13 2013-03-13
#> # … with 90 more rows

The ymd2 column directly contains the invalid dates, 2013-02-29 and 2013-02-30! You can resolve these dates at any time using invalid_resolve(), providing an invalid resolution strategy like we did earlier. Or, you can ignore them if you expect them to be resolved naturally in some other way. For example, if our end goal was to add 1 month, then fix the day of the month to the 15th, then these invalid dates would naturally resolve themselves:

mutate(invalids, ymd3 = set_day(ymd2, 15))
#> # A tibble: 100 x 4
#>    date       ymd        ymd2       ymd3      
#>    <date>     <ymd<day>> <ymd<day>> <ymd<day>>
#>  1 2013-01-06 2013-01-06 2013-02-06 2013-02-15
#>  2 2013-01-08 2013-01-08 2013-02-08 2013-02-15
#>  3 2013-01-17 2013-01-17 2013-02-17 2013-02-15
#>  4 2013-01-26 2013-01-26 2013-02-26 2013-02-15
#>  5 2013-01-29 2013-01-29 2013-02-29 2013-02-15
#>  6 2013-01-30 2013-01-30 2013-02-30 2013-02-15
#>  7 2013-02-01 2013-02-01 2013-03-01 2013-03-15
#>  8 2013-02-04 2013-02-04 2013-03-04 2013-03-15
#>  9 2013-02-10 2013-02-10 2013-03-10 2013-03-15
#> 10 2013-02-13 2013-02-13 2013-03-13 2013-03-15
#> # … with 90 more rows

To detect which dates are invalid, use invalid_detect(), which returns a logical vector that can be useful for filtering:

filter(invalids, invalid_detect(ymd2))
#> # A tibble: 3 x 3
#>   date       ymd        ymd2      
#>   <date>     <ymd<day>> <ymd<day>>
#> 1 2013-01-29 2013-01-29 2013-02-29
#> 2 2013-01-30 2013-01-30 2013-02-30
#> 3 2013-10-31 2013-10-31 2013-11-31

With invalid dates, the important thing is that they eventually get resolved. You must resolve them before converting to another calendar or to a Date / POSIXct.

as.Date(invalids$ymd2)
#> Error: Conversion from a calendar requires that all dates are valid. Resolve invalid dates by calling `invalid_resolve()`.

Time points and zoned-times

The daylight saving time section of this post was complicated by the need to work around time zones. If your analysis doesn’t actually require time zones, you can represent a date or date-time using a naive-time. This date-time type makes no assumption about the current time zone, instead assuming that there is a yet-to-be-specified time zone that hasn’t been declared yet.

flights_nt <- flights_hm %>%
  mutate(
    naive_day = as_naive_time(date),
    naive_time = naive_day + duration_hours(hour) + duration_minutes(minute)
  ) %>%
  select(date, starts_with("naive"))

flights_nt
#> # A tibble: 100 x 3
#>    date       naive_day        naive_time         
#>    <date>     <tp<naive><day>> <tp<naive><minute>>
#>  1 2013-01-06 2013-01-06       2013-01-06 18:27   
#>  2 2013-01-08 2013-01-08       2013-01-08 14:58   
#>  3 2013-01-17 2013-01-17       2013-01-17 18:23   
#>  4 2013-01-26 2013-01-26       2013-01-26 10:52   
#>  5 2013-01-29 2013-01-29       2013-01-29 04:48   
#>  6 2013-01-30 2013-01-30       2013-01-30 00:03   
#>  7 2013-02-01 2013-02-01       2013-02-01 08:16   
#>  8 2013-02-04 2013-02-04       2013-02-04 19:43   
#>  9 2013-02-10 2013-02-10       2013-02-10 15:08   
#> 10 2013-02-13 2013-02-13       2013-02-13 20:33   
#> # … with 90 more rows

Going from Date -> naive-time has dropped the UTC time zone assumption altogether, while keeping the printed time. This allowed us to convert back to POSIXct in an earlier example. Essentially, all that we were doing was declaring that yet-to-be-specified time zone as America/New_York, keeping the printed time where possible. We could have easily chosen a different time zone, like Europe/London.

flights_nt %>%
  select(naive_time) %>%
  mutate(
    datetime_ny = as.POSIXct(naive_time, "America/New_York"),
    datetime_lo = as.POSIXct(naive_time, "Europe/London")
  )
#> # A tibble: 100 x 3
#>    naive_time          datetime_ny         datetime_lo        
#>    <tp<naive><minute>> <dttm>              <dttm>             
#>  1 2013-01-06 18:27    2013-01-06 18:27:00 2013-01-06 18:27:00
#>  2 2013-01-08 14:58    2013-01-08 14:58:00 2013-01-08 14:58:00
#>  3 2013-01-17 18:23    2013-01-17 18:23:00 2013-01-17 18:23:00
#>  4 2013-01-26 10:52    2013-01-26 10:52:00 2013-01-26 10:52:00
#>  5 2013-01-29 04:48    2013-01-29 04:48:00 2013-01-29 04:48:00
#>  6 2013-01-30 00:03    2013-01-30 00:03:00 2013-01-30 00:03:00
#>  7 2013-02-01 08:16    2013-02-01 08:16:00 2013-02-01 08:16:00
#>  8 2013-02-04 19:43    2013-02-04 19:43:00 2013-02-04 19:43:00
#>  9 2013-02-10 15:08    2013-02-10 15:08:00 2013-02-10 15:08:00
#> 10 2013-02-13 20:33    2013-02-13 20:33:00 2013-02-13 20:33:00
#> # … with 90 more rows

If you’re used to lubridate, converting to naive-time and back with a different time zone is similar to using lubridate::force_tz(), but with more control over possible daylight saving time issues (again using nonexistent and ambiguous, but supplied directly to as.POSIXct()).

In clock, a naive-time is a particular kind of time point, a type that counts units of time with respect to some origin. Time points are extremely efficient at daily and sub-daily arithmetic, but calendars are better suited for monthly and yearly arithmetic. Time points are also efficient at rounding and shifting, through time_point_floor() and time_point_shift(), but calendars are better at grouping, through calendar_group(). In the high-level API for Date and POSIXct, we gloss over these details and internally switch between these two types for you.

There is a second type of time point in clock, the sys-time, which works exactly like a naive-time except that it is assumed to be in UTC. If you never use a time zone aware class like POSIXct, then sys-time and naive-time are equivalent. However, once you start adding in time zones, the way you interpret each of them becomes extremely important.

ymd <- year_month_day(2019, 1, 1)

# Yet-to-be-specified time zone
naive <- as_naive_time(ymd)
naive
#> <time_point<naive><day>[1]>
#> [1] "2019-01-01"

# UTC time zone
sys <- as_sys_time(ymd)
sys
#> <time_point<sys><day>[1]>
#> [1] "2019-01-01"

# - Keeps printed time
# - Changes underlying duration
as.POSIXct(naive, "America/New_York")
#> [1] "2019-01-01 EST"

# - Changes printed time
# - Keeps underlying duration
as.POSIXct(sys, "America/New_York")
#> [1] "2018-12-31 19:00:00 EST"

clock also provides its own time zone aware date-time type, the zoned-time. Converting to a zoned-time from a sys-time or naive-time works the same as converting to a POSIXct, but zoned-times can have up to nanosecond precision.

naive %>%
  add_nanoseconds(100) %>%
  add_hours(2) %>%
  as_zoned_time("America/New_York")
#> <zoned_time<nanosecond><America/New_York>[1]>
#> [1] "2019-01-01 02:00:00.000000100-05:00"

There isn’t actually a lot you can do with zoned-times directly. Generally, zoned-times are the start or end points of an analysis meant for humans to interpret. In the middle, you’ll convert to naive-time, sys-time, or to a calendar type to perform any date-time specific manipulations.