archive 1.1.2 is now on CRAN. archive lets you work with file archives such as ZIP, tar, 7-Zip and RAR, and with compression formats such as gzip, bzip2, XZ and Zstandard. It does this by building on top of the libarchive C library.
You can install it from CRAN with install.packages("archive").
This blog post will explain the main functions of archive, and show how you can use them to read from and write to archives.
You can see a full list of changes in the release notes.
```r
library(archive)

# Run the examples in a fresh temporary directory
my_dir <- fs::file_temp() |> fs::dir_create()
knitr::opts_knit$set(root.dir = my_dir)
```
Use archive() to return a tibble of the files contained in a given archive.
```r
archive("nycflights13.zip")
#> # A tibble: 5 × 3
#>   path                       size date
#>   <chr>                     <int> <dttm>
#> 1 nycflights13/airlines.csv   386 2021-11-04 15:14:15
#> 2 nycflights13/airports.csv 71209 2021-11-04 15:14:15
#> 3 nycflights13/flights.csv  90886 2021-11-04 15:14:16
#> 4 nycflights13/planes.csv   72927 2021-11-04 15:14:16
#> 5 nycflights13/weather.csv  86753 2021-11-04 15:14:16
```
archive_read() is used to read a single file from an archive. This function returns an R connection, which can be passed to many R functions that take a connection object as input. Most base R functions that read files accept connections, as do some packages such as readr.
The file argument accepts either a numeric position in the archive or a filename.
```r
# Read the second file in the archive, selected by position
con1 <- archive_read("nycflights13.zip", file = 2)
readLines(con1, n = 5)
#> [1] "faa,name,lat,lon,alt,tz,dst,tzone"
#> [2] "04G,Lansdowne Airport,41.1304722,-80.6195833,1044,-5,A,America/New_York"
#> [3] "06A,Moton Field Municipal Airport,32.4605722,-85.6800278,264,-6,A,America/Chicago"
#> [4] "06C,Schaumburg Regional,41.9893408,-88.1012428,801,-6,A,America/Chicago"
#> [5] "06N,Randall Airport,41.431912,-74.3915611,523,-5,A,America/New_York"
close(con1)

# Read a file selected by name
con2 <- archive_read("nycflights13.zip", file = "nycflights13/planes.csv")
readLines(con2, n = 5)
#> [1] "tailnum,year,type,manufacturer,model,engines,seats,speed,engine"
#> [2] "N10156,2004,Fixed wing multi engine,EMBRAER,EMB-145XR,2,55,NA,Turbo-fan"
#> [3] "N102UW,1998,Fixed wing multi engine,AIRBUS INDUSTRIE,A320-214,2,182,NA,Turbo-fan"
#> [4] "N103US,1999,Fixed wing multi engine,AIRBUS INDUSTRIE,A320-214,2,182,NA,Turbo-fan"
#> [5] "N104UW,1999,Fixed wing multi engine,AIRBUS INDUSTRIE,A320-214,2,182,NA,Turbo-fan"
close(con2)
```
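Because archive_read() returns an ordinary connection, it can also be handed straight to a parser instead of readLines(). The round trip below is a minimal sketch of that idea using base R's write.csv() and read.csv(). The temporary archive path and the inner file name cars.csv are made up for illustration, and the sketch assumes that both functions open and close the unopened connection themselves, as they do for other connection types.

```r
library(archive)

# Hypothetical temporary archive; the inner file name is illustrative only
zip_path <- tempfile(fileext = ".zip")

# Write the first six rows of mtcars into the archive as cars.csv
write.csv(head(mtcars), archive_write(zip_path, "cars.csv"), row.names = FALSE)

# Read the same file back by name; read.csv() consumes the connection
cars <- read.csv(archive_read(zip_path, file = "cars.csv"))

# Inspect the structure of the data frame that came back
str(cars)
```

No explicit close() is needed here, since the connection is created inline and never stored in a variable.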
archive_write() is used to write a single file to an archive. Again, this creates a writable R connection. As with reading, many base R functions work with writable connections, as do some packages such as readr.
The archive and compression formats are guessed automatically from the file extension of the output filename. However, you can also specify them explicitly with the format and filter arguments.
Here we create a new zip archive, my-cars.zip, containing the file mtcars.csv.
```r
readr::write_csv(mtcars, archive_write("my-cars.zip", "mtcars.csv"))

archive("my-cars.zip")
#> # A tibble: 1 × 3
#>   path        size date
#>   <chr>      <int> <dttm>
#> 1 mtcars.csv  1281 1980-01-01 00:00:00
```
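When the output filename does not carry a recognizable extension, the formats can be given explicitly instead. The sketch below assumes that archive_write() takes format and filter arguments and accepts the values "tar" and "gzip", which matches my reading of the package documentation but is worth checking against your installed version. The .bin path is a made-up example, and both arguments are supplied together rather than relying on one being inferred.

```r
library(archive)

# Hypothetical output path whose extension says nothing about its format
out <- tempfile(fileext = ".bin")

# Request a gzip-compressed tar archive explicitly (assumed argument values)
con <- archive_write(out, "mtcars.csv", format = "tar", filter = "gzip")
write.csv(mtcars, con, row.names = FALSE)

# Listing the archive relies on the contents being inspected, not the filename
listing <- archive(out)
listing$path
```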
archive_write_files() writes multiple files to a new archive. In this case the files to be added to the archive should already be written on disk.
archive_write_dir() is a helper to archive all the files in a given directory.
```r
library(readr)

# Write a few files to the temp directory
write_csv(iris, "iris.csv")
write_csv(mtcars, "mtcars.csv")
write_csv(airquality, "airquality.csv")

# Add them to a new XZ compressed tar archive
archive_write_files("data.tar.xz", c("iris.csv", "mtcars.csv", "airquality.csv"))

# View archive contents
archive("data.tar.xz")
#> # A tibble: 3 × 3
#>   path            size date
#>   <chr>          <int> <dttm>
#> 1 iris.csv        3716 2021-11-04 15:14:17
#> 2 mtcars.csv      1281 2021-11-04 15:14:17
#> 3 airquality.csv  2890 2021-11-04 15:14:17
```
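The example above lists the files by hand. For the directory helper, archive_write_dir(), a minimal sketch might look like the following. It assumes the function takes the archive path first and the directory second. The directory and archive names are hypothetical and live under tempdir(), and absolute paths are used throughout to avoid any dependence on the working directory.

```r
library(archive)

# Hypothetical source directory and output path, both under tempdir()
src_dir <- tempfile("to-archive-")
dir.create(src_dir)
out <- tempfile(fileext = ".tar.gz")

# Put two small CSV files in the directory
write.csv(iris, file.path(src_dir, "iris.csv"), row.names = FALSE)
write.csv(mtcars, file.path(src_dir, "mtcars.csv"), row.names = FALSE)

# Archive everything in the directory in one call
archive_write_dir(out, src_dir)

# The listing should contain one entry per file written above
listing <- archive(out)
listing$path
```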
archive_extract() allows you to extract one or more files to disk from an archive.
Note that the archive and compression formats are detected automatically.
```r
# Create a new directory
my_dir <- fs::file_temp() |> fs::dir_create()

# Extract two of the files in the archive to that directory
archive_extract("data.tar.xz", dir = my_dir, files = c("iris.csv", "mtcars.csv"))

# Show the extracted files
fs::dir_ls(my_dir) |> fs::path_file()
#> [1] "iris.csv"   "mtcars.csv"
```