Chapter 8 File Compression

In this chapter, we will explore file compression using the following commands:

  • tar
  • gzip
  • zip
  • unzip

8.1 tar

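The tar command bundles files and folders into a single archive. On its own it does not compress anything; compression is added by combining it with gzip, which produces .tar.gz files. tar works with both .tar and .tar.gz archives. It is used to: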

  • list files
  • extract files
  • create archives
  • append files to existing archives

tar creates, maintains, modifies, and extracts files that are archived in the tar format. Tar stands for tape archive and is an archiving file format.

Command     Description
tar tvf     List an archive
tar tvfz    List a gzipped archive
tar xvf     Extract an archive
tar xvfz    Extract a gzipped archive
tar cvf     Create an uncompressed tar archive
tar cvfz    Create a gzipped tar archive
tar rvf     Add a file to an existing uncompressed archive
tar rvfz    Not supported: tar cannot append to a compressed archive

We will use different options with the tar command for listing, extracting, creating, and adding files. The vf options are common to all of the above operations: v (verbose) prints each file as it is processed, and f indicates that the name of the archive file follows. The following options are specific to each operation:

  • t for listing
  • x for extracting
  • c for creating
  • r for adding files

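When dealing with .tar.gz archives, we use z in addition to vf and the options above. The exception is r: tar cannot append to a compressed archive, so the archive must be decompressed first. If the options are written with a leading dash, f should come last (for example, -xzvf), because the archive name must directly follow f.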

8.1.1 List

Let us list all the files and folders in release_names.tar. As mentioned above, to list the files in an archive, we use the t option.

## -rwxrwxrwx aravind/aravind 546 2019-09-16 15:59 release_names.txt
## -rwxrwxrwx aravind/aravind  65 2019-09-16 15:58 release_names_18.txt
## -rwxrwxrwx aravind/aravind  53 2019-09-16 15:59 release_names_19.txt
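The command that produced the listing above is not shown. A minimal sketch of what it might look like follows, assuming a scratch directory and placeholder file contents (both are illustrative and not the data used in this chapter).

```shell
cd "$(mktemp -d)" || exit 1   # scratch directory (assumption), so no real files are touched

# Stand-in files; the contents are placeholders.
printf 'alpha\n' > release_names.txt
printf 'beta\n'  > release_names_18.txt
printf 'gamma\n' > release_names_19.txt

# Build a small archive so there is something to list.
tar -cf release_names.tar release_names.txt release_names_18.txt release_names_19.txt

# t = list, v = verbose (permissions, owner, size, date), f = archive name follows.
tar -tvf release_names.tar

# Without v, only the member names are printed, one per line.
listing=$(tar -tf release_names.tar)
printf '%s\n' "$listing"
```

Dropping v is handy when the names are to be passed on to another command.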

8.1.2 Extract

Let us extract files from release_names.tar using the x option in addition to vf.

## release_names.txt
## release_names_18.txt
## release_names_19.txt
## analysis.R
## bash.R
## bash.Rmd
## bash.sh
## imports_blorr.txt
## imports_olsrr.txt
## lorem-ipsum.txt
## main_project.zip
## myfiles
## mypackage
## myproject
## myproject3
## myproject4
## package_names.txt
## pkg_names.txt
## r
## r2
## r_releases
## release_names.tar
## release_names.tar.gz
## release_names.txt
## release_names_18.txt
## release_names_18_19.txt
## release_names_19.txt
## sept_15.csv.gz
## urls.txt
## zip_example.zip
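The output above looks like the result of an extraction followed by a directory listing. A self-contained sketch of those two steps is given below; the scratch directory and the file contents are assumptions made for illustration.

```shell
cd "$(mktemp -d)" || exit 1   # scratch directory (assumption)

# Create an archive from two placeholder files.
printf 'alpha\n' > release_names.txt
printf 'beta\n'  > release_names_18.txt
tar -cf release_names.tar release_names.txt release_names_18.txt

# Remove the originals so the extraction visibly restores them.
rm release_names.txt release_names_18.txt

# x = extract, v = print each member as it is written, f = archive name follows.
tar -xvf release_names.tar

# The extracted files now sit next to the archive.
ls
restored=$(cat release_names.txt)
```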

8.1.3 Add

To add a file to an existing archive, use the r option. Let us add release_names_18.txt and release_names_19.txt to the archive we created in the previous step.

## release_names_18.txt
## release_names_19.txt
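Appending can be sketched as follows. The archive name and the file contents here are placeholders chosen for illustration; the original command is not shown in the text.

```shell
cd "$(mktemp -d)" || exit 1   # scratch directory (assumption)

printf 'alpha\n' > release_names.txt
printf 'beta\n'  > release_names_18.txt
printf 'gamma\n' > release_names_19.txt

# Start with an uncompressed archive holding a single file.
tar -cf release_names.tar release_names.txt

# r = append; with v, tar prints the name of each file it adds.
tar -rvf release_names.tar release_names_18.txt release_names_19.txt

# Listing the archive confirms that it now has three members.
members=$(tar -tf release_names.tar)
printf '%s\n' "$members"
```

Appending works only on uncompressed archives, which is why the sketch uses a plain .tar file.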

8.1.4 Create

Using the c option, we can create tar archives. In the example below, we archive a single file, but you can specify multiple files and folders as well.

## pkg_names.txt
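Let us sketch how such an archive might be created. The archive names pkg_names.tar and pkg_names.tar.gz and the file contents are assumptions for this illustration.

```shell
cd "$(mktemp -d)" || exit 1   # scratch directory (assumption)

printf 'blorr\nolsrr\n' > pkg_names.txt   # placeholder contents

# c = create, v = print each file as it is added, f = archive name follows.
tar -cvf pkg_names.tar pkg_names.txt

# Adding z compresses the archive with gzip; .tar.gz is the usual extension.
tar -cvzf pkg_names.tar.gz pkg_names.txt

ls -l pkg_names.tar pkg_names.tar.gz
```

Several files or folders can be listed after the archive name, separated by spaces.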

8.2 gzip

Command     Description
gzip        Compress a file
gzip -d     Decompress a file
gzip -c     Compress a file to standard output (redirect to choose the output file name)
zip -r      Compress a directory
zip         Add files to an existing ZIP file
unzip       Extract files from a ZIP file
unzip -d    Extract files from a ZIP file into a specified directory
unzip -l    List contents of a ZIP file

The gzip, gunzip, and zcat commands compress or expand files in the GNU gzip format, i.e. files with the .gz extension.

8.2.1 Compress

Let us compress the release_names.txt file using gzip. Note that gzip replaces the original file with release_names.txt.gz.

## analysis.R
## bash.R
## bash.Rmd
## bash.sh
## imports_blorr.txt
## imports_olsrr.txt
## lorem-ipsum.txt
## main_project.zip
## myfiles
## mypackage
## myproject
## myproject3
## myproject4
## package_names.txt
## pkg_names.tar
## pkg_names.txt
## r
## r2
## r_releases
## release_names.tar
## release_names.tar.gz
## release_names.txt.gz
## release_names_18.txt
## release_names_18_19.txt
## release_names_19.txt
## sept_15.csv.gz
## urls.txt
## zip_example.zip
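A minimal sketch of the compression step, assuming a scratch directory and placeholder file contents:

```shell
cd "$(mktemp -d)" || exit 1   # scratch directory (assumption)

printf 'alpha\nbeta\n' > release_names.txt   # placeholder contents

# gzip compresses the file in place: release_names.txt is replaced
# by release_names.txt.gz.
gzip release_names.txt

ls
```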

8.2.2 Decompress

Use the -d option with gzip to decompress a file. In the below example, we decompress the sept_15.csv.gz file (downloaded using wget or curl earlier). You can also use gunzip for the same result.

## analysis.R
## bash.R
## bash.Rmd
## bash.sh
## imports_blorr.txt
## imports_olsrr.txt
## lorem-ipsum.txt
## main_project.zip
## myfiles
## mypackage
## myproject
## myproject3
## myproject4
## package_names.txt
## pkg_names.tar
## pkg_names.txt
## r
## r2
## r_releases
## release_names.tar
## release_names.tar.gz
## release_names.txt
## release_names_18.txt
## release_names_18_19.txt
## release_names_19.txt
## sept_15.csv
## urls.txt
## zip_example.zip
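The decompression step could look like the sketch below. The CSV contents are made up for illustration; in the chapter the file is downloaded rather than created.

```shell
cd "$(mktemp -d)" || exit 1   # scratch directory (assumption)

# Create and compress a tiny stand-in CSV so there is something to decompress.
printf 'a,b\n1,2\n' > sept_15.csv
gzip sept_15.csv

# -d decompresses: sept_15.csv.gz is replaced by sept_15.csv.
# "gunzip sept_15.csv.gz" would do the same.
gzip -d sept_15.csv.gz

ls
```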

8.2.3 Specify Filename

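Use -c and > to specify a different file name while compressing using gzip. The -c option writes the compressed data to standard output, which > redirects to a file of our choice; the original file is left in place. In the below example, gzip will create releases.txt.gz instead of release_names.txt.gz.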

## analysis.R
## bash.R
## bash.Rmd
## bash.sh
## imports_blorr.txt
## imports_olsrr.txt
## lorem-ipsum.txt
## main_project.zip
## myfiles
## mypackage
## myproject
## myproject3
## myproject4
## package_names.txt
## pkg_names.tar
## pkg_names.txt
## r
## r2
## r_releases
## release_names.tar
## release_names.tar.gz
## release_names.txt
## release_names_18.txt
## release_names_18_19.txt
## release_names_19.txt
## releases.txt.gz
## sept_15.csv
## urls.txt
## zip_example.zip
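Here is a sketch of compressing to a chosen file name, assuming a scratch directory and placeholder contents:

```shell
cd "$(mktemp -d)" || exit 1   # scratch directory (assumption)

printf 'alpha\nbeta\n' > release_names.txt   # placeholder contents

# -c sends the compressed bytes to standard output; > stores them
# under the name we choose. The original file is not removed.
gzip -c release_names.txt > releases.txt.gz

# Decompressing to standard output (-dc) gives back the original text.
roundtrip=$(gzip -dc releases.txt.gz)
ls
```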

8.3 zip & unzip

zip creates ZIP archives while unzip lists and extracts compressed files in a ZIP archive.

8.3.1 List

Let us list all the files and folders in main_project.zip using unzip and the -l option.

## Archive:  main_project.zip
##   Length      Date    Time    Name
## ---------  ---------- -----   ----
##         0  2019-09-23 18:07   myproject/
##         0  2019-09-20 14:02   myproject/.gitignore
##         0  2019-09-23 18:07   myproject/data/
##         0  2019-09-20 14:02   myproject/data/processed/
##         0  2019-09-20 14:02   myproject/data/raw/
##         0  2019-09-20 14:02   myproject/output/
##         0  2019-09-20 14:02   myproject/README.md
##        13  2019-09-20 14:02   myproject/run_analysis.R
##         0  2019-09-20 14:02   myproject/src/
##         0  2019-09-23 18:07   mypackage/
##         0  2019-09-20 14:11   mypackage/.gitignore
##         0  2019-09-20 14:11   mypackage/.Rbuildignore
##         0  2019-09-20 14:10   mypackage/data/
##         0  2019-09-20 14:11   mypackage/DESCRIPTION
##         0  2019-09-20 14:10   mypackage/docs/
##         0  2019-09-20 14:11   mypackage/LICENSE
##         0  2019-09-20 14:10   mypackage/man/
##         0  2019-09-20 14:11   mypackage/NAMESPACE
##         0  2019-09-20 14:11   mypackage/NEWS.md
##         0  2019-09-20 14:10   mypackage/R/
##         0  2019-09-20 14:11   mypackage/README.md
##         0  2019-09-20 14:11   mypackage/src/
##         0  2019-09-20 14:10   mypackage/tests/
##         0  2019-09-20 14:10   mypackage/vignettes/
##         0  2019-09-23 18:07   myfiles/
##        12  2019-09-20 15:30   myfiles/analysis.R
##         7  2019-09-20 15:31   myfiles/NEWS.md
##         9  2019-09-20 15:31   myfiles/README.md
##       546  2019-09-20 15:29   myfiles/release_names.txt
##        65  2019-09-20 15:29   myfiles/release_names_18.txt
##        53  2019-09-20 15:30   myfiles/release_names_19.txt
##        12  2019-09-20 15:30   myfiles/visualization.R
##     15333  2019-10-01 16:58   bash.sh
##         0  2019-09-16 12:42   r/
## ---------                     -------
##     16050                     34 files

8.3.2 Extract

Using unzip, let us now extract files and folders from zip_example.zip.

## Archive:  zip_example.zip
##    creating: zip_example/
##   inflating: zip_example/bash.sh     
##   inflating: zip_example/pkg_names.txt

Using the -d option, we can extract the contents of zip_example.zip to a specific folder. In the below example, we extract it to a new folder examples.

## [1] "Archive:  zip_example.zip"                        
## [2] "   creating: examples/zip_example/"               
## [3] "  inflating: examples/zip_example/bash.sh  "      
## [4] "  inflating: examples/zip_example/pkg_names.txt  "

8.3.3 Compress

Use the -r option along with zip to recursively compress a folder into a ZIP archive. In the below example, we create a ZIP archive of the myproject folder.

##   adding: myproject/ (stored 0%)
##   adding: myproject/.gitignore (stored 0%)
##   adding: myproject/data/ (stored 0%)
##   adding: myproject/data/processed/ (stored 0%)
##   adding: myproject/data/raw/ (stored 0%)
##   adding: myproject/output/ (stored 0%)
##   adding: myproject/README.md (stored 0%)
##   adding: myproject/run_analysis.R (stored 0%)
##   adding: myproject/src/ (stored 0%)

We can compress multiple directories using zip. The names of the directories must be separated by a space as shown in the below example where we compress myproject and mypackage into a single ZIP archive.

##   adding: myproject/ (stored 0%)
##   adding: myproject/.gitignore (stored 0%)
##   adding: myproject/data/ (stored 0%)
##   adding: myproject/data/processed/ (stored 0%)
##   adding: myproject/data/raw/ (stored 0%)
##   adding: myproject/output/ (stored 0%)
##   adding: myproject/README.md (stored 0%)
##   adding: myproject/run_analysis.R (stored 0%)
##   adding: myproject/src/ (stored 0%)
##   adding: mypackage/ (stored 0%)
##   adding: mypackage/.gitignore (stored 0%)
##   adding: mypackage/.Rbuildignore (stored 0%)
##   adding: mypackage/data/ (stored 0%)
##   adding: mypackage/DESCRIPTION (stored 0%)
##   adding: mypackage/docs/ (stored 0%)
##   adding: mypackage/LICENSE (stored 0%)
##   adding: mypackage/man/ (stored 0%)
##   adding: mypackage/NAMESPACE (stored 0%)
##   adding: mypackage/NEWS.md (stored 0%)
##   adding: mypackage/R/ (stored 0%)
##   adding: mypackage/README.md (stored 0%)
##   adding: mypackage/src/ (stored 0%)
##   adding: mypackage/tests/ (stored 0%)
##   adding: mypackage/vignettes/ (stored 0%)

8.3.4 Add

To add a new file/folder to an existing archive, specify the name of the archive followed by the name of the file or the folder. In the below example, we add the bash.sh file to the myproject.zip archive created in a previous step.

##   adding: bash.sh (deflated 78%)

8.4 R Functions

8.4.1 tar & tar.gz

In R, we can use the tar() and untar() functions from the utils package to handle .tar and .tar.gz archives.

Command     R
tar tvf     utils::untar('archive.tar', list = TRUE)
tar tvfz    utils::untar('archive.tar.gz', list = TRUE)
tar xvf     utils::untar('archive.tar')
tar xvfz    utils::untar('archive.tar.gz')
tar cvf     utils::tar('archive.tar')
tar cvfz    utils::tar('archive.tar.gz', compression = 'gzip')

8.4.2 zip & gzip

The zip package provides functions to handle ZIP archives. The tar() and untar() functions from the utils package can handle gzip-compressed tar archives, while the R.utils package provides gzip() and gunzip() for individual .gz files.

Command     R
gzip        utils::tar(compression = 'gzip') / R.utils::gzip()
gzip -d     utils::untar() / R.utils::gunzip()
gzip -c     R.utils::gzip(filename, destname = new_name)
zip -r      zip::zip()
zip         zip::zipr_append()
unzip       zip::unzip()
unzip -d    zip::unzip(exdir = dir_name)
unzip -l    zip::zip_list()