Chapter 8 File Compression
In this chapter, we will explore file compression using the following commands:
- tar
- gzip
- zip
- unzip
8.1 tar
The tar command is used for archiving files and, combined with gzip, for compressing them. It works with both .tar and .tar.gz files. It is used to
- list files
- extract files
- create archives
- append files to existing archives
tar creates, maintains, modifies, and extracts files that are archived in the tar format. Tar stands for tape archive and is an archiving file format.
| Command | Description |
|---|---|
| tar tvf | List an archive |
| tar tvfz | List a gzipped archive |
| tar xvf | Extract an archive |
| tar xvfz | Extract a gzipped archive |
| tar cvf | Create an uncompressed tar archive |
| tar cvfz | Create a tar gzipped archive |
| tar rvf | Add a file to an existing archive |
| tar rvfz | Add a file to an existing gzipped archive (not supported by GNU tar, which cannot update compressed archives; decompress the archive first) |
We will use different options with the tar command for listing, extracting, creating, and adding files. The v and f options are common to all of the above operations: v shows progress verbosely, and f indicates that the name of the archive file follows. The following options are specific to each operation:
- t for listing
- x for extracting
- c for creating
- r for adding files
While dealing with tar.gz archives, we will use z in addition to vf and the above options.
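The sections below walk through listing and extracting only, so here is a minimal sketch of the c, r, and z options as well. The file names (a.txt, b.txt, c.txt, sample.tar) are hypothetical and created in a temporary directory. The sketch assumes GNU tar or a compatible implementation, and it places f last in each option bundle because, in the dash-prefixed form, f takes the next word as the archive name.

```shell
set -eu

# Work in a throwaway directory so nothing in the current folder is touched.
workdir=$(mktemp -d)
cd "$workdir"

# Hypothetical sample files for illustration.
printf 'alpha\n' > a.txt
printf 'beta\n'  > b.txt
printf 'gamma\n' > c.txt

# c = create a new, uncompressed archive; f = the archive name follows.
tar -cvf sample.tar a.txt b.txt

# r = append a file to the existing uncompressed archive.
tar -rvf sample.tar c.txt

# z = pass the archive through gzip while creating it.
tar -czvf sample.tar.gz a.txt b.txt c.txt

# t = list the members to confirm what each archive contains.
tar -tf sample.tar
tar -tzf sample.tar.gz
```

Appending is shown against the plain .tar file only; to add a file to a gzipped archive, one option is to decompress it, append, and compress it again.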
8.1.1 List
Let us list all the files and folders in release_names.tar. As mentioned above, to list the files in the archive, we use the t option.
tar -tvf release_names.tar
## -rwxrwxrwx aravind/aravind 546 2019-09-16 15:59 release_names.txt
## -rwxrwxrwx aravind/aravind 65 2019-09-16 15:58 release_names_18.txt
## -rwxrwxrwx aravind/aravind 53 2019-09-16 15:59 release_names_19.txt
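As a small illustration of how listing can be used in scripts, the sketch below compares the verbose listing with a names-only listing (dropping v) and checks whether a given member is present. The archive names.tar and its members are hypothetical stand-ins, not the files used in this chapter.

```shell
set -eu

workdir=$(mktemp -d)
cd "$workdir"

# Hypothetical archive with two small members.
printf 'a\n' > a.txt
printf 'b\n' > b.txt
tar -cf names.tar a.txt b.txt

# Verbose listing: permissions, owner, size, timestamp, and name.
tar -tvf names.tar

# Names-only listing: one member per line, easier to process in a pipeline.
tar -tf names.tar

# Check for a specific member; grep -x matches the whole line, -q stays silent.
if tar -tf names.tar | grep -qx 'b.txt'; then
  echo "b.txt is in the archive"
fi
```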
8.1.2 Extract
Let us extract files from release_names.tar using the x option in addition to vf.
tar -xvf release_names.tar
ls
## release_names.txt
## release_names_18.txt
## release_names_19.txt
## analysis.R
## bash-tutorial
## bash.R
## bash.Rmd
## bash.sh
## imports_blorr.txt
## imports_olsrr.txt
## lorem-ipsum.txt
## main_project.zip
## myfiles
## mypackage
## myproject
## myproject.zip
## myproject3
## myproject4
## package_names.txt
## packproj.zip
## pkg_names.tar
## pkg_names.txt
## r
## r2
## r_releases
## release_names.tar
## release_names.tar.gz
## release_names.txt
## release_names_18.txt
## release_names_18_19.txt
## release_names_19.txt
## releases.txt.gz
## sept_15.csv.gz
## urls.txt
## zip_example.zip
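Extraction writes into the current directory by default, which is why the extracted files appear next to everything else in the listing above. A minimal sketch of two common variations follows: extracting into a separate directory with -C, and extracting a single member from a gzipped archive. The names demo.tar.gz, src, extracted, and single are hypothetical, and the sketch assumes a tar that supports -C and -z (GNU tar and bsdtar both do).

```shell
set -eu

workdir=$(mktemp -d)
cd "$workdir"

# Hypothetical source files, archived from inside the src directory.
mkdir src
printf 'one\n' > src/first.txt
printf 'two\n' > src/second.txt
tar -czf demo.tar.gz -C src first.txt second.txt

# Extract everything into another directory; -C requires that it already exists.
mkdir extracted
tar -xzvf demo.tar.gz -C extracted

# Extract only one member by naming it after the archive.
mkdir single
tar -xzvf demo.tar.gz -C single second.txt

ls extracted single
```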
8.2 gzip
| Command | Description |
|---|---|
| gzip | Compress a file |
| gzip -d | Decompress a file |
| gzip -c | Compress a file to standard output, so that the output file name can be specified with a redirect |
| zip -r | Compress a directory |
| zip | Add files to an existing zip file |
| unzip | Extract files from a zip file |
| unzip -d | Extract files from a zip file into a specified directory |
| unzip -l | List contents of a zip file |
The gzip, gunzip, and zcat commands are used to compress or expand files in the GNU gzip format, i.e., files with the .gz extension.
8.2.1 Compress
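Let us compress the release_names.txt file using gzip. Note that gzip replaces the original file with release_names.txt.gz, as the listing below shows.
gzip release_names.txt
ls
## analysis.R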
## bash-tutorial
## bash.R
## bash.Rmd
## bash.sh
## imports_blorr.txt
## imports_olsrr.txt
## lorem-ipsum.txt
## main_project.zip
## myfiles
## mypackage
## myproject
## myproject.zip
## myproject3
## myproject4
## package_names.txt
## packproj.zip
## pkg_names.tar
## pkg_names.txt
## r
## r2
## r_releases
## release_names.tar
## release_names.tar.gz
## release_names.txt.gz
## release_names_18.txt
## release_names_18_19.txt
## release_names_19.txt
## releases.txt.gz
## sept_15.csv.gz
## urls.txt
## zip_example.zip
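Because gzip removes the original file by default, it can help to see that behaviour side by side with the -k (keep) option. The sketch below uses hypothetical files sample.txt and keep.txt in a temporary directory, and assumes a gzip version that supports -k (GNU gzip 1.6 or later).

```shell
set -eu

workdir=$(mktemp -d)
cd "$workdir"

# Hypothetical, highly repetitive input so the size reduction is easy to see.
awk 'BEGIN { for (i = 0; i < 1000; i++) print "R release names" }' > sample.txt
cp sample.txt keep.txt

# Default behaviour: sample.txt is replaced by sample.txt.gz.
gzip sample.txt

# With -k, keep.txt stays in place next to keep.txt.gz.
gzip -k keep.txt

# Compare file sizes before and after compression.
ls -l
```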
8.2.2 Decompress
Use the -d option with gzip to decompress a file. In the example below, we decompress the sept_15.csv.gz file (downloaded using wget or curl earlier). You can also use gunzip for the same result.
gzip -d sept_15.csv.gz
ls
## analysis.R
## bash-tutorial
## bash.R
## bash.Rmd
## bash.sh
## imports_blorr.txt
## imports_olsrr.txt
## lorem-ipsum.txt
## main_project.zip
## myfiles
## mypackage
## myproject
## myproject.zip
## myproject3
## myproject4
## package_names.txt
## packproj.zip
## pkg_names.tar
## pkg_names.txt
## r
## r2
## r_releases
## release_names.tar
## release_names.tar.gz
## release_names.txt
## release_names_18.txt
## release_names_18_19.txt
## release_names_19.txt
## releases.txt.gz
## sept_15.csv
## urls.txt
## zip_example.zip
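As a rough illustration of the alternatives mentioned above, the sketch below first prints the contents of a compressed file without writing a decompressed copy (gzip -dc, which is what zcat does for .gz files) and then decompresses it with gunzip. The file sample.csv and its contents are hypothetical.

```shell
set -eu

workdir=$(mktemp -d)
cd "$workdir"

# Hypothetical CSV file, compressed so there is something to decompress.
printf 'date,value\n2019-09-15,42\n' > sample.csv
gzip sample.csv

# Print the decompressed contents to the terminal; sample.csv.gz is left untouched.
gzip -dc sample.csv.gz

# gunzip behaves like gzip -d: sample.csv.gz is replaced by sample.csv.
gunzip sample.csv.gz
ls
```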
8.2.3 Specify Filename
Use -c and > to specify a different file name while compressing using gzip. The -c option writes the compressed data to standard output and leaves the original file in place. In the example below, gzip will create releases.txt.gz instead of release_names.txt.gz.
gzip -c release_names.txt > releases.txt.gz
ls
## analysis.R
## bash-tutorial
## bash.R
## bash.Rmd
## bash.sh
## imports_blorr.txt
## imports_olsrr.txt
## lorem-ipsum.txt
## main_project.zip
## myfiles
## mypackage
## myproject
## myproject.zip
## myproject3
## myproject4
## package_names.txt
## packproj.zip
## pkg_names.tar
## pkg_names.txt
## r
## r2
## r_releases
## release_names.tar
## release_names.tar.gz
## release_names.txt
## release_names_18.txt
## release_names_18_19.txt
## release_names_19.txt
## releases.txt.gz
## sept_15.csv
## urls.txt
## zip_example.zip
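One way to check that a redirected archive really holds the same data is a round trip: compress with -c, decompress back to standard output, and compare against the original. The sketch below does this with hypothetical files notes.txt and archive_copy.gz.

```shell
set -eu

workdir=$(mktemp -d)
cd "$workdir"

# Hypothetical input file.
printf 'line one\nline two\n' > notes.txt

# -c streams the compressed data to standard output; the redirect picks the name.
gzip -c notes.txt > archive_copy.gz

# Test the integrity of the compressed file.
gzip -t archive_copy.gz

# Decompress to standard output and compare byte for byte with the original.
gzip -dc archive_copy.gz | cmp - notes.txt && echo "round trip matches"
```

Since the original file is never removed in this workflow, the same pattern also works for input that should stay readable while a compressed copy is made.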
8.3 zip & unzip
zip creates ZIP archives while unzip lists and extracts compressed files in a ZIP archive.
8.3.1 List
Let us list all the files and folders in main_project.zip using unzip and the -l option.
unzip -l main_project.zip
## Archive: main_project.zip
## Length Date Time Name
## --------- ---------- ----- ----
## 0 2019-09-23 18:07 myproject/
## 0 2019-09-20 14:02 myproject/.gitignore
## 0 2019-09-23 18:07 myproject/data/
## 0 2019-09-20 14:02 myproject/data/processed/
## 0 2019-09-20 14:02 myproject/data/raw/
## 0 2019-09-20 14:02 myproject/output/
## 0 2019-09-20 14:02 myproject/README.md
## 13 2019-09-20 14:02 myproject/run_analysis.R
## 0 2019-09-20 14:02 myproject/src/
## 0 2019-09-23 18:07 mypackage/
## 0 2019-09-20 14:11 mypackage/.gitignore
## 0 2019-09-20 14:11 mypackage/.Rbuildignore
## 0 2019-09-20 14:10 mypackage/data/
## 0 2019-09-20 14:11 mypackage/DESCRIPTION
## 0 2019-09-20 14:10 mypackage/docs/
## 0 2019-09-20 14:11 mypackage/LICENSE
## 0 2019-09-20 14:10 mypackage/man/
## 0 2019-09-20 14:11 mypackage/NAMESPACE
## 0 2019-09-20 14:11 mypackage/NEWS.md
## 0 2019-09-20 14:10 mypackage/R/
## 0 2019-09-20 14:11 mypackage/README.md
## 0 2019-09-20 14:11 mypackage/src/
## 0 2019-09-20 14:10 mypackage/tests/
## 0 2019-09-20 14:10 mypackage/vignettes/
## 0 2019-09-23 18:07 myfiles/
## 12 2019-09-20 15:30 myfiles/analysis.R
## 7 2019-09-20 15:31 myfiles/NEWS.md
## 9 2019-09-20 15:31 myfiles/README.md
## 546 2019-09-20 15:29 myfiles/release_names.txt
## 65 2019-09-20 15:29 myfiles/release_names_18.txt
## 53 2019-09-20 15:30 myfiles/release_names_19.txt
## 12 2019-09-20 15:30 myfiles/visualization.R
## 15333 2019-10-01 16:58 bash.sh
## 0 2019-09-16 12:42 r/
## --------- -------
## 16050 34 files
8.3.2 Extract
Using unzip, let us now extract files and folders from zip_example.zip.
unzip zip_example.zip
Using the -d option, we can extract the contents of zip_example.zip to a specific folder. In the example below, we extract it to a new folder named myexamples.
unzip zip_example.zip -d myexamples
8.3.3 Compress
Use the -r option along with zip to create a ZIP archive of a directory; -r makes zip recurse into the directory and include its contents. In the example below, we create a ZIP archive of the myproject folder.
zip -r myproject.zip myproject
ls
## updating: myproject/ (stored 0%)
## updating: myproject/.gitignore (stored 0%)
## updating: myproject/data/ (stored 0%)
## updating: myproject/data/processed/ (stored 0%)
## updating: myproject/data/raw/ (stored 0%)
## updating: myproject/output/ (stored 0%)
## updating: myproject/README.md (stored 0%)
## updating: myproject/run_analysis.R (stored 0%)
## updating: myproject/src/ (stored 0%)
We can compress multiple directories using zip. The names of the directories must be separated by a space, as shown in the example below, where we compress myproject and mypackage into a single ZIP archive.
zip -r packproj.zip myproject mypackage
ls
## updating: myproject/ (stored 0%)
## updating: myproject/.gitignore (stored 0%)
## updating: myproject/data/ (stored 0%)
## updating: myproject/data/processed/ (stored 0%)
## updating: myproject/data/raw/ (stored 0%)
## updating: myproject/output/ (stored 0%)
## updating: myproject/README.md (stored 0%)
## updating: myproject/run_analysis.R (stored 0%)
## updating: myproject/src/ (stored 0%)
## updating: mypackage/ (stored 0%)
## updating: mypackage/.gitignore (stored 0%)
## updating: mypackage/.Rbuildignore (stored 0%)
## updating: mypackage/data/ (stored 0%)
## updating: mypackage/DESCRIPTION (stored 0%)
## updating: mypackage/docs/ (stored 0%)
## updating: mypackage/LICENSE (stored 0%)
## updating: mypackage/man/ (stored 0%)
## updating: mypackage/NAMESPACE (stored 0%)
## updating: mypackage/NEWS.md (stored 0%)
## updating: mypackage/R/ (stored 0%)
## updating: mypackage/README.md (stored 0%)
## updating: mypackage/src/ (stored 0%)
## updating: mypackage/tests/ (stored 0%)
## updating: mypackage/vignettes/ (stored 0%)
8.3.4 Add
To add a new file or folder to an existing archive, specify the name of the archive followed by the name of the file or the folder. In the example below, we add the bash.sh file to the myproject.zip archive created in a previous step.
zip myproject.zip bash.sh
## updating: bash.sh (deflated 78%)
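The zip and unzip steps in this section can be strung together into one round trip. The sketch below is a minimal version under stated assumptions: the directory demo, the file extra.sh, and the archive demo.zip are hypothetical, and the script exits early if zip or unzip is not installed. The -q option, added here only to keep the output short, silences the per-file messages.

```shell
set -eu

# Skip the example when the tools are missing (they are not installed everywhere).
if ! command -v zip >/dev/null 2>&1 || ! command -v unzip >/dev/null 2>&1; then
  echo "zip/unzip not installed; skipping"
  exit 0
fi

workdir=$(mktemp -d)
cd "$workdir"

# Hypothetical project directory plus one extra file.
mkdir -p demo/data
printf 'x <- 1\n'  > demo/run.R
printf 'a,b\n'     > demo/data/raw.csv
printf 'echo hi\n' > extra.sh

# Create an archive of the directory; -r recurses into demo/.
zip -r -q demo.zip demo

# Add another file to the existing archive.
zip -q demo.zip extra.sh

# List the contents, then extract into a separate directory.
unzip -l demo.zip
unzip -q demo.zip -d unpacked
ls -R unpacked
```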
8.4 R Functions
8.4.1 tar & tar.gz
In R, we can use the tar() and untar() functions from the utils package to handle .tar and .tar.gz archives.
| Command | R |
|---|---|
| tar tvf | utils::untar('archive.tar', list = TRUE) |
| tar tvfz | utils::untar('archive.tar.gz', list = TRUE) |
| tar xvf | utils::untar('archive.tar') |
| tar xvfz | utils::untar('archive.tar.gz') |
| tar cvf | utils::tar('archive.tar') |
| tar cvfz | utils::tar('archive.tar', compression = 'gzip') |
8.4.2 zip & gzip
The zip package provides functions to handle ZIP archives. The tar() and untar() functions from the utils package can handle gzipped tar archives, while gzip() and gunzip() from the R.utils package compress and decompress individual files.
| Command | R |
|---|---|
| gzip | utils::tar(compression = 'gzip') / R.utils::gzip() |
| gzip -d | utils::untar() / R.utils::gunzip() |
| gzip -c | R.utils::gzip(filename, destname = outname, remove = FALSE) |
| zip -r | zip::zip() |
| zip | zip::zipr_append() |
| unzip | zip::unzip() |
| unzip -d | zip::unzip(exdir = dir_name) |
| unzip -l | zip::zip_list() |