Chapter 8 File Compression
In this chapter, we will explore file archiving and compression using the following commands:
- tar
- gzip
- zip
- unzip
8.1 tar
The tar command bundles files and folders into a single archive file. On its own it does not compress anything; with the z option it also compresses the archive using gzip. It works with both .tar and .tar.gz files and is used to:
- list files
- extract files
- create archives
- append files to existing archives

tar creates, maintains, modifies, and extracts files that are archived in the tar format. The name stands for tape archive, and tar is also the name of the archive file format.
Command | Description |
---|---|
tar -tvf | List an archive |
tar -tzvf | List a gzipped archive |
tar -xvf | Extract an archive |
tar -xzvf | Extract a gzipped archive |
tar -cvf | Create an uncompressed tar archive |
tar -czvf | Create a gzipped tar archive |
tar -rvf | Add files to an existing uncompressed archive |
tar -rzvf | Not supported: tar cannot append to a compressed archive, so decompress it, append, then recompress |
We combine different option letters with the tar command for listing, extracting, creating and adding files. The v and f options are common to all of the above operations: v (verbose) prints each file name as it is processed, and f names the archive file, so the archive name must come right after it. When the letters are grouped after a single dash, as in -tvf, keep f last. The following options are specific to each operation:
- t for listing
- x for extracting
- c for creating
- r for adding files

When dealing with tar.gz archives, we add z (filter the archive through gzip) to vf and the options above.
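The sketch below runs each option letter on a throwaway folder so you can see them side by side. The names demo/, a.txt and b.txt are hypothetical, and the commands assume GNU tar or a compatible tar such as bsdtar. Note the order -czvf rather than -cvfz: the word right after f is taken as the archive name, so -cvfz would try to name the archive "z".

```shell
#!/usr/bin/env bash
# Sketch: the tar option letters on hypothetical throwaway files.
set -eu
cd "$(mktemp -d)"              # work in a scratch directory

mkdir demo
echo "alpha" > demo/a.txt
echo "beta" > demo/b.txt

tar -cvf demo.tar demo         # c: create, v: verbose, f: archive name follows
tar -czvf demo.tar.gz demo     # z: also compress the archive with gzip
tar -tvf demo.tar              # t: list the plain archive
tar -tzvf demo.tar.gz          # t + z: list the gzipped archive (f stays last)
```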
8.1.1 List
Let us list all the files and folders in release_names.tar. As mentioned above, to list the files in the archive, we use the t option.
tar -tvf release_names.tar
## -rwxrwxrwx aravind/aravind 546 2019-09-16 15:59 release_names.txt
## -rwxrwxrwx aravind/aravind 65 2019-09-16 15:58 release_names_18.txt
## -rwxrwxrwx aravind/aravind 53 2019-09-16 15:59 release_names_19.txt
8.1.2 Extract
Let us extract files from release_names.tar using the x option in addition to vf.
tar -xvf release_names.tar
ls
## release_names.txt
## release_names_18.txt
## release_names_19.txt
## analysis.R
## bash-tutorial
## bash.R
## bash.Rmd
## bash.sh
## imports_blorr.txt
## imports_olsrr.txt
## lorem-ipsum.txt
## main_project.zip
## myfiles
## mypackage
## myproject
## myproject.zip
## myproject3
## myproject4
## package_names.txt
## packproj.zip
## pkg_names.tar
## pkg_names.txt
## r
## r2
## r_releases
## release_names.tar
## release_names.tar.gz
## release_names.txt
## release_names_18.txt
## release_names_18_19.txt
## release_names_19.txt
## releases.txt.gz
## sept_15.csv.gz
## urls.txt
## zip_example.zip
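tar can also extract into a different folder or pull out a single member. The sketch below assumes GNU tar or bsdtar and uses hypothetical names (demo/, all_files/, one_file/). The -C option changes into the target folder before extracting, and a member name must match the archive listing exactly, including its folder prefix.

```shell
#!/usr/bin/env bash
# Sketch: extracting to another folder and extracting a single member.
set -eu
cd "$(mktemp -d)"

mkdir demo
echo "alpha" > demo/a.txt
echo "beta" > demo/b.txt
tar -cf demo.tar demo                      # build an archive to extract from

mkdir all_files
tar -C all_files -xvf demo.tar             # -C: change into all_files/ first

mkdir one_file
tar -C one_file -xvf demo.tar demo/a.txt   # extract only the named member
```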
8.1.3 Add
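To add files to an existing archive, use the r option. Let us add release_names_18.txt and release_names_19.txt to release_names.tar. Note that r appends files even if the archive already contains files with the same names; release_names.tar already holds both files, so after this step it contains two copies of each.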
tar -rvf release_names.tar release_names_18.txt release_names_19.txt
## release_names_18.txt
## release_names_19.txt
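Because tar refuses to append to a compressed archive, adding files to a .tar.gz takes three steps: decompress, append, recompress. A minimal sketch, assuming GNU tar and gzip, with hypothetical file names:

```shell
#!/usr/bin/env bash
# Sketch: appending a file to an existing .tar.gz archive.
set -eu
cd "$(mktemp -d)"

echo "one" > first.txt
echo "two" > second.txt
tar -czf notes.tar.gz first.txt    # an existing gzipped archive

gunzip notes.tar.gz                # notes.tar.gz -> notes.tar
tar -rvf notes.tar second.txt      # r: append second.txt to the plain .tar
gzip notes.tar                     # recompress, producing notes.tar.gz again
tar -tzf notes.tar.gz              # lists first.txt and second.txt
```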
8.1.4 Create
Using the c option, we can create tar archives. In the example below, we archive a single file, but you can specify multiple files and folders as well.
tar -cvf pkg_names.tar pkg_names.txt
## pkg_names.txt
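Here is a sketch of an archive built from a mix of files and folders, assuming GNU tar. Folders are added recursively, and --exclude (placed before the file names, since GNU tar applies it only to names that follow) skips matching paths. The project/ folder and its files are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: one archive from several files and folders, skipping log files.
set -eu
cd "$(mktemp -d)"

mkdir -p project/data
echo "x,y" > project/data/raw.csv
echo "tmp" > project/debug.log
echo "# Project" > README.md

# Folders are added recursively; *.log files anywhere are left out
tar -cvf bundle.tar --exclude='*.log' README.md project
tar -tf bundle.tar
```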
8.2 gzip
Command | Description |
---|---|
gzip | Compress a file (replaces it with a .gz file) |
gzip -d | Decompress a file |
gzip -c | Compress a file to standard output, so you can choose the output file name |
zip -r | Compress a directory |
zip | Add files to an existing zip file |
unzip | Extract files from a zip file |
unzip -d | Extract files from a zip file into a specified directory |
unzip -l | List the contents of a zip file |
The gzip, gunzip, and zcat commands compress or expand files in the GNU zip (gzip) format, i.e. files with the .gz extension.
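A short sketch of how the three commands fit together, assuming GNU gzip (where zcat is equivalent to gzip -dc). zcat is handy for peeking inside a compressed file or piping it to another command without writing a decompressed copy. The file versions.txt is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: gzip, zcat and gunzip on a hypothetical text file.
set -eu
cd "$(mktemp -d)"

printf 'R 3.5.0\nR 3.6.0\nR 3.6.1\n' > versions.txt
gzip versions.txt                   # replaces versions.txt with versions.txt.gz

zcat versions.txt.gz | grep '3.6'   # read the content without writing a file
gunzip versions.txt.gz              # same as gzip -d: restores versions.txt
```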
8.2.1 Compress
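Let us compress the release_names.txt file using gzip. Note that gzip replaces the original file with release_names.txt.gz, as the listing after the command shows.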
gzip release_names.txt
ls
## analysis.R
## bash-tutorial
## bash.R
## bash.Rmd
## bash.sh
## imports_blorr.txt
## imports_olsrr.txt
## lorem-ipsum.txt
## main_project.zip
## myfiles
## mypackage
## myproject
## myproject.zip
## myproject3
## myproject4
## package_names.txt
## packproj.zip
## pkg_names.tar
## pkg_names.txt
## r
## r2
## r_releases
## release_names.tar
## release_names.tar.gz
## release_names.txt.gz
## release_names_18.txt
## release_names_18_19.txt
## release_names_19.txt
## releases.txt.gz
## sept_15.csv.gz
## urls.txt
## zip_example.zip
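If you want to keep the original file, or trade speed for a smaller file, gzip has options for both. A sketch assuming gzip 1.6 or later (needed for -k); numbers.txt is a hypothetical file.

```shell
#!/usr/bin/env bash
# Sketch: keep the original file and use the strongest compression level.
set -eu
cd "$(mktemp -d)"

seq 1 1000 > numbers.txt
gzip -k -9 numbers.txt     # -k: keep numbers.txt; -9: best (slowest) compression
gzip -l numbers.txt.gz     # report compressed size, uncompressed size and ratio
```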
8.2.2 Decompress
Use the -d option with gzip to decompress a file. In the example below, we decompress the sept_15.csv.gz file (downloaded using wget or curl earlier). You can also use gunzip for the same result.
gzip -d sept_15.csv.gz
ls
## analysis.R
## bash-tutorial
## bash.R
## bash.Rmd
## bash.sh
## imports_blorr.txt
## imports_olsrr.txt
## lorem-ipsum.txt
## main_project.zip
## myfiles
## mypackage
## myproject
## myproject.zip
## myproject3
## myproject4
## package_names.txt
## packproj.zip
## pkg_names.tar
## pkg_names.txt
## r
## r2
## r_releases
## release_names.tar
## release_names.tar.gz
## release_names.txt
## release_names_18.txt
## release_names_18_19.txt
## release_names_19.txt
## releases.txt.gz
## sept_15.csv
## urls.txt
## zip_example.zip
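Before decompressing a downloaded file, it can be worth checking that it is intact. The sketch below, assuming gzip 1.6 or later and a hypothetical sample.csv, tests the file with -t and then decompresses it with -k so the .gz copy is kept.

```shell
#!/usr/bin/env bash
# Sketch: check a .gz file, then decompress it while keeping the original.
set -eu
cd "$(mktemp -d)"

seq 1 5 > sample.csv
gzip sample.csv             # sample.csv -> sample.csv.gz
gzip -t sample.csv.gz       # -t: test integrity; silent when the file is intact
gzip -dk sample.csv.gz      # -d: decompress; -k: also keep sample.csv.gz
```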
8.2.3 Specify Filename
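Use -c and > to specify a different file name while compressing with gzip. The -c option writes the compressed data to standard output and leaves the original file in place; > then redirects that output to a file of your choice. In the example below, gzip creates releases.txt.gz instead of release_names.txt.gz.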
gzip -c release_names.txt > releases.txt.gz
ls
## analysis.R
## bash-tutorial
## bash.R
## bash.Rmd
## bash.sh
## imports_blorr.txt
## imports_olsrr.txt
## lorem-ipsum.txt
## main_project.zip
## myfiles
## mypackage
## myproject
## myproject.zip
## myproject3
## myproject4
## package_names.txt
## packproj.zip
## pkg_names.tar
## pkg_names.txt
## r
## r2
## r_releases
## release_names.tar
## release_names.tar.gz
## release_names.txt
## release_names_18.txt
## release_names_18_19.txt
## release_names_19.txt
## releases.txt.gz
## sept_15.csv
## urls.txt
## zip_example.zip
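The same redirection trick works in reverse: -dc decompresses to standard output, so you can restore a file under any name. A sketch with hypothetical file names:

```shell
#!/usr/bin/env bash
# Sketch: compress and decompress to chosen names with -c and redirection.
set -eu
cd "$(mktemp -d)"

printf 'R 3.6.1\n' > notes.txt
gzip -c notes.txt > backup.gz       # compress to a new name; notes.txt stays
gzip -dc backup.gz > restored.txt   # decompress to stdout, redirect to any name
cmp notes.txt restored.txt && echo "identical"
```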
8.3 zip & unzip
zip creates ZIP archives, while unzip lists and extracts the files in a ZIP archive.
8.3.1 List
Let us list all the files and folders in main_project.zip using unzip and the -l option.
unzip -l main_project.zip
## Archive: main_project.zip
## Length Date Time Name
## --------- ---------- ----- ----
## 0 2019-09-23 18:07 myproject/
## 0 2019-09-20 14:02 myproject/.gitignore
## 0 2019-09-23 18:07 myproject/data/
## 0 2019-09-20 14:02 myproject/data/processed/
## 0 2019-09-20 14:02 myproject/data/raw/
## 0 2019-09-20 14:02 myproject/output/
## 0 2019-09-20 14:02 myproject/README.md
## 13 2019-09-20 14:02 myproject/run_analysis.R
## 0 2019-09-20 14:02 myproject/src/
## 0 2019-09-23 18:07 mypackage/
## 0 2019-09-20 14:11 mypackage/.gitignore
## 0 2019-09-20 14:11 mypackage/.Rbuildignore
## 0 2019-09-20 14:10 mypackage/data/
## 0 2019-09-20 14:11 mypackage/DESCRIPTION
## 0 2019-09-20 14:10 mypackage/docs/
## 0 2019-09-20 14:11 mypackage/LICENSE
## 0 2019-09-20 14:10 mypackage/man/
## 0 2019-09-20 14:11 mypackage/NAMESPACE
## 0 2019-09-20 14:11 mypackage/NEWS.md
## 0 2019-09-20 14:10 mypackage/R/
## 0 2019-09-20 14:11 mypackage/README.md
## 0 2019-09-20 14:11 mypackage/src/
## 0 2019-09-20 14:10 mypackage/tests/
## 0 2019-09-20 14:10 mypackage/vignettes/
## 0 2019-09-23 18:07 myfiles/
## 12 2019-09-20 15:30 myfiles/analysis.R
## 7 2019-09-20 15:31 myfiles/NEWS.md
## 9 2019-09-20 15:31 myfiles/README.md
## 546 2019-09-20 15:29 myfiles/release_names.txt
## 65 2019-09-20 15:29 myfiles/release_names_18.txt
## 53 2019-09-20 15:30 myfiles/release_names_19.txt
## 12 2019-09-20 15:30 myfiles/visualization.R
## 15333 2019-10-01 16:58 bash.sh
## 0 2019-09-16 12:42 r/
## --------- -------
## 16050 34 files
8.3.2 Extract
Using unzip, let us now extract files and folders from zip_example.zip.
unzip zip_example.zip
Using the -d option, we can extract the contents of zip_example.zip to a specific folder. In the example below, we extract it to a new folder, myexamples.
unzip zip_example.zip -d myexamples
8.3.3 Compress
Use the -r option with zip to create a ZIP archive of a folder; -r tells zip to recurse into the folder and include everything inside it. In the example below, we create a ZIP archive of the myproject folder.
zip -r myproject.zip myproject
## updating: myproject/ (stored 0%)
## updating: myproject/.gitignore (stored 0%)
## updating: myproject/data/ (stored 0%)
## updating: myproject/data/processed/ (stored 0%)
## updating: myproject/data/raw/ (stored 0%)
## updating: myproject/output/ (stored 0%)
## updating: myproject/README.md (stored 0%)
## updating: myproject/run_analysis.R (stored 0%)
## updating: myproject/src/ (stored 0%)
We can compress multiple directories with zip by listing their names separated by spaces. In the example below, we compress myproject and mypackage into a single ZIP archive.
zip -r packproj.zip myproject mypackage
## updating: myproject/ (stored 0%)
## updating: myproject/.gitignore (stored 0%)
## updating: myproject/data/ (stored 0%)
## updating: myproject/data/processed/ (stored 0%)
## updating: myproject/data/raw/ (stored 0%)
## updating: myproject/output/ (stored 0%)
## updating: myproject/README.md (stored 0%)
## updating: myproject/run_analysis.R (stored 0%)
## updating: myproject/src/ (stored 0%)
## updating: mypackage/ (stored 0%)
## updating: mypackage/.gitignore (stored 0%)
## updating: mypackage/.Rbuildignore (stored 0%)
## updating: mypackage/data/ (stored 0%)
## updating: mypackage/DESCRIPTION (stored 0%)
## updating: mypackage/docs/ (stored 0%)
## updating: mypackage/LICENSE (stored 0%)
## updating: mypackage/man/ (stored 0%)
## updating: mypackage/NAMESPACE (stored 0%)
## updating: mypackage/NEWS.md (stored 0%)
## updating: mypackage/R/ (stored 0%)
## updating: mypackage/README.md (stored 0%)
## updating: mypackage/src/ (stored 0%)
## updating: mypackage/tests/ (stored 0%)
## updating: mypackage/vignettes/ (stored 0%)
8.3.4 Add
To add a new file or folder to an existing archive, specify the name of the archive followed by the name of the file or folder. In the example below, we add the bash.sh file to the myproject.zip archive created in a previous step.
zip myproject.zip bash.sh
## updating: bash.sh (deflated 78%)
8.4 R Functions
8.4.1 tar & tar.gz
In R, we can use the tar() and untar() functions from the utils package to handle .tar and .tar.gz archives.
Command | R |
---|---|
tar -tvf | utils::untar('archive.tar', list = TRUE) |
tar -tzvf | utils::untar('archive.tar.gz', list = TRUE) |
tar -xvf | utils::untar('archive.tar') |
tar -xzvf | utils::untar('archive.tar.gz') |
tar -cvf | utils::tar('archive.tar') |
tar -czvf | utils::tar('archive.tar.gz', compression = 'gzip') |
8.4.2 zip & gzip
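The zip package provides functions for working with ZIP archives. The tar() and untar() functions from the utils package handle gzipped tar archives (.tar.gz), while gzip() and gunzip() from the R.utils package compress and decompress single .gz files.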
Command | R |
---|---|
gzip | R.utils::gzip() |
gzip -d | R.utils::gunzip() |
gzip -c | R.utils::gzip(filename, destname = new_name, remove = FALSE) |
zip -r | zip::zip() |
zip | zip::zipr_append() |
unzip | zip::unzip() |
unzip -d | zip::unzip(exdir = dir_name) |
unzip -l | zip::zip_list() |