Chapter 5 Search & Regular Expression
In this chapter, we will explore commands that will
- search for a given string in a file
- find files using names
- and search for binary executable files
Command | Description |
---|---|
grep | Search for a given string in a file |
find | Find files using filenames |
which | Search for binary executable files |
5.1 grep
The grep command is used for pattern matching. Along with additional options, it can be used to
- match pattern in input text
- ignore case
- search recursively for an exact string
- print filename and line number for each match
- invert match for excluding specific strings
grep processes text line by line and prints any lines that match a specified pattern. grep, which stands for global regular expression print, is a powerful tool for matching a regular expression against text in a file, multiple files, or a stream of input.
Command | Description |
---|---|
grep | Matches pattern in input text |
grep -i | Ignore case |
grep -RI | Search recursively for a string, skipping binary files |
grep -E | Use extended regular expression |
grep -Hn | Print file name & corresponding line number for each match |
grep -v | Invert match for excluding specific strings |
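The -E option listed above can be illustrated with a small sketch. The file sample.txt and its contents are hypothetical (they are not the package_names.txt used in this chapter), and the pattern is only one possible example of an extended regular expression.

```shell
# Build a small, hypothetical sample file in a temporary directory.
tmpdir=$(mktemp -d)
printf '%s\n' '1. RJDBC' '2. alfr' '3. BIOMASS' '4. bayesbio' '5. rsed' > "$tmpdir/sample.txt"

# -E enables extended regular expressions, so | means "or".
# The pattern matches lines that contain "bio" or end in "r"; -i ignores case.
matches=$(grep -E -i 'bio|r$' "$tmpdir/sample.txt")
echo "$matches"

# Clean up the temporary directory.
rm -rf "$tmpdir"
```

With this sample, the lines for alfr, BIOMASS and bayesbio are printed; RJDBC and rsed contain an r but neither end in one nor contain bio.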
5.1.1 Match Pattern in Input Text
Using grep, let us search for packages that include the letter R in their names.
grep R package_names.txt
## 14. RJDBC
## 30. logNormReg
## 27. gLRTH
## 35. fermicatsR
## 42. OptimaRegion
## 61. PropScrRand
## 25. RPyGeo
## 47. SMARTp
## 24. SCRT
## 56. MARSS
## 85. edfReader
## 32. SPEDInstabR
## 98. SmallCountRounding
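grep can also read from a stream instead of a file. The following is a minimal sketch that pipes a few hypothetical package names (chosen for illustration, not read from package_names.txt) into grep.

```shell
# With no file argument, grep reads standard input.
# printf writes three names, one per line, into the pipe.
stream_matches=$(printf '%s\n' 'RJDBC' 'alfr' 'gLRTH' | grep R)
echo "$stream_matches"
```

Only RJDBC and gLRTH are printed, since the match is case-sensitive and alfr contains only a lowercase r.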
5.1.2 Ignore Case
In the previous case, grep returned only those packages whose names included R but not r, i.e., it did not ignore the case of the letter. Using the -i option, we will now search while ignoring case.
grep -i R package_names.txt
## 14. RJDBC
## 58. alfr
## 64. viridisLite
## 99. Survgini
## 30. logNormReg
## 27. gLRTH
## 71. kfigr
## 72. overlapping
## 90. widyr
## 33. tailr
## 40. MaxentVariableSelection
## 33. tailr
## 72. overlapping
## 16. randtests
## 12. ltxsparklines
## 91. rgw
## 35. fermicatsR
## 21. corclass
## 68. AzureStor
## 42. OptimaRegion
## 61. PropScrRand
## 74. crsra
## 80. SpatioTemporal
## 23. disparityfilter
## 49. SemiParSampleSel
## 76. errorlocate
## 88. SphericalK
## 28. splithalfr
## 89. foretell
## 25. RPyGeo
## 50. mbir
## 51. interplot
## 6. BinOrdNonNor
## 47. SMARTp
## 38. BenfordTests
## 79. mvShapiroTest
## 92. BioCircos
## 55. hindexcalculator
## 41. rstudioapi
## 57. generalhoslem
## 24. SCRT
## 95. TSeriesMMA
## 82. breakfast
## 56. MARSS
## 70. rsed
## 68. AzureStor
## 85. edfReader
## 20. rless
## 75. pmdplyr
## 32. SPEDInstabR
## 3. redcapAPI
## 70. rsed
## 98. SmallCountRounding
5.1.3 Highlight
The --color option will highlight the matched strings.
grep -i --color R package_names.txt
## 14. RJDBC
## 58. alfr
## 64. viridisLite
## 99. Survgini
## 30. logNormReg
## 27. gLRTH
## 71. kfigr
## 72. overlapping
## 90. widyr
## 33. tailr
## 40. MaxentVariableSelection
## 33. tailr
## 72. overlapping
## 16. randtests
## 12. ltxsparklines
## 91. rgw
## 35. fermicatsR
## 21. corclass
## 68. AzureStor
## 42. OptimaRegion
## 61. PropScrRand
## 74. crsra
## 80. SpatioTemporal
## 23. disparityfilter
## 49. SemiParSampleSel
## 76. errorlocate
## 88. SphericalK
## 28. splithalfr
## 89. foretell
## 25. RPyGeo
## 50. mbir
## 51. interplot
## 6. BinOrdNonNor
## 47. SMARTp
## 38. BenfordTests
## 79. mvShapiroTest
## 92. BioCircos
## 55. hindexcalculator
## 41. rstudioapi
## 57. generalhoslem
## 24. SCRT
## 95. TSeriesMMA
## 82. breakfast
## 56. MARSS
## 70. rsed
## 68. AzureStor
## 85. edfReader
## 20. rless
## 75. pmdplyr
## 32. SPEDInstabR
## 3. redcapAPI
## 70. rsed
## 98. SmallCountRounding
5.1.4 Print Filename
Use the -H option to print the filename for each match. When more than one file is searched, grep prints the filename by default; -H forces it even when only a single file is searched.
grep -i --color -H bio package_names.txt
## package_names.txt:84. BIOMASS
## package_names.txt:92. BioCircos
## package_names.txt:7. bayesbio
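The effect of -H is easiest to see by comparing one file against several. The sketch below uses two hypothetical files, a.txt and b.txt, created only for this illustration; -h (lowercase), which suppresses the filename, is supported by GNU and BSD grep.

```shell
# Two small, hypothetical files in a temporary directory.
tmpdir=$(mktemp -d)
printf '%s\n' 'BIOMASS' 'mlflow' > "$tmpdir/a.txt"
printf '%s\n' 'bayesbio' 'aweek' > "$tmpdir/b.txt"
cd "$tmpdir" || exit 1

# One file: no filename prefix unless -H is given.
single_default=$(grep -i bio a.txt)
single_H=$(grep -i -H bio a.txt)

# Several files: the prefix is printed by default; -h suppresses it.
multi_default=$(grep -i bio a.txt b.txt)
multi_h=$(grep -i -h bio a.txt b.txt)

printf '%s\n' "$single_default" "$single_H" "$multi_default" "$multi_h"

# Return to the previous directory and clean up.
cd - >/dev/null
rm -rf "$tmpdir"
```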
5.1.5 Print Corresponding Line Number
The -n option will print the corresponding line number of the match in the file.
grep -i --color -n bio package_names.txt
## 59:84. BIOMASS
## 71:92. BioCircos
## 88:7. bayesbio
5.1.6 Print Filename & Line Number
Let us print both the file name and the line number for each match.
grep -i --color -Hn R package_names.txt
## package_names.txt:1:14. RJDBC
## package_names.txt:3:58. alfr
## package_names.txt:8:64. viridisLite
## package_names.txt:14:99. Survgini
## package_names.txt:15:30. logNormReg
## package_names.txt:16:27. gLRTH
## package_names.txt:18:71. kfigr
## package_names.txt:20:72. overlapping
## package_names.txt:21:90. widyr
## package_names.txt:22:33. tailr
## package_names.txt:23:40. MaxentVariableSelection
## package_names.txt:26:33. tailr
## package_names.txt:27:72. overlapping
## package_names.txt:30:16. randtests
## package_names.txt:31:12. ltxsparklines
## package_names.txt:32:91. rgw
## package_names.txt:33:35. fermicatsR
## package_names.txt:37:21. corclass
## package_names.txt:38:68. AzureStor
## package_names.txt:41:42. OptimaRegion
## package_names.txt:42:61. PropScrRand
## package_names.txt:43:74. crsra
## package_names.txt:51:80. SpatioTemporal
## package_names.txt:52:23. disparityfilter
## package_names.txt:54:49. SemiParSampleSel
## package_names.txt:55:76. errorlocate
## package_names.txt:57:88. SphericalK
## package_names.txt:61:28. splithalfr
## package_names.txt:62:89. foretell
## package_names.txt:63:25. RPyGeo
## package_names.txt:64:50. mbir
## package_names.txt:65:51. interplot
## package_names.txt:66:6. BinOrdNonNor
## package_names.txt:67:47. SMARTp
## package_names.txt:68:38. BenfordTests
## package_names.txt:69:79. mvShapiroTest
## package_names.txt:71:92. BioCircos
## package_names.txt:75:55. hindexcalculator
## package_names.txt:78:41. rstudioapi
## package_names.txt:80:57. generalhoslem
## package_names.txt:84:24. SCRT
## package_names.txt:85:95. TSeriesMMA
## package_names.txt:87:82. breakfast
## package_names.txt:96:56. MARSS
## package_names.txt:97:70. rsed
## package_names.txt:98:68. AzureStor
## package_names.txt:100:85. edfReader
## package_names.txt:101:20. rless
## package_names.txt:102:75. pmdplyr
## package_names.txt:103:32. SPEDInstabR
## package_names.txt:104:3. redcapAPI
## package_names.txt:106:70. rsed
## package_names.txt:107:98. SmallCountRounding
5.1.7 Invert Match
Use the -v option to select non-matching lines. In the example below, we search for packages whose names do not include R, ignoring case.
grep -v -i R package_names.txt
## 36. mlflow
## 10. aweek
## 31. BIGDAWG
## 22. vqtl
## 29. sspline
## 39. mev
## 66. SuppDists
## 15. MIAmaxent
## 31. BIGDAWG
## 29. sspline
## 60. Eagle
## 83. WPKDE
## 11. hdnom
## 26. blink
## 18. gazepath
## 52. ClimMobTools
## 44. expstudies
## 65. mined
## 81. mgcViz
## 45. solitude
## 9. pAnalysis
## 65. mined
## 94. ICAOD
## 48. geoknife
## 45. solitude
## 67. tictactoe
## 46. cbsem
## 93. PathSelectMP
## 96. poisbinom
## 17. ASIP
## 5. pls
## 84. BIOMASS
## 59. AdMit
## 77. SetMethods
## 53. MVB
## 2. odk
## 86. mongolite
## 4. TIMP
## 97. AnalyzeTS
## 87. WGScan
## 63. dagitty
## 69. FField
## 13. MaXact
## 73. VineCopula
## 7. bayesbio
## 34. ibd
## 8. MVTests
## 19. mcmcabn
## 43. accept
## 78. sybilccFBA
## 62. lue
## 100. addhaz
## 37. CombinePValue
## 1. cyclocomp
## 54. OxyBS
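A quick way to check an inverted match is to count lines: matching and non-matching lines should add up to the total. The sketch below assumes a hypothetical five-line sample.txt and uses the standard -c option, which prints a count instead of the lines themselves.

```shell
# A small, hypothetical sample file.
tmpdir=$(mktemp -d)
printf '%s\n' 'RJDBC' 'mlflow' 'alfr' 'aweek' 'pls' > "$tmpdir/sample.txt"

# Count lines that contain r or R, and lines that do not.
matching=$(grep -c -i r "$tmpdir/sample.txt")
non_matching=$(grep -c -v -i r "$tmpdir/sample.txt")

# The lines selected by the inverted match.
excluded=$(grep -v -i r "$tmpdir/sample.txt")

echo "matching: $matching, non-matching: $non_matching"
echo "$excluded"

rm -rf "$tmpdir"
```

Here 2 lines match and 3 do not, which together account for all 5 lines in the file.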
5.1.8 Recursive Search
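Use the -r option to search recursively. In the example below, we search all files with the .txt extension for the string bio while ignoring case. Note that the shell expands *.txt to the matching files in the current directory before grep runs; -r only descends into directories that appear among the arguments.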
grep -i --color -r bio *.txt
## package_names.txt:84. BIOMASS
## package_names.txt:92. BioCircos
## package_names.txt:7. bayesbio
## pkg_names.txt:BIOMASS
## pkg_names.txt:BioCircos
## pkg_names.txt:BIOMASS
## pkg_names.txt:bayesbio
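The difference between a shell glob and a truly recursive search shows up once files sit in subdirectories. The following sketch assumes a hypothetical directory layout (docs/top.txt and docs/nested/deep.txt) created only for this illustration.

```shell
# Hypothetical layout: one file at the top level, one in a subdirectory.
tmpdir=$(mktemp -d)
mkdir -p "$tmpdir/docs/nested"
printf '%s\n' 'BIOMASS' > "$tmpdir/docs/top.txt"
printf '%s\n' 'bayesbio' > "$tmpdir/docs/nested/deep.txt"
cd "$tmpdir" || exit 1

# The glob is expanded by the shell and only sees files directly inside docs.
top_only=$(grep -i -H bio docs/*.txt)

# Passing the directory with -r makes grep descend into subdirectories.
# The output is sorted because the traversal order is not guaranteed.
all_levels=$(grep -r -i -H bio docs | LC_ALL=C sort)

echo "$top_only"
echo "$all_levels"

cd - >/dev/null
rm -rf "$tmpdir"
```

The glob version reports only docs/top.txt, while the recursive version also finds the match in docs/nested/deep.txt.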
5.2 find
The find command can be used for searching files and directories. Using additional options, we can
- search files by extension type
- ignore case while searching files/directories
find is a powerful tool for working with files. It can be used on its own to locate files, or in conjunction with other programs to perform operations on those files.
Command | Description |
---|---|
find | Find files or directories recursively under the given directory |
find -name '*.txt' | Find files by extension |
find -type d -iname | Find directories matching a given name, in case-insensitive mode |
find -type d -name | Find directories matching a given name, in case-sensitive mode |
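As one possible illustration of using find in conjunction with another program, the sketch below hands the files that find locates to grep through the standard -exec action. The directory proj and its files are hypothetical and exist only for this example.

```shell
# Hypothetical project tree with two .txt files and one .log file.
tmpdir=$(mktemp -d)
mkdir -p "$tmpdir/proj/sub"
printf '%s\n' 'BIOMASS' > "$tmpdir/proj/a.txt"
printf '%s\n' 'mlflow' > "$tmpdir/proj/sub/b.txt"
printf '%s\n' 'bayesbio' > "$tmpdir/proj/sub/c.log"
cd "$tmpdir" || exit 1

# find selects the .txt files; -exec ... {} + passes them to grep in one batch.
# grep -l prints only the names of files with a match, and -i ignores case.
hits=$(find proj -name '*.txt' -exec grep -il bio {} +)
echo "$hits"

cd - >/dev/null
rm -rf "$tmpdir"
```

Only proj/a.txt is reported: b.txt has no match, and c.log is never passed to grep because find filters it out by name.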
5.2.1 Search Recursively
Let us use find to search for the file release_names.txt recursively. The -name option is used to specify the name of the file we are searching for.
find -name release_names.txt
## ./bash-tutorial/myfiles/release_names.txt
## ./bash-tutorial/release_names.txt
## ./release_names.txt
## ./r_releases/release_names.txt
There are four files with the name release_names.txt: one in the current working directory, one in the r_releases directory, and two under bash-tutorial.
5.2.2 Search by Extension
Let us search for all files with the .txt extension in the r_releases folder.
find r_releases -name '*.txt'
## r_releases/release_names.txt
## r_releases/release_names_2.txt
## r_releases/release_names_3.txt
There are 3 files with the .txt extension in the r_releases folder.
5.2.3 Case-insensitive Mode
Search for all folders with the name R or r. Here we use the -iname option to ignore case while searching. The -type option specifies whether we are searching for files or folders; since we are searching for directories, we pass it d (directory).
find -type d -iname R
## ./bash-tutorial/mypackage/R
## ./bash-tutorial/r
## ./mypackage/R
## ./r
## ./r2/r
5.2.4 Case-sensitive Mode
Search for all folders with the name r. It should exclude any folder with the name R.
find -type d -name r
## ./bash-tutorial/r
## ./r
## ./r2/r
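The two modes can be compared side by side on a small tree. The sketch below builds a hypothetical layout (the names pkg and files are illustrative only) and passes the starting directory . explicitly, which some find implementations require.

```shell
# Hypothetical layout: directories pkg/R and r, plus a regular file files/r.
tmpdir=$(mktemp -d)
mkdir -p "$tmpdir/pkg/R" "$tmpdir/r" "$tmpdir/files"
touch "$tmpdir/files/r"
cd "$tmpdir" || exit 1

# -iname ignores case, so both R and r directories match.
# The output is sorted because the traversal order is not guaranteed.
insensitive=$(find . -type d -iname r | LC_ALL=C sort)

# -name is case-sensitive, so only the lowercase directory matches.
sensitive=$(find . -type d -name r)

echo "$insensitive"
echo "$sensitive"

cd - >/dev/null
rm -rf "$tmpdir"
```

In both cases the regular file files/r is skipped, because -type d restricts the results to directories.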