Chapter 5 Search & Regular Expression

In this chapter, we will explore commands that will

  • search for a given string in a file
  • find files using names
  • and search for binary executable files

Command   Description
grep      Search for a given string in a file
find      Find files using filenames
which     Search for binary executable files

5.1 grep

The grep command is used for pattern matching. Along with additional options, it can be used to

  • match pattern in input text
  • ignore case
  • search recursively for an exact string
  • print filename and line number for each match
  • invert match for excluding specific strings

grep processes text line by line and prints every line that matches a specified pattern. grep, which stands for global regular expression print, is a powerful tool for matching a regular expression against text in a file, multiple files, or a stream of input.

Command    Description
grep       Match a pattern in input text
grep -i    Ignore case
grep -RI   Search recursively for an exact string, ignoring binary files
grep -E    Use extended regular expressions
grep -Hn   Print the file name and corresponding line number for each match
grep -v    Invert the match to exclude specific strings
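
The -Hn and -E options listed above can be tried on a small throwaway file. The following is a minimal sketch: the file sample.txt and its contents are hypothetical stand-ins created in a temporary directory, not the package_names.txt file used in the rest of this chapter.

```shell
# Build a small sample file (hypothetical data, not package_names.txt).
tmpdir=$(mktemp -d)
printf '1. RJDBC\n2. alfr\n3. mlflow\n4. SCRT\n' > "$tmpdir/sample.txt"

# -H prints the file name and -n the line number in front of each match.
with_location=$(cd "$tmpdir" && grep -Hn R sample.txt)
echo "$with_location"

# -E enables extended regular expressions; here | means "either pattern".
either=$(grep -E 'alfr|mlflow' "$tmpdir/sample.txt")
echo "$either"

# Clean up the temporary directory.
rm -r "$tmpdir"
```

Each line of the first result has the form file:line:text, which makes it easy to jump to the match in an editor.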

5.1.1 Match Pattern in Input Text

Using grep, let us search for packages that include the letter R in their names.

grep R package_names.txt
## 14. RJDBC
## 30. logNormReg
## 27. gLRTH
## 35. fermicatsR
## 42. OptimaRegion
## 61. PropScrRand
## 25. RPyGeo
## 47. SMARTp
## 24. SCRT
## 56. MARSS
## 85. edfReader
## 32. SPEDInstabR
## 98. SmallCountRounding

5.1.2 Ignore Case

In the previous case, grep returned only those packages whose names included R but not r, i.e. it did not ignore the case of the letter. Using the -i option, we will now search while ignoring case.

grep -i R package_names.txt
## 14. RJDBC
## 58. alfr
## 64. viridisLite
## 99. Survgini
## 30. logNormReg
## 27. gLRTH
## 71. kfigr
## 72. overlapping
## 90. widyr
## 33. tailr
## 40. MaxentVariableSelection
## 33. tailr
## 72. overlapping
## 16. randtests
## 12. ltxsparklines
## 91. rgw
## 35. fermicatsR
## 21. corclass
## 68. AzureStor
## 42. OptimaRegion
## 61. PropScrRand
## 74. crsra
## 80. SpatioTemporal
## 23. disparityfilter
## 49. SemiParSampleSel
## 76. errorlocate
## 88. SphericalK
## 28. splithalfr
## 89. foretell
## 25. RPyGeo
## 50. mbir
## 51. interplot
## 6. BinOrdNonNor
## 47. SMARTp
## 38. BenfordTests
## 79. mvShapiroTest
## 92. BioCircos
## 55. hindexcalculator
## 41. rstudioapi
## 57. generalhoslem
## 24. SCRT
## 95. TSeriesMMA
## 82. breakfast
## 56. MARSS
## 70. rsed
## 68. AzureStor
## 85. edfReader
## 20. rless
## 75. pmdplyr
## 32. SPEDInstabR
## 3. redcapAPI
## 70. rsed
## 98. SmallCountRounding
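
The difference between the two searches is easier to see side by side on a short list. This is a minimal sketch; sample.txt and its three entries are hypothetical and exist only for the comparison.

```shell
# Three-line sample file (hypothetical data).
tmpdir=$(mktemp -d)
printf 'RJDBC\nalfr\nmlflow\n' > "$tmpdir/sample.txt"

# Case-sensitive: only names containing an uppercase R match.
sensitive=$(grep R "$tmpdir/sample.txt")
echo "case-sensitive:"
echo "$sensitive"

# Case-insensitive: names containing R or r match.
insensitive=$(grep -i R "$tmpdir/sample.txt")
echo "case-insensitive:"
echo "$insensitive"

# Clean up the temporary directory.
rm -r "$tmpdir"
```

The name alfr appears only in the second result, while mlflow appears in neither because it contains no r at all.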

5.1.3 Highlight

The --color option will highlight the matched strings.

grep -i --color R package_names.txt
## 14. RJDBC
## 58. alfr
## 64. viridisLite
## 99. Survgini
## 30. logNormReg
## 27. gLRTH
## 71. kfigr
## 72. overlapping
## 90. widyr
## 33. tailr
## 40. MaxentVariableSelection
## 33. tailr
## 72. overlapping
## 16. randtests
## 12. ltxsparklines
## 91. rgw
## 35. fermicatsR
## 21. corclass
## 68. AzureStor
## 42. OptimaRegion
## 61. PropScrRand
## 74. crsra
## 80. SpatioTemporal
## 23. disparityfilter
## 49. SemiParSampleSel
## 76. errorlocate
## 88. SphericalK
## 28. splithalfr
## 89. foretell
## 25. RPyGeo
## 50. mbir
## 51. interplot
## 6. BinOrdNonNor
## 47. SMARTp
## 38. BenfordTests
## 79. mvShapiroTest
## 92. BioCircos
## 55. hindexcalculator
## 41. rstudioapi
## 57. generalhoslem
## 24. SCRT
## 95. TSeriesMMA
## 82. breakfast
## 56. MARSS
## 70. rsed
## 68. AzureStor
## 85. edfReader
## 20. rless
## 75. pmdplyr
## 32. SPEDInstabR
## 3. redcapAPI
## 70. rsed
## 98. SmallCountRounding

5.1.7 Invert Match

Use the -v option to select non-matching lines. In the example below, we search for packages whose names do not include R, ignoring case.

grep -v -i R package_names.txt
## 36. mlflow
## 10. aweek
## 31. BIGDAWG
## 22. vqtl
## 29. sspline
## 39. mev
## 66. SuppDists
## 15. MIAmaxent
## 31. BIGDAWG
## 29. sspline
## 60. Eagle
## 83. WPKDE
## 11. hdnom
## 26. blink
## 18. gazepath
## 52. ClimMobTools
## 44. expstudies
## 65. mined
## 81. mgcViz
## 45. solitude
## 9. pAnalysis
## 65. mined
## 94. ICAOD
## 48. geoknife
## 45. solitude
## 67. tictactoe
## 46. cbsem
## 93. PathSelectMP
## 96. poisbinom
## 17. ASIP
## 5. pls
## 84. BIOMASS
## 59. AdMit
## 77. SetMethods
## 53. MVB
## 2. odk
## 86. mongolite
## 4. TIMP
## 97. AnalyzeTS
## 87. WGScan
## 63. dagitty
## 69. FField
## 13. MaXact
## 73. VineCopula
## 7. bayesbio
## 34. ibd
## 8. MVTests
## 19. mcmcabn
## 43. accept
## 78. sybilccFBA
## 62. lue
## 100. addhaz
## 37. CombinePValue
## 1. cyclocomp
## 54. OxyBS
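
An inverted match is the complement of the ordinary match: every line of the file lands in exactly one of the two results. The sketch below illustrates this on a hypothetical four-line sample file. It also uses -c, an option not covered in this chapter, which prints the number of selected lines instead of the lines themselves.

```shell
# Four-line sample file (hypothetical data).
tmpdir=$(mktemp -d)
printf 'RJDBC\nalfr\nmlflow\naweek\n' > "$tmpdir/sample.txt"

# Lines that do not contain R or r.
without_r=$(grep -v -i R "$tmpdir/sample.txt")
echo "$without_r"

# Count the matching and the non-matching lines with -c.
matched=$(grep -c -i R "$tmpdir/sample.txt")
unmatched=$(grep -c -v -i R "$tmpdir/sample.txt")
echo "matched: $matched, not matched: $unmatched"

# Clean up the temporary directory.
rm -r "$tmpdir"
```

The two counts add up to the number of lines in the file.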

5.2 find

The find command can be used for searching files and directories. Using additional options, we can

  • search files by extension type
  • ignore case while searching files/directories

find is a powerful tool for working with files. It can be used on its own to locate files, or in conjunction with other programs to perform operations on those files.
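
As an illustration of combining find with another program, the sketch below runs grep on every .txt file that find locates. The directory layout, file contents, and the search word beta are hypothetical. The -exec action is not otherwise covered in this chapter: it runs the given command once per file found, replacing {} with the file's path.

```shell
# Temporary tree with one .txt file and one non-.txt file (hypothetical data).
tmpdir=$(mktemp -d)
mkdir -p "$tmpdir/r_releases"
printf 'alpha\nbeta\n' > "$tmpdir/r_releases/release_names.txt"
printf 'beta\n' > "$tmpdir/notes.md"

# For each .txt file found, run grep -H to print matching lines with the file name.
found=$(cd "$tmpdir" && find . -name '*.txt' -exec grep -H beta {} \;)
echo "$found"

# Clean up the temporary directory.
rm -r "$tmpdir"
```

The file notes.md also contains beta, but it is never searched because find only hands .txt files to grep.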

Command               Description
find                  Find files or directories recursively under the given directory
find -name '*.txt'    Find files by extension
find -type d -iname   Find directories matching a given name, in case-insensitive mode
find -type d -name    Find directories matching a given name, in case-sensitive mode

5.2.1 Search Recursively

Let us use find to search for the file release_names.txt recursively. The -name option specifies the name of the file we are searching for.

find -name release_names.txt
## ./bash-tutorial/myfiles/release_names.txt
## ./bash-tutorial/release_names.txt
## ./release_names.txt
## ./r_releases/release_names.txt

There are four files with the name release_names.txt: one in the current working directory, one in the r_releases directory, and two under the bash-tutorial directory.
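
The recursive search can be reproduced in a scratch directory. This is a minimal sketch with a hypothetical directory tree. Two details are assumptions about portability rather than part of the example above: the starting directory . is written explicitly, because omitting it is accepted by GNU find but not by every implementation, and the output is piped through sort, because find does not guarantee any particular order.

```shell
# Temporary tree with the same file name at two levels (hypothetical layout).
tmpdir=$(mktemp -d)
mkdir -p "$tmpdir/r_releases"
touch "$tmpdir/release_names.txt" \
      "$tmpdir/r_releases/release_names.txt" \
      "$tmpdir/r_releases/other.txt"

# Search from the current directory downwards for an exact file name.
by_name=$(cd "$tmpdir" && find . -name release_names.txt | LC_ALL=C sort)
echo "$by_name"

# Clean up the temporary directory.
rm -r "$tmpdir"
```

Both copies of release_names.txt are reported, while other.txt is skipped because its name does not match.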

5.2.2 Search by Extension

Let us search for all files with the .txt extension in the r_releases folder.

find r_releases -name '*.txt'
## r_releases/release_names.txt
## r_releases/release_names_2.txt
## r_releases/release_names_3.txt

There are 3 files with the .txt extension in the r_releases folder.

5.2.3 Case-insensitive Mode

Search for all folders with the name R or r. Here we use the -iname option to ignore case while searching. The -type option specifies whether we are searching for files or folders; since we are searching for directories, we pass it d (for directory).

find -type d -iname R
## ./bash-tutorial/mypackage/R
## ./bash-tutorial/r
## ./mypackage/R
## ./r
## ./r2/r

5.2.4 Case-sensitive Mode

Search for all folders with the name r. Because -name is case-sensitive, this excludes any folder with the name R.

find -type d -name r
## ./bash-tutorial/r
## ./r
## ./r2/r
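
The two modes can be compared directly in a scratch directory. The sketch below uses a hypothetical tree containing a directory named R, a directory named r, and a regular file named R, so it also shows -type d filtering out non-directories. The starting directory . is given explicitly and the output is sorted, since find does not guarantee any particular order.

```shell
# Temporary tree (hypothetical layout): two directories and one regular file.
tmpdir=$(mktemp -d)
mkdir -p "$tmpdir/mypackage/R" "$tmpdir/r2/r" "$tmpdir/docs"
touch "$tmpdir/docs/R"   # a regular file, which -type d should skip

# Case-insensitive: directories named R or r.
any_case=$(cd "$tmpdir" && find . -type d -iname r | LC_ALL=C sort)
echo "$any_case"

# Case-sensitive: only directories named exactly r.
exact_case=$(cd "$tmpdir" && find . -type d -name r | LC_ALL=C sort)
echo "$exact_case"

# Clean up the temporary directory.
rm -r "$tmpdir"
```

The file docs/R appears in neither result, and the directory mypackage/R appears only in the case-insensitive one.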