oam/knowledge base/grep.md
2022-05-01 11:11:01 +02:00

Grep

TL;DR

# base search
grep 'pattern' path/to/search

# recursive search
grep -R 'pattern' path/to/search/recursively
grep -R --exclude-dir 'dir_name' 'pattern' path/to/search/recursively   # GNU grep >= 2.5.2; matches on directory base names

# show line numbers
grep -n 'pattern' path/to/search

# parallel execution
# -H prints the file name, which grep omits when given a single file
# names containing newlines need 'find -print0 | parallel -0'
find . -type f | parallel -j 100% grep -H 'pattern'
find . -type f -print0 | xargs -0 -n 1 -P "$(nproc)" grep -H 'pattern'

Grep variants

  • egrep to use extended regular expressions in search patterns, same as grep -E
  • fgrep to treat patterns as fixed strings, same as grep -F
  • archive-related variants for searching inside compressed files:
      • xzgrep (with xzegrep and xzfgrep) for xz archives
      • zstdgrep for zstd archives
  • pdfgrep for searching inside PDF files
  • many others
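
The difference between the default, -E, and -F pattern modes is easiest to see on a small sample. The sketch below is only an illustration: the temporary file and its contents are made up for the demonstration. It uses grep -E and grep -F rather than egrep and fgrep, since GNU grep treats the latter two as obsolescent.

```shell
#!/bin/sh
# Hypothetical sample data, created only for this demonstration.
sample="$(mktemp)"
printf '%s\n' 'a.c' 'abc' 'a+c' 'aac' 'ac' > "$sample"

# Default (basic regular expression): '.' matches any single character.
bre_matches="$(grep -c 'a.c' "$sample")"

# -F (what fgrep does): the pattern is a literal string, so '.' is just a dot.
fixed_matches="$(grep -c -F 'a.c' "$sample")"

# -E (what egrep does): '+' means "one or more of the preceding item".
ere_matches="$(grep -c -E 'a+c' "$sample")"

echo "BRE: $bre_matches, fixed: $fixed_matches, ERE: $ere_matches"
rm -f "$sample"
```

With this sample the script should print `BRE: 4, fixed: 1, ERE: 2`: the basic pattern matches every three-character line, the fixed string matches only the literal `a.c`, and the extended pattern matches `aac` and `ac`.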

PDFgrep

For simple searches, you might want to use [pdfgrep].

Should you need grep capabilities that pdfgrep lacks, convert the file to text and search that instead.
You can do this with pdftotext, as shown in this example ([source][stackoverflow answer about how to search contents of multiple pdf files]):

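# the file name is passed as "$1" instead of being embedded in the script, so quotes in names cannot break it
find /path -name '*.pdf' -exec sh -c 'pdftotext "$1" - | grep --with-filename --label="$1" --color "your pattern"' _ {} ';'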

Gotchas

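  • Standard builds of grep run in a single thread; use an executor such as GNU parallel or xargs to grep multiple files in parallel:

    find . -type f | parallel -j 100% grep -H 'pattern'
    find . -type f -print0 | xargs -0 -n 1 -P "$(nproc)" grep -H 'pattern'

    Each grep process receives a single file here, so -H is needed to print the file name next to each match.
    File names with spaces are safe in both forms; names containing newlines break the first one unless you switch to find -print0 with parallel -0.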
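
To check how the NUL-delimited xargs form copes with awkward file names, a throwaway test like the following can help. It is a minimal sketch: the scratch directory, the file names, and the worker count of 2 are assumptions made for the demonstration, and -P is a GNU and BSD xargs extension rather than a POSIX feature.

```shell
#!/bin/sh
# Hypothetical scratch directory and file names, created only for this demonstration.
workdir="$(mktemp -d)"
printf 'needle\n' > "$workdir/plain.txt"
printf 'needle\n' > "$workdir/name with spaces.txt"
printf 'hay\n'    > "$workdir/other.txt"

# -print0 and -0 pass NUL-terminated names, so whitespace in names is preserved.
# -H forces the file name prefix, which grep omits when given a single file.
# -P 2 is an arbitrary worker count; "$(nproc)" is the usual choice on GNU systems.
# sort makes the output order stable, since parallel workers finish in any order.
matches="$(find "$workdir" -type f -print0 \
  | xargs -0 -n 1 -P 2 grep -H 'needle' \
  | sort)"

printf '%s\n' "$matches"
rm -rf "$workdir"
```

Both matching files should be listed, including the one with spaces in its name, each prefixed with its path.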

Further readings

  • [Grep the standard error stream]
  • Knowledge base on [pdfgrep]

[grep the standard error stream]: grep\ the\ standard\ error\ stream.md
[pdfgrep]: pdfgrep.md

Sources