2024-02-17 14:25:53 +01:00


Grep

Table of contents

  1. TL;DR
  2. Variants
    1. Archive-related variants
    2. PDFgrep
  3. Gotchas
  4. Further readings
  5. Sources

TL;DR

# Basic search.
grep 'pattern' 'path/to/search'

# Search recursively.
grep -R 'pattern' 'path/to/search/recursively'
grep -R --exclude-dir 'dir_name' 'pattern' 'path/to/search/recursively'   # GNU grep >= 2.5.2; subdirectories are matched by base name

# Show line numbers.
grep -n 'pattern' 'path/to/search'

# Only print the part matching the pattern.
ps | grep -o '/.*/fish' | head -n '1'

# Multiple parallel searches.
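# -H keeps file names in the output, since each grep call receives a single file.
# NUL-separated names (-print0 with -0) survive spaces and newlines.
find . -type f -print0 | parallel -0 -j 100% grep -H 'pattern'
find . -type f -print0 | xargs -0 -n 1 -P "$(nproc)" grep -H 'pattern'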

# Highlight numbers in strings.
grep --color '[[:digit:]]' 'file.txt'
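Several of the flags above can be tried without touching real files. This minimal sketch assumes GNU grep and builds a scratch directory with mktemp. The directory names, the file names and the word 'needle' are hypothetical.

```shell
#!/bin/sh
# Recursive search that skips one directory; all names are made up for the demo.
search_demo() {
  tmp="$(mktemp -d)"
  mkdir -p "$tmp/src" "$tmp/node_modules/pkg"
  printf 'first line\nneedle here\n' > "$tmp/src/a.txt"
  printf 'needle there\n' > "$tmp/node_modules/pkg/b.txt"
  # -R recurses, -n prefixes line numbers, and --exclude-dir skips any
  # subdirectory whose base name matches the glob.
  (cd "$tmp" && grep -R -n --exclude-dir='node_modules' 'needle' .)
  rm -rf "$tmp"
}

# -o prints only the matched text; combined with -n, every match keeps its line number.
digits_demo() {
  printf 'alpha 42\nbeta\ngamma 7\n' | grep -n -o -E '[[:digit:]]+'
}

search_demo   # ./src/a.txt:2:needle here
digits_demo   # 1:42 and 3:7, one per line
```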

Variants

  • egrep to use extended regular expressions in search patterns, same as grep -E (GNU grep 3.8 and later warn that egrep is obsolescent)
  • fgrep to use patterns as fixed strings, same as grep -F (likewise obsolescent since GNU grep 3.8)
  • archive-related variants for searching inside compressed files
  • pdfgrep for searching inside PDF files
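
Archive-related variants

  • xzgrep (with xzegrep and xzfgrep) for xz- and lzma-compressed files
  • zstdgrep for zstd-compressed files
  • many others, e.g. zgrep for gzip and bzgrep for bzip2 files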
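The easiest way to see how the default basic regular expressions differ from -E and -F is to run the same patterns against the same input. This minimal sketch assumes GNU grep. count_matches is a hypothetical helper that counts matching lines with -c.

```shell
#!/bin/sh
# Count matching lines of a fixed three-line input; extra grep options pass through.
# '|| true' only masks grep's exit status 1 when nothing matches; -c still prints 0.
count_matches() { printf 'abc\na.c\nxyz\n' | grep -c "$@" || true; }

count_matches 'a.c'         # 2: in basic regex '.' matches any character
count_matches -F 'a.c'      # 1: fixed string, '.' is literal (like fgrep)
count_matches -E 'abc|xyz'  # 2: extended regex, '|' is alternation (like egrep)
count_matches 'abc|xyz'     # 0: in basic regex a bare '|' is a literal character
```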

PDFgrep

For simple searches, you might want to use pdfgrep.

Should you need grep capabilities that pdfgrep does not offer, you can convert the file to text and search that instead.
You can do this using pdftotext, as shown in this example (source: a Stack Overflow answer about searching the contents of multiple PDF files):

find /path -name '*.pdf' -exec sh -c 'pdftotext "$1" - | grep --with-filename --label="$1" --color "your pattern"' sh {} ';'
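Two grep features make this kind of pipeline work. With no file operand, grep reads standard input. --label then names that input in the output, and --with-filename forces the name to be printed even though there is only one input. The sketch below uses cat in place of pdftotext so it runs without pdftotext installed. The file name and its contents are made up. It passes the file name to the inner shell as "$1" rather than splicing {} into the script text, so names containing double quotes, $ or backticks cannot break the command.

```shell
#!/bin/sh
label_demo() {
  tmp="$(mktemp -d)"
  # Hypothetical file whose name contains a space and double quotes.
  printf 'intro\nthe pattern is here\n' > "$tmp/my \"notes\".txt"
  (
    cd "$tmp" &&
    # `cat "$1"` stands in for `pdftotext "$1" -`; grep reads the text from
    # stdin and prints the --label value as the file name.
    find . -name '*.txt' -exec sh -c \
      'cat "$1" | grep --with-filename --label="$1" "pattern"' sh {} ';'
  )
  rm -rf "$tmp"
}

label_demo   # ./my "notes".txt:the pattern is here
```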

Gotchas

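  • GNU grep searches files one after the other in a single thread; use another executor like parallel or xargs to grep multiple files in parallel:

    find . -type f -print0 | parallel -0 -j 100% grep -H 'pattern'
    find . -type f -print0 | xargs -0 -n 1 -P "$(nproc)" grep -H 'pattern'

    Mind file names containing spaces or newlines: the NUL-separated -print0 and -0 options keep them intact. Since every grep invocation receives a single file, -H is needed to print the file names.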
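Here is a self-contained sketch of the xargs variant, run against a scratch directory whose file names contain spaces. It assumes GNU coreutils for nproc, and the file names are hypothetical. -H is there because each grep receives a single file and would otherwise omit the name. The final sort is there because parallel jobs finish in no particular order.

```shell
#!/bin/sh
parallel_demo() {
  tmp="$(mktemp -d)"
  printf 'needle\n' > "$tmp/plain.txt"
  printf 'hay\nneedle\n' > "$tmp/with space.txt"
  printf 'hay\n' > "$tmp/no match.txt"
  (
    cd "$tmp" &&
    # -print0/-0 keep "with space.txt" as one argument; -n 1 hands each grep
    # one file; -P runs up to one grep per CPU thread at the same time.
    find . -type f -print0 |
      xargs -0 -n 1 -P "$(nproc)" grep -H 'needle' |
      sort
  )
  rm -rf "$tmp"
}

parallel_demo
# ./plain.txt:needle
# ./with space.txt:needle
```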

Further readings

Sources

All the references in the further readings section, plus the following: