mirror of
https://gitea.com/mcereda/oam.git
synced 2026-02-09 05:44:23 +00:00
2.4 KiB
Grep
TL;DR
# base search
grep 'pattern' path/to/search
# recursive search
grep -R 'pattern' path/to/search/recursively
grep -R --exclude-dir excluded/dir 'pattern' path/to/search/recursively # gnu grep >= 2.5.2
# show line numbers
grep -n 'pattern' path/to/search
# parallel execution
# mind the files with spaces in their name
find . -type f | parallel -j 100% grep 'pattern'
find . -type f -print0 | xargs -0 -n 1 -P $(nproc) grep 'pattern'
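As an illustration of how the recursive, exclusion and line-number options combine, here is a minimal sketch. The directory layout, file names and contents are hypothetical and only exist to make the example self-contained; `--exclude-dir` is assumed to be available, as it is in GNU grep.

```shell
# Hypothetical sample tree with one directory that should be skipped.
search_dir="$(mktemp -d)"
mkdir -p "$search_dir/src" "$search_dir/vendor"
printf 'first line\nTODO: fix me\n' > "$search_dir/src/main.txt"
printf 'TODO: third-party note\n' > "$search_dir/vendor/lib.txt"

# -R recurses into directories, -n prefixes each match with its line number,
# and --exclude-dir skips directories whose base name matches the given glob.
results="$(cd "$search_dir" && grep -R -n --exclude-dir=vendor 'TODO' .)"

printf '%s\n' "$results"

# Clean up the sample tree.
rm -rf "$search_dir"
```

Only the match in `src/main.txt` should be reported, prefixed with the file path and the line number.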
Grep variants
- `egrep` to use extended regular expressions in search patterns, same as `grep -E`
- `fgrep` to use patterns as fixed strings, same as `grep -F`
- archive-related variants for searching into compressed files
- `pdfgrep` for searching into PDF files
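The difference between fixed-string and extended-regex matching can be checked with a small sketch like the following. The sample file and its contents are made up for illustration. The sketch uses the `-E` and `-F` flags rather than the `egrep` and `fgrep` commands, since recent GNU grep releases warn that those two are obsolescent.

```shell
# Hypothetical sample data, created in a temporary directory.
tmp_dir="$(mktemp -d)"
printf '%s\n' 'a.c' 'abc' 'a+c' > "$tmp_dir/sample.txt"

# -F treats the pattern as a fixed string: only the literal 'a.c' matches.
fixed_matches="$(grep -F 'a.c' "$tmp_dir/sample.txt")"

# -E treats the pattern as an extended regular expression: '.' matches any
# single character, so every line of the sample matches.
extended_count="$(grep -c -E 'a.c' "$tmp_dir/sample.txt")"

# In extended syntax '+' is a quantifier: 'ab+c' means 'a', one or more 'b',
# then 'c', so the line 'a+c' does not match.
quantifier_matches="$(grep -E 'ab+c' "$tmp_dir/sample.txt")"

printf 'fixed string matches:  %s\n' "$fixed_matches"
printf 'extended regex count:  %s\n' "$extended_count"
printf 'quantifier matches:    %s\n' "$quantifier_matches"

# Clean up the sample data.
rm -rf "$tmp_dir"
```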
Archive-related variants
PDFgrep
For simple searches, you might want to use [pdfgrep].
Should you need more advanced grep capabilities not incorporated by pdfgrep, you might want to convert the file to text and search there.
You can do this using pdftotext as shown in this example ([source][stackoverflow answer about how to search contents of multiple pdf files]):
find /path -name '*.pdf' -exec sh -c 'pdftotext "$1" - | grep --with-filename --label="$1" --color "your pattern"' sh {} ';'
Gotchas
- Standard editions of `grep` run in a single thread; use another executor like `parallel` or `xargs` to parallelize grepping multiple files:

      find . -type f | parallel -j 100% grep 'pattern'
      find . -type f -print0 | xargs -0 -n 1 -P $(nproc) grep 'pattern'

  Mind files with spaces in their name.
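The `xargs` approach can be sketched end to end as follows. The sample files are hypothetical, and the sketch assumes `find`, `xargs` and `grep` implementations that support `-print0`, `-0`, `-P` and `-H` (the GNU and BSD ones do). Because `-n 1` hands each `grep` process a single file, `grep` would omit the file name from its output; `-H` is added here to keep matches attributable to their file.

```shell
# Hypothetical sample tree, including a file with spaces in its name.
work_dir="$(mktemp -d)"
printf 'needle here\n' > "$work_dir/plain.txt"
printf 'another needle\n' > "$work_dir/name with spaces.txt"

# nproc ships with GNU coreutils; fall back to 2 jobs where it is missing.
job_count="$(nproc 2>/dev/null || echo 2)"

# -print0 and -0 delimit file names with NUL bytes, so spaces are preserved.
# -H forces the file name prefix that grep drops when given a single file.
# Parallel jobs finish in no particular order, hence the final sort.
matches="$(find "$work_dir" -type f -print0 \
  | xargs -0 -n 1 -P "$job_count" grep -H 'needle' \
  | sort)"

printf '%s\n' "$matches"

# Clean up the sample tree.
rm -rf "$work_dir"
```

Both files should show up in the output, including the one whose name contains spaces.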
Further readings
- [Grep the standard error stream]
- Knowledge base on [pdfgrep]
[grep the standard error stream]: grep\ the\ standard\ error\ stream.md
[pdfgrep]: pdfgrep.md