Using Grep

By | July 20, 2009

GREP

grep essentially prints lines matching a pattern.
grep [options] pattern [file...]
The grep command has a lot of useful options, including:

  • -c: This option shows only a count of the matching lines instead of the lines themselves (prefixed with the filename when several files are searched).
  • -C #: This option prints # lines of context before and after each matching line.
  • -H: This option prints the filename for each match; it's useful when you then want to edit that file.
  • -h: This option suppresses the filename display for each match.
  • -i: This option ignores case when matching, so unix, Unix, and UNIX are all found.
  • -l: This option shows only the name of each matching file; no lines of matching output are shown.
  • -L: This option displays the names of files that don't contain a match for the pattern.
  • -w: This option selects only lines that contain the string as a whole word, not as part of another word.
  • -r: This option reads all the files under each directory specified, recursively.
  • -x: This option returns only exact line matches; the whole line must match the pattern.
  • -v: This option shows all the lines in a file that don't match the pattern; this is the exact opposite of the default behaviour.
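To see several of these options side by side, here is a minimal sketch that runs them against a small throwaway file. It assumes GNU grep on Linux; the file name sample.txt and its four lines are made up for the demo.

```shell
#!/bin/sh
# Demo of several grep options on a small hypothetical file (sample.txt).
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
cd "$dir" || exit 1

cat > sample.txt <<'EOF'
Unix is an operating system
unix tools are small
Linux is Unix-like
Windows is not
EOF

grep -c unix sample.txt               # count of matching lines, case-sensitive: 1
grep -ci unix sample.txt              # the same count, ignoring case: 3
grep -vi unix sample.txt              # lines that do not match: Windows is not
grep -w Unix sample.txt               # whole word; "Unix-like" counts, since - is not a word character
grep -x 'Windows is not' sample.txt   # the whole line must match the pattern
grep -l Unix sample.txt               # only the filename: sample.txt
grep -H Windows sample.txt            # filename prefix: sample.txt:Windows is not
grep -C 1 Linux sample.txt            # the match plus one line of context on each side
```

Note that -c counts matching lines, not individual matches: a line containing unix twice still counts once.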

Examples:

ls /etc | grep rc

Lists all files in /etc whose names contain rc.

grep “unix” *.htm

Searches all .htm files in the current directory for the string unix and gives results similar to the following example text.

asoftwar.htm:hre="win95.htm">Windows 95</a>,<a href="unix.htm">Unix</a>,<a href="msdox.htm">MS-DOS</a>,
asoftwar.htm:<td><font face="Times New Roman"><a name="U"></a><a href="unix.htm"><strong>Unix</strong></a></font></td>
learnhtm.htm:<a href="unix.htm">Unix help</a><br>
os.htm:<a href="unix.htm">Unix</a><br>

As seen above, grep has found references to unix in some of the HTML files in the current directory. Each line of output begins with the name of the file that contains the match, followed by a colon and the text of the matching line.
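The following sketch rebuilds a cut-down version of this example with hypothetical files (learnhtm.htm, os.htm and win.htm), assuming GNU grep. It also shows that the default search is case-sensitive: each match comes from the lowercase unix inside an href, not from the visible word Unix.

```shell
#!/bin/sh
# Recreate a few hypothetical .htm files and search them like the example above.
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
cd "$dir" || exit 1

printf '%s\n' '<a href="unix.htm">Unix help</a><br>' > learnhtm.htm
printf '%s\n' '<a href="unix.htm">Unix</a><br>'      > os.htm
printf '%s\n' '<a href="win95.htm">Windows 95</a>'   > win.htm

# With more than one file, grep prefixes each matching line with "filename:".
grep "unix" *.htm

# -i also catches Unix and UNIX; -l lists only the names of matching files.
grep -il "unix" *.htm
```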

Argument list too long

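When using grep or other commands that require a listing of, or a search through, several thousand files, you may get an "Argument list too long" or "/bin/grep: Argument list too long." error. This happens because the shell expands a wildcard such as * into one argument per file, and the combined list exceeds the system's limit on the length of a command line. When this occurs, you may want to use a command similar to the one below, using the find command and the xargs command in conjunction with grep.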

find Members/ -type f -print0 | xargs -0 grep "examplestring"

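In the above example, find locates every file in the Members directory, and each file that is found is then searched with grep for the text "examplestring". The -print0 option ends each file name with a null character and xargs -0 splits on that character, so file names containing spaces are handled safely; xargs also runs grep as many times as needed to stay under the length limit. This example had no problems searching over 100,000 files.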
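Here is a small sketch of the same find and xargs pattern on a throwaway directory, assuming GNU find, xargs and grep. The Members directory comes from the example above, but its 501 files are hypothetical test data; 500 files will not actually trigger the error, the point is only to show the mechanics, including a path with a space in it.

```shell
#!/bin/sh
# Search a directory of many files with find + xargs instead of a shell wildcard.
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
mkdir -p "$dir/Members/sub dir"

# The limit that "Argument list too long" refers to (in bytes).
echo "ARG_MAX on this system: $(getconf ARG_MAX)"

# 500 filler files, plus one matching file whose path contains a space.
i=1
while [ "$i" -le 500 ]; do
    echo "filler line $i" > "$dir/Members/file$i.txt"
    i=$((i + 1))
done
echo "this line has examplestring in it" > "$dir/Members/sub dir/match.txt"

# -print0 ends each path with a NUL byte and xargs -0 splits on NUL, so the
# space in "sub dir" is safe. xargs starts grep as many times as the limit
# requires; -H keeps the filename prefix even if a batch holds a single file.
find "$dir/Members" -type f -print0 | xargs -0 grep -H "examplestring"
```

The -H matters because grep only adds the filename prefix on its own when it is given more than one file, and xargs decides how many files each grep call receives.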

A more complex use of grep is to combine it with another command, such as find:

find / -name readme -exec grep -iw kernel {} \;

The previous command finds all the files on the system named readme and then executes the grep command on each file, searching for any instance of the whole word kernel regardless of case.  A whole word search finds the string kernel but not kernels.
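A small sketch of the same find and grep combination on a throwaway directory tree (the directories a and b and their files are hypothetical), assuming GNU grep and a POSIX find. It also shows the {} + form of -exec, which hands many file names to a single grep call instead of starting one grep per file.

```shell
#!/bin/sh
# Run grep on every file named readme under a hypothetical directory tree.
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
mkdir -p "$dir/a" "$dir/b"
echo "Build the Kernel first" > "$dir/a/readme"
echo "Notes about kernels"    > "$dir/b/readme"
echo "kernel mentioned here"  > "$dir/b/other.txt"   # not named readme, so never searched

# {} is replaced by each path and \; ends the -exec command (one grep per file).
# -i ignores case and -w requires kernel as a whole word, so "kernels" is skipped.
find "$dir" -name readme -exec grep -iw kernel {} \;

# {} + passes many paths to one grep call; -H forces the filename prefix.
find "$dir" -name readme -exec grep -iwH kernel {} +
```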

Using Regular Expressions and grep

Using grep to find particular words and phrases can be difficult unless you use regular expressions.  A regular expression has the ability to search for something that you don’t know exactly, either through partial strings or using the following special characters:

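. Matches any single character.
* Matches zero or more occurrences of the preceding character; for example, C* matches an empty string, C, CC, CCC, and so on. To match any run of characters, combine it with a dot: C.* would pull up CC or CAT.
[] Matches any one of the characters contained within the brackets; for example, [abc] matches a, b, or c.
^ Represents the beginning of the line, so ^T matches any line starting with a T.
$ Represents the end of the line, so \.$ matches any line that ends with a period.
\ Escapes the next character so it is taken literally rather than as a special character; for example, \. matches an actual period and \* an actual asterisk.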

Note: Be careful using the characters $, *, [, ^, |, (, ), and \ in a pattern, because they are also meaningful to the shell. It is safest to enclose the entire pattern in single quotes '...'.
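The sketch below tries each special character from the list above on a few hypothetical lines, with every pattern in single quotes as the note recommends. It assumes GNU grep's basic regular expression syntax.

```shell
#!/bin/sh
# Try each regular expression special character on a small hypothetical file.
f=$(mktemp)
trap 'rm -f "$f"' EXIT
printf '%s\n' 'cat' 'cot' 'ct' 'CCC' 'Total: 3.5' 'The end.' 'no period' > "$f"

grep 'c.t' "$f"      # . = any single character: cat, cot (not ct)
grep 'c[ao]t' "$f"   # [] = one character from the set: cat, cot
grep '^T' "$f"       # ^ = start of line: Total: 3.5, The end.
grep '\.$' "$f"      # \. = a literal period, $ = end of line: The end.
grep '^C*$' "$f"     # * = zero or more C's, anchored at both ends: CCC
grep '3\.5' "$f"     # the escaped dot only matches a real period: Total: 3.5
```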

Examples:

To count the files in the kernel source tree that contain the string Kernel, use the following command:

grep -rl Kernel /usr/src/linux-2.4 | wc -l

The command finds 331 files that contain at least one match for Kernel. Now try finding Kernel only as a whole word with this command:

grep -rlw Kernel /usr/src/linux-2.4 | wc -l

The count drops to 322.

Now try the same command again, but modify it so that Kernel is matched only when it is followed by a period:

grep -rwl 'Kernel\.' /usr/src/linux-2.4 | wc -l

Or match Kernel only at the beginning of a line:

grep -rwl '^Kernel' /usr/src/linux-2.4 | wc -l
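Since the /usr/src/linux-2.4 counts depend on that particular source tree, here is the same series of commands run against a tiny hypothetical tree, where the expected counts can be checked by hand. It assumes GNU grep; the file names and contents are made up.

```shell
#!/bin/sh
# The same counting pipeline on a tiny hypothetical tree instead of /usr/src/linux-2.4.
src=$(mktemp -d)
trap 'rm -rf "$src"' EXIT
echo "Kernel version 2.4" > "$src/a.c"
echo "see KernelConfig"   > "$src/b.c"
echo "Load the Kernel."   > "$src/c.c"
echo "no match here"      > "$src/d.c"

grep -rl Kernel "$src" | wc -l        # substring match: a.c, b.c, c.c -> 3
grep -rlw Kernel "$src" | wc -l       # whole word only (KernelConfig drops out): a.c, c.c -> 2
grep -rlw 'Kernel\.' "$src" | wc -l   # Kernel followed by a period: c.c -> 1
grep -rlw '^Kernel' "$src" | wc -l    # Kernel at the start of a line: a.c -> 1
```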

Regular expressions also let you search for a particular word or phrase while excluding another one that is very similar.

For example, the file watch.txt contains the following lines:

01 The first sentence contains broad
02 The second contains bring
03 The third contains brush
04 The fourth has BRIDGE as the last word: bridge
broad 05 The fifth begins with BROAD
06 The sixth contains none of the four
07 This contains bringing, broadened, brushed

To find all the words that begin with br but exclude any whose third letter is i, you'd use the following command:

grep '\<br[^i]' watch.txt

returns:

01 The first sentence contains broad
03 The third contains brush
broad 05 The fifth begins with BROAD
07 This contains bringing, broadened, brushed

The \< string means that the word begins with those letters. The [^i] characters match any character except the letter i in that position. If you used a ^ in front of a search term inside a program such as vi, it would search at the front of a line, but a ^ at the start of a set of square brackets excludes the characters that follow it from being matched.
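A sketch that recreates watch.txt from the listing above and runs the same search, assuming GNU grep (which supports \<). When run, line 07 is returned as well, since broadened and brushed both begin with br followed by a letter other than i.

```shell
#!/bin/sh
# Recreate watch.txt and search for words starting with br whose third letter is not i.
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
cd "$dir" || exit 1

cat > watch.txt <<'EOF'
01 The first sentence contains broad
02 The second contains bring
03 The third contains brush
04 The fourth has BRIDGE as the last word: bridge
broad 05 The fifth begins with BROAD
06 The sixth contains none of the four
07 This contains bringing, broadened, brushed
EOF

# \< anchors at the start of a word; [^i] is any character except i.
# The match is case-sensitive, so BRIDGE and BROAD play no part.
grep '\<br[^i]' watch.txt
```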

To find words that end with a certain set of letters, use this command:

grep 'ad\>' watch.txt

returns:

01 The first sentence contains broad
broad 05 The fifth begins with BROAD

As with the previous example, using the \> characters on the end of a search looks for words that end in that string.
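To compare \<, \> and the -w option directly, this sketch feeds a short, made-up word list to grep on standard input; it assumes GNU grep.

```shell
#!/bin/sh
# Compare word-start (\<), word-end (\>) and -w on a hypothetical word list.
words='broad
abroad
broadway
road'

printf '%s\n' "$words" | grep 'ad\>'      # ends in ad: broad, abroad, road
printf '%s\n' "$words" | grep '\<broad'   # starts with broad: broad, broadway
printf '%s\n' "$words" | grep -w 'broad'  # broad as a whole word on both sides: broad
```

In effect, -w applies both anchors at once, while \< and \> let you pin down just one side of the word.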

Search patterns that grep accepts include:

  • broad: Matches broad anywhere, including as part of other words (such as broadway or broadening), unless you use -w to match broad only as a standalone word.
  • ^broad: Matches broad at the beginning of a line.
  • broad$: Matches broad at the end of a line.
  • [bB]road: Matches both broad and Broad.
  • br[iou]ng: Matches bring, brong, and brung.
  • br[^i]ng: Matches br, then any character except i, then ng (such as brang, brong, or brung), so bring is not returned.
  • ^......$: Matches any line that contains exactly six characters.
  • [bB][rR]in[gG]: Matches bring, Bring, BRinG, and any other mix of case in the b, r, and g; the i and n must be lowercase.
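The following sketch runs every pattern from the list above against one hypothetical word list, printing each pattern followed by the words it matches. It assumes GNU grep; the try function is only a helper for this demo.

```shell
#!/bin/sh
# Run each pattern from the list against a hypothetical word list.
words='broad
Broad
broadway
bring
brang
brung
BRinG
BRING
abcdef'

try() {
    # Print the pattern, then every word it matches, on one line.
    printf '%-16s ' "$1"
    printf '%s\n' "$words" | grep -- "$1" | tr '\n' ' '
    echo
}

try 'broad'
try '^broad'
try 'broad$'
try '[bB]road'
try 'br[iou]ng'
try 'br[^i]ng'
try '^......$'
try '[bB][rR]in[gG]'
```

Running it shows, for example, that BRING is not matched by [bB][rR]in[gG], because the pattern requires a lowercase i and n, and that only abcdef has exactly six characters.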
