lf64, UNIXBasics: Finding files

by Guido Socher
<guido(at)bearix.oche.de>

About the author:
Guido loves Linux because it is a free system, and because it is a lot of fun to work with people from the Linux community all over the world. He spends his spare time with his girlfriend, listens to BBC World Service radio, rides his bike through the countryside and enjoys playing with Linux.
Content:

Finding files by name
Getting an overview over the file system
Finding files by content (searching for text strings in files)

Finding files

Abstract:

This article gives an introduction to finding files by name and by content.

Finding files by name

You probably know this problem: you saved a file somewhere and can no longer remember where you put it.

This is where the find command comes in handy. How do you use it? find comes, of course, with a large man page, but let's just look at some "normal cases". To traverse the directory tree starting at the current directory and search for a file named lostfile.txt:

find . -name lostfile.txt -print

find also accepts wild-cards. Remember to quote the wild-cards, otherwise the shell tries to expand them before they are handed over to find. Here is an example:

find . -name "lost*" -print
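The effect of the quotes is easy to check in a scratch directory. The following is only a minimal sketch; the temporary directory and all file names in it are made up for the illustration.

```shell
# Build a throw-away tree (hypothetical file names) to search in.
demo=$(mktemp -d)
mkdir -p "$demo/docs/old"
touch "$demo/lostfile.txt" "$demo/docs/lostnotes.txt" \
      "$demo/docs/old/lostfile.txt" "$demo/docs/other.txt"
cd "$demo" || exit 1

# Quoted: the shell hands lost* to find unchanged, and find matches it
# against the file names in every directory of the tree.
quoted=$(find . -name "lost*" -print | sort)

# Unquoted: the shell first expands lost* against the current directory.
# Only lostfile.txt matches here, so find is really called with
# "-name lostfile.txt" and never sees docs/lostnotes.txt.
unquoted=$(find . -name lost* -print | sort)

echo "quoted:";   printf '%s\n' "$quoted"
echo "unquoted:"; printf '%s\n' "$unquoted"
```

If the unquoted pattern had matched several names in the current directory, find would have received extra arguments and most likely stopped with an error instead.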

This command can be quite slow when you need to search a large directory tree. Here the locate command can help. It does not search the file system directly; it searches a database instead. This is a lot faster, but the database might be out of date. In some distributions the locate database is updated every night, but you can of course run the updatedb command manually from time to time to update it. locate does substring searches:

locate lostfile

This will find lostfile.txt, mylostfile.txt and so on.
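When the locate database is stale, a similar substring search can be approximated with find itself. This is a rough sketch with made-up file names; note that locate normally matches against the whole path, while -name only looks at the last component, so the two are not exact equivalents.

```shell
# Scratch directory with hypothetical file names.
demo=$(mktemp -d)
touch "$demo/lostfile.txt" "$demo/mylostfile.txt" "$demo/unrelated.txt"
cd "$demo" || exit 1

# Wild-cards on both sides turn the -name test into a substring match
# on the file name, much like "locate lostfile" on an up-to-date database.
matches=$(find . -name "*lostfile*" -print | sort)
printf '%s\n' "$matches"
```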

So far we have been searching for files whose names we knew fairly exactly. Maybe the file name was not lostfile.txt but lastfile.txt or leastfile.txt or lostfiles.txt or Lotsfile.txt or lostfile.text, and you can't remember the name exactly. How can you still find the file? In this case you could use a fault tolerant file find utility called ftff. ftff searches for the file and allows for a number of "spelling errors". This command would find all these slightly "misspelled" file names:

ftff lostfiles.txt

The number of allowed spelling errors depends on the length of the file name but can also be set with the -t option. To allow a maximum of 2 errors and also use a wild-card, just type:

ftff -t2 "lostfiles*"

ftff is a program of my own design and is part of a package called whichman-1.5, which can be downloaded from

linuxfocus.org/~guido/

Sometimes you would like to find all files in the directory tree whose names do not match a certain pattern, for example all files except .o and .c files. Two common ways to do this are to negate find's -name test with ! or to filter the output of find through egrep -v.
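A minimal sketch of two such possibilities follows. The scratch directory and its file names are assumptions made for the example, and -type f is added so that only regular files are listed.

```shell
# Scratch tree with hypothetical file names.
demo=$(mktemp -d)
mkdir -p "$demo/src"
touch "$demo/src/main.c" "$demo/src/main.o" "$demo/src/main.h" \
      "$demo/Makefile" "$demo/README"
cd "$demo" || exit 1

# Variant 1: negate the -name tests with "!".
by_find=$(find . -type f ! -name "*.o" ! -name "*.c" -print | sort)

# Variant 2: list every file, then drop the names ending in .o or .c.
# grep -E is the same as egrep; -v inverts the match.
by_grep=$(find . -type f -print | grep -E -v '\.(o|c)$' | sort)

printf '%s\n' "$by_find"
```

Both variants list Makefile, README and src/main.h here. The first keeps the filtering inside find, the second is easier to extend when the exclusion is better expressed as a regular expression.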

Getting an overview over the file system

Sometimes you would like to get an overview of the file system in front of you, e.g. you get a new CD and would like to see what is on it. You could just use ls -R. Personally, I prefer one of the following for readability. tree ( sunsite.unc.edu/pub/Linux/utils/file/tree-1.2.tgz ) draws a tree diagram:

tree

or with full path names:

tree -fF

Then there is of course the good old find. The GNU version of find, which usually comes with Linux, can also change the print format, e.g. to print the file size together with the name:

find . -ls
find . -print

or with GNU find:
find
find . -printf "%7s %p\n"
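Because the size comes first in that format, the listing can be sorted by size with sort -n. The sketch below assumes GNU find (-printf is a GNU extension) and uses made-up files of known sizes.

```shell
# Scratch tree with hypothetical files of known sizes.
demo=$(mktemp -d)
mkdir "$demo/sub"
printf 'abc' > "$demo/small.txt"                # 3 bytes
printf '0123456789' > "$demo/medium.txt"        # 10 bytes
head -c 2048 /dev/zero > "$demo/sub/big.bin"    # 2048 bytes
cd "$demo" || exit 1

# %s is the size in bytes (right-aligned in 7 columns), %p the path.
# sort -n orders the lines by the leading size column.
listing=$(find . -type f -printf "%7s %p\n" | sort -n)
printf '%s\n' "$listing"

# The last line is therefore the largest file.
largest=$(printf '%s\n' "$listing" | tail -n 1)
```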

There is also a small Perl wrapper around the ls command that does similar things. It can be downloaded here: lsperl.gz. You can probably find many more file viewing tools, but for most cases these are more than sufficient.

Finding files by content (searching for text strings in files)

The standard utilities for searching for text strings in files are grep/egrep for regular expression searches and fgrep for literal strings. To search for an expression in all files in the current directory, just type:

egrep -i "search expression" *

To search for strings in all files down the directory tree, you can combine find or another file name search command with e.g. egrep. This can be done in several ways:

egrep -i "expression" `find . -type f -print`
find . -type f -exec egrep -i "expression" /dev/null {} \;
find . -type f -print | xargs egrep -i "expression"
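The three variants behave differently on large or untidy trees: the backtick form can exceed the maximum command line length, the -exec ... \; form starts one egrep per file (the /dev/null argument makes egrep print the file name even then), and the plain xargs form splits file names that contain spaces. The sketch below shows two more robust spellings on a made-up tree. It assumes a find and xargs that support -print0 and -0 (GNU and BSD versions do), and it uses grep -E, which is the same as egrep.

```shell
# Scratch tree; one hypothetical file name contains a space on purpose.
demo=$(mktemp -d)
mkdir -p "$demo/notes"
printf 'An Expression here\n' > "$demo/notes/my file.txt"
printf 'nothing relevant\n'   > "$demo/notes/other.txt"
cd "$demo" || exit 1

# NUL-separated names survive spaces; /dev/null keeps the file name
# prefix in the output even if xargs passes only a single file.
with_xargs=$(find . -type f -print0 | xargs -0 grep -E -i "expression" /dev/null)

# "{} +" lets find itself collect many file names for each grep call.
with_exec=$(find . -type f -exec grep -E -i "expression" /dev/null {} +)

printf '%s\n' "$with_xargs"
```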

If you find this hard to remember, you can just use a small shell script downloadable from here: grepfind.gz. The script also makes sure that non-printable characters are removed in case you accidentally egrep through a binary file.
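The idea behind such a wrapper can be sketched in a few lines. The function below is a hypothetical stand-in written for this illustration, not the grepfind script itself; its name, its arguments and the use of grep -a (a GNU and BSD grep option that treats binary files as text) are all assumptions.

```shell
# Hypothetical sketch, NOT the original grepfind script.
# Usage: grepfind_sketch EXPRESSION [DIRECTORY]
grepfind_sketch() {
    pattern=$1
    dir=${2:-.}
    # Search every regular file below $dir, case-insensitively.
    # tr then deletes every byte that is not printable, keeping only
    # newlines and tabs, so binary matches cannot garble the terminal.
    find "$dir" -type f -exec grep -E -a -i "$pattern" /dev/null {} + |
        LC_ALL=C tr -cd '[:print:]\n\t'
}

# Try it on a made-up file that contains two control characters.
demo=$(mktemp -d)
printf 'needle\001\002 in binary\n' > "$demo/blob.bin"
result=$(grepfind_sketch needle "$demo")
printf '%s\n' "$result"
```

The control characters after "needle" are gone from the printed line, while the file name prefix and the rest of the match are kept.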

A very interesting search program is agrep. agrep basically works like egrep but does fault tolerant searches, so you can also find slightly misspelled words. To search for an expression and allow a maximum of 2 spelling errors, use:

agrep -i -2 "search exprission" *

The agrep program can be downloaded from sunsite ( sunsite.unc.edu/pub/Linux/utils/text/agrep-2.04.tar.Z ) or from the original site ftp://ftp.cs.arizona.edu/agrep/

Then there is also glimpse, a very powerful search utility. It uses a concept similar to locate: first a database needs to be built, but then searches are very fast. To build a search index for the contents of all files starting from the current directory:

glimpseindex .

After that you can search for a string in all files that were previously indexed:

glimpse -i -2 "search exprission"

glimpse is also fault tolerant (like agrep), and the -2 allows for two errors. glimpse is available from http://glimpse.cs.arizona.edu/

There are many search utilities available for Unix, and for Linux especially. This article is by no means complete. If you are interested in more tricks and utilities like these, have a look at the .lsm files under http://sunsite.unc.edu/pub/Linux/utils

Have fun, happy searching.

2002-10-20, generated by lfparser version 2.32