UNIX Power Tools

Chapter 25: Showing What's in a File

25.8 Finding File Types

Many different kinds of files live on the typical UNIX system: database files, executable files, regular text files, files for fancy editors like Interleaf, tar files, mail messages, directories, font files, and so on.

You often want to check to make sure you have the right "kind" of file before doing something. For example, you'd like to read the file tar. But before typing more tar, you'd like to know whether this file is your set of notes on carbon-based sludge, or the tar executable. If you're wrong, the consequences might be unpleasant. Sending the tar executable to your screen might screw up your terminal settings (42.4), log you off, or do any number of hostile things.

The file utility tells you what sort of file something is. [2] It's fairly self-explanatory:

[2] Another solution to this problem is findtext (16.26).

% file /bin/sh
/bin/sh:       sparc demand paged executable
% file 2650
2650:          [nt]roff, tbl, or eqn input text
% file 0001,v
0001,v:        ascii text
% file foo.sh
foo.sh:          shell commands

file is actually quite clever [though it isn't always correct; some versions are better than others. -JP]. It doesn't just tell you whether something is binary or text; it looks at the beginning of the file and tries to figure out what kind of data it holds. So, for example, file reports that 2650 is nroff (43.13) input and that foo.sh is a shell script. It isn't quite clever enough to figure out that 0001,v is an RCS (20.14) archive, but it does know that it's a plain ASCII (51.3) text file.
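The check described at the start of this article (look before you type more tar) can be scripted. What follows is a minimal sketch, not a standard command: safemore is a hypothetical function name, and it assumes that file's description contains the word "text" for anything safe to page. As the output above shows, that assumption doesn't hold for every description (foo.sh comes back as "shell commands"), so treat the pattern as a starting point for your own version of file.

```shell
#!/bin/sh
# safemore: hypothetical wrapper, not a standard UNIX command.
# Asks file(1) about each argument and pages it only if the description
# mentions "text". Assumption: your file(1) uses that word for readable files.
safemore() {
    status=0
    for f in "$@"; do
        desc=$(file "$f")
        desc=${desc#"$f:"}      # drop the "name:" prefix so the file name
                                # itself can't match the pattern below
        case $desc in
        *[Tt]ext*)
            "${PAGER:-more}" "$f" ;;
        *)
            echo "safemore: $f:$desc (not paged)" >&2
            status=1 ;;
        esac
    done
    return $status
}
```

With this loaded, safemore tar should page your notes on carbon-based sludge but print a warning instead of dumping the tar executable to your terminal. The PAGER variable is used if set; otherwise the function falls back to more.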

System V and SunOS let you customize the file command so that it will recognize additional file types. The file /etc/magic tells file how to recognize different kinds of files. It's capable of a lot (and should be capable of even more), but we'll satisfy ourselves with an introductory explanation. Our goal will be to teach file to recognize RCS archives.

Each line of /etc/magic has four fields:

offset data-type value file-type

These are:

offset

The offset into the file at which file will look for the value. If you're looking for something right at the beginning of the file, the offset should be 0. (This is usually what you want.)

data-type

The type of test to make. Use string for text comparisons, byte for byte comparisons, short for two-byte comparisons, and long for four-byte comparisons.

value

The value you want to find. For string comparisons, any text string will do; you can use the standard UNIX escape sequences (like \n for newline). For numeric comparisons (byte, short, long), this field should be a number, expressed as a C constant (e.g., 0x77 for the hexadecimal byte 77).

file-type

The string that file will print if this test succeeds.
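To make the numeric data types concrete, here is a minimal sketch of what a single byte test amounts to, written as a shell function. magic_byte is a hypothetical name, and the sample rule in its comment (offset 0, byte 0x7f, which is the first byte of an ELF executable) is only an illustration; file itself does not work by calling od.

```shell
#!/bin/sh
# magic_byte: hypothetical helper (not part of file(1)) that mimics one
# "byte" line of /etc/magic, e.g.   0  byte  0x7f  <description>
# It reads the single byte at OFFSET and compares it numerically with VALUE,
# which may be written as a C constant (0x7f, 0177, or 127).
magic_byte() {
    f=$1 offset=$2 value=$3
    # od -A n: no address column; -t x1: one-byte hex; -j: skip; -N: count
    hex=$(od -A n -t x1 -j "$offset" -N 1 "$f" 2>/dev/null | tr -d ' \n')
    [ -n "$hex" ] || return 1          # nothing read: file too short
    [ $((0x$hex)) -eq $(($value)) ]    # shell arithmetic accepts C constants
}
```

A short or long test works the same way with two or four bytes, but the byte order those types use depends on the machine and the version of file, which this sketch ignores.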

So, we know that RCS archives begin with the word head. This word is right at the beginning of the file (offset 0). And we obviously want a string comparison. So we make the following addition to /etc/magic:

0     string     head     RCS archive

This says, "The file is an RCS archive if you find the string head at an offset of 0 bytes from the beginning of the file." Does it work?

% file RCS/0002,v
RCS/0002,v:        RCS archive
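Under the hood, a string test is just a comparison of the bytes at the given offset against the value. This minimal sketch reproduces the RCS rule by hand with dd; magic_string is a hypothetical function name, and it handles only plain ASCII values with no escape sequences.

```shell
#!/bin/sh
# magic_string: hypothetical helper (not part of file(1)) that mimics a
# "string" line of /etc/magic such as   0  string  head  RCS archive
# Limitations: VALUE must be plain ASCII with no escape sequences and must
# not end in a newline (command substitution strips trailing newlines).
magic_string() {
    f=$1 offset=$2 value=$3 label=$4
    len=${#value}                      # bytes to compare (ASCII assumed)
    # copy exactly $len bytes starting $offset bytes into the file
    got=$(dd if="$f" bs=1 skip="$offset" count="$len" 2>/dev/null)
    if [ "$got" = "$value" ]; then
        echo "$f: $label"
    else
        return 1
    fi
}
```

Running magic_string RCS/0002,v 0 head "RCS archive" should print RCS/0002,v: RCS archive. It also exposes the rule's weak spot: any file that happens to start with the letters head (a memo beginning "headline", say) will be labeled an RCS archive too.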

As I said, the tests can be much more complicated, particularly if you're working with binary files. To recognize simple text files, this is all you need to know.

- ML

