
2.11. Input Operators

There are several input operators we'll discuss here because they parse as terms. Sometimes we call them pseudoliterals because they act like quoted strings in many ways. (Output operators like print parse as list operators and are discussed in Chapter 29, "Functions".)

2.11.1. Command Input (Backtick) Operator

First of all, we have the command input operator, also known as the backtick operator, because it looks like this:

$info = `finger $user`;
A string enclosed by backticks (grave accents, technically) first undergoes variable interpolation just like a double-quoted string. The result is then interpreted as a command line by the system, and the output of that command becomes the value of the pseudoliteral. (This is modeled after a similar operator in Unix shells.) In scalar context, a single string consisting of all the output is returned. In list context, a list of values is returned, one for each line of output. (You can set $/ to use a different line terminator.)
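As a minimal sketch of the two contexts, assuming a POSIX-style shell in which echo is available (the command is only a stand-in chosen for illustration):

```perl
use strict;
use warnings;

# Assumption: a POSIX-style shell with `echo` is available.
my $all   = `echo one; echo two`;   # scalar context: all output in one string
my @lines = `echo one; echo two`;   # list context: one element per line

print "scalar context returned ", length($all), " characters\n";
print "list context returned ", scalar(@lines), " lines\n";
```

Each element of @lines keeps its trailing newline, so chomp(@lines) is the usual next step if you want bare values.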

The command is executed each time the pseudoliteral is evaluated. The numeric status value of the command is saved in $? (see Chapter 28, "Special Names" for the interpretation of $?, also known as $CHILD_ERROR). Unlike the csh version of this command, no translation is done on the return data--newlines remain newlines. Unlike in any of the shells, single quotes in Perl do not hide variable names in the command from interpretation. To pass a $ through to the shell you need to hide it with a backslash. The $user in our finger example above is interpolated by Perl, not by the shell. (Because the command undergoes shell processing, see Chapter 23, "Security", for security concerns.)
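The following sketch shows one way to inspect $? and to pass a literal $ through to the shell. It assumes a POSIX-style shell; the sh -c 'exit 3' command is a hypothetical stand-in that does nothing but exit with status 3.

```perl
use strict;
use warnings;

# Assumption: a POSIX-style `sh` is on the PATH.
my $output = `sh -c 'exit 3'`;
my $status = $?;                      # save it before running anything else

if ($status == -1) {
    print "command could not be started: $!\n";
}
else {
    # The exit code lives in the high byte of the status word.
    printf "exit code: %d\n", $status >> 8;
}

# The backslash hides the dollar sign from Perl, so the shell expands $HOME.
my $home = `echo \$HOME`;
print "the shell expanded HOME to: $home";
```

Saving $? right away matters because every later backtick, system, or wait call overwrites it.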

The generalized form of backticks is qx// (for "quoted execution"), but the operator works exactly the same way as ordinary backticks. You just get to pick your quote characters. As with similar quoting pseudofunctions, if you happen to choose a single quote as your delimiter, the command string doesn't undergo double-quote interpolation:

$perl_info  = qx(ps $$);            # that's Perl's $$
$shell_info = qx'ps $$';            # that's the shell's $$

2.11.2. Line Input (Angle) Operator

The most heavily used input operator is the line input operator, also known as the angle operator or the readline function (since that's what it calls internally). Evaluating a filehandle in angle brackets (<STDIN>, for example) yields the next line from the associated filehandle. (The newline is included, so according to Perl's criteria for truth, a freshly input line is always true, up until end-of-file, at which point an undefined value is returned, which is conveniently false.) Ordinarily, you would assign the input value to a variable, but there is one situation where an automatic assignment happens. If and only if the line input operator is the only thing inside the conditional of a while loop, the value is automatically assigned to the special variable $_. The assigned value is then tested to see whether it is defined. (This may seem like an odd thing to you, but you'll use the construct frequently, so it's worth learning.) Anyway, the following lines are equivalent:

while (defined($_ = <STDIN>)) { print $_; }   # the longest way
while ($_ = <STDIN>) { print; }               # explicitly to $_
while (<STDIN>) { print; }                    # the short way
for (;<STDIN>;) { print; }                    # while loop in disguise
print $_ while defined($_ = <STDIN>);         # long statement modifier
print while $_ = <STDIN>;                     # explicitly to $_
print while <STDIN>;                          # short statement modifier
Remember that this special magic requires a while loop. If you use the input operator anywhere else, you must assign the result explicitly if you want to keep the value:
while (<FH1> && <FH2>) { ... }          # WRONG: discards both inputs
if (<STDIN>)      { print; }            # WRONG: prints old value of $_
if ($_ = <STDIN>) { print; }            # suboptimal: doesn't test defined
if (defined($_ = <STDIN>)) { print; }   # best
When you're implicitly assigning to $_ in a while loop, this is the global variable by that name, not one localized to the while loop. You can protect an existing value of $_ this way:
while (local $_ = <STDIN>) { print; }   # use local $_
Any previous value is restored when the loop is done. $_ is still a global variable, though, so functions called from inside that loop could still access it, intentionally or otherwise. You can avoid this, too, by declaring a lexical variable:
while (my $line = <STDIN>) { print $line; } # now private
(Both of these while loops still implicitly test for whether the result of the assignment is defined, because my and local don't change how assignment is seen by the parser.)

The filehandles STDIN, STDOUT, and STDERR are predefined and pre-opened. Additional filehandles may be created with the open or sysopen functions. See those functions' documentation in Chapter 29, "Functions" for details on this.
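To see why the implicit defined test matters, here is a small sketch that reads from an in-memory file whose last line is "0" with no trailing newline. The data and variable names are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical input: the final line is "0" and has no newline.
my $data = "first\n0";

open(my $fh, '<', \$data) or die "Can't open in-memory file: $!";
my @seen;
while (my $line = <$fh>) {          # implicitly: while (defined(my $line = <$fh>))
    chomp $line;
    push @seen, $line;
}
close $fh;
print "while loop saw: @seen\n";

# Outside a while condition there is no implicit defined test.
open(my $again, '<', \$data) or die "Can't open in-memory file: $!";
my $skipped = <$again>;             # "first\n"
my $last    = <$again>;             # "0"
close $again;

print "a plain truth test rejects '$last'\n" unless $last;
print "a defined test accepts '$last'\n"     if defined $last;
```

The while loop processes the final "0" line, whereas a bare truth test on the same value treats it as false.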

In the while loops above, we were evaluating the line input operator in a scalar context, so the operator returns each line separately. However, if you use the operator in a list context, a list consisting of all remaining input lines is returned, one line per list element. It's easy to make a large data space this way, so use this feature with care:

$one_line = <MYFILE>;   # Get first line.
@all_lines = <MYFILE>;  # Get the rest of the lines.
There is no while magic associated with the list form of the input operator, because the condition of a while loop always provides a scalar context (as does any conditional).
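The difference between the two contexts can be sketched with an in-memory file, so that nothing on disk is assumed (the contents are invented for the example):

```perl
use strict;
use warnings;

my $text = "alpha\nbeta\ngamma\n";
open(my $fh, '<', \$text) or die "Can't open in-memory file: $!";

my $one_line  = <$fh>;    # scalar context: just the next line
my @all_lines = <$fh>;    # list context: every remaining line
my $after     = <$fh>;    # nothing left, so this is undef

close $fh;

print "first line: $one_line";
print "remaining lines: ", scalar(@all_lines), "\n";
print "handle is exhausted\n" unless defined $after;
```

For large inputs, the line-at-a-time while loop is usually the safer choice, since the list form holds every remaining line in memory at once.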

Using the null filehandle within the angle operator is special; it emulates the command-line behavior of typical Unix filter programs such as sed and awk. When you read lines from <>, it magically gives you all the lines from all the files mentioned on the command line. If no files were mentioned, it gives you standard input instead, so your program is easy to insert into the middle of a pipeline of processes.

Here's how it works: the first time <> is evaluated, the @ARGV array is checked, and if it is empty, $ARGV[0] is set to "-", which when opened gives you standard input. The @ARGV array is then processed as a list of filenames. More explicitly, the loop:

while (<>) {
    ...                     # code for each line
}
is equivalent to the following Perl-like pseudocode:
@ARGV = ('-') unless @ARGV;     # assume STDIN iff empty
while (@ARGV) {
    $ARGV = shift @ARGV;        # shorten @ARGV each time
    if (!open(ARGV, $ARGV)) {
        warn "Can't open $ARGV: $!\n";
        next;
    }
    while (<ARGV>) {
        ...                     # code for each line
    }
}
except that it isn't so cumbersome to say, and will actually work. It really does shift array @ARGV and put the current filename into the global variable $ARGV. It also uses the special filehandle ARGV internally--<> is just a synonym for the more explicitly written <ARGV>, which is a magical filehandle. (The pseudocode above doesn't work because it treats <ARGV> as nonmagical.)

You can modify @ARGV before the first <> as long as the array ends up containing the list of filenames you really want. Because Perl uses its normal open function here, a filename of "-" counts as standard input wherever it is encountered, and the more esoteric features of open are automatically available to you (such as opening a "file" named "gzip -dc < file.gz|"). Line numbers ($.) continue as if the input were one big happy file. (But see the example under eof in Chapter 29, "Functions" for how to reset line numbers on each file.)
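The line-number behavior can be sketched as follows. The example creates two throwaway files with File::Temp (an assumption made only so the sketch is self-contained) and reads them through <> twice, once as is and once with the close ARGV if eof idiom.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Create two small temporary files to stand in for command-line arguments.
my @names;
for my $content ("a\nb\n", "c\n") {
    my ($tmp_fh, $name) = tempfile(UNLINK => 1);
    print {$tmp_fh} $content;
    close $tmp_fh or die "Can't close $name: $!";
    push @names, $name;
}

# First pass: $. keeps counting across file boundaries.
@ARGV = @names;
my @continuous;
while (<>) {
    push @continuous, $.;
}
print "continuous numbering: @continuous\n";

# Second pass: closing ARGV at the end of each file resets $.
@ARGV = @names;
my @per_file;
while (<>) {
    push @per_file, $.;
    close ARGV if eof;      # eof without parentheses: end of the current file
}
print "per-file numbering:   @per_file\n";
```

Note that eof is written without parentheses here; eof() with empty parentheses means the end of the last file in @ARGV, which is a different test.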

If you want to set @ARGV to your own list of files, go right ahead:

# default to README file if no args given
@ARGV = ("README") unless @ARGV;
If you want to pass switches into your script, you can use one of the Getopt::* modules or put a loop on the front like this:
while (@ARGV and $ARGV[0] =~ /^-/) {
    $_ = shift;
    last if /^--$/;
    if (/^-D(.*)/) { $debug = $1 }
    if (/^-v/)     { $verbose++  }
    ...             # other switches
}
while (<>) {
    ...             # code for each line
}
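For comparison, here is a rough sketch of the module route using Getopt::Long, one of the Getopt::* modules. The option names and the sample argument list are hypothetical, and the long-option spellings differ from the -D and -v switches in the hand-written loop.

```perl
use strict;
use warnings;
use Getopt::Long qw(GetOptionsFromArray);

# Hypothetical argument list; a real script would pass \@ARGV instead.
my @args = ('--verbose', '--debug=2', 'input.txt');

my ($debug, $verbose) = (0, 0);
GetOptionsFromArray(
    \@args,
    'debug=i'  => \$debug,      # --debug takes an integer value
    'verbose+' => \$verbose,    # --verbose may be repeated to raise the level
) or die "Bad command-line options\n";

# Recognized switches have been removed; what remains are the filenames.
print "debug=$debug verbose=$verbose files=@args\n";
```

After parsing, the leftover list can be assigned to @ARGV so that a following while (<>) loop reads only the named files.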
The <> symbol will return false only once. If you call it again after this, it will assume you are processing another @ARGV list, and if you haven't set @ARGV, it will input from STDIN.

If the string inside the angle brackets is a scalar variable (for example, <$foo>), that variable contains an indirect filehandle, either the name of the filehandle to input from or a reference to such a filehandle. For example:

$fh = \*STDIN;
$line = <$fh>;
or:
open($fh, "<", "data.txt") or die "Can't open data.txt: $!";
$line = <$fh>;
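Indirect filehandles are what make it easy to hand a filehandle to a subroutine. The sketch below uses an in-memory file and a hypothetical helper named first_line. It also shows readline for a handle stored in a hash element, since the angle form accepts only a simple scalar variable.

```perl
use strict;
use warnings;

# first_line is a hypothetical helper, not a built-in.
sub first_line {
    my ($fh) = @_;
    my $line = <$fh>;       # $fh is a simple scalar, so this is line input
    return $line;
}

my $text = "alpha\nbeta\n";
open(my $in, '<', \$text) or die "Can't open in-memory file: $!";

my $first = first_line($in);

# With a hash element, call readline directly; <$handles{in}> would be
# treated as a fileglob rather than as line input.
my %handles = (in => $in);
my $second  = readline($handles{in});

print $first, $second;
close $in;
```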

2.11.3. Filename Globbing Operator

You might wonder what happens to a line input operator if you put something fancier inside the angle brackets. What happens is that it mutates into a different operator. If the string inside the angle brackets is anything other than a filehandle name or a scalar variable (even if there are just extra spaces), it is interpreted as a filename pattern to be "globbed".[19] The pattern is matched against the files in the current directory (or the directory specified as part of the fileglob pattern), and the filenames so matched are returned by the operator. As with line input, names are returned one at a time in scalar context, or all at once in list context. The latter usage is more common; you often see things like:

@files = <*.xml>;
As with other kinds of pseudoliterals, one level of variable interpolation is done first, but you can't say <$foo> because that's an indirect filehandle as explained earlier. In older versions of Perl, programmers would insert braces to force interpretation as a fileglob: <${foo}>. These days, it's considered cleaner to call the internal function directly as glob($foo), which is probably the right way to have invented it in the first place. So instead you'd write
@files = glob("*.xml");
if you despise overloading the angle operator for this. Which you're allowed to do.

[19] Fileglobs have nothing to do with the previously mentioned typeglobs, other than that they both use the * character in a wildcard fashion. The * character has the nickname "glob" when used like this. With typeglobs, you're globbing symbols with the same name from the symbol table. With a fileglob, you're doing wildcard matching on the filenames in a directory, just as the various shells do.

Whether you use the glob function or the old angle-bracket form, the fileglob operator also does while magic like the line input operator, assigning the result to $_. (That was the rationale for overloading the angle operator in the first place.) For example, if you wanted to change the permissions on all your C code files, you might say:

while (glob "*.c") {
    chmod 0644, $_;
}
which is equivalent to:
while (<*.c>) {
    chmod 0644, $_;
}
The glob function was originally implemented as a shell command in older versions of Perl (and in even older versions of Unix), which meant it was comparatively expensive to execute and, worse still, wouldn't work exactly the same everywhere. Nowadays it's a built-in, so it's more reliable and a lot faster. See the description of the File::Glob module in Chapter 32, "Standard Modules" for how to alter the default behavior of this operator, such as whether to treat spaces in its operand (argument) as pathname separators, whether to expand tildes or braces, whether to be case insensitive, and whether to sort the return values--amongst other things.
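As one illustration of altering the default behavior, the sketch below calls the bsd_glob function exported by File::Glob, which does not treat spaces in the pattern as separators. The temporary directory and the filename are invented for the example.

```perl
use strict;
use warnings;
use File::Glob qw(bsd_glob);
use File::Temp qw(tempdir);

# Create a scratch directory holding one file with a space in its name.
my $dir  = tempdir(CLEANUP => 1);
my $name = "$dir/my file.c";
open(my $out, '>', $name) or die "Can't create $name: $!";
close $out or die "Can't close $name: $!";

# bsd_glob takes the whole string as a single pattern, space included.
my @found = bsd_glob("$dir/my file*");
print "bsd_glob matched ", scalar(@found), " file(s)\n";
```

The plain glob operator would split the same pattern at the space and treat the two halves as separate patterns, which is rarely what you want for names like this one.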

Of course, the shortest and arguably the most readable way to do the chmod command above is to use the fileglob as a list operator:

chmod 0644, <*.c>;
A fileglob evaluates its (embedded) operand only when starting a new list. All values must be read before the operator will start over. In a list context, this isn't important because you automatically get them all anyway. In a scalar context, however, the operator returns the next value each time it is called, or a false value if you've just run out. Again, false is returned only once. So if you're expecting a single value from a fileglob, it is much better to say:
($file) = <blurch*>;  # list context
than to say:
$file = <blurch*>;    # scalar context
because the former returns all matched filenames and resets the operator, whereas the latter alternates between returning filenames and returning false.
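The alternation is easy to observe by calling the same fileglob repeatedly in scalar context. This sketch assumes a temporary directory (created with File::Temp) whose path contains no whitespace, and it creates a single matching file named blurch1.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Assumption: the temporary directory path contains no whitespace.
my $dir = tempdir(CLEANUP => 1);
open(my $out, '>', "$dir/blurch1") or die "Can't create file: $!";
close $out or die "Can't close file: $!";

# Scalar context: the same call site hands back a name, then undef,
# then starts over with a fresh list.
my @results;
for my $attempt (1 .. 4) {
    my $file = glob("$dir/blurch*");
    push @results, defined $file ? 'name' : 'undef';
}
print "scalar context: @results\n";

# List context: all matches are returned at once and the operator resets.
my ($only) = glob("$dir/blurch*");
print "list context:   $only\n";
```

With only one matching file, every second scalar-context call comes back undefined, which is exactly the surprise the list-context form avoids.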

If you're trying to do variable interpolation, it's definitely better to use the glob operator because the older notation can cause confusion with the indirect filehandle notation. This is where it becomes apparent that the borderline between terms and operators is a bit mushy:

@files = <$dir/*.[ch]>;         # Works, but avoid.
@files = glob("$dir/*.[ch]");   # Call glob as function.
@files = glob $some_pattern;    # Call glob as operator.

We left the parentheses off of the last example to illustrate that glob can be used either as a function (a term) or as a unary operator; that is, a prefix operator that takes a single argument. The glob operator is an example of a named unary operator, which is just one kind of operator we'll talk about in the next chapter. Later, we'll talk about pattern-matching operators, which also parse like terms but behave like operators.




Copyright © 2001 O'Reilly & Associates. All rights reserved.