Regular expressions and case insensitivity

As previously mentioned, you can make matching case insensitive with the i flag:

/\b[Uu][Nn][Ii][Xx]\b/;    # explicitly giving case folding
/\bunix\b/i;               # using 'i' flag to fold case

Really matching with "."

As mentioned before, the "." (dot, period, full stop) usually matches any character except newline. You can make it match newline with the s flag:

/"(.|\n)*"/;    # match any quoted string, even with newlines
/"(.*)"/s;      # same match, using 's' flag
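To see the difference the s flag makes, here is a minimal sketch; the sample string and variable names are made up for illustration. The same pattern fails without s, because the dot cannot cross the newline, and succeeds with it:

```perl
#!/usr/bin/perl -w
# a minimal sketch: the same pattern with and without the s flag
use strict;

my $text = qq{say "hello\nworld" twice};   # quoted string that spans a newline

# Without s, "." stops at the newline, so the match fails.
my $plain = ($text =~ /"(.*)"/) ? "matched" : "no match";

# With s, "." also matches the newline, so the whole quoted text is captured.
my $inside = ($text =~ /"(.*)"/s) ? $1 : "no match";

print "without s: $plain\n";
print "with s: [$inside]\n";
```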

N.B. - I like to use the flags ///six; as a personal default set of flags with Perl regular expressions.

Going global with g

You can make your matching global with the g flag. For ordinary matches, this means making them stateful: Perl will remember where you left off with each reinvocation of the match unless you change the value of the variable, which will reset the match.

Demonstrating "g" in action

#!/usr/bin/perl -w
# shows //g as stateful: prints every run of two or more capitals
use strict;
while(<>)
{
  while(/([A-Z]{2,})/g)
  {
    print "$1\n";   # $1 is always set here, since the match succeeded
  }
}
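A small sketch of the state that g keeps, using the built-in pos() function to show where each match ended; the sample sentence is hypothetical. In list context the same //g match returns all matches at once:

```perl
#!/usr/bin/perl -w
# a sketch of //g in scalar context (stateful) and list context
use strict;

my $line = "The IBM and the ACM met at MIT.";
my @found;

# Scalar-context //g: each call resumes where the previous match ended.
while ($line =~ /([A-Z]{2,})/g)
{
  # pos() reports the offset just past the current match
  push @found, "$1 ends at " . pos($line);
}
print "$_\n" for @found;

# List-context //g returns every match in one go.
my @all = $line =~ /[A-Z]{2,}/g;
print "@all\n";
```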

Interpolating variables in patterns

You can even use a variable inside a pattern. The variable's value is interpolated into the pattern before the pattern is compiled, so make sure that value is a legitimate regular expression.

my $var1 = "[A-Z]*";
if( "AB" =~ /$var1/ )
{
  print "$&";
}
else
{
  print "nopers";
}
# yields
AB
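When the variable should be matched literally rather than as a pattern, you can quote its metacharacters with \Q...\E (the quotemeta function does the same thing). A minimal sketch, with a hypothetical $price value:

```perl
#!/usr/bin/perl -w
# a sketch of literal matching with \Q...\E
use strict;

my $price = "3.50";    # "." is a metacharacter once interpolated

# Interpolated as-is, "." matches any character, so "3x50" matches too.
my $loose  = ("3x50" =~ /^$price$/)     ? "match" : "no match";

# \Q...\E escapes the metacharacters, so only the literal "3.50" matches.
my $strict = ("3x50" =~ /^\Q$price\E$/) ? "match" : "no match";

print "loose: $loose, strict: $strict\n";
```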

Regular expressions and substitution

  • The s/.../.../ form can be used to make substitutions in the specified string.
  • If paired delimiters are used, then you have to use two pairs of the delimiters.

  • "g" after the last delimiter indicates to replace more than just the first occurrence.
  • The substitution can be bound to a variable with =~. Otherwise it makes the substitutions in $_.
  • The operation returns the number of replacements performed, which can be more than one with the 'g' option.
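A quick sketch of the return value, with and without g; the strings are illustrative only:

```perl
#!/usr/bin/perl -w
# a sketch of what s/// returns
use strict;

my $text = "red fish, blue fish, old fish";

# s///g replaces every match and returns how many replacements it made.
my $count = ($text =~ s/fish/cat/g);
print "$count replacements: $text\n";

# With no =~ binding, s/// works on $_; without g only the first match changes.
$_ = "a-b-c";
my $n = s/-/+/;
print "$n replacement: $_\n";
```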

Example

#!/usr/bin/perl -w
# shows s///g... by removing acronyms
use strict;
while(<>)
{
  s/([A-Z]{2,})//g;
  print;
}

Examples

s/\bfigure (\d+)/Figure $1/  # capitalize references to figures
s{//(.*)}{/\*$1\*/}          # use old style C comments
s!\bif\(!if (!               # put a blank between "if" and "("
s(!)(.)                      # tone down that message
s[!][.]g                     # replace all occurrences of '!' with '.'

Case shifting

You can use \U and \L to change the text that follows them to upper or lower case (up to a \E or the end of the replacement):

$text = " the acm and the ieee are the best! ";
$text =~ s/acm|ieee/\U$&/g;
print "$text\n";
# yields
the ACM and the IEEE are the best!

$text = "CDA 1001 and COP 3101
are good classes, but CIS 4385 is better!";
$text =~ s/\b(COP|CDA) \d+/\L$&/g;
print "$text\n";
# yields
cda 1001 and cop 3101
are good classes, but CIS 4385 is better!

Using tr/// (also known as y///)

  • In Perl you can also convert one set of characters to another using the tr/.../.../ form. (Or if you like, you can use y///.)
  • Much like the Unix tr program, you specify two lists of characters: the first lists the characters to be replaced, and the second lists their replacements.
  • tr returns the number of characters replaced (or deleted).
  • The modifier d deletes characters in the first list that have no corresponding character in the second list.
  • The modifier s "squashes" a run of characters that were all translated to the same character into a single character.

Examples (from the perlop man page)

$ARGV[1] =~ tr/A-Z/a-z/;   # canonicalize to lower case
$cnt = tr/*/*/;            # count the stars in $_
$cnt = $sky =~ tr/*/*/;    # count the stars in $sky
$cnt = tr/0-9//;           # count the digits in $_

More examples

# get rid of redundant blanks in $_
tr/ //s;
# replace [ and { with ( in $text
$text =~ tr/[{/(/;
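The d and s modifiers, and tr's return value, can be seen side by side in this sketch (the sample strings are made up):

```perl
#!/usr/bin/perl -w
# a sketch of tr's count, d, and s behavior
use strict;

my $phone = "(850) 555-0123";

# Empty replacement list and no d: nothing changes, tr just counts.
my $digits = ($phone =~ tr/0-9//);

# d: characters in the first list with no partner in the second are deleted.
(my $bare = $phone) =~ tr/-() //d;

# a maps to A; b and c have no partner, so d deletes them.
(my $partial = "abcabc") =~ tr/abc/A/d;

# s: runs of blanks (translated to themselves) are squashed to one blank.
(my $tight = "too   many    blanks") =~ tr/ //s;

print "$digits digits; bare=$bare; partial=$partial; tight=$tight\n";
```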

Using split

The split function breaks up a string according to a specified separator pattern and generates a list of the substrings.

Substrings

For example:

$line = " This sentence contains five words. ";
@fields = split / /, $line;
for $count (0 .. $#fields)
{
  print "$count --> $fields[$count]\n";
}
# yields
0 -->
1 --> This
2 --> sentence
3 --> contains
4 --> five
5 --> words.
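The empty first field in that output comes from the leading blank in $line. A minimal sketch comparing the / / pattern with split's special ' ' separator, which splits on runs of whitespace and skips leading whitespace:

```perl
#!/usr/bin/perl -w
# a sketch contrasting split / / with split ' '
use strict;

my $line = " This sentence contains five words. ";

# A single-blank pattern keeps the empty field before the leading blank.
my @by_blank = split / /, $line;

# The string ' ' ignores leading whitespace, so no empty first field.
my @words = split ' ', $line;

print scalar(@by_blank), " fields vs ", scalar(@words), " words\n";
print join("|", @words), "\n";
```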

Using the join function

The join function does the reverse of the split function: it takes a list and converts it to a string.
Unlike split, though, its first argument is not a pattern but a plain string, which is placed between the list elements:

@fields = qw/ apples pears cantaloupes cherries /;
$line = join "<-->", @fields;
print "$line\n";
# yields
apples<-->pears<-->cantaloupes<-->cherries

Filehandles

[Also see man perlfaq5 for more detail on this subject.]

  • A filehandle is an I/O connection between your process and some device or file. Perl output is buffered.
  • Perl has three predefined filehandles: STDIN, STDOUT, and STDERR.

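Unlike other variables, filehandles are not declared. The convention is to use all uppercase letters for filehandle names (especially important if you deal with anonymous filehandles!), which also keeps them from colliding with Perl's lowercase reserved words. The open operator takes two arguments, a filehandle name and a connection (e.g. a filename). It also accepts a three-argument form that gives the mode ("<" for reading, ">" for writing) separately from the filename.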
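A minimal sketch of writing a file and reading it back with the three-argument form. It assumes the core File::Temp module for a scratch directory; the file name notes.txt is hypothetical:

```perl
#!/usr/bin/perl -w
# a sketch of open for writing and reading with bareword filehandles
use strict;
use File::Temp qw(tempdir);

my $dir = tempdir(CLEANUP => 1);       # scratch directory, removed at exit
my $fn  = "$dir/notes.txt";            # hypothetical file name

# ">" opens for writing, creating or truncating the file.
open(OUT, '>', $fn) or die "Can't write $fn: $!";
print OUT "first line\n";
print OUT "second line\n";
close OUT;                             # flushes buffered output to the file

# "<" opens for reading.
open(IN, '<', $fn) or die "Can't read $fn: $!";
my @lines = <IN>;
close IN;

print "read ", scalar(@lines), " lines; last: $lines[-1]";
```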

Closing filehandles

The close operator closes a filehandle. Any output still buffered for that filehandle is flushed to the file. Perl automatically closes filehandles when the process exits, or when you reopen them.

Examples

close IN;  # closes the IN filehandle
close OUT; # closes the OUT filehandle
close LOG; # closes the LOG filehandle

Testing open

You can check the status of opening a file by examining the result of the open operation. It returns a true value if it succeeded, and a false one if it failed.
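A sketch of checking the result; the missing path is hypothetical, and $! holds the system's reason for the failure:

```perl
#!/usr/bin/perl -w
# a sketch of testing the value returned by open
use strict;
use File::Temp qw(tempdir);

my $dir     = tempdir(CLEANUP => 1);
my $missing = "$dir/no-such-file";     # hypothetical path that does not exist

my $status;
if (open(IN, '<', $missing))
{
  $status = "opened";
  close IN;
}
else
{
  # open returned false; $! explains why
  $status = "failed: $!";
}
print "$status\n";
```

In practice the same check is usually written on one line: open(IN, '<', $fn) or die "Can't open $fn: $!";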

Reopening a filehandle

You can reopen a standard filehandle such as STDOUT. This lets you perform input or output in the normal fashion while redirecting the I/O from or to a file within the Perl program.
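A minimal sketch of redirecting STDOUT to a log file and then restoring it. It first saves a duplicate of the original STDOUT with the '>&' mode; the log file name is hypothetical and File::Temp supplies a scratch directory:

```perl
#!/usr/bin/perl -w
# a sketch of reopening STDOUT to a file and restoring it afterwards
use strict;
use File::Temp qw(tempdir);

my $dir = tempdir(CLEANUP => 1);
my $log = "$dir/output.log";           # hypothetical log file

open(SAVED, '>&', \*STDOUT) or die "Can't dup STDOUT: $!";   # remember the original
open(STDOUT, '>', $log)     or die "Can't reopen STDOUT: $!";

print "this line goes to the log file\n";   # ordinary print, now redirected

# Reopening STDOUT closes (and flushes) the log file first.
open(STDOUT, '>&', \*SAVED) or die "Can't restore STDOUT: $!";
close SAVED;

open(IN, '<', $log) or die "Can't read $log: $!";
my $logged = <IN>;
close IN;
print "log contains: $logged";
```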

File testing

Like BASH, file tests exist in Perl (source: man perlfunc):

-r File is readable by effective uid/gid.
-w File is writable by effective uid/gid.
-x File is executable by effective uid/gid.
-o File is owned by effective uid.
-R File is readable by real uid/gid.
-W File is writable by real uid/gid.
-X File is executable by real uid/gid.
-O File is owned by real uid.

-e File exists.
-z File has zero size (is empty).
-s File has nonzero size (returns size in bytes).
-f File is a plain file.
-d File is a directory.
-l File is a symbolic link.
-p File is a (named) pipe (FIFO).

-S File is a socket.
-b File is a block special file.
-c File is a character special file.
-t Filehandle is opened to a tty.
-u File has setuid bit set.
-g File has setgid bit set.
-k File has sticky bit set.

-T File is an ASCII text file (heuristic guess).
-B File is a "binary" file (opposite of -T).
-M Script start time minus file modification time, in days.
-A Same for access time.
-C Same for inode change time (Unix, may differ for other platforms).
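A short sketch exercising a few of these tests on a scratch file, created with the core File::Temp module; the file name is hypothetical:

```perl
#!/usr/bin/perl -w
# a sketch of several file tests on a freshly written file
use strict;
use File::Temp qw(tempdir);

my $dir = tempdir(CLEANUP => 1);       # scratch directory, removed at exit
my $fn  = "$dir/data.txt";             # hypothetical file name
open(OUT, '>', $fn) or die "Can't write $fn: $!";
print OUT "12345";                     # 5 bytes, no newline
close OUT;

my @report;
push @report, "exists"     if -e $fn;
push @report, "plain file" if -f $fn;
push @report, "size " . (-s $fn);      # -s returns the size in bytes
push @report, "not empty"  unless -z $fn;
push @report, "scratch dir is a directory" if -d $dir;

# -M is script start time minus modification time, in days; the file was
# written after the script started, so this is zero or slightly negative.
my $age = -M $fn;

print join(", ", @report), "\n";
print "age in days: $age\n";
```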

Using file status

You can use file tests as a pre-test, for instance:

while (<>) {
  chomp;
  next unless -f $_;  # ignore non-files
  #...
}

Or you can use them as a post-test:

if(! open(FH, $fn))
{
  my $err = $!;    # save the reason now; the file tests below may change $!
  if(! -e "$fn")
  {
    die "File $fn doesn't exist.";
  }
  if(! -r "$fn")
  {
    die "File $fn isn't readable.";
  }
  if(-d "$fn")
  {
    die "$fn is a directory, not a regular file.";
  }
  die "$fn could not be opened: $err";
}

Subroutines in Perl

You can declare subroutines in Perl with sub, and call them with the "&" syntax:

my @list = qw( /etc/hosts /etc/resolv.conf /etc/init.d );
map ( &filecheck , @list) ;
sub filecheck
{
  if(-f "$_")
  {
    print "$_ is a regular file\n";
  }
  else
  {
    print "$_ is not a regular file\n";
  }
}

Subroutine arguments

To pass arguments to a subroutine, put a list after the subroutine name, as you do with Perl's built-in functions. The subroutine receives the arguments in the @_ array:

#!/usr/bin/perl -w
# shows subroutine argument lists
use strict;
my $val = max(10,20,30,40,11,99);
print "max = $val\n";
sub max
{
  print "Using $_[0] as first value...\n";
  my $memory = shift(@_);
  foreach(@_)
  {
    if($_ > $memory)
    {
      $memory = $_;
    }
  }
  return $memory;
}

Using my variables in subroutines

You can define variables that are private to a subroutine with my:

sub func
{
  my $ct = @_;
  ...;
}

The variable $ct is defined only within the subroutine func.
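A minimal sketch showing that the $ct inside func is a different variable from one declared outside it; the outer $ct exists only for contrast:

```perl
#!/usr/bin/perl -w
# a sketch of a my variable private to a subroutine
use strict;

my $ct = "outer value";     # a separate variable at file scope

sub func
{
  my $ct = @_;              # a new $ct, private to func: the argument count
  return $ct;
}

my $result = func("a", "b", "c");
print "func returned $result; outer \$ct is still '$ct'\n";
```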

sort() and map()

The built-in functions sort() and map() can use a subroutine rather than just an inline anonymous block:

@list = qw/ 1 100 11 10 /;
@default = sort(@list);
@mysort = sort {&mysort} @list;
print "default sort: @default\n";
print "mysort: @mysort\n";
sub mysort
{
return $a <=> $b;
}
# yields
default sort: 1 10 100 11
mysort: 1 10 11 100

As you can see, sort() passes the two elements being compared in the special, predefined variables $a and $b.

cmp and friends

As discussed earlier, <=> returns -1, 0, or 1 if the left-hand value is respectively numerically less than, equal to, or greater than the right-hand value.

cmp returns the same values, but uses lexical (string) rather than numerical ordering.
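A sketch of cmp in a few common sort blocks; the word list is made up. Without use locale, cmp compares character codes, so uppercase letters sort before lowercase ones; swapping $a and $b reverses the order, and || chains a second key when the first compares equal:

```perl
#!/usr/bin/perl -w
# a sketch of sorting with cmp and <=>
use strict;

my @words = qw/ pear Apple fig banana /;

my @lexical = sort { $a cmp $b } @words;     # "A" (65) sorts before "b" (98)
my @by_len  = sort { length($a) <=> length($b) || $a cmp $b } @words;
my @desc    = sort { $b cmp $a } @words;     # swapped $a/$b: descending

print "lexical: @lexical\n";
print "by length: @by_len\n";
print "descending: @desc\n";
```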

grep

A closely related operator is grep, which returns only the items for which an expression is true. (sort always returns a list exactly as long as its input list; map returns whatever its block produces for each item, which is usually, but not necessarily, one value per item.)

For example:

@out = grep {$_ % 2} qw/1 2 3 4 5 6 7 8 9 10/;
print "@out\n";
# yields
1 3 5 7 9

Notice that the block must return a false value (such as 0 or the empty string) for items that should be left out.

Directory operations

chdir $DIRNAME;
# change directory to $DIRNAME
glob $PATTERN;
# return a list of filenames matching $PATTERN
# example:
@list = glob "*.pl";
print "@list \n";
# yields
Script16.pl Script18.pl Script19.pl Script20.pl Script21.pl [...]

Manipulating files and directories

unlink $FN1, $FN2, ...; # remove a hard or soft link to files
rename $FN1, $FN2;      # rename $FN1 to new name $FN2
mkdir $DN1;             # create directory with umask default permissions
rmdir $DN1;             # remove one (empty) directory per call
chmod $MODE, $FDN1, ...; # change permissions, e.g. $MODE = 0755
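These operators can be tried together in a scratch directory. This is a sketch under assumptions: File::Temp provides the directory, Cwd's getcwd() lets the script return to where it started, and the file names are hypothetical:

```perl
#!/usr/bin/perl -w
# a sketch of chdir, mkdir, glob, rename, unlink, and rmdir
use strict;
use Cwd qw(getcwd);
use File::Temp qw(tempdir);

my $start = getcwd();
my $base  = tempdir(CLEANUP => 1);          # scratch area, removed at exit
chdir $base or die "Can't chdir to $base: $!";

mkdir "work" or die "Can't mkdir work: $!";
for my $name (qw/ a.pl b.pl notes.txt /)    # hypothetical file names
{
  open(OUT, '>', "work/$name") or die "Can't create $name: $!";
  close OUT;
}

my @scripts = glob "work/*.pl";             # file names matching the pattern
rename "work/notes.txt", "work/readme.txt" or die "Can't rename: $!";
my $has_readme = -e "work/readme.txt" ? 1 : 0;

my $removed = unlink glob "work/*";         # unlink returns the number removed
rmdir "work" or die "Can't rmdir work: $!"; # succeeds only on an empty directory
my $gone = -d "work" ? 0 : 1;

chdir $start or die "Can't chdir back: $!";
print "scripts: @scripts; removed $removed files\n";
```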

Traversing a directory with opendir and readdir

You can pull in the contents of a directory with opendir and readdir:

opendir(DH, "/tmp") or die "Can't open /tmp: $!";
@filenames = readdir(DH);
closedir(DH);
print "@filenames\n";
# yields
.s.PGSQL.5432.lock .. mapping-root ssh-WCWcZf4199 xses-langley.joHONt
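readdir returns every entry, including . and .. and any dot files, so it is often combined with grep. A minimal sketch over a scratch directory with hypothetical file names:

```perl
#!/usr/bin/perl -w
# a sketch of readdir filtered with grep
use strict;
use File::Temp qw(tempdir);

my $dir = tempdir(CLEANUP => 1);
for my $name (qw/ one.txt two.txt .hidden /)    # hypothetical entries
{
  open(OUT, '>', "$dir/$name") or die "Can't create $name: $!";
  close OUT;
}

opendir(DH, $dir) or die "Can't open $dir: $!";
my @all = readdir(DH);                  # includes "." and ".."
closedir(DH);

# grep drops every name starting with a dot; sort gives a stable order.
my @visible = sort { $a cmp $b } grep { !/^\./ } @all;
print scalar(@all), " entries, visible: @visible\n";
```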

Calling other processes

  • In Perl, you have four convenient ways to call (sub)processes: the backtick operator, the system() function, fork()/exec(), and open().
  • The backtick operator is the most convenient way to capture a subprocess's output. For example:
    @lines = `head -10 /etc/hosts`;
    print @lines;

  • You can capture output in much the same way with open, but open also lets you conveniently send input to a subprocess.
  • exec() replaces the current process with another executable; usually you first call fork() to create a child process, and the child calls exec().
  • The system() function is a shortcut for fork/exec that also waits for the child to finish. As with fork/exec, handling the child's input and output is not particularly convenient.
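The open, fork/exec, and system approaches can be compared in one sketch. To avoid depending on other commands, it uses $^X (the path of the running perl) as the child program; the small child scripts passed with -e are hypothetical:

```perl
#!/usr/bin/perl -w
# a sketch of pipe opens, fork/exec, and system
use strict;

# open with '-|' runs the child and reads its standard output.
open(FROM, '-|', $^X, '-e', 'print "line 1\nline 2\n"')
  or die "Can't start child: $!";
my @lines = <FROM>;
close FROM;

# open with '|-' runs the child and writes to its standard input;
# this child exits with the number of lines it read.
open(TO, '|-', $^X, '-e', 'my @in = <STDIN>; exit scalar @in')
  or die "Can't start child: $!";
print TO "a\nb\nc\n";
close TO;                              # waits for the child; status goes to $?
my $child_count = $? >> 8;

# fork/exec: the child replaces itself with a new program; the parent waits.
my $pid = fork();
die "Can't fork: $!" unless defined $pid;
if ($pid == 0)
{
  exec($^X, '-e', 'exit 5') or die "Can't exec: $!";
}
waitpid($pid, 0);
my $forked_exit = $? >> 8;

# system() does fork, exec, and wait in one call and returns the wait status.
my $status = system($^X, '-e', 'exit 7');
my $system_exit = $status >> 8;

print "read ", scalar(@lines), " lines; child counted $child_count;",
      " fork/exec exit $forked_exit; system exit $system_exit\n";
```

Note that close on a piped filehandle returns false when the child exits with a non-zero status, which is why the sketch reads $? instead of testing close.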