Accessing array elements

  • Accessing array elements in Perl is syntactically similar to C.
  • Perhaps somewhat counterintuitively, you use $x[<num>] to specify a scalar element of an array named @x.
  • The index <num> is evaluated as a numeric expression.
  • By default, the first index in an array is 0.

Examples of array access

$x[0] = 1;         # assign numeric constant
$x[1] = "string";  # assign string constant
print $m[$y];      # access via variable
$x[$c] = $b[$d];   # copy elements
$x[$i] = $b[$i];   # same index on both sides
$x[$i+$j] = 0;     # expressions are okay also
$x[$i]++;          # increment element
$y = $x[$i++];     # increment index (not the element);
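Here is a minimal runnable sketch of the increment cases above. The starting values of @x and $i are hypothetical and chosen only for illustration:

```perl
use strict;
use warnings;

# Hypothetical starting values, chosen only for illustration.
my @x = (10, 20, 30);
my $i = 0;

$x[$i]++;               # increments the element: $x[0] becomes 11
my $y = $x[$i++];       # reads $x[0] (11), then increments the index to 1
$x[$i + 1] = 0;         # expression index: sets $x[2]
print "y=$y i=$i x=@x\n";   # prints: y=11 i=1 x=11 20 0
```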

Assign list literals

You can assign a list literal to an array or to a list of scalars:

($x, $y, $z) = (1, 2, 3);    # $x = 1, $y = 2, $z = 3
($m, $n) = ($n, $m);         # this swap actually works
@nums = (1..10);             # $nums[0]=1, $nums[1] = 2, ...
($x,$y,$z) = (1,2);          # $x=1, $y=2, $z is undef
@t = ();                     # @t is defined with no elements
($x[1],$x[0])=($x[0],$x[1]);   # swap works!
@kudomono=('apple','orange');  # list with 2 elements
@kudomono=qw/ apple orange /;  # same list with 2 elements
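The list assignments above can be checked in a small strict-mode script. This is a sketch with illustrative variable names, not part of the original examples:

```perl
use strict;
use warnings;

my ($m, $n) = (1, 2);
($m, $n) = ($n, $m);        # swap: the right-hand list is built before assigning
my ($p, $q, $r) = (1, 2);   # more variables than values: $r is undef
my @t = ();                 # empty array
print "m=$m n=$n r=", (defined $r ? $r : 'undef'), " size=", scalar(@t), "\n";
# prints: m=2 n=1 r=undef size=0
```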

Array-wide access

You can also operate on an entire array at once by using the @array name:

@x = @y;          # copy y to x
@y = 1..1000;     # parentheses are not required
@lines = <STDIN>; # very useful! (reads every remaining line)
print @lines;     # works in Perl 5

Print entire arrays

  • If an array is printed directly, its elements are output back to back with no separator:
    @a = ('a','b','c','d');
    print @a;
    abcd
  • If an array is interpolated in a double-quoted string, its elements are separated by spaces:
    @a = ('a','b','c','d');
    print "@a";
    a b c d
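The space used during interpolation comes from Perl's list separator variable $", which defaults to a single space; join lets you choose a separator explicitly. A short sketch:

```perl
use strict;
use warnings;

my @a = ('a', 'b', 'c', 'd');

print @a, "\n";           # abcd     (elements printed back to back)
print "@a\n";             # a b c d  (interpolation joins with $", a space by default)
{
    local $" = ', ';      # change the interpolation separator inside this block only
    print "@a\n";         # a, b, c, d
}
my $joined = join('-', @a);   # explicit separator
print "$joined\n";        # a-b-c-d
```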

Arrays in a scalar context

Generally, if you specify an array in a scalar context, the value returned is the number of elements in the array.

@array1 = ('a',3,'b',4,'c',5);  # assign array1 the values of list
@array2 = @array1;              # assign array2 the values in array1
$m = @array2;                   # $m now has value 6
$n = $m + @array1;              # $n now has the value 12
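Whether you get the count or the elements depends on context. This sketch contrasts scalar assignment, list assignment, and the explicit scalar() function:

```perl
use strict;
use warnings;

my @array1 = ('a', 3, 'b', 4, 'c', 5);
my $m = @array1;             # scalar assignment: element count
my ($first) = @array1;       # list assignment: first element only
my $n = scalar(@array1);     # scalar() forces scalar context explicitly
print "m=$m first=$first n=$n\n";     # prints: m=6 first=a n=6
print "count: " . @array1 . "\n";     # concatenation is scalar context: count: 6
```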

Size of arrays

Perl arrays can be of any size, and the number of elements can vary during execution.

my @fruit;             # has no elements initially
$fruit[0] = "apple";   # now has one element
$fruit[1] = "orange";  # now has two elements
$fruit[99] = "plum";   # now has 100 elements, most of which are undef

Last element index

Perl has a special scalar form $#arrayname that returns a scalar value that is equal to the index of the last element in the array.

for($i = 0; $i <= $#arr1; $i++)
{
  print "$arr1[$i]\n";
}

Uses of last element index

You can also use this special scalar form to truncate an array:

@arr = (0..99);    # arr has 100 elements
$#arr = 9;         # now it has 10;
print "@arr";
0 1 2 3 4 5 6 7 8 9
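For a non-empty array, $#arr is always one less than the element count, and setting it to -1 empties the array. A brief sketch:

```perl
use strict;
use warnings;

my @arr = (0 .. 99);
print scalar(@arr), " ", $#arr, "\n";   # prints: 100 99
$#arr = 9;                               # truncate to 10 elements
print "@arr\n";                          # prints: 0 1 2 3 4 5 6 7 8 9
$#arr = -1;                              # empty the array
print scalar(@arr), "\n";                # prints: 0
```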

Using negative array indices

A negative array index is treated as being relative to the end of the array:

@arr = 0..99;
print $arr[-1];    # similar to using $arr[$#arr]
99
print $arr[-2];
98

Arrays as stacks

  • Arrays can be used as stacks, and Perl has built-ins that are useful for manipulating arrays as stacks: push, pop, shift, and unshift.
  • push takes two arguments: the array to push onto, and the value(s) to push. If the new element is itself an array (or list), its elements are appended to the original array as individual scalars.
  • A push puts the new element(s) at the end of the original array.
  • A pop removes the last element from the array specified.

Examples of push and pop

push @nums, $i;
push @answers, "yes";
push @a, 1..5;
push @a, @answers;    # appends the elements of @answers to @a
pop @a;
push(@a,pop(@b));     # moves the last element of @b to end of @a
@a = (); @b = (); push(@b,pop(@a)); # @b now has one undef value
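As a slightly larger illustration of push and pop used as a stack, here is a sketch of a bracket-balance checker. The function name balanced and its behavior are hypothetical, written only for this example:

```perl
use strict;
use warnings;

# Hypothetical helper: returns 1 if (), [] and {} are balanced in $text, else 0.
sub balanced {
    my ($text) = @_;
    my %opener_for = (')' => '(', ']' => '[', '}' => '{');
    my %is_opener  = ('(' => 1, '[' => 1, '{' => 1);
    my @stack;
    foreach my $ch (split //, $text) {
        if ($is_opener{$ch}) {
            push @stack, $ch;                  # remember the open bracket
        }
        elsif (exists $opener_for{$ch}) {
            return 0 unless @stack;            # closer with nothing open
            return 0 if pop(@stack) ne $opener_for{$ch};   # mismatched pair
        }
    }
    return @stack == 0 ? 1 : 0;                # anything left open is an error
}

print balanced("(a[b]{c})") ? "ok\n" : "bad\n";   # ok
print balanced("(a[b)]")    ? "ok\n" : "bad\n";   # bad
```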

shift and unshift

  • shift removes the first element from an array
  • unshift inserts one or more elements at the beginning of an array

Examples of shift and unshift

@a = 0..9;
unshift @a, 99;        # now @a = (99,0,1,2,3,4,5,6,7,8,9)
unshift @a, ('a','b'); # now @a = ('a','b',99,0,1,2,3,4,5,6,7,8,9)
$x = shift @a;         # now $x = 'a' and @a = ('b',99,0,1,2,3,4,5,6,7,8,9)
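Combining push (add at the end) with shift (remove from the front) gives a first-in, first-out queue. A sketch with illustrative values:

```perl
use strict;
use warnings;

my @queue;
push @queue, 'first', 'second';   # enqueue at the end
push @queue, 'third';
my $next = shift @queue;          # dequeue from the front: 'first'
unshift @queue, 'urgent';         # jump to the front of the queue
print "$next | @queue\n";         # prints: first | urgent second third
```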

foreach control structure

You can use foreach to process each element of an array or list. It follows the form:

foreach $SCALAR (@ARRAY or LIST)
{
  <statement list>
}

(You can also use map for similar purposes.)

foreach examples

foreach $x (@x)
{
  print "$x\n";
}
map {print "$_\n";} @x;

foreach $item(qw/ apple pear lemon /)
{
  push @fruits, $item;
}
map {push @fruits, $_} qw/ apple pear lemon /;
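Two details are worth seeing in running code. First, the foreach loop variable is an alias for each element, so assigning to it changes the array. Second, map returns a new list, which is its more common use than side effects:

```perl
use strict;
use warnings;

my @nums = (1, 2, 3);
foreach my $n (@nums) {
    $n *= 10;            # $n aliases each element, so @nums itself changes
}
my @squares = map { $_ * $_ } @nums;   # map in list context builds a new list
print "@nums | @squares\n";            # prints: 10 20 30 | 100 400 900
```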

The default variable $_

$_ is the default variable (and is used in the previous map() examples.) It is used as a default at various times, such as when reading input, writing output, and in the foreach and map constructions.

while(<STDIN>)
{
  print;
}

$sum = 0;
foreach(@arr)
{
  $sum += $_;
}

map { $sum += $_ } @arr;

Input from the "diamond" operator

Reading from <> causes a program to read from the files named on the command line, or from stdin if no files are specified.

Example with the diamond operator

#!/usr/bin/perl -w
use strict;
while(<>)
{
  print;
}

You can use this either with stdin or by naming files as arguments.

The @ARGV array

There is a built-in array called @ARGV which contains the command-line arguments passed to the program.

Note that unlike C, $ARGV[0] is the first argument, not the name of the Perl program being invoked.

A simple "echo" program

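#!/usr/bin/perl -w
# do the equivalent of a shell's echo:
use strict;
my $arg;   # avoid naming this $a: sort uses $a and $b
# test with defined() so that an argument such as "0" does not end the loop
while(defined($arg = shift @ARGV))
{
  print "$arg ";
}
print "\n";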

Counting arguments

#!/usr/bin/perl -w
# count the number of arguments
use strict;
my $count = 0;
map { $count++ } @ARGV;
print "$count\n";
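Because an array in scalar context yields its size, both programs above can also be written without a loop. In this sketch @ARGV is filled with hypothetical values only when no real arguments are given, so that it runs stand-alone:

```perl
use strict;
use warnings;

# For illustration only: pretend these were passed on the command line.
@ARGV = qw(alpha beta gamma) unless @ARGV;

my $count = @ARGV;       # scalar context gives the number of arguments
print "$count\n";
print "@ARGV\n";         # an echo in one line: interpolation joins with spaces
```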

Looping modifiers

Perl has three interesting operators to affect looping: next, last, and redo.

  • next → start the next iteration of a loop immediately
  • last → terminate the loop immediately
  • redo → restart this iteration (very rare in practice)

The next operator

The next operator starts the next iteration of a loop immediately, much as continue does in C.

#!/usr/bin/perl -w
# sum the positive elements of an array to demonstrate next
use strict;
my $sum = 0;
my @arr1 = -10..10;
foreach(@arr1)
{
  if($_ < 0)
  {
    next;
  }
  $sum += $_;
}
print $sum;

The last operator

#!/usr/bin/perl -w
# read up to 100 items, print their sum
use strict;
my $sum = 0;
my $count = 0;
while(<STDIN>)
{
  $sum += $_;
  $count++;
  if($count == 100)
  {
    last;
  }
}
print "\$count == $count, \$sum == $sum \n";
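In nested loops, next and last apply to the innermost loop by default. Perl also lets you label a loop and name it, for example next LINE. A sketch with hypothetical input lines:

```perl
use strict;
use warnings;

# Hypothetical input: keep only lines that contain no negative numbers.
my @lines = ("1 2 3", "4 -5 6", "7 8 9");
my @clean;
LINE: foreach my $line (@lines) {
    foreach my $value (split ' ', $line) {
        next LINE if $value < 0;   # skip the rest of this line, not just this value
    }
    push @clean, $line;
}
print join(" | ", @clean), "\n";   # prints: 1 2 3 | 7 8 9
```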

The redo operator

The rarely used redo operator goes back to the beginning of a loop block, but it does not retest any boolean condition, does not execute any increment-type code, and does not advance to the next element of an array or list.

#!/usr/bin/perl -w
# demonstrate the redo operator
use strict;
my @strings = qw/ apple plum pear peach strawberry /;
my $answer;
foreach(@strings)
{
  print "Do you wish to print '$_'? ";
  chomp($answer = uc(<>));
  if($answer eq "YES")
  {
    print "PRINTING $_ ...\n";
    next;
  }
  if($answer ne "NO")
  {
    print "I don't understand your answer '$answer'!
Please use either YES or NO!\n";
    redo;
  }
}

The reverse function

In list context, reverse returns the input list in reverse order.

In scalar context, it first concatenates the elements of the input list and then reverses all of the characters in that string. You can also reverse a hash, which returns a hash with the keys and values of the original swapped. (If the original hash has duplicate values, only one of the corresponding keys survives as the new value, and which one survives is unpredictable.)

Examples of reverse

#!/usr/bin/perl -w
# demonstrate the reverse function
use strict;
my @strings = qw/ apple plum pear peach strawberry /;
print "\@strings = @strings\n";
my @reverse_list = reverse(@strings);
my $reverse_string = reverse(@strings);
print "\@reverse_list = @reverse_list\n";
print "\$reverse_string = $reverse_string\n";
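The context rule is easy to trip over, so here is a compact sketch. Note the pitfall in the last comment: print supplies list context, so reversing a single string there has no visible effect unless you force scalar context:

```perl
use strict;
use warnings;

my @strings   = qw/ apple plum pear /;
my @backwards = reverse @strings;         # list context: reversed list
my $joined    = reverse @strings;         # scalar context: join, then reverse characters
my $word      = scalar reverse 'hello';   # scalar() forces scalar context
print "@backwards | $joined | $word\n";   # prints: pear plum apple | raepmulpelppa | olleh
# Pitfall: print supplies list context, so "print reverse 'hello'" prints "hello".
```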

Example of reversing a hash

#!/usr/bin/perl -w
# demonstrate the reverse operator
use strict;
my %strings = ( 'a-key' , 'a-value', 
                'b-key', 'b-value', 
                'c-key', 'c-value' );
print "\%strings = ";
map {print " ( \$key = $_ , \$value = $strings{$_} ) "} 
    (sort keys %strings);
print " \n";
my %reverse_hash = reverse(%strings);
print "\%reverse_hash = ";
map {print " ( \$key = $_ , \$value = $reverse_hash{$_} ) "} 
    (sort keys %reverse_hash);
print " \n ";

Example of reverse on hash with duplicate values

#!/usr/bin/perl -w
# demonstrate the reverse operator for hash with duplicate values
use strict;
my %strings = ( 'a-key' , 'x-value',
                'b-key',  'x-value', 
                'c-key',  'x-value' );
print "\%strings = ";
map {print " ( \$key = $_ , \$value = $strings{$_} ) "} 
    (sort keys %strings);
print " \n";
my %reverse_hash = reverse(%strings);
print "\%reverse_hash = ";
map {print " ( \$key = $_ , \$value = $reverse_hash{$_} ) "} 
    (sort keys %reverse_hash);
print " \n ";

The reverse function on a list and returning a scalar

#!/usr/bin/perl -w
# demonstrate the reverse operator
use strict;
my $test = reverse(qw/ 10 11 12 /);   # scalar context: "101112" becomes "211101"
print "\$test = $test\n";

The sort function

The sort function is only defined to work on lists, and returns a sensible result only in list context. By default, sort compares elements as strings (lexically), not as numbers.

Lexical sort example

# Example of lexical sorting
@list = 1..100;
@list = sort @list;
print "@list ";
1 10 100 11 12 13 14 15 16 17 18 19 2 20 21 22
23 24 25 26 27 28 29 3 30 31 32 33 34 35 36 37
38 39 4 40 41 42 43 44 45 46 47 48 49 5 50 51
52 53 54 55 56 57 58 59 6 60 61 62 63 64 65 66
67 68 69 7 70 71 72 73 74 75 76 77 78 79 8 80
81 82 83 84 85 86 87 88 89 9 90 91 92 93 94 95
96 97 98 99

More on sort

You can supply your own comparison block. Our earlier mention of the <=> operator comes in handy now:

# Example of numerical sorting
@list = 1..100;
@list = sort { $a <=> $b } @list;
print "@list";
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18
19 20 21 22 23 24 25 26 27 28 29 30 31 32 33
34 35 36 37 38 39 40 41 42 43 44 45 46 47 48
49 50 51 52 53 54 55 56 57 58 59 60 61 62 63
64 65 66 67 68 69 70 71 72 73 74 75 76 77 78
79 80 81 82 83 84 85 86 87 88 89 90 91 92 93
94 95 96 97 98 99 100

Two very special variables

The $a and $b in the sort block are package global variables that sort sets to the two elements being compared; do not declare them yourself as my variables.
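Swapping $a and $b reverses the order, and the "or" operator chains a tie-breaker when the first comparison returns 0. A sketch with hypothetical scores:

```perl
use strict;
use warnings;

my @nums = (5, 42, 7, 100);
my @down = sort { $b <=> $a } @nums;          # descending numeric order

# Hypothetical data: sort names by score (highest first), ties broken alphabetically.
my %score = (alice => 90, bob => 75, carol => 90);
my @ranked = sort { $score{$b} <=> $score{$a} or $a cmp $b } keys %score;
print "@down | @ranked\n";   # prints: 100 42 7 5 | alice carol bob
```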

More on sorting

You can also use the cmp operator quite effectively in these kinds of sort blocks:
@words = qw/ apples Pears bananas Strawberries cantaloupe grapes Blueberries /;
@words_alpha = sort @words;
@words_noncase = sort { uc($a) cmp uc($b) } @words;
print "\@words_alpha = @words_alpha\n";
print "\@words_noncase = @words_noncase\n";
# yields:
@words_alpha = Blueberries Pears Strawberries apples bananas cantaloupe grapes
@words_noncase = apples bananas Blueberries cantaloupe grapes Pears Strawberries

Hashes (aka "associative arrays")

We have already used a few examples of hashes. Let's go over exactly what is happening with them:

  • A hash is similar to an array in that it has an index and in that it may take an arbitrary number of elements.
  • The index (key) for a hash is a string, not a number as in an array.
  • Hashes are also known as "associative arrays."

  • The elements of a hash have no particular order.
  • A hash contains key-value pairs; the keys will be unique, and the values are not necessarily so.
  • Hashes are identified by the % character.
  • The name space for hashes is separate from that of scalar variables and arrays.
  • One uses the syntax $hash{$key} to access the value associated with key $key in hash %hash.
  • Perl expects to see a string as the key and will silently convert other scalars to strings; an array used as a key is evaluated in scalar context, so it becomes its element count.

Examples

$names{12101} = 'James';
$names{12101} = 'Bob';    # overwrites value 'James'
$name = $names{12101};    # retrieve value 'Bob'
$name = $names{11111};    # a missing key returns undef
%hash = ('1', '1-value', 'a', 'a-value', 'b', 'b-value');
@array = ('a');
print $hash{@array};
# yields
1-value

Examples

%names = (1, 'Bob', 2, 'James');
foreach(sort(keys(%names)))
{
  print "$_ --> $names{$_}\n";
}
# yields
1 --> Bob
2 --> James
map { print "$_ --> $names{$_}\n"; } sort(keys(%names));
# yields
1 --> Bob
2 --> James
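A classic use of a hash is counting occurrences: a missing key starts out undef, and ++ treats undef as 0. A sketch over a made-up sentence:

```perl
use strict;
use warnings;

my $text = "the cat and the hat and the bat";
my %count;
foreach my $word (split ' ', $text) {
    $count{$word}++;                  # first sighting creates the key with value 1
}
foreach my $word (sort keys %count) {
    printf "%-4s %d\n", $word, $count{$word};   # and 2, bat 1, cat 1, hat 1, the 3
}
```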

Referring to hashes as a whole

As might have been gleaned from before, you can use the % character to refer to a hash as a whole:

%new_hash = %old_hash;
%fruit_colors = ( 'apple' , 'red' , 'banana' , 'yellow' );
%fruit_colors = ( 'apple' => 'red' , 'banana' => 'yellow' );
print "%fruit_colors\n";
# only prints '%fruit_colors', not keys
@fruit_colors = %fruit_colors;
print "@fruit_colors\n";
# now you get output...
# yields (pair order may vary)
banana yellow apple red

The keys and values functions

You can extract just the hash keys into an array with the keys function. You can extract just the hash values into an array with the values function. For an unmodified hash, both return their elements in the same (unspecified) order.

%fruit_colors = ( 'apple' => 'red' , 'banana' => 'yellow' );
@keys = keys(%fruit_colors);
@values = values(%fruit_colors);
print "\@keys = '@keys' , \@values = '@values'\n";
# yields (order may vary)
@keys = 'banana apple' , @values = 'yellow red'

The each function

Perl has a "stateful" function each that allows you to iterate through the keys or the key-value pairs of a hash.

%fruit_colors = ( 'apple' => 'red' , 'banana' => 'yellow' );
while( ($key, $value) = each(%fruit_colors) )
{
  print "$key --> $value\n";
}

Resetting the each iterator

Note: calling keys(%fruit_colors) or values(%fruit_colors) resets the iterator used by each. That is handy when you want to reset it on purpose, so don't do it accidentally inside an each loop!

%fruit_colors = ( 'apple' => 'red' , 'banana' => 'yellow' );
while( ($key, $value) = each(%fruit_colors) )
{
  print "$key --> $value\n";
  # ...
  @k = keys(%fruit_colors);
  # resets iterator!!!
}
# yields an infinite loop (which pair repeats depends on hash order):
banana --> yellow
banana --> yellow
banana --> yellow
banana --> yellow
banana --> yellow
...

The exists function

You can check if a key exists in hash with the exists function:

if(exists($hash{'SOMEVALUE'}))
{
  # the key is present (its value may still be undef)
}

The delete function

You can remove a key-value pair from a hash with delete:

delete($hash{'SOMEVALUE'});
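exists and defined answer different questions: a key can exist while holding undef. delete also returns the value it removed. A sketch with illustrative keys:

```perl
use strict;
use warnings;

my %hash = (present => 1, empty => undef);
foreach my $key (qw/ present empty missing /) {
    printf "%-8s exists=%d defined=%d\n",
        $key, (exists $hash{$key} ? 1 : 0), (defined $hash{$key} ? 1 : 0);
}
# present exists=1 defined=1; empty exists=1 defined=0; missing exists=0 defined=0
my $removed = delete $hash{present};   # delete returns the removed value
print "removed=$removed, left: ", join(',', sort keys %hash), "\n";   # removed=1, left: empty
```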

printf

printf in Perl is very similar to that of C.

printf is most useful when printing scalars. Its first (non-filehandle) argument is the format string, and any remaining arguments are flattened into a single list of scalars:

printf "%s %s %s %s", ("abc", "def") , ("ghi", "jkl");
# yields
abc def ghi jkl

Some of the common format attributes are

  • %[-][N]s → format a string scalar; N gives the minimum field width (shorter strings are padded with spaces), and "-" left-justifies instead of the default right-justification.
  • %[-|0][N]d → format a numerical scalar as an integer; N gives the minimum field width, "-" left-justifies, and "0" zero-fills (using both "-" and "0" results in left-justify, no zero-fill.)
  • %[-|0]N.Mf → format a numerical scalar as floating point. "N" gives the minimum total width of the output (including the decimal point), and "M" gives the number of places after the decimal, which are zero-filled as needed. "0" before N zero-fills the left-hand side; "-" left-justifies the output.

Examples of printf

printf "%7d\n", 123;
# yields
    123
printf "%10s %-10s\n","abc","def";
# yields
       abc def

Examples of printf

printf "%10.5f %010.5f %-10.5f\n",12.1,12.1,12.1;
# yields
  12.10000 0012.10000 12.10000
$a = 10;
printf "%0${a}d\n", $a;
# yields
0000000010
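sprintf accepts the same format strings as printf but returns the formatted string instead of printing it, which makes fixed-width columns easy to build and check. A sketch with a hypothetical price list:

```perl
use strict;
use warnings;

# Hypothetical price list, formatted into fixed-width lines with sprintf.
my %price = (apple => 0.5, watermelon => 4.25);
my @lines;
foreach my $item (sort keys %price) {
    # %-12s: name left-justified in 12 columns; %8.2f: right-justified, 2 decimals
    push @lines, sprintf("%-12s|%8.2f", $item, $price{$item});
}
print "$_\n" foreach @lines;
```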

Perl regular expressions

Much more information can be found in man perlre.

Perl builds support for regular expressions into the language itself, as awk does, but to a greater degree. Many other languages instead provide regular expressions through a library (C, PHP, and C++, for instance, all go this route.)

Perl regular expressions can be used in conditionals, where if you find a match then it evaluates to true, and if no match, false.

$_ = "howdy and hello are common";
if(/hello/)
{
  print "Hello was found!\n";
}
else
{
  print "Hello was NOT found\n";
}
# yields
Hello was found!

What do Perl regexes consist of?

  • Literal characters are matched directly
  • "." (period, full stop) matches any one character except newline (unless the /s modifier is used)
  • "*" (asterisk) matches the preceding item zero or more times
  • "+" (plus) matches the preceding item one or more times
  • "?" (question mark) matches the preceding item zero or one time
  • "(" and ")" (parentheses) are used for grouping
  • "|" (pipe) expresses alternation
  • "[" and "]" (square brackets) define a character class, which matches any one character from the listed characters or ranges

Examples of regexes

/abc/           # Matches "abc"
/a.c/           # Matches "a" followed by any character (except newline) and then a "c"
/ab?c/          # Matches "ac" or "abc"
/ab*c/          # Matches "a" followed by zero or more "b" and then a "c"
/ab|cd/         # Matches "ab" or "cd"
/a(b|c)+d/      # Matches "a" followed by one or more "b" or "c", and then a "d"
/a[bcd]e/       # Matches "abe", "ace", or "ade"
/a[a-zA-Z0-9]c/ # Matches "a" followed by one alphanumeric character, followed by "c"
/a[^a-zA-Z]/    # Matches "a" followed by anything other than alphabetic character
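A quick way to try patterns like these is grep, which keeps the list elements for which its block is true (inside the block, $_ is each element). The sketch adds the ^ and $ anchors, covered further on, so that each whole word must match:

```perl
use strict;
use warnings;

my @hits = grep { /^ab?c$/ } qw/ ac abc abbc adc /;
my @alt  = grep { /^a(b|c)+d$/ } qw/ abd acd abcbd ad /;
print "@hits | @alt\n";   # prints: ac abc | abd acd abcbd
```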

Character classes

You can use the following as shortcuts to represent character classes:

  • \d A digit (i.e., 0-9)
  • \w A word character (i.e., [0-9a-zA-Z_])
  • \s A whitespace character (e.g., [ \t\n\r\f])
  • \D Not a digit (i.e., [^0-9])
  • \W Not a word character (i.e., [^0-9a-zA-Z_])
  • \S Not whitespace

Exact repetitions

You can specify numbers of repetitions using a curly bracket syntax:

a{1,3}   # "a", "aa", or "aaa"
a{2}     # "aa"
a{2,}    # two or more "a"
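Exact repetition counts are handy for validating fixed formats. This sketch assumes a hypothetical format of three digits, a dash, and four digits:

```perl
use strict;
use warnings;

# Hypothetical format: three digits, a dash, then four digits.
my @candidates = ('555-1234', '55-1234', '555-12345', 'abc-defg');
my @valid = grep { /^\d{3}-\d{4}$/ } @candidates;   # ^ and $ require a whole-string match
print "@valid\n";                                    # prints: 555-1234
```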

Anchors

Perl regular expression syntax lets you match positions rather than characters by defining a number of "anchors": \A, ^, \Z, $, \b.

/\ba/    # Matches if "a" appears at the beginning of a word
/a$/     # Matches if "a" appears at the end of a line
/\Aa\Z/  # Matches if the whole string is exactly "a", optionally followed by a final newline (uncommon)
/^a$/    # Matches if a line is exactly "a" (much more common)

\b refers to a word boundary.

Substring matching

Parentheses are also used to remember substring matches.

Backreferences can be used within the pattern to refer to already matched bits.

Memory variables can be used after the pattern has been matched against.

A backreference looks like \1, \2, etc.

It refers to the text already matched by the corresponding group (memory).

Count the left parentheses to determine the back reference number.

Backreference examples

/(a|b)\1/          # match "aa" or "bb"
/((a|b)c)\1/       # match "acac" or "bcbc"
/((a|b)c)\2/       # match "aca" or "bcb"
/(.)\1/            # match any doubled characters except newline
/\b(\w+)\s+\1\b/   # match any doubled words
/(['"])(.*)\1/     # match strings enclosed by single or double quotes
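Backreferences can be exercised directly. This sketch uses grep on single words, and the =~ operator with /g (both described later in this section) to pull every doubled word out of a sentence:

```perl
use strict;
use warnings;

my @pairs = grep { /^(a|b)\1$/ } qw/ aa ab bb ba /;    # the whole word is a doubled a or b
my $text = "this is is a test test";
my @doubles = ($text =~ /\b(\w+)\s+\1\b/g);           # /g in list context returns each $1
print "@pairs | @doubles\n";                         # prints: aa bb | is test
```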

Perl matching is greedy

For example, consider the last backreference example:

$_ = "asfasdf 'asdlfkjasdf ' werklwerj'";
if(/(['"])(.*)\1/)
{
  print "matches $2\n";
}
# yields
matches asdlfkjasdf ' werklwerj
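Adding ? after a quantifier (.*? instead of .*) makes it non-greedy, so it matches as little as possible. A sketch contrasting the two on the same string:

```perl
use strict;
use warnings;

$_ = "asfasdf 'asdlfkjasdf ' werklwerj'";
my ($greedy, $lazy) = ('', '');
if (/'(.*)'/)  { $greedy = $1; }   # .*  takes as much as possible: up to the LAST quote
if (/'(.*?)'/) { $lazy   = $1; }   # .*? takes as little as possible: up to the NEXT quote
print "greedy=[$greedy]\n";        # prints: greedy=[asdlfkjasdf ' werklwerj]
print "lazy=[$lazy]\n";            # prints: lazy=[asdlfkjasdf ]
```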

Memory variables

A memory variable has the form $1, $2, etc.

It holds the text matched by a grouping operator, just as a backreference does, but it is used after the regular expression has been executed.

$_ = " the larder ";
if(/\s+(\w+)\s+/)
{
  print "match = '$1'\n";
}
# yields
match = 'the'

Regex binding operators

Up to this point, we have considered only matching against $_. Any scalar can be matched against a pattern with the =~ and !~ operators (!~ is true when the pattern does not match).

"STRING" =~ /PATTERN/;
"STRING" !~ /PATTERN/;

Examples

$line = "not an exit line";
if($line !~ /^exit$/)
{
  print "$line\n";
}
# yields
not an exit line
# skip over blank lines...
if($line =~ /^$/)
{
  next;
}
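Binding a match with =~ in list context returns the captured groups directly, which avoids reading $1 and $2 separately. Check that the match succeeded before trusting the memory variables, because a failed match does not reset them. A sketch with a hypothetical "key = value" line:

```perl
use strict;
use warnings;

my $line = "timeout = 30";
# In list context a successful match returns the captured groups ($1, $2, ...).
my ($key, $value) = $line =~ /^(\w+)\s*=\s*(\w+)$/;
if (defined $key) {
    print "key=$key value=$value\n";   # prints: key=timeout value=30
}
# Test the match itself before using $1: a failed match leaves the old values in place.
if ("no equals sign here" =~ /(\w+)=/) {
    print "unexpected: $1\n";
}
```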

Automatic match variables

You don't necessarily have to use explicit backreferences and memory variables. Perl also gives you three special variables that can be used after any successful match; they refer to the parts of the string before, at, and after the portion matched by the whole regular expression.

$`   # refers to the portion of the string before the match
$&   # refers to the match itself
$'   # refers to the portion of the string after the match

Example of automatic match variables

$_ = "this is a test";
/is/;
print "before:  $`  \n";
print "after:   $'  \n";
print "match:   $&  \n";
# yields
before:  th
after:    is a test
match:   is

Example of automatic match variables

#!/usr/bin/perl -w
use strict;
while(<>)
{
  chomp;
  # $` and $' are only meaningful after a successful match
  if(/=/)
  {
    print "$` =: $'\n";
  }
}

Alternative delimiters

You can use other delimiters (including paired ones such as {} or ()) instead of the slash, but then you must write an "m" in front of the pattern. (See man perlop for a good discussion.)

# not so readable way to look for a URL reference
if ($s =~ /http:\/\//)
# better: with paired delimiters the slashes need no escaping
if ($s =~ m{http://})

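Pattern modifiers

There are a number of modifiers that you can apply to your regular expression pattern by writing them after the closing delimiter (for example, /cat/i):

  • i → case insensitive
  • s → treat the string as a single line, so "." also matches newline
  • g → find all occurrences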
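The three modifiers can be seen together in one short sketch; the sample strings are made up for illustration:

```perl
use strict;
use warnings;

my $text  = "Cat cat CAT dog";
my @all   = ($text =~ /cat/gi);         # /g in list context: every match; /i: ignore case
my $n     = @all;                       # number of matches
my @words = ($text =~ /(\w+)/g);        # every captured word
my $multi = "line1\nline2";
my $dot_s = ($multi =~ /1.l/s) ? 1 : 0; # with /s, "." also matches "\n"
my $dot   = ($multi =~ /1.l/)  ? 1 : 0; # without /s it does not
print "$n | @words | $dot_s $dot\n";    # prints: 3 | Cat cat CAT dog | 1 0
```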