Arrays
Arrays are ordered collections of things. Order is important, because later we'll talk about hashes and bags that are unordered. Arrays have lots of handy functions. Let's explore.
Length, Order
How long is that array? Here's a list of the main dogs in my life from when I was a kid to now:
> my @dogs = <Chaps Patton Bowzer Logan Lulu Patch>
[Chaps Patton Bowzer Logan Lulu Patch]
> @dogs.elems
6
> +@dogs
6
> @dogs.Int
6
> @dogs.Numeric
6
The elems
methods returns the number of elements and is the most obvious way to find it. Putting a +
plus sign in front coerces the array into a numerical context which returns the length of the array as does calling the Int
and Numeric
methods.
If I want my dogs going from current to first, I can reverse
them:
> @dogs.reverse
[Patch Lulu Logan Bowzer Patton Chaps]
I can sort them either by their name:
> @dogs.sort
(Bowzer Chaps Logan Lulu Patch Patton)
Or, by passing a "unary" operator (one that takes a single argument), by the results of that operation such as the number of characters in the name (i.e., the length of the string):
> @dogs.sort(*.chars)
(Lulu Chaps Logan Patch Patton Bowzer)
> @dogs.sort(*.chars).reverse
(Bowzer Patton Patch Logan Chaps Lulu)
I can find all possible pairs of dogs:
> @dogs.combinations(2)
((Chaps Patton) (Chaps Bowzer) (Chaps Logan) (Chaps Lulu) (Chaps Patch) (Patton Bowzer) (Patton Logan) (Patton Lulu) (Patton Patch) (Bowzer Logan) (Bowzer Lulu) (Bowzer Patch) (Logan Lulu) (Logan Patch) (Lulu Patch))
I can find just the length of each dog's name by using map
to apply a the chars
function to each element:
> @dogs.map(*.chars)
(5 6 6 5 4 5)
> @dogs.map(&chars)
(5 6 6 5 4 5)
The first version is using the chars
method called on each list object while the second version is applying the chars
function to each element (https://docs.perl6.org/routine/chars). The leading ampersand &
is passing the chars
function as a reference. Note that you cannot call it like so:
> @dogs.map(chars)
===SORRY!=== Error while compiling:
Calling chars() will never work with proto signature ($)
------> @dogs.map(⏏chars)
The map
function is something I'd really like you to understand, so let's break it down a bit. I can get the same answer (sort of) with a for
loop:
> for @dogs -> $dog { say $dog.chars }
5
6
6
5
4
5
And I can capture the numbers by using do
:
> my @chars = do for @dogs -> $dog { $dog.chars }
[5 6 6 5 4 5]
Or, more briefly:
> my @chars = do for @dogs { .chars }
[5 6 6 5 4 5]
So the map
function just turns that around a bit:
> my @chars = map { .chars }, @dogs
[5 6 6 5 4 5]
And the map
method (of an Array) turns that around again to look more like the for
version:
> my @chars = @dogs.map({ .chars })
[5 6 6 5 4 5]
> my @chars = @dogs.map(*.chars)
[5 6 6 5 4 5]
> my @chars = @dogs.map: *.chars
[5 6 6 5 4 5]
You see there is more than one way to write a map. We'll break this down later.
The elements in a Array are not limited to scalars. Using map
, I create a Array of lists that each combines a dog's name with its length using the Z
"zip" operator (https://docs.perl6.org/routine/Z):
> @dogs Z @dogs.map(*.chars)
((Chaps 5) (Patton 6) (Bowzer 6) (Logan 5) (Lulu 4) (Patch 5))
Does that makes sense? Zip takes two lists and combines them element-by-element, stopping on the shorter list:
> 1..10 Z 'a'..'z'
((1 a) (2 b) (3 c) (4 d) (5 e) (6 f) (7 g) (8 h) (9 i) (10 j))
> 1..* Z 'Bowzer'.comb
((1 B) (2 o) (3 w) (4 z) (5 e) (6 r))
Lastly, I'll show you how sum
(https://docs.perl6.org/routine/sum) total number of characters in all the dog names:
> @dogs.map(*.chars).sum
31
Iterating
One of the most common array operations is to iterate over the members while keeping track of the position. Here's a script that breaks a string (here maybe some DNA) into a list using the comb
method and prints the position and the letter:
$ cat -n iterate1.pl6
1 #!/usr/bin/env perl6
2
3 sub MAIN (Str $dna) {
4 my $i = 0;
5 for $dna.comb -> $letter {
6 $i++;
7 say "$i: $letter";
8 }
9 }
$ ./iterate1.pl6 AACTAG
1: A
2: A
3: C
4: T
5: A
6: G
This is so common that Arrays have shorter ways to do this:
$ cat -n iterate2.pl6
1 #!/usr/bin/env perl6
2
3 sub MAIN (Str $dna) {
4 for $dna.comb.kv -> $k, $v {
5 say "{$k+1}: $v";
6 }
7 }
$ ./iterate2.pl6 AACTAG
1: A
2: A
3: C
4: T
5: A
6: G
Positions in Perl Arrays and Strings start at 0, so I have to add 1 to $k
. Notice that I can run code inside a string by putting {}
curly braces around it.
I don't have to give the pointy block signature -> $k, $v
bit. I can use $^k
and $^v
(or $^a
and $^b
or whatever) to refer to the first and second arguments (in sorted Unicode order) to the block (https://docs.perl6.org/language/variables#index-entry-%24%5E):
$ cat -n iterate3.pl6
1 #!/usr/bin/env perl6
2
3 sub MAIN (Str $dna) {
4 for $dna.comb.kv { say join ": ", $^k + 1, $^v }
5 }
$ ./iterate3.pl6 AACTAG
1: A
2: A
3: C
4: T
5: A
6: G
Here's a version using pairs
to get a List of Pair types (https://docs.perl6.org/type/Pair) with the index (position) as the "key" and the letter as the "value":
$ cat -n iterate4.pl6
1 #!/usr/bin/env perl6
2
3 sub MAIN (Str $dna) {
4 for $dna.comb.pairs -> $pair {
5 printf "%s: %s\n", $pair.key + 1, $pair.value;
6 }
7 }
$ ./iterate3.pl6 AACTAG
1: A
2: A
3: C
4: T
5: A
6: G
Again, I don't have to have the -> $pair
bit if I use the ^
twigil. I can just refer to the one positional argument as $^pair
and call the key
and value
methods on that:
$ cat -n iterate5.pl6
1 #!/usr/bin/env perl6
2
3 sub MAIN (Str $dna) {
4 for $dna.comb.pairs { say join ': ', $^pair.key + 1, $^pair.value }
5 }
[saguaro@~/work/metagenomics-book/perl6/lists]$ ./iterate5.pl6 AACTAG
1: A
2: A
3: C
4: T
5: A
6: G
Or I can use the :
twigil (https://docs.perl6.org/language/variables#The_:_Twigil) a way to declare named parameters:
$ cat -n iterate6.pl6
1 #!/usr/bin/env perl6
2
3 sub MAIN (Str $dna) {
4 for $dna.comb.pairs -> (:$key, :$value) {
5 say join ': ', $key + 1, $value;
6 }
7 }
$ ./iterate6.pl6 AACTAG
1: A
2: A
3: C
4: T
5: A
6: G
Filtering
Often you want to choose or remove certain members of an array. Let's find only the Gs and Cs in a string (https://en.wikipedia.org/wiki/GC-content). Note that I uppercase (uc
) the $dna
first so that I only have to check for one case of letters:
$ cat -n gc1.pl6
1 #!/usr/bin/env perl6
2
3 sub MAIN (Str $dna) {
4 my @gc;
5 for $dna.uc.comb -> $base {
6 @gc.push($base) if $base eq 'G' || $base eq 'C';
7 }
8 say "$dna has {@gc.elems}";
9 }
$ ./gc1.pl6 AACTAG
AACTAG has 2
But grep
is a much shorter way to find all the elements matching a given condition. Like map
, grep
takes a block of code that will be executed for each member of the array. Any elements for which the block evaluates to "True-ish" are allowed through. The $_
(topic, thing, "it") variable has the current element, so the code is asking "if the thing is a 'G' or if the thing is a 'C'". One can use the *
to represent "it" and eschew the curly brackets:
> grep {$_ > 5}, 1..10
(6 7 8 9 10)
> grep * > 5, 1..10
(6 7 8 9 10)
Here's the GC filter written with grep
:
$ cat -n gc2.pl6
1 #!/usr/bin/env perl6
2
3 sub MAIN (Str $dna) {
4 my @gc = $dna.uc.comb.grep({$_ eq 'G' || $_ eq 'C'});
5 say "$dna has {@gc.elems}";
6 }
Here I'll use a Junction (https://docs.perl6.org/type/Junction) to compare to "G or C" in one go:
$ cat -n gc3.pl6
1 #!/usr/bin/env perl6
2
3 sub MAIN (Str $dna) {
4 my @gc = $dna.uc.comb.grep(* eq 'G' | 'C');
5 say "$dna has {@gc.elems}";
6 }
Another way to write the |
Junction is with any
. The so
routine (https://docs.perl6.org/routine/so) collapses the various Booleans down to a single value.
> 'G' eq 'G' | 'C'
any(True, False)
> so 'G' eq 'G' | 'C'
True
> 'G' eq any(<G C>)
any(True, False)
> so 'G' eq any(<G C>)
True
It's extremely common to use regular expressions (https://docs.perl6.org/type/Regex) to filter lists. We'll cover these more later, but here I'm using a character class to represent either "G or C":
$ cat -n gc4.pl6
1 #!/usr/bin/env perl6
2
3 sub MAIN (Str $dna) {
4 my @gc = $dna.uc.comb.grep(/<[GC]>/);
5 say "$dna has {@gc.elems}";
6 }
Here's how you can find the prime numbers between 1 and 10:
> (1..10).grep(*.is-prime)
(2 3 5 7)
Classification
We can group elements based on predicates we supply. Here is how we can split up the numbers 1 through 10 based on whether they are or are not even divisible by 2:
> 2 %% 2
True
> 3 %% 2
False
> (1..10).classify(* %% 2)
{False => [1 3 5 7 9], True => [2 4 6 8 10]}
Going back to our G-C counter, we can group each base into whether it is or isn't a "G" or a "C":
$ cat -n gc5.pl6
1 #!/usr/bin/env perl6
2
3 sub MAIN (Str $dna) {
4 my %hash = $dna.uc.comb.categorize({?/<[GC]>/});
5 say "$dna has {%hash<True>.elems}";
6 }
In my opinion, it's not intuitive to use "True" or "False," so let's provide our own String value for the name of the bucket we want:
$ cat -n gc6.pl6
1 #!/usr/bin/env perl6
2
3 sub MAIN (Str $dna) {
4 my %hash = $dna.uc.comb.categorize({ /<[GC]>/ ?? 'GC' !! 'Other' });
5 say "$dna has {%hash<GC>.elems}";
6 }
It might help to see that one in the REPL:
> 'AACTAG'.uc.comb.classify({?/<[GC]>/})
{False => [A A T A], True => [C G]}
> 'AACTAG'.uc.comb.classify({/<[GC]>/ ?? 'GC' !! 'Other'})
{GC => [C G], Other => [A A T A]}
classify
takes a code block and uses the resulting string to put the element into a bucket. Here I've used the same regular expression /<[GC]>/
to return the string "GC" if it's a match or "Other" if it's not. The combination of the ?? !!
is the "ternary" operator that we'll talk about more later. The resulting Hash has a key called "GC" and its value is a list containing the "G" and "C" found in the string.
So you're seeing that lists can be inside of other Lists as well as inside of Hashes and other data structures.
I can classify my @dogs
based on the length of their names using that same syntax variations we saw for map
:
> @dogs.classify({.chars})
{4 => [Lulu], 5 => [Chaps Logan Patch], 6 => [Patton Bowzer]}
> @dogs.classify(*.chars)
{4 => [Lulu], 5 => [Chaps Logan Patch], 6 => [Patton Bowzer]}
> @dogs.classify(&chars)
{4 => [Lulu], 5 => [Chaps Logan Patch], 6 => [Patton Bowzer]}
Lists can also be composed of Pairs (https://docs.perl6.org/type/Pair). Here I'll redeclare my @dogs
with their names as the "key" and thier sex as the "value." Then I can classify
them on their value
:
> my @dogs = Chaps => 'male', Patton => 'male', Bowzer => 'male', Logan => 'male', Lulu => 'female', Patch => 'male'
[Chaps => male Patton => male Bowzer => male Logan => male Lulu => female Patch => male]
> @dogs.classify(-> $dog {$dog.value})
{female => [Lulu => female], male => [Chaps => male Patton => male Bowzer => male Logan => male Patch => male]}
> @dogs.classify({$^dog.value})
{female => [Lulu => female], male => [Chaps => male Patton => male Bowzer => male Logan => male Patch => male]}
> @dogs.classify({.value})
{female => [Lulu => female], male => [Chaps => male Patton => male Bowzer => male Logan => male Patch => male]}
> @dogs.classify(*.value)
{female => [Lulu => female], male => [Chaps => male Patton => male Bowzer => male Logan => male Patch => male]}
Picking random elements
To finish off, let's write a Shakespearean insult generator that uses the pick
method to randomly choose some perjoratives:
$ cat -n insult.pl6
1 #!/usr/bin/env perl6
2
3 sub MAIN (Int :$n=1) {
4 my @adjectives = qw{scurvy old filthy scurilous lascivious
5 foolish rascaly gross rotten corrupt foul loathsome irksome
6 heedless unmannered whoreson cullionly false filthsome
7 toad-spotted caterwauling wall-eyed insatiate vile peevish
8 infected sodden-witted lecherous ruinous indistinguishable
9 dishonest thin-faced slanderous bankrupt base detestable
10 rotten dishonest lubbery};
11 my @nouns = qw{knave coward liar swine villain beggar
12 slave scold jolthead whore barbermonger fishmonger carbuncle
13 fiend traitor block ape braggart jack milksop boy harpy
14 recreant degenerate Judas butt cur Satan ass coxcomb dandy
15 gull minion ratcatcher maw fool rogue lunatic varlet worm};
16
17 printf "You %s, %s, %s %s!\n",
18 @adjectives.pick(3), @nouns.pick for ^$n;
19 }
$ ./insult.pl6 -n=5
You foul, dishonest, old recreant!
You irksome, gross, false degenerate!
You old, dishonest, toad-spotted jack!
You ruinous, unmannered, foolish slave!
You scurry, slanderous, peevish harpy!
You can also use roll
so that each selection is made independently. Above I'm using the qw{}
"quote-word" operator (https://docs.perl6.org/language/quoting#index-entry-qw_word_quote) to create a list of words rather than writing:
my @adjectives = "scurvy", "old", "filthy";
You should spend a few minutes reading about all the different quoting options available as they will come in handy.