Sets
Bags
Bags are like instant histograms. They're perfect for counting the number of times each item occurs in a list. Here's a quick way to count how many lines of each length a file contains:
$ cat line-count-histogram1.pl6
#!/usr/bin/env perl6

sub MAIN (Str $file) {
    # Count how many lines have each length
    my $bag = $file.IO.lines.map(*.chars).Bag;
    # Print "length<TAB>count", one per line, sorted by length
    put $bag.pairs.sort(*.key).map(*.join("\t")).join("\n");
}
Lots of stuff happening in just two lines! Let's break it down. Given this file:
$ cat words
a
an
the
frog
cat
dog
horse
Here is how we get there. Read all the lines from the file:
> 'words'.IO.lines
(a an the frog cat dog horse)
Now map each line into a function that counts how many characters are in the line. The * stands for the current line as it passes through the function:
> 'words'.IO.lines.map(*.chars)
(1 2 3 4 3 3 5)
Now "Bag" those -- Perl will count how many times each unique entry occurs. Since there are three words that have three characters, you see "3(3)":
> my $bag = 'words'.IO.lines.map(*.chars).Bag
bag(5, 4, 3(3), 1, 2)
Get the pairs from the Bag, where the key represents the length of the word and the value is the number of times it was seen:
> $bag.pairs
(5 => 1 4 => 1 3 => 3 1 => 1 2 => 1)
As you can see, these are not returned in any particular order, so let's sort them by the key:
> $bag.pairs.sort(*.key)
(1 => 1 2 => 1 3 => 3 4 => 1 5 => 1)
To keep this on one line, I'm going to join each key/value pair with a colon and then join the members of the bag with commas:
> $bag.pairs.sort(*.key).map(*.kv.join(":"))
(1:1 2:1 3:3 4:1 5:1)
> $bag.pairs.sort(*.key).map(*.kv.join(":")).join(", ")
1:1, 2:1, 3:3, 4:1, 5:1
If I would rather see the lengths sorted by the number of times they occur, with the most frequent ones first, it would look like this:
> $bag.pairs.sort(*.value).reverse.map(*.kv.join(":")).join(", ")
3:3, 2:1, 1:1, 4:1, 5:1
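Notice that the lengths that tie at a count of 1 come out in no particular order. As a minimal sketch, assuming you want a repeatable order, the sort block can return a list so that ties on the count are broken by the key. The sample data here is a hypothetical stand-in for the lengths from the "words" file:

```raku
# Hypothetical sample data: the line lengths from the 'words' file.
my $bag = (1, 2, 3, 4, 3, 3, 5).Bag;

# Sort by descending count first, then by ascending key to break ties.
# A one-argument block that returns a list is used as a compound sort key.
my @sorted = $bag.pairs.sort({ -.value, .key });

put @sorted.map(*.kv.join(":")).join(", ");
# prints: 3:3, 1:1, 2:1, 4:1, 5:1
```

Negating the value avoids the separate reverse step, which would also flip the order of the tie-breaking key.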
Here is the output from the script:
$ ./line-count-histogram1.pl6 words
1 1
2 1
3 3
4 1
5 1
$ ./line-count-histogram1.pl6 /usr/share/dict/words
1 52
2 160
3 1420
4 5272
5 10230
6 17706
7 23869
8 29989
9 32403
10 30878
11 26013
12 20462
13 14939
14 9765
15 5925
16 3377
17 1813
18 842
19 428
20 198
21 82
22 41
23 17
24 5
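Since a Bag already holds the counts, drawing a rough text histogram takes only a little more work. This is just a sketch, not part of the original script; it uses hypothetical sample data and the string repetition operator x to print one asterisk per occurrence:

```raku
# Hypothetical sample data: the line lengths from the 'words' file.
my $bag = (1, 2, 3, 4, 3, 3, 5).Bag;

# One row per length: the right-aligned key, then one '*' per occurrence.
my @rows = $bag.pairs.sort(*.key).map({ .key.fmt('%2d') ~ ' ' ~ '*' x .value });

.put for @rows;
```

For counts as large as those from /usr/share/dict/words you would probably want to scale the bars down before printing them.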
SetHash, BagHash
Sets and Bags are immutable:
> my $bag = <foo bar foo baz>.Bag
bag(foo(2), baz, bar)
> $bag<foo> = 3
Cannot modify an immutable Int
in block <unit> at <unknown file> line 1
If you want to change them, use the SetHash (https://docs.perl6.org/type/SetHash) and BagHash (https://docs.perl6.org/type/BagHash) variants:
> my $baghash = <foo bar foo baz>.BagHash
BagHash.new(foo(2), baz, bar)
> $baghash<foo>++
2
> $baghash
BagHash.new(foo(3), baz, bar)
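A SetHash can be changed in much the same way. Here is a small sketch with made-up element names, in which assigning a true value adds a member and assigning a false value removes it:

```raku
# Hypothetical element names, chosen only for illustration.
my $seen = <foo bar>.SetHash;

$seen<baz> = True;     # add a member
$seen<foo> = False;    # remove a member

say $seen.elems;       # 2
say $seen<bar>;        # True
say $seen<foo>;        # False
```

Looking up a key in a SetHash simply answers whether that element is a member.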
SetHash and BagHash are both excellent data structures when you have a group of items from which you would like to draw, such that the drawn items are removed from the container. When I was writing my Blackjack game, I found a SetHash to be the natural container for a deck of 52 unique cards:
> my @suites = <D H C S>
[D H C S]
> my @faces = <2 3 4 5 6 7 8 9 10 J Q K A>
[2 3 4 5 6 7 8 9 10 J Q K A]
> my $deck = (@faces X @suites).map(~*).SetHash
SetHash.new(3 C, 5 H, 7 S, 8 D, 10 S, 2 D, 2 H, 9 D, Q S, A C, 6 H, 6 C, 10 C, Q H, J S, 3 H, 9 H, A H, J D, 7 D, 8 H, Q C, 10 D, J C, 3 D, 4 D, 6 D, 8 C, 5 C, 5 D, 4 H, 2 S, K D, 8 S, 7 C, 4 S, K C, 7 H, K H, K S, 6 S, 5 S, J H, A D, A S, 10 H, 9 S, 4 C, 2 C, 9 C, 3 S, Q D)
> $deck.elems
52
> $deck.grab(2).join(', ')
9 C, 9 D
> $deck.elems
50
The X (https://docs.perl6.org/routine/X) operator crosses two lists to produce a two-element list for every combination, then I map each of those lists into a stringification operation ~* so that every card is a plain string such as "9 C", which the container compares by value. Rather than stringifying the combinations, I could flatten the whole list and use pairup:
> ((@faces X @suites) xx 2).flat.pairup
(2 => D 2 => H 2 => C 2 => S 3 => D 3 => H 3 => C 3 => S 4 => D 4 => H 4 => C 4 => S 5 => D 5 => H 5 => C 5 => S 6 => D 6 => H 6 => C 6 => S 7 => D 7 => H 7 => C 7 => S 8 => D 8 => H 8 => C 8 => S 9 => D 9 => H 9 => C 9 => S 10 => D 10 => H 10 => C 10 => S J => D J => H J => C J => S Q => D Q => H Q => C Q => S K => D K => H K => C K => S A => D A => H A => C A => S 2 => D 2 => H 2 => C 2 => S 3 => D 3 => H 3 => C 3 => S 4 => D 4 => H 4 => C 4 => S 5 => D 5 => H 5 => C 5 => S 6 => D 6 => H 6 => C 6 => S 7 => D 7 => H 7 => C 7 => S 8 => D 8 => H 8 => C 8 => S 9 => D 9 => H 9 => C 9 => S 10 => D 10 => H 10 => C 10 => S J => D J => H J => C J => S Q => D Q => H Q => C Q => S K => D K => H K => C K => S ...)
> ((@faces X @suites) xx 2).flat.pairup.elems
104
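To see what X hands back before any stringifying or pairing, it can help to try it on a tiny example. This sketch uses two made-up, shortened lists rather than the full deck:

```raku
# Hypothetical, shortened lists of faces and suits.
my @combos = <A K> X <D H>;

say @combos[0].^name;    # List
say @combos.map(~*);     # (A D A H K D K H)
```

Each combination is a small List, and ~* turns it into a single string with its elements separated by a space.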
When I grab cards (at random) from the SetHash, they are removed, so I don't have to worry about seeing them again. When I learned that casinos regularly draw from a stack of many decks of cards, I realized I'd have to turn to a BagHash so that I could keep track of how many of each card, e.g., the Jack of Diamonds, I have. I used the xx (https://docs.perl6.org/routine/xx) list repetition operator. Here is how it behaves:
> 'YYZ' xx 3
(YYZ YYZ YYZ)
> 10 xx 3
(10 10 10)
> foo => 'bar' xx 3
foo => (bar bar bar)
> { foo => 'bar' } xx 3
({foo => bar} {foo => bar} {foo => bar})
> [1,2] xx 3
([1 2] [1 2] [1 2])
Depending on how many decks I decide to deal from, it's easy to keep track of how many copies of each card are still left to deal:
> my $baghash = (((@faces X @suites).map(~*)) xx 2).flat.BagHash
BagHash.new(3 C(2), 5 H(2), 7 S(2), 8 D(2), 10 S(2), 2 D(2), 2 H(2), 9 D(2), Q S(2), A C(2), 6 H(2), 6 C(2), 10 C(2), Q H(2), J S(2), 3 H(2), 9 H(2), A H(2), J D(2), 7 D(2), 8 H(2), Q C(2), J C(2), 3 D(2), 4 D(2), 6 D(2), 8 C(2), 10 D(2), K D(2), 2 S(2), 4 H(2), 5 D(2), 5 C(2), K C(2), 4 S(2), 7 C(2), 8 S(2), K S(2), K H(2), 7 H(2), A S(2), A D(2), J H(2), 5 S(2), 6 S(2), 2 C(2), 4 C(2), 9 S(2), 10 H(2), Q D(2), 3 S(2), 9 C(2))
> $baghash.grab(60).elems
60
> $baghash.elems
37
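Note that elems counts the distinct cards still in the BagHash, not the number of cards remaining, which is why it reports 37 rather than 44 above. The total method sums the weights instead. A small sketch with a hypothetical, shortened two-deck "shoe":

```raku
# Hypothetical, shortened lists so the numbers are easy to follow.
my @faces = <J Q K A>;
my @suits = <D H>;

# Two copies of each of the 8 distinct cards.
my $shoe = ((@faces X @suits).map(~*) xx 2).flat.BagHash;

say $shoe.elems;    # 8  (distinct cards)
say $shoe.total;    # 16 (cards in the shoe)

my @hand = $shoe.grab(3);

say $shoe.total;    # 13 (cards left after drawing three)
```

After the grab, elems depends on which cards were drawn, since a card only disappears from the keys once both of its copies are gone.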
Mixes
Mixes and MixHashes are like Bags and BagHashes except that the weights are Reals rather than Ints.
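As a minimal sketch, with invented ingredient names and amounts, a Mix can be built from a list of pairs whose values are the fractional weights, and a MixHash lets you adjust them afterwards:

```raku
# Hypothetical ingredients and weights, chosen only for illustration.
my $recipe = (butter => 0.25, sugar => 0.5, flour => 1.5).Mix;

say $recipe<flour>;    # 1.5
say $recipe.total;     # 2.25

# A Mix is immutable, so switch to a MixHash to change a weight.
my $tweaked = $recipe.MixHash;
$tweaked<sugar> -= 0.25;

say $tweaked<sugar>;   # 0.25
```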