Programming in JuliaSets and Dictionaries
Sets
Sets are unordered collections of unique values. The main advantage of having a special type for sets is that the design of the data structure can be optimized for membership checking. Figuring out whether a given value is in an array requires going through each element in the array, so the amount of time it takes increases with the length of the array. By contrast, checking membership in a set can be done very quickly even if the set is large.
A = [1,2,3] S = Set(A) 2 in S # evaluates to true pop!(S, 2) # removes 2 push!(S, 11) # puts 11 in the set 2 in S # evaluates to false now
Exercise
Make a set which contains the first 10,000 prime numbers.
Hint: It suffices to look for primes among the first 110,000 integers. Compare how long it takes to check whether a given number is in that set to the time it takes to compute whether the number is prime using Primes.isprime
.
using Primes: isprime primes = # add code here primes_set = Set(primes) @time 98779 in primes @time 98779 in primes_set @time isprime(98779)
Solution. To get exactly 10,000 primes, we index the list obtained by filtering out the composite numbers:
using Primes: isprime primes = [k for k in 2:110_000 if isprime(k)][1:10000] primes_set = Set(primes) @time 98779 in primes @time 98779 in primes_set @time isprime(98779)
Put the three methods in order from fastest to slowest:
Dictionaries
The internal mechanism that sets use to check membership extremely fast is also useful when the information you want to retrieve is more complex than just true
or false
.
For example, suppose you want to store a collection of color names together with the RGB values for each one. We'll store the names as
It's possible to do this by putting the names in an array and the values in a list of the same length:
names = ["fuchsia", "firebrick", "goldenrod"]
rgbs = [(256, 0, 256), (178, 34, 34), (218, 165, 32)]
However, this solution gets very tedious quickly. For example, modifying this structure requires
The Julia data structure tailored to the problem of encoding a map from one finite set to another is called a dictionary. Dictionaries are created by supplying pairs to the Dict
function. For example, the dictionary encoding the map described above looks like this:
rgb = Dict( "fuchsia" => (256, 0, 256), "firebrick" => (178, 34, 34), "goldenrod" => (218, 165, 32) )
The domain elements "fuchsia"
, "firebrick"
and "goldenrod"
are called the keys of the dictionary, and the codomain elements (256,0,256)
, (178,34,34)
, and (218,165,32)
are called the values.
We can also form new dictionaries from lists of pairs using the dict
function:
Dict([
("fuchsia", (256, 0, 256)),
("firebrick", (178, 34, 34)),
("goldenrod", (218, 165, 32))
])
We can perform a dictionary lookup using indexing rgb["fuchsia"]
returns (256,0,256)
. We can also change the value associated with a given key or introduce a new key-value pair using indexing and assignment:
rgb = Dict( "fuchsia" => (256, 0, 256), "firebrick" => (178, 34, 34), "goldenrod" => (218, 165, 32) ) rgb["crimson"] = (220, 20, 60) rgb
keys
and values
functions may be used to obtain the keys and values.
rgb = Dict( "fuchsia" => (256, 0, 256), "firebrick" => (178, 34, 34), "goldenrod" => (218, 165, 32) ) collect(keys(rgb))
Exercise
Consider a dictionary which encodes flight arrival times:
import Dates
arrival_times = Dict(
"JetBlue 924" => Dates.Time(7,9),
"United 1282" => Dates.Time(7,42),
"Southwest 196" => Dates.Time(7,3)
)
You can most easily use this dictionary to
Suppose you want to reverse the lookup direction: for any given time, you want to see which flight arrives at that time. One problem is that
Assuming that the codomain values are distinct, however, you can form a new dictionary that allows you to look up keys for values by using an array comprehension that iterates over the key-value pairs of the dictionary (obtainable using the pairs
function).
Implement this idea in the block below. Check that your dictionary works by indexing it with Dates.time(7,9)
.
import Dates arrival_times = Dict( "JetBlue 924" => Dates.Time(7,9), "United 1282" => Dates.Time(7,42), "Southwest 196" => Dates.Time(7,3) )
Solution. We use the Dict
function to convert the list of pairs back into a dictionary: Dict([(b,a) for (a,b) in pairs(arrival_times)])
.
Exercises
Exercise
You can construct dictionaries using a comprehension in Julia. For example, here's a dictionary that maps each one-digit positive integer to its square:
square_dict = Dict(k => k*k for k in 1:9)
Use a dictionary comprehension to make a dictionary which maps each of the first 100 powers of 2 to its units digit. Note: you'll need to use big(2)
instead of 2
to calculate its 100th power, because is larger than the largest 64-bit integer.
Solution. We convert to a string, get the last character, and convert back to an integer:
Dict([big(2)^k => parse(Int64, string(big(2)^k)[end]) for k in 1:100])
Exercise
Suppose you want to store student IDs in a part of a web application where the main thing you need to do is check whether an ID input by a student is a valid student ID (so you can flag it if it has been mistyped). Among the given options, the best data structure for this purpose would be a
Solution. This is an ideal use case for sets. Lists and tuples will be slower for checking membership, and dictionaries aren't quite appropriate because it isn't clear what the values would be.