Spark is built on the primitive of the RDD, which isn't a distributed map; it's more of a distributed collection of rows, similar to a table in an RDBMS. You could, of course, stream one set of data from the master to the nodes and search against each node's data (building hashmaps there) to do a lookup. Beyond that I'm not sure how you'd see it working (you might want to look at more of the documentation on the site): each node would need to do arbitrary fetches from some other random node across the network at map-lookup time, which isn't ideal and isn't something Spark provides.
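If the lookup side is small enough to fit in memory, the usual way to get that "hashmap on every node" effect is a broadcast variable. A minimal sketch, assuming spark-shell (where `sc` is the provided SparkContext) and made-up keys/values:

```scala
// Illustrative small lookup table, shipped once to every executor.
val lookup = sc.broadcast(Map("k1" -> "foo", "k2" -> "bar"))

// Illustrative large RDD of keys to resolve.
val big = sc.parallelize(Seq("k1", "k2", "k3"))

// Each task probes its local copy of the map; no cross-node fetches.
val resolved = big.map(k => (k, lookup.value.get(k)))
resolved.collect().foreach(println)
```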
You can't join strings and lists if they are both the keys, but then you can't look up a hashmap of strings with a list of strings either. You can run a map function over each RDD to pull out the join field you want to use as the key, and then join on that. The 5 RDDs will be partitioned based on those keys, so plenty of work is avoided if the keys don't intersect on the remote nodes. You can play with custom partitioners during the earlier loading/processing phases to avoid more cross-network traffic, but I'd just see about getting it working first...
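If you do reach for a custom partitioner later, a minimal sketch of the pre-partitioning idea in spark-shell (the keys, values and partition count are made up):

```scala
import org.apache.spark.HashPartitioner

val left  = sc.parallelize(Seq(("k1", "foo"), ("k2", "bar")))
val right = sc.parallelize(Seq(("k1", 1), ("k3", 2)))

// Partition both sides with the same partitioner and persist them;
// a later join/cogroup on these RDDs reuses that partitioning
// instead of shuffling both sides again.
val partitioner = new HashPartitioner(4)
val leftP  = left.partitionBy(partitioner).persist()
val rightP = right.partitionBy(partitioner).persist()
```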
In summary:
spark.RDD[A] maps to spark.RDD[(B, A)]
spark.RDD[C] maps to spark.RDD[(B, C)]
These RDDs can now be joined on B to produce an RDD of (B, (A, C)), which should give the effect you are looking for. But as ever, without knowing the data types, skew, and so on... YMMV.
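A minimal sketch of that map-then-join pattern in spark-shell, where the case classes and the `id` field standing in for B are purely illustrative:

```scala
// Stand-ins for your A and C record types.
case class A(id: String, payload: String)
case class C(id: String, score: Double)

val as = sc.parallelize(Seq(A("k1", "foo"), A("k2", "bar")))   // RDD[A]
val cs = sc.parallelize(Seq(C("k1", 0.9), C("k3", 0.1)))       // RDD[C]

// Pull the join field B (here the String id) out of each record.
val keyedAs = as.map(a => (a.id, a))   // RDD[(B, A)]
val keyedCs = cs.map(c => (c.id, c))   // RDD[(B, C)]

// Inner join on B; the result is RDD[(B, (A, C))].
val joined = keyedAs.join(keyedCs)
joined.collect().foreach(println)
```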
Ultimately, if this doesn't fit well, maybe Spark isn't really the right tool for your task...