Peace. Michael
string.split("\\W+").groupBy(identity).mapValues(_.length)
Though, perhaps, \P{Alpha}+ would be better than \W+, as it would get
rid of numbers. It really comes down to what exactly you want. When
it comes to regex, you really should be precise in what you mean.
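A minimal, self-contained sketch of the one-liner above, assuming plain `String` input; the object and method names (`WordFreq`, `countWords`, `countAlphaWords`, `countTokens`) are hypothetical. It uses `map` instead of `mapValues` because `Map.mapValues` is deprecated as of Scala 2.13, and it drops empty tokens, which split produces when the text starts with a separator. It also shows the difference between the two patterns: `\W+` keeps digit runs as words, while `\P{Alpha}+` treats them as separators.

```scala
object WordFreq {
  // The Seq[String] => Map[String, Int] function from the original question.
  // Empty strings (split yields one when the text starts with a separator)
  // are dropped before counting.
  def countTokens(words: Seq[String]): Map[String, Int] =
    words.filter(_.nonEmpty).groupBy(identity).map { case (w, occ) => w -> occ.size }

  // Split on runs of non-word characters (\W is anything outside [a-zA-Z0-9_]),
  // so "101" survives as a token.
  def countWords(text: String): Map[String, Int] =
    countTokens(text.split("\\W+").toSeq)

  // Split on runs of non-alphabetic characters, so digits act as separators.
  def countAlphaWords(text: String): Map[String, Int] =
    countTokens(text.split("\\P{Alpha}+").toSeq)

  def main(args: Array[String]): Unit = {
    val text = "the cat and the hat in room 101"
    println(countWords(text))      // contains "101" -> 1
    println(countAlphaWords(text)) // no "101" entry
  }
}
```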
>
> On 18.03.2012 21:16, Ken McDonald wrote:
>> I can't seem to wrap my mind around the problem of counting how many
>> times a word occurs in a paragraph. That is, roughly, a function
>> Seq[String] => Map[String, Int]. I feel like it should be a one-liner,
>> but all the "obvious" solutions to me are considerably longer. Thanks
>> for any advice.
>>
>> Thanks,
>> Ken
>
--
Daniel C. Sobral
I travel to the future all the time.
-------- Original Message --------
> Date: Sun, 18 Mar 2012 17:08:36 -0400
> From: Luke Vilnis <lvi...@gmail.com>
> To: HamsterofDeath <h-s...@gmx.de>
> CC: scala...@googlegroups.com
> Subject: Re: [scala-user] Best way to count frequency of words in a paragraph?
-------- Original Message --------
> Date: Thu, 22 Mar 2012 09:07:54 -0700 (PDT)
> From: Ken McDonald <ykke...@gmail.com>
> To: scala...@googlegroups.com
> CC: HamsterofDeath <h-s...@gmx.de>
> Subject: Re: [scala-user] Best way to count frequency of words in a paragraph?
> Thanks, everyone, for the replies. You even designed my regex for me,
We are looking for a volunteer happy to code a DSL in Scala to create regexps :))
Edmondo
2012/3/23 √iktor Ҡlang <viktor...@gmail.com>:
Regexps have historically been one of the most tedious parts of
programming to learn.
('?\\P{Alpha}+)|(\\P{Alpha}+'?\\P{Alpha}+)|(\\P{Alpha}+'?)|(\\P{Alpha}+)
-------- Original Message --------
> Date: Fri, 23 Mar 2012 16:13:15 +0100
> From: "√iktor Ҡlang" <viktor...@gmail.com>
> CC: scala...@googlegroups.com
> Viktor Klang
>
> Akka Tech Lead
> Typesafe <http://www.typesafe.com/> - The software stack for applications
> that scale
>
> Twitter: @viktorklang
https://github.com/KenMcDonald/rex
That's harder. You could split on [^A-Za-z'], which would keep these
words whole, but would also pick up ' used in other contexts. It also ignores
non-ASCII alphabetic characters. It might be possible to come up with
some other split pattern, but I'm guessing it would be annoyingly
hard. The whole point of using \P{Alpha} is to pick everything that
is *NOT* alphabetic, which mirrors what split is doing: identifying
everything that is not what you want.
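A small sketch of the apostrophe-aware split pattern discussed above, assuming a `+` quantifier so that consecutive separators do not produce empty strings; `ApostropheSplit` and `tokens` are hypothetical names. It illustrates both drawbacks mentioned: quote marks stay glued to a quoted word, and a non-ASCII letter is treated as a separator.

```scala
object ApostropheSplit {
  // Split on runs of anything that is neither an ASCII letter nor an
  // apostrophe, dropping a possible leading empty string.
  def tokens(text: String): List[String] =
    text.split("[^A-Za-z']+").toList.filter(_.nonEmpty)

  def main(args: Array[String]): Unit = {
    // "don't" stays whole, but the quotes around 'quoted' stay attached,
    // and the accented letter at the end of "café" acts as a separator.
    println(tokens("I don't like 'quoted' text in a caf\u00e9"))
    // prints List(I, don't, like, 'quoted', text, in, a, caf)
  }
}
```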
A better solution would be to use findAllIn with a pattern that
describes what words look like, in which case you could have
something like (\p{Alpha}(('\p{Alpha})|\p{Alpha})*).
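A minimal sketch of the findAllIn approach with the word pattern above. The names `WordMatch` and `countWords` are hypothetical, and lower-casing the input first is an extra assumption so that "Don't" and "don't" are counted together. Because an apostrophe is only accepted when a letter follows it, quotes around a word are not part of the match.

```scala
object WordMatch {
  // The pattern from the post: a letter, then any mix of letters and
  // apostrophe-plus-letter pairs, so ' is kept only between two letters.
  val Word = """(\p{Alpha}(('\p{Alpha})|\p{Alpha})*)""".r

  // Lower-casing is an assumption added here, not part of the original suggestion.
  def countWords(text: String): Map[String, Int] =
    Word.findAllIn(text.toLowerCase).toList
      .groupBy(identity)
      .map { case (w, occ) => w -> occ.size }

  def main(args: Array[String]): Unit =
    println(countWords("Don't quote 'me', don't!"))
}
```

With that input the result holds `don't -> 2`, `quote -> 1` and `me -> 1`: the surrounding quotes of 'me' are not matched.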
I did some testing on the REPL:
scala> "bat".split("[^bcr]at")
res165: Array[java.lang.String] = Array(bat)
scala> "cat".split("[^bcr]at")
res166: Array[java.lang.String] = Array(cat)
scala> "hat".split("[^bcr]at")
res167: Array[java.lang.String] = Array()
Well, this is quite the opposite of the behavior described on Java's regex tutorial website:
They say: "To match all characters except those listed, insert the "^" metacharacter at the beginning of the character class. This technique is known as negation.
Enter your regex: [^bcr]at
Enter input string to search: bat
No match found.
Enter your regex: [^bcr]at
Enter input string to search: cat
No match found.
Enter your regex: [^bcr]at
Enter input string to search: hat
I found the text "hat" starting at index 0 and ending at index 3."
So, what am I missing here? Can someone enlighten me?
-------- Original Message --------
> Date: Fri, 23 Mar 2012 19:02:49 -0300
> From: Daniel Sobral <dcso...@gmail.com>
> CC: scala...@googlegroups.com
Your first and second ones do not match anything, so the string itself
is returned. The third one matches everything in the string, so there
is nothing left.
Try something like "bathatcat".split("[^bcr]at")
You are using split. Split doesn't return the matching strings (that's
what the findAllIn suggestion would do). Split *removes* all matching
strings, and returns an array of the remaining strings, broken up at
the point where the removal happens.
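A small sketch contrasting the two calls on the strings from this exchange; `SplitVsFind`, `pieces` and `matches` are hypothetical names. In "bathatcat" the only match of `[^bcr]at` is "hat", so split returns what surrounds it while findAllIn returns the match itself. When the whole input is one match, split is left with only empty strings, which `String.split` drops as trailing empties.

```scala
object SplitVsFind {
  val NotBcrAt = "[^bcr]at".r

  // split: the matches are removed and the text between them is returned.
  def pieces(s: String): List[String] = s.split("[^bcr]at").toList

  // findAllIn: the matches themselves are returned.
  def matches(s: String): List[String] = NotBcrAt.findAllIn(s).toList

  def main(args: Array[String]): Unit = {
    println(pieces("bathatcat"))  // List(bat, cat)
    println(matches("bathatcat")) // List(hat)
    println(pieces("bat"))        // List(bat): no match, input returned unchanged
    println(pieces("hat"))        // List(): the whole input was one match
  }
}
```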
I'm playing around with Actors and having trouble figuring out why the method fetch runs only once. Does anyone know why?
Here is the code...
import scala.xml._
import XML._
import scala.actors._
import Actor._
import java.net.URL

object RssFetch extends App {
  val messenger = actor {
    loop {
      react {
        case Response(msg: String) => println(msg)
      }
    }
  }
  val rssFetcher = new Fetcher(messenger)
  rssFetcher.start
}

case class FetchFeeds()
case class InitFeeder()
case class Response(title: String)

class Fetcher(messenger: Actor) extends Actor {
  this ! InitFeeder

  private def periodicFetch() {
    val feeder = self
    actor {
      loop {
        println("Starting periodic fetching...")
        Thread.sleep(3000)
        feeder ! FetchFeeds
      }
    }
  }

  def act {
    loop {
      react {
        case FetchFeeds => fetch()
        case InitFeeder => periodicFetch()
      }
    }
  }

  def fetch(): Unit = {
    val rssFeed = XML.load(new URL("http://www.google.com/news?pz=1&cf=all&ned=us&hl=en&output=rss").openConnection.getInputStream)
    val items = rssFeed \ "channel" \ "item"
    for {
      title <- items \ "title"
    } messenger ! Response(title.text)