Can you pattern match across entries in an array?

265 views
Skip to first unread message

Mark Nelson

unread,
Jul 13, 2011, 6:01:46 AM7/13/11
to scala...@googlegroups.com
Hi All,

I have an Array[Byte] - much bigger than the example below - and I need to go through it and find all occurrences of (26, 20, 64) and remove them from the array - so x (below) should become (1, 2, 3, 5, 6, 7).

scala> x
res23: Array[Byte] = Array(1, 2, 3, 26, 20, 64, 5, 6, 7)

I am trying to find a good way to this, without writing a loop that just goes through and checks for a 26, and then checks to see if it is followed by a 20 and a 64, something a bit "more Scala".

Any ideas greatly appreciated.

Best regards,

--
Mark Nelson (mark.x...@gmail.com)

Kevin Wright

unread,
Jul 13, 2011, 6:13:54 AM7/13/11
to Mark Nelson, scala...@googlegroups.com
The easiest way to locate all instances of that sequence is:

    (x sliding 3).zipWithIndex collect {case (Array(26,20,64), i) => i}

It's up to you how you then remove such occurrences :)

--
Kevin Wright

gtalk / msn : kev.lee...@gmail.com
google+: http://gplus.to/thecoda
mail: kevin....@scalatechnology.com
vibe / skype: kev.lee.wright
quora: http://www.quora.com/Kevin-Wright
twitter: @thecoda

"My point today is that, if we wish to count lines of code, we should not regard them as "lines produced" but as "lines spent": the current conventional wisdom is so foolish as to book that count on the wrong side of the ledger" ~ Dijkstra

Mark Nelson

unread,
Jul 13, 2011, 6:28:06 AM7/13/11
to Kevin Wright, scala...@googlegroups.com
thanks kevin :)
--
Mark Nelson (mark.x...@gmail.com)

Kevin Wright

unread,
Jul 13, 2011, 6:28:35 AM7/13/11
to Mark Nelson, scala...@googlegroups.com
Actually, no it isn't...

This should do the trick though:

    def eliminate[T](xs: Array[T], slice: Seq[T]): Array[T] = {
      val i = xs indexOfSlice slice
      if(i == 0) xs else eliminate(xs.take(i) ++ xs.drop(i+3), slice)

Daniel Kristensen

unread,
Jul 13, 2011, 6:57:33 AM7/13/11
to Kevin Wright, Mark Nelson, scala...@googlegroups.com
Shouldn't "i == 0" be "i == -1"? And "3" should probably be slice.size

Also, it looks like this solution has complexity O(m * n), where m is the number of occurrences of the seq to eliminate. I'd hesitate to use such an algorithm unless I know m will be small.

Best regards
Daniel

Tony Morris

unread,
Jul 13, 2011, 7:02:35 AM7/13/11
to scala...@googlegroups.com

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Array(1, 2, 3, 26, 20, 64, 5, 6, 7) filter (!Set(26, 20, 64).contains(_))

Prefer combinators to pattern-matching. Pattern-matching is a last resort.
- --
Tony Morris
http://tmorris.net/

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.10 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAk4de0sACgkQmnpgrYe6r61fgQCdGXh7YG0MAmqi02mm+Tlp5IAR
kXcAn0e5pOI3oYuATd83o1LeVSubYjDa
=qeZW
-----END PGP SIGNATURE-----

Kevin Wright

unread,
Jul 13, 2011, 7:10:24 AM7/13/11
to tmo...@tmorris.net, scala...@googlegroups.com
On 13 July 2011 12:02, Tony Morris <tonym...@gmail.com> wrote:

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Array(1, 2, 3, 26, 20, 64, 5, 6, 7) filter (!Set(26, 20, 64).contains(_))

Prefer combinators to pattern-matching. Pattern-matching is a last resort.


That won't work, as the goal is to match the entire sequence, not just any occurrence of 26,20 or 64

Philippe Lhoste

unread,
Jul 13, 2011, 7:11:05 AM7/13/11
to scala...@googlegroups.com
On 13/07/2011 13:02, Tony Morris wrote:
> Array(1, 2, 3, 26, 20, 64, 5, 6, 7) filter (!Set(26, 20, 64).contains(_))
>
> Prefer combinators to pattern-matching. Pattern-matching is a last resort.

The problem is that the code above eliminates all occurrences of 20, 26 and 64 numbers in
the array, not checking it is actually a sequence of 26, 20, 64.
The first sentence of the request was ambiguous but the second part clarified the
requirement...

--
Philippe Lhoste
-- (near) Paris -- France
-- http://Phi.Lho.free.fr
-- -- -- -- -- -- -- -- -- -- -- -- -- --

Kevin Wright

unread,
Jul 13, 2011, 7:12:55 AM7/13/11
to Daniel Kristensen, Mark Nelson, scala...@googlegroups.com
On 13 July 2011 11:57, Daniel Kristensen <daniel.k...@gmail.com> wrote:
Shouldn't "i == 0" be "i == -1"? And "3" should probably be slice.size

Yup, it should :)
 
Also, it looks like this solution has complexity O(m * n), where m is the number of occurrences of the seq to eliminate. I'd hesitate to use such an algorithm unless I know m will be small.

It's completely sub-optimal, Array's really aren't the best choice of structure for this kind of task...


 
Best regards
Daniel




On Wed, Jul 13, 2011 at 12:28 PM, Kevin Wright <kev.lee...@gmail.com> wrote:
Actually, no it isn't...

This should do the trick though:

    def eliminate[T](xs: Array[T], slice: Seq[T]): Array[T] = {
      val i = xs indexOfSlice slice
      if(i == 0) xs else eliminate(xs.take(i) ++ xs.drop(i+3), slice)
    }

Mark Nelson

unread,
Jul 13, 2011, 7:16:44 AM7/13/11
to Kevin Wright, Daniel Kristensen, scala...@googlegroups.com
The Array[Byte] is coming from a Java API... but I could change it into some other data structure.. what would be better? 
--
Mark Nelson (mark.x...@gmail.com)

Mark Nelson

unread,
Jul 13, 2011, 7:23:34 AM7/13/11
to Kevin Wright, Daniel Kristensen, scala...@googlegroups.com
Hi Kevin,

The REPL is not liking that ++

scala> def eliminate[T](xs: Array[T], slice: Seq[T]): Array[T] = {

     |       val i = xs indexOfSlice slice
     |       if(i == 0) xs else eliminate(xs.take(i) ++ xs.drop(i+3), slice)
     | }
<console>:9: error: type mismatch;
 found   : scala.collection.mutable.ArraySeq[T]
 required: Array[T]

             if(i == 0) xs else eliminate(xs.take(i) ++ xs.drop(i+3), slice)
                                                     ^

Sorry I am a newbie - that is meant to concat xs from 0 up to i to xs from i+3 to end right?

Thanks,
mark
--
Mark Nelson (mark.x...@gmail.com)

Mark Nelson

unread,
Jul 13, 2011, 7:35:43 AM7/13/11
to Kevin Wright, Daniel Kristensen, scala...@googlegroups.com
So I now have:

        val b = new Array[Byte](size);
        // get the data into b
        val pattern = new Array[Byte](26.toByte, 20.toByte, 64.toByte)
        val c = eliminate[Byte](b, pattern)
        c.foreach(byte => print(byte.toChar))
       
    ...

   
    def eliminate[T](xs: Array[T], slice: Seq[T]): Array[T] = {
        val i = xs indexOfSlice slice
        if(i == 0) xs else eliminate(xs.take(i) ++ xs.drop(i + slice.size), slice)
    }

still getting an error on the ++

c:\src\JarDepWalker\JarDepWalker.scala:46: error: type mismatch;

 found   : scala.collection.mutable.ArraySeq[T]
 required: Array[T]
                if(i == 0) xs else eliminate(xs.take(i) ++ xs.drop(i + slice.size), slice)
                                                        ^
one error found

Appreciate any advice - thanks again
--
Mark Nelson (mark.x...@gmail.com)

Ruediger Keller

unread,
Jul 13, 2011, 7:37:18 AM7/13/11
to Mark Nelson, Kevin Wright, Daniel Kristensen, scala...@googlegroups.com
Try this:

def eliminate[@specialized(Byte) T : ClassManifest](xs: Array[T],


slice: Seq[T]): Array[T] = {
val i = xs indexOfSlice slice

if(i == -1) xs else xs.take(i) ++ eliminate(xs.drop(i + slice.length), slice)
}


2011/7/13 Mark Nelson <mark.x...@gmail.com>:

Tony Morris

unread,
Jul 13, 2011, 7:38:30 AM7/13/11
to Kevin Wright, scala...@googlegroups.com

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 13/07/11 21:10, Kevin Wright wrote:
> On 13 July 2011 12:02, Tony Morris <tonym...@gmail.com> wrote:
>

>> **


>>
>> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1
>>
>> Array(1, 2, 3, 26, 20, 64, 5, 6, 7) filter (!Set(26, 20,
>> 64).contains(_))
>>
>> Prefer combinators to pattern-matching. Pattern-matching is a
>> last resort.
>>
>>
> That won't work, as the goal is to match the entire sequence, not
> just any occurrence of 26,20 or 64
>

Use tails.

- --
Tony Morris
http://tmorris.net/

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.10 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAk4dg7YACgkQmnpgrYe6r61+mACfRlrsRTbHznGmuMsDR36KwYoG
EgsAn2kGVkO0aaokDOyCp+v9VQjetu05
=XOZg
-----END PGP SIGNATURE-----

Ruediger Keller

unread,
Jul 13, 2011, 7:46:10 AM7/13/11
to Mark Nelson, Kevin Wright, Daniel Kristensen, scala...@googlegroups.com
Or, simpler when you know you're only working with Array[Byte]:

def eliminate(xs: Array[Byte], slice: Seq[Byte]): Array[Byte] = {


val i = xs indexOfSlice slice

if(i == -1) xs else xs.take(i) ++ eliminate(xs.drop(i + slice.length), slice)
}

Although, as others have noted before, this isn't very efficient.

Regards,
Rüdiger


2011/7/13 Mark Nelson <mark.x...@gmail.com>:

Mark Nelson

unread,
Jul 13, 2011, 7:51:42 AM7/13/11
to Ruediger Keller, scala...@googlegroups.com
sorry forgot to cc the group

On Wed, Jul 13, 2011 at 9:48 PM, Mark Nelson <mark.x...@gmail.com> wrote:
Thank you all, with your help, I got it working :-)

For the record, the code I ended up with was:


        val b = Array[Byte](size)
        // get data into b
        val pattern = Array(26.toByte, 20.toByte, 64.toByte)

        val c = eliminate[Byte](b, pattern)
        c.foreach(byte => print(byte.toChar))
       
    }
   
    def eliminate[@specialized(Byte) T : ClassManifest](xs: Array[T], slice: Seq[T]): Array[T] = {
        val i = xs indexOfSlice slice
        if(i == -1) xs else xs.take(i) ++ eliminate(xs.drop(i + slice.length), slice)
    }

Best regards,



--
Mark Nelson (mark.x...@gmail.com)

Lars Hupel

unread,
Jul 13, 2011, 7:56:32 AM7/13/11
to scala...@googlegroups.com
> It's completely sub-optimal, Array's really aren't the best choice of
> structure for this kind of task...

If I'm not mistaken Arrays are not the problem here. Actually, as they
provide O(1) index access, they're a pretty good structure for searching.

To improve from O(m*n) to O(n) in the worst case, I'd recommend using
the Boyer-Moore algorithm
(<http://en.wikipedia.org/wiki/Boyer%E2%80%93Moore_string_search_algorithm>)
instead. That's unfortunately in imperative style, but I don't know any
other efficient method.

Daniel Sobral

unread,
Jul 13, 2011, 11:11:57 AM7/13/11
to Mark Nelson, scala...@googlegroups.com
This can be solved through regular expression or even simple string
matching. In particular, since its an array of bytes, you can do this:

x.map(_.toChar).mkString.replaceAll(Array(26,20,64).map(_.toChar).mkString,
"").map(_.toByte)

If you have something that won't map into chars, you can use Scala's
generic automata/regex stuff (scala.util.regexp, the package, not to
be confused with scala.util.matching.Regex, the class).

--
Daniel C. Sobral

I travel to the future all the time.

Chris Marshall

unread,
Jul 13, 2011, 12:01:45 PM7/13/11
to tmo...@tmorris.net, scala...@googlegroups.com
As Set is itself a function, this can be modified to 

Array(1, 2, 3, 26, 20, 64, 5, 6, 7) filterNot Set(26, 20, 64)

:-)

Chris



Date: Wed, 13 Jul 2011 21:02:35 +1000
From: tonym...@gmail.com
To: scala...@googlegroups.com
Subject: Re: [scala-user] Can you pattern match across entries in an array?

Chris Marshall

unread,
Jul 13, 2011, 12:06:46 PM7/13/11
to mark.x...@gmail.com, scala...@googlegroups.com
One question springs to mind, do you only do this once? 

Array(26, 20, 26, 20, 64, 64) ~> what should this end up as?

Chris



Date: Wed, 13 Jul 2011 20:01:46 +1000
Subject: [scala-user] Can you pattern match across entries in an array?
From: mark.x...@gmail.com
To: scala...@googlegroups.com

Vlad Patryshev

unread,
Jul 13, 2011, 12:45:52 PM7/13/11
to Daniel Kristensen, Mark Nelson, Kevin Wright, scala...@googlegroups.com

To avoid complexity multiplication one should scan once.

-Vlad

Rex Kerr

unread,
Jul 13, 2011, 1:47:56 PM7/13/11
to Mark Nelson, Ruediger Keller, scala...@googlegroups.com
Do you need this code to be fast, or to be correct?  There is a _sizable_ difference in how you should code in each case.

Your answer below has quadratic time complexity in the number of items to remove, and is vulnerable to stack overflows if you're patient enough for it to recurse that many times.

If you have very few items to remove, that's okay.  Otherwise it's a disaster, and you should at least use Daniel Sobral's string replace solution.  Unfortunately, that one is about 10-14x slower than a "low-level" solution but mostly independent of the number of elisions.  The solution you have is "only" about 8x slower than a "low-level" solution if you have a single match, but that rises to ~20x with five matches, and gets increasingly terrible from there.

The bottom line is that Scala does not come with a library that has primitive features that make this operation simultaneously elegant and efficient.  If you don't need it to be all that efficient, then there are a variety of reasonable ways to do it, many of which have been shown here.  But if efficiency is a significant concern, in this case and in many others, you really need to use while loops on arrays.  In this case, specializing on types is pretty easy, but this isn't always the case.  Anyway, a fast solution would look like so:

def elim[@specialized T: ClassManifest](xs: Array[T], slice: Array[T]) = {
  val ans = new Array[T](xs.length)
  var i,n = 0
  while (i < xs.length) {
    if (xs(i) != slice(0)) {
      ans(n) = xs(i)
      i += 1
      n += 1
    }
    else if (xs.length - i < slice.length) {
      while (i<xs.length) {
        ans(n) = xs(i)
        i += 1
        n += 1
      }
    }
    else {
      var j=1
      while (j < slice.length && xs(i+j)==slice(j)) j += 1;
      if (j<slice.length) {
        while (j > 0) {
          ans(n) = xs(i)
          i += 1
          n += 1
          j -= 1
        }
      }
      else i += j
    }
  }
  if (n<xs.length) {
    val a2 = new Array[T](n)
    Array.copy(ans,0,a2,0,n)
    a2
  }
  else ans

}

and gives the timings (compared to other solutions) that I listed above.  Note that this is error-prone (easy to make mistakes on indexes!) and inelegant (note regions of repeated/similar code!), but it is fast.  If it's important to do it fast, write something like this (or just take this), test it carefully, and then stick it in a library somewhere where you don't have to think about the internal workings.

  --Rex

P.S. If you think you might pass an empty slice in, add an if (slice.length==0) xs at the top.

Andrew Garman

unread,
Jul 13, 2011, 2:39:10 PM7/13/11
to scala-user
Haskell has Boyer-Moore string search implemented in Daniel Fischer's
stringsearch package.

The code is quite a dense, fun read.
Reply all
Reply to author
Forward
0 new messages