Converting ISO-8859-1 to UTF-8 for MultipartFormData in Play2 + Scala

jakob dobrzynski

May 10, 2013, 10:01:14 AM
to play-fr...@googlegroups.com
Hello!

I have hooked up my Play2+Scala application to the SendGrid Parse API and I'm really struggling with decoding and encoding the content of the email.

Since the emails can be in different encodings, SendGrid provides us with a JSON object, charsets:

{"to":"UTF-8","cc":"UTF-8","subject":"UTF-8","from":"UTF-8","text":"iso-8859-1","html":"iso-8859-1"}

In my test case "text" is "Med Vänliga Hälsningar Jakobs Webshop". If I extract that from the multipart request and print it out:

Logger.info(request.body.dataParts("text").head)

I get:

Med V?nliga H?lsningar Jakobs Webshop

OK, so with the given info from SendGrid, let's fix the string so that it is UTF-8.

import java.nio.ByteBuffer
import java.nio.charset.Charset

def parseMail = Action(parse.multipartFormData) { request =>

    // Note: getBytes() without an argument uses the platform default charset
    val inputBuffer = request.body.dataParts.get("text").map {
        v => ByteBuffer.wrap(v.head.getBytes())
    }

    val fromCharset = Charset.forName("ISO-8859-1")
    val toCharset = Charset.forName("UTF-8")

    // Decode the bytes as ISO-8859-1 ...
    val data = fromCharset.decode(inputBuffer.get)
    Logger.info(""+data)

    // ... and encode the result as UTF-8
    val outputBuffer = toCharset.encode(data)
    val text = new String(outputBuffer.array())

    // Save stuff to MongoDB instance

    Ok
}

This results in:

Med V�nliga H�lsningar Jakobs Webshop

So this is very strange. This should work. I wonder what actually happens in the body parser parse.multipartFormData and the data part handler:

def handleDataPart: PartHandler[Part] = {
  case headers @ PartInfoMatcher(partName) if !FileInfoMatcher.unapply(headers).isDefined =>
    Traversable.takeUpTo[Array[Byte]](DEFAULT_MAX_TEXT_LENGTH)
      .transform(Iteratee.consume[Array[Byte]]().map(bytes => DataPart(partName, new String(bytes, "utf-8")))(play.core.Execution.internalContext))
      .flatMap { data =>
        Cont({
          case Input.El(_) => Done(MaxDataPartSizeExceeded(partName), Input.Empty)
          case in => Done(data, in)
        })
      }(play.core.Execution.internalContext)
}

When the data is consumed, a new String is created from the bytes using the encoding utf-8:

.transform(Iteratee.consume[Array[Byte]]().map(bytes => DataPart(partName, new String(bytes, "utf-8")))(play.core.Execution.internalContext))

Does this mean that my ISO-8859-1 encoded text is decoded as UTF-8 when parsed? If so, how should I write my parser so that it decodes each param according to the provided JSON object charsets and then encodes it as UTF-8? Clearly I'm doing something wrong, but I can't figure it out!

Christian Papauschek

May 10, 2013, 5:37:44 PM
to play-fr...@googlegroups.com
I had a similar problem when processing payments from PayPal, which are encoded in ISO-8859-1.
The problem is that Play decodes everything as UTF-8 by default, and there is no way to change that other than implementing your own parser.

So you have to copy the implementation of the parse.multipartFormData function, change the decoding from UTF-8 to ISO-8859-1, and use that in your Action.

I believe this is a mistake/bug in Play's default body parsers, because the default charset of HTTP/1.1 is ISO-8859-1, not UTF-8, when no charset is given.
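
To illustrate why re-encoding the already-parsed string cannot bring the characters back, here is a minimal sketch using only java.nio.charset. It assumes the part arrived as ISO-8859-1 bytes and was decoded as UTF-8, the way the handler above does. The object and value names are made up for the example.

```scala
import java.nio.charset.StandardCharsets

object LossyDecodeDemo {
  // "Med Vänliga Hälsningar", written with escapes so the source file
  // encoding does not matter.
  val original: String = "Med V\u00e4nliga H\u00e4lsningar"

  // The bytes as they arrive on the wire when the sender used ISO-8859-1.
  val wireBytes: Array[Byte] = original.getBytes(StandardCharsets.ISO_8859_1)

  // What the parser effectively does: decode those bytes as UTF-8.
  // The byte 0xE4 ('ä' in ISO-8859-1) does not form a valid UTF-8 sequence
  // here, so it is replaced with U+FFFD and the original byte is lost.
  val parsedAsUtf8: String = new String(wireBytes, StandardCharsets.UTF_8)

  // Decoding the raw bytes with the charset the sender declared works.
  val decodedCorrectly: String = new String(wireBytes, StandardCharsets.ISO_8859_1)
}
```

Once the replacement character is in the String, no combination of getBytes and decode can tell which byte it used to be, so the charset has to be applied to the raw bytes, before the String is built.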

-- cp

jakob dobrzynski

May 13, 2013, 3:04:13 PM
to play-fr...@googlegroups.com
Hi!

Thank you for your answer!
Hmm, changing everything to ISO-8859-1 won't help either, since different parts can have different encodings.

Currently I extract the encodings from the request.body.dataParts like this:

    val charsets = extract(request.body.dataParts, "charsets", _.as[Charsets]).getOrElse(Charsets(Some(""), Some(""), Some(""), Some(""), Some("")))

    def extract[T](env: Map[String, Seq[String]], key: String, conv: JsValue => T): Option[T] = {
        env.get(key).flatMap(_.headOption).map(Json.parse).map(conv)
    }

    case class Charsets(to: Option[String], html: Option[String], subject: Option[String], from: Option[String], text: Option[String])

    object Charsets {
        implicit val charsetReads = Json.format[Charsets]
    }

But that won't do, since the parser may already have decoded everything with the wrong charset. I have looked at the parser parse.multipartFormData and I'm not sure where to start: I want the charsets object to be extracted first, and then, instead of:
Iteratee.consume[Array[Byte]]().map(bytes => DataPart(partName, new String(bytes, "utf-8")))

do

Iteratee.consume[Array[Byte]]().map(bytes => DataPart(partName, new String(bytes, charsets.getEncodingFor(partName))))

or maybe 

Iteratee.consume[Array[Byte]]().map(bytes => CustomDataPart(partName, bytes))
and then work with the byte array in my controller.
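
A rough sketch of what that decoding step might look like, independent of Play. It assumes the parser hands over each data part as raw bytes and that the charsets JSON has already been turned into a plain Map from part name to charset name. PartDecoder, charsetFor, decodePart and decodeAll are hypothetical names, and falling back to UTF-8 for a missing or unsupported charset name is an assumption, not something SendGrid or Play prescribes.

```scala
import java.nio.charset.{Charset, StandardCharsets}
import scala.util.Try

object PartDecoder {
  // Look up the charset declared for a part. Fall back to UTF-8 when the
  // part has no entry or the JVM does not know the name (assumed policy).
  def charsetFor(partName: String, declared: Map[String, String]): Charset =
    declared.get(partName)
      .flatMap(name => Try(Charset.forName(name)).toOption)
      .getOrElse(StandardCharsets.UTF_8)

  // Decode one part's raw bytes with its declared charset.
  def decodePart(partName: String, bytes: Array[Byte], declared: Map[String, String]): String =
    new String(bytes, charsetFor(partName, declared))

  // Decode every raw part; the resulting Strings can be written out as
  // UTF-8 later without any further conversion.
  def decodeAll(rawParts: Map[String, Array[Byte]], declared: Map[String, String]): Map[String, String] =
    rawParts.map { case (name, bytes) => name -> decodePart(name, bytes, declared) }
}
```

With something like this, the "charsets" part itself would be decoded first (its JSON is plain ASCII in the example above), parsed into the Map, and then passed to decodeAll together with the remaining raw parts.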

I don't care about the file parts right now.
Do you have any suggestions or a code example showing how I could achieve this?

Kind regards,

Jakob

jakob dobrzynski

May 23, 2013, 1:35:47 PM
to play-fr...@googlegroups.com
Bump =)