How can I write a string into Redis such that a Java app might see its contents as a primitive string?


gingers...@gmail.com

Jul 8, 2015, 4:35:49 PM
to clo...@googlegroups.com

I am not sure if this is a Clojure question or a Java question. I don't know Java, so I could use whatever help folks can offer.

"primitive string" here means what I can write when I am at the terminal.

We have 2 apps, one in Clojure, one in Java. They talk to each other via Redis. I know the Java app can read stuff out of Redis, using our "transaction-id", if I use the terminal and open up "redis-cli" and write a string directly from the terminal. But I have this Clojure code, which depends on Peter Taoussanis's Carmine library:
(defn worker [document]
  {:pre [(string? (:transaction-id document))]}
  (let [transaction-id  (:transaction-id document)
        document-as-string (str "{'transaction-id' : '" transaction-id "', 'debrief' : '" (:debrief document) "'}" )
        redis-connection {:pool {} :spec {:host "127.0.0.1" :port 6379 }}]
    (timbre/log :trace " message we will send to NLP  " document-as-string)
    (carmine/wcar redis-connection (carmine/set transaction-id document))
    (loop [document-in-redis (carmine/wcar redis-connection (carmine/get transaction-id))]

      (if-not (.contains (first document-in-redis) "processed")
        (recur (carmine/wcar redis-connection (carmine/get transaction-id)))
        (do
          (carmine/wcar redis-connection (carmine/del transaction-id))
          document-in-redis)))))
 

This line in particular, I have tried doing this several ways:

        document-as-string (str "{'transaction-id' : '" transaction-id "', 'debrief' : '" (:debrief document) "'}" )

In Redis, I expect to see:

{'transaction-id' : '42e574e7-3b80-424a-b9ff-01072f1e0358', 'debrief' : 'Smeek Hallie of Withers, Smeg, Harrington and Norvig responded to our proposal and said his company is read to move forward. The rate of $400 per ton of shredded paper was acceptable to them, and they shred about 2 tons of documents every month. $96,000 in potential revenue annually. I will meet with him tomorrow and we will sign the contract.'}

But if I then launch redis-cli, I see:


127.0.0.1:6379> keys *
1) "42e574e7-3b80-424a-b9ff-01072f1e0358"

127.0.0.1:6379> get "42e574e7-3b80-424a-b9ff-01072f1e0358"
"\x00>NPY\b\x00\x00\x01\xfc\xf1\xfe\x1b\x00\x00\x00\nj\nip-addressi\x0e165.254.84.238j\x05tokeni$46b87d64-cff3-4b8b-895c-e089ac59544dj\x0bapi-versioni\x02v1j\x0etransaction-idi$42e574e7-3b80-424a-b9ff-01072f1e0358j\adebrief\r\x00\x00\x01YSmeek Hallie of Withers, Smeg, Harrington and Norvig responded to our proposal and said his company is rea-\x00\xf1\x06move forward. The raty\x00\xf0\x0c$400 per ton of shredded pa\x16\x00\xf1\bwas acceptable to them,q\x00Bthey0\x00\x80 about 2E\x00\x10sF\x00\xf1Ldocuments every month. $96,000 in potential revenue annually. I will meet with him tomorrow{\x00\"we#\x00@sign\x92\x00\xa0 contract."


I don't know what all of those extra characters are. The Java app is not picking this item up, so I assume the Java app is not seeing this as a string. I expected this to look the same as if I had written this at the terminal:

{'transaction-id' : '42e574e7-3b80-424a-b9ff-01072f1e0358', 'debrief' : 'Smeek Hallie of Withers, Smeg, Harrington and Norvig responded to our proposal and said his company is read to move forward. The rate of $400 per ton of shredded paper was acceptable to them, and they shred about 2 tons of documents every month. $96,000 in potential revenue annually. I will meet with him tomorrow and we will sign the contract.'}

I assume it is easy to get a string into a format that can be understood by both a Clojure app and a Java app. I don't care what format that is, but it needs to be consistent.

Can anyone suggest what I can do to make sure the Clojure app and the Java app both write to Redis using a format that the other will understand? In particular, both apps need to see the "transaction-id".







gingers...@gmail.com

Jul 8, 2015, 4:44:16 PM
to clo...@googlegroups.com
Ah, I just saw this, which might help me:

https://github.com/ptaoussanis/carmine/issues/83

gingers...@gmail.com

Jul 8, 2015, 5:10:05 PM
to clo...@googlegroups.com

For anyone else like me, who has learned Clojure but knows nothing about Java, you need to convert to a byte array and then use carmine/raw to write to Redis. So in a let statement I have something like this:

    document-as-string (str "{\"transaction-id\" : \"" transaction-id "\", \"debrief\" : \"" (:debrief document) "\"}" )
    document-as-byte-array (bytes (byte-array (map (comp byte int) document-as-string)))

and then I:

    (carmine/wcar redis-connection (carmine/set transaction-id (carmine/raw document-as-byte-array)))

Francis Avila

Jul 8, 2015, 5:15:35 PM
to clo...@googlegroups.com
You are running into Carmine's automatic nippy serialization. https://github.com/ptaoussanis/carmine#serialization

Redis only stores byte arrays (what it calls "strings"). Carmine uses the nippy library (the meaning of "NPY" in your byte stream) to represent rich types compactly as bytes. https://github.com/ptaoussanis/nippy

If you give Carmine a byte array to store, it will store it directly without nippy-encoding it. E.g. (.getBytes "{}" "UTF-8")

BTW your document-as-string example is extremely unsafe: how will you reliably read this message out again? e.g. what if the 'debrief' string contains a single quote? Use a proper serialization format.

So the key is to have both your Clojure and Java app store *bytes* in Redis using the same serialization. You can store anything you want (nippy, utf-8-encoded json, fressian, bson, utf-8 xml, utf-16 java strings, whatever) as long as it's bytes and it's read and written the same way in all your apps.

The Redis library your Java app is using may have its own automatic de/serialization, too. You need to find out what it's doing and either work with this or turn it off, just like with Carmine.

Nippy unfortunately does not have a Java API out of the box:  https://github.com/ptaoussanis/nippy/issues/66
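
To illustrate the point about agreeing on one byte encoding, here is a minimal Java sketch that uses only the JDK (no Redis client). The class and method names are illustrative and do not come from either app. It shows that the same text yields different bytes under UTF-8 and UTF-16, so a reader that assumes a different encoding from the writer will not recover the original string.

```java
import java.nio.charset.StandardCharsets;

public class EncodingAgreement {
    // What a writer would hand to Redis SET if both apps agree on UTF-8.
    static byte[] encodeUtf8(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    // What a reader would do with the bytes returned by Redis GET.
    static String decodeUtf8(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String message = "{\"debrief\":\"ok\"}";
        byte[] utf8 = encodeUtf8(message);
        byte[] utf16 = message.getBytes(StandardCharsets.UTF_16);

        // Same text, different byte sequences and lengths.
        System.out.println("UTF-8 length:  " + utf8.length);
        System.out.println("UTF-16 length: " + utf16.length);

        // Decoding with the agreed encoding recovers the text.
        System.out.println(decodeUtf8(utf8).equals(message));
        // Decoding UTF-16 bytes as UTF-8 does not.
        System.out.println(decodeUtf8(utf16).equals(message));
    }
}
```

UTF-8 is a reasonable default because redis-cli shows ASCII text stored that way unchanged, which makes manual inspection easier.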

gingers...@gmail.com

Jul 8, 2015, 5:38:20 PM
to clo...@googlegroups.com
Francis Avila,

Thank you for your response. The Java app is using Jedis and the Clojure app is using Carmine. I'm wondering if you can suggest what you think would be the easiest way to allow these 2 apps to understand each other's strings?

You were correct about how unsafe the above code was. I tested it for less than 15 minutes and ran into the fact that a \n newline made a mess of everything.
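
For readers wondering why a newline or a quote breaks a hand-concatenated document, here is a rough Java sketch of the escaping that a JSON string value needs. It is a simplified illustration written for this thread (the names `quote` and `document` are made up), not a replacement for a real JSON library, which would handle this for you.

```java
public class JsonStringEscape {
    // Wrap a value in double quotes, escaping the characters JSON does not
    // allow to appear raw inside a string: quote, backslash, control chars.
    static String quote(String value) {
        StringBuilder out = new StringBuilder("\"");
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            switch (c) {
                case '"':  out.append("\\\""); break;
                case '\\': out.append("\\\\"); break;
                case '\n': out.append("\\n");  break;
                case '\r': out.append("\\r");  break;
                case '\t': out.append("\\t");  break;
                default:
                    if (c < 0x20) {
                        // Remaining control characters become 4-digit hex escapes.
                        out.append(String.format("\\u%04x", (int) c));
                    } else {
                        out.append(c);
                    }
            }
        }
        return out.append('"').toString();
    }

    // Build the two-field document from the thread with escaped values.
    static String document(String transactionId, String debrief) {
        return "{\"transaction-id\":" + quote(transactionId)
             + ",\"debrief\":" + quote(debrief) + "}";
    }

    public static void main(String[] args) {
        // A debrief containing a quote and a newline still yields one valid JSON line.
        System.out.println(document("42e574e7", "He said \"yes\".\nContract tomorrow."));
    }
}
```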

gingers...@gmail.com

Jul 8, 2015, 6:31:27 PM
to clo...@googlegroups.com
And I have another stupid question. Using the above code, I am sometimes getting strings in Redis that have escaped quotation marks, like this:

" \"transaction-id\" : \" 1ec47c2e-21ee-427c-841c-80a0f89f55d7 \"  \"debrief\" :  \" Susan Hilly at Citi called to get a quotation for discounted weekly car rental for approximately 3 cars per week, or 150 rentals annually. \"  "

Why is that happening?

Francis Avila

Jul 8, 2015, 8:58:53 PM
to clo...@googlegroups.com
Who is saving these strings, and who is reading them? Do you have complete control over both apps, or does one of them need to be aligned with the other?

If the Java app is the baseline, you need to know the exact details of the format of the data it saves. Just knowing "it's JSON" is not enough, because there may be other data types (e.g. dates) that don't have native JSON representations. (The go-to JSON de/encoder for clojure is cheshire: https://github.com/dakrone/cheshire )

I took a quick look at Jedis and the easy, default way of using it is with strings. It will encode strings to UTF-8 before sending to Redis and decode from UTF-8 on read. You can set raw byte arrays too (which will not be altered in any way before sending), but it's not clear to me how it can read out raw byte arrays. (I'm sure there's a way, but it's not immediately obvious.)

As for the escaped quotes, you may be using pr or prn to print, or maybe you are using pr-str to produce the string representation. I can't be sure. 

gingers...@gmail.com

Jul 8, 2015, 11:02:22 PM
to clo...@googlegroups.com

Thank you. Yes, we have complete control over both apps.

gingers...@gmail.com

Jul 9, 2015, 11:41:48 AM
to clo...@googlegroups.com

> As for the escaped quotes, you may be using pr or prn to print, or maybe you are
> using pr-str to produce the string representation. I can't be sure.

At the moment I create the string like this:

        document-as-string (str "{\"transaction-id\" : \"" transaction-id "\", \"message\" : \"" message "\"}")

        document-as-byte-array (bytes (byte-array (map (comp byte int) document-as-string)))

The crazy thing is that this sometimes works, but other times the quote marks appear in Redis as escaped quote marks. As near as I can tell, the important factor is the length of the string. A short string is likely to have its quote marks escaped. A long string does not have its quote marks escaped.

My co-worker is working on the Java app. I am working on the Clojure app. We can both adjust our apps freely, just so long as we can get data to and from each other, in a manner that allows us to eventually cast the data to and from JSON.

Any suggestions are welcome.

Francis Avila

Jul 9, 2015, 12:01:51 PM
to clo...@googlegroups.com
Your document-as-byte-array is wrong--it only handles ASCII, and very inefficiently too.

Just say (.getBytes document-as-string "UTF-8") to get a utf-8 encoding of the string as a byte array.
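
A small Java sketch of why one byte per character only works for ASCII, using JDK classes only. The helper names are illustrative. Note one difference from the Clojure version: as far as I know, Clojure's `byte` cast throws for values outside the byte range, whereas the Java cast below silently keeps the low 8 bits. Either way the non-ASCII character is not stored correctly.

```java
import java.nio.charset.StandardCharsets;

public class AsciiOnlyBytes {
    // Narrow each UTF-16 char to a single byte, the Java analogue of
    // mapping every character of the string to one byte.
    static byte[] oneBytePerChar(String s) {
        byte[] out = new byte[s.length()];
        for (int i = 0; i < s.length(); i++) {
            out[i] = (byte) s.charAt(i); // keeps only the low 8 bits
        }
        return out;
    }

    // Decode bytes the way a UTF-8 reader would.
    static String decodeUtf8(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String ascii = "a\"b";
        String smiley = "\u263a"; // WHITE SMILING FACE, code point 0x263A

        // ASCII survives the per-character narrowing.
        System.out.println(decodeUtf8(oneBytePerChar(ascii)).equals(ascii));
        // A non-ASCII character does not.
        System.out.println(decodeUtf8(oneBytePerChar(smiley)).equals(smiley));
        // Proper UTF-8 encoding needs three bytes for this character.
        System.out.println(smiley.getBytes(StandardCharsets.UTF_8).length);
    }
}
```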


gingers...@gmail.com

Jul 9, 2015, 5:18:51 PM
to clo...@googlegroups.com
I am sorry, I should have explained this sooner. When I do this:

(.getBytes document-as-string "UTF-8")

The quote marks are always escaped. When I do this:

        document-as-byte-array (bytes (byte-array (map (comp byte int) document-as-string)))

The quote marks are sometimes escaped, but most of the time they are not.

I need to avoid having those quote marks escaped. I am not clear what is causing the escaping.

Francis Avila

Jul 9, 2015, 7:52:54 PM
to clo...@googlegroups.com
This is what I am doing with Carmine and it seems to work properly.

First some setup in the repl:

user=> (require '[taoensso.carmine :as car])
nil
user=> (def conn {:pool {} :spec {}})
#'user/conn
user=> (defn raw-str-set [k ^String v] (car/set k (car/raw (.getBytes v "UTF-8"))))
#'user/raw-str-set
user=> (defn raw-str-get [k] (car/parse (car/get k) #(String. ^bytes % "UTF-8")))
#'user/raw-str-get


Now I save a test string:

user=> (car/wcar conn (raw-str-set "test" "a\"b"))
"OK"

In a redis-cli terminal:

$ redis-cli
127.0.0.1:6379> get "test"
"a\"b"

The escaped \" you see in redis-cli is just for printing. The actual bytes stored at the "test" key are 0x61 0x22 0x62.

Now let's pull this out of redis with Carmine:


user=> (car/wcar conn (raw-str-get "test"))
"a\"b"
user=> (count *1)
3

As you can see, we round-tripped without any strange extra escaping.

Probably your java app can just use Jedis client.get("test") and client.set("test","a-string"), since I think it does UTF-8 encode and decode of strings and no other serialization.


You still need to use JSON on top of this so you can get richer data structures instead of just strings.

gingers...@gmail.com

Jul 10, 2015, 6:04:01 PM
to clo...@googlegroups.com

Hmm, well, I am grateful to you for running such a detailed test, and I should have tested this myself:

>The actual bytes stored at the "test" key are 0x61 0x22 0x62

However, something in our code fails when the quote marks are escaped, but everything works the way we expect when the quote marks do not appear to be escaped. I am amazed by this:

user=> (count *1)
3

Perhaps the error is something entirely different from what I was assuming.

gingers...@gmail.com

Jul 15, 2015, 10:16:57 AM
to clo...@googlegroups.com

I have not yet found a solution for this problem. If I do this:

(.getBytes v "UTF-8")

Then in Redis the quotes are escaped 100% of the time. What is the standard way to handle this? I do see this on StackOverflow:

http://stackoverflow.com/questions/12423071/how-to-remove-escape-characters-from-a-string-in-java

which suggests:

String noSlashes = input.replace("\\", "");
 
Would this be considered a hack, or would this be the normal way to do this?





Francis Avila

Jul 15, 2015, 12:07:52 PM
to clo...@googlegroups.com
You are either encoding something incorrectly at some layer of your code (or the Java app's code), or you are misinterpreting what is printed as actual escaping when it actually is not escaped. Having to escape or unescape strings *is not normal* and is a symptom of a bigger problem.

As I demonstrated in my earlier post, redis-cli will PRINT \", but that is only for PRINTING.  The stored byte is actually just the quote. If you had an escaped string, redis would print "\\\"". Example:

127.0.0.1:6379> set test "\""
OK
127.0.0.1:6379> strlen test
(integer) 1
127.0.0.1:6379> get test
"\""
127.0.0.1:6379> set test "\\\""
OK
127.0.0.1:6379> strlen test
(integer) 2
127.0.0.1:6379> get test
"\\\""
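
To make the difference between stored bytes and printed output concrete, here is a Java sketch that approximates how redis-cli renders a value. It is an approximation written for illustration, not redis-cli's actual code, and the name `repr` is made up. The escaping happens only when the bytes are turned into display text.

```java
public class CliRepr {
    // Render raw bytes roughly the way redis-cli prints a string reply:
    // wrapped in quotes, with quote and backslash escaped and
    // non-printable bytes shown as hex escapes.
    static String repr(byte[] bytes) {
        StringBuilder out = new StringBuilder("\"");
        for (byte b : bytes) {
            int v = b & 0xff;
            if (v == '"' || v == '\\') {
                out.append('\\').append((char) v);
            } else if (v == '\n') {
                out.append("\\n");
            } else if (v == '\r') {
                out.append("\\r");
            } else if (v == '\t') {
                out.append("\\t");
            } else if (v >= 0x20 && v < 0x7f) {
                out.append((char) v);
            } else {
                out.append(String.format("\\x%02x", v));
            }
        }
        return out.append('"').toString();
    }

    public static void main(String[] args) {
        byte[] oneQuote = {0x22};            // the single byte for a double quote
        byte[] escapedQuote = {0x5c, 0x22};  // a backslash followed by a double quote

        System.out.println(oneQuote.length + " byte  -> " + repr(oneQuote));
        System.out.println(escapedQuote.length + " bytes -> " + repr(escapedQuote));
    }
}
```

The one-byte value prints with a single backslash and the two-byte value with three, which matches the redis-cli session above.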


You do need to take special care to bypass carmine's nippy encoding. That is what my raw-str-get and raw-str-set functions are doing. In fact, that code is a complete solution to round-tripping strings through redis! Are you saying you are getting different results?

And I don't know for sure what Jedis is doing because I've only read its code, not actually tried it. Perhaps it is to blame.


This is how you troubleshoot using the test string " (One-character string containing a double-quote). Repeat these steps on the Clojure app and the Java app.
  1. In app, verify the INPUT string is  " (length 1)
  2. In app, verify the UTF-8 byte array has length 1
  3. In app, SET (bytes or string, depending on interface) into redis at known key.
  4. In redis-cli, GET the key. You should see "\""
  5. In redis-cli STRLEN the key. It should return (integer) 1.
  6. In app, GET the key.
    1. If you get bytes, it should be length 1 and (= (String. byte-val "UTF-8") "\"")
    2. If you get a string, it should be (= (.length str-val) 1) and (= str-val "\"")
If any of these steps is broken, you know that you did something wrong in the previous step.
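
The app-side checks in the steps above (1, 2, and 6) could be sketched in Java roughly as follows. This uses only the JDK. The Redis SET and GET calls are deliberately left out, so the `fromRedis` argument stands in for whatever bytes your client returns, and all names are hypothetical.

```java
import java.nio.charset.StandardCharsets;

public class QuoteRoundTripCheck {
    // Steps 1 and 2: the test input and its UTF-8 encoding must both have length 1.
    static byte[] encodeChecked(String input) {
        if (input.length() != 1) {
            throw new IllegalStateException("input length is " + input.length());
        }
        byte[] bytes = input.getBytes(StandardCharsets.UTF_8);
        if (bytes.length != 1) {
            throw new IllegalStateException("encoded length is " + bytes.length);
        }
        return bytes;
    }

    // Step 6: the bytes read back must have the same length and decode to the same string.
    static boolean matches(byte[] fromRedis, String expected) {
        String decoded = new String(fromRedis, StandardCharsets.UTF_8);
        return fromRedis.length == expected.getBytes(StandardCharsets.UTF_8).length
            && decoded.equals(expected);
    }

    public static void main(String[] args) {
        String input = "\"";
        byte[] toStore = encodeChecked(input);

        // A healthy round trip returns exactly the bytes that were stored.
        System.out.println(matches(toStore, input));

        // A value that was escaped somewhere along the way has an extra backslash byte.
        byte[] escaped = {0x5c, 0x22};
        System.out.println(matches(escaped, input));
    }
}
```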

gingers...@gmail.com

Jul 15, 2015, 12:43:08 PM
to clo...@googlegroups.com
Sorry, I am stupid. I misunderstood the problem entirely. Escaped quote marks are never really the problem. The real issue is that this works for us:

{ "money" : "100000", "contact_name" : "Martha Vena" }

and this doesn't work for us:

"{ \"money\" : \"100000\", \"contact_name\" : \"Martha Vena\" }"

but in the second case the whole thing is a string. In the first case I suppose the structure is a hashmap. I'll look at Carmine to see if it can help me get the hashmap structure consistently.




Francis Avila

Jul 15, 2015, 4:03:33 PM
to clo...@googlegroups.com
As I said before, you need another layer of encoding/decoding, e.g. JSON. (Unless you plan on using Redis hash maps directly? They only support byte keys and values.)
Your example looks like Javascript or Python, but I don't know where you got it. Neither Clojure nor Java will print a map-like native structure in that format! I'll assume you are dealing with JSON.

To write: native data structure -> JSON string -> UTF-8 bytes -> Redis SET
To read: Redis GET -> UTF-8 bytes -> JSON string -> native data structure.

Example with carmine in Clojure:

user=> (require '[taoensso.carmine :as car] '[cheshire.core :refer [generate-string parse-string]])
nil
user=> (defn json-set [k v] (car/set k (car/raw (.getBytes (generate-string v) "UTF-8"))))
#'user/json-set
user=> (defn json-get [k] (car/parse #(parse-string %) (car/get k)))
#'user/json-get

user=> (def conn {:pool {} :spec {}})
#'user/conn

user=> (def test-data { "money" "100000", "contact_name" "Martha Vena"})
#'user/test-data
user=> (car/wcar conn (json-set "test" test-data))
"OK"
user=> (car/wcar conn (json-get "test"))
{"contact_name" "Martha Vena", "money" "100000"}

In redis-cli:

127.0.0.1:6379> get test
"{\"contact_name\":\"Martha Vena\",\"money\":\"100000\"}"

Here's an example with a smiley face to test the UTF-8 encoding:

user=> (car/wcar conn (json-set "test2" ["☺"]))
"OK"
user=> (car/wcar conn (json-get "test2"))
["☺"]
user=> (count (first *1))
1

In redis-cli:

127.0.0.1:6379> get test2
"[\"\xe2\x98\xba\"]"