Avoid interaction to save as binary


Paco Guzmán

Mar 25, 2012, 9:32:49 AM
to VCR Rubygem
I'd like to know whether it's possible to keep the response body and
the headers from being saved as binary. I want to be able to read those files.

Thanks in advance

Myron Marston

Mar 25, 2012, 10:54:11 AM
to VCR Rubygem
First off, some context about what exactly you're seeing (i.e. a link
to a gist or pastie containing the cassette, and description of what
versions of ruby and VCR you're using, what HTTP library you're using,
what your VCR config looks like, etc) would be helpful.

I do happen to know that psych (the new YAML engine) in MRI 1.9.3-p125
now dumps all ASCII-8BIT strings as binary, as this gist demonstrates:

https://gist.github.com/2196661

There's nothing VCR can do about this if you choose to use psych as
your serializer. Really, the HTTP libraries should be fixed to not
output ASCII-8BIT strings, but rather to use the encoding given in the
HTTP headers--so you may want to open a ticket with whatever HTTP
library you're using.

You can choose to use a different serializer, though. This is only a
problem with psych on 1.9.3-p125, so you can use :syck or :json
instead.
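
To see the psych behavior outside of VCR, here is a minimal sketch using only Ruby's standard library. The sample string and the `dump_field` helper are made up for illustration. It uses a non-ASCII byte sequence because, as far as I know, newer psych versions only treat an ASCII-8BIT string as binary when it contains non-ASCII bytes, whereas the psych in 1.9.3-p125 did so for every ASCII-8BIT string.

```ruby
require 'yaml'

# Hypothetical helper: dump one string the way a field in a
# YAML-serialized cassette would be dumped.
def dump_field(value)
  YAML.dump('body' => value)
end

# The UTF-8 bytes for "café", tagged as ASCII-8BIT (binary), which is how
# some HTTP libraries tag response data.
binary_tagged = "caf\xC3\xA9".b

# The same bytes, tagged with the encoding they are actually in.
utf8_tagged = binary_tagged.dup.force_encoding('UTF-8')

puts dump_field(binary_tagged) # emitted as a base64 "!binary" scalar
puts dump_field(utf8_tagged)   # emitted as readable text
```

The bytes are identical in both cases; only the encoding tag on the string differs, and that tag is what psych looks at.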

If you're not using psych on MRI 1.9.3-p125, then you can ignore most
of what I said here, but that just goes to show that you need to
include more context when asking for help. I can't read your mind and
guess what your dev environment or project is like.

Myron

Paco Guzmán

Apr 8, 2012, 1:20:28 PM
to VCR Rubygem
Yes, you're totally right, I should have given my context when asking for
help.

I changed the serializer, so VCR now uses :syck, and at least the
headers are readable, but the body still is not. I think, though, that it's
the server that sends the response encoded in base64.

My vcr config file is this

VCR.configure do |config|
  config.hook_into :webmock
  config.cassette_library_dir = 'spec/vcr_cassettes'
  config.configure_rspec_metadata!
  config.preserve_exact_body_bytes { true }
  config.default_cassette_options = {
    :re_record_interval => 1.month,
    :serialize_with => :syck
  }
end

And I'm making a request to GitHub with the octokit gem:

gem 'octokit'

require 'vcr'
require 'spec'
require 'octokit'

VCR.use_cassette("rails_events") do
  Octokit.repository_events("rails/rails")
end

# Using curl you could execute: curl -i https://api.github.com/repos/rails/rails/events

Thanks

Myron Marston

Apr 8, 2012, 11:15:47 PM
to VCR Rubygem
The response body is being saved using base64 encoding because you
have configured VCR to do that. That's what the
`preserve_exact_body_bytes` option does: it tells VCR to use base64
encoding in order to preserve the bytes exactly as-is.

So, if you don't want the body to be base64 encoded, then don't set
the VCR option that tells it to do that.
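
If you still need exact bytes for some responses, the option takes a block, so it can be made conditional rather than always returning true. Below is a minimal sketch of such a predicate in plain Ruby. `needs_exact_bytes?` is a hypothetical name, and the commented wiring assumes the block receives the HTTP message as its first argument.

```ruby
# Hypothetical helper (not part of VCR): true when a body cannot be stored
# safely as text, i.e. it is tagged binary or is invalid in its encoding.
def needs_exact_bytes?(body)
  body.encoding == Encoding::ASCII_8BIT || !body.valid_encoding?
end

# Assumed wiring inside VCR.configure, replacing `{ true }`:
#   config.preserve_exact_body_bytes do |http_message|
#     needs_exact_bytes?(http_message.body)
#   end

puts needs_exact_bytes?("plain text")   # validly encoded text
puts needs_exact_bytes?("plain text".b) # same bytes, tagged binary
```

Note that a body your HTTP library tags as ASCII-8BIT would still be base64 encoded under this predicate; it only helps for bodies that arrive with a real text encoding.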

HTH,
Myron

dan...@heroku.com

May 31, 2012, 5:47:57 PM
to vcr-...@googlegroups.com
> The response body is being saved using base64 encoding because you
> have configured VCR to do that.

I'm new to using VCR, but I do like it.

Interestingly, I have encountered a case where it seems to me that sometimes VCR is deciding to use base64 coding for responses and other times not.  It's not entirely clear to me how this decision is made in VCR or how I could override it if I'm quite sure that the payload is supposed to be text.  In my case I have clearly *not* configured VCR to preserve the exact byte payload of the response.  Is that a known issue or documented somewhere?  Is there a way I can assert that I want human-readable results and errors otherwise?  I really liked how a very detailed explanation of what I needed to do (add :vcr to rspec blocks) was emitted from rspec when I followed the quite thorough narrative documentation.

Myron Marston

May 31, 2012, 6:07:09 PM
to vcr-...@googlegroups.com
Within VCR, the one and only way VCR will convert the payload to base64 is if you have configured it to do so [1].

However, some serializers may decide to dump strings as base64.  A while back, a change was made to psych [2] that causes it to emit strings tagged as binary (i.e. via their encoding) as base64, even if they contain just raw ASCII text.  I believe this psych change got incorporated into one of the recent MRI 1.9.3 releases (maybe patch level 125, if memory serves?).

You can try using a different serializer (syck, json, or write your own).

HTH,
Myron

[1] https://github.com/myronmarston/vcr/blob/v2.2.0/lib/vcr/structs.rb#L78-84
[2] https://github.com/tenderlove/psych/commit/c9cd187d5aa8fa6607dd463b5f98a65483ae39ce

dan...@heroku.com

May 31, 2012, 10:32:11 PM
to vcr-...@googlegroups.com
On Thursday, May 31, 2012 3:07:09 PM UTC-7, Myron Marston wrote:
> Within VCR, the one and only way VCR will convert the payload to base64 is if you have configured it to do so [1].
>
> However, some serializers may decide to dump strings as base64.  A while back, a change was made to psych [2] that causes it to emit strings tagged as binary (i.e. via their encoding) as base64, even if they contain just raw ASCII text.  I believe this psych change got incorporated into one of the recent MRI 1.9.3 releases (maybe patch level 125, if memory serves?).


Interesting. I have played with this a bit more and I think it has to do with the encoding of some of the responses from the API, some of which are chunked.  And, as VCR captures it, it is in fact binary when run through a base64 decoder, not text, so my best guess is that it is a compressed chunked response, at least one property not seen in the responses that serialize as text.

RestClient does seem to automatically understand what's going on, though, so this makes it harder to tweak the interaction by hand (I'd have to gin it up from whole cloth and rely on the abstracted equivalence of a non-chunked response with the chunked one), but it does allow VCR to basically complete its primary function.

Jonathan Rochkind

Jun 4, 2012, 3:27:11 PM
to vcr-...@googlegroups.com
I just noticed that for me too, all HTTP _response_ headers are
serialized in the cassette as base64 'binary' too.

Webmock/HTTPClient, ruby 1.9.3

Whether it's ruby 1.9.3 YAML's 'fault', or webmock's or VCR's, it
certainly is annoying. For Webmock/HTTPClient it doesn't seem to
interfere with the proper functioning of VCR, everything works fine, but
it's annoying if you want to look inside the cassette to see what's
going on or tweak something in the recorded response, ordinarily one of
VCR's strong points.

Jonathan Rochkind

Jun 4, 2012, 3:32:45 PM
to vcr-...@googlegroups.com
Hmm, I realized it's _possible_ it also prevents VCR's
filter_sensitive_data from working. I'm not sure whether it's successfully
filtering from inside a base64-encoded header; I haven't checked yet.
I do have an API key echoed back inside a response header that needs
to be filtered.

Myron Marston

Jun 4, 2012, 4:28:57 PM
to vcr-...@googlegroups.com
I've put together a gist[1] demonstrating exactly what's going on.  Notice that this gist doesn't use VCR at all, because this issue has nothing to do with VCR.  In a nutshell, here's what's going on:
  • HTTPClient tags response header keys and values as ASCII-8BIT encoding (which means "binary").
  • Psych serializes strings tagged as ASCII-8BIT using base64 encoding because it assumes it is binary data (since the string itself is telling psych "I'm binary").

Honestly, it seems like a bit of a bug in HTTPClient that response headers are tagged as ASCII-8BIT; I believe, according to my prior readings of the HTTP RFCs, that only ASCII characters are allowed in HTTP headers.  Thus, I believe it would be safe for HTTPClient to set the encoding on headers to "US-ASCII".  That would fix this issue.  If you would like to see this changed, please file an issue with the HTTPClient issue tracker (I've never actually used HTTPClient!).

If you dislike the fact that psych treats ASCII-8BIT strings this way, feel free to file an issue with tenderlove about that.  I personally think it makes sense for psych to treat any string that says it is binary as binary.

As far as dealing with this in VCR, there's a simple way provided to do that: use a different serializer.  I believe that if you use `:serialize_with => :syck` or `:serialize_with => :json` you'll get a human readable cassette.  Alternately, you can easily write your own serializer [2].

Another potential way of dealing with it in VCR: use a `before_record` hook to manually change the encoding of all the strings so that they are set to UTF-8 or whatever you want.  Then psych will not base64 encode them.  However, I recommend you use this option with care...once you go down this route, you are essentially modifying what's being recorded, which means that you're opening yourself up to having false positive tests that pass because of how you modify things but don't pass when run w/o VCR.
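
A cautious version of that idea can be sketched as follows: re-tag a string as UTF-8 only when its bytes really are valid UTF-8, and leave it alone otherwise. `retag_as_utf8` is a hypothetical helper, not something VCR provides, and the commented hook wiring is an assumption about how you would apply it.

```ruby
# Hypothetical helper (not part of VCR): return a UTF-8-tagged copy of the
# string when its bytes are valid UTF-8, otherwise return it unchanged.
def retag_as_utf8(str)
  candidate = str.dup.force_encoding('UTF-8')
  candidate.valid_encoding? ? candidate : str
end

# Assumed wiring inside VCR.configure:
#   config.before_record do |interaction|
#     interaction.response.body = retag_as_utf8(interaction.response.body)
#   end

text  = retag_as_utf8("caf\xC3\xA9".b) # valid UTF-8 bytes: re-tagged
bytes = retag_as_utf8("\xFF\xFE".b)    # not valid UTF-8: left as binary

puts text.encoding
puts bytes.encoding
```

Checking `valid_encoding?` first keeps genuinely binary payloads (compressed bodies, images) tagged as binary, so they are still serialized safely.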

> Hmm, I realized it's _possible_ it also prevents VCR's filter_sensitive_data from working, not sure if it's successfully filtering from inside a base64 encoded header, but haven't checked yet.  I do have an api key echo'd back inside a response header that needs to be filtered.

Nope, VCR's hooks are fine.  The base64 encoding psych is doing happens just before the data is written to disk, and it's decoded automatically by psych when deserializing the data.  Any code that interacts with a VCR::HTTPInteraction object (or VCR::Request or VCR::Response) will be dealing with non-base64 strings.
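
The round trip can be checked with plain Ruby and no VCR at all. This is only a sketch: the header value is made up, and `yaml_round_trip` is a hypothetical helper standing in for "write the cassette, then read it back".

```ruby
require 'yaml'

# Hypothetical helper: write a value out as YAML and read it back, roughly
# what happens to each string between recording and playback.
def yaml_round_trip(value)
  YAML.load(YAML.dump(value))
end

# Made-up header value containing a non-ASCII byte sequence, tagged binary.
header = "X-Echo: s\xC3\xA9cret".b

on_disk  = YAML.dump(header)       # what lands in the file: base64 text
restored = yaml_round_trip(header) # what code sees after loading

puts on_disk
puts restored == header
```

The base64 text exists only in the serialized form; anything working with the loaded string sees the original bytes.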

HTH,

Myron

[1] https://gist.github.com/2870574
[2] https://www.relishapp.com/myronmarston/vcr/v/2-2-0/docs/cassettes/cassette-format#2

pcr...@gmail.com

Apr 30, 2014, 10:46:38 PM
to vcr-...@googlegroups.com, pacog...@gmail.com
I confirm that I fixed a similar issue I was running into (fetching UTF8 content via ActiveResource) using a before_record hook:

  c.before_record do |i|
    i.response.body.force_encoding('UTF-8')
  end

Jonathan Rochkind

May 5, 2014, 10:22:26 AM
to vcr-...@googlegroups.com
At first I thought, wow, this sounds great, I'd love to make my recorded
cassettes human-readable again. It's hugely painful that they are not.

Then I got worried about making the system under test behave differently
than the actual production system. Many times, in my actual systems,
I've encountered bugs related to HTTP response content tagged as
'binary' that should have been tagged as a proper encoding before the
system tried to do something with it requiring that; and/or HTTP
response content containing illegal bytes for the encoding it's supposed
to be in (usually UTF8).

I'm worried this change is going to make the system under test behave
differently, in a way that will hide these bugs or make them reproduce
differently. As they are some of the trickiest bugs to find and debug, I
don't want to risk changing them under test; I need them to reproduce
properly under test.

ga...@ithouse.lv

May 28, 2014, 7:51:17 AM
to vcr-...@googlegroups.com, pacog...@gmail.com
For me, on Ruby >= 2.1, String#scrub worked:

c.before_record do |i|
  i.response.body.scrub
end
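
One thing worth checking with this approach, sketched here in plain Ruby: `String#scrub` returns a cleaned copy and leaves the receiver untouched, while `String#scrub!` modifies the string in place. If the hook's return value is not used (an assumption, not something verified against VCR here), the in-place variant would be the one that changes what gets recorded. `invalid_utf8_body` is a made-up helper for the demonstration.

```ruby
# Hypothetical sample: a UTF-8-tagged body containing a byte (\xFF) that is
# invalid in UTF-8.
def invalid_utf8_body
  "abc\xFFdef".dup.force_encoding('UTF-8')
end

body    = invalid_utf8_body
cleaned = body.scrub('?')    # returns a cleaned copy

puts body.valid_encoding?    # the receiver still holds the invalid byte
puts cleaned.valid_encoding? # the copy has it replaced

body.scrub!('?')             # replaces the invalid byte in place
puts body.valid_encoding?
```

Also note that a string tagged ASCII-8BIT is always considered validly encoded, so scrubbing it changes nothing; scrubbing only has an effect once the string carries a text encoding such as UTF-8.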