String.strip with UTF-8

Erik E.

Jan 12, 2011, 4:28:38 PM
Hi

I can't strip the leading whitespace (or at least what looks like
whitespace) from a string in Ruby 1.9.2:


ruby-1.9.2-p0 :002 > d.entity
=> " United Arab Emirates"
ruby-1.9.2-p0 :003 > d.entity.strip
=> " United Arab Emirates"
ruby-1.9.2-p0 :004 > d.entity.class
=> String
ruby-1.9.2-p0 :005 > d.entity.encoding
=> #<Encoding:UTF-8>
ruby-1.9.2-p0 :006 >

This is inside the Rails 3.0.3 console.

Erik

--
Posted via http://www.ruby-forum.com/.

Peter Vandenabeele

Jan 12, 2011, 5:50:49 PM
Erik E. wrote in post #974416:

Hi, I made a fresh install with rvm 1.9.2-p0 and rails 3.0.3
and I cannot reproduce your problem. Maybe you could try to
replay what I did and see if you can still reproduce it?

Also, to examine that first character in detail, what is the
result when you try this:

009:0> d.entity.bytes.to_a[0..5]
=> [32, 85, 110, 105, 116, 101]

I see a "regular" space (character 32 in decimal notation)
as first character.

HTH,

Peter


peterv@ASUS:~/ra/apps/trials$ rvm install 1.9.2-p0
/home/peterv/.rvm/rubies/ruby-1.9.2-p0, this may take a while depending
on your cpu(s)...

ruby-1.9.2-p0 - #fetching
..
Install of ruby-1.9.2-p0 - #complete

peterv@ASUS:~/ra/apps/trials$ rvm use 1.9.2-p0
Using /home/peterv/.rvm/gems/ruby-1.9.2-p0

peterv@ASUS:~/ra/apps/trials$ rvm gemset create rails3
'rails3' gemset created (/home/peterv/.rvm/gems/ruby-1.9.2-p0@rails3).

peterv@ASUS:~/ra/apps/trials$ rvm gemset use rails3
Now using gemset 'rails3'

peterv@ASUS:~/ra/apps/trials$ gem install rails --no-rdoc --no-ri
Successfully installed activesupport-3.0.3
Successfully installed builder-2.1.2
Successfully installed i18n-0.5.0
Successfully installed activemodel-3.0.3
Successfully installed rack-1.2.1
Successfully installed rack-test-0.5.7
Successfully installed rack-mount-0.6.13
Successfully installed tzinfo-0.3.23
Successfully installed abstract-1.0.0
Successfully installed erubis-2.6.6
Successfully installed actionpack-3.0.3
Successfully installed arel-2.0.6
Successfully installed activerecord-3.0.3
Successfully installed activeresource-3.0.3
Successfully installed mime-types-1.16
Successfully installed polyglot-0.3.1
Successfully installed treetop-1.4.9
Successfully installed mail-2.2.14
Successfully installed actionmailer-3.0.3
Successfully installed thor-0.14.6
Successfully installed railties-3.0.3
Successfully installed bundler-1.0.7
Successfully installed rails-3.0.3
23 gems installed

peterv@ASUS:~/ra/apps/trials$ rails new issue_with_strip
create
..
create vendor/plugins/.gitkeep
peterv@ASUS:~/ra/apps/trials$ cd issue_with_strip/
peterv@ASUS:~/ra/apps/trials/issue_with_strip$ bundle install
Fetching source index for http://rubygems.org/
Using rake (0.8.7)
..
Using rails (3.0.3)
Installing sqlite3-ruby (1.3.2) with native extensions
Your bundle is complete! Use `bundle show [gemname]` to see where a
bundled gem is installed.

peterv@ASUS:~/ra/apps/trials/issue_with_strip$ rails g model D entity:string
invoke active_record
create db/migrate/20110112222955_create_ds.rb
create app/models/d.rb
invoke test_unit
create test/unit/d_test.rb
create test/fixtures/ds.yml

peterv@ASUS:~/ra/apps/trials/issue_with_strip$ rake db:migrate
(in /home/peterv/data/back/rails-apps/apps/trials/issue_with_strip)
== CreateDs: migrating =======================================================
-- create_table(:ds)
-> 0.0010s
== CreateDs: migrated (0.0011s) ==============================================

peterv@ASUS:~/ra/apps/trials/issue_with_strip$ rails c
Loading development environment (Rails 3.0.3)
001:0> IRB.prompt_mode=:RVM # this is a local patch
=> :RVM
ruby-1.9.2-p0 :002 > d = D.create :entity => " United Arab Emirates"
=> #<D id: 1, entity: " United Arab Emirates", created_at: "2011-01-12
22:31:21", updated_at: "2011-01-12 22:31:21">
ruby-1.9.2-p0 :003 > d.entity
=> " United Arab Emirates"
ruby-1.9.2-p0 :004 > d.entity.strip
=> "United Arab Emirates"
ruby-1.9.2-p0 :005 > d.entity.class
=> String
ruby-1.9.2-p0 :006 > d.entity.encoding
=> #<Encoding:UTF-8>
ruby-1.9.2-p0 :007 > exit

peterv@ASUS:~/ra/apps/trials/issue_with_strip$ rails c
Loading development environment (Rails 3.0.3)
001:0> d = D.find :last
=> #<D id: 1, entity: " United Arab Emirates", created_at: "2011-01-12
22:31:21", updated_at: "2011-01-12 22:31:21">
002:0> d.entity
=> " United Arab Emirates"
003:0> d.entity.strip
=> "United Arab Emirates"


Erik E.

Jan 12, 2011, 6:06:15 PM
Thank you for the quick replies, David & Peter. I was upgrading Ruby to
see if it made a difference, but I can now see that the first character
is not a space, which explains why it didn't strip:

Loading development environment (Rails 3.0.3)

ruby-1.9.2-p136 :001 > d = Domain.last
=> #<Domain id: 2055, classification: "Internationalized Country Code
Top Level Domain", dns_name: "xn--mgbaam7a8h", idn_name: "امارات.",
entity: " United Arab Emirates", explanation: "imārāt", notes: nil,
related_id: 1795, idn: true, dnssec: false, created_at: "2011-01-12
19:04:54", updated_at: "2011-01-12 19:04:54">
ruby-1.9.2-p136 :002 > d.entity
=> " United Arab Emirates"
ruby-1.9.2-p136 :003 > d.entity.class
=> String
ruby-1.9.2-p136 :004 > d.entity.encoding
=> #<Encoding:UTF-8>
ruby-1.9.2-p136 :005 > d.entity[0].ord
=> 160
ruby-1.9.2-p136 :006 > d.entity.bytes.to_a
=> [194, 160, 85, 110, 105, 116, 101, 100, 32, 65, 114, 97, 98, 32, 69,
109, 105, 114, 97, 116, 101, 115]



David Masover

Jan 12, 2011, 5:35:15 PM
On Wednesday, January 12, 2011 03:28:38 pm Erik E. wrote:
> Hi
>
> I can't strip the leading whitespace (or what at least looks like
> whitespace) from a Ruby 1.9.2 string
>
>
> ruby-1.9.2-p0 :002 > d.entity
> => " United Arab Emirates"
> ruby-1.9.2-p0 :003 > d.entity.strip
> => " United Arab Emirates"
> ruby-1.9.2-p0 :004 > d.entity.class
> => String
> ruby-1.9.2-p0 :005 > d.entity.encoding
> => #<Encoding:UTF-8>
> ruby-1.9.2-p0 :006 >
>
> It's inside the Rails 3.0.3 console..

Try this:

d.entity[0].ord

I'm not sure how useful that will be, but you can compare it to the value
for a space. It _seems_ to be Unicode-aware:

ruby-1.9.2-p136 :020 > '☃'.ord
=> 9731
ruby-1.9.2-p136 :021 > _.to_s 16
=> "2603"
ruby-1.9.2-p136 :022 > "\u2603"
=> "☃"

And for good measure:

ruby-1.9.2-p136 :023 > _.ord
=> 9731

(If you're wondering, that underscore means "The result of the last command I
entered into IRB." It's fantastically useful, though it gets annoying when you
want to repeat commands using up arrow, etc.)

So, if you get something other than:

ruby-1.9.2-p136 :024 > ' '.ord
=> 32

...then it's not a space. At that point, maybe report a bug, but maybe you'll
also be able to work around it with a regex or something.
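
The check above can be extended to the whole string. This is only a rough sketch: `code_points` is a hypothetical helper name, and the sample string is an assumed stand-in for the real value, not Erik's actual data.

```ruby
# Hypothetical helper: list every character's code point in U+XXXX form,
# so invisible or look-alike characters stand out.
def code_points(str)
  str.each_char.map { |ch| format("U+%04X", ch.ord) }
end

sample = "\u00A0Un"              # assumed sample: U+00A0 followed by "Un"
p code_points(sample)            # ["U+00A0", "U+0055", "U+006E"]
p code_points(" Un").first       # "U+0020", an ordinary space
```

Anything other than U+0020 in the first position means the leading character is not a plain space.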

Jonas Pfenniger (zimbatm)

Jan 12, 2011, 6:59:50 PM
2011/1/12 Erik E. <erik...@gmail.com>:

> Thank you for quick reply David & Peter, I was upgrading Ruby to see if
> it made a difference, but I can see it's not a space now which explains
> why it didn't strip

Yeah, it's the dreaded non-breaking space [1]. Unfortunately, somebody
thought it would be nice to map Alt+Space to this character on some
keymaps (like mine, which is Swiss-French). If you're on a Mac, see my
solution here:
http://0x2a.im/2009/04/16/terminal-unicode-problem-2.html


[1]: https://secure.wikimedia.org/wikipedia/en/wiki/Non-breaking_space
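
For reference, the 160 in Erik's session is code point U+00A0 (NO-BREAK SPACE), and the leading bytes 194, 160 are its UTF-8 encoding. A minimal sketch of why `strip` leaves it alone, with the string value reconstructed here as an assumption:

```ruby
nbsp   = "\u00A0"                       # NO-BREAK SPACE
entity = nbsp + "United Arab Emirates"  # assumed reconstruction of the value

p nbsp.ord                  # 160
p nbsp.bytes.to_a           # [194, 160]
p entity.strip == entity    # true: String#strip only removes ASCII whitespace and NUL
p " United".strip           # "United"
```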

Erik E.

Jan 12, 2011, 7:43:36 PM
Cool, thanks for that! I can just gsub/gsub! it out now that I know what
it is.
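
One way the gsub workaround might look, as a sketch only. The method name `replace_nbsp_and_strip` is made up for illustration; it is not something from Rails or the standard library.

```ruby
# Hypothetical helper: turn every NO-BREAK SPACE into an ordinary space,
# then let String#strip trim both ends as usual.
def replace_nbsp_and_strip(str)
  str.gsub("\u00A0", " ").strip
end

p replace_nbsp_and_strip("\u00A0United Arab Emirates")  # "United Arab Emirates"
```

Note that this also converts any no-break spaces inside the string into plain spaces, which may or may not be wanted.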

Eric Hodel

Jan 13, 2011, 1:56:09 PM
On Jan 12, 2011, at 16:43, Erik E. wrote:

> Cool, thanks for that! I can just gsub/gsub! it out now that I know what
> it is.

That will work if NO-BREAK SPACE is the only space you'll encounter.

s.gsub(/\A[[:space:]]*(.*?)[[:space:]]*\z/m) { $1 }

will remove:
Space_Separator | Line_Separator | Paragraph_Separator | 0009 | 000A | 000B | 000C | 000D | 0085

See section 6 of http://www.geocities.jp/kosako3/oniguruma/doc/RE.txt

PS: Note that s.gsub(/…(…)…/, '\1') may alter the encoding of the result string.
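
A variant of the same idea, offered as a sketch rather than a drop-in replacement: it trims each end with its own anchored pattern, so it behaves the same whether or not the string contains newlines. `unicode_strip` is a hypothetical name.

```ruby
# Hypothetical helper: strip leading and trailing Unicode whitespace
# using the [[:space:]] POSIX bracket, one end at a time.
def unicode_strip(str)
  str.sub(/\A[[:space:]]+/, "").sub(/[[:space:]]+\z/, "")
end

p unicode_strip("\u00A0 United Arab Emirates\u2003\t")  # "United Arab Emirates"
p unicode_strip("\u00A0line one\nline two\u00A0")       # "line one\nline two"
```

Interior whitespace, including newlines and no-break spaces between words, is left untouched.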
