How do you deal with encodings in Treetop and Ruby 1.9?

Nikolai Weibull

Nov 4, 2011, 7:10:43 AM
to treet...@googlegroups.com
Hi!

How should I deal with encodings in Treetop and Ruby 1.9? Treetop
doesn’t currently write an encoding comment, such as

# -*- coding: utf-8 -*-

to the generated .rb files, so if I use UTF-8 strings in my literals,
Ruby may generate “invalid multibyte char (US-ASCII)” warnings when
run in an environment that doesn’t default to UTF-8 (or however this
works in Ruby 1.9.3).

I suggest that Treetop should look for encoding comments in .treetop
files and, if found, copy them to the output. I guess this would also
force Treetop to have to deal with the encoding of the output.

As this would require some substantial work on Treetop, is there a
simpler way to get this working?
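
The suggestion above (look for an encoding comment in the .treetop file and copy it to the output) might be sketched roughly as follows. This is only an illustration: find_encoding_comment and with_encoding_comment are hypothetical helper names, not part of Treetop, and the pattern is a simplified approximation of what Ruby accepts as a magic comment.

```ruby
# Hypothetical helpers, not part of Treetop's API.
# Simplified pattern for an encoding magic comment such as
# "# -*- coding: utf-8 -*-" or "# encoding: UTF-8".
MAGIC_COMMENT = /\A#.*coding\s*[:=]\s*([\w.-]+)/

# Returns the encoding comment line if one appears in the first two
# lines of the grammar source, otherwise nil.
def find_encoding_comment(source)
  source.each_line.first(2).find { |line| line =~ MAGIC_COMMENT }
end

# Prepends the grammar's encoding comment (if any) to the generated Ruby
# code, so that it ends up on the first line of the output.
def with_encoding_comment(grammar_source, generated_ruby)
  comment = find_encoding_comment(grammar_source)
  return generated_ruby if comment.nil?
  "#{comment.chomp}\n#{generated_ruby}"
end
```

A grammar without such a comment passes through unchanged, so existing grammars would not be affected under this sketch.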

Dmitry Mozzherin

Nov 4, 2011, 11:18:36 AM
to treet...@googlegroups.com
The "# Autogenerated" comment emitted by the rules->ruby compiler
prevents me from setting UTF-8 encoding in Ruby 1.9, since Ruby only
honors an encoding comment at the very top of the file. I worked around
it with a rake task. It would be nice to remove the "# Autogenerated"
comment, or move it after # encoding: UTF-8

Rake task:

task :tt do
  ['scientific_name_clean', 'scientific_name_dirty',
   'scientific_name_canonical'].each do |name|
    # `dir` is assumed to be defined elsewhere in the Rakefile.
    file = "#{dir}/lib/biodiversity/parser/#{name}"
    rf = "#{file}.rb"
    FileUtils.rm(rf) if File.exist?(rf)
    system("tt #{file}.treetop")
    # Getting around a bug in Treetop which prevents setting UTF-8
    # encoding in ruby19: drop the leading "# Autogenerated" header and
    # any blank lines that follow it.
    skip_head = false
    File.open("#{rf}.tmp", 'w') do |out|
      File.open(rf) do |src|
        src.each_with_index do |l, i|
          skip_head = !!l.match(/^# Autogenerated/) if i == 0
          if skip_head && (l.strip == '' || l.match(/^# Autogenerated/))
            next
          else
            skip_head = false
            out.write(l)
          end
        end
      end
    end
    FileUtils.mv("#{rf}.tmp", rf)
  end
end
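
The header-stripping step of the task above could also be written as a small function on the file contents, which may be easier to test in isolation. This is a minimal sketch that assumes the generated file starts with a "# Autogenerated" line; strip_autogenerated_header is a hypothetical name, not something Treetop provides.

```ruby
# Hypothetical helper: removes a leading "# Autogenerated" header and the
# blank lines after it, so that whatever follows (for example an
# "# encoding: UTF-8" comment) becomes the first line of the file.
# Text that does not start with the header is returned unchanged.
def strip_autogenerated_header(text)
  lines = text.lines.to_a
  return text unless lines.first.to_s.start_with?('# Autogenerated')
  lines.drop_while { |l| l.strip.empty? || l =~ /^# Autogenerated/ }.join
end
```

With this, the loop in the task would reduce to reading the generated file, passing its contents through the function, and writing the result back.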

Dmitry

Nikolai Weibull

Nov 11, 2011, 4:26:21 PM
to treet...@googlegroups.com

Yeah, that’s what I’m doing as well, although not as contrived:

rule '.rb' => ['.treetop'] do |t|
  require 'treetop' unless defined? Treetop
  when_writing 'tt %s' % t.source do
    puts 'tt %s' % t.source if verbose
    Treetop::Compiler::GrammarCompiler.new.compile(t.source, t.name)
    contents = File.read(t.name)
    File.open(t.name, 'wb') do |f|
      f.puts("# -*- coding: utf-8 -*-", contents)
    end
  end
end
