How should I deal with encodings in Treetop and Ruby 1.9? Treetop
doesn’t currently write encoding comments, for example
# -*- coding: utf-8 -*-
to the generated .rb files, so if I use UTF-8 strings in my literals,
Ruby may fail with “invalid multibyte char (US-ASCII)” errors when
run in an environment that doesn’t default to UTF-8 (or however this
works in Ruby 1.9.3).
I suggest that Treetop look for encoding comments in .treetop
files and, if it finds one, copy it to the output. I guess this would also
force Treetop to deal with the encoding of the output.
As this would require some substantial work on Treetop, is there a
simpler way to get this working?
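The suggestion above (copy the grammar's encoding comment into the output) can be approximated without touching Treetop, as a post-processing step run after tt has written the .rb file. This is a minimal sketch, assuming the .treetop file carries its magic comment on one of its first two lines; find_encoding_comment and copy_encoding_comment are hypothetical helpers, not part of Treetop's API.

```ruby
# Hypothetical helpers (not part of Treetop): copy a magic encoding
# comment from a .treetop grammar into the generated .rb file.

# Loose match for comments such as "# -*- coding: utf-8 -*-" or
# "# encoding: UTF-8".
MAGIC_COMMENT = /\A#.*coding[:=]\s*[\w.-]+/

# Ruby 1.9 honours a magic comment only on line 1 (or on line 2 after a
# shebang); this loose check simply looks at the first two lines.
def find_encoding_comment(path)
  first_two = File.open(path, 'rb') { |f| f.each_line.first(2) }
  line = first_two.find { |l| l =~ MAGIC_COMMENT }
  line && line.chomp
end

# Prepends the grammar's magic comment to the generated file.
# Returns true if the file was changed, false otherwise.
def copy_encoding_comment(treetop_path, rb_path)
  comment = find_encoding_comment(treetop_path)
  return false unless comment
  contents = File.binread(rb_path)
  # Do nothing if the generated file already starts with a magic comment.
  return false if contents.each_line.first.to_s =~ MAGIC_COMMENT
  # Binary I/O keeps the UTF-8 bytes of the generated code untouched.
  File.binwrite(rb_path, comment + "\n" + contents)
  true
end
```

It could be called right after tt runs, e.g. copy_encoding_comment("#{file}.treetop", "#{file}.rb"). Putting the comment before Treetop's own header is fine, since only line 1 matters to Ruby.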
Rake task:
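require 'fileutils'

# Assumption: `dir` (the project root) is defined elsewhere in the
# original Rakefile; this definition is only a stand-in.
dir = File.dirname(File.expand_path(__FILE__))

task :tt do
  %w[scientific_name_clean scientific_name_dirty
     scientific_name_canonical].each do |name|
    file = "#{dir}/lib/biodiversity/parser/#{name}"
    rf = "#{file}.rb"
    FileUtils.rm(rf) if File.exist?(rf)
    system("tt #{file}.treetop") or raise "tt failed for #{file}.treetop"

    # Getting around a bug in Treetop which prevents setting UTF-8
    # encoding in Ruby 1.9: drop the leading "# Autogenerated" header
    # and the blank lines after it, so that the next line of the
    # generated file (e.g. an encoding comment) becomes line 1,
    # where Ruby 1.9 looks for it.
    skip_head = false
    # Binary mode, so non-ASCII bytes are copied through unchanged.
    File.open("#{rf}.tmp", 'wb') do |out|
      File.open(rf, 'rb') do |src|
        src.each_with_index do |l, i|
          skip_head = l.match(/^# Autogenerated/) if i == 0
          if skip_head && (l.strip == '' || l.match(/^# Autogenerated/))
            next
          else
            skip_head = false
            out.write(l)
          end
        end
      end
    end
    FileUtils.mv("#{rf}.tmp", rf)
  end
end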
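To see what the loop in that task does to a generated file without running tt, the same logic can be applied to a string. strip_autogenerated_header is a hypothetical name, and the sample input below is made up rather than real Treetop output.

```ruby
# Hypothetical, string-based version of the header-stripping loop in
# the Rake task above.
def strip_autogenerated_header(source)
  skip_head = false
  kept = []
  source.each_line.with_index do |l, i|
    # Only start skipping if the very first line is the header.
    skip_head = !!(l =~ /^# Autogenerated/) if i == 0
    # Still inside the header block: drop header and blank lines.
    next if skip_head && (l.strip.empty? || l =~ /^# Autogenerated/)
    # First real line: keep it and everything after it.
    skip_head = false
    kept << l
  end
  kept.join
end

generated = "# Autogenerated from a Treetop grammar. Edits may be lost.\n\n\n" \
            "# encoding: utf-8\nmodule Example\nend\n"
puts strip_autogenerated_header(generated)
```

Blank lines later in the file are kept, because skip_head is reset at the first non-header line.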
Dmitry
Yeah, that’s what I’m doing as well, although not as contrived:
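# Build foo.rb from foo.treetop and prepend a magic encoding comment.
rule '.rb' => ['.treetop'] do |t|
  require 'treetop' unless defined? Treetop
  # when_writing skips the block when Rake's nowrite flag is set.
  when_writing 'tt %s' % t.source do
    puts 'tt %s' % t.source if verbose
    Treetop::Compiler::GrammarCompiler.new.compile(t.source, t.name)
    contents = File.read(t.name)
    # Ruby 1.9 only honours the magic comment on line 1 (or line 2
    # after a shebang), so write it before everything else.
    File.open(t.name, 'wb') do |f|
      f.puts("# -*- coding: utf-8 -*-", contents)
    end
  end
end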
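The rule prepends the comment unconditionally, which works because compile rewrites the .rb file each time. If Treetop ever started copying encoding comments itself, as suggested at the top of the thread, you would get two. A hypothetical idempotent variant of that step (with_coding_comment and add_coding_comment! are illustrative names, not Rake or Treetop API):

```ruby
# Hypothetical variant of the prepend step in the rule above: the magic
# comment is only added if line 1 does not already carry one, so running
# it twice (or on a file that already has the comment) is harmless.
MAGIC_COMMENT = /\A#.*coding[:=]\s*[\w.-]+/

def with_coding_comment(contents, encoding = 'utf-8')
  return contents if contents.each_line.first.to_s =~ MAGIC_COMMENT
  "# -*- coding: #{encoding} -*-\n" + contents
end

# Rewrites the file in place; binary I/O leaves the bytes untouched.
def add_coding_comment!(path, encoding = 'utf-8')
  contents = File.binread(path)
  updated = with_coding_comment(contents, encoding)
  File.binwrite(path, updated) unless updated.equal?(contents)
  updated
end
```

Inside the rule, the File.read/File.open pair could then be replaced by add_coding_comment!(t.name).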