x/text/runes: How can I replace LF with CRLF?


mhhcbon

Jun 24, 2016, 4:54:35 PM
to golang-nuts
Hi,

I have a small func like this


func WriteAsWindows1252(src string, dst string) error {
	bSrc, err := ioutil.ReadFile(src)
	if err != nil {
		return err
	}

	bDst := make([]byte, len(bSrc)*2)
	// Replace every non-ASCII rune with '?' before encoding.
	replaceNonAscii := runes.Map(func(r rune) rune {
		if r > unicode.MaxASCII {
			return rune('?')
		}
		return r
	})
	transformer := transform.Chain(replaceNonAscii, charmap.Windows1252.NewEncoder())
	// Transform reports how many bytes of bDst it filled; only those are written.
	nDst, _, err := transformer.Transform(bDst, bSrc, true)
	if err != nil {
		return err
	}

	return ioutil.WriteFile(dst, bDst[:nDst], 0644)
}

I would like to add a replacement of \n by \r\n.

I don't see how I can do that, since a rune can hold only \r or \n but not both, and runes.Map takes a function that returns a single rune, if I'm not mistaken.

Is there a way to achieve this with Chain? Or do I have to go with bytes.Replace (https://golang.org/pkg/bytes/#Replace)?

BTW, is this the correct way to encode a UTF-8 file to Windows-1252?

Thanks!

mhhcbon

Jun 24, 2016, 5:00:20 PM
to golang-nuts
I forgot to mention another difficulty I have with the replacement.

The function passed to runes.Map receives only one rune at a time:
runes.Map(func(r rune) rune {})

If the file already contains \r\n, I guess I will be doubling the \r, resulting in an ugly \r\r\n.

Any ideas?

Matt Harden

Jun 25, 2016, 11:46:59 AM
to mhhcbon, golang-nuts
Don't use x/text/runes for this. It's overkill.

import "strings"
...
strings.NewReplacer("\r\n", "\r\n", "\r", "\r\n").Replace(mystring)


--
You received this message because you are subscribed to the Google Groups "golang-nuts" group.
To unsubscribe from this group and stop receiving emails from it, send an email to golang-nuts...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Matt Harden

Jun 25, 2016, 11:47:41 AM
to mhhcbon, golang-nuts
Sorry, make that

strings.NewReplacer("\r\n", "\r\n", "\n", "\r\n").Replace(mystring)
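
For reference, a minimal runnable sketch of the corrected replacer. The helper name toCRLF and the sample input are made up for illustration. Listing "\r\n" first and mapping it to itself lets the replacer consume existing CRLF pairs unchanged, so only a lone "\n" gains a "\r". A lone "\r" is left alone.

```go
package main

import (
	"fmt"
	"strings"
)

// toCRLF (hypothetical helper name) turns every lone LF into CRLF.
// The first pair maps CRLF to itself so existing pairs are skipped
// as a unit and never end up as "\r\r\n".
func toCRLF(s string) string {
	return strings.NewReplacer("\r\n", "\r\n", "\n", "\r\n").Replace(s)
}

func main() {
	fmt.Printf("%q\n", toCRLF("unix\nalready dos\r\nlast"))
}
```

The whole input has to be in memory as a string, which is the limitation raised later in the thread.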

mhhcbon

Jun 25, 2016, 2:58:01 PM
to golang-nuts, cpasmabo...@gmail.com
Thanks! Works for me. I'm looking forward to implementing text stream processing. Next time maybe!

mp...@golang.org

Jun 28, 2016, 4:42:09 AM
to Matt Harden, mhhcbon, golang-nuts
This does not work in cases where someone wants to use a Transformer in a streaming context (e.g. to create a Reader or Writer or to include the Transform in a Chain).

It may be useful to add something like strings.Replacer to runes as well.

Alternatively, I once implemented a generic rewriter for package runes that makes it much easier to create a Transform for arbitrary rewrites than writing one from scratch. It is a bit more involved than a Replacer. It was deemed a bit out of place in the runes package, so it was never submitted. I could add it to my GitHub repo, though, if there is any interest.

Tamás Gulácsi

Jun 28, 2016, 5:54:25 AM
to golang-nuts
I've used bufio.Scanner to implement a custom transforming stream reader.

mhhcbon

Jun 28, 2016, 6:36:10 AM
to golang-nuts
> This does not work in cases where someone wants to use a Transformer in a streaming context (e.g. to create a Reader or Writer or to include the Transform in a Chain).

This really is what I was looking to implement, somehow:

src -> transform -> transform -> sink

> I've used bufio.Scanner to implement a custom transforming stream reader.

Indeed, that is a step toward a much better implementation than the previous solution. Thanks!

Is there any formalized stream-transform-like API in Go that I missed? Something like what another language implements :x

mp...@golang.org

Jun 28, 2016, 6:54:37 AM
to mhhcbon, golang-nuts
On Tue, Jun 28, 2016 at 12:36 PM, mhhcbon <cpasmabo...@gmail.com> wrote:
> Is there any formalized stream-transform-like API in Go that I missed? Something like what another language implements :x

Are you referring to something like streams in NodeJS?

The equivalent of this in Go would be io.Reader and io.Writer and friends. Transformers in x/text are lower-level: they are easier to implement and, above all, allow more efficient implementations of transforms. For text the latter is often quite important.

Once you have created a transform using Chain, you can convert it to a Reader or Writer, for instance with transform.NewReader or transform.NewWriter.

BTW, regarding your original problem, it is often more desirable to replace non-ASCII by encoding.ASCIISub (U+001a). This is the default behavior of the charmap.Windows1252 encoder. If you want to use "?" instead, it may be better to replace U+001a with '?' instead of just replacing non-ASCII.









mhhcbon

Jun 28, 2016, 8:37:35 AM
to golang-nuts, cpasmabo...@gmail.com
Yes, I was referring to Node.

Really just to illustrate (don't slap :), this is what I had in mind when I started to dig into this problem:

fs.createReadStream("some.file")
  .on('error', console.error.bind(console))
  .pipe(split(/\r?\n/))
  .pipe(through(function (byteData, enc, next) {
    next(null, byteData.toString().replace(/[^\x00-\x7F]/g, "?"))
  }))
  .pipe(through(function (byteData, enc, next) {
    next(null, byteData.toString() + "\r\n")
  }))
  .pipe(iconv.encodeStream('win1252'))
  .on('error', console.error.bind(console))
  .pipe(fs.createWriteStream('file-in-win1252.txt'))
  .on('error', console.error.bind(console))

As a simple developer, I worry much less about the size of the source; the simple implementation helps me reduce errors, the job gets done, and this is standard.
Now, IRL, this is not practical without a helper like mississippi.pipeline, and it will work fine only if some assumptions are met, and so on.

Better is preferable to worse when perfection is out of your scope, I guess.



> The equivalent of this in Go would be io.Reader and io.Writer and friends. Transformers in x/text are lower-level: they are easier to implement and, above all, allow more efficient implementations of transforms.
> For text the latter is often quite important.

> Once you have created a transform using Chain, you can convert it to a Reader or Writer, for instance with transform.NewReader or transform.NewWriter.

> BTW, regarding your original problem, it is often more desirable to replace non-ASCII by encoding.ASCIISub (U+001a). This is the default behavior of the charmap.Windows1252 encoder. If you want to use "?" instead, it may be better to replace U+001a with '?' instead of just replacing non-ASCII.

Again, thanks!
I'll do more research based on all those hints.

Tong Sun

Jun 28, 2016, 12:11:54 PM
to golang-nuts, cpasmabo...@gmail.com
I'm looking for something like that simple interface too. Please let me know if you have come up with something similar or found one. Thx. 

Tamás Gulácsi

Jun 28, 2016, 3:51:19 PM
to golang-nuts
Don't forget io.Pipe: an easy way to turn a Reader into a Writer, or vice versa, if that makes it easier.
For example, read in a loop into a big buffer, replace as you wish, and write the result to a pipe. Then return the reader half of the pipe.
This way you don't have to account for the reader's buffering.

mhhcbon

Jul 1, 2016, 4:13:46 PM
to golang-nuts
Hi,

thanks for the tip.

But I still feel there's a kind of mental gymnastics to apply, which remains unfriendly:

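package main

import (
	"compress/gzip"
	"io"
	"os"
)

func main() {
	pr, pw := io.Pipe()
	done := make(chan struct{})
	go func() {
		defer close(done)
		decoder, err := gzip.NewReader(pr)
		if err != nil {
			pr.CloseWithError(err)
			return
		}
		io.Copy(os.Stdout, decoder)
	}()
	archiver := gzip.NewWriter(pw)
	io.Copy(archiver, os.Stdin)
	archiver.Close() // flush the remaining compressed data and the gzip trailer
	pw.Close()       // signal EOF to the decoding goroutine
	<-done           // wait until everything has reached stdout before exiting
}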

Here I start with a through, then I go func it to a sink, next I create a transform, and finally I copy src to the preceding transform.
I feel like I'm playing the mega mumbo jumbo in and out of the stream. I would not like to have too many transforms :(

Just poking around this, I'd very much like to see something like this from the core.

I honestly admit I have no real understanding of what I'm doing (one day maybe I'll get it):

package main

import (
	"fmt"
	"os"
	// "bytes"
	// "encoding/base64"

	"github.com/mh-cbon/stream/stream3"
)

func main() {
	// src, _ := os.Open("main.go")
	stdin := os.Stdin
	s := stream.CreateStreamReader(stdin)
	// s.Pipe(stream.B64Encode(base64.StdEncoding)).Pipe(stream.CreateStreamWriter(os.Stdout))
	s.Pipe(stream.GzipEncode()).Pipe(stream.GzipDecode()).Pipe(stream.CreateStreamWriter(os.Stdout))
	err := s.Consume()
	if err != nil {
		fmt.Println(err)
	}
	stdin.Close()
}

I can only figure out that it works on the other end :o

But it's way more readable to me: src -> transform -> transform -> sink

Tong Sun

Jul 1, 2016, 4:36:42 PM
to mhhcbon, golang-nuts

On Fri, Jul 1, 2016 at 4:13 PM, mhhcbon wrote:
> I honestly admit i have no real understanding of what i m doing ...
>
> I can only figure out it works on the other end :o
>
> But its way more readable, to me, src -> transform -> transform -> sink

Then maybe check this one out?

You can do 

shaper.NewFilter().ApplyToUpper().ApplyToLower().ApplyReplace("test", "biscuit", -1).ApplyRegexpReplaceAll("(?i)ht(ml)", "X$1").ApplyTrim()...

and on and on, and you can build your own transform on top of that as well.

mhhcbon

Jul 1, 2016, 4:39:57 PM
to golang-nuts
Pretty cool. Why strings only?


On Friday, June 24, 2016 at 22:54:35 UTC+2, mhhcbon wrote:

Tong Sun

Jul 1, 2016, 4:48:55 PM
to mhhcbon, golang-nuts
I don't know whether you are replying to me from your following context. But assuming so, 

The reason it's string only is that I'm dealing with string only at the moment, and you can see from the source that the code is not mine, and I don't think I'm qualified enough to extend the functionality to bytes yet.


Tong Sun

Jul 1, 2016, 4:53:52 PM
to mhhcbon, golang-nuts
Adding Unix2Dos() and Dos2Unix() to it is really easy, and I'd be happy to take the patch. Let me know if you want it but are unable to do it.

mhhcbon

Jul 1, 2016, 5:08:03 PM
to golang-nuts, cpasmabo...@gmail.com
Thing is, now that I've read it twice, it's not a writable stream by nature.
It takes the input as a whole and transforms it using a pretty nice API.

To give an example of why it's not suitable for me: two days ago I was parsing git logs, which are a text stream.
Ideally, I do not want to load all that text into memory and then apply a big regexp to it, simply because it may be huge.
I'd prefer to read it in chunks and transform the data as it comes, maintaining only a little piece of information that records my state in the parsing operation.

I posted a gist of my terrible code:

https://gist.github.com/mh-cbon/765e80126a12357f37889db15ed7d3a5

Let's see.

Tong Sun

Jul 1, 2016, 5:18:03 PM
to golang-nuts
Actually that's how I'm using it as well. 

I use it to deal with huge xml files and transform only those pieces that I'm interested in. Anyway, I'm not expecting that one can fit all, but instead, just sharing the idea how transforming can be chained together in one particular way. 

Happy hacking. 
