substring from a string


Ruslan Mezentsev

Jun 30, 2011, 12:32:00 AM
to golang-nuts
Hi,
How do I get a substring of a string, for example the characters
from position 3 up to position 7:
"testestestest" -> "test"
Thanks.

Evan Shaw

Jun 30, 2011, 12:35:20 AM
to Ruslan Mezentsev, golang-nuts

s1 := "testestestest"
s2 := s1[3:7]

s2 contains "test"

- Evan

Ruslan Mezentsev

Jun 30, 2011, 12:53:19 AM
to golang-nuts
Thanks, but that does not work for Russian characters:
package main

import "fmt"

func main() {
    str1 := "тестестестест"
    str2 := str1[3:7]
    fmt.Println(str2)
}

Prints:
�с�

On Jun 30, 10:35, Evan Shaw <eds...@gmail.com> wrote:

bflm

Jun 30, 2011, 1:24:57 AM
to golang-nuts
On Jun 30, 6:53 am, Ruslan Mezentsev <rmib.em...@gmail.com> wrote:
> Thanks, but does not work for Russian characters:

func main() {
    str1 := "тестестестест"
    str2 := strint([]int(str1[3:7]))
    fmt.Println(str2)

}

But mixing UTF-8 encoding with code-point access by index like this is
probably the wrong approach in most cases. Either work only with []rune
(for random access), or only with UTF-8 (for sequential access). In the
latter case one can also range over the UTF-8 string to get the runes
one by one.
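The two access styles described above can be sketched as follows. This assumes a Go release from Go 1 onward, where the code-point type is spelled rune and the conversion is []rune(s) rather than []int(s). The helper name runeSubstring is made up for this example.

```go
package main

import "fmt"

// runeSubstring returns the code points of s in the half-open range
// [from, to). It converts the whole string to []rune first, which costs
// time and memory proportional to len(s) but allows random access.
// Hypothetical helper, not part of any Go package.
func runeSubstring(s string, from, to int) string {
	r := []rune(s)
	return string(r[from:to])
}

func main() {
	s := "тестестестест"

	// Random access: index into a []rune.
	fmt.Println(runeSubstring(s, 3, 7))

	// Sequential access: range over the UTF-8 string directly.
	// i is the byte offset at which each code point starts.
	for i, r := range s {
		fmt.Printf("%d:%c ", i, r)
	}
	fmt.Println()
}
```

Like ordinary slicing, runeSubstring panics if the indexes are out of range.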

Evan Shaw

Jun 30, 2011, 1:27:02 AM
to Ruslan Mezentsev, golang-nuts
On Thu, Jun 30, 2011 at 4:53 PM, Ruslan Mezentsev <rmib....@gmail.com> wrote:
> Thanks, but does not work for Russian characters:

Sorry, I should have mentioned that. bflm's solution will work for that case.

- Evan

bflm

Jun 30, 2011, 1:41:02 AM
to golang-nuts
Oops, I wrote that wrong:

func main() {
    str1 := "тестестестест"
    str2 := string([]int(str1)[3:7])
    fmt.Println(str2)
}

Sorry.

CrossWall

Jun 30, 2011, 1:46:39 AM
to golan...@googlegroups.com
package main

import "fmt"

func main() {
    s := "abcdefg"
    fmt.Println(s[2:5])
}

Ruslan Mezentsev

Jun 30, 2011, 2:18:19 AM
to golang-nuts
Forgive me for asking, but what is a rune?

Jessta

Jun 30, 2011, 2:53:55 AM
to Ruslan Mezentsev, golang-nuts
On Thu, Jun 30, 2011 at 4:18 PM, Ruslan Mezentsev <rmib....@gmail.com> wrote:
> Forgive me for asking but what is the rune?

'rune' is just a short name for a Unicode code point (almost a
character, but some characters are made up of multiple code points).
In Go it's just an int.
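For readers on a current Go release: rune is now a predeclared alias for int32 rather than int. A small sketch that prints the type and value of a code point follows; the helper describe is made up for this example.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// describe formats a code point as its Go type and its U+ notation.
// Hypothetical helper used only for this illustration.
func describe(r rune) string {
	return fmt.Sprintf("%T %U", r, r)
}

func main() {
	fmt.Println(describe('т'))

	// Number of bytes needed to encode the code point as UTF-8.
	fmt.Println(utf8.RuneLen('т'))

	// A string converts to one rune per code point, not one per byte.
	fmt.Println(len("тест"), len([]rune("тест")))
}
```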


>> But mixing UTF-8 encoding with code-point access by index like this is
>> probably the wrong approach in most cases. Either work only with []rune
>> (for random access), or only with UTF-8 (for sequential access). In the
>> latter case one can also range over the UTF-8 string to get the runes
>> one by one.

--
=====================
http://jessta.id.au

⚛

Jun 30, 2011, 3:08:57 AM
to golang-nuts
Maybe the "strings" package should define a function for converting a
character-offset into a byte-offset.

Or: http://golang.org/pkg/utf8/#String.Slice
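
A function converting a character offset into a byte offset could look roughly like this. It is only a sketch: byteOffset is a hypothetical name, not something the "strings" package provides, and "character" here means a single code point.

```go
package main

import "fmt"

// byteOffset returns the byte index at which the n-th code point of s
// starts, counting from zero. It returns len(s) when n equals the number
// of code points in s, and -1 when n is out of range.
// Hypothetical helper, not part of the standard library.
func byteOffset(s string, n int) int {
	if n < 0 {
		return -1
	}
	count := 0
	for i := range s { // i is the byte offset of each code point
		if count == n {
			return i
		}
		count++
	}
	if count == n {
		return len(s)
	}
	return -1
}

func main() {
	s := "тестестестест"
	from, to := byteOffset(s, 3), byteOffset(s, 7)
	// With byte offsets in hand, ordinary slicing works.
	fmt.Println(from, to, s[from:to])
}
```

Each call scans from the start of the string, so the cost grows with n.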

Jan Mercl

Jun 30, 2011, 3:35:41 AM
to golan...@googlegroups.com
Mind O(N^2).
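
One reading of this caution, offered as an assumption rather than as what Jan had in mind: finding the i-th code point in a UTF-8 string means scanning from the start, so doing that for every i is quadratic overall, while converting once to []rune is linear. The helper nthRune is made up for the sketch.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// nthRune scans s from the start and returns its n-th code point
// (counting from zero), or -1 if s has fewer code points.
// Each call costs time proportional to n. Hypothetical helper.
func nthRune(s string, n int) rune {
	for _, r := range s {
		if n == 0 {
			return r
		}
		n--
	}
	return -1
}

func main() {
	s := "тестестестест"
	count := utf8.RuneCountInString(s)

	// Quadratic overall: every lookup rescans from the beginning.
	for i := 0; i < count; i++ {
		fmt.Printf("%c", nthRune(s, i))
	}
	fmt.Println()

	// Linear overall: convert once, then index in constant time.
	r := []rune(s)
	for i := range r {
		fmt.Printf("%c", r[i])
	}
	fmt.Println()
}
```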

Jessta

Jun 30, 2011, 3:37:45 AM
to ⚛, golang-nuts
On Thu, Jun 30, 2011 at 5:08 PM, ⚛ <0xe2.0x...@gmail.com> wrote:
> Maybe the "strings" package should define a function for converting a
> character-offset into a byte-offset.
>
> Or: http://golang.org/pkg/utf8/#String.Slice

The "strings" package doesn't expect any particular encoding for the
data. It just deals with strings as immutable []byte.
The "utf8" package doesn't deal with actual characters either; it just
deals with converting UTF-8 encoded data into Unicode code points.
The "unicode" package is where this kind of functionality should go,
because working out what is actually a 'character' in Unicode is more
work than a simple slice.


> On Jun 30, 7:41 am, bflm <befelemepesev...@gmail.com> wrote:
>> Oops, I wrote wrong:
>>
>> func main() {
>>     str1 := "тестестестест"
>>     str2 := string([]int(str1)[3:7])
>>     fmt.Println(str2)
>>
>> }
>>
>> Sorry.

--
=====================
http://jessta.id.au

chris dollin

Jun 30, 2011, 4:07:08 AM
to Jessta, ⚛, golang-nuts
I think part of the string-slicing issue we started with ...

str1 := "тестестестест"
str2 := str1[3:7]

... is that 3 and 7 are magic numbers. In a real slicing, those
numbers have to come from somewhere sensible, since We
All Know that strings are byte-sequences not character-sequences.
And once the indexes have been found in a rune-respecting
way then the slice just works.
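
As an illustration of indexes that "come from somewhere sensible", here is a sketch that gets its byte offsets from strings.Index and then slices normally. The helper between and the sample input are invented for the example; it assumes both the text and the delimiters are valid UTF-8, in which case the offsets found fall on code-point boundaries.

```go
package main

import (
	"fmt"
	"strings"
)

// between returns the part of s that lies after the first occurrence of
// start and before the next occurrence of end. ok is false if either
// delimiter is missing. Hypothetical helper for illustration only.
func between(s, start, end string) (result string, ok bool) {
	i := strings.Index(s, start)
	if i < 0 {
		return "", false
	}
	i += len(start) // byte offset just past the start delimiter
	j := strings.Index(s[i:], end)
	if j < 0 {
		return "", false
	}
	return s[i : i+j], true
}

func main() {
	// No magic numbers: the offsets are computed from the content.
	fmt.Println(between("имя=тест;", "=", ";"))
}
```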

Chris

--
Chris "allusive" Dollin

peterGo

Jun 30, 2011, 4:34:21 AM
to golang-nuts
CrossWall,

I think that you are missing the point. s[2:5] is slicing the byte
array underlying the string. It's not slicing the UTF-8 encoded
characters forming the string. Therefore, outside the ASCII character
set, your solution doesn't work. For example,

package main

import "fmt"

func main() {
    s := "abcdefg"
    fmt.Println(s, len(s), []byte(s))
    // byte slice and ASCII character slice
    fmt.Println(s[2:5])

    s = "αβγδεζη"
    fmt.Println(s, len(s), []byte(s))
    // byte slice
    fmt.Println(s[2:5])
    // Unicode code point slice
    fmt.Println(string([]int(s)[2:5]))
}

Output:
abcdefg 7 [97 98 99 100 101 102 103]
cde
αβγδεζη 14 [206 177 206 178 206 179 206 180 206 181 206 182 206 183]
β
γδε
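
To see why s[2:5] goes wrong for the Greek string, one can check whether the byte offsets fall on code-point boundaries. A minimal sketch using utf8.RuneStart from the current "unicode/utf8" package follows; the helper names are made up.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// isBoundary reports whether byte offset i in s is the start of a code
// point or the end of the string. It assumes 0 <= i <= len(s).
// Hypothetical helper.
func isBoundary(s string, i int) bool {
	return i == len(s) || utf8.RuneStart(s[i])
}

// onRuneBoundaries reports whether s[from:to] would begin and end on
// code-point boundaries. Hypothetical helper.
func onRuneBoundaries(s string, from, to int) bool {
	return isBoundary(s, from) && isBoundary(s, to)
}

func main() {
	s := "αβγδεζη"
	// Offset 5 lands in the middle of a two-byte sequence.
	fmt.Println(onRuneBoundaries(s, 2, 5))
	// Offsets 2 and 8 both start a code point, so this slice is clean.
	fmt.Println(onRuneBoundaries(s, 2, 8), s[2:8])
}
```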


Peter

Rob 'Commander' Pike

Jun 30, 2011, 4:41:56 AM
to peterGo, golang-nuts
The thing you're looking for is utf8.String.

-rob

Ruslan Mezentsev

Jun 30, 2011, 5:21:46 AM
to golang-nuts
Thank you for your reply. This is what I need. )

str := utf8.NewString("тест本sтестест")
fmt.Println(str.Slice(3,7))

Prints:
т本sт
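
For Go releases whose standard library no longer ships utf8.String (as far as I know the type now lives outside it, in golang.org/x/exp/utf8string), the same Slice behaviour can be approximated with a small wrapper around []rune. The type runeString below is a made-up stand-in, not the original implementation.

```go
package main

import "fmt"

// runeString keeps a string as a slice of code points so that it can be
// indexed and sliced by code-point position.
// Hypothetical type, loosely modelled on the Slice method used above.
type runeString struct {
	runes []rune
}

// newRuneString converts s to code points once, up front.
func newRuneString(s string) *runeString {
	return &runeString{runes: []rune(s)}
}

// Slice returns the code points in the half-open range [i, j) as a string.
func (rs *runeString) Slice(i, j int) string {
	return string(rs.runes[i:j])
}

// RuneCount returns the number of code points.
func (rs *runeString) RuneCount() int {
	return len(rs.runes)
}

func main() {
	str := newRuneString("тест本sтестест")
	fmt.Println(str.Slice(3, 7))
}
```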

⚛

Jun 30, 2011, 6:40:27 AM
to golang-nuts
On Jun 30, 9:37 am, Jessta <jes...@jessta.id.au> wrote:
> On Thu, Jun 30, 2011 at 5:08 PM, ⚛ <0xe2.0x9a.0...@gmail.com> wrote:
> > Maybe the "strings" package should define a function for converting a
> > character-offset into a byte-offset.
>
> > Or:http://golang.org/pkg/utf8/#String.Slice

(I mean no disrespect) If you don't believe me, then believe Rob
'Commander' Pike. He suggested the same thing I did.

> The "strings" package doesn't expect any kind of encoding for the
> data. It just deals with strings as immutable []byte.
> The "utf8" package doesn't deal with actual characters either, it just
> deals with converting utf8 encoded data in to unicode code-points.

Apple is not an apple?

Jessta

Jun 30, 2011, 2:01:59 PM
to ⚛, golang-nuts
On Thu, Jun 30, 2011 at 8:40 PM, ⚛ <0xe2.0x...@gmail.com> wrote:
>> The "strings" package doesn't expect any kind of encoding for the
>> data. It just deals with strings as immutable []byte.
>> The "utf8" package doesn't deal with actual characters either, it just
>> deals with converting utf8 encoded data in to unicode code-points.
>
> Apple is not an apple?

I probably should have said 'glyph' instead of 'character'.
I was referring to the possible use of combining characters
(http://en.wikipedia.org/wiki/Combining_character),
where a 'glyph' entered by a user or displayed on a screen may consist
of multiple code points. Slicing on code points without understanding
what those code points mean might therefore drop code points that the
user expects to be there.
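
A small sketch of that problem, assuming a current Go release: "é" written as 'e' followed by U+0301 (combining acute accent) is two code points, so cutting after four code points drops the accent. The helper firstRunes is made up for the example.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// firstRunes returns the first n code points of s, or all of s if it has
// fewer than n. It knows nothing about combining sequences.
// Hypothetical helper.
func firstRunes(s string, n int) string {
	r := []rune(s)
	if n > len(r) {
		n = len(r)
	}
	return string(r[:n])
}

func main() {
	// "café" with the accent stored as a separate combining code point.
	s := "cafe\u0301"

	fmt.Println(len(s))                    // length in bytes
	fmt.Println(utf8.RuneCountInString(s)) // length in code points

	// The user sees four glyphs, but the fourth one spans two code
	// points, so this cut separates the 'e' from its accent.
	fmt.Println(firstRunes(s, 4))
}
```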

- jessta
--
=====================
http://jessta.id.au
