strutil module

Michael Muller

May 10, 2011, 10:05:15 AM
to crack-l...@googlegroups.com
Hi all,

Now that we have mostly-working generics, I've decided to consolidate
our multiple copies of StringArray and split() into a new module:
crack.strutil

crack.strutil will initially contain:

A StringArray class, derived from Array[String]. In addition to all
the Array[String] goodness, StringArray will contain:
- A writeTo() method that writes string reprs (for example, ['first',
'second', 'third'])
- A couple of join() methods (join() -> joins with no delimiters,
join(Buffer buf) -> joins using 'buf' as a delimiter, join(byte b) ->
joins using 'b' as a delimiter)
- A method to convert the contents to CStrings (if they aren't already)

Several flavors of split() methods:
- split(String s) -> splits the string with whitespace delimiters,
'' -> [''], 'foo bar' -> ['foo', 'bar'], ' this is a test ' -> ['',
'this', 'is', 'a', 'test', '']
- split(String s, byte delim) -> splits the string with single
instances of 'delim' as a delimiter: split('fooxxbar', b'x') -> ['foo',
'', 'bar']
- split(String s, String delim) -> splits the string with 'delim' as
a delimiter: split('a..b..c', '..') -> ['a', 'b', 'c']
- split(String s, Splitter splitter) -> lets some (as yet undefined)
Splitter interface decide how to split the string
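The split() semantics listed above can be sketched in Python as follows. This is only an illustration under stated assumptions: the function names split_ws, split_delim and split_with are invented here, the whitespace set is taken to be ASCII space, tab, CR and LF, and the still-undefined Splitter interface is modelled as a plain callable.

```python
import re

# Assumed ASCII whitespace set; a run of whitespace counts as one delimiter.
_ASCII_WS = re.compile(r"[ \t\r\n]+")


def split_ws(s):
    # Leading or trailing whitespace produces empty strings at the ends,
    # and an empty input produces a single empty string.
    return _ASCII_WS.split(s)


def split_delim(s, delim):
    # Every single occurrence of delim separates two fields, so adjacent
    # delimiters produce an empty field between them.
    return s.split(delim)


def split_with(s, splitter):
    # Hypothetical Splitter: any callable that maps a string to fields.
    return list(splitter(s))


print(split_ws(" this is a test "))
print(split_delim("fooxxbar", "x"))
print(split_delim("a..b..c", ".."))
print(split_with("k=v", lambda s: s.split("=")))
```

Note that this whitespace behaviour differs from Python's own str.split() with no arguments, which drops the empty strings at the ends.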

I expect there'll be more things that we want to add to strutil, but
this is a good start.

Any questions or suggestions?

Conrad Steenberg

May 10, 2011, 2:13:05 PM
to crack-l...@googlegroups.com
Hi Michael,

Sounds good.

I needed some filling and wrapping code for cmdline.crk, so strutil
would be a good place for that, right?

Can you add a few lines to the wiki or the manual to clarify how
generics work?

Thanks,
Conrad



Michael Muller

May 10, 2011, 3:37:49 PM
to crack-l...@googlegroups.com
On Tue, May 10, 2011 at 2:13 PM, Conrad Steenberg
<conrad.s...@caltech.edu> wrote:
> Hi Michael,
>
> Sounds good.
>
> I needed some filling and wrapping code for cmdline.crk, so strutil
> would be a good place for that, right?

Cutting and pasting the IRC discussion on this:

11:04 < pumphaus> mmuller: something like strarray.find(regexp) would be nice
11:05 <@mmuller> ok, we could do that
11:10 <@GRiD> so what distinguishes between something we'd want to add as a method to String, and something that's a util in another library? would it be that it depends on a utility class?
11:11 <@mmuller> yes. basically, whether we _can_ implement it in crack.lang
11:11 <@GRiD> ok
11:13 <@GRiD> well, string replace (regex and non) is useful
11:13 <@mmuller> yes, definitely
11:14 <@mmuller> we can do non-regex in String, we might want to do regex in, well, regex
11:14 <@GRiD> a bunch of simple mutation type utils like upper/lower/ucfirst can probably go in String
11:14 <@mmuller> sure
11:15 <@GRiD> what about something like md5
11:17 <@GRiD> hashing module i guess
11:17 <@mmuller> python has modules for md5 and sha, my initial reaction is that we should do the same
11:17 <@GRiD> nod
11:17 <@GRiD> there are a bunch of html specific string utilities as well (urlencode, parse url, etc) which should probably get their own module
11:19 <@mmuller> yes. "util" modules are always tough for this reason
11:20 <@GRiD> yeah. we've got lots of precedent at least

So in summary, crack.lang.String should be widely applicable stuff
that can be implemented without the post-bootstrapping libraries,
crack.strutil should be widely applicable stuff that requires the
higher level libraries, and more niche operations should go in their
own module.

Can you describe what you need in more detail?

Conrad Steenberg

May 10, 2011, 3:52:22 PM
to crack-l...@googlegroups.com
I'm thinking of stuff in 7.1.1 of
http://docs.python.org/library/string.html (mentioned below too)

as well as
http://docs.python.org/library/textwrap.html

and some of the methods from
http://docs.python.org/library/stdtypes.html#string-methods that we
don't want to cram into the String class itself.

I agree that the sha/md5/html stuff would be better in their own
modules.
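
For readers unfamiliar with the filling and wrapping functions referred to above, this is roughly what Python's textwrap module does. The sample text and the width of 20 are arbitrary choices for the example; nothing here is a proposed Crack API.

```python
import textwrap

text = "The quick brown fox jumps over the lazy dog near the river bank."

# wrap() returns a list of lines, each at most `width` characters long.
wrapped = textwrap.wrap(text, width=20)

# fill() returns the same lines joined with newlines, as a single string.
filled = textwrap.fill(text, width=20)

print(filled)
```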

Michael Muller

May 10, 2011, 9:38:56 PM
to crack-l...@googlegroups.com
On Tue, May 10, 2011 at 3:52 PM, Conrad Steenberg
<conrad.s...@caltech.edu> wrote:
> I'm thinking of stuff in 7.1.1 of
> http://docs.python.org/library/string.html  (mentioned below too)

String constants (ascii_letters, ascii_uppercase, ascii_lowercase) may
be good candidates for strutil, just because it's useful to have them
module-scoped and crack.lang wouldn't be a good module to have them
scoped to. Although I'm strongly tempted to uphold Crack's "encoding
neutrality" rule and make them part of a "crack.enc.ascii" module.

The textwrap functions are also possibly good candidates for strutil,
although once again, we could just have a "textwrap" module.

>
> and some of the methods from
>  http://docs.python.org/library/stdtypes.html#string-methods that we
> don't want to cram into the String class itself.

A lot of those things are encoding-specific (upper, isalpha,
expandtabs). Those that are not (find, index, startswith, endswith) I
think we should cram into the String class. I'm not as sure about the
encoding-specific stuff because of encoding neutrality.
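
To make the distinction concrete, here is a small Python illustration using a byte string, which is the closest Python analogue to an encoding-neutral string. The sample word and variable names are chosen for the example only.

```python
# UTF-8 bytes for "café": the final character takes two bytes (0xC3 0xA9).
raw = "café".encode("utf-8")

# Encoding-neutral operations only compare bytes, so they give the same
# answer whatever encoding the data happens to be in.
pos = raw.find(b"f")
has_prefix = raw.startswith(b"ca")
has_suffix = raw.endswith(b"\xc3\xa9")

# Encoding-specific operations need to know what the bytes mean.
# bytes.upper() only maps ASCII letters, so the accented character is
# left untouched, while str.upper() applies the Unicode case mapping.
ascii_upper = raw.upper()
unicode_upper = "café".upper()

print(pos, has_prefix, has_suffix, ascii_upper, unicode_upper)
```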

How do people feel about putting encoding specific stuff into an
"ascii" module? strutil currently has split(String) which splits by
ascii whitespace, but that's kind of exceptional since all of the
other split() functions will end up there and the rest of them are
encoding-neutral.

The other consideration is that we don't want to make this too
complicated, so it might be better to consolidate most of these kinds
of things into strutil or even crack.lang.String whenever possible.

I guess what I'm looking for is a simple set of rules that we can
explain in the documentation: "if you want a function that does X look
in Y ... "

Shannon Weyrick

May 11, 2011, 12:34:10 PM
to crack-l...@googlegroups.com
>
> How do people feel about putting encoding specific stuff into an
> "ascii" module? strutil currently has split(String) which splits by
> ascii whitespace, but that's kind of exceptional since all of the
> other split() functions will end up there and the rest of them are
> encoding-neutral.
>

I like the idea of keeping things encoding neutral, but on the other
hand it should be dirt simple to do all of the popular string
manipulations in ascii and unicode. If you have to go groping for which
module to import, which function to use just to strlower it's going to
be a drag.

Maybe it makes sense to build out a UString class that can assume a
unicode encoding and have all of the frequently used methods handily
available.

Shannon
