to_string and formatting

Christian Prinoth

Nov 17, 2011, 10:03:13 AM

to pystat...@googlegroups.com

Hi,

I am trying to get a "pretty" string out of a DataFrame, which can then be pasted into a text document (using a fixed-width font). I noticed that to_string has a formatters parameter, but I am not sure how it is supposed to be used.

In my case I would like to give a specific formatting to each row, i.e. the columns are different observations of the same data item. Can this be done with to_string?

Thanks

--
Christian Prinoth

Wouter Overmeire

Nov 18, 2011, 9:03:14 AM

to pystat...@googlegroups.com

Maybe this example helps. Note that formatters work on a column, not a row:

In [22]: myformatter = lambda x: '[%4.1f]' % x

In [23]: df = pandas.DataFrame(np.random.randn(10,5), columns=['A', 'B', 'C', 'D', 'E'])

In [24]: df
Out[24]:
   A        B        C         D       E
0 -0.823   -0.5622  -0.1358    0.1603 -0.1214
1  0.1626   0.8593   0.6968   -1.463  -1.806
2 -0.6216  -0.464   -1.452    -0.6792  1.598
3 -0.2007   0.03748  1.886    -2.072  -0.8733
4  1.197    0.834    0.004781  0.367   0.5884
5 -0.3199   0.4235   0.08371  -0.78   -0.1605
6 -1.942    0.3887  -0.2255    1.913  -0.03441
7  1.044   -0.4363  -1.037    -0.4635  0.4452
8 -0.01796 -1.253    0.302     0.3137 -1.386
9 -1.854    0.4694  -0.6945   -0.7007  0.6088

In [25]: print df.to_string(formatters={'A': myformatter, 'C': myformatter})
   A      B        C      D       E
0 [-0.8] -0.5622  [-0.1]  0.1603 -0.1214
1 [ 0.2]  0.8593  [ 0.7] -1.463  -1.806
2 [-0.6] -0.464   [-1.5] -0.6792  1.598
3 [-0.2]  0.03748 [ 1.9] -2.072  -0.8733
4 [ 1.2]  0.834   [ 0.0]  0.367   0.5884
5 [-0.3]  0.4235  [ 0.1] -0.78   -0.1605
6 [-1.9]  0.3887  [-0.2]  1.913  -0.03441
7 [ 1.0] -0.4363  [-1.0] -0.4635  0.4452
8 [-0.0] -1.253   [ 0.3]  0.3137 -1.386
9 [-1.9]  0.4694  [-0.7] -0.7007  0.6088
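
Since formatters are keyed by column, one way to approach the original per-row question is to format each row yourself and then print the resulting frame of strings. The following is only a sketch under assumptions: the sample data, the row labels and the row_formatters dict are made up for illustration, and it targets Python 3 with a recent pandas rather than the 2011 release used above.

```python
import pandas as pd

# Hypothetical data: each row is one data item, each column one observation.
df = pd.DataFrame(
    [[0.1234, 0.5678], [1234.2, 9876.8]],
    index=["ratio", "volume"],
    columns=["obs1", "obs2"],
)

# One formatter per row label (illustrative names, not a pandas feature).
row_formatters = {
    "ratio": lambda x: "%.2f%%" % (100 * x),
    "volume": lambda x: "%.0f" % x,
}

# Build a frame of strings: format every value with its row's formatter,
# then transpose back so rows are data items again.
formatted = pd.DataFrame(
    {label: [row_formatters[label](v) for v in df.loc[label]]
     for label in df.index},
    index=df.columns,
).T

text = formatted.to_string()
print(text)
```

Because the frame already holds strings, to_string only has to lay them out. If a transposed layout is acceptable, df.T.to_string(formatters=row_formatters) is a shorter alternative, since the row labels become column names.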

Christian Prinoth

Nov 18, 2011, 10:55:43 AM

to pystat...@googlegroups.com

Thanks!

--
Christian Prinoth

Texas P.

Dec 31, 2011, 4:23:08 PM

to pystatsmodels

Following this example, I did the following:

In [331]: DF=DataFrame({'i':[1,1,1],'j':[1,2,3],'k':[4,4,4],'mass_g_d':
[1.1,2.5,10.4]})

In [332]:

In [333]: intformatter = lambda x: '%10i' %x

In [334]: floatformatter = lambda x: '%10.5f' %x

In [335]: print(DF.to_string(columns=['k','i','j','mass_g_d'],
.....: index_names=False,
.....: colSpace=0,
.....: formatters={'k':intformatter,
.....: 'i':intformatter,
.....: 'j':intformatter,
.....: 'mass_g_d':floatformatter}))
k i j mass_g_d
0 4 1 1 1.10000
1 4 1 2 2.50000
2 4 1 3 10.40000

Maybe I'm doing something wrong, but I see 3 problems here:
1) I still see the index names, even though index_names=False
2) I'd like to have the option not to print the header. In to_csv
there is a header=False option. I tried it in to_string, hoping it
was a hidden option, but no dice.
3) Most important, the formatting isn't right. The k, i, and j
columns take 11 spaces, not 10. I tried it with colSpace=0 and
colSpace=None; both seem to put an extra space in there.

When I do the same command but writing to a file instead, I get the
following (with my real data):
k i j mass_g_d
76 1 122 70 99179
319 1 122 71 85806
562 1 123 70 78259
805 1 123 71 68174
5908 1 148 92 1.1712e+05
6151 1 148 93 1.7702e+05
6394 1 153 174 62498

The index names are still there and there's still an extra space for
each column. I also tried it with formats %9i and %9.5g but that
makes the float column no longer right-aligned, like so:
k i j mass_g_d
82 1 122 70 1.0611e+05
325 1 122 71 93298
568 1 123 70 74531
811 1 123 71 65825

What am I missing here? (it helps to view all this in a monospace
font)

Thanks.

Wouter Overmeire

Jan 3, 2012, 4:32:40 AM

to pystat...@googlegroups.com

On Saturday, 31 December 2011 at 22:23:08 UTC+1, Texas P. wrote the following:

1. When index_names=False, the index level names are not shown. The index values are always shown.
2. You can add it in the code and create a github pull request.
3. I spent some time on adding stylists to DataFrame.to_string() and DataFrame.to_html() and noticed the same thing. There is indeed an extra space between the columns. It did not really bother me, but to cross your t's and dot your i's, that extra space should not be there; it's a bug.

Wes McKinney

Jan 3, 2012, 11:42:49 AM

to pystat...@googlegroups.com

Indeed I see the problem and where the "bug" is caused. The reason the
extra space is there is to make space for the '-' sign in numerical
columns. In the default console formatting this is (at least in my
opinion) the most attractive choice. When you pass custom formatters
it should not do this so you have very explicit control over how
things look. Basically the default formatter for floats is something
like:

if x < 0:
    return '%.4f' % x
else:
    return ' %.4f' % x

the result is something that looks like

 B
 4
-1.5
 5.6
-6.7

instead of the much uglier

B
4
-1.5
5.6
6.7

In order to get the column label to line up on the first digit you
have to add a space.
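
The sign-padding idea can be tried in plain Python. This is a minimal sketch rather than the actual pandas code: the function name sign_padded is hypothetical, and the four decimal places are taken from the snippet above.

```python
def sign_padded(x):
    """Prefix non-negative values with a space so digits line up under '-'."""
    if x < 0:
        return '%.4f' % x
    return ' %.4f' % x


values = [4, -1.5, 5.6, -6.7]
lines = [sign_padded(v) for v in values]
print('\n'.join(lines))

# printf-style formatting has a built-in space flag with the same effect.
flag_lines = ['% .4f' % v for v in values]
```

Every string has the same width here, which is why the digits line up, and also why a column formatted this way is one character wider than the bare number.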

One solution is to right-justify the column name (it's currently
left-justified). I'll see what I can do-- GitHub issue for this would
be useful for me so I don't forget to look into it.

- Wes

Texas P.

Jan 3, 2012, 5:16:36 PM

to pystatsmodels

On Jan 3, 1:32 am, Wouter Overmeire <loda...@gmail.com> wrote:
> On Saturday, 31 December 2011 at 22:23:08 UTC+1, Texas P. wrote the following:
>

> 1. When index_names=False, the index level names are not shown. The index
> values are always shown.

OK. But for my purposes it would be nice if there was an option to
turn it off, like in "to_csv."

> 2. You can add it in the code and create a github pull request.

I don't know what that means.

> 3. I spent some time on adding stylists <https://github.com/wesm/pandas/issues/459> to DataFrame.to_string() and DataFrame.to_html() and noticed the same
> thing. There is indeed an extra space between the columns. It did not
> really bother me, but to cross your t's and dot your i's, that extra space
> should not be there, it's a bug.

For my purposes, I'm looking for something I can format exactly.
Having an extra placeholder for a minus sign may look nicer for the
console, but I'm writing files that become input to other programs and
the format needs to be precise.

I could always do this in a loop, but that feels kinda dirty.
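
For reference, the loop approach can be fairly compact. The sketch below assumes Python 3 and a recent pandas; the record layout (three 10-character integers followed by one 10.5f float) is taken from the formatters earlier in the thread, and the variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'i': [1, 1, 1], 'j': [1, 2, 3], 'k': [4, 4, 4],
                   'mass_g_d': [1.1, 2.5, 10.4]})

# One printf-style template per record: exactly 40 characters wide,
# with no index column and no sign-alignment padding.
record = '%10d%10d%10d%10.5f'

lines = [record % (k, i, j, mass)
         for k, i, j, mass
         in df[['k', 'i', 'j', 'mass_g_d']].itertuples(index=False)]

print('\n'.join(lines))
```

This bypasses to_string entirely, so the width of every field is controlled only by the template.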

Thanks for the help.

Wouter Overmeire

Jan 4, 2012, 3:52:53 AM

to pystat...@googlegroups.com

Created two issues on github:

https://github.com/wesm/pandas/issues/570: request for index and header options in DataFrame.to_string(), maybe also for DataFrame.to_html()
https://github.com/wesm/pandas/issues/571: remove the sign-alignment whitespace character between columns.

Consider these as a request to implement this functionality.
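
For readers on a current pandas: to_string there accepts index and header keywords. The sketch below assumes Python 3 and such a release; it does not apply to the 2011 version discussed in this thread, and the exact spacing of the output may differ between versions.

```python
import pandas as pd

df = pd.DataFrame({'i': [1, 1, 1], 'j': [1, 2, 3], 'k': [4, 4, 4],
                   'mass_g_d': [1.1, 2.5, 10.4]})

# index=False drops the row labels, header=False drops the column names.
text = df.to_string(columns=['k', 'i', 'j', 'mass_g_d'],
                    index=False, header=False)
print(text)
```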

Wes McKinney

Jan 7, 2012, 7:23:56 PM

to pystat...@googlegroups.com

Adam and I (mostly Adam) hacked on this this week.

To Texas P's question about the extra space and formatting: to get
pretty output you need to right-justify the column names. Currently
pandas left-justifies them by default. I'm considering making
right-justification the default, or perhaps making it configurable
as an option in set_printoptions (just for the column names for now);
it seems to be fairly common elsewhere (e.g. R):

DF=DataFrame({'i':[1,1,1],'j':[1,2,3],'k':[4,4,4],
              'mass_g_d':[1.1,2.5,10.4]})

intformatter = lambda x: '%10i' %x

floatformatter = lambda x: '%10.5f' %x

print DF.to_string(columns=['k','i','j','mass_g_d'],
                   index_names=False, justify='right',
                   col_space=0,
                   formatters={'k':intformatter,
                               'i':intformatter,
                               'j':intformatter,
                               'mass_g_d':floatformatter})
## -- End pasted text --

k i j mass_g_d
0 4 1 1 1.10000
1 4 1 2 2.50000
2 4 1 3 10.40000
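
The call above uses the Python 2 print statement. A rough Python 3 equivalent, assuming a recent pandas, might look like the sketch below. The col_space argument is left out here as a simplification, and the helper names int_fmt and float_fmt are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'i': [1, 1, 1], 'j': [1, 2, 3], 'k': [4, 4, 4],
                   'mass_g_d': [1.1, 2.5, 10.4]})


def int_fmt(x):
    # Right-align integers in a 10-character field.
    return '%10d' % x


def float_fmt(x):
    # Right-align floats in a 10-character field with 5 decimals.
    return '%10.5f' % x


text = df.to_string(columns=['k', 'i', 'j', 'mass_g_d'],
                    index_names=False, justify='right',
                    formatters={'k': int_fmt, 'i': int_fmt,
                                'j': int_fmt, 'mass_g_d': float_fmt})
print(text)
```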

Any extra thoughts on the matter?

- Wes

Wouter Overmeire

Jan 8, 2012, 1:05:04 PM

to pystat...@googlegroups.com

2012/1/8 Wes McKinney <wesm...@gmail.com>

I ran some examples with justify='right' and it looks good, except when the column width is large compared to the number of characters in the column key(s). In that case I prefer the column content and keys to have the same alignment, or the column keys centered over the column.

Making justify configurable through set_printoptions seems like a good idea.

With the number of config options increasing, what about a config file?
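
In current pandas the header justification can be set through the options system rather than set_printoptions. The sketch below assumes a recent release in which the option is named display.colheader_justify; it does not reflect the API at the time of this thread.

```python
import pandas as pd

df = pd.DataFrame({'mass_g_d': [1.1, 2.5, 10.4]})

# Temporarily switch the header justification for this block only.
with pd.option_context('display.colheader_justify', 'left'):
    inside = pd.get_option('display.colheader_justify')
    left_text = df.to_string()

# An explicit justify argument overrides the option for a single call.
right_text = df.to_string(justify='right')

print(left_text)
print(right_text)
```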
