to_string and formatting


Christian Prinoth

Nov 17, 2011, 10:03:13 AM
to pystat...@googlegroups.com
Hi,
I am trying to get a "pretty" string out of a DataFrame that can then be pasted into a text document (using a fixed-width font). I noticed that to_string has a formatters parameter, but I am not sure how it is supposed to be used.
In my case I would like to apply a specific format to each row, i.e. the columns are different observations of the same data item. Can this be done with to_string?

Thanks

--
Christian Prinoth

Wouter Overmeire

Nov 18, 2011, 9:03:14 AM
to pystat...@googlegroups.com
Maybe this example helps:

formatters work on a column, not a row

In [22]: myformatter = lambda x: '[%4.1f]' % x


In [23]: df = pandas.DataFrame(np.random.randn(10,5), columns=['A', 'B', 'C', 'D', 'E'])

In [24]: df
Out[24]:
   A        B        C         D       E
0 -0.823   -0.5622  -0.1358    0.1603 -0.1214
1  0.1626   0.8593   0.6968   -1.463  -1.806
2 -0.6216  -0.464   -1.452    -0.6792  1.598
3 -0.2007   0.03748  1.886    -2.072  -0.8733
4  1.197    0.834    0.004781  0.367   0.5884
5 -0.3199   0.4235   0.08371  -0.78   -0.1605
6 -1.942    0.3887  -0.2255    1.913  -0.03441
7  1.044   -0.4363  -1.037    -0.4635  0.4452
8 -0.01796 -1.253    0.302     0.3137 -1.386
9 -1.854    0.4694  -0.6945   -0.7007  0.6088

In [25]: print df.to_string(formatters={'A': myformatter, 'C': myformatter})
   A      B        C      D       E
0 [-0.8] -0.5622  [-0.1]  0.1603 -0.1214
1 [ 0.2]  0.8593  [ 0.7] -1.463  -1.806
2 [-0.6] -0.464   [-1.5] -0.6792  1.598
3 [-0.2]  0.03748 [ 1.9] -2.072  -0.8733
4 [ 1.2]  0.834   [ 0.0]  0.367   0.5884
5 [-0.3]  0.4235  [ 0.1] -0.78   -0.1605
6 [-1.9]  0.3887  [-0.2]  1.913  -0.03441
7 [ 1.0] -0.4363  [-1.0] -0.4635  0.4452
8 [-0.0] -1.253   [ 0.3]  0.3137 -1.386
9 [-1.9]  0.4694  [-0.7] -0.7007  0.6088

Christian Prinoth

Nov 18, 2011, 10:55:43 AM
to pystat...@googlegroups.com
Thanks!

--
Christian Prinoth


Texas P.

Dec 31, 2011, 4:23:08 PM
to pystatsmodels
Following this example, I did the following:


In [331]: DF=DataFrame({'i':[1,1,1],'j':[1,2,3],'k':[4,4,4],'mass_g_d':[1.1,2.5,10.4]})

In [332]:

In [333]: intformatter = lambda x: '%10i' %x

In [334]: floatformatter = lambda x: '%10.5f' %x

In [335]: print(DF.to_string(columns=['k','i','j','mass_g_d'],
.....: index_names=False,
.....: colSpace=0,
.....: formatters={'k':intformatter,
.....: 'i':intformatter,
.....: 'j':intformatter,
.....: 'mass_g_d':floatformatter}))
k i j mass_g_d
0 4 1 1 1.10000
1 4 1 2 2.50000
2 4 1 3 10.40000


Maybe I'm doing something wrong, but I see 3 problems here:
1) I still see the index names, even though index_names=False
2) I'd like to have the option not to print the header. In to_csv
there is a header=False option. I tried it in to_string, hoping it
was a hidden option, but no dice.
3) Most important, the formatting isn't right. The k, i, and j
columns take 11 spaces, not 10. I tried it with colSpace=0 and
colSpace=None; both seem to put an extra space in there.

When I do the same command but writing to a file instead, I get the
following (with my real data):
k i j mass_g_d
76 1 122 70 99179
319 1 122 71 85806
562 1 123 70 78259
805 1 123 71 68174
5908 1 148 92 1.1712e+05
6151 1 148 93 1.7702e+05
6394 1 153 174 62498

The index names are still there and there's still an extra space for
each column. I also tried it with formats %9i and %9.5g but that
makes the float column no longer right-aligned, like so:
k i j mass_g_d
82 1 122 70 1.0611e+05
325 1 122 71 93298
568 1 123 70 74531
811 1 123 71 65825


What am I missing here? (it helps to view all this in a monospace
font)

Thanks.

Wouter Overmeire

Jan 3, 2012, 4:32:40 AM
to pystat...@googlegroups.com
On Saturday, December 31, 2011 at 22:23:08 UTC+1, Texas P. wrote:

1. When index_names=False, the index level names are not shown. The index values are always shown.
2. You can add it in the code and create a github pull request.
3. I spent some time on adding stylists to DataFrame.to_string() and DataFrame.to_html() and noticed the same thing. There is indeed an extra space between the columns. It did not really bother me, but to cross your t's and dot your i's, that extra space should not be there; it's a bug.

Wes McKinney

Jan 3, 2012, 11:42:49 AM
to pystat...@googlegroups.com

Indeed I see the problem and where the "bug" is caused. The reason the
extra space is there is to make space for the '-' sign in numerical
columns. In the default console formatting this is (at least in my
opinion) the most attractive choice. When you pass custom formatters
it should not do this so you have very explicit control over how
things look. Basically the default formatter for floats is something
like:

if x < 0:
    return '%.4f' % x
else:
    return ' %.4f' % x

the result is something that looks like

 B
 4
-1.5
 5.6
-6.7

instead of the much uglier

B
4
-1.5
5.6
-6.7

In order to get the column label to line up on the first digit you
have to add a space.
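
The sign-padding idea can be tried on its own, outside pandas. The sketch below is not the pandas source; sign_aligned is a hypothetical helper that follows the if/else above, and the last line shows that Python's printf-style space flag gives the same padding.

```python
def sign_aligned(x, precision=4):
    """Format a number, padding non-negative values with a leading space."""
    text = "%.*f" % (precision, x)
    return text if x < 0 else " " + text


values = [4, -1.5, 5.6, -6.7]
lines = [sign_aligned(v, precision=1) for v in values]

# The digits line up because every entry reserves one character for the sign.
print("\n".join(lines))

# The built-in space flag does the same thing in a single format spec.
flagged = ["% .1f" % v for v in values]
```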

One solution is to right-justify the column name (it's currently
left-justified). I'll see what I can do-- GitHub issue for this would
be useful for me so I don't forget to look into it.

- Wes

Texas P.

Jan 3, 2012, 5:16:36 PM
to pystatsmodels


On Jan 3, 1:32 am, Wouter Overmeire <loda...@gmail.com> wrote:
> On Saturday, December 31, 2011 at 22:23:08 UTC+1, Texas P. wrote:
>
> 1. When index_names=False, the index level names are not shown. The index
> values are always shown.

OK. But for my purposes it would be nice if there was an option to
turn it off, like in "to_csv."

> 2. You can add it in the code and create a github pull request.

I don't know what that means.

> 3. I spent some time on adding stylists <https://github.com/wesm/pandas/issues/459> to DataFrame.to_string() and DataFrame.to_html() and noticed the same
> thing. There is indeed an extra space between the columns. It did not
> really bother me, but to cross your t's and dot your i's, that extra space
> should not be there, it's a bug.

For my purposes, I'm looking for something I can format exactly.
Having an extra placeholder for a minus sign may look nicer for the
console, but I'm writing files that become input to other programs and
the format needs to be precise.

I could always do this in a loop, but that feels kinda dirty.

Thanks for the help.
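
For reference, the loop-based fallback mentioned above might look like the sketch below. It bypasses to_string entirely, so each record is exactly as wide as the format string says. The row_format string and the in-memory buffer are assumptions made for illustration; a real file handle would work the same way.

```python
import io

import pandas as pd

df = pd.DataFrame(
    {"i": [1, 1, 1], "j": [1, 2, 3], "k": [4, 4, 4], "mass_g_d": [1.1, 2.5, 10.4]}
)

# Three 10-character integer fields followed by one 10-character float field.
row_format = "%10d%10d%10d%10.5f"

buf = io.StringIO()  # stand-in for an open output file
ordered = df[["k", "i", "j", "mass_g_d"]]
for k, i, j, mass in ordered.itertuples(index=False, name=None):
    buf.write(row_format % (k, i, j, mass) + "\n")

text = buf.getvalue()
print(text, end="")
```

No index and no header are written, and every line is 40 characters wide.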

Wouter Overmeire

Jan 4, 2012, 3:52:53 AM
to pystat...@googlegroups.com
Created two issues on github:

https://github.com/wesm/pandas/issues/570 -- request for index and header options in DataFrame.to_string(), maybe also for DataFrame.to_html()
https://github.com/wesm/pandas/issues/571 -- remove the sign alignment whitespace character between columns.

Consider these as a request to implement this functionality.
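
For readers on a current pandas release: to_string accepts index and header keyword arguments there. The sketch below assumes such a release and makes no claim about when the options landed or how they relate to the issues above.

```python
import pandas as pd

df = pd.DataFrame(
    {"i": [1, 1, 1], "j": [1, 2, 3], "k": [4, 4, 4], "mass_g_d": [1.1, 2.5, 10.4]}
)

# Default output: one header line plus one line per row, with the index shown.
with_all = df.to_string()

# Suppress both the index values and the header line.
bare = df.to_string(index=False, header=False)

print(with_all)
print(bare)
```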

Wes McKinney

Jan 7, 2012, 7:23:56 PM
to pystat...@googlegroups.com

Adam and I (mostly Adam) hacked on this this week.

To Texas P's question about the extra space and formatting: to get a
pretty output you need to right-justify the column names. Currently
pandas left-justifies by default-- I'm considering making
right-justify the default behavior, or perhaps configurable as an
option in set_printoptions (just for the column names for now), it
seems to be fairly common elsewhere (e.g. R):


from pandas import DataFrame

DF = DataFrame({'i': [1, 1, 1], 'j': [1, 2, 3], 'k': [4, 4, 4],
                'mass_g_d': [1.1, 2.5, 10.4]})

intformatter = lambda x: '%10i' % x

floatformatter = lambda x: '%10.5f' % x

print(DF.to_string(columns=['k', 'i', 'j', 'mass_g_d'],
                   index_names=False, justify='right',
                   col_space=0,
                   formatters={'k': intformatter,
                               'i': intformatter,
                               'j': intformatter,
                               'mass_g_d': floatformatter}))
## -- End pasted text --


k i j mass_g_d
0 4 1 1 1.10000
1 4 1 2 2.50000
2 4 1 3 10.40000

Any extra thoughts on the matter?

- Wes
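
On a current pandas release the same experiment can be run with either the justify argument or the display.colheader_justify option, which is the options-system counterpart of the configurable default discussed above. This is a sketch under that assumption; exact spacing between columns may differ between versions, so no output is reproduced here.

```python
import pandas as pd

df = pd.DataFrame(
    {"i": [1, 1, 1], "j": [1, 2, 3], "k": [4, 4, 4], "mass_g_d": [1.1, 2.5, 10.4]}
)

int_fmt = lambda x: "%10i" % x
float_fmt = lambda x: "%10.5f" % x
formatters = {"k": int_fmt, "i": int_fmt, "j": int_fmt, "mass_g_d": float_fmt}
columns = ["k", "i", "j", "mass_g_d"]

# Header justification chosen explicitly for this call.
right = df.to_string(columns=columns, formatters=formatters, justify="right")

# The same choice made through the options system; it applies when justify is omitted.
with pd.option_context("display.colheader_justify", "left"):
    left = df.to_string(columns=columns, formatters=formatters)

print(right)
print(left)
```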

Wouter Overmeire

Jan 8, 2012, 1:05:04 PM
to pystat...@googlegroups.com


2012/1/8 Wes McKinney <wesm...@gmail.com>


I ran some examples with justify='right' and it looks good. The exception is when the column is wide compared to the number of characters in the column key(s); in that case I would prefer the column content and keys to share the same alignment, or the keys to be centered over the column.

Making justify configurable through set_printoptions seems like a good idea.

With the number of config options growing, what about a config file?


