A better way to transform my dataframe

13 views
Skip to first unread message

Eric G

unread,
Jun 24, 2018, 12:41:33 AM6/24/18
to PyData
I have a dataframe that looks like:

           Jan                        Feb                        Mar  \
           actual budgeted difference actual budgeted difference actual  
Income        889     6716       1687   2437     7137       3640  10288  
Expenses     2478     5548      12623   2190    12882       9232   9947  
Difference  10661      657       7365   -147    10289       7044   3639  



My focus is on the Difference row and the 'actual' and 'budgeted' columns across all of the months.

and I need to transform it to look like:

      amount months      category
0      657.0 2018-01-31  budgeted
1     8438.0 2018-01-31    actual
2    10946.0 2018-02-28  budgeted
3    38318.0 2018-02-28    actual...

...

My code to do this is:

import pandas as pd
import string
import altair as alt


from random import randint


#
# This is the general form of my 'real' dataframe. It is not subject to change.
#
months
= [ 'Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec' ]
monthyAmounts
= [ "actual", "budgeted", "difference" ]


summary
= []
summary
.append( [ randint( -1000, 15000 ) for x in range( 0, len( months ) * len( monthyAmounts ) ) ] )
summary
.append( [ randint( -1000, 15000 ) for x in range( 0, len( months ) * len( monthyAmounts ) ) ]  )
summary
.append( [ randint( -1000, 15000 ) for x in range( 0, len( months ) * len( monthyAmounts ) ) ]  )


index  
= pd.Index( [ 'Income', 'Expenses', 'Difference' ], name = 'type' )
columns
= pd.MultiIndex.from_product( [months, monthyAmounts], names=['month', 'category'] )


summaryDF
= pd.DataFrame( summary, index = index, columns = columns )


print( summaryDF )


#
# From this point, I am trying to transform the summaryDF into something
# I can use in a different context...
#


budgetMonths
= pd.date_range( "January, 2018", periods = 12, freq = 'BM' )


idx
= pd.IndexSlice
budgeted
= summaryDF.loc[ 'Difference', idx[:, 'budgeted' ] ].cumsum()
actual
.index = budgetMonths


#
# The reason for this is that June is not over with yet, so, there can be no 'actual' amount for it or later months.
#
actual
.iloc[ 5 ] = None
actual
.iloc[ 6 ] = None
actual
.iloc[ 7 ] = None
actual
.iloc[ 8 ] = None
actual
.iloc[ 9 ] = None
actual
.iloc[ 10 ] = None
actual
.iloc[ 11 ] = None


actual
= actual.cumsum()


budgeted
.index = budgetMonths


budgetedDF
= pd.DataFrame( { 'amount': budgeted, 'months': budgetMonths, 'category': 'budgeted' })
actualDF  
= pd.DataFrame( { 'amount': actual, 'months': budgetMonths, 'category': 'actual' })


df3
= (pd.concat([budgetedDF, actualDF])
         
.sort_index()
         
.reset_index()
).drop( 'index', 1 )


print( df3 )

The transformation starts after the 'print( summaryDF )' line.

As I have only been playing with Pandas for a couple of days, my question is, is there a better way to do the transformation?

If it helps and as background, this transformation produces a dataframe that make it easy to use altair to graph with the line:

budgetChart = alt.Chart( df3 ).configure_view( width = 900 ).mark_line().encode(
    y
= alt.Y( 'amount:Q', axis = alt.Axis( format = '$,f', title = "Cumulative Amount") ),
    x
= alt.X( 'month(months):T', axis = alt.Axis( title = "Month") ),
    color
= 'category'
)



Reply all
Reply to author
Forward
0 new messages