A better way to transform my dataframe

13 views

Skip to first unread message

Eric G

unread,

Jun 24, 2018, 12:41:33 AM6/24/18

to PyData

I have a dataframe that looks like:

           Jan                        Feb                        Mar  \
           actual budgeted difference actual budgeted difference actual   
Income        889     6716       1687   2437     7137       3640  10288   
Expenses     2478     5548      12623   2190    12882       9232   9947   
Difference  10661      657       7365   -147    10289       7044   3639

My focus is on the Difference row and the 'actual' and 'budgeted' columns across all of the months.

and I need to transform it to look like:

      amount months      category
0      657.0 2018-01-31  budgeted
1     8438.0 2018-01-31    actual
2    10946.0 2018-02-28  budgeted
3    38318.0 2018-02-28    actual...

...

My code to do this is:

import pandas as pd
import string
import altair as alt


from random import randint


# 
# This is the general form of my 'real' dataframe. It is not subject to change.
#
months = [ 'Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec' ]
monthyAmounts = [ "actual", "budgeted", "difference" ]


summary = []
summary.append( [ randint( -1000, 15000 ) for x in range( 0, len( months ) * len( monthyAmounts ) ) ] )
summary.append( [ randint( -1000, 15000 ) for x in range( 0, len( months ) * len( monthyAmounts ) ) ]  )
summary.append( [ randint( -1000, 15000 ) for x in range( 0, len( months ) * len( monthyAmounts ) ) ]  )


index   = pd.Index( [ 'Income', 'Expenses', 'Difference' ], name = 'type' )
columns = pd.MultiIndex.from_product( [months, monthyAmounts], names=['month', 'category'] )


summaryDF = pd.DataFrame( summary, index = index, columns = columns )


print( summaryDF )


#
# From this point, I am trying to transform the summaryDF into something 
# I can use in a different context...
#


budgetMonths = pd.date_range( "January, 2018", periods = 12, freq = 'BM' )


idx = pd.IndexSlice
budgeted = summaryDF.loc[ 'Difference', idx[:, 'budgeted' ] ].cumsum()
actual.index = budgetMonths


#
# The reason for this is that June is not over with yet, so, there can be no 'actual' amount for it or later months.
#
actual.iloc[ 5 ] = None
actual.iloc[ 6 ] = None
actual.iloc[ 7 ] = None
actual.iloc[ 8 ] = None
actual.iloc[ 9 ] = None
actual.iloc[ 10 ] = None
actual.iloc[ 11 ] = None


actual = actual.cumsum()


budgeted.index = budgetMonths


budgetedDF = pd.DataFrame( { 'amount': budgeted, 'months': budgetMonths, 'category': 'budgeted' })
actualDF   = pd.DataFrame( { 'amount': actual, 'months': budgetMonths, 'category': 'actual' })


df3 = (pd.concat([budgetedDF, actualDF])
         .sort_index()
         .reset_index()
).drop( 'index', 1 )


print( df3 )

The transformation starts after the 'print( summaryDF )' line.

As I have only been playing with Pandas for a couple of days, my question is, is there a better way to do the transformation?

If it helps and as background, this transformation produces a dataframe that make it easy to use altair to graph with the line:

budgetChart = alt.Chart( df3 ).configure_view( width = 900 ).mark_line().encode(
    y = alt.Y( 'amount:Q', axis = alt.Axis( format = '$,f', title = "Cumulative Amount") ),
    x = alt.X( 'month(months):T', axis = alt.Axis( title = "Month") ),
    color = 'category'
)

Reply all

Reply to author

Forward

0 new messages