Problem in defining multidimensional array matrix and regression


shalu....@gmail.com

Nov 19, 2017, 10:33:18 AM
to pystatsmodels
Hi all,

I have 6 variables in a CSV file. One is rainfall (the dependent variable, y) and the other five are predictors (x). I want to run a multiple regression and compute a correlation matrix between rainfall (y) and the predictors (x; n1=5). To do that, I want to read rainfall as one variable and the predictors as separate columns so that I can apply the algorithm. However, I am not able to build a proper matrix from them.

Here are my data and code.
Please suggest how to do this.
I am new to Python.

RF        P1      P2      P3     P4      P5
120.235   0.234   -0.012  0.145  21.023  0.233
200.14    0.512   -0.021  0.214  22.21   0.332
185.362   0.147   -0.32   0.136  24.65   0.423
201.895   0.002   -0.12   0.217  30.25   0.325
165.235   0.256   0.001   0.22   31.245  0.552
198.236   0.012   -0.362  0.215  32.25   0.333
350.263   0.98    -0.85   0.321  38.412  0.411
145.25    0.046   -0.36   0.147  39.256  0.872
198.654   0.65    -0.45   0.224  40.235  0.652
245.214   0.47    -0.325  0.311  26.356  0.632
214.02    0.18    -0.012  0.242  22.01   0.745
147.256   0.652   -0.785  0.311  18.256  0.924

import numpy as np 
import statsmodels as sm 
import statsmodels.formula as smf 
import csv 

with open("pcp1.csv", "r") as csvfile: 
    readCSV=csv.reader(csvfile) 
    
    rainfall = [] 
    csvFileList = [] 
    
    for row in readCSV: 
        Rain = row[0] 
        rainfall.append(Rain) 

        if len (row) !=0: 
            csvFileList = csvFileList + [row]       
        
print(csvFileList) 
print(rainfall) 

Any suggestions would be appreciated.
Thanks

josef...@gmail.com

Nov 19, 2017, 10:43:27 AM
to pystatsmodels
That's more appropriate as a Stack Overflow question. It doesn't directly involve statsmodels.

My main recommendation is to use pandas to read the CSV file. The data is then already in the right format, and you can use either the pandas or the numpy correlation function.

The main problem in your code is that you need to convert your data to a numeric dataframe or numpy array.

e.g.
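x = np.array(csvFileList, np.float64)   # convert the list of string rows to a 2-D float array
np.corrcoef(x, rowvar=False)[0, 1:]     # correlation of column 0 (rainfall) with each other column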

If that doesn't work, then you are much better off using pandas because it has more conversion options.
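
A minimal sketch of the pandas route, for illustration only. It assumes the file is comma-separated with the header row RF,P1,...,P5 shown in the question; the data is embedded as a string here (the name csv_text is made up for this example) so the snippet runs on its own. With the real file, pd.read_csv("pcp1.csv") would take the place of the StringIO call.

```python
import io

import numpy as np
import pandas as pd

# Stand-in for pcp1.csv (assumption: comma-separated, with a header row).
csv_text = """RF,P1,P2,P3,P4,P5
120.235,0.234,-0.012,0.145,21.023,0.233
200.14,0.512,-0.021,0.214,22.21,0.332
185.362,0.147,-0.32,0.136,24.65,0.423
201.895,0.002,-0.12,0.217,30.25,0.325
165.235,0.256,0.001,0.22,31.245,0.552
198.236,0.012,-0.362,0.215,32.25,0.333
350.263,0.98,-0.85,0.321,38.412,0.411
145.25,0.046,-0.36,0.147,39.256,0.872
198.654,0.65,-0.45,0.224,40.235,0.652
245.214,0.47,-0.325,0.311,26.356,0.632
214.02,0.18,-0.012,0.242,22.01,0.745
147.256,0.652,-0.785,0.311,18.256,0.924
"""

# read_csv uses the header row for column names and parses the values as floats,
# so no manual string-to-number conversion is needed.
df = pd.read_csv(io.StringIO(csv_text))

# Full 6x6 Pearson correlation matrix of all columns.
corr_matrix = df.corr()

# Correlation of rainfall (RF) with each of the five predictors.
rain_corr = corr_matrix["RF"].drop("RF")

print(corr_matrix.round(3))
print(rain_corr.round(3))
```

The rain_corr values should match what np.corrcoef(x, rowvar=False)[0, 1:] gives for the same numeric array.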

Josef


shalu....@gmail.com

Nov 19, 2017, 11:25:01 AM
to pystatsmodels
Many thanks, Josef.

I will try that.

shalu....@gmail.com

Nov 19, 2017, 12:56:09 PM
to pystatsmodels
Hi Josef,

Many thanks for your suggestion.
I am now using pandas and have read the data that way, but I still need to build a multidimensional array that holds all the predictor variables (5 in this case) together as x, so that I can perform multiple regression analysis.

I don't understand how to put all the variables on one axis (e.g. the x-axis).

Thanks
Vishal
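
For illustration, "all predictors on one axis" usually means a single 2-D array with one row per observation and one column per predictor. The sketch below is not from the thread: it embeds the data from the first message directly, uses made-up names (data, X_design, coef), and fits the regression with NumPy's least-squares solver so that it runs with NumPy alone.

```python
import numpy as np

# Rows are observations; columns are RF, P1, P2, P3, P4, P5 (data from the first message).
data = np.array([
    [120.235, 0.234, -0.012, 0.145, 21.023, 0.233],
    [200.14, 0.512, -0.021, 0.214, 22.21, 0.332],
    [185.362, 0.147, -0.32, 0.136, 24.65, 0.423],
    [201.895, 0.002, -0.12, 0.217, 30.25, 0.325],
    [165.235, 0.256, 0.001, 0.22, 31.245, 0.552],
    [198.236, 0.012, -0.362, 0.215, 32.25, 0.333],
    [350.263, 0.98, -0.85, 0.321, 38.412, 0.411],
    [145.25, 0.046, -0.36, 0.147, 39.256, 0.872],
    [198.654, 0.65, -0.45, 0.224, 40.235, 0.652],
    [245.214, 0.47, -0.325, 0.311, 26.356, 0.632],
    [214.02, 0.18, -0.012, 0.242, 22.01, 0.745],
    [147.256, 0.652, -0.785, 0.311, 18.256, 0.924],
])

y = data[:, 0]   # rainfall, shape (12,)
X = data[:, 1:]  # all five predictors side by side, shape (12, 5)

# Prepend a column of ones so the model includes an intercept.
X_design = np.column_stack([np.ones(len(y)), X])

# Ordinary least squares: coef[0] is the intercept, coef[1:] are the slopes for P1..P5.
coef, *_ = np.linalg.lstsq(X_design, y, rcond=None)
fitted = X_design @ coef

print(X.shape)
print(coef)
```

The same y and X can be passed to statsmodels, e.g. sm.OLS(y, sm.add_constant(X)).fit() after import statsmodels.api as sm, which also reports standard errors and a summary table. Starting from a pandas DataFrame df, X would be df[["P1", "P2", "P3", "P4", "P5"]] and y would be df["RF"].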

On Sunday, 19 November 2017 21:13:27 UTC+5:30, josefpktd wrote: