Hello Everybody, I need to sort a dataframe according to a specific column, then create new dataframes


Halitim Bachir

Jan 17, 2016, 10:06:36 AM
to PyData
Hello Everybody,
I need to sort a dataframe according to a specific column, then create new dataframes according to the sorted column. Each newly created dataframe should contain the rows for one element of list(set(<the sorted column>)).
Any help, please? I am new to Python and pandas.
Thanks very much,
Bachir
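A minimal sketch of the "sort, then split" idea, assuming a small made-up frame whose only real detail is the V_id column name used later in the thread. A stable sort (kind="mergesort") puts rows with the same V_id next to each other while keeping their original order; groupby then does the actual splitting, so the sort is not strictly needed for the split itself.

```python
import pandas as pd

# Hypothetical example data; only the V_id column name comes from the thread.
df = pd.DataFrame({
    "V_id": [1, 2, 3, 1, 2, 3],
    "Peak": [5, 6, 8, 7, 6, 4],
})

# A stable sort keeps rows with equal V_id in their original relative order.
sorted_df = df.sort_values("V_id", kind="mergesort")

# One new dataframe per unique value of the sorted column (keys come out sorted).
parts = [group for _, group in sorted_df.groupby("V_id")]

for part in parts:
    print(part, end="\n\n")
```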

Paul Hobson

Jan 17, 2016, 4:38:32 PM
to pyd...@googlegroups.com
Halitim, 

I think you'll find that many folks on the list are happy to help newcomers. But your question is a bit vague. Consider including a very small but representative dataset that can serve as input, and then what the desired output would look like. Also, please include any code that you have already tried.
-paul

--
You received this message because you are subscribed to the Google Groups "PyData" group.
To unsubscribe from this group and stop receiving emails from it, send an email to pydata+un...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Halitim Bachir

Jan 17, 2016, 5:25:44 PM
to PyData
Hi Paul,
First, thank you for your reply. This is the code I used to create new DataFrames from a single DataFrame:

Vibn=[]                                # list to hold the new data frames
NV_id=list(set(V_id))                  # the unique values of the V_id column in the main dataframe, called vaps
for k in NV_id:
    Kint=int(k)                       
    Vibn.append(vaps[vaps['V_id']==Kint])

My input file looks something like this and contains more than 100000 lines. I want to split it according to the V_id column and create new DataFrames, each with a name containing its V_id number.
It is a CSV file:
Indx  V_id  Average  Mean  Peak
0     1     3        2     5
1     2     2        1     6
2     3     4        1     8
3     1     2        2     7
4     2     3        3     6
5     3     5        3     4
6     1     1        1     8
7     2     2        5     10
8     3     5        5     9
...
1000  1     2        3     6
1001  2     1        1     7
1002  3     4        4     8

Regards
Bachir 
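A self-contained sketch of the loop above, assuming that V_id in it was meant to be the V_id column of vaps (vaps is the name from the post; the data is the first nine rows of the table). Keeping the pieces in a dict keyed by a name such as "Dataframe-1" gives each new DataFrame a name containing its V_id number without creating new variables. Note that each boolean mask scans the whole frame once per V_id value, which groupby avoids.

```python
import io

import pandas as pd

# The first nine rows of the table above, standing in for the real CSV file.
csv_text = """Indx,V_id,Average,Mean,Peak
0,1,3,2,5
1,2,2,1,6
2,3,4,1,8
3,1,2,2,7
4,2,3,3,6
5,3,5,3,4
6,1,1,1,8
7,2,2,5,10
8,3,5,5,9
"""
vaps = pd.read_csv(io.StringIO(csv_text), index_col="Indx")

# Boolean-mask version of the loop: one filter per unique V_id value.
frames = {}
for k in sorted(vaps["V_id"].unique()):
    # Name each piece after its V_id number, e.g. "Dataframe-1".
    frames["Dataframe-{}".format(k)] = vaps[vaps["V_id"] == k]

print(frames["Dataframe-1"])
```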

Paul Hobson

Jan 17, 2016, 10:14:00 PM
to pyd...@googlegroups.com
Your example doesn't run since V_id is undefined. 

You need to create a Short, Self Contained, Correct (Compilable) Example (www.sscce.org).

I recommend that you build a dataframe with about 10 rows and 5 columns *by hand*, and show how that should look after it has been processed correctly.
-paul
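Following Paul's suggestion, a hand-built example might look like the sketch below. The values are the twelve rows of the "Main dataframe" Bachir posts next, and pandas.testing.assert_frame_equal checks a candidate result against the expected output, also built by hand. The plain boolean filter used as the "code under test" is only an illustration.

```python
import pandas as pd

# Input built by hand: the 12 rows of the main dataframe, indexed by ndx.
df = pd.DataFrame(
    {
        "V_id":    [1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3],
        "Average": [3, 2, 4, 2, 3, 5, 1, 2, 5, 2, 5, 4],
        "Mean":    [2, 1, 1, 2, 3, 3, 1, 5, 5, 5, 5, 3],
        "Peak":    [5, 6, 8, 7, 6, 4, 8, 10, 9, 10, 9, 10],
    },
    index=pd.Index(range(12), name="ndx"),
)

# Expected output for V_id == 1 (Dataframe-1), also built by hand.
expected_1 = pd.DataFrame(
    {"V_id": [1, 1, 1, 1], "Average": [3, 2, 1, 2],
     "Mean": [2, 2, 1, 5], "Peak": [5, 7, 8, 10]},
    index=pd.Index([0, 3, 6, 9], name="ndx"),
)

# Code under test: here just a boolean filter on V_id.
result_1 = df[df["V_id"] == 1]

# Raises AssertionError if the result differs from the expected frame.
pd.testing.assert_frame_equal(result_1, expected_1)
print("ok")
```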

Halitim Bachir

Jan 18, 2016, 3:28:08 AM
to PyData
Hi Paul,
Thank you very much for your guidelines. Below is the main dataframe, followed by the desired output dataframes exactly as I need them:
Main dataframe
ndx  V_id  Average  Mean  Peak
0    1     3        2     5
1    2     2        1     6
2    3     4        1     8
3    1     2        2     7
4    2     3        3     6
5    3     5        3     4
6    1     1        1     8
7    2     2        5     10
8    3     5        5     9
9    1     2        5     10
10   2     5        5     9
11   3     4        3     10

Dataframe-1
ndx  V_id  Average  Mean  Peak
0    1     3        2     5
3    1     2        2     7
6    1     1        1     8
9    1     2        5     10

Dataframe-2
ndx  V_id  Average  Mean  Peak
1    2     2        1     6
4    2     3        3     6
7    2     2        5     10
10   2     5        5     9

Dataframe-3
ndx  V_id  Average  Mean  Peak
2    3     4        1     8
5    3     5        3     4
8    3     5        5     9
11   3     4        3     10

Bachir

On Sunday, January 17, 2016 at 10:38:32 PM UTC+1, Paul Hobson wrote:

Goyo

Jan 18, 2016, 7:26:45 AM
to PyData


Hello, Bachir,

You can use groupby for that:

>>> from __future__ import print_function
>>> from __future__ import unicode_literals
>>> import io
>>> import pandas as pd
>>> data = """ndx,V_id,Average,Mean,Peak
... 0,1,3,2,5
... 1,2,2,1,6
... 2,3,4,1,8
... 3,1,2,2,7
... 4,2,3,3,6
... 5,3,5,3,4
... 6,1,1,1,8
... 7,2,2,5,10
... 8,3,5,5,9
... 9,1,2,5,10
... 10,2,5,5,9
... 11,3,4,3,10"""

>>> df = pd.read_csv(io.StringIO(data), index_col=0)
>>> print(df)
     V_id  Average  Mean  Peak
ndx
0       1        3     2     5
1       2        2     1     6
2       3        4     1     8
3       1        2     2     7
4       2        3     3     6
5       3        5     3     4
6       1        1     1     8
7       2        2     5    10
8       3        5     5     9
9       1        2     5    10
10      2        5     5     9
11      3        4     3    10
>>> grouper = df.groupby('V_id')
>>> for k in grouper.groups:
...     print('\nDataframe-{}'.format(k))
...     print(grouper.get_group(k))
...

Dataframe-1
     V_id  Average  Mean  Peak
ndx
0       1        3     2     5
3       1        2     2     7
6       1        1     1     8
9       1        2     5    10

Dataframe-2
     V_id  Average  Mean  Peak
ndx
1       2        2     1     6
4       2        3     3     6
7       2        2     5    10
10      2        5     5     9

Dataframe-3
     V_id  Average  Mean  Peak
ndx
2       3        4     1     8
5       3        5     3     4
8       3        5     5     9
11      3        4     3    10
>>>

Goyo
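A small variation on the session above, sketched with the same twelve rows: iterating over the groupby object yields (key, sub-dataframe) pairs directly, so grouper.groups and get_group are not needed. With pandas' default sort=True the keys come out in sorted order.

```python
import io

import pandas as pd

data = """ndx,V_id,Average,Mean,Peak
0,1,3,2,5
1,2,2,1,6
2,3,4,1,8
3,1,2,2,7
4,2,3,3,6
5,3,5,3,4
6,1,1,1,8
7,2,2,5,10
8,3,5,5,9
9,1,2,5,10
10,2,5,5,9
11,3,4,3,10"""

df = pd.read_csv(io.StringIO(data), index_col=0)

# Each iteration yields the V_id value and the matching rows as a DataFrame.
for k, group in df.groupby("V_id"):
    print("\nDataframe-{}".format(k))
    print(group)
```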

Halitim Bachir

Jan 18, 2016, 9:57:56 AM
to PyData
Thanks very much, Goyo.
I would like to assign each grouper.get_group(k) to a variable named Dataframe-(k) in the same script.
Do you think that is possible?
Thanks very much for the help.
Sincerely,

Goyo

Jan 18, 2016, 3:38:29 PM
to PyData
On Monday, January 18, 2016, 15:57:56 (UTC+1), Halitim Bachir wrote:
Thanks very much, Goyo.
I would like to assign each grouper.get_group(k) to a variable named Dataframe-(k) in the same script.
Do you think that is possible?

Do you mean something like this?

>>> locals().update((('dataframe_{}'.format(k), grouper.get_group(k))
...                  for k in grouper.groups))
>>> print(dataframe_1)
     V_id  Average  Mean  Peak
ndx                          
0       1        3     2     5
3       1        2     2     7
6       1        1     1     8
9       1        2     5    10
>>>

It seems to work, but it is not the kind of magic I feel comfortable with. I think that most of the time it is not a good idea to do this in a script: it can make your code harder to read and debug.

Goyo
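A common alternative, sketched here as an assumption rather than what Goyo proposed: keep the groups in a dict keyed by name instead of creating variables. This behaves the same inside a function, whereas the Python documentation notes that changes to the dictionary returned by locals() may not affect the actual local variables there. The helper name split_by_v_id is hypothetical.

```python
import io

import pandas as pd

data = """ndx,V_id,Average,Mean,Peak
0,1,3,2,5
1,2,2,1,6
2,3,4,1,8
3,1,2,2,7
4,2,3,3,6
5,3,5,3,4"""


def split_by_v_id(df):
    """Hypothetical helper: map 'dataframe_<V_id>' to that V_id's rows."""
    return {"dataframe_{}".format(k): group for k, group in df.groupby("V_id")}


df = pd.read_csv(io.StringIO(data), index_col=0)
frames = split_by_v_id(df)

print(sorted(frames))
print(frames["dataframe_1"])
```

Lookups such as frames["dataframe_1"] then replace the generated variable names.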

Halitim Bachir

Jan 19, 2016, 2:31:01 AM
to PyData
That was great. Thanks very much, Goyo.
Sincerely,
Bachir