IMPORT CSV FILE WITH EMPTY FIELDS

Ravana Andrade

Feb 17, 2022, 11:34:55 AM
to Python Brasil
I need to import a CSV file and organize it as a matrix in which the movies have their ratings, but it also has fields that were not rated (empty).

To run calculations on these ratings, I need to import this matrix without the empty fields causing an error.

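import numpy as np
import pandas as pd

# Read the CSV into a pandas DataFrame.
df = pd.read_csv('matrix.csv')
print(df)

# Convert the DataFrame to a NumPy array.
L = np.asarray(df)
print("L")
print(L)

# Read the same file directly with NumPy, skipping the header row.
H = np.genfromtxt('matrix.csv', skip_header=1, delimiter=',')
print("H")
print(H)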


Can anyone help me get this working?

practicaimportar.jpg


User,260: Star Wars: Episode IV - A New Hope (1977),1210: Star Wars: Episode VI - Return of the Jedi (1983),356: Forrest Gump (1994),"318: Shawshank Redemption, The (1994)","593: Silence of the Lambs, The (1991)",3578: Gladiator (2000),1: Toy Story (1995),2028: Saving Private Ryan (1998),296: Pulp Fiction (1994),1259: Stand by Me (1986),2396: Shakespeare in Love (1998),2916: Total Recall (1990),780: Independence Day (ID4) (1996),541: Blade Runner (1982),1265: Groundhog Day (1993),"2571: Matrix, The (1999)",527: Schindler's List (1993),"2762: Sixth Sense, The (1999)",1198: Raiders of the Lost Ark (1981),34: Babe (1995)
755,1,5,2,,4,4,2,2,,3,2,,5,2,5,4,2,5,,
5277,5,,,2,4,2,1,,,4,3,2,2,,2,,5,1,3,
1577,,,,5,2,,4,,,1,,1,4,4,1,1,2,3,1,3
4388,,3,,,,1,2,3,4,,,4,,3,5,,5,1,1,2
1202,4,3,4,1,4,1,,4,,1,5,1,,4,,3,5,5,,
3823,2,4,4,4,,,3,1,4,4,5,2,4,,1,,,3,,2
5448,4,,3,1,1,4,,5,2,,1,,,3,,1,,,5,2
5347,4,,,,3,2,2,,3,,,2,1,2,4,,1,3,5,
4117,5,1,,4,2,4,4,4,,1,2,,1,,5,,,,,5
2765,4,2,,5,3,,4,3,4,,,,2,,,2,5,1,,
5450,2,1,5,,,5,5,,,,,3,2,,,1,,2,1,4
139,3,5,2,,,,2,,1,,3,,3,,2,5,,,,2
1940,2,,,5,4,,4,5,,,,2,4,,3,,,,5,
3118,3,,3,,2,,3,,,4,,1,2,2,3,5,1,,,
4656,4,4,,,5,5,2,,3,5,,1,3,,2,,3,,3,1
4796,,,1,,3,2,,2,,1,5,,2,,,2,2,4,3,4
6037,,,,,,,2,,2,,2,,3,,3,4,,,,
3048,4,5,1,5,1,1,4,,5,,,,,4,,,2,1,2,5
4790,5,1,3,,,4,2,1,3,3,3,1,,,,2,,,,
4489,1,2,2,4,5,,2,3,2,2,1,,4,5,5,4,3,5,3,

Marcelo Valle

Feb 17, 2022, 11:50:50 AM
to Python Brasil
I don't work with pandas all day, so I can't tell you offhand which function you use to ignore nulls... But you will certainly find DataFrame methods that do what you want.

The question is: which error caused by empty fields are you talking about?

In your message, you didn't specify the error.

Cheers


eduardo...@hotmail.com

Feb 17, 2022, 12:22:17 PM
to Python Brasil
Hi Ravana, to answer your question directly: the method you call to remove null values is "dropna()".
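A minimal sketch of that call, assuming a tiny CSV in the same shape as matrix.csv (a User column followed by one column per movie). The movie names and ratings here are hypothetical, and the text is read from a string only so the example is self-contained:

```python
import io

import pandas as pd

# Hypothetical excerpt shaped like matrix.csv: empty fields mean "not rated".
csv_text = """User,Movie A,Movie B,Movie C
755,1,5,2
5277,5,,
1577,,,4
"""

# read_csv turns the empty fields into NaN instead of raising an error.
df = pd.read_csv(io.StringIO(csv_text), index_col="User")

# Count the missing ratings in each movie column.
missing_per_movie = df.isna().sum()

# dropna() returns a new DataFrame without the rows that contain any NaN.
complete_rows = df.dropna()

print(missing_per_movie)
print(complete_rows)
```

With this sample only user 755 rated every movie, so that is the only row left in complete_rows.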


In any case, in data analysis it is always worth checking whether removing the empty values is the best strategy. It often guts your dataset and leaves only a few values for the rest of your work, so just evaluate your own situation.
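One way to make that evaluation concrete is to measure how much data survives before committing to dropna(). This is only a sketch with hypothetical ratings, and the thresh=2 cutoff is an arbitrary choice for illustration:

```python
import numpy as np
import pandas as pd

# Hypothetical ratings: rows are users, columns are movies, NaN is "not rated".
ratings = pd.DataFrame(
    {
        "Movie A": [1.0, 5.0, np.nan, np.nan],
        "Movie B": [5.0, np.nan, np.nan, 3.0],
        "Movie C": [2.0, np.nan, 4.0, 1.0],
    },
    index=[755, 5277, 1577, 4388],
)

# Rows before and after dropping every row that has at least one NaN.
rows_before = len(ratings)
rows_after = len(ratings.dropna())

# Fraction of missing ratings in each movie column.
missing_share = ratings.isna().mean()

# A softer filter: keep users who rated at least 2 movies.
at_least_two = ratings.dropna(thresh=2)

print(rows_before, rows_after)
print(missing_share)
print(at_least_two)
```

In the matrix posted at the top of this thread, every user row appears to have at least one empty field, so a plain dropna() would probably leave no rows at all there.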

As a complement, it may be worth looking at this material on Kaggle (https://www.kaggle.com/alexisbcook/handling-missing-values).

If you run into any problem, just say so.

Tiago Camponogara Tomazetti

Feb 17, 2022, 12:41:25 PM
to python...@googlegroups.com
One option would be to call df.dropna(inplace=True) and then compute np.mean(df['coluna_de_avaliação']), where 'coluna_de_avaliação' stands for the name of the rating column you want to average.
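A rough sketch of that suggestion on hypothetical data (the column names and ratings are made up). It also shows, for comparison, that Series.mean() skips NaN by default, which gives a per-movie average without discarding whole rows:

```python
import numpy as np
import pandas as pd

# Hypothetical ratings: NaN marks a movie the user did not rate.
ratings = pd.DataFrame(
    {"Movie A": [1.0, 5.0, np.nan], "Movie B": [5.0, np.nan, 3.0]},
    index=[755, 5277, 1577],
)

# The suggestion above: drop every row with a NaN, then average one column.
complete = ratings.copy()
complete.dropna(inplace=True)
mean_complete_rows = np.mean(complete["Movie A"])  # only user 755 remains

# Alternative: keep all rows and let pandas skip NaN when averaging.
mean_skipping_nan = ratings["Movie A"].mean()  # averages users 755 and 5277

print(mean_complete_rows, mean_skipping_nan)
```

The two results differ (1.0 versus 3.0 here) because dropna() on the whole DataFrame also discards ratings from users who skipped some other movie.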

--
Dr. Tiago Camponogara Tomazetti
Data Scientist - The Insight (https://www.theinsight.com.br/)
+55 (48) 9-9681-3848

