UnicodeDecodeError

Domingo Vargas

Apr 22, 2012, 1:04:08 PM

to networkx-discuss

Hi All

I am getting this error when I try to write a GraphML file:

Traceback (most recent call last):
  File "./p3.py", line 277, in <module>
    net_x.write_graphml(G, filename + 'netx.xml',encoding ='UTF-8')
  File "<string>", line 2, in write_graphml
  File "/Library/Python/2.7/site-packages/networkx-1.6-py2.7.egg/networkx/utils/decorators.py", line 193, in _open_file
    result = func(*new_args, **kwargs)
  File "/Library/Python/2.7/site-packages/networkx-1.6-py2.7.egg/networkx/readwrite/graphml.py", line 83, in write_graphml
    writer.dump(path)
  File "/Library/Python/2.7/site-packages/networkx-1.6-py2.7.egg/networkx/readwrite/graphml.py", line 322, in dump
    document.write(stream, encoding=self.encoding)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/xml/etree/ElementTree.py", line 815, in write
    serialize(write, self._root, encoding, qnames, namespaces)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/xml/etree/ElementTree.py", line 934, in _serialize_xml
    _serialize_xml(write, e, encoding, qnames, None)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/xml/etree/ElementTree.py", line 934, in _serialize_xml
    _serialize_xml(write, e, encoding, qnames, None)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/xml/etree/ElementTree.py", line 927, in _serialize_xml
    v = _escape_attrib(v, encoding)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/xml/etree/ElementTree.py", line 1085, in _escape_attrib
    return text.encode(encoding, "xmlcharrefreplace")
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 17: ordinal not in range(128)

My system is a Mac running OS X 10.7.3, and I am processing text files
from a Windows system that I previously converted to UTF-8 using
iconv. Until now I have not had any problems processing these files;
the error only appears now that I am trying to use networkx.

Any ideas on where to look?

Your comments are highly appreciated

best regards

-dv
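
The last frame of the traceback above calls text.encode(...), yet the exception raised is a UnicodeDecodeError. In Python 2 that combination appears when encode() is called on a byte string, because the interpreter first decodes the bytes with the default ASCII codec. The sketch below writes that hidden decode step out explicitly. The sample bytes are an assumption (the UTF-8 encoding of "é"), chosen only because they begin with the 0xc3 byte named in the error; this is one plausible reading of the traceback, not a confirmed diagnosis.

```python
# Assumed sample: the UTF-8 bytes for "é", which begin with 0xc3.
raw = b'\xc3\xa9'

# Python 2 performs this ASCII decode implicitly when .encode() is called
# on a byte string; here it is written out so the failure is visible.
try:
    raw.decode('ascii')
    message = None
except UnicodeDecodeError as exc:
    message = str(exc)

# Decoding with the real encoding first yields text that encodes cleanly.
text = raw.decode('utf-8')
encoded = text.encode('utf-8', 'xmlcharrefreplace')
```

If this reading applies, the labels handed to the writer would need to be unicode text rather than already-encoded byte strings.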

Aric Hagberg

Apr 22, 2012, 2:46:02 PM

to networkx...@googlegroups.com

Could you post a short code example that produces the error?

Since you are using Python2.7, here is a modified version of the
networkx heavy_metal_umlaut.py example that shows how to write a utf-8
encoded graphml file with python2.x. The "strings" hd and mh are
Python unicode types.

import networkx as nx
hd='H' + unichr(252) + 'sker D' + unichr(252)
mh='Mot' + unichr(246) + 'rhead'
G=nx.Graph()
G.add_edge(hd,mh)
nx.write_graphml(G,'test.graphml',encoding='utf-8')

Aric
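
For readers on Python 3, a rough equivalent of the two strings in the example above is sketched here. It assumes Python 3, where chr() plays the role of unichr() and every str is already unicode text; the graph-writing step is left out so that the sketch needs only the standard library.

```python
# Python 3 sketch: chr() plays the role that unichr() has in Python 2.
hd = 'H' + chr(252) + 'sker D' + chr(252)   # U+00FC is "ü"
mh = 'Mot' + chr(246) + 'rhead'             # U+00F6 is "ö"

# Text stays as str in memory; bytes appear only when encoding for output.
hd_utf8 = hd.encode('utf-8')
mh_utf8 = mh.encode('utf-8')
```

Both encoded forms contain the byte 0xc3, the lead byte that UTF-8 uses for these accented Latin letters.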

Domingo Vargas

Apr 22, 2012, 8:07:05 PM

to networkx-discuss

Dear Aric

Here is a section of the code that produces the error:

import itertools
import networkx as net_x

"""
record is a dictionary taken from a .csv file and processed with
regular expressions in order to clean up and properly define the
nodes...
"""

for n in record.keys():
    ids = record[n].keys()
    authors = set()

    for x in ids:
        authors.add(record[n][x]['author'])

    if len(authors_xy.intersection(authors)) > 1 and len(ids) > 1:
        s = itertools.combinations(ids, 2)

        for pair in s:
            author0 = record[n][pair[0]]['author']
            author1 = record[n][pair[1]]['author']

            university0 = record[n][pair[0]]['university']
            university1 = record[n][pair[1]]['university']

            country0 = record[n][pair[0]]['country']
            country1 = record[n][pair[1]]['country']

            national0 = record[n][pair[0]]['national']
            national1 = record[n][pair[1]]['national']

            t = record[n][pair[0]]['time']
            jif = record[n][pair[0]]['jif']
            unesco = record[n][pair[0]]['unesco']
            numero = record[n][pair[0]]['numero']

            """
            The link values are taken from pair[0]; because all records
            share the same link details, this does not change the attributes.
            """

            G.add_node(author0, label=author0, university=university0,
                       country=country0, national=national0, time=t)
            G.add_node(author1, label=author1, university=university1,
                       country=country1, national=national1, time=t)
            G.add_edge(author0, author1, jif=jif, unesco=unesco,
                       numero=numero, time=t)

net_x.write_graphml(G, filename + 'netx.xml', encoding='UTF-8')

Domingo Vargas

Apr 22, 2012, 8:10:11 PM

to networkx-discuss

I forgot to include:

G = net_x.MultiGraph()

Thanks in advance for your help!

-dv

Aric Hagberg

Apr 22, 2012, 8:13:51 PM

to networkx...@googlegroups.com

On Sun, Apr 22, 2012 at 6:10 PM, Domingo Vargas <dvar...@gmail.com> wrote:
> I forgot to include:
>
> G = net_x.MultiGraph()
>
> Thanks in advance for your help!

For us to help, you'll need to post a short, working code example that
demonstrates the problem. You might have to do a little work to
isolate which nodes or edges are causing the error.

Aric
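
One way to do that isolation is to scan the labels for byte strings that contain non-ASCII bytes. The sketch below is only an illustration: find_suspect_labels is a hypothetical helper, not part of networkx, and the sample labels are made up to stand in for node names.

```python
def find_suspect_labels(labels):
    """Return labels that are byte strings containing non-ASCII bytes."""
    suspects = []
    for label in labels:
        if isinstance(label, bytes):
            try:
                label.decode('ascii')
            except UnicodeDecodeError:
                suspects.append(label)
    return suspects


# Hypothetical labels standing in for the node names of a graph.
labels = ['Michael Mccoy', b'W Ronald Heyer', b'Jos\xc3\xa9 Bravo']
suspects = find_suspect_labels(labels)
```

With a real graph, the node names could be passed in instead, for example find_suspect_labels(G.nodes()), and the same check could be applied to attribute values.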

Domingo Vargas

Apr 22, 2012, 10:25:15 PM

to networkx-discuss

Dear Aric

Here is a functional piece of code that produces the error:

#!/usr/bin/python

import sys
import re
import csv
import itertools
import operator
import networkx as nx
from collections import defaultdict

"""
Main section
"""

# input parsing
try:
    argv2 = sys.argv[2]
    if argv2 == 'all':
        formats = ['all']
except:
    formats = ['net', 'paj']

# I/O setup
data = csv.DictReader(open(sys.argv[1], 'rb'), delimiter=',',
                      quotechar='"', quoting=csv.QUOTE_MINIMAL)

# global vars
record = defaultdict(dict)
iline = 0

# db reading & data gathering
for line in data:
    author = unicode(line['nombre'] + ' ' +
                     line['apellido'], 'iso-8859-1').encode('utf-8')
    s = {'author': author}

    iline += 1
    record[line['numero']][iline] = s

G = nx.MultiGraph()

for n in record.keys():
    authors = record[n].keys()

    if len(authors) > 1:
        s = itertools.combinations(authors, 2)

        for pair in s:
            author0 = record[n][pair[0]]['author']
            author1 = record[n][pair[1]]['author']

            G.add_node(author0)
            G.add_node(author1)
            G.add_edge(author0, author1)

nx.write_graphml(G, './example.xml', encoding='utf-8')

#######

The input file is the following:

numero,nombre,apellido
4023,JazzmÌn,HenrÌquez
4545,JosÈ,Graterol
4545,Coromoto,MartÌnez
4545,Anderson,Gonz·lez
5375,W Ronald,Heyer
5375,CÈsar,AmorÛs
7537,JosÈ,Bravo
7537,Nora,Gonz·lez
7537,Walter,Gonz·lez
7538,JosÈ,Bravo
7538,Elizabeth,Gonz·lez
7538,Walter,Gonz·lez
8067,MarÌa,RondÛn
8067,JoaquÌn,Buitrago
8067,Michael,Mccoy
13253,JosÈ Luis, Perasa
13253,MarÌa,PernÌa
14331,Fabiola,GarcÌa
14331,SofÌa,Mata Quintero
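
The accented characters in the listing above follow a recognizable pattern: they match what ISO-8859-1 bytes look like when displayed as Mac Roman. That is an inference from the visible characters, not something stated in the thread, and the intended spellings used below are likewise assumptions. A short Python 3 sketch of the effect and its reversal:

```python
# Names as they were presumably meant to read (an assumption).
intended = ['José', 'González', 'Amorós', 'Jazzmín']

# Encode as ISO-8859-1, then misread those bytes as Mac Roman.
garbled = [name.encode('iso-8859-1').decode('mac_roman') for name in intended]

# Reversing the two steps recovers the intended text.
repaired = [name.encode('mac_roman').decode('iso-8859-1') for name in garbled]
```

The garbled forms produced this way have the same substitutions seen in the listing, such as "È" where "é" was expected.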

Aric Hagberg

Apr 22, 2012, 10:50:55 PM

to networkx...@googlegroups.com

The name data does not look correctly encoded in my email, and I get
an error when I run your program.
It is likely that you are not reading the data file with the correct
encoding. Note that you need to decode from whatever encoding the data
file uses when you read it into Python.

Aric
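
The advice above, decoding once at the point where the file is read, can be sketched as follows. This is a minimal illustration under stated assumptions: it targets Python 3, the sample rows are inlined as bytes rather than read from disk, and ISO-8859-1 is assumed as the file's encoding only because the posted script names it.

```python
import csv
import io

# Assumed sample: two rows of the data file as ISO-8859-1 bytes.
raw = b'numero,nombre,apellido\n7537,Jos\xe9,Bravo\n7537,Nora,Gonz\xe1lez\n'

# Decode once, at the boundary, using the file's actual encoding.
text = raw.decode('iso-8859-1')

# From here on every value is text, not bytes.
rows = list(csv.DictReader(io.StringIO(text)))
authors = [row['nombre'] + ' ' + row['apellido'] for row in rows]
```

For a file on disk, Python 3 allows the same thing with open(path, encoding='iso-8859-1', newline=''), after which the names can be used as node labels without re-encoding them.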

Domingo Vargas

Apr 23, 2012, 10:23:18 PM

to networkx-discuss

Dear Aric

Thanks for your help.

Problem solved!

It was a problem with the text conversion from Windows to Mac.

Best regards

-dv
