Re: Scrapy: how to change COMMA delimiter to TAB in CSV exporter?

Ellison Marks

Jan 28, 2013, 1:46:08 PM
to scrapy...@googlegroups.com
From the scrapy docs:

The additional keyword arguments of this constructor are passed to the BaseItemExporter constructor, and the leftover arguments to the csv.writer constructor, so you can use any csv.writer constructor argument to customize this exporter.

So you'll want to pass some format option to the constructor. See: http://docs.python.org/2/library/csv.html#csv-fmt-params

On Monday, January 28, 2013 5:29:39 AM UTC-8, ryszard wrote:
I need to change the default COMMA delimiter to TAB in the CSV exporter,
since my data contain commas.

Is there a way to do it without diving deep into the code (../scrapy/contrib/exporter/__init__.py)
and/or creating my own exporter?

I would be very grateful for any suggestions.

R

ryszard

Jan 29, 2013, 12:57:45 PM
to scrapy...@googlegroups.com
So, it seems ...scrapy/contrib/exporter/__init__.py needs to be modified to include delimiter

[...]
class CsvItemExporter(BaseItemExporter):
        self.csv_writer = csv.writer(file, delimiter='\t', **kwargs)
[...]

It would be nice if this could be done in settings.py, e.g.
CSV_DELIMITER = '\t'

otherwise every time I get a new version I need to make this modification :(

Ellison Marks

Jan 29, 2013, 2:30:39 PM
to scrapy...@googlegroups.com
No, the exporter itself does not need to be modified. When you construct your CsvItemExporter, simply pass it a delimiter argument, like so:

from scrapy.contrib.exporter import CsvItemExporter

exporter = CsvItemExporter(somefile, delimiter='\t')
 
That argument will be passed to the underlying csv writer.
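
The forwarding described above can be illustrated without Scrapy installed. The sketch below is only an illustration: `SketchCsvExporter` and `export_row` are hypothetical stand-ins, not Scrapy's actual class or method, and the rows are made up. It shows how a leftover `delimiter` keyword argument ends up in `csv.writer`.

```python
import csv
import io


class SketchCsvExporter:
    """Hypothetical stand-in (not Scrapy's class) that forwards extra kwargs."""

    def __init__(self, file, **kwargs):
        # Any leftover keyword argument, such as delimiter, goes to csv.writer.
        self.csv_writer = csv.writer(file, **kwargs)

    def export_row(self, row):
        # Hypothetical helper: write one row of already-serialized values.
        self.csv_writer.writerow(row)


buffer = io.StringIO()
exporter = SketchCsvExporter(buffer, delimiter="\t")
exporter.export_row(["name", "address"])
exporter.export_row(["Acme", "1 Main St, Springfield"])  # field contains a comma

output = buffer.getvalue()
print(output)
```

With a tab delimiter, the comma inside the address is ordinary data and is written unchanged. In a real project the same effect would presumably come from subclassing CsvItemExporter and registering the subclass through Scrapy's FEED_EXPORTERS setting; that wiring is not shown here and should be checked against the exporters documentation.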

ryszard

Jan 29, 2013, 3:13:48 PM
to scrapy...@googlegroups.com
This looks promising, but where do I do it?

So far I have just been using the basic code generated with
$ scrapy startproject... and
$ scrapy genspider...
and the command "scrapy crawl spider -o spider.tsv -t csv",
which uses the built-in CSV exporter...

If I create a new exporter as you suggested, how do I instruct scrapy to use it?

As you can see I am new to scrapy :)

R

Ellison Marks

Jan 29, 2013, 5:00:34 PM
to scrapy...@googlegroups.com
As it would be slightly unfeasible to give the entire tutorial in this context, I shall link to it instead.

https://scrapy.readthedocs.org/en/latest/intro/tutorial.html

That's on the scrapy docs website, and should give you a good idea of how a spider is created. At the bottom, you'll see the export method you're using now, called feed exports. To go beyond that, these pages should prove useful.

https://scrapy.readthedocs.org/en/latest/topics/item-pipeline.html

https://scrapy.readthedocs.org/en/latest/topics/exporters.html

Ellison Marks

Jan 29, 2013, 7:34:31 PM
to scrapy...@googlegroups.com
You replied to me instead of the list.

While that would be easier, that's changing the source code. Any update to scrapy will blow away your changes. You can do it the long way (which isn't that bad, and pretty much required for any advanced use of scrapy), you can ask Pablo for a feed exporter setting to control the delimiter, or you can adapt whatever you're using to read the CSV to accept commas.
 
> Well... all in all, it seems that this single line change is the simplest solution:
> ...scrapy/contrib/exporter/__init__.py
>
> [...]
> class CsvItemExporter(BaseItemExporter):
>         self.csv_writer = csv.writer(file, delimiter='\t', **kwargs)
> [...]
>
> R

ryszard

Jan 29, 2013, 8:45:35 PM
to scrapy...@googlegroups.com
> you can ask Pablo for a feed exporter setting to control the delimiter,
This would be the cleanest long term solution.

How do I ask Pablo?

> or you can adapt whatever you're using to read the CSV to accept commas.
The problem for me is not the reader, but rather the fact that some of my fields
contain commas.
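
For reference, commas inside fields do not necessarily break comma-delimited output. Python's csv.writer, which the exporter uses underneath, by default quotes any field that contains the delimiter, and csv.reader undoes that quoting. A minimal sketch with made-up rows (the data is illustrative and not from this thread):

```python
import csv
import io

# Illustrative rows; the second row has a comma inside a field.
rows = [["title", "price"], ["Widget, large", "9.99"]]

# Write with the default comma delimiter and default (minimal) quoting.
buffer = io.StringIO()
csv.writer(buffer).writerows(rows)
text = buffer.getvalue()
print(text)

# Read it back: the quoted field is restored as a single value.
parsed = list(csv.reader(io.StringIO(text)))
print(parsed)
```

Whether this is enough depends on the consumer: a reader that honors CSV quoting gets the fields back intact, while a tool that simply splits on commas does not.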

Thank you for all your help.

R

Ellison Marks

Jan 30, 2013, 4:24:57 AM
to scrapy...@googlegroups.com
He usually reads and responds to the list in batches, so he'll probably see this convo in a couple days.

Shane Evans

Jan 30, 2013, 4:27:28 AM
to scrapy...@googlegroups.com
Opening a GitHub issue is probably the best idea, especially if there is some consensus that the change is worthwhile. A pull request is even better, and will certainly get integrated into scrapy much faster.

ryszard

Feb 6, 2013, 12:09:45 PM
to scrapy...@googlegroups.com
> A pull request is even better, and will certainly get integrated into scrapy much faster
How do I create a pull request?

bcddd214

Feb 6, 2013, 2:59:13 PM
to scrapy...@googlegroups.com
sed 's/,/\t/g' stuffin.csv >> stuffout.csv

ryszard

Feb 6, 2013, 3:25:33 PM
to scrapy...@googlegroups.com
My goal is to use TAB as the separator instead of COMMA in the output file from scrapy,
because my data contain commas.

The command below, if it worked, would replace ALL commas with tabs in the output file, i.e. both the commas used as separators by scrapy's CSV exporter and the commas in my data (which I do not want to replace):
sed 's/,/\t/g' stuffin.csv >> stuffout.csv
but that line does not work as intended... the command below should be used for this purpose:
$ tr ',' '\t' < stuffin.csv > stuffout.csv
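
A CSV-aware conversion avoids the problem described above, because it parses the quoted fields before rewriting them with tabs. The following is a minimal sketch using Python's csv module; the sample text is made up for illustration, and a real script would read stuffin.csv and write stuffout.csv instead of using in-memory strings.

```python
import csv
import io

# Illustrative comma-delimited input; the quoted field contains a comma.
csv_text = 'title,price\r\n"Widget, large",9.99\r\n'

# Blind replacement (what sed/tr do): the comma inside the field becomes a tab too.
naive = csv_text.replace(",", "\t")

# CSV-aware conversion: parse with the comma dialect, re-emit with tabs.
out = io.StringIO()
writer = csv.writer(out, delimiter="\t")
for row in csv.reader(io.StringIO(csv_text)):
    writer.writerow(row)
converted = out.getvalue()

print(repr(naive))
print(repr(converted))
```

In the converted text the data row still has two tab-separated columns and the comma stays inside the title, whereas the blind replacement splits that field in two for any tool that simply splits on tabs.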

Steven Almeroth

Mar 30, 2013, 11:34:01 PM
to scrapy...@googlegroups.com
I just added a pull request: https://github.com/scrapy/scrapy/pull/279/files

stav

Mar 30, 2013, 11:49:01 PM
to scrapy...@googlegroups.com
ryszard <ryszard.czerminski@...> writes:

> I need to change the default COMMA delimiter to TAB in the CSV exporter,
> since my data contain commas.
>
> Is there a way to do it without diving deep into the code (../scrapy/contrib/exporter/__init__.py)
> and/or creating my own exporter?
>
> I would be very grateful for any suggestions.
>
> R


You could try this: https://groups.google.com/d/msg/scrapy-users/KTkP9kehoPI/OGbd7-pPKqMJ

or wait for this: https://github.com/scrapy/scrapy/pull/279/files
