Write Multiple Text Files into a Single CSV File

Martin Mirero

Dec 31, 2014, 12:54:10 AM
to django...@googlegroups.com
Hi folks,

I'm just cutting my teeth on Python/Django and need some assistance with something I've been grappling with for a few days:
  • I have a bunch of text files on disk that all have the same basic format: first line is the title and the rest is the body
  • I want to create one CSV file with the first column being populated from the first line of each text file (title) and the second column being populated from the rest of the text file contents (body)
  • Once I have the CSV file, I'm good to import that into my django model
Any help would be much appreciated.

Cheers.
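The split described above (first line is the title, the rest is the body) can be sketched as a small helper. `split_title_body` is a hypothetical name; this version keeps the body's internal line breaks and tolerates empty or title-only files.

```python
def split_title_body(text):
    """Split one text file's contents into a (title, body) pair.

    The first line is the title; everything after the first newline is
    the body, with its internal line breaks preserved.
    """
    # partition() never raises: a file with no newline yields an empty body
    title, _, body = text.partition('\n')
    return title.strip(), body.strip()
```

Passing in the result of `Path(name).read_text(encoding='utf-8')` would give the (title, body) pair for one CSV row.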

Vijay Khemlani

Dec 31, 2014, 9:30:05 AM
to django...@googlegroups.com
I'm not sure about the exact format of the content, but maybe something like this would create the file?

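import csv

files = ['f1.txt', 'f2.txt', 'f3.txt']

# Python 3: open the CSV in text mode with newline='' so the csv module
# controls line endings ('wb' was the right mode on Python 2 only).
with open('output.csv', 'w', newline='', encoding='utf-8') as f:
    writer = csv.writer(f)
    for input_file_name in files:
        with open(input_file_name, 'r', encoding='utf-8') as input_file:
            lines = input_file.readlines()
        if not lines:
            continue  # skip empty files rather than failing on lines[0]
        title = lines[0].strip()  # first line is the title
        # the remaining lines are joined into one space-separated body
        content = ' '.join(line.strip() for line in lines[1:])
        writer.writerow([title, content])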

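A variation on the above, as a sketch: it picks up every `*.txt` file in a directory instead of a hard-coded list, keeps the body's line breaks (the csv module quotes fields that contain newlines), writes a header row, and includes a reader for the import step. The function names, the `title`/`body` header, and the UTF-8 encoding are assumptions.

```python
import csv
from pathlib import Path


def text_files_to_csv(input_dir, output_csv, pattern='*.txt'):
    """Write one (title, body) CSV row per text file; return the row count."""
    rows = 0
    with open(output_csv, 'w', newline='', encoding='utf-8') as out:
        writer = csv.writer(out)
        writer.writerow(['title', 'body'])  # header row
        # sorted() keeps the row order stable from run to run
        for path in sorted(Path(input_dir).glob(pattern)):
            with open(path, encoding='utf-8') as fh:
                title = fh.readline().strip()  # first line only
                body = fh.read().strip()       # everything after it
            # multi-line bodies are quoted automatically by csv.writer
            writer.writerow([title, body])
            rows += 1
    return rows


def read_title_body_csv(path):
    """Yield (title, body) tuples from a CSV written by text_files_to_csv."""
    # newline='' lets the csv module handle line breaks inside quoted fields
    with open(path, newline='', encoding='utf-8') as fh:
        for row in csv.DictReader(fh):
            yield row['title'], row['body']
```

Loading the rows into a model could then look like `Article.objects.bulk_create([Article(title=t, body=b) for t, b in read_title_body_csv('output.csv')])`, where `Article` and its `title`/`body` fields are hypothetical.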

--
You received this message because you are subscribed to the Google Groups "Django users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to django-users...@googlegroups.com.
To post to this group, send email to django...@googlegroups.com.
Visit this group at http://groups.google.com/group/django-users.
To view this discussion on the web visit https://groups.google.com/d/msgid/django-users/07a9375c-4b05-42cc-a46e-0337ce96d6e7%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Scot Hacker

Dec 31, 2014, 12:36:44 PM
to django...@googlegroups.com
Depending on your use case, there may be an easier (or more efficient) way to skin this cat. Both MySQL and PostgreSQL have syntax that will let you load an entire CSV into a database table as a one-liner (LOAD DATA INFILE and COPY, respectively). To do that from within Django, you need to be able to run an external shell command, and the best way to do that is (often) with Fabric. And since this is a task you'll likely want to run on a regular basis, it makes sense to wrap it all up in a Django management command.
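For reference, the one-liners in question are PostgreSQL's `COPY ... FROM` and MySQL's `LOAD DATA INFILE`. The hypothetical helper below only builds the statement text, assuming a comma-separated file as written by Python's csv module (fields quoted with `"`, `\r\n` line endings) and an optional header row. Both statements read the file on the database server, so the server needs access to the path (PostgreSQL requires superuser or the `pg_read_server_files` role; MySQL is restricted by `secure_file_priv`).

```python
def bulk_load_sql(vendor, table, csv_path, header=True):
    """Return a single bulk-load statement for csv_path (hypothetical helper).

    vendor is 'postgresql' or 'mysql'. Values are interpolated directly,
    so table and csv_path must come from trusted configuration only.
    """
    if vendor == 'postgresql':
        header_opt = ', HEADER true' if header else ''
        return f"COPY {table} FROM '{csv_path}' WITH (FORMAT csv{header_opt})"
    if vendor == 'mysql':
        ignore = ' IGNORE 1 LINES' if header else ''  # skip the header row
        return (
            f"LOAD DATA INFILE '{csv_path}' INTO TABLE {table} "
            "FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '\"' "
            "LINES TERMINATED BY '\\r\\n'" + ignore
        )
    raise ValueError(f'unsupported vendor: {vendor!r}')
```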

Here's an excerpt of some code I use to do something similar:


This is located in someapp/management/commands/import_courses.py. With this in place I can run

./manage.py import_courses

at any time. On each run, it:

- Drops a temporary table 
- Re-creates that temporary table according to a predefined postgres schema (contained in importer_courses.sql)
- Imports the CSV into that temporary table
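The three steps above could be sketched as a sequence of `psql` invocations run with the standard library's `subprocess` module, substituted here for the Fabric calls in the original (which aren't shown). The database name, table name, and schema file are placeholders; the schema file is assumed to contain the `CREATE TABLE` for that table, the CSVs are assumed to have no header row, and the function accepts several CSV paths.

```python
import subprocess


def import_commands(db_name, table, schema_sql, csv_paths):
    """Build the psql invocations for one import run (hypothetical helper).

    1. drop the temporary table, 2. re-create it from a schema file
    (assumed to contain the CREATE TABLE statement), 3. load each CSV.
    """
    # ON_ERROR_STOP makes psql exit non-zero when a statement in -f fails
    psql = ['psql', '-v', 'ON_ERROR_STOP=1', '-d', db_name]
    commands = [
        psql + ['-c', f'DROP TABLE IF EXISTS {table}'],
        psql + ['-f', str(schema_sql)],
    ]
    for csv_path in csv_paths:
        # \copy reads the file on the client, so the database server does
        # not need access to it. Assumes no header row in the CSV.
        copy_cmd = f"\\copy {table} FROM '{csv_path}' WITH (FORMAT csv)"
        commands.append(psql + ['-c', copy_cmd])
    return commands


def run_import(db_name, table, schema_sql, csv_paths):
    """Run each step in order; check=True raises on the first failure."""
    for cmd in import_commands(db_name, table, schema_sql, csv_paths):
        subprocess.run(cmd, check=True)
```

Inside a management command, `run_import(...)` could be called from the command's `handle()` method.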

In my case, I needed to be able to perform queries across the CSV data *before* importing some of it into my real models, but this turned out to be a wonderfully efficient way of handling CSV data in general. You don't have to drop and recreate the temp table each time; I chose to do it that way for my use case, but season to taste.

To use this technique, you'll need to `pip install fabric`, set the directory where your CSVs live (as IMPORTER_DATA_DIR in your settings), and read up a bit on Fabric and Django management commands. Then you'll need to modify the code to handle multiple CSVs rather than just one.

./s

