[postgis-users] Import CSV (was: Noob question with shp2pgsql)


Margie Roswell

Apr 13, 2013, 8:54:24 AM
to PostGIS Users Discussion
I figured out that COPY is used to import a file into a table.

(Actually, even though I don't speak a word of Portuguese, a Portuguese video did a great job of showing how to copy the data into a temp table first: https://www.youtube.com/watch?v=CwsnPPub9v4 )

But the shp2pgsql thread yesterday got me thinking: to import a shapefile, they've created a utility so that we don't have to set up the structure of the table in advance.

Is there something similar on the CSV side?

My guess is that http://www.safe.com/solutions/for-databases/postgis/
might have something, but I can't quite put my finger on it.

Details on that?

Also, I'm sure there's a fee for that. Are there any other strategies for making the table creation more efficient, when importing a file to a table?

I suppose I could copy and paste the field names from the top row in the original Excel spreadsheet, and then manually reformat them into a CREATE TABLE statement by adding all the field types. What strategies (like the shp2pgsql utility?) reduce the pain of importing a text file?
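
The copy-and-paste step can be scripted. The following is a minimal Python sketch (not something from the thread): it reads only the header row, quotes each name as a PostgreSQL identifier, and declares every column as text, so the file can be loaded first and cast to proper types afterwards, much like the temp-table approach in the video. The table name, the file name, and the col_N fallback for blank headers are hypothetical choices.

```python
import csv
import io


def quote_ident(name: str) -> str:
    """Quote a name as a PostgreSQL identifier, doubling any embedded quotes."""
    return '"' + name.replace('"', '""') + '"'


def staging_sql(header_line: str, table: str, csv_path: str) -> str:
    """Build a CREATE TABLE (all columns text) plus a psql \\copy command."""
    # Parse the header with the csv module so quoted names containing commas work.
    names = next(csv.reader(io.StringIO(header_line)))
    cols = []
    for i, name in enumerate(names, start=1):
        name = name.strip() or f"col_{i}"  # hypothetical fallback for blank headers
        cols.append(f"  {quote_ident(name)} text")
    ddl = f"CREATE TABLE {table} (\n" + ",\n".join(cols) + "\n);"
    # \copy runs client-side in psql; HEADER true skips the file's first line.
    copy = f"\\copy {table} FROM '{csv_path}' WITH (FORMAT csv, HEADER true)"
    return ddl + "\n" + copy


if __name__ == "__main__":
    print(staging_sql('Name,"Zip Code",Lat,Lon', "staging_import", "data.csv"))
```

Redirect the output to a file and run it with psql -f. Quoted mixed-case names such as "Zip Code" must then be quoted in every later query as well.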

Margie



On Fri, Apr 12, 2013 at 6:14 PM, David Rush <da...@rushtone.com> wrote:
Total noob to PostgreSQL and PostGIS here.  Trying to follow examples from the Obe+Hsu book (1st Ed) in using shp2pgsql from the command line to import some tiger county data.

I ran this:

shp2pgsql -s 4269 -g geom_4269 -W LATIN1 c:/users/david/downloads/tl_2012_us_county/tl_2012_us_county.shp public.us_counties psql -h localhost -U postgres -p 5432 -d mygisdb 

Thanks to an archive of this list that led me to add the "-W LATIN1" param (it was failing with an error w/out it).

Now the command runs for several minutes, spitting out mostly zillions of hex digits, with no overt errors.  Last line it spits out is "COMMIT;".

But when I go into psql, I can't find the public.us_counties table that I thought I had just created:

mygisdb=# select * from public.us_counties;
ERROR:  relation "public.us_counties" does not exist
LINE 1: select * from public.us_counties;
                      ^
mygisdb=# select table_schema, table_name,table_type from information_schema.tables where
table_schema not in ('pg_catalog','information_schema');
 table_schema |    table_name     | table_type
--------------+-------------------+------------
 public       | geography_columns | VIEW
 public       | geometry_columns  | VIEW
 public       | spatial_ref_sys   | BASE TABLE
 ch01         | lu_franchises     | BASE TABLE
 ch01         | fastfoods         | BASE TABLE
(5 rows)

Poking around with pgAdmin III, I can't find it anywhere, either.

Is the new table us_counties hiding somewhere?  Or did it quietly fail?  Or what?

David

_______________________________________________
postgis-users mailing list
postgi...@lists.osgeo.org
http://lists.osgeo.org/cgi-bin/mailman/listinfo/postgis-users


Richard Greenwood

Apr 13, 2013, 9:40:07 AM
to PostGIS Users Discussion
On Sat, Apr 13, 2013 at 6:54 AM, Margie Roswell <mros...@gmail.com> wrote:

But the shp2pgsql thread yesterday got me thinking: to import a shapefile, they've created a utility so that we don't have to set up the structure of the table in advance

Is there something similar on the CSV side?

ogr2ogr does a wonderful job of that.

Rich

pcr...@pcreso.com

Apr 13, 2013, 3:16:03 PM
to PostGIS Users Discussion
Hi Margie,

A shapefile is a structured set of files with self-documented column names & data types - so contains all the information required to create an equivalent table in a standard way. A CSV is just a text file, which has no typing information & may or may not have column names included. It does not necessarily contain the information describing the columns which is required to create an appropriate database table.

So importing a shapefile & a CSV are not that comparable. If you come up with a standard CSV format which includes column names & typing info, you could automate the process, even try to auto-type the columns based on content, but it would only work for CSVs which conformed to your spec.
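
Auto-typing columns from their content could look roughly like this Python sketch (an illustration under stated assumptions, not an existing tool): each column gets the narrowest of integer, bigint, double precision, date (ISO YYYY-MM-DD only) or text that every non-empty value fits, and empty cells are treated as NULL.

```python
import csv
import io
import re
from datetime import date

INT_RE = re.compile(r"[+-]?[0-9]+")
FLOAT_RE = re.compile(r"[+-]?([0-9]+\.?[0-9]*|\.[0-9]+)([eE][+-]?[0-9]+)?")
DATE_RE = re.compile(r"[0-9]{4}-[0-9]{2}-[0-9]{2}")

# Tried in order; "text" always fits, so every column gets some type.
CANDIDATES = ["integer", "bigint", "double precision", "date", "text"]


def fits(value: str, pg_type: str) -> bool:
    """Return True if a non-empty CSV value looks like a valid pg_type literal."""
    if pg_type == "integer":
        return bool(INT_RE.fullmatch(value)) and -2**31 <= int(value) < 2**31
    if pg_type == "bigint":
        return bool(INT_RE.fullmatch(value)) and -2**63 <= int(value) < 2**63
    if pg_type == "double precision":
        return bool(FLOAT_RE.fullmatch(value))
    if pg_type == "date":
        if not DATE_RE.fullmatch(value):
            return False
        try:
            date.fromisoformat(value)  # rejects impossible dates like 2013-02-30
            return True
        except ValueError:
            return False
    return True  # text accepts anything


def infer_types(csv_text: str) -> list[tuple[str, str]]:
    """Return (column name, guessed PostgreSQL type) pairs for a CSV with a header."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, data = rows[0], rows[1:]
    result = []
    for i, name in enumerate(header):
        # Empty cells count as NULL and do not constrain the type.
        values = [r[i].strip() for r in data if i < len(r) and r[i].strip()]
        if not values:
            result.append((name, "text"))
            continue
        chosen = next(t for t in CANDIDATES if all(fits(v, t) for v in values))
        result.append((name, chosen))
    return result


def create_table(table: str, columns: list[tuple[str, str]]) -> str:
    """Render the guessed types as a CREATE TABLE statement."""
    body = ",\n".join(
        '  "' + name.replace('"', '""') + '" ' + pg_type for name, pg_type in columns
    )
    return f"CREATE TABLE {table} (\n{body}\n);"
```

Guessing only goes so far, which is exactly the caveat above: a ZIP code column such as 02134 comes out as integer and would lose its leading zero, so the generated DDL is a starting point to review, not something to trust blindly.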

There are tools to assist with this operation; MapForce is a commercial one, and others can probably be found here:
http://wiki.postgresql.org/wiki/Community_Guide_to_PostgreSQL_GUI_Tools 

Brent Wood

--- On Sun, 4/14/13, Margie Roswell <mros...@gmail.com> wrote:

Nathan Hemenway

Apr 13, 2013, 3:41:24 PM
to PostGIS Users Discussion
As Richard Greenwood noted, ogr2ogr works great for importing CSV files into Postgres tables.
In fact, your CSV file does not even need to have any geometry-related columns for this to work.

It is all documented here very nicely:

http://www.gdal.org/ogr/drv_csv.html
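
The CSV driver page also describes a .csvt companion file: a single line of quoted type names (such as "Integer", "Real", "String", "Date") stored next to the CSV under the same base name, which supplies the typing information a plain CSV lacks. A Python sketch that writes one and runs ogr2ogr might look like this; the connection string, table name, and file names are placeholders, and the type names your GDAL version accepts should be checked against that page.

```python
import csv
import os
import subprocess


def write_csvt(csv_path: str, ogr_types: list[str]) -> str:
    """Write the .csvt sidecar next to csv_path and return its path."""
    csvt_path = os.path.splitext(csv_path)[0] + ".csvt"
    with open(csvt_path, "w", newline="") as f:
        # One line, every type name quoted, e.g. "Integer","Real","String"
        writer = csv.writer(f, quoting=csv.QUOTE_ALL, lineterminator="\n")
        writer.writerow(ogr_types)
    return csvt_path


def ogr2ogr_command(csv_path: str, table: str, dsn: str) -> list[str]:
    """Build the argument list for loading csv_path into a PostgreSQL table."""
    return ["ogr2ogr", "-f", "PostgreSQL", f"PG:{dsn}", csv_path, "-nln", table]


def load(csv_path: str, table: str, dsn: str) -> None:
    """Run ogr2ogr; raises CalledProcessError if it exits with an error."""
    subprocess.run(ogr2ogr_command(csv_path, table, dsn), check=True)


if __name__ == "__main__":
    # Placeholder file, types, and connection details.
    write_csvt("counties.csv", ["String", "Integer", "Real"])
    load("counties.csv", "counties", "dbname=mygisdb host=localhost port=5432 user=postgres")
```

Passing the arguments as a list avoids shell quoting of the PG: connection string. Newer GDAL releases (2.0 and later) also offer an AUTODETECT_TYPE open option for the CSV driver (-oo AUTODETECT_TYPE=YES), which guesses types without a .csvt file.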
-- 
.nathan.

MJ

Apr 14, 2013, 7:10:28 AM
to PostGIS Users Discussion
Parsing CSV files is one of the nastiest computing problems around. Very frequently, CSV files will have unparsable lines. ogr2ogr is not going to solve this problem for you: it will only consume Correctly Formatted CSV files.

Before you can even get around to handling the problem of unparsable lines, oftentimes a character set conversion is required. There are, unfortunately, way too many folks who publish CSV files, shapefiles, and SQL dumps which contain UTF-8 multibyte encoding sequences saved in ISO-8859-1 file encoding. I need to run iconv on roughly 90% of the shapefiles I load with shp2pgsql, generally any shapefile (or CSV or SQL dump) which was produced by a North American or Western European person and which contains international data. This class of folk seem to believe that since ISO 8859-1 or ISO 8859-15 works for their own character set, it works for the entire world. In 2013, there is absolutely no reason for anyone on this planet to be encoding in something other than UTF-8: disk space and bandwidth are cheap enough now, and in the areas of the world where they're not yet cheap enough, UTF-8 is the only choice anyway.
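
The usual fix is to transcode before loading, for example iconv -f ISO-8859-1 -t UTF-8 in.csv > out.csv (shp2pgsql's -W option, used earlier in this thread, declares the source encoding instead). A minimal Python sketch of the same idea, assuming the file is either valid UTF-8 already or entirely in one single-byte fallback encoding (Latin-1 by default); it is an illustration, not MJ's procedure:

```python
def to_utf8(raw: bytes, fallback: str = "latin-1") -> bytes:
    """Return raw as UTF-8; input that is not valid UTF-8 is decoded as fallback."""
    try:
        raw.decode("utf-8")
        return raw  # already valid UTF-8 (plain ASCII also lands here)
    except UnicodeDecodeError:
        # Latin-1 maps every byte to a code point, so the default never fails;
        # that also means it cannot tell Latin-1 apart from, say, Windows-1252.
        return raw.decode(fallback).encode("utf-8")


def convert_file(src: str, dst: str, fallback: str = "latin-1") -> None:
    """Transcode a whole file, like iconv -f <fallback> -t UTF-8 src > dst."""
    with open(src, "rb") as f:
        data = f.read()
    with open(dst, "wb") as f:
        f.write(to_utf8(data, fallback))
```

The decision is made for the whole file, so a file that mixes valid UTF-8 with Latin-1 bytes would have its UTF-8 parts mangled; such files need line-by-line or field-by-field handling.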

What causes unparsable lines in CSV? Quotes where there aren't supposed to be, missing quotes, missing fields, ambiguously utilised and unescaped delimiter characters, etc. Manual correction is difficult when you are handling, for example, an 80-thousand-line file.
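
Correcting an 80,000-line file by hand is hard, but finding the bad lines can be automated. A minimal Python sketch (an illustration, not part of MJ's tool) that uses the standard csv module to report every record whose field count differs from the header's:

```python
import csv
import io


def bad_records(f, delimiter=","):
    """Yield (line_number, field_count) for records whose width differs from the header."""
    reader = csv.reader(f, delimiter=delimiter)
    header = next(reader)
    for record in reader:
        if len(record) != len(header):
            # line_num is the physical line where this record ended; it differs
            # from the record index when quoted fields span several lines.
            # Blank lines come back as empty records and are reported as 0 fields.
            yield reader.line_num, len(record)


if __name__ == "__main__":
    sample = "a,b,c\n1,2,3\n4,5\n6,7,8,9\n"
    for line, count in bad_records(io.StringIO(sample)):
        print(f"line {line}: {count} fields")
```

An unclosed quote typically makes the parser swallow the following lines into one field, so it shows up as a short record near where the damage starts. For a real file, open it with open(path, newline="") as the csv module documentation recommends; passing strict=True to csv.reader raises csv.Error on malformed quoting instead of guessing.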

Here is a tool I wrote to fix CSV files from one particularly nasty source. It changes a file delimited by commas into a file delimited by tabs, as well as correcting a whole host of other common problems. I have found that it works quite well, in general, for multiple sources of nastily encoded CSV files.


Use it like this:

fix-csv.pl nasty.csv > fixed.csv



#!/usr/bin/perl
# fix-csv.pl: turn a messy comma-delimited file into a tab-delimited one.
use strict;
use warnings;

while (<>)
{
  # 1. remove ^M (carriage returns from DOS/Windows line endings)
  s/\r//g;

  # 2. change a comma at the beginning of the line (empty first field) to a tab
  s/^,/\t/;

  # 3. change "," to "\t" ("tab") between two quoted fields
  s/","/"\t"/g;

  # 4. change ", to \t (tab)
  s/",/\t/g;

  # 5. change ," to \t (tab)
  s/,"/\t/g;

  # 6. change \t, to \t\t (double tab); repeat until none are left, because
  #    a single /g pass converts only one comma in a run like \t,,,
  1 while s/\t,/\t\t/g;

  # 7. remove quotes
  s/"//g;

  print;
}
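
For comparison, here is a different technique from the regular expressions above: a Python sketch that parses the input with the standard csv module (so unquoted fields and commas inside quoted fields are handled) and writes tab-delimited lines. It does not attempt MJ's repairs of stray quotes, and replacing embedded tabs and line breaks with spaces is an assumed policy.

```python
import csv
import sys


def csv_to_tsv(src, dst) -> None:
    """Re-emit comma-delimited records from src as tab-delimited lines on dst."""
    for record in csv.reader(src):
        # A tab or line break inside a field would break the TSV layout,
        # so replace them with spaces (hypothetical policy; adjust as needed).
        cleaned = [
            field.replace("\r", " ").replace("\n", " ").replace("\t", " ")
            for field in record
        ]
        dst.write("\t".join(cleaned) + "\n")


if __name__ == "__main__":
    # Usage: python csv_to_tsv.py nasty.csv > fixed.tsv
    with open(sys.argv[1], newline="", encoding="utf-8") as src:
        csv_to_tsv(src, sys.stdout)
```

If the result is loaded with PostgreSQL's COPY in its default text format, backslashes in the data would also need doubling, which this sketch does not do.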


-mike