Loading CSV file with WKT column

Dusan Vasiljevic

May 27, 2018, 9:37:23 AM
to GeoSpark Discussion Board
Hi,

I've been reading your excellent documentation and I think there is one incorrect part:

I've tried using this but I kept getting an ArrayIndexOutOfBoundsException. Looking at the source code, it seems that FileDataSplitter.WKT requires each line to be in WKT format and cannot handle a CSV file that has WKT as one of its columns.

Can you please tell me what I should use for a CSV file that has a WKT column?

Regards,
Dusan

Jia Yu

May 28, 2018, 3:40:26 AM
to Dusan Vasiljevic, GeoSpark Discussion Board
Hi Dusan,

A WKT file is supposed to be a CSV file, but the field delimiter should be TAB instead of comma, because the WKT field itself may contain commas.
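
To illustrate why a comma delimiter clashes with WKT, here is a minimal sketch in plain Scala with no Spark or GeoSpark dependency. The record layout (one id followed by a geometry) and the object name are assumptions made only for this demonstration.

```scala
object DelimiterDemo {
  // Hypothetical record: an id followed by a WKT LINESTRING.
  val wkt: String = "LINESTRING(173.05735 -41.48814,173.05706 -41.48804)"

  val commaLine: String = s"14003,$wkt"
  val tabLine: String = s"14003\t$wkt"

  // Splitting on commas also cuts the geometry at its coordinate separators,
  // so the WKT ends up spread over several fields.
  val commaFields: Array[String] = commaLine.split(",")

  // Splitting on TAB leaves the geometry intact as a single field.
  val tabFields: Array[String] = tabLine.split("\t")

  def main(args: Array[String]): Unit = {
    println(commaFields.length)    // prints 3
    println(tabFields(1) == wkt)   // prints true
  }
}
```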

Thanks,
Jia

------------------------------------

Jia Yu,

Ph.D. Student in Computer Science


--
You received this message because you are subscribed to the Google Groups "GeoSpark Discussion Board" group.
To unsubscribe from this group and stop receiving emails from it, send an email to geospark-discussion-board+unsub...@googlegroups.com.
To post to this group, send email to geospark-discussion-board@googlegroups.com.
Visit this group at https://groups.google.com/group/geospark-discussion-board.
To view this discussion on the web visit https://groups.google.com/d/msgid/geospark-discussion-board/45c74d32-b233-45bf-a182-b60fc00e8631%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Dusan Vasiljevic

May 28, 2018, 4:23:21 AM
to GeoSpark Discussion Board
Hi Jia,

Thank you for the reply.
What I meant was that I have CSV files with multiple columns, one of which contains a WKT linestring.

Example: 
801137461;736663492;14003;100;LINESTRING(173.05735 -41.48814,173.05706 -41.48804,173.05653 -41.48792,173.05634 -41.48792,173.05594 -41.48788,173.05562 -41.4878,173.0549 -41.48778)

How would you suggest I load this file optimally?

I've created my own implementation of the FlatMapFunction. Is that the way to approach this?
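
For reference, the per-line parsing that such a custom function might perform can be sketched in plain Scala as below. This is not the implementation mentioned above, only an illustration: the names RoadRecord and SemicolonWktParser are hypothetical, and it assumes five semicolon-separated fields with the WKT in the last one and no semicolon inside the WKT itself.

```scala
// Hypothetical container for one parsed row; the field names are illustrative only.
final case class RoadRecord(attributes: Vector[String], wkt: String)

object SemicolonWktParser {
  // Splits a line on ';' and treats the last field as the WKT geometry.
  // Returns None for lines with an unexpected number of fields instead of
  // indexing past the end of the array.
  def parse(line: String, fieldCount: Int = 5): Option[RoadRecord] = {
    val fields = line.split(";", -1) // -1 keeps trailing empty fields
    if (fields.length != fieldCount) None
    else Some(RoadRecord(fields.init.toVector, fields.last.trim))
  }

  def main(args: Array[String]): Unit = {
    val sample =
      "801137461;736663492;14003;100;LINESTRING(173.05735 -41.48814,173.05706 -41.48804)"
    println(parse(sample).map(_.wkt))
  }
}
```

The resulting WKT string would still need to be handed to a WKT parser to obtain a geometry object; that step is left out here because it depends on the library version in use.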

Regards,
Dusan


Jia Yu

May 28, 2018, 4:51:21 AM
to Dusan Vasiljevic, GeoSpark Discussion Board
Hi Dusan,

Since a semicolon delimiter is a little unusual, you should use the GeoSparkSQL API to create a generic SpatialRDD: http://datasystemslab.github.io/GeoSpark/tutorial/rdd/#create-a-generic-spatialrdd

1. Load the data in GeoSparkSQL (make sure you add the SparkSQL and GeoSparkSQL dependencies as well). The delimiter is a semicolon.

var df = sparkSession.read.format("csv").option("delimiter", ";").option("header", "false").load(csvPointInputLocation)
df.createOrReplaceTempView("inputtable")

2. Create a Geometry column using ST_GeomFromWKT

var spatialDf = sparkSession.sql(
    """
        |SELECT ST_GeomFromWKT(wktColumnName) AS checkin, otherColumn1, otherColumn2
        |FROM inputtable
    """.stripMargin)
spatialDf.createOrReplaceTempView("spatialDf")

3. Use the GeoSpark DataFrame-to-RDD adapter to create the generic SpatialRDD. Make sure the geometry column is the first column of spatialDf.

var spatialRDD = new SpatialRDD[Geometry]
spatialRDD.rawSpatialRDD = Adapter.toRdd(spatialDf)


Thanks,
Jia

------------------------------------

Jia Yu,

Ph.D. Student in Computer Science

