TimeStamp Column in Data and general data ingestion


karteek chada

unread,
Jul 7, 2015, 3:33:18 PM
to druid...@googlegroups.com
Hi Druid Developers,

I am a newbie to Druid, and as a first step I am trying to load data into it. Pardon me if my question is too naive.

I am looking at the data ingestion doc at http://druid.io/docs/latest/ingestion/.

From what I can understand, having a timestamp column in your data file is mandatory. Is this true? All I have is a date column at the beginning of the file. If a timestamp column is not required, then what should I put in the spec file below?
"timestampSpec" : {
        "column" : "timestamp",
        "format" : "auto"
}
Below is the first line of data. Can you please let me know if this format is acceptable? The values in red are dimension values and the rest are metric values. For now I want to load the metric values into Druid as-is, without any aggregation. Should I load them as dimension values as well, or should I still load them through the metricsSpec? Since I am loading all dimensions into Druid, it won't aggregate them anyway.


3/7/2015^20^1366318122374311707 ^2216341272195416704 ^0^N^1514^Y^USD^Unbranded^87.9900^1^1^1.8300^0.0000^0.0000^0.0000^0.0000^0.0000^1.8300^0^0.483^0^0^0^87.99

Regards
Karteek

ol...@adsquare.com

Jul 8, 2015, 12:58:04 PM
to druid...@googlegroups.com
Hey,

I am not sure whether Druid's "auto" covers your specific timestamp format.

I assume you have day and month without leading zeros, so the format is probably "d/M/yyyy".


 DateTimeFormatter fmt = DateTimeFormat.forPattern("d/M/yyyy"); // org.joda.time.format
 System.out.println(fmt.print(System.currentTimeMillis()));     // prints today's date in that format
 System.out.println(fmt.parseDateTime("3/7/2015"));             // parses your sample date
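In the ingestion spec that would look something like this (just a sketch; "date" stands for whatever your first column is called, and "d/M/yyyy" assumes the day comes first in "3/7/2015"):

```json
"timestampSpec" : {
    "column" : "date",
    "format" : "d/M/yyyy"
}
```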

See also the Timestamp Spec section of the ingestion docs.


Cheers.

ol...@adsquare.com

Jul 8, 2015, 1:01:15 PM
to druid...@googlegroups.com
Ah, and your other questions. Sure, it looks acceptable to me. The delimiter is "^", which you have to specify for your CSV format. You then also specify all dimensions in the order they appear in your CSV line, and finally the columns you want to include or exclude.
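Putting the pieces together, a minimal parseSpec could look like this (a sketch only; the column names other than the delimiter are placeholders, not your real schema):

```json
"parseSpec" : {
    "format" : "tsv",
    "delimiter" : "^",
    "timestampSpec" : { "column" : "date", "format" : "d/M/yyyy" },
    "columns" : ["date", "time_of_day", "rcvr_id", "amount"],
    "dimensionsSpec" : {
        "dimensions" : ["time_of_day", "rcvr_id"],
        "dimensionExclusions" : []
    }
}
```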

karteek chada

Jul 8, 2015, 1:14:17 PM
to druid...@googlegroups.com
Thanks a lot, Olaf. Will try and keep you posted.

karteek


karteek chada

Jul 9, 2015, 12:37:02 AM
to druid...@googlegroups.com


OK, I tried loading the data (batch ingestion). As a first step I tried to load just one line, and I get the following error. Can anyone please help? I am also attaching the complete log file we captured after running the indexing command.

SEVERE: The RuntimeException could not be mapped to a response, re-throwing to the HTTP container
org.skife.jdbi.v2.exceptions.CallbackFailedException: org.skife.jdbi.v2.exceptions.UnableToExecuteStatementException: com.mysql.jdbc.exceptions.jdbc4.MySQLSyntaxErrorException: Table 'druid.druid_segments' doesn't exist [statement:"SELECT id FROM druid_segments WHERE id = :identifier", located:"SELECT id FROM druid_segments WHERE id = :identifier", rewritten:"SELECT id FROM druid_segments WHERE id = ?", arguments:{ positional:{}, named:{identifier:'QBRData_2015-07-08T00:00:00.000Z_2015-07-09T00:00:00.000Z_2015-07-08T23:32:08.503Z'}, finder:[]}]


Data:
3/7/2015^20^1366318122374311707 ^2216341272195416704 ^0^N^1514^Y^USD^Unbranded^87.9900^1^1^1.8300^0.0000^0.0000^0.0000^0.0000^0.0000^1.8300^0^0.483^0^0^0^87.99

Indexing :
{
    "type": "index",
    "spec": {
        "dataSchema": {
            "dataSource": "QBRData",
            "parser": {
                "type": "string",
                "parseSpec": {
                    "format": "tsv",
                    "delimiter": "^",
                    "timestampSpec": {
                        "column": "transaction_reference_date",
                        "format": "M/d/yyyy"
                    },
                    "dimensionsSpec": {
                        "dimensions": [
                            "time_of_day",
                            "rcvr_id",
                            "sndr_id",
                            "pkey",
                            "is_pmt_xbdr_y_n",
                            "pmt_flow_key",
                            "is_pmt_official_y_n",
                            "transaction_currency",
                            "Transaction_Type"
                        ],
                        "dimensionExclusions": [],
                        "spatialDimensions": []
                    },
                    "columns": [
                        "transaction_reference_date",
                        "time_of_day",
                        "rcvr_id",
                        "sndr_id",
                        "pkey",
                        "is_pmt_xbdr_y_n",
                        "pmt_flow_key",
                        "is_pmt_official_y_n",
                        "transaction_currency",
                        "Transaction_Type"
                    ]
                }
            },
            "metricsSpec": [
                {
                    "type": "doubleSum",
                    "name": "NTPV_USD_AMT",
                    "fieldName": "NTPV_USD_AMT"
                },
                {
                    "type": "doubleSum",
                    "name": "NET_CNT",
                    "fieldName": "NET_CNT"
                },
                {
                    "type": "doubleSum",
                    "name": "SUCCESS_CNT",
                    "fieldName": "SUCCESS_CNT"
                },
                {
                    "type": "doubleSum",
                    "name": "BA_Revenue",
                    "fieldName": "BA_Revenue"
                },
                {
                    "type": "doubleSum",
                    "name": "Seller_fx_Revenue",
                    "fieldName": "Seller_fx_Revenue"
                },
                {
                    "type": "doubleSum",
                    "name": "Buyer_fx_Revenue",
                    "fieldName": "Buyer_fx_Revenue"
                },
                {
                    "type": "doubleSum",
                    "name": "XBDR_Revenue",
                    "fieldName": "XBDR_Revenue"
                },
                {
                    "type": "doubleSum",
                    "name": "Revenue_share",
                    "fieldName": "Revenue_share"
                },
                {
                    "type": "doubleSum",
                    "name": "Pass_Through_Revenue",
                    "fieldName": "Pass_Through_Revenue"
                },
                {
                    "type": "doubleSum",
                    "name": "Total_Sales_Revenue",
                    "fieldName": "Total_Sales_Revenue"
                },
                {
                    "type": "doubleSum",
                    "name": "ACH_Cost",
                    "fieldName": "ACH_Cost"
                },
                {
                    "type": "doubleSum",
                    "name": "CC_Cost",
                    "fieldName": "CC_Cost"
                },
                {
                    "type": "doubleSum",
                    "name": "PP_Loss",
                    "fieldName": "PP_Loss"
                },
                {
                    "type": "doubleSum",
                    "name": "Seller_Loss",
                    "fieldName": "Seller_Loss"
                },
                {
                    "type": "doubleSum",
                    "name": "Buyer_Loss",
                    "fieldName": "Buyer_Loss"
                },
                {
                    "type": "doubleSum",
                    "name": "LOC_CURR_NTPV",
                    "fieldName": "LOC_CURR_NTPV"
                }
            ],
            "granularitySpec": {
                "intervals": [
                    "2015-01-01/2015-12-31"
                ]
            }
        },
        "ioConfig": {
            "type": "index",
            "firehose": {
                "type": "local",
                "baseDir": "/x/home/tsoliman/druid/data/batch/",
                "filter": "SQLAExport.txt"
            }
        }
    }
}


druid_segments_not_found.txt

ol...@adsquare.com

Jul 9, 2015, 5:24:30 AM
to druid...@googlegroups.com
Hm, so Druid can connect to MySQL, but the table "druid_segments" has not been created. Can you check whether the database user you connect with has permission to create tables? Did you check the log files?

Also, why do you use TSV and not CSV? The metricsSpec also looks a bit weird: you reference columns that you did not define in the "columns" list.
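To make that last point concrete (a sketch, abbreviated from your spec): every "fieldName" used in metricsSpec also has to appear in the parser's "columns" list, in the position where that field occurs in the data line, e.g.:

```json
"columns" : [
    "transaction_reference_date",
    "time_of_day",
    "rcvr_id",
    "sndr_id",
    "NTPV_USD_AMT",
    "NET_CNT"
]
```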

karteek chada

Jul 9, 2015, 12:50:00 PM
to druid...@googlegroups.com

Olaf,

I will check on the CREATE TABLE permission in MySQL.

Regarding the data file itself:

  1. I used TSV because for CSV there is no option in the parseSpec to set a delimiter; all I can use is listDelimiter, which is for multi-value dimensions. I got a bit confused there and just went ahead with TSV.
  2. Regarding your concerns with the metricsSpec, should I list all those columns in "columns" : [] as well? Please let me know.
Karteek

karteek chada

Jul 10, 2015, 12:57:00 AM
to druid...@googlegroups.com
Olaf,

Thanks a lot for all the help. We were able to load and index the data, and retrieved it through a query today.

Karteek

anusha shetty

Jun 7, 2019, 3:17:02 AM
to Druid User
Hi Karteek,

Sorry to bump this very old message. What was the resolution for your issue?

What "format" did you use in your timestampSpec?

Thanks,
Anoosha