When I call the command read.avro('/path/to/my/file.avro') from R it works fine, so I believe that my avro file is fine.

Regards,
Renan
On Tuesday, September 16, 2014 6:39:55 PM UTC-3, Antonio Piccolboni wrote:

You can check this example: https://github.com/RevolutionAnalytics/rmr2/blob/15a0dbd7233087cfd7b020f90e317d65a207c872/pkg/examples/avro.R

Built-in support is expected for the next minor release, already in the dev branch, input-only.

Antonio
On Monday, September 15, 2014 11:39:52 AM UTC-7, sai krishna bala wrote:

I was looking into the mapreduce function provided by the rmr library and wanted to know if there is any support for reading data in avro serialized format. Right now, my data is avro serialized and I want to extract some aggregated metrics from the logs using the mapreduce functionality in RHadoop, with make.input.format = avro and schema = <path_of_schema>. Can this be done?
--
post: rha...@googlegroups.com ||
unsubscribe: rhadoop+u...@googlegroups.com ||
web: https://groups.google.com/d/forum/rhadoop?hl=en-US
---
You received this message because you are subscribed to the Google Groups "RHadoop" group.
To unsubscribe from this group and stop receiving emails from it, send an email to rhadoop+u...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
It'd be better if you shared your exact code. There is a way, but it is not your way; that's all I can tell. Luckily for you, I ran the unit tests for avro input in dev and they don't pass, so you don't need to take on the humble task of actually sharing your code in addition to your opinion. I will let you know what I find out.

That AVRO_LIBS setting is not documented very well, is it? That's got to improve before we release. If you have ravro installed, take a look at ravro::AVRO_TOOLS; setting AVRO_LIBS the same way should do.

Antonio
Alright, so the only problem with the failed tests was the setting of AVRO_LIBS. We need to document that clearly. For now, start with:

Sys.setenv(AVRO_LIBS = ravro::AVRO_TOOLS)

If that doesn't do it, please share a small example and I'll try to run it. Thanks
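For reference, a minimal sketch of how this setting might look at the top of a script (this assumes rmr2 and ravro are both installed; ravro::AVRO_TOOLS points at the avro-tools jar bundled with ravro):

```r
library(ravro)
library(rmr2)

# Point rmr2 at the same avro-tools jar that ravro uses; the avro
# input format looks this jar up through the AVRO_LIBS variable.
Sys.setenv(AVRO_LIBS = ravro::AVRO_TOOLS)

# Sanity check: this should print the path to an existing jar file
print(Sys.getenv("AVRO_LIBS"))
```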
Hi Piccolboni,
I wrote this piece of code based on the tests and the dev branch.
I saw the test that you mentioned and noticed that the schema is read from a temporary file, which is local.
When I run the command ravro::read.avro('/path/to/local/file.avro') it works fine, but if I run ravro::read.avro('hdfs://localhost:8020/path/to/hdfs/file.avro') it fails.
In fact I'm not using the dev branch, because I'm not sure if it is compatible with version 3.2.0 (packaged and available in the wiki).
Is it OK to change to the dev branch, or could I face compatibility issues?
On Wed, Sep 17, 2014 at 4:15 PM, Renan Pinzon <rpi...@gmail.com> wrote:

> Hi Piccolboni,
> I wrote this piece of code based on the tests and the dev branch.

You will forgive a degree of perplexity when I see the code of the library pasted back in a message. Libraries are used with library(), not by cut and paste.

> I saw the test that you mentioned and noticed that the schema is read from a temporary file, which is local.

And?

> When I run the command ravro::read.avro('/path/to/local/file.avro') it works fine, but if I run ravro::read.avro('hdfs://localhost:8020/path/to/hdfs/file.avro') it fails.

That's talking. I will try to repro. You just did an hdfs dfs -put local.file.avro hdfs.file.avro, correct?

> In fact I'm not using the dev branch, because I'm not sure if it is compatible with version 3.2.0 (packaged and available in the wiki).

You mean as in backward compatibility? It is supposed to be. Backward compatibility is determined by the highest version number (same first number). Data compatibility has been a little more shaky of late, and in general it's not recommended to use the "native" format for archival purposes.

> Is it OK to change to the dev branch, or could I face compatibility issues?

Sorry, if you build from git there's no guarantee whatsoever, independent of branch. I mean, legally there isn't any warranty for anything, but for releases we at least tested them. Between commits, you are living dangerously. Listen, I could have told you not-available-yet-sorry, end of conversation. We have a chance to develop this thing together, but you can't have it in production tomorrow.
On Wednesday, September 17, 2014 8:44:43 PM UTC-3, Antonio Piccolboni wrote:

> You will forgive a degree of perplexity when I see the code of the library pasted back in a message. Libraries are used with library(), not by cut and paste.

Yes, I'm aware of that, but I wrote this piece of code to create an avro input format because it's only available on the dev branch, which I'm not confident using, so I'm trying to do it in my own code, running the latest released version of the library.

> I saw the test that you mentioned and noticed that the schema is read from a temporary file, which is local.
> And?

I was wondering about reading the schema from the same file in HDFS that contains the entire data.

> That's talking. I will try to repro. You just did an hdfs dfs -put local.file.avro hdfs.file.avro, correct?

Yes, the file is already in HDFS. If I run the command hdfs dfs -ls hdfs://localhost:8020/path/to/hdfs I can list the contents of the directory and see the file there, so localhost:8020 is correct and accessible.
On Thu, Sep 18, 2014 at 7:41 AM, Renan Pinzon <rpi...@gmail.com> wrote:
> Yes, I'm aware of that, but I wrote this piece of code to create an avro input format because it's only available on the dev branch, which I'm not confident using, so I'm trying to do it in my own code, running the latest released version of the library.

So the way software engineers speak about it is that you tried to backport the avro format to the current stable version. You can't say "I wrote" for stuff that someone else wrote; that's called plagiarism. If you write that in an email to the author of the plagiarized code, there are other ways to describe it, none of which is flattering enough for me to use on this forum.
> I was wondering about reading the schema from the same file in HDFS that contains the entire data.

Well, the manual is pretty clear on this (make.input.format, "avro" entry):

"(input only) It has one mandatory additional argument, schema.file, that should provide the URL of a file containing an appropriate avro schema; it can be the same as the file to be read. The user can specify the protocol, for instance file: or hdfs:, as part of the URL, with the first being the default."

I am having problems with this feature though; let me get back to you.
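Going by the manual entry quoted above, usage might look like the following sketch (the paths and the mapper are hypothetical, and this assumes the dev-branch "avro" entry of make.input.format):

```r
library(rmr2)

# The schema can come from the data file itself, on HDFS, by passing
# an hdfs: URL as schema.file (per the manual entry quoted above)
avro.fmt = make.input.format(
  "avro",
  schema.file = "hdfs://localhost:8020/path/to/hdfs/file.avro")

# Hypothetical aggregation: count records seen by each map call
out = mapreduce(
  input = "/path/to/hdfs/file.avro",
  input.format = avro.fmt,
  map = function(k, v) keyval(1, nrow(v)))
```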
> Yes, the file is already in HDFS. If I run the command hdfs dfs -ls hdfs://localhost:8020/path/to/hdfs I can list the contents of the directory and see the file there, so localhost:8020 is correct and accessible.

Well, I need to be sure it's absolutely the same file. I will send a script that we can both try.
On Thursday, September 18, 2014 1:36:16 PM UTC-3, Antonio Piccolboni wrote:

> So the way software engineers speak about it is that you tried to backport the avro format to the current stable version. You can't say "I wrote" for stuff that someone else wrote; that's called plagiarism.

I understand what you say, but I only said "I wrote" because I was trying another solution, using avro-utils (https://github.com/tomslabs/avro-utils), which requires creating an input format similar to this one, and then I changed the code to this. In fact, the code is now a bit different because I removed the line that reads the schema from the local file and placed the schema there, but it didn't solve the problem yet.

> I am having problems with this feature though; let me get back to you.

OK, I'll be waiting, and I'll also package the git dev branch and try to use it.
On Thu, Sep 18, 2014 at 10:19 AM, Renan Pinzon <rpi...@gmail.com> wrote:

> OK, I'll be waiting, and I'll also package the git dev branch and try to use it.

It looks like we won't be able to support this feature (reading the schema from an hdfs: URL) and it'll have to be dropped. It was dropped from ravro because it was not supported in avro_tools, and there was a lack of communication to the rmr2 team. Sorry about that. Is it a show stopper or something that can be worked around?
avro.input.format = function(schema.file, ..., read.size = 10^5) {
  # Read the avro schema once, from a local file, when the format is created
  schema = ravro:::avro_get_schema(file = schema.file)
  function(con) {
    # Read up to read.size JSON-encoded records from the connection
    lines = readLines(con = con, n = read.size)
    if (length(lines) == 0)
      NULL
    else {
      # splat and paste.fromJSON are helpers from the rmr2 code this was
      # adapted from; decode the JSON records against the avro schema
      x = splat(paste.fromJSON)(lines)
      y = ravro:::parse_avro(x, schema, encoded_unions = FALSE, ...)
      keyval(NULL, y)
    }
  }
}
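If I read the intent correctly, this format function would be wired into a job along these lines (a sketch only: "text" mode is assumed because the reader above consumes lines with readLines(), and the paths and mapper are made up):

```r
library(rmr2)

# Wrap the custom reader in an input format spec; text mode, since
# avro.input.format() reads JSON-encoded lines from the connection
fmt = make.input.format(
  format = avro.input.format(schema.file = "/path/to/local/schema.avsc"),
  mode = "text")

# Hypothetical job using the custom format
result = mapreduce(
  input = "/path/to/hdfs/file.avro",
  input.format = fmt,
  map = function(k, v) keyval(1, 1))
```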
Yes, I can do it, but first of all I need to do some refactoring to correct things that I simplified in my test code.
Regarding avro, I had to patch 1.7.4 because that's the version running on my cluster due to CDH 4.7. I believe that won't be a problem, since I can backport the patch that you mentioned.