mrjob
1–30 of 589
Discussion list for Yelp's MRJob library. See https://github.com/Yelp/mrjob for details on MRJob.
Coyote Codornices Marin
9/17/20
mrjob v0.7.4 is out!
mrjob v0.7.4 is out! mrjob now supports Docker on EMR. Using this feature can be as simple as
Florin Andrei
6/23/20
Is there any way to connect to a remote Hadoop cluster?
Can't find any trace of this in the docs. Can I run a Python script with mrjob on my laptop, and
Dave Marin
6/5/20
v0.7.3 is out!
mrjob v0.7.3 is out! This release makes cluster pooling use API calls much more efficiently, reducing
Alexandra Faynburd, Riaz Jahangir
2
5/11/20
Failed to run the first job on dataproc
"Error: java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code
Noah, Alexandra Faynburd
2
5/11/20
Task works locally, but fails on EMR
Hi, did you find the solution? I think I have the same problem with Google Dataproc. Thanks! On
Dave Marin
4/13/20
mrjob v0.7.2 released!
mrjob v0.7.2 is out! If you are using Spark, mrjob now emulates archives (a YARN-only feature of
Ariel Camperi, Riaz Jahangir
2
3/27/20
Logging to stderr with inline runner
I hit the same underlying issue with trying to write strings to sys.stderr, though in our case it was
Ariel Camperi, Dave Marin
3
3/9/20
Configuring EMR managed security groups
Ah perfect, that looks like exactly what I need :) Thanks! On Mon, Mar 9, 2020 at 5:09 PM Dave Marin
manpreet singh
1/30/20
mrjob v0.7.1 is out
mrjob v0.7.1 has been released. * v0.7.1 fixes the bug to set default value of VisibleToAllUsers to
Julius Raškevičius, Dave Marin
4
12/4/19
Primary sort not happening in MacOS Catalina and Ubuntu 18.04.3
You know, you may be right, local and inline mode might be over-zealous about not sorting output. I
Dave Marin
11/22/19
mrjob v0.7.0 is out!
mrjob v0.7.0 has just been released! Unlike the transition from v0.5.0 to v0.6.0, v0.7.0 isn't
Roving Richard, Damjan Krstajic
3
10/29/19
Where are stderr messages logged if not running on EMR?
I gave up and switched to PySpark and I am so glad I did. On Tuesday, October 29, 2019 at 7:22:21 PM
Dave Marin
10/23/19
v0.6.12 is out!
This is a quick bugfix release, and probably the last one in the v0.6.x series. v0.6.12 fixes a bug
Dave Marin
10/9/19
mrjob v0.6.11 released!
mrjob v0.6.11 is out! The major change in this version is that if a Spark job fails, mrjob can now
Xuchen Yao
10/4/19
Make EMR aware of cluster status via Hadoop
I log into Master node to submit job like this: python my_mrjob.py -r hadoop ... Job runs OK, I can
Xuchen Yao, …, Ash
7
9/26/19
detach terminal after submitting a long running job on EMR
If it's an MR job that you submitted to AWS's EMR cluster (--runner=emr) from your laptop's
Dave Marin
7/22/19
mrjob v0.6.10 is out!
mrjob v0.6.10 is released! Some important changes: - PyPy-aware. If you launch a MRJob in PyPy,
Dave Marin
5/29/19
mrjob v0.6.9 released!
This is mostly a bugfix release. Fixes a bug introduced in v0.6.8 that could break uploading archives
Agustin Caminero
5/21/19
Sort_values
Dear all, I have a question about SORT_VALUES. I have two datasets, one with countries and one with
Dave Marin
4/26/19
mrjob v0.6.8 is out, big news for Spark users
mrjob v0.6.8 provides full support for Spark. You can now launch Spark code with any runner (except
Dave Marin
1/16/19
mrjob v0.6.7 released!
mrjob v0.6.7 is out! This release adds the `mrjob spark-submit` subcommand, which works just like
Snehal Lokesh
11/19/18
passing file in arguments in mrjob
How to pass an input file in args to make_runner in mrjob instead of command-line options
Dave Marin
11/5/18
mrjob v0.6.6 released!
mrjob v0.6.6 is released. This is mostly a series of small changes that make your life easier: - you
Dave Marin
9/7/18
mrjob v0.6.5 released!
This release has a number of small changes, including: - more robust idle cluster self-termination (
Dave Marin
8/11/18
mrjob v0.6.4 is out!
mrjob v0.6.4 is up on PyPI. This release adds the DIRS attribute to MRJobs, which allows you to
Dave Marin
7/27/18
mrjob v0.5.12 is out!
This release is mostly a backport of some features in v0.6.x: * dropped support for Python 2.6 and
Wan Hu
6/20/18
Parquet Input Protocol
Hi! Is there an input protocol for parquet? Tried org.apache.parquet.hadoop.ParquetInputFormat and a
Dave Marin
5/31/18
mrjob v0.6.3 is released!
mrjob can now read arbitrary file formats (e.g. image files) using mapper_raw(). mrjob also now has
Vincent Chan, Dave Marin
3
4/10/18
multiple or shared SparkContext for each SparkStep?
Thanks for your feedback Dave. My problem is that I needed to wait until the results of step 1
Dane
4/9/18
Running a different mapper on different input files
Good day, I have an assignment wherein I have to do matrix multiplication. I am given two files in the