Significance of $CONDITIONS in free form query

5,030 views
Skip to first unread message

shrijeet

unread,
Dec 1, 2010, 3:20:49 PM12/1/10
to Sqoop Users
Hi Sqoop Users,
I am wondering what is the significance of $CONDITIONS in free form
query
I am referring to this : http://archive.cloudera.com/cdh/3/sqoop/SqoopUserGuide.html#_free_form_query_imports

1. Is it that with every query I have to blindly mention $CONDITION
Example: " <actual query> where $CONDITIONS <actual query> "

2. What if I need to put my one where condition in a free form query?
Would it be something like this: "<actual query> where $CONDITIONS
and this=that;"

-Shrijeet

Shrijeet Paliwal

unread,
Dec 1, 2010, 4:12:46 PM12/1/10
to Sqoop Users
I think I sort of figured it out, 
1. If you run import in verbose mode it says
"Executing SQL statement: select name from user where  (1 = 0)  and name like "shrijeet%""  , this however is not the actual statement which gets result. 
This is executed to collect meta data. 

2. At a later stage $CONDITION is replaced with (1 = 1) which makes 'where $CONDITIONS' a no op. One can append his own condition after 'where $CONDITIONS' and it should work. 
 
If some one can confirm the observation, it will be great. 
--
Shrijeet

Aaron Kimball

unread,
Dec 1, 2010, 4:27:52 PM12/1/10
to sqoop...@cloudera.org
If you run a parallel import, the map tasks will execute your query with different values substituted in for $CONDITIONS. e.g., one mapper may execute "select bla from foo WHERE (id >=0 AND id < 10000)", and the next mapper may execute "select bla from foo WHERE (id >= 10000 AND id < 20000)" and so on.

Sqoop does not parse your SQL statement into an abstract syntax tree which would allow it to modify your query without textual hints. You are free to add further constraints like you suggested in your initial example, but the literal string "$CONDITIONS" does need to appear in the WHERE clause of your query so that Sqoop can textually replace it with its own refined constraints. 

- Aaron

shrijeet

unread,
Dec 1, 2010, 7:39:51 PM12/1/10
to Sqoop Users
Thanks Aaron.

Follow up question,

"non parallel import" : Can I make an import non parallel by setting
m=1 ?
I have tried and it works and sqoop does not insist on split-by option
if m=1, but same is not true for $CONDITIONS.

Also unrelated, Is there some other way to perform a non parallel
import (other than m=1) ?

On Dec 1, 1:27 pm, Aaron Kimball <akimbal...@gmail.com> wrote:
> If you run a parallel import, the map tasks will execute your query with
> different values substituted in for $CONDITIONS. e.g., one mapper may
> execute "select bla from foo WHERE (id >=0 AND id < 10000)", and the next
> mapper may execute "select bla from foo WHERE (id >= 10000 AND id < 20000)"
> and so on.
>
> Sqoop does not parse your SQL statement into an abstract syntax tree which
> would allow it to modify your query without textual hints. You are free to
> add further constraints like you suggested in your initial example, but the
> literal string "$CONDITIONS" does need to appear in the WHERE clause of your
> query so that Sqoop can textually replace it with its own refined
> constraints.
>
> - Aaron
>
> On Wed, Dec 1, 2010 at 1:12 PM, Shrijeet Paliwal <shrijeet.pali...@gmail.com
>
>
>
>
>
>
>
> > wrote:
> > I think I sort of figured it out,
> > 1. If you run import in verbose mode it says
> > "Executing SQL statement: select name from user where  (1 = 0)  and name
> > like "shrijeet%""  , this however is not the actual statement which gets
> > result.
> > This is executed to collect meta data.
>
> > 2. At a later stage $CONDITION is replaced with (1 = 1) which makes 'where
> > $CONDITIONS' a no op. One can append his own condition after 'where
> > $CONDITIONS' and it should work.
>
> > If some one can confirm the observation, it will be great.
> > --
> > Shrijeet
>
> > On Wed, Dec 1, 2010 at 12:20 PM, shrijeet <shrijeet.pali...@gmail.com>wrote:
>
> >> Hi Sqoop Users,
> >> I am wondering what is the significance of $CONDITIONS in free form
> >> query
> >> I am referring to this :
> >>http://archive.cloudera.com/cdh/3/sqoop/SqoopUserGuide.html#_free_for...

Aaron Kimball

unread,
Dec 1, 2010, 7:49:43 PM12/1/10
to sqoop...@cloudera.org
Setting -m 1 is the only way to force a non-parallel import.

You still need $CONDITIONS in there because it queries the database
about column type information, etc in the client before executing the
import job, but does not want actual rows returned to the client. So
it will execute your query with $CONDITIONS set to '1 = 0' to ensure
that it receives type information, but not records.

- Aaron

shrijeet

unread,
Dec 1, 2010, 8:11:45 PM12/1/10
to Sqoop Users
Understood. Thanks again.
Reply all
Reply to author
Forward
0 new messages