pbd22 (dush...@gmail.com) writes:
> On Jan 31, 9:57 am, pbd22 <dush...@gmail.com> wrote:
>> I am working in SQL Server 2005 and want to break up a table of 1M
>> rows into distinct results of 80,0000.
>> I feel like this is basically
>> use [database]
>> select * from [table]
>> where email_address like '%gmail.com'
>> group by ????
>> I am messing up with the group by (if this is the right way to go).
>> How do achieve this?
> Anybody? To people use this group any more?
I am here! However, I only look in here about once a day. The traffic
does not really warrant anymore. If you want speedy answers to your
questioms, this is a busy place:
http://social.msdn.microsoft.com/Forums/en-US/transactsql/threads Although, a bit too many people answers questions that goes outside
what they really know.
Anyway, your problem is a little vague. What is the purpose with
this division? And must the batches be exactly 80000 rows in size
or can the number vary?
-- Erland Sommarskog, SQL Server MVP, esq...@sommarskog.se
> pbd22 (dush...@gmail.com) writes:
> > On Jan 31, 9:57 am, pbd22 <dush...@gmail.com> wrote:
> >> I am working in SQL Server 2005 and want to break up a table of 1M
> >> rows into distinct results of 80,0000.
> >> I feel like this is basically
> >> use [database]
> >> select * from [table]
> >> where email_address like '%gmail.com'
> >> group by ????
> >> I am messing up with the group by (if this is the right way to go).
> >> How do achieve this?
> > Anybody? To people use this group any more?
> I am here! However, I only look in here about once a day. The traffic
> does not really warrant anymore. If you want speedy answers to your
> questioms, this is a busy place:http://social.msdn.microsoft.com/Forums/en-US/transactsql/threads > Although, a bit too many people answers questions that goes outside
> what they really know.
> Anyway, your problem is a little vague. What is the purpose with
> this division? And must the batches be exactly 80000 rows in size
> or can the number vary?
> --
> Erland Sommarskog, SQL Server MVP, esq...@sommarskog.se
Thanks both a bundle for your responses. Looks like there is life on
planet
Google Groups after all, much appreciated!
The specifics is that we are doing email deployments but google is
moving
all of the email sent to gmail users to their spam boxes. As a result,
we have
to "chunk" the gmail users out of the total amount and send in
manageable
batches. We have figured that 80,000 per batch out of the total gmail
users
in the table is possible.
And, Erland, to answer your question, I would prefer to turn this into
a stored
procedure we can use for the purpose of "chunking" email addresses.
Accordingly,
the number per batch should be a variable parameter.
Bob, the table we are querying against is pretty simple. Essentially,
it has one
one column - "email_address" which is a varchar. Its data is about 1
million email
addresses (but that number changes often). The result set table(s)
should only have
two columns, the count (INT) and the email_address (varchar). Please
see below.
The query I am trying to write is supposed to dump each result set
(batch)
to a text file in some folder on the hard drive. Each result set
should have
a count and the email addresses as columns. Something like this:
And so on up to the to total amount of the gmail addresses out of the
original table.
So, the statement should read something like this (pseudo code):
select all distinct users
from the master table
where email_address like '%gmail.com'
return in batches of N (such as 80,000)
and write each batch to a text file on the
hard drive.
I hope I have explained myself well. Let me know if anything is
unclear.
pbd22 wrote:
> On Jan 31, 5:03 pm, Erland Sommarskog <esq...@sommarskog.se> wrote:
>> pbd22 (dush...@gmail.com) writes:
>>> On Jan 31, 9:57 am, pbd22 <dush...@gmail.com> wrote:
>>>> I am working in SQL Server 2005 and want to break up a table of 1M
>>>> rows into distinct results of 80,0000.
> Thanks both a bundle for your responses. Looks like there is life on
> planet
> Google Groups after all, much appreciated!
> The specifics is that we are doing email deployments but google is
> moving
> all of the email sent to gmail users to their spam boxes. As a result,
> we have
> to "chunk" the gmail users out of the total amount and send in
> manageable
> batches. We have figured that 80,000 per batch out of the total gmail
> users
> in the table is possible.
> And, Erland, to answer your question, I would prefer to turn this into
> a stored
> procedure we can use for the purpose of "chunking" email addresses.
> Accordingly,
> the number per batch should be a variable parameter.
> Bob, the table we are querying against is pretty simple. Essentially,
> it has one
> one column - "email_address" which is a varchar. Its data is about 1
> million email
> addresses (but that number changes often). The result set table(s)
> should only have
> two columns, the count (INT) and the email_address (varchar). Please
> see below.
> The query I am trying to write is supposed to dump each result set
> (batch)
> to a text file in some folder on the hard drive. Each result set
> should have
> a count and the email addresses as columns. Something like this:
> RESULT 1:
> [COUNT] [EMAIL_ADDRESS]
> 1 n...@gmail.com
> 2 n...@gmail.com
<snip>
> And so on up to the to total amount of the gmail addresses out of the
> original table.
> So, the statement should read something like this (pseudo code):
> select all distinct users
You keep making a point of saying "distinct". Does that imply that there are
duplicate email addresses in that 1-million row table?
> from the master table
> where email_address like '%gmail.com'
> return in batches of N (such as 80,000)
> and write each batch to a text file on the
> hard drive.
I assume you can handle this part, correct? It's only batching the data that
you need help with?
> I hope I have explained myself well. Let me know if anything is
> unclear.
> Thanks a bundle for your help.
SQL 2008 has some paging functionality builtin but you're using SQL 2005.
I can think of a couple approaches. Here's one:
1. Create a temp table (#batches) with an identity column (indentcol) and an
email column (email). Insert the distinct email addresses into it:
insert #batches (email)
select distinct email_address from master_table where email_address like
'%gmail.com'
Then use a WHILE loop to retrieve the batches, using a variable to keep
track of them.
declare @batchsize int --convert this to a parameter for your sproc
set @batchsize=80000
declare @lastrec int, @endrec int
set @lastrec=(select max(identcol) from #batches)
set @endrec=@batchsize
WHILE @endrec-@batchsize<=@lastrec
BEGIN
select email from #batches where identcol >=@lastrec
--process the batch
delete #batches where identcol >=@lastrec
set @endrec=@endrec + @batchsize
END
pbd22 (dush...@gmail.com) writes:
> The specifics is that we are doing email deployments but google is > moving all of the email sent to gmail users to their spam boxes. As a
> result, we have to "chunk" the gmail users out of the total amount and
> send in manageable batches. We have figured that 80,000 per batch out of
> the total gmail users in the table is possible. >... > Bob, the table we are querying against is pretty simple. Essentially, > it has one one column - "email_address" which is a varchar. Its data is
> about 1 million email addresses (but that number changes often). The
> result set table(s) should only have two columns, the count (INT) and
> the email_address (varchar). Please see below.
The simple-minded solution would be:
WITH numbered AS (
SELECT email, row_number() OVER(ORDER BY email) AS rowno
FROM addresses
WHERE email LIKE '...@gmail.com'
)
SELECT email FROM numbered
WHERE rowno > (@batchno - 1) * @batchsize AND rowno <= (@batchno - 1) * @batchsize
But it would be far more efficient to do:
CREATE TABLE gmail_addresses (rowno int NOT NULL,
gmail nvarchar(255) NOT NULL,
CONSTRAINT pk_gmail PRIMARY KEY CLUSTERED (rowno),
CONSTRAINT pk_unique UNIQUE NONCLUSTERED(gmail))
INSERT gmail_addresses(rowno, gmail)
SELECT row_number() OVER(ORDER BY (SELECT 1)), gmail
FROM (SELECT DISTINCT email
FROM addresses
WHERE email like '...@gmail.com') AS x
Then you have materialised the row number once for all, and a selection of 80000 accounts will be quick.
-- Erland Sommarskog, SQL Server MVP, esq...@sommarskog.se