Re: duplicates


Thomas Fischer

Nov 20, 2012, 6:36:00 AM
to textwr...@googlegroups.com
Hi Mr. J,

you can use the Text -> Process Duplicate Lines… menu entry, using the grep option.

Or take a look at the script "Kill Dups Using Grep.scpt" inside TextWrangler's Script Folder (~/Library/Application Support/TextWrangler/Scripts/, the one behind the menu with the scroll icon), which fills in those fields for you. You should be able to adjust this to your needs.

Best
Thomas


On 20.11.2012 at 01:14, Mr. J wrote:

sorry guys,
i got the format wrong
the email is always on the first row but the rows after are different..
but i want to match just the dupe emails in first row (ignoring the rest of the row) and delete the dupes..

thanks again!

--
You received this message because you are subscribed to the
"TextWrangler Talk" discussion group on Google Groups.
To post to this group, send email to textwr...@googlegroups.com
To unsubscribe from this group, send email to
textwrangler...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/textwrangler?hl=en
If you have a feature request or would like to report a problem,
please email "sup...@barebones.com" rather than posting to the group.

Joachim Soussan

Nov 20, 2012, 1:45:12 PM
to textwr...@googlegroups.com
would this work?
Process duplicate lines…
with the following options:
– Matching All
– Send dupes to new document
– Match using pattern
– Searching pattern: ",? *([\w ]+)$" (without the quotes)
– Match using:
  Specific sub-patterns: \1

Thomas Fischer

Nov 21, 2012, 3:06:26 PM
to textwr...@googlegroups.com
Hello,

can you include an example of the data you're dealing with to see what's going on?

Your example searches for ",? *([\w ]+)$", which doesn't relate to email addresses and anchors the search expression to the end of the line. Try something like
([\w._-]+@\S*\.\w+)
or look at http://www.regular-expressions.info/email.html for search expressions for email addresses.
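
To see why the first pattern cannot find duplicate addresses, here is a minimal Python sketch comparing what each pattern captures. The sample lines and the helper `key` are made up for illustration, and Python's `re` module stands in for TextWrangler's PCRE engine, so treat it as an approximation rather than what the application does internally.

```python
import re

# Hypothetical sample lines in the same shape as the thread's data
# (the addresses are made up for illustration).
lines = [
    "alice@example.com,business,USA,last name",
    "alice@example.com",
]

last_field = re.compile(r",? *([\w ]+)$")      # pattern from the earlier post
email = re.compile(r"([\w._-]+@\S*\.\w+)")     # pattern suggested above


def key(pattern, line):
    """Return the first captured group, or None when the pattern does not match."""
    m = pattern.search(line)
    return m.group(1) if m else None


last_field_keys = [key(last_field, line) for line in lines]
email_keys = [key(email, line) for line in lines]

print(last_field_keys)  # the trailing field of each line
print(email_keys)       # the address at the start of each line
```

With the end-anchored pattern the two lines get different keys, so they are never treated as duplicates; with the email pattern both lines share one key. One caveat: `\S*` also accepts commas, so a later field that contains a dot and no preceding space would be pulled into the match.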

Best
Thomas

On 20.11.2012 at 18:47, Joachim Soussan wrote:

coded that out.. but doesnt look like its working .. the goal is that i want to delete all lines that has a duplicate email.. whether the line has an email by itself on that line.. or whether the line has an email with other data on that line... both those lines should be deleted
--
Thanks,
Joachim Soussan

Mr. J

Nov 21, 2012, 3:22:28 PM
to textwr...@googlegroups.com
when i use that grep you just gave me it deletes the entire document..
my document looks like this:

tomo...@aol.com,American,name,,,,
he...@aol.com,address,store,product,city,state
dog...@aol.com,USAL,store,last name,address,first name,business
yest...@aol.com,business,USA,last name, product,,
yest...@aol.com,last name,first name,,,,
gr...@aol.com,product,busness,,,,
hell...@aol.com
yest...@aol.com
textwr...@aol.com
m.n...@aol.com
he...@aol.com
tw...@aol.com

so the goal is to search for duplicate emails within the whole document and then delete those emails and their entire lines

Patrick Woolsey

Nov 21, 2012, 6:14:29 PM
to textwr...@googlegroups.com
At 12:22 -0800 11/21/2012, Mr. J wrote:
>when i use that grep you just gave me it deletes the entire document..
>my document looks like this:
>
>[ example data elided ]
>
>so the goal is to search for duplicate emails within the whole document
>and then delete those emails and their entire lines


If all the data follow the same patterns as your above example, i.e. each
line consists of either:

a) an email address, or

b) multiple comma delimited fields, the first of which is an email address

you needn't worry about matching addresses in detail, but can instead
"cheat" and just deal with them positionally. :-)

So, please give this a try:

Apply Text -> Delete Duplicate Lines set to `Leaving One` with the
"Duplicates to new document" and "Delete duplicate lines" options set, and
"Match using pattern" enabled with:

Searching pattern: ^(.+?@.+?)(|,.*)$

Match Using:
Specific sub-patterns: \1

I think the search pattern will make sense on its own, but if not, please
say so and I'll be happy to explain in more detail.
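
For readers who want to check the pattern outside TextWrangler, the following is a rough Python sketch of the same idea: use sub-pattern \1 as the key, keep the first line for each key, and collect the rest separately. The function name `split_duplicates` and the sample addresses are hypothetical, and the handling of non-matching lines is an assumption; the thread does not show how TextWrangler itself implements the command.

```python
import re

# Pattern from the post: group 1 is the text up to the first comma after
# the "@" (or the whole line when there is no such comma).
PATTERN = re.compile(r"^(.+?@.+?)(|,.*)$")


def split_duplicates(lines):
    """Keep the first line seen for each key; return (kept, duplicates)."""
    seen = set()
    kept, duplicates = [], []
    for line in lines:
        m = PATTERN.match(line)
        if m is None:
            # Assumption: lines the pattern does not match are left alone.
            kept.append(line)
            continue
        key = m.group(1)
        if key in seen:
            duplicates.append(line)
        else:
            seen.add(key)
            kept.append(line)
    return kept, duplicates


# Made-up data in the same shape as the example document.
sample = [
    "alice@example.com,business,USA,last name",
    "bob@example.com,address,store",
    "alice@example.com,last name,first name",
    "carol@example.com",
    "alice@example.com",
    "bob@example.com",
]
kept, duplicates = split_duplicates(sample)
print(kept)
print(duplicates)
```

Here `kept` plays the role of the cleaned document and `duplicates` the role of the new document that receives the removed lines.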


Also, as a general observation:

Though it's not absolutely necessary to bind search patterns for use with
TextWrangler's line processing commands to the line start and end (i.e. if
your pattern naturally limits itself to a single line), I recommend doing
so to avoid unexpected outcomes.
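
A small Python sketch of the kind of surprise anchoring prevents. The unanchored pattern is the one Thomas posted earlier in this thread; the anchored variant and both sample lines are hypothetical, written only for this illustration, and Python's `re` stands in for TextWrangler's engine.

```python
import re

# Thomas's pattern from this thread, used unanchored.
unanchored = re.compile(r"([\w._-]+@\S*\.\w+)")
# Hypothetical anchored variant (not from the thread): the address must be
# the first comma-delimited field and the pattern must span the whole line.
anchored = re.compile(r"^([\w._-]+@[^,\s]+)(?:,.*)?$")

# Made-up line whose first field is NOT an email address.
stray = "store,contact bob@example.com,city"
# Made-up line in the shape the thread's data actually uses.
normal = "bob@example.com,city"

stray_unanchored = unanchored.search(stray)
stray_anchored = anchored.search(stray)
normal_anchored = anchored.search(normal)

print(stray_unanchored.group(1))  # an address found in the middle of the line
print(stray_anchored)             # no match: the line does not start with an address
print(normal_anchored.group(1))   # the leading address
```

The unanchored pattern happily keys the stray line on an address buried in a later field, while the anchored one only accepts lines that begin with an address.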


Regards,

Patrick Woolsey
==
Bare Bones Software, Inc. <http://www.barebones.com/>

Mr. J

Nov 21, 2012, 7:53:34 PM
to textwr...@googlegroups.com
Thanks patrick that worked great!
I wanted to ask, other than the user manual.. does text wrangler have any youtube tutorial videos so i can learn the greps much better?

Thomas Fischer

Nov 22, 2012, 12:23:25 PM
to textwr...@googlegroups.com
Hello,

On 21.11.2012 at 20:22, Mr. J wrote:

when i use that grep you just gave me it deletes the entire document..

there must be an error in the setting.
With
Process duplicate lines…
with the following options:
– Leaving One
– Duplicates to new document
– Delete duplicate lines
– Match using pattern
– Searching pattern: "([\w._-]+@\S*\.\w+)" (without the quotes)
– Match using:
  Entire match
I get the duplicates, which are removed from your example.

Best
Thomas

Christopher Stone

Nov 23, 2012, 7:03:10 PM
to textwr...@googlegroups.com
On Nov 21, 2012, at 18:53, Mr. J <joaso...@gmail.com> wrote:
I wanted to ask, other than the user manual.. does text wrangler have any youtube tutorial videos so i can learn the greps much better?
______________________________________________________________________

Hey There,

This is a pretty good place for new regexers: http://www.regular-expressions.info

If you're serious though it would be a good idea to buy a book or two.



Mastering Regular Expressions - Superb, but not really for beginners.


Since BBEdit and TextWrangler use Perl Compatible Regular Expressions (PCRE) this is a useful reference:


--
Best Regards,
Chris
