Extracting just email addresses from a text file


appsp...@gmail.com

Feb 20, 2014, 2:30:27 PM
to textwr...@googlegroups.com
Hi All, thank you for any assistance/insight.

So I have a text file, open in TextWrangler, containing names and email addresses in various formats. Sample records:

Timmy Turner <ttu...@example.com>
Susan Alder <sues...@example.com>,

So: some addresses have a name preceding them, most are enclosed in <> brackets, some are already bare and correct, and some lines end with commas and spaces. I want a global process that automates producing this end result (just the email addresses, nothing else):


Thanks for any insight!

Christopher Stone

Feb 20, 2014, 4:32:51 PM
to TextWrangler-Talk
On Feb 20, 2014, at 13:30, appsp...@gmail.com wrote:
I want to do a global process that will automate the process of getting this end result (just the email addresses, nothing else):
______________________________________________________________________

Hey There,

A very quick and dirty Perl Text-Filter:

#! /usr/bin/env perl -0777 -n
use strict; use warnings;
#-----------------------------------------------------------
my @array = m!\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b!ig;
my %hash = map { $_ => 1 } @array;
my @unique = sort(keys %hash);
$, = "\n";
print @unique;

Save it to TextWrangler's Text Filters folder:

    ~/Library/Application Support/TextWrangler/Text Filters/

Run it from {Text}-->{Apply Text Filter}.

It will be applied to the front document, or to the selection in the front document if there is one.

--
Best Regards,
Chris

appsp...@gmail.com

Feb 20, 2014, 4:42:51 PM
to textwr...@googlegroups.com
Thanks so much, Chris! Oddly, someone suggested this:

sed -e "s|.*<||" -e "s|>.*||"  your_file.txt  > new_file.txt
But I didn't know how to do that (run a sed as a regex), AND I didn't want to create a new file. So then I took the first part of it:

s|.*<||
And I pasted that into the "find" window. I left the "replace" window empty, and clicked "replace all." It removed everything from before the email address. And then I just did a replace on the trailing ">" so now it's done.
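For the record, the two substitutions can also be chained in a single sed pipeline, with no second file created on disk (the sample lines below are invented for illustration):

```shell
# First expression strips everything up to '<'; the second strips '>' and
# anything after it. Lines without brackets pass through untouched.
printf 'Timmy Turner <tim@example.com>\nSusan Alder <sue@example.com>,\n' \
  | sed -e 's|.*<||' -e 's|>.*||'
```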

Thank you so much for your input as well, I appreciate the time you took to evaluate the question!

Steve

Feb 20, 2014, 11:16:27 PM
to textwr...@googlegroups.com
Search:
   ^.*\b(\S+@\S+)\b.*$
Replace:
   \1

Start at the beginning of the line ('^') and consume zero or more characters ('.*') until reaching a word boundary ('\b') that is immediately followed by non-whitespace characters ('\S+'), an '@' symbol, and more non-whitespace characters ('\S+'). That must then be followed by another word boundary ('\b'); everything remaining through the end of the line ('.*$') is discarded, and the capture ('\1') keeps just the address.

Just run it in TextWrangler with 'grep' enabled.

-Steve

Christopher Stone

Feb 21, 2014, 11:49:39 AM
to TextWrangler-Talk
On Feb 20, 2014, at 15:42, appsp...@gmail.com wrote:
Thanks so much, Chris! Oddly, someone suggested this:
sed -e "s|.*<||" -e "s|>.*||"  your_file.txt  > new_file.txt
But I didn't know how to do that (run a sed as a regex), AND I didn't want to create a new file. So then I took the first part of it:
______________________________________________________________________

Hey There,

I don't like using pipe characters as delimiters in find/replace arguments, because the pipe is a regular-expression metacharacter (alternation), and that can get confusing (to me).

I prefer to use the exclamation point.  To me it's much easier to read.

#! /usr/bin/env bash
sed -e "s!.*<!!" -e "s!>.*!!"

The canonical way to write that would be:

sed -e "s/.*<//" -e "s/>.*//"

But all three are syntactically correct.

Whatever character follows the 's' is the delimiter; the regex pattern and the replacement text sit between the delimiters.

's' for substitute.

s(ubstitute)!<search-pattern>!<replacement-pattern>!
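The delimiter choices are easy to check side by side (sample line invented for illustration):

```shell
# All three forms produce identical output.
line='Susan Alder <sue@example.com>,'
echo "$line" | sed -e 's|.*<||' -e 's|>.*||'   # pipe delimiter
echo "$line" | sed -e 's!.*<!!' -e 's!>.*!!'   # bang delimiter
echo "$line" | sed -e 's/.*<//' -e 's/>.*//'   # canonical slash
```

(Single quotes around the expressions also keep an interactive shell from treating '!' as history expansion.)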

The given sed statement uses two find/replace actions to kill what is before "<" and what is after ">", and that's not safe unless all the input is consistent.

The text-filter I posted is fairly bomb-proof.  It will find all email addresses, remove any duplicates, sort the remaining addresses, and replace the text in the front window.

my @array = m!\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b!ig;

The regular expression in this is:

\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b

It's a very good pattern gleaned from here:


--
Best Regards,
Chris

Gautam Pinto

Dec 8, 2014, 3:39:39 PM
to textwr...@googlegroups.com
I'm not sure why, but this does not seem to extract emails with dots in the address, like example...@something.com. Any way around that?

Steve

Dec 8, 2014, 5:11:55 PM
to textwr...@googlegroups.com
That regex uses   .*   which is greedy. You want to add a   ?   after it to make it non-greedy:   .*?

https://regex101.com/r/xI1vH1/1 shows the current version, using only the  .*  syntax.
https://regex101.com/r/xI1vH1/2 shows the version where   .*?   is used instead.

-Steve

Gautam Pinto

Dec 8, 2014, 7:04:40 PM
to textwr...@googlegroups.com
Thank you, that worked. But is there a way to extract only emails that have dots in the name before the '@' symbol?


Steve

Dec 8, 2014, 7:28:03 PM
to textwr...@googlegroups.com
One method would be to use a look-ahead.

    (?=\S*\.\S*@)

https://regex101.com/r/xI1vH1/3 shows this where it only accepts email addresses with '.' before the '@'.

-Steve

Hígialuz

Oct 20, 2015, 9:21:15 AM
to TextWrangler Talk, listm...@suddenlink.net

Dear Chris,
That worked beautifully!
Thank you sir!

Kind regards,
elmoluz