Extracting just email addresses from a text file


appsp...@gmail.com

Feb 20, 2014, 2:30:27 PM
to textwr...@googlegroups.com
Hi All, thank you for any assistance/insight.

So I have a text file, open in TextWrangler, containing names and email addresses in various formats. Sample records:

Timmy Turner <ttu...@example.com>
Susan Alder <sues...@example.com>,

So: some addresses have a name preceding them, most are enclosed in <> brackets, some are already bare and correct, and some lines end with commas and spaces. I want a global process that automates producing this end result (just the email addresses, nothing else):


Thanks for any insight!

Christopher Stone

Feb 20, 2014, 4:32:51 PM
to TextWrangler-Talk
On Feb 20, 2014, at 13:30, appsp...@gmail.com wrote:
I want to do a global process that will automate the process of getting this end result (just the email addresses, nothing else):
______________________________________________________________________

Hey There,

A very quick and dirty Perl Text-Filter:

#! /usr/bin/env perl -0777 -n
use strict; use warnings;
#-----------------------------------------------------------
my @array = m!\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b!ig;
my %hash = map { $_ => 1 } @array;
my @unique = sort(keys %hash);
$, = "\n";
print @unique;

Save it to TextWrangler's Text Filters folder:

    ~/Library/Application Support/TextWrangler/Text Filters/

Run it from {Text}-->{Apply Text Filter}.

It will be applied to the front document, or to the selection in the front document if there is one.

--
Best Regards,
Chris

appsp...@gmail.com

Feb 20, 2014, 4:42:51 PM
to textwr...@googlegroups.com
Thanks so much, Chris! Oddly, someone suggested this:

sed -e "s|.*<||" -e "s|>.*||"  your_file.txt  > new_file.txt
But I didn't know how to do that (run a sed as a regex), AND I didn't want to create a new file. So then I took the first part of it:

s|.*<||
And I pasted that into the "find" window. I left the "replace" window empty, and clicked "replace all." It removed everything from before the email address. And then I just did a replace on the trailing ">" so now it's done.
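For the record, the two substitutions can also be chained in a single sed pipeline, with no second file created on disk (the sample lines below are invented for illustration):

```shell
# First expression strips everything up to '<'; the second strips '>' and
# anything after it. Lines without brackets pass through untouched.
printf 'Timmy Turner <tim@example.com>\nSusan Alder <sue@example.com>,\n' \
  | sed -e 's|.*<||' -e 's|>.*||'
```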

Thank you so much for your input as well, I appreciate the time you took to evaluate the question!

Steve

Feb 20, 2014, 11:16:27 PM
to textwr...@googlegroups.com
Search:
   ^.*\b(\S+@\S+)\b.*$
Replace:
   \1

Start at the beginning of the line ('^') and consume zero or more characters ('.*') until reaching a word boundary ('\b') that is immediately followed by non-whitespace characters ('\S+'), an '@' symbol, and more non-whitespace characters ('\S+'). That must then be followed by another word boundary ('\b'); everything remaining through the end of the line ('.*$') is discarded, and the capture ('\1') keeps just the address.

Just run it in TextWrangler with 'grep' enabled.

-Steve

Christopher Stone

Feb 21, 2014, 11:49:39 AM
to TextWrangler-Talk
On Feb 20, 2014, at 15:42, appsp...@gmail.com wrote:
Thanks so much, Chris! Oddly, someone suggested this:
sed -e "s|.*<||" -e "s|>.*||"  your_file.txt  > new_file.txt
But I didn't know how to do that (run a sed as a regex), AND I didn't want to create a new file. So then I took the first part of it:
______________________________________________________________________

Hey There,

I don't like using pipe characters as delimiters in find/replace arguments, because the pipe is a regular-expression metacharacter (alternation), and that can get confusing (to me).

I prefer to use the exclamation point.  To me it's much easier to read.

#! /usr/bin/env bash
sed -e "s!.*<!!" -e "s!>.*!!"

The canonical way to write that would be:

sed -e "s/.*<//" -e "s/>.*//"

But all three are syntactically correct.

Whatever character follows the 's' is the delimiter; the regex pattern and the replacement text sit between the delimiters.

's' for substitute.

s(ubstitute)!<search-pattern>!<replacement-pattern>!
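The delimiter choices are easy to check side by side (sample line invented for illustration):

```shell
# All three forms produce identical output.
line='Susan Alder <sue@example.com>,'
echo "$line" | sed -e 's|.*<||' -e 's|>.*||'   # pipe delimiter
echo "$line" | sed -e 's!.*<!!' -e 's!>.*!!'   # bang delimiter
echo "$line" | sed -e 's/.*<//' -e 's/>.*//'   # canonical slash
```

(Single quotes around the expressions also keep an interactive shell from treating '!' as history expansion.)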

The given sed statement uses two find/replace actions to kill what is before "<" and what is after ">", and that's not safe unless all the input is consistent.

The text-filter I posted is fairly bomb-proof.  It will find all email addresses, remove any duplicates, sort the remaining addresses, and replace the text in the front window.

my @array = m!\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b!ig;

The regular expression in this is:

\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b

It's a very good pattern gleaned from here:


--
Best Regards,
Chris

Gautam Pinto

Dec 8, 2014, 3:39:39 PM
to textwr...@googlegroups.com
I'm not sure why, but this does not seem to extract emails with dots in the address, like example...@something.com. Any way around that?

Steve

Dec 8, 2014, 5:11:55 PM
to textwr...@googlegroups.com
That regex uses   .*   which is greedy. You want to add a   ?   after it to make it non-greedy:   .*?

https://regex101.com/r/xI1vH1/1 shows the current version, using only the  .*  syntax.
https://regex101.com/r/xI1vH1/2 shows the version where   .*?   is used instead.

-Steve

Gautam Pinto

Dec 8, 2014, 7:04:40 PM
to textwr...@googlegroups.com
Thank you, that worked. But is there a way to extract only emails that have dots in the name before the '@' symbol?


Steve

Dec 8, 2014, 7:28:03 PM
to textwr...@googlegroups.com
One method would be to use a look-ahead.

    (?=\S*\.\S*@)

https://regex101.com/r/xI1vH1/3 shows this where it only accepts email addresses with '.' before the '@'.

-Steve

Hígialuz

Oct 20, 2015, 9:21:15 AM
to TextWrangler Talk, listm...@suddenlink.net

Dear Chris,
That worked beautifully!
Thank you sir!

Kind regards,
elmoluz