" Warning: preg_match(): Unknown modifier 'd' "


nano byte

Feb 8, 2014, 8:48:35 PM
to 140dev-twitt...@googlegroups.com
Hi guys, 

In parse_tweets.php, inside the function find_collection_words($words,$tweet_text,$type,$out_words), we have this line:

if (!preg_match('/'.$word.'/i',$tweet_text)) {

When running the test I'm getting the following error:

" Warning: preg_match(): Unknown modifier 'd' "

I'm not sure how to fix it. Any ideas?

Thank you!

Adam Green

Feb 9, 2014, 6:37:11 AM
to 140dev-twitt...@googlegroups.com
The only way I can reproduce this is if one of the collection words has '/d' in it:
$word = "collect me /d";
$tweet_text = "test tweet that says collect me /d";
print preg_match('/'.$word.'/i',$tweet_text);

Is this the case with your collection words? 

One solution to this type of problem is to use a delimiter other than the forward slash in the preg_match() pattern. For example, this works:
$word = "collect me /d";
$tweet_text = "test tweet that says collect me /d";
print preg_match('#'.$word.'#i',$tweet_text);

But that is a silly delimiter, because the # character is important in tweets. We could change the code to use a delimiter that is unlikely to show up in a collection word, like the backtick. This works:
$word = "collect me /d";
$tweet_text = "test tweet that says collect me /d";
print preg_match('`'.$word.'`i',$tweet_text);
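A quick way to see why # is a risky delimiter for tweet text is a collection word that is itself a hashtag. In the sketch below, the word "#php" and the tweet are made up for illustration; it shows the '#' delimiter failing on that word while the backtick version copes with it.

```php
<?php
// Hypothetical hashtag collection word, to show why '#' is a risky delimiter.
$word = "#php";
$tweet_text = "learning #php today";

// With '#' as the delimiter the pattern string becomes '##php#i': an empty
// pattern followed by the "modifiers" php#i, so PHP emits an
// "Unknown modifier" warning and preg_match() returns false.
// The @ only silences the warning for this demonstration.
var_dump(@preg_match('#' . $word . '#i', $tweet_text)); // bool(false)

// The backtick delimiter handles the same word.
var_dump(preg_match('`' . $word . '`i', $tweet_text)); // int(1)
```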

But it is confusing to read in code, and it could still cause trouble if a collection word contains the delimiter. A better solution could be to escape the delimiter in collection words. This works:
$word = "collect me /d";
$word = str_replace('/','\/',$word);
$tweet_text = "test tweet that says collect me /d";
print preg_match('/'.$word.'/i',$tweet_text);

You are more of a PHP purist than I am. :) What do you suggest? My preference is the last solution, escaping the delimiter. I could do this with one of the many escaping functions in PHP, but that hides the details of what is being done, and escape functions tend to drift in meaning as PHP evolves. I prefer str_replace() because it is explicit.
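Escaping only the delimiter stops the warning, but every other regex metacharacter keeps its special meaning. A minimal sketch, using a made-up collection word "v1.0", shows the unescaped dot acting as a wildcard, with PHP's built-in preg_quote() used here as one way to escape everything at once.

```php
<?php
// Hypothetical collection word containing a dot, and a tweet that should NOT match.
$word = "v1.0";
$tweet_text = "upgraded to v100";

// Escaping only the '/' delimiter leaves '.' as "any character",
// so "v1.0" also matches "v100".
$delimiter_only = str_replace('/', '\\/', $word);
var_dump(preg_match('/' . $delimiter_only . '/i', $tweet_text)); // int(1)

// Escaping every metacharacter (here with preg_quote()) removes the false match.
$fully_escaped = preg_quote($word, '/');
var_dump(preg_match('/' . $fully_escaped . '/i', $tweet_text)); // int(0)
```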

This is the basic dilemma I face in moving techniques I use in production code into this open source code for use by many people. In my own code, I just use what I feel is best. Here I want to create code that is understandable, even to people with minimal PHP experience, and generally applicable. 

Advice is welcome from anyone who really cares about the growth of this code. 


--
You received this message because you are subscribed to the Google Groups "140dev Twitter Framework" group.
To unsubscribe from this group and stop receiving emails from it, send an email to 140dev-twitter-fra...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.



--
Adam Green
CEO, 140dev.com
CTO, UniteBlue.org
ad...@140dev.com
781-879-2960
@140dev

nano byte

Feb 10, 2014, 10:02:45 AM
to 140dev-twitt...@googlegroups.com, ad...@140dev.com

Hi Adam

Yes, you are right. The problem is caused by regex metacharacters in the collection words. Your example with “/” works well :)

However, in my case the problem is with the metacharacter “.” (full stop). I believe the solution is, as you mentioned, to dynamically escape all the regex metacharacters.

I have compiled this list of metacharacters for us to escape:

“ \ , . , ^ , $ , * , + , ? , ( , ) , { , } , [ , ] , | ”

I agree with you on using str_replace rather than, for example, preg_replace. str_replace seems to be faster for short words like these.

$word = str_replace('/','\/',$word);
$word = str_replace('^','\^',$word);
$word = str_replace('$','\$',$word);
…

What do you think? Thank you! :)
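If you do take the str_replace() route, ordering matters. With array arguments, str_replace() applies each search/replace pair in turn over the whole string, so the backslash has to be escaped first; otherwise the backslash pass would also double the backslashes that the earlier passes inserted. A sketch under those assumptions: escape_collection_word() is a hypothetical helper name, and the character list is the one above plus the "/" delimiter.

```php
<?php
// Hypothetical helper: escape regex metacharacters with a single
// str_replace() call using parallel search/replace arrays.
function escape_collection_word(string $word): string
{
    // '\' must come first: the pairs are applied in order, so escaping
    // '\' later would double the backslashes added by earlier pairs.
    $search  = ['\\', '/', '.', '^', '$', '*', '+', '?',
                '(', ')', '[', ']', '{', '}', '|'];
    $replace = ['\\\\', '\\/', '\\.', '\\^', '\\$', '\\*', '\\+', '\\?',
                '\\(', '\\)', '\\[', '\\]', '\\{', '\\}', '\\|'];
    return str_replace($search, $replace, $word);
}

$word = escape_collection_word("collect me /d. (v2)");
print $word . "\n"; // collect me \/d\. \(v2\)
print preg_match('/' . $word . '/i', "tweet collect me /d. (v2)") . "\n"; // 1
```

For this sample word the result is the same as what preg_quote($word, '/') returns, but preg_quote() also escapes characters this list leaves out, such as - and :.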

Adam Green

Feb 11, 2014, 7:01:34 AM
to 140dev-twitt...@googlegroups.com, ad...@140dev.com
I've done a lot of testing with this issue. I think the best answer is to use the preg_quote() function, which escapes all the standard regex metacharacters. You can also pass the regex delimiter as its second argument, such as the forward slash I use, so that it is escaped too. For example, this works:

$word = "collect me /d";
$word = preg_quote($word, '/');
$tweet_text = "tweet collect me /d";
print preg_match('/'.$word.'/i',$tweet_text);
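To make the role of the second argument concrete, here is what preg_quote() returns for the word in this thread, with and without the delimiter, plus one made-up word. Without the delimiter argument the slash is left alone, because "/" is not a regex metacharacter in itself.

```php
<?php
// With '/' passed as the delimiter, the slash is escaped as well.
print preg_quote("collect me /d", '/') . "\n"; // collect me \/d

// Without it, '/' is left untouched, so a '/'-delimited pattern would still break.
print preg_quote("collect me /d") . "\n";      // collect me /d

// Other metacharacters are always escaped (hypothetical word).
print preg_quote("v1.0 (beta)?", '/') . "\n";  // v1\.0 \(beta\)\?
```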

I have new versions of the find_collection_words() and find_exclusion_words() functions. Please modify parse_tweets_keyword.php to use them, and let me know if they solve your problem.

// Return 1 if a match is found
// Return 0 if no match, or if the match contains an out word
function find_collection_words($words,$tweet_text,$type,$out_words) {
  // Remove extra spaces from words, out_words, and tweet text
  $words = trim(preg_replace('/\s+/',' ', $words));
  $tweet_text = trim(preg_replace('/\s+/',' ', $tweet_text));
  $out_words = trim(preg_replace('/\s+/',' ', $out_words));
  // Escape any characters in collection words that may
  // conflict with a regex pattern used by preg_match
  $words = preg_quote($words, '/');
  $match = 0;
  if ($type=='phrase') {
    // Exact match of collection phrase is required
    $match = preg_match('/\b' . $words . '\b/i',$tweet_text);
  } else {
    // Break apart the words on space boundaries
    // and check for each of them separately
    $words_array = explode(' ',$words);
    foreach($words_array as $word) {
      if (!preg_match('/' . $word . '/i',$tweet_text)) {
        // One of the words is missing, so we're done
        return 0;
      }
    }
    $match = 1;
  }

  if($match && !empty($out_words)) {
    // Check for out words
    // Break apart the out words on comma boundaries
    // and check for each of them separately
    $out_words_array = explode(',',$out_words);
    foreach($out_words_array as $out_word) {
      // Drop spaces after commas and skip empty entries (e.g. a trailing
      // comma); an empty pattern would match every tweet
      $out_word = trim($out_word);
      if ($out_word === '') {
        continue;
      }
      // Escape any characters in out_word that may
      // conflict with a regex pattern used by preg_match
      $out_word = preg_quote($out_word, '/');
      if (preg_match('/' . $out_word . '/i',$tweet_text)) {
        // One of the out_words is found, so we're done
        return 0;
      }
    }
  }
  return $match;
}

// Return 1 if a match is found, 0 if not
function find_exclusion_words($words,$tweet_text,$type) {
  // Remove extra spaces from words and tweet text
  $words = trim(preg_replace('/\s+/',' ', $words));
  $tweet_text = trim(preg_replace('/\s+/',' ', $tweet_text));
  // Escape any characters in the exclusion word that may
  // conflict with a regex pattern used by preg_match
  $words = preg_quote($words, '/');
  if ($type == 'partial') {
    return preg_match('/' . $words . '/i',$tweet_text);
  } elseif ($type == 'exact') {
    // Comparison (==), not assignment (=), so other types are not forced to 'exact'
    return preg_match('/\b' . $words . '\b/i',$tweet_text);
  }
  // Unknown type: report no match
  return 0;
}
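Phrase mode in find_collection_words() and the 'exact' type in find_exclusion_words() both wrap the escaped words in \b word boundaries. The sketch below, using made-up phrases and tweets, shows what that buys, plus one caveat that seems worth knowing: \b needs a word character on one side, so a word that starts with "#" (a hashtag) will not match after a space in those modes.

```php
<?php
// \b boundaries, as used by phrase mode and the 'exact' exclusion type.
$phrase = preg_quote("collect me", '/');
var_dump(preg_match('/\b' . $phrase . '\b/i', "Please collect me now"));   // int(1)
var_dump(preg_match('/\b' . $phrase . '\b/i', "Please recollect me now")); // int(0)

// Without \b (as in the 'partial' exclusion type) the same text matches.
var_dump(preg_match('/' . $phrase . '/i', "Please recollect me now"));     // int(1)

// Caveat (hypothetical hashtag word): both ' ' and '#' are non-word
// characters, so there is no \b boundary before the '#'.
$tag = preg_quote("#php", '/');
var_dump(preg_match('/\b' . $tag . '\b/i', "I love #php")); // int(0)
var_dump(preg_match('/' . $tag . '/i', "I love #php"));     // int(1)
```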

nano byte

Feb 11, 2014, 12:53:39 PM
to 140dev-twitt...@googlegroups.com, ad...@140dev.com
Hi Adam, 

Thank you! Yes, this solves the problem. Now parse_tweets.php runs without complaints :-)

Adam Green

Feb 11, 2014, 12:59:02 PM
to 140dev-twitt...@googlegroups.com
Thanks for testing. I'm going to update the posted examples of parse_tweets.php to include the new versions of the functions. 

