Hi Mansi,
This is a GREAT question. Thank you for asking!
The short answer is that you may not be able to do this with either a CSS or an XPath query. It has to do with the underlying parser's handling of the ":" character in HTML attributes, because the ":" character is used to indicate namespacing in XML.
Let's start by explaining the error you're seeing ... which requires an explanation of how Nokogiri implements CSS parsing.
The error message you're seeing is the CSS parser complaining that it doesn't understand the ":". Usually, the ":" character is used to indicate pseudo-selectors in CSS, but we could (probably?) do some work and get the CSS parser to handle it differently in an attribute. In that case, the generated XPath would be something like:
css 'div[dd:meta2]' → xpath './/div[@dd:meta2]'
and this is where it gets really hairy. This isn't a valid XPath query; or at least, libxml2 throws up on it. Here's what happens if you execute this search:
page.css('body').xpath('.//div[@dd:meta2]')
... ERROR: Undefined namespace prefix: .//div[@dd:meta2] (Nokogiri::XML::XPath::SyntaxError)
That is, libxml2 thinks we're searching for an attribute with a namespace prefix of `dd`, and no such prefix has been registered. And there doesn't appear to be any way to escape it. All of these variations also fail, for a variety of reasons:
.//div[@dd\:meta2]
.//div[@dd\\:meta2]
.//div[attribute::dd:meta2]
.//div[attribute::dd\:meta2]
.//div[attribute::dd\\:meta2]
So, the unfortunate advice I have for you right now is: either change your attribute names to not contain a ":" character, or else do this in Ruby space, like this:
page.css('div').select { |node| node.attributes["dd:meta2"] }
You could get fancy and write a custom XPath function to do this within an XPath query, but in this case performance will likely be the same (or worse) and the code would be far more complicated.
Hope this helps,
-m