REGEX for substring in mongoDB

mohit samarth

Jun 30, 2016, 4:13:03 PM
to mongodb-user
Hi,

I need to find the substring of this string

^!DOCTYPE html>^html>^head>^meta charset="utf-8" />^link rel="dns-prefetch" href="//trends.builtwith.com" />^link rel="dns-prefetch" href="//trendspro.builtwith.com" />^link rel="dns-prefetch" href="//api.builtwith.com" />^title>
JUSTPATERSON.CO.NZ Technology Profile^/title>^meta name="description" content="Web technologies JUSTPATERSON.CO.NZ is using on their website." />^meta name="viewport" content="width=device-width, initial-scale=1.0" />^style>
@font-face{font-family:'Open Sans';font-style:normal;font-weight:400;src:local('Open Sans'),local(OpenSans),url(https://themes.googleusercontent.com/static/fonts/opensans/v6/cJZKeOuBrn4kERxqtaUH3T8E0i7KZn-EPnyo3HZu7kw.woff) format("woff")}@font-face{font-family:'Open Sans';font-style:normal;font-weight:700;src:local('Open Sans Bold'),local(OpenSans-Bold),url(https://themes.googleusercontent.com/static/fonts/opensans/v6/k3k702ZOKiLJc3WVjuplzHhCUOGz7vYGh680lGh-uXM.woff) format("woff")}html{font-size:100%}.navbar-custom-black{background-color:#002902!important;padding-bottom:0!important;min-height:22px!important;height:22px!important}body{margin:0;font-family:"Open Sans",Calibri,Candara,Arial,sans-serif;font-size:13px;line-height:20px;color:#555;background-color:#fff;display:block;font-weight:300}.navbar-fixed-top{top:0}.navbar-fixed-top,.navbar-fixed-bottom{position:fixed;right:0;left:0;z-index:1030;margin-bottom:0}.navbar-static-top{position:static;margin-bottom:0!important}.navbar{margin-bottom:20px;overflow:visible}.navbar .navbar-inner{background-image:none;-webkit-border-radius:0;-moz-border-radius:0;border-radius:0;-webkit-box-shadow:none;-moz-box-shadow:none;box-shadow:none}.navbar-builtwith .navbar-inner{background-color:#094303;background-repeat:repeat-x;border-color:transparent}.navbar-fixed-top .navbar-inner,.navbar-static-top .navbar-inner{-webkit-box-shadow:0 1px 10px rgba(0,0,0,0.1);-moz-box-shadow:0 1px 10px rgba(0,0,0,0.1);box-shadow:0 1px 10px rgba(0,0,0,0.1)}.navbar-fixed-top .navbar-inner,.navbar-fixed-bottom


From this string I need to display only "JUSTPATERSON.CO.NZ".

I have used this
> var domain = {$project: { id:1,Domain: {$substr: ["$url",21,-1]} }};
> db.builtwith1.aggregate(domain)


Can I get the result without having to hard-code an index?

For example, find the text in the string that sits between: content="Web technologies   and   is using on their website."


Is there a way to do this in MongoDB?

Please help

Kevin Adistambha

Jul 3, 2016, 10:05:14 PM
to mongodb-user

Hi Mohit,

> I need to find the substring of this string
>
> From this string I need to display only "JUSTPATERSON.CO.NZ".
>
> Can I get the result without having to hard-code an index?
> For example, find the text in the string that sits between: content="Web technologies   and   is using on their website."
> Is there a way to do this in MongoDB?

You posted similar topics in https://groups.google.com/forum/#!topic/mongodb-user/HzjsFE-sqS8 and https://groups.google.com/forum/#!topic/mongodb-user/pd1vXRcQ2WY. I will try to answer them in this post as well.

MongoDB $regex queries use a regular expression as a query condition and return the documents whose field matches it. They do not support regex-based projection or other string manipulation to return only part of a match. Please see the MongoDB documentation on $regex for more information.

The main issue that I understand so far is that your current document structure contains the whole HTML page as a string under a single field. To return the data in the form that you require, you would need to use a post-processing step in your application after you have found the desired document in your MongoDB database.
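As a rough illustration of such a post-processing step, here is a minimal Python sketch. It assumes the application is written in Python and that the page text has already been read from the document (for example from the url field used in your $substr call). The function name extract_domain and the pattern are hypothetical and built only from the two phrases you quoted.

```python
import re

# Pattern built from the two fixed phrases quoted in the question.
# (.+?) lazily captures whatever sits between them.
DOMAIN_PATTERN = re.compile(
    r'content="Web technologies (.+?) is using on their website\."'
)


def extract_domain(html):
    """Return the text between the two marker phrases, or None if absent."""
    match = DOMAIN_PATTERN.search(html)
    return match.group(1) if match else None


# Shortened sample taken from the string in the question.
sample = (
    '^meta name="description" content="Web technologies '
    'JUSTPATERSON.CO.NZ is using on their website." />'
)
print(extract_domain(sample))  # prints JUSTPATERSON.CO.NZ
```

Because the match relies on the surrounding phrases rather than a character position, no index has to be written by hand.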

Another method you could consider is to pre-process the HTML data with an HTML parser before inserting it into MongoDB, so that the information of interest is stored in its own field.
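A minimal sketch of that pre-processing idea, using the parser from the Python standard library, could look like the following. It assumes the stored page is regular HTML with "<" tags (rather than the "^" shown in your post) and that the domain always appears in the meta description in the form you quoted. The class name, the build_document helper and the Domain field name are hypothetical.

```python
from html.parser import HTMLParser


class MetaDescriptionParser(HTMLParser):
    """Collect the content attribute of <meta name="description">."""

    def __init__(self):
        super().__init__()
        self.description = None

    def handle_starttag(self, tag, attrs):
        # Also reached for self-closing tags such as <meta ... />.
        if tag != "meta":
            return
        attributes = dict(attrs)
        if attributes.get("name") == "description":
            self.description = attributes.get("content")


PREFIX = "Web technologies "
SUFFIX = " is using on their website."


def build_document(html):
    """Return a dict with the raw page plus the extracted domain."""
    parser = MetaDescriptionParser()
    parser.feed(html)
    description = parser.description or ""
    domain = None
    if description.startswith(PREFIX) and description.endswith(SUFFIX):
        domain = description[len(PREFIX):-len(SUFFIX)]
    return {"url": html, "Domain": domain}


page = (
    '<html><head><meta name="description" content="Web technologies '
    'JUSTPATERSON.CO.NZ is using on their website." /></head></html>'
)
print(build_document(page)["Domain"])  # prints JUSTPATERSON.CO.NZ
```

The resulting dict could then be inserted with whichever driver you use, after which Domain can be queried or projected directly without any string manipulation on the server.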

Best regards,
Kevin
