Message from discussion Parsing Google Group URLs (v1)
Newsgroups: alt.fan.dejanews
From: Elisabeth Riba <l...@osmond-riba.org>
Date: 11 Dec 2001 23:25:27 GMT
Local: Tues, Dec 11 2001 6:25 pm
Subject: [INFO] Parsing Google Group URLs (v1)
Especially with the newer extended archive, I've seen a lot of questions
about what all the bits of the URL do and how one can intelligently trim
it to (hopefully) fit on one line.  I've been playing around with this for
a while, and can do this on my own.
Here's my first stab at sharing this knowledge widely:

==================
GOOGLE GROUP URLS:
==================
The URL always begins with
     http://groups.google.com/groups?
and then is followed by numerous short segments (controls) concatenated by &'s
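
To make the "segments concatenated by &'s" idea concrete, here is a minimal Python sketch that splits a URL into its controls. It uses only the standard library; the function name split_controls and the sample URL are made up for illustration.

```python
from urllib.parse import urlsplit, parse_qsl

def split_controls(url):
    """Return the controls of a Google Groups URL as (name, value) pairs."""
    query = urlsplit(url).query  # everything after the "?"
    # keep_blank_values=True so that empty controls (e.g. "as_epq=") still show up.
    # Note: parse_qsl decodes "+" to a space, so q=a+b+c comes back as "a b c".
    return parse_qsl(query, keep_blank_values=True)

# Hypothetical sample URL, built only from controls described in this post.
url = "http://groups.google.com/groups?q=a+b+c&num=20&scoring=d"
for name, value in split_controls(url):
    print(name, "=", repr(value))
```

Each pair can then be looked up in the alphabetical list of controls.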

There are two ways of explaining these controls:
 a) by control, better to help you read & dissect URLs
 b) by function, better to help you build your own URLs

=======================
CONTROLS, ALPHABETICAL:
=======================
 as_drrb     a toggle for searches by date
 as_epq      way of querying for an exact phrase
 as_eq       way of excluding words (Boolean NOT) in a query
 as_maxd     for searches between two dates, the end day-of-the-month
 as_maxm     for searches between two dates, the end month number
 as_maxy     for searches between two dates, the end year
 as_mind     for searches between two dates, the start day-of-the-month
 as_minm     for searches between two dates, the start month number
 as_miny     for searches between two dates, the start year
 as_oq       way of querying for optional words (Boolean OR)
 as_q        way of querying for required words (Boolean AND)
 as_qdr      way of searching by date relative to the current time
 as_scoring  whether results are sorted by date or relevance
 as_uauthors way of querying in Author field
 as_ugroup   way of querying for newsgroup name
 as_umsgid   way of querying on Message-ID
 as_usubject way of querying for words in Subject
 filter      whether Google will display or omit similar results
 hl          language the Google UI will display
 ic          shows full text for articles
 lr          way of querying by language of article
 num         number of results shown per screen
 output      shows the results in an alternate format (e.g. plain text)
 q           general query field
 safe        whether content-filtering is turned on
 scoring     whether results are sorted by date or relevance
 selm        way of querying on Message-ID
=====================
CONTROLS, FUNCTIONAL:
=====================
Building the Query:
-------------------
The easiest way to do this is to use q= followed by your search terms
joined by + (plus signs).  For example, q=a+b+c yields (a AND b AND c)
 Boolean AND:
 ------------
  Google's search treats all words as if they were joined by a Boolean AND
 Boolean OR:
 -----------
  Use q= and put +OR+ ("OR" must be in upper case) between those words.
   For example, q=a+b+OR+c yields (a AND (b OR c))
  You can also list all optional words using the as_oq control.
 Boolean NOT:
 ------------
  Use q= and put a - (minus sign) before words you wish to exclude.
  You still must include the + (plus sign) between terms.
   For example, q=a+-b+c yields (a AND NOT(b) AND c)
  You can also list all words to exclude using the as_eq control.
 Exact Phrases:
 --------------
  Use q= and put quotation marks around the words in the phrase.
  Continue to put + (plus signs) between all words in the phrase, and
  between the phrase and other terms as necessary.
   For example, q=a+"b+c"+d yields (a AND phrase(b c) AND d)
  You can also list all words in a phrase using the as_epq control.
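
The four rules above (AND, OR, NOT, exact phrases) can be sketched as one small Python helper. This is only an illustration: build_q and its parameter names are invented here, and it emits the raw form used in the examples above without percent-escaping any special characters.

```python
def build_q(required=(), optional=(), excluded=(), phrases=()):
    """Assemble a q= control in the raw form used in the examples above."""
    parts = list(required)
    # Exact phrases: quotation marks around the words, plus signs inside.
    parts += ['"' + "+".join(p.split()) + '"' for p in phrases]
    # Optional words: +OR+ (upper case) between the alternatives.
    if optional:
        parts.append("+OR+".join(optional))
    # Excluded words: a minus sign in front of each one.
    parts += ["-" + w for w in excluded]
    return "q=" + "+".join(parts)

print(build_q(required=["a"], optional=["b", "c"]))    # q=a+b+OR+c
print(build_q(required=["a", "c"], excluded=["b"]))    # q=a+c+-b
print(build_q(required=["a", "d"], phrases=["b c"]))   # q=a+d+"b+c"
```

The terms come out in a different order than in the hand-written examples, which should not matter since everything is joined by an implicit AND.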

Field Searches:
---------------
 Author:
 -------
  Within the q= control, enter author: followed by one word in order to
  search for that word in the author field.  To search for multiple words,
  repeat the author: before each word.
  You can also list all author words at once using the as_uauthors control.
   For example, q=author:John+author:Doe == as_uauthors=John+Doe

 Group:
 ------
  Within the q= control, enter group: followed by as much of the
  newsgroup name as you know.
  You can also use the as_ugroup control.
   For example, q=group:alt.fan.dejanews == as_ugroup=alt.fan.dejanews
  NOTE: This is the ONLY field at present which permits wildcards.

 Subject:
 --------
  Within the q= control, enter insubject: followed by one word in order to
  search for that word in the subject field.  To search for multiple words,
  repeat the insubject: before each word.
  You can also list all subject words at once using the as_usubject control.
   For example, q=insubject:delurk+insubject:test == as_usubject=delurk+test
  NOTE: the control here is "INsubject" not plain "subject" (a common pitfall)

 Message-ID:
 -----------
  Within the q= control, enter msgid: followed by the Message-ID
  You can also use the controls as_umsgid or selm followed by the Message-ID
   http://groups.google.com/groups?selm=anews.Asdcsvax.285
   http://groups.google.com/groups?as_umsgid=anews.Asdcsvax.285
   http://groups.google.com/groups?q=msgid:anews.Asdcsvax.285
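
As a rough sketch of the field prefixes described above, the following Python helpers repeat author: and insubject: before each word and build a selm= link from a Message-ID. The names field_terms and message_url are hypothetical, and no escaping is attempted for Message-IDs that contain unusual characters.

```python
BASE = "http://groups.google.com/groups?"

def field_terms(author="", group="", subject=""):
    """Build the part of a q= value that searches the header fields."""
    # author: must be repeated before every word
    terms = ["author:" + w for w in author.split()]
    if group:
        terms.append("group:" + group)
    # note "insubject:", not plain "subject:"
    terms += ["insubject:" + w for w in subject.split()]
    return "+".join(terms)

def message_url(message_id):
    """Shortest link to one post, using the selm control."""
    return BASE + "selm=" + message_id

print("q=" + field_terms(author="John Doe"))
# q=author:John+author:Doe
print("q=" + field_terms(group="alt.fan.dejanews", subject="delurk test"))
# q=group:alt.fan.dejanews+insubject:delurk+insubject:test
print(message_url("anews.Asdcsvax.285"))
# http://groups.google.com/groups?selm=anews.Asdcsvax.285
```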

NOTE: Google accepts a maximum of TEN words in a query.  That number doesn't
include prefixes to specify author, group or subject, but it can still be a
limitation.  Also, apart from the group field noted above, Google does not
recognize wildcards, and it does not perform stemming.
Searching "printer" vs. "printers" will give different results.

Date Limitations:
-----------------
 No control is needed to get articles written at any time.
 To only get posts within the last 24 hours, append as_drrb=q&as_qdr=d
 To only get posts within the last week, append as_drrb=q&as_qdr=w
 To only get posts within the last month, append as_drrb=q&as_qdr=m
 To only get posts within the last year, append as_drrb=q&as_qdr=y
 To only get posts in a specified date range, append SEVEN controls:
   as_drrb=b
   as_mind=(starting date - a number from 1 to 31)
   as_minm=(starting month - a number from 1 to 12)
   as_miny=(starting year - a number from 1981 to 2001)
   as_maxd=(ending date - a number from 1 to 31)
   as_maxm=(ending month - a number from 1 to 12)
   as_maxy=(ending year - a number from 1981 to 2001)
  For example, December 1, 1999 thru January 30, 2000 would look like:
  as_drrb=b&as_mind=1&as_minm=12&as_miny=1999&as_maxd=30&as_maxm=1&as_maxy=2000
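
Typing seven controls by hand is error-prone, so here is a minimal Python sketch that derives them from two dates. The function name date_range_controls is made up, and the as_drrb=b value follows the worked example above.

```python
from datetime import date

def date_range_controls(start, end):
    """Return the seven date-range controls for start..end, joined by &."""
    if start > end:
        raise ValueError("start date must not be after end date")
    controls = [
        ("as_drrb", "b"),            # toggle: search between two dates
        ("as_mind", start.day),
        ("as_minm", start.month),
        ("as_miny", start.year),
        ("as_maxd", end.day),
        ("as_maxm", end.month),
        ("as_maxy", end.year),
    ]
    return "&".join(f"{name}={value}" for name, value in controls)

# December 1, 1999 thru January 30, 2000, as in the example above
print(date_range_controls(date(1999, 12, 1), date(2000, 1, 30)))
# as_drrb=b&as_mind=1&as_minm=12&as_miny=1999&as_maxd=30&as_maxm=1&as_maxy=2000
```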

Language Limitations:
---------------------
 Google can return results that were written in one of 28 possible languages.
 If you want articles written in any language, no control is needed.
 To only get messages written in English, append lr=lang_en
 For French, lr=lang_fr  For Spanish, lr=lang_es  For Russian, lr=lang_ru
  and so on. [I'm *not* going to list all the options here.]

Customizing Display of the Results:
-----------------------------------
 Sorting:
 --------
  To sort the results by date, append scoring=d OR as_scoring=d
  To sort results by relevance, append scoring=r OR as_scoring=r
 Number of results per screen:
 -----------------------------
  To show 10 results per screen, append num=10
  To show 20 results per screen, append num=20
  To show 30 results per screen, append num=30
  To show 50 results per screen, append num=50
  To show 100 results per screen, append num=100
  Keep in mind that Google Groups only shows 10 screens of results.
  At num=10, that's only the first hundred; num=100 can show a thousand.
 Other options:
 --------------
  To show all results (don't omit "very similar entries"), append filter=0
  To show the bodies of all messages, append ic=1
  To turn on content filtering (avoid "adult" messages), append safe=on
  To show all results without any content safeguards, append safe=off
  To show single articles in plain-text with full headers, append output=plain
   (this is only enabled for searches using selm=)
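
The display controls above can be bundled into one helper, sketched here in Python. The names display_controls and reachable_results are invented for this example, and restricting num to the five values listed above is an assumption; other values may or may not be accepted.

```python
ALLOWED_NUM = (10, 20, 30, 50, 100)  # the per-screen values listed above
MAX_SCREENS = 10                     # only 10 screens of results are shown

def display_controls(sort_by_date=False, num=10, show_similar=False, full_text=False):
    """Return the display controls to append to a search URL."""
    if num not in ALLOWED_NUM:
        raise ValueError(f"num should be one of {ALLOWED_NUM}")
    controls = ["scoring=" + ("d" if sort_by_date else "r"), f"num={num}"]
    if show_similar:
        controls.append("filter=0")  # don't omit very similar entries
    if full_text:
        controls.append("ic=1")      # show the bodies of all messages
    return "&".join(controls)

def reachable_results(num):
    """How many results can be paged through at this num setting."""
    return num * MAX_SCREENS

print(display_controls(sort_by_date=True, num=100, show_similar=True))
# scoring=d&num=100&filter=0
print(reachable_results(10), reachable_results(100))
# 100 1000
```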

============================
TRIMMING GOOGLE GROUPS URLS:
============================
If you want to provide a link to *one* specific post, the shortest format
 is http://groups.google.com/groups?selm= and the Message-ID.  The easiest
 way to get there is to click on the article and copy the URL from the
 "Original Format" link, removing the "&output=plain"
If you want to provide a link to a search, start weeding through the URL
 using the guide above to remove the unnecessary bits.  For example, all
 searches begun from the Advanced page include the date fields; if you're
 not searching by date, just delete them. Sometimes, clicking the "sort by
 date/relevance" link in search results will also clean up the URL for you.
 Just be sure to resubmit your edited URL into your browser to ensure you
 still get the right results.
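
The weeding described above could be automated along these lines. This Python sketch rests on two assumptions that are not confirmed anywhere in this post: that a control with an empty value has no effect, and that the six date fields only matter when as_drrb=b. The function name trim_url and the sample URL are made up, so do check the trimmed link in your browser.

```python
DATE_RANGE_CONTROLS = {"as_mind", "as_minm", "as_miny",
                       "as_maxd", "as_maxm", "as_maxy"}

def trim_url(url):
    """Drop controls that (by assumption) do not affect the search."""
    base, _, query = url.partition("?")
    pairs = [c.partition("=") for c in query.split("&") if c]
    # Assumption: a control with an empty value can be removed safely.
    controls = [(name, value) for name, _, value in pairs if value]
    names = dict(controls)
    date_range = names.get("as_drrb") == "b"
    relative = names.get("as_drrb") == "q" and "as_qdr" in names
    kept = []
    for name, value in controls:
        if name in DATE_RANGE_CONTROLS and not date_range:
            continue  # not searching between two dates
        if name == "as_drrb" and not (date_range or relative):
            continue  # date toggle with nothing left to toggle
        kept.append(name + "=" + value)
    return base + "?" + "&".join(kept) if kept else base

# Hypothetical URL of the kind an Advanced-page search might produce
long_url = ("http://groups.google.com/groups?as_q=delurk&as_epq=&as_drrb=q"
            "&as_qdr=&as_mind=12&as_minm=5&as_miny=1981"
            "&as_maxd=11&as_maxm=12&as_maxy=2001&num=10")
print(trim_url(long_url))
# http://groups.google.com/groups?as_q=delurk&num=10
```

Values are passed through untouched (no decoding or re-encoding), so quotation marks and plus signs in q= survive as typed.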

==============
IN CONCLUSION:
==============
That's as far as I've gotten so far.  This does *NOT* include terms in the
URLs that relate to message threading, but I'll see about adding them
eventually. Please *post* any responses to alt.fan.dejanews rather than
e-mailing me (I'm having e-mail problems right now, and wouldn't want to
lose anybody's replies).

Hope this helps!
--
     ----------> Elisabeth Anne Riba * l...@osmond-riba.org <----------
     "[She] is one of the secret masters of the world: a librarian.  
      They control information. Don't ever piss one off."
                                        - Spider Robinson, "Callahan Touch"


 