Issue in fetch URL


Parthasarathi Mukhopadhyay

Apr 7, 2021, 4:03:43 PM
to openr...@googlegroups.com
Dear all,

Could you please help me solve this puzzle?

1. When I open the URL mentioned below in my browser, it returns this result:


[{"id":"422060","publisher":"SAGE","issn":"0376-9836","journal":"Indian Historical Review","role":"Advisory Committee","editor":"K K Thapliyal","affiliation":"Lucknow University, Lucknow","url":"https:\/\/journals.sagepub.com\/editorial-board\/ihra","date":"2020-12-14"},{"id":"436346","publisher":"SAGE","issn":"2631-8318","journal":"Journal of Psychosexual Health","role":"Column Editors","editor":"Aleem Siddiqui","affiliation":"Eras Lucknow Medical College and Hospital, Lucknow","url":"https:\/\/journals.sagepub.com\/editorial-board\/ssha","date":"2020-12-14"},{"id":"67963","publisher":"Elsevier","issn":null,"journal":"Indian Journal of Tuberculosis","role":"Section Editors","editor":"Dr. Rajendra Prasad","affiliation":"King George's Medical University, Lucknow","url":"https:\/\/www.journals.elsevier.com\/indian-journal-of-tuberculosis\/editorial-board","date":"2020-12-16"},{"id":"67971","publisher":"Elsevier","issn":null,"journal":"Indian Journal of Tuberculosis","role":"National Advisers","editor":"Dr. Surya Kant","affiliation":"King George's Medical University Department of Pulmonary Medicine, Lucknow","url":"https:\/\/www.journals.elsevier.com\/indian-journal-of-tuberculosis\/editorial-board","date":"2020-12-16"},{"id":"305935","publisher":"IGI Global","issn":"2334-4628","journal":"International Journal of Corporate Finance and Accounting (IJCFA)","role":"Editorial Review Board","editor":"Sana Moid","affiliation":"Amity University, Lucknow Campus","url":"https:\/\/www.igi-global.com\/journal\/international-journal-corporate-finance-accounting\/67810","date":"2021-01-18"},{"id":"334760","publisher":"Karger","issn":"0257-2753","journal":"Digestive Diseases","role":"Associate Editors","editor":"Uday C. 
Ghoshal","affiliation":"Sanjay Gandhi Post Graduate Institute of Medical Sciences, Lucknow, IndiaClinical Pharmacology and Therapeutics","url":"https:\/\/www.karger.com\/Journal\/EditorialBoard\/224231","date":"2021-01-28"},{"id":"392732","publisher":"SAGE","issn":"0971-8907","journal":"Paradigm","role":"Associate Editors","editor":"Chandan Sharma","affiliation":"Professor, Business Environment, Indian Institute of Management Lucknow","url":"https:\/\/journals.sagepub.com\/editorial-board\/para","date":"2020-12-14"},{"id":"403976","publisher":"SAGE","issn":"2632-3273","journal":"The Traumaxilla","role":"Assistant Editors","editor":"Harmurti Singh","affiliation":"Senior Lecturer, Department Of Oral & Maxillofacial Surgery, Career Post Graduate Institute Of Dental Sciences IIM Road, Ghailla, Lucknow","url":"https:\/\/journals.sagepub.com\/editorial-board\/tmxa","date":"2020-12-14"},{"id":"403982","publisher":"SAGE","issn":"2632-3273","journal":"The Traumaxilla","role":"Editorial Advisory Board","editor":"Abbas Ali Mahdi","affiliation":"Vice Chancellor, Erss University, Lucknow","url":"https:\/\/journals.sagepub.com\/editorial-board\/tmxa","date":"2020-12-14"},{"id":"404014","publisher":"SAGE","issn":"2632-3273","journal":"The Traumaxilla","role":"Editorial Advisory Board","editor":"Hemant Gupta","affiliation":"Professor and Head, Professor and Head, BBD University, Lucknow","url":"https:\/\/journals.sagepub.com\/editorial-board\/tmxa","date":"2020-12-14"},{"id":"412467","publisher":"SAGE","issn":"2472-7512","journal":"Craniomaxillofacial Trauma & Reconstruction Open","role":"Craniomaxillofacial Trauma","editor":"Divya Mehrotra, MDS, FDS RCPS (Glasgow, London) FAMS, AO Fellow","affiliation":"Oral & Maxillofacial Surgeon Vice Dean Faculty of Dental Sciences Additional Controller of examinations Faculty In-charge DHR MRU Lab King George's Medical University, 
Lucknow-226003","url":"https:\/\/journals.sagepub.com\/editorial-board\/cmoa","date":"2020-12-14"},{"id":"435088","publisher":"SAGE","issn":"2516-600X","journal":"Journal of Operations and Strategic Planning","role":"Journal Editorial Board","editor":"Sushil Kumar","affiliation":"Professor, Operations Management Area, IIM Lucknow","url":"https:\/\/journals.sagepub.com\/editorial-board\/ospa","date":"2020-12-14"},{"id":"435097","publisher":"SAGE","issn":"2516-600X","journal":"Journal of Operations and Strategic Planning","role":"Journal Editorial Board","editor":"Suresh Jhakar","affiliation":"IIM Lucknow","url":"https:\/\/journals.sagepub.com\/editorial-board\/ospa","date":"2020-12-14"},{"id":"436349","publisher":"SAGE","issn":"2631-8318","journal":"Journal of Psychosexual Health","role":"Column Editors","editor":"Adarsh Tripathi","affiliation":"King Georges Medical University, Lucknow","url":"https:\/\/journals.sagepub.com\/editorial-board\/ssha","date":"2020-12-14"},{"id":"455255","publisher":"Brill","issn":null,"journal":"Youth and Globalization","role":"Editorial Board","editor":"Vinod Chandra","affiliation":"Lucknow University","url":"https:\/\/brill.com\/yogo\/yogo-overview.xml?contents=About","date":"2020-12-16"}]

2. But I receive nothing through the fetch URL option with the following expression:

"https://openeditors.ooir.org/export-json.php?editor_query=" + value +" " + "NOT" +" " +"(India" +" " +"OR" +" " + '"'+cells.admin_name.value +'"' +")"

3. I tried this too, but no luck:

"https://openeditors.ooir.org/export-json.php?editor_query=" + value.escape('url') +" " + "NOT" +" " +"(India" +" " +"OR" +" " + '"'+cells.admin_name.value.escape('url') +'"' +")"

What am I missing here?

Best regards


-----------------------------------------------------------------------
Parthasarathi Mukhopadhyay
University of Kalyani, Kalyani - 741 235 (WB), India
-----------------------------------------------------------------------

Thad Guidry

Apr 7, 2021, 4:19:32 PM
to openr...@googlegroups.com
1. Construct the whole URL first in ColumnA using concatenation, as you are trying to do.
2. Once ColumnA's URLs look right, make a ColumnB based on ColumnA, take its value, and escape it.
 
  value.escape(cells.ColumnA)

3. Then run a Fetch on that ColumnB



--
You received this message because you are subscribed to the Google Groups "OpenRefine" group.
To unsubscribe from this group and stop receiving emails from it, send an email to openrefine+...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/openrefine/CAGM_5uZXeobzNR5Jvd0_-hhsJn87UvWvtnh9fuiWE-0LErBYew%40mail.gmail.com.

Parthasarathi Mukhopadhyay

Apr 8, 2021, 5:01:11 AM
to openr...@googlegroups.com
Thanks Thad for the advice.

The GREL expression above, value.escape(cells.ge), gives me null in the resulting column.

When I use value.escape('url')(cells.ge) or escape(value,"url"), it gives me a result like this: https%3A%2F%2Fopeneditors.ooir.org%2Fexport-json.php%3Feditor_query%3DLucknow+NOT+%28India+OR+%22Uttar+Pradesh%22%29

When I use value.escape('xml')(cells.ge), it gives me a result like this: https://openeditors.ooir.org/export-json.php?editor_query=Lucknow NOT (India OR "Uttar Pradesh")

In all cases it fails to store the JSON data.

Any suggestions?

Regards

Owen Stephens

Apr 8, 2021, 5:55:08 AM
to OpenRefine
When I hit an issue with the fetch, the first thing I do is try to do it on a single row with the option to "Store error" checked (this is under the "Column name" in the "Add column by fetching URLs" dialogue). Hopefully this will give you some more information about what's going wrong.

The most common issues that we've seen people have over time are listed in the documentation (https://docs.openrefine.org/manual/columnediting/#common-errors), with some guidance on resolving each one. However, if your error doesn't fall into those categories, or you can't resolve it, post the error here and we can see if we can resolve it.

One other thing to check that isn't in that list is whether your browser is set to use a proxy when retrieving. If it is, a local proxy or firewall issue could be causing the problem.

Try capturing any errors and let us know what you get.

Owen

Parthasarathi Mukhopadhyay

Apr 8, 2021, 6:57:19 AM
to openr...@googlegroups.com
Thanks.

When I use the GREL expression "https://openeditors.ooir.org/export-json.php?editor_query=" + value, the result is populated (column tt5), but when I use

"https://openeditors.ooir.org/export-json.php?editor_query=" + value.escape('url') +" " + "NOT" +" " +"(India" +" " +"OR" +" " + '"'+cells.admin_name.value +'"' +")"

or

"https://openeditors.ooir.org/export-json.php?editor_query=" + value.escape('url') +" " + "NOT" +" " +"(India" +" " +"OR" +" " + '"'+cells.admin_name.value.escape('url') +'"' +")"

it gives a 400 error (column tt3). However, the resulting URL [ https://openeditors.ooir.org/export-json.php?editor_query=Lucknow NOT (India OR "Uttar Pradesh") ] does show records when pasted into a browser address bar.


Row 15:
admin_name: Uttar Pradesh
city_ascii: Lucknow
tt5: [{"id":"54525","publisher":"Elsevier","issn":null,"journal":"Current Plant Biology","role":"Editorial Advisory Board","editor":"Mohammad Israil Ansari","affiliation":"University of Lucknow, Lucknow, India","url":"https:\/\/www.journals.elsevier.com\/current-plant-biology\/editorial-board","date":"2020-12-15"},{"id":"55444",......
tt3: HTTP error 400 : Bad request | <html><body><h1>400 Bad request</h1> Your browser sent an invalid request. </body></html>

Regards


Owen Stephens

Apr 8, 2021, 7:11:37 AM
to OpenRefine
Thanks for this - I can see the issue now.

You are using `escape('url')` on the values in the cell, but in your overall expression you are adding characters (specifically spaces) that are not valid URL characters. Your browser automatically converts the spaces (and other characters that are not valid in a URL) to the "escaped" or URL-encoded versions, so a space becomes %20 (or +), etc.
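
To make the encoding concrete, here is a minimal sketch in Python, run outside OpenRefine with only the standard library's `urllib.parse`. It is an illustration of URL encoding in general, not of how OpenRefine implements `escape`. The sample query text comes from this thread; the variable names are only illustrative.

```python
from urllib.parse import quote, quote_plus

# The raw query text, as produced by the concatenation in this thread.
raw_query = 'Lucknow NOT (India OR "Uttar Pradesh")'

# quote() with safe="" percent-encodes every reserved character;
# a space becomes %20.
percent_encoded = quote(raw_query, safe="")

# quote_plus() is the form-style variant; a space becomes "+".
plus_encoded = quote_plus(raw_query)

print(percent_encoded)
print(plus_encoded)
```

Both forms replace the spaces, parentheses, and double quotes, which are the characters the expression adds around the cell values.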

So - you either need to escape/url-encode the whole of the query, or when you are adding in extra characters in your expression you need to use the escaped/encoded values.

So you should find the following works:

"https://openeditors.ooir.org/export-json.php?editor_query=" + (value +" " + "NOT" +" " +"(India" +" " +"OR" +" " + '"'+cells.admin_name.value +'"' +")").escape('url')

This URL-encodes the whole of the query string, rather than just the values from your cells.
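
The same idea can be sketched outside OpenRefine in Python: build the raw query first, encode it as a whole, and leave the base URL alone. This is only an illustration using the standard library; `build_url` is a hypothetical helper name, and the sketch does not contact the server.

```python
from urllib.parse import quote_plus

BASE = "https://openeditors.ooir.org/export-json.php?editor_query="

def build_url(city: str, admin_name: str) -> str:
    """Assemble the raw query first, then URL-encode it as a whole."""
    raw_query = f'{city} NOT (India OR "{admin_name}")'
    return BASE + quote_plus(raw_query)

url = build_url("Lucknow", "Uttar Pradesh")
print(url)

# Encoding the whole URL instead also escapes "://", "/" and "?",
# so the result is no longer a usable address.
over_encoded = quote_plus(BASE + "Lucknow")
print(over_encoded)
```

The second print shows why escaping the complete URL, as tried earlier in the thread, produces a string starting with https%3A%2F%2F that cannot be fetched.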

That works for me - I get the response:

[
  {
    "id": "422060",
    "publisher": "SAGE",
    "issn": "0376-9836",
    "journal": "Indian Historical Review",
    "role": "Advisory Committee",
    "editor": "K K Thapliyal",
    "affiliation": "Lucknow University, Lucknow",
    "date": "2020-12-14"
  },
  {
    "id": "436346",
    "publisher": "SAGE",
    "issn": "2631-8318",
    "journal": "Journal of Psychosexual Health",
    "role": "Column Editors",
    "editor": "Aleem Siddiqui",
    "affiliation": "Era's Lucknow Medical College and Hospital, Lucknow",
    "date": "2020-12-14"
  },
  {
    "id": "67963",
    "publisher": "Elsevier",
    "issn": null,
    "journal": "Indian Journal of Tuberculosis",
    "role": "Section Editors",
    "editor": "Dr. Rajendra Prasad",
    "affiliation": "King George's Medical University, Lucknow",
    "date": "2020-12-16"
  },
  {
    "id": "67971",
    "publisher": "Elsevier",
    "issn": null,
    "journal": "Indian Journal of Tuberculosis",
    "role": "National Advisers",
    "editor": "Dr. Surya Kant",
    "affiliation": "King George's Medical University Department of Pulmonary Medicine, Lucknow",
    "date": "2020-12-16"
  },
  {
    "id": "305935",
    "publisher": "IGI Global",
    "issn": "2334-4628",
    "journal": "International Journal of Corporate Finance and Accounting (IJCFA)",
    "role": "Editorial Review Board",
    "editor": "Sana Moid",
    "affiliation": "Amity University,  Lucknow Campus",
    "date": "2021-01-18"
  },
  {
    "id": "334760",
    "publisher": "Karger",
    "issn": "0257-2753",
    "journal": "Digestive Diseases",
    "role": "Associate Editors",
    "editor": "Uday C. Ghoshal",
    "affiliation": "Sanjay Gandhi Post Graduate Institute of Medical Sciences, Lucknow, IndiaClinical Pharmacology and Therapeutics",
    "date": "2021-01-28"
  },
  {
    "id": "392732",
    "publisher": "SAGE",
    "issn": "0971-8907",
    "journal": "Paradigm",
    "role": "Associate Editors",
    "editor": "Chandan Sharma",
    "affiliation": "Professor, Business Environment, Indian Institute of Management Lucknow",
    "date": "2020-12-14"
  },
  {
    "id": "403976",
    "publisher": "SAGE",
    "issn": "2632-3273",
    "journal": "The Traumaxilla",
    "role": "Assistant Editors",
    "editor": "Harmurti Singh",
    "affiliation": "Senior Lecturer, Department Of Oral & Maxillofacial Surgery, Career Post Graduate Institute Of Dental Sciences IIM Road, Ghailla, Lucknow",
    "date": "2020-12-14"
  },
  {
    "id": "403982",
    "publisher": "SAGE",
    "issn": "2632-3273",
    "journal": "The Traumaxilla",
    "role": "Editorial Advisory Board",
    "editor": "Abbas Ali Mahdi",
    "affiliation": "Vice Chancellor, Ers's University, Lucknow",
    "date": "2020-12-14"
  },
  {
    "id": "404014",
    "publisher": "SAGE",
    "issn": "2632-3273",
    "journal": "The Traumaxilla",
    "role": "Editorial Advisory Board",
    "editor": "Hemant Gupta",
    "affiliation": "Professor and Head, Professor and Head, BBD University, Lucknow",
    "date": "2020-12-14"
  },
  {
    "id": "412467",
    "publisher": "SAGE",
    "issn": "2472-7512",
    "journal": "Craniomaxillofacial Trauma & Reconstruction Open",
    "role": "Craniomaxillofacial Trauma",
    "editor": "Divya Mehrotra, MDS, FDS RCPS (Glasgow, London) FAMS, AO Fellow",
    "affiliation": "Oral & Maxillofacial Surgeon Vice Dean Faculty of Dental Sciences Additional Controller of examinations Faculty In-charge DHR MRU Lab King George's Medical University, Lucknow-226003",
    "date": "2020-12-14"
  },
  {
    "id": "435088",
    "publisher": "SAGE",
    "issn": "2516-600X",
    "journal": "Journal of Operations and Strategic Planning",
    "role": "Journal Editorial Board",
    "editor": "Sushil Kumar",
    "affiliation": "Professor, Operations Management Area, IIM Lucknow",
    "date": "2020-12-14"
  },
  {
    "id": "435097",
    "publisher": "SAGE",
    "issn": "2516-600X",
    "journal": "Journal of Operations and Strategic Planning",
    "role": "Journal Editorial Board",
    "editor": "Suresh Jhakar",
    "affiliation": "IIM Lucknow",
    "date": "2020-12-14"
  },
  {
    "id": "436349",
    "publisher": "SAGE",
    "issn": "2631-8318",
    "journal": "Journal of Psychosexual Health",
    "role": "Column Editors",
    "editor": "Adarsh Tripathi",
    "affiliation": "King George's Medical University, Lucknow",
    "date": "2020-12-14"
  },
  {
    "id": "455255",
    "publisher": "Brill",
    "issn": null,
    "journal": "Youth and Globalization",
    "role": "Editorial Board",
    "editor": "Vinod Chandra",
    "affiliation": "Lucknow University",
    "date": "2020-12-16"
  }
]

Parthasarathi Mukhopadhyay

Apr 8, 2021, 7:32:52 AM
to openr...@googlegroups.com
Yessss.... It's now working like a charm.

Thanks a ton Owen.

Great learning experience for me.

Best regards
