Hi Dave,

We have a use case where we need to get only the filenames from around 1000 Gerrit repositories at a time. We tried the Gerrit REST API, but it only returns the files modified, added, or deleted in a change, not all the filenames in a repository. So we thought of scraping the Gitiles web page to obtain the filenames, but when we use Python's requests.get with authentication, the HTTP response is an HTML page containing <h1>Cannot Parse as Gitiles URL</h1>. Can you please help me resolve this and get the page source?

Thanks and Regards,
Anurag Aravala
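For reference, a minimal sketch of the kind of request being attempted. The host, project name, and credentials here are placeholders, not our real values; the ?format=JSON parameter is a Gitiles feature that returns the tree listing as JSON (prefixed with ")]}'" ) instead of HTML, which may be easier to consume than scraping:

```python
from urllib.parse import quote

def gitiles_tree_url(base, project, branch="refs/heads/master"):
    """Build a Gitiles tree URL for one branch of one project.

    Spaces and other special characters in the project name are
    percent-encoded, but slashes are kept, since nested project
    names appear with literal slashes in Gitiles URLs.
    """
    return f"{base}/plugins/gitiles/{quote(project, safe='/')}/+/{branch}"

def fetch_tree(url, user, password):
    # requests is imported lazily so the URL helper above stays stdlib-only.
    import requests
    # Asking Gitiles for JSON; strip the ")]}'" prefix before json.loads().
    resp = requests.get(url, params={"format": "JSON"}, auth=(user, password))
    resp.raise_for_status()
    return resp.text

# Hypothetical host and project, for illustration only.
url = gitiles_tree_url("https://ec-gerrit.example.com", "my project")
```

Whether basic auth is enough here depends on how the server is set up; with SAML in front of Gerrit, the credentials may need to be an HTTP password generated in the Gerrit user settings rather than the SSO password.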
Thanks for the information. The Gitiles URL I'm using is "https://ec-gerrit.<company>.com/plugins/gitiles/<project name>/+/refs/heads/master". This URL uses SAML for authentication and serves the web page listing the files under a branch of a Gerrit project. Up to the point of listing the branches of a project, it is a plain Gerrit URL with no "gitiles" in the path; after clicking on a particular branch, the files are listed and the URL takes the format above.

I'm using the Python requests library to fetch the HTML. I have attached the web page (an HTML file I created from the response; if I open the same URL in a browser, it works fine), which is what I get back from requests.get for the URL above. You mentioned the possible issues in your email, but I'm sending this mail to describe the error in detail, with the URL, so that you can help me diagnose the problem. Please share any information about possible issues when requesting the page source. I already posted this on repo-discuss and got no reply to the scraping question; people suggested cloning the repos to get the filenames, which is impossible in our case. Please help me.

Thanks & Regards,
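A small diagnostic may make the failure mode easier to see: if the SAML login never happens inside requests, the server typically either redirects the client (visible in resp.history) or serves an error page such as the one attached. This is only a sketch; the URL and credentials passed to diagnose() would be placeholders for the real ones:

```python
GITILES_ERROR = "Cannot Parse as Gitiles URL"

def looks_like_gitiles_error(html: str) -> bool:
    """True when the response body is the Gitiles 'Cannot Parse' error page."""
    return GITILES_ERROR in html

def diagnose(url, auth):
    # requests is imported lazily so the checker above stays stdlib-only.
    import requests
    # A Session keeps any cookies set during the SAML flow across redirects.
    with requests.Session() as s:
        resp = s.get(url, auth=auth, allow_redirects=True)
        # Redirects (e.g. to a SAML identity provider) show up in resp.history.
        for hop in resp.history:
            print("redirected:", hop.status_code, hop.headers.get("Location"))
        print("final URL:", resp.url, "status:", resp.status_code)
        if looks_like_gitiles_error(resp.text):
            print("Gitiles could not parse the requested path.")
        return resp
```

Comparing the final URL against the one requested would show whether the client ended up on the identity provider's page instead of the Gitiles tree view.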
Anurag Aravala