portia-scraper
Welcome! This is the discussion group of the Portia open-source visual web scraper. Source code and issues are managed through the GitHub project. Questions (about development or usage) should be asked here, not on GitHub.
Aarya Disuja
9/24/18
Are you looking for best data scraping services provider company? Hire us now!!
Our experts offer a wide range of useful data according to your needs. We will manage all the
ayushman...@gmail.com
3/10/18
Portia generates incorrect Scrapy spider for domains beginning with numbers
Hello mentors and developers, I am Ayushman Koul, a student at GCET, Jammu. I went through the Portia
Олег Сериков
2/25/18
GSoC 2018, page clustering questions
Hello, my name is Oleg and I hope to participate in GSoC. I have a few questions about the "
Divyadeep Singh
2/14/18
Contributing to portia in GSOC 2018
Hi everyone! My name is Divyadeep Singh and I am a 3rd year (Information Technology) student at HBTU,
Nikhil Ranjan
2/12/18
Contributing for GSoC 2018
Dear Mentors, I am Nikhil Ranjan, a third year student majoring in Computer Science and Biological
vir...@gmail.com
2/11/18
Contributing to Portia in GSoC 2018
Hello Mentors and Developers, I am Viral Mehta, a student at BITS Pilani, Hyderabad Campus. I went
Viren Parmar
1/26/18
I want to be a contributor or Mentor
Hello, I want to be a Contributor or Mentor for your organization, so please help me with this. so I
Henry John
4/13/17
errors when running portia spiders
When I click the “Run this spider” icon in the web UI, I get the error message: 2017-04-13 20:16:46+0800 [
Henry John
4/13/17
Why does Portia force the use of scrapyd to deploy and run spiders?
Hi, scrapyd is not a good deployment environment to use, but the Portia web UI will use scrapyd while we
Henry John
4/13/17
How to set spider name?
The manual says: portiacrawl PROJECT_PATH SPIDER_NAME. That command can activate a spider to fetch pages
Henry John
4/13/17
How to scrape different annotations into the same field in a sample?
Hi all, this is the HTML fragment <div class="d-title">功能概述 <span class="d-
Sourabh Majumdar
3/11/17
Increasing the Crawling Performance
I am also interested in contributing to the project labeled "Increasing Crawling Performance
Dierk Pfeiffer, madhusai ravada (3)
3/1/17
403
On Saturday, May 2, 2015 at 1:24:33 PM UTC+5:30, Dierk Pfeiffer wrote: Hi, it seems that some pages
madhusai ravada
3/1/17
Contributing to Portia spider generation, GSoC 2017
I am interested in the idea of making new spiders whenever the layout of a website is modified. I have
Mahmoud Mohammadi
2/28/17
GSoC 2017: Increase Crawling Performance through page clustering
Dear Mentors, I am Mahmoud, a computer science grad student from the US. I am planning to contribute in
张龙
2/28/17
Contributing for GSOC 2017 : Portia Spider Generation
Hi everyone, I'm a sophomore studying computer science in Chengdu and a newbie to GSoC as well. I
Mit Pandya
2/10/17
Increase Crawling Performance through page clustering
Hi, I am enthusiastically looking forward to contributing to Google Summer of Code 2017 and I am
Satwik Kansal
2/7/17
Contributing for GSOC
Hi everyone, I'm a Python enthusiast from New Delhi, India. I'm interested in contributing to
Paul Tremberth (2)
3/24/16
Portia and Google Summer of Code 2016
Hello all, Deadline for submitting your final GSoC 2016 proposals is tomorrow Friday March 25, 19:00
anurag sharma
3/11/16
Does Portia support clicking on variants (multiple in my case), or is it a static page extraction tool?
Hello All, I am new to Portia, so kindly pardon me if this question sounds stupid. I am trying to
Timo Cordes
2/26/16
Portiacrawl delivers no information
Hi all, first the important thing: I searched for a long time for a scraper that works for me (
Rachita Chhaparia, …, Akash Goel (6)
2/14/16
Using Variants to extract multiple items from the same page
As far as I can see, the listings are under a <div> with class="listing", and each
Shivam Malhotra, David Bengoa Rocandio (2)
12/17/15
Regarding json template stored for annotated page
Hi Shivam, You are correct, the annotated HTML is parsing and matching against the scraped HTML is
Ruairi Fahy
8/14/15
JavaScript Support in Portia
JavaScript support has recently been made available in the Portia repository. Find out more about how
Prabhakar D
6/5/15
How does a Portia spider extract data from all related URLs during deployment?
In Portia we annotate a webpage, but while deploying/running the Portia spider, how the related
Prabhakar D
5/29/15
Link types to be followed in scrapyd from Portia spider
What are the link types to be followed by slybot? I mean links ending with .html, .xml, .jpg, .png, .
Prabhakar D
5/21/15
Load HTML content in a variable to Portia
I have a Python script in which HTML content is stored in a variable, e.g. myhtml =
Prabhakar D, Ruairi Fahy (3)
5/11/15
Add files to load in Portia [Feature]
Thanks for your valuable reply. I am interested in creating middleware for this. Where would I be able to
jamesjosh, Prabhakar D (2)
5/7/15
Please let me know which tools are better for crawling and scraping AJAX, JavaScript, PDF and Word .doc files?
Hello James, currently there is no support for crawling AJAX, JavaScript, PDF and Word .doc files in Portia. On
Prayash Mohapatra, Ruairi Fahy (4)
3/24/15
Query for gsoc15 - Browser Addon for Portia
The main codebase should remain consistent with the existing Portia UI. The current system using an