I am using the Python `web.py` framework to build a small web app.
It works as follows:
1. A `Home page` takes a URL as input
2. The app reads the `anchor text` and `anchor tags` from that page
3. It writes them to a CSV file and downloads it
Steps 2 and 3 happen when the `export the links` button is clicked. My code is below.
**code.py**
    import web
    from web import form
    import urlparse
    from urlparse import urlparse as ue
    import urllib2
    from BeautifulSoup import BeautifulSoup
    import csv
    from cStringIO import StringIO

    urls = (
        '/', 'index',
        '/export', 'export',
    )

    app = web.application(urls, globals())
    render = web.template.render('templates/')

    class index:
        def GET(self):
            return render.home()

    class export:
        def GET(self):
            i = web.input()
            if i.has_key('url') and i['url'] != '':
                url = i['url']
                page = urllib2.urlopen(url)
                html = page.read()
                page.close()
                decoded = ue(url).hostname
                if decoded.startswith('www.'):
                    decoded = ".".join(decoded.split('.')[1:])
                file_name = str(decoded.split('.')[0])
                csv_file = StringIO()
                csv_writer = csv.writer(csv_file)
                csv_writer.writerow(['Name', 'Link'])
                soup = BeautifulSoup(html)
                for anchor_tag in soup.findAll('a', href=True):
                    csv_writer.writerow([anchor_tag.text, anchor_tag['href']])
                web.header('Content-Type', 'text/csv')
                web.header('Content-disposition', 'attachment; filename=%s.csv' % file_name)
                return csv_file.getvalue()

    if __name__ == "__main__":
        app.run()
**home.html**:
    $def with()
    <html>
     <head>
      <title>Home Page</title>
     </head>
     <body>
      <form method="GET" action='/export'>
       <input type="text" name="url" maxlength="500" />
       <input class="button" type="submit" name="export the links" value="export the links" />
      </form>
     </body>
    </html>
The HTML above displays a form with a text box that takes a URL and an `export the links` button that `downloads/exports` a CSV file containing the anchor tag links and their text.
1. For example, when I submit `http://www.google.co.in` and click `export the links`, all the anchor URLs and anchor text are saved into a CSV file, which downloads successfully.
2. But when I then immediately submit another URL such as `http://stackoveflow.com` and click the `export the links` button, the CSV file (named after the domain of the URL, as shown in the code above) downloads with that page's links, but it also contains the data (anchor text and links) of the previous URL, `http://www.google.co.in`.
That is, data from different URLs ends up in the same CSV file. Can anyone please tell me what is wrong in the code above (the `export` class) that generates the CSV file? Why is the earlier data carried over, instead of a new CSV file being created with its own dynamically generated name?
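To make the expected behaviour concrete, here is a minimal standalone sketch of steps 2 and 3. It is not my web.py handler: it assumes Python 3 and uses only the standard library (`html.parser` in place of BeautifulSoup), and `AnchorCollector` and `links_to_csv` are hypothetical names introduced only for illustration. Each call builds its own buffer, so the output for one page should never contain rows from another.

```python
import csv
from html.parser import HTMLParser
from io import StringIO


class AnchorCollector(HTMLParser):
    """Hypothetical helper: collects (text, href) for every <a href=...>."""

    def __init__(self):
        super().__init__()
        self.links = []      # finished (text, href) pairs
        self._href = None    # href of the <a> currently open, if any
        self._text = []      # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        href = dict(attrs).get("href")
        if tag == "a" and href is not None:
            self._href = href
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append(("".join(self._text).strip(), self._href))
            self._href = None


def links_to_csv(html):
    """Return CSV text for the anchors in `html`, using a fresh buffer per call."""
    collector = AnchorCollector()
    collector.feed(html)
    collector.close()
    buffer = StringIO(newline="")
    writer = csv.writer(buffer)
    writer.writerow(["Name", "Link"])
    writer.writerows(collector.links)
    return buffer.getvalue()


# Two separate "requests": the second result holds no rows from the first.
first = links_to_csv('<a href="http://a.example/">A</a>')
second = links_to_csv('<a href="http://b.example/">B</a>')
```

This is the result I expect from two consecutive exports: `second` contains only the header row and the link to `b.example`.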
Finally, my intention is that each time a new URL is given, a new CSV file is downloaded/exported, named after the URL's domain (sliced as in my code above) and containing only the data (anchor tag text and link) from that URL.
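The file naming I have in mind can be sketched on its own as follows. This mirrors the hostname slicing in my `export` class, but it assumes Python 3 (`urllib.parse` instead of the `urlparse` module), and `csv_name_for` is a hypothetical name used only for this illustration.

```python
from urllib.parse import urlparse


def csv_name_for(url):
    """Drop a leading 'www.' from the host and keep its first label."""
    host = urlparse(url).hostname or ""
    if host.startswith("www."):
        host = host[len("www."):]
    return host.split(".")[0] + ".csv"


print(csv_name_for("http://www.google.co.in"))  # google.csv
print(csv_name_for("http://stackoveflow.com"))  # stackoveflow.csv
```

So each URL should map to its own file name, and I expect each download to carry only that URL's rows.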
Can anyone please extend or change my code above so that an individual CSV file is downloaded for each individual URL?