Parse HTML to edit links

Shubham Goyal 7/22/13 4:54 AM
Hey,

I am trying to parse demo.html and make all of its relative links absolute. This is what I have in script.py:

from urllib.parse import urljoin
from bs4 import BeautifulSoup

BASE_URL = "http://www.esplanade.com.sg"

# Read the page to be rewritten.
with open('demo.html', 'r') as f:
    html_text = f.read()

# No parser is named here, so BeautifulSoup picks the best one installed.
soup = BeautifulSoup(html_text)

# Tags that carry a URL, and the attribute that holds it.
for tag_name, attr in (('a', 'href'), ('link', 'href'), ('script', 'src')):
    for tag in soup.find_all(tag_name):
        if tag.has_attr(attr):
            # urljoin leaves already-absolute URLs untouched.
            tag[attr] = urljoin(BASE_URL, tag[attr])

# prettify() returns text, so write it through a UTF-8 text file.
with open("demo_result.html", "w", encoding="utf-8") as f:
    f.write(soup.prettify())
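
For reference, here is a minimal sketch of how the standard library's urljoin resolves different kinds of link against the site root, assuming Python 3. The helper name absolutize, the trailing slash on the base URL, and every sample link other than /scripts/ddtabmenu.js and http://www.dynamicdrive.com/ are made up for illustration.

```python
from urllib.parse import urljoin

# Site root taken from the script above; the trailing slash is an assumption.
BASE_URL = "http://www.esplanade.com.sg/"


def absolutize(link, base=BASE_URL):
    """Hypothetical helper: resolve link against base via urljoin."""
    return urljoin(base, link)


if __name__ == "__main__":
    samples = [
        "/scripts/ddtabmenu.js",         # root-relative, from demo.html
        "images/logo.png",               # made-up path-relative link
        "http://www.dynamicdrive.com/",  # already absolute
        "mailto:someone@example.com",    # made-up non-HTTP link
    ]
    for link in samples:
        print(link, "->", absolutize(link))
```

Unlike prefixing the base URL onto every value, this leaves links that already carry a scheme untouched.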

However, the output file demo_result.html contains many unexpected changes. For example,

    <script type="text/javascript" src="/scripts/ddtabmenu.js" />  
    /***********************************************
    * DD Tab Menu script- (c) Dynamic Drive DHTML code library (www.dynamicdrive.com)
* + Drop Down/ Overlapping Content- 
    * This notice MUST stay intact for legal use
    * Visit Dynamic Drive at http://www.dynamicdrive.com/ for full source code
    ***********************************************/   
    </script>

changes to

  <script src="http://www.esplanade.com.sg/scripts/ddtabmenu.js" type="text/javascript">
  </script>
 </head>
 <body>
  <p>
   /***********************************************
    * DD Tab Menu script- (c) Dynamic Drive DHTML code library (www.dynamicdrive.com)
* + Drop Down/ Overlapping Content- 
    * This notice MUST stay intact for legal use
    * Visit Dynamic Drive at http://www.dynamicdrive.com/ for full source code
    ***********************************************/

Could someone please tell me where I am going wrong?

Thanks and warmest regards.