Problem in Extracting and Saving


singh...@eisti.eu

May 22, 2018, 9:48:32 AM
to beautifulsoup
I am trying to extract information from an HTML page. Most of the data is in HTML tables, but no id or class is specified for any element. I am pasting some of the data here. Please let me know how to extract it and then save it into an sqlite3 database. Please help me.
Here is some of the data:

<td width="145"><img height="81" src="gouvernement.gif" width="143"/></td>,
 <td bgcolor="#6699cc"><center><h1><font color="#ffffff">ASTA</font></h1></center></td>,
 <td width="64">
 <a href="tapes_de_lst_pdt.jsp?sel=_"><img border="0" height="34" src="flag_fr.gif" width="64"/></a></td>,
 <td bgcolor="#6699cc" valign="top" width="145"><br/><br/><hr/>
 <b><a href="tapes_fr_mnu_pdt.htm" style="text-decoration:none">
  A. Listes<br/>       des produits</a></b><br/><hr/>
 <b><a href="tapes_fr_mnu_mlt.jsp" style="text-decoration:none">
  B. Recherche<br/>       multi-critères</a></b><br/><hr/>
 <b><a href="tapes_fr_lst_lnk.jsp" style="text-decoration:none"> C. Liens et<br/>       documents</a></b><br/><hr/>
 <center><b><a href="http://www.asta.etat.lu" style="text-decoration:none" target="_blank">A S T A</a></b>
 <br/><hr/></center></td>,
 <td width="10"></td>,
 <td><center>
 <br/><br/><form action="tapes_fr_lst_pdt.jsp">
 <input name="sel" type="hidden" value="_"/>
 <input name="format" type="hidden" value="p"/>
 <input type="submit" value="Format Impression"/></form>
 <h2>Liste de tous les produits phytopharmaceutiques</h2>
 </center>
 Champs marquées par *: propriétés du produit ont changé les derniers 6 mois.<br/><br/>
 <table border="1" cellpadding="5" cellspacing="0">
 <tr>
 <td bgcolor="#6699cc" width="20%">Nom commercial</td>
 <td bgcolor="#6699cc">No. et fin agrément</td>
 <td bgcolor="#6699cc">Forme</td>
 <td bgcolor="#6699cc">Détenteur de l'agrément</td>
 <td bgcolor="#6699cc">Substance active</td>
 <td bgcolor="#6699cc">Teneur</td>
 <td bgcolor="#6699cc">Code abeilles</td>
 <td bgcolor="#6699cc">Fiches signalétiques</td></tr>
 <tr><td><b>ABCDE-prothio</b><br/>(importation parallèle)</td>
 <td>
1836-51 <br/>31.12.2018</td> <td>WG</td><td> DuPont de Nemours (Belgium) B.V.B.A</td> <td valign="top">Nicosulfuron</td> <td valign="top">75,0000 %</td> <td> B4 </td> <td> <a href="tapes_fr_nfo_tox.jsp?pdt=1836" onclick="window.open(this.href,'tox','scrollbars=yes,resizable=yes,width=640,height=720').focus();return false;" target="_blank"> [Mentions]</a>   <a href="tapes_fr_nfo_lap.jsp?pdt=1836&amp;lmz=0" onclick="window.open(this.href,'lstapp','scrollbars=yes,resizable=yes,width=640,height=720').focus();return false;" target="_blank"> [Appli.]</a></td></tr> <tr><td><b>Acrobat Extra WG</b></td> <td> 1445-42 <br/>31.01.2019</td> <td>WG</td><td> BASF Belgium Coordination Center Comm. V</td> <td valign="top">Diméthomorphe<br/>Mancozèbe</td> <td valign="top">75,0000 g/kg<br/>667,0000 g/kg</td> <td> B4 </td> <td> <a href="tapes_fr_nfo_tox.jsp?pdt=1445" onclick="window.open(this.href,'tox','scrollbars=yes,resizable=yes,width=640,height=720').focus();return false;" target="_blank"> [Mentions]</a>   <a href="tapes_fr_nfo_lap.jsp?pdt=1445&amp;lmz=0" onclick="window.open(this.href,'lstapp','scrollbars=yes,resizable=yes,width=640,height=720').focus();return false;" target="_blank"> [Appli.]</a></td></tr> <tr><td><b>Actirob B</b></td> <td>
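
One possible approach, offered as a minimal sketch rather than a definitive answer: since no element has an id or class, the product table can be located by the text of its first header cell ("Nom commercial"), and each later row read cell by cell. The function names `extract_rows` and `save_rows`, the table name `products`, its column names, and the " | " separator used for values split by `<br/>` are all assumptions made for illustration. The sketch also assumes every complete product row has exactly 8 cells, as in the pasted header, and skips anything else (such as the truncated last row).

```python
import sqlite3
from bs4 import BeautifulSoup

HEADER_TEXT = "Nom commercial"  # text of the first header cell in the pasted data
EXPECTED_CELLS = 8              # assumption: one cell per header column


def cell_text(td):
    # Values separated by <br/> (e.g. two active substances) are joined with " | ".
    return " | ".join(td.stripped_strings)


def extract_rows(html):
    """Return the product rows as lists of 8 strings."""
    soup = BeautifulSoup(html, "html.parser")

    # No id/class is available, so find the header cell by its text,
    # then walk up to the table that contains it.
    table = None
    for td in soup.find_all("td"):
        if td.get_text(strip=True) == HEADER_TEXT:
            table = td.find_parent("table")
            break
    if table is None:
        return []

    rows = []
    for tr in table.find_all("tr")[1:]:  # skip the header row
        cells = [cell_text(td) for td in tr.find_all("td", recursive=False)]
        if len(cells) == EXPECTED_CELLS:  # ignore incomplete rows
            rows.append(cells)
    return rows


def save_rows(conn, rows):
    """Insert the rows into a (hypothetical) 'products' table."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "CREATE TABLE IF NOT EXISTS products ("
            "name TEXT, approval TEXT, form TEXT, holder TEXT, "
            "substance TEXT, content TEXT, bee_code TEXT, links TEXT)"
        )
        conn.executemany(
            "INSERT INTO products VALUES (?, ?, ?, ?, ?, ?, ?, ?)", rows
        )
```

To use it, read the page into a string, call `rows = extract_rows(html)`, open a database with `conn = sqlite3.connect("products.db")` (the file name is only an example), and call `save_rows(conn, rows)`. If the approval number and expiry date are needed separately, the second column can be split on " | " before inserting.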