Python lxml not working properly when I compare two HTML tables

Saran Raj

Apr 21, 2018, 10:20:06 AM
to Python Challenge
We have two HTML files containing tables whose rows and columns may be the same or may differ.

We need to find the differences between the two HTML tables and display them with <del> or <ins> tags inside the existing table.

This is my code snippet:

    table1

    <table class="table table-bordered"><tbody>
    <tr><td>S.no</td><td>Abbrevation</td><td>Explanation</td></tr>
    <tr><td>1</td><td>GxP</td><td><p>Good 'x' Practice</p></td></tr>
    <tr><td>2</td><td>ERP</td><td>Enterprise</td></tr>
    </tbody></table>

    table2

    <table class="table table-bordered"><tbody>
    <tr><td>S.no</td><td>term</td><td>Abbrevation</td><td>Explanation</td></tr>
    
    <tr><td>1</td><td>1</td>  <td>GxP</td><td><p>Good 'x' Practice</p></td></tr>
    <tr><td>2</td><td>1</td><td>ERP</td><td>Enterprise</td></tr>
    <tr><td>2</td><td>1</td><td>ERP</td><td>Enterprise</td></tr></tbody></table>
    
    from lxml.html.diff import htmldiff
    
    doc1='''<table class="table table-bordered"><tbody>
    <tr><td>S.no</td><td>Abbrevation</td><td>Explanation</td></tr>
    <tr><td>1</td><td>GxP</td><td><p>Good 'x' Practice</p></td></tr>
    <tr><td>2</td><td>ERP</td><td>Enterprise</td></tr>
    </tbody></table>'''
    
    doc2='''<table class="table table-bordered"><tbody>
    <tr><td>S.no</td><td>term</td><td>Abbrevation</td><td>Explanation</td></tr>
    <tr><td>1</td><td>1</td>  <td>GxP</td><td><p>Good 'x' Practice</p></td></tr>
    <tr><td>2</td><td>1</td><td>ERP</td><td>Enterprise</td></tr>
    <tr><td>2</td><td>1</td><td>ERP</td><td>Enterprise</td></tr></tbody></table>'''
    
    # htmldiff(old_html, new_html): doc2 is passed as the old version here
    print(htmldiff(doc2, doc1))
    
    Result of the diff:
    <table class="table table-bordered"><tbody> <tr><td><ins>S.no</ins></td><td><ins>Abbrevation</ins></td><td><ins>Explanation</ins></td></tr> <tr><td><del>S.no</del></td><td><del>term</del></td><td><del>Abbrevation</del></td><td><del>Explanation</del></td></tr><del> </del><td><del>1</del></td> <tr><td>1</td><td>GxP</td><td><p>Good 'x' Practice</p></td></tr> <tr><td>2</td> <td><ins>ERP</ins></td><td><ins>Enterprise</ins></td> <td><del>1</del></td><td><del>ERP</del></td><td><del>Enterprise</del></td><tr><td><del>2</del></td><td><del>1</del></td><td><del>ERP</del></td><td><del>Enterprise</del></td></tr> </tr> </tbody></table>

Why does Python give a result that looks like two HTML tables instead of one single table?
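
For comparison, here is a minimal sketch of a table-aware diff that always emits one table. It is not lxml's method and not a fix for `htmldiff`, which appears to compare a stream of words and tags rather than rows and cells. The sketch uses only the standard library (`html.parser` and `difflib`); the names `TableRows`, `table_rows`, `diff_cells`, and `diff_tables` are hypothetical helpers made up for illustration. It assumes one flat table per document, compares cell text only, and drops nested markup such as `<p>` and attributes such as `class`.

```python
from difflib import SequenceMatcher
from html import escape
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Collect the text of each <td>/<th>, grouped by <tr> (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag in ("td", "th") and self.rows:
            self.rows[-1].append("")
            self._in_cell = True

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._in_cell = False

    def handle_data(self, data):
        # Text outside a cell (e.g. whitespace between tags) is ignored.
        if self._in_cell:
            self.rows[-1][-1] += data


def table_rows(html):
    """Return the table as a list of rows, each a tuple of stripped cell texts."""
    parser = TableRows()
    parser.feed(html)
    parser.close()
    return [tuple(cell.strip() for cell in row) for row in parser.rows]


def cell(text, tag=None):
    """Render one <td>, optionally wrapping its text in <ins> or <del>."""
    text = escape(text, quote=False)
    return f"<td><{tag}>{text}</{tag}></td>" if tag else f"<td>{text}</td>"


def diff_cells(old_row, new_row):
    """Diff two rows cell by cell and render them as a single <tr>."""
    cells = []
    matcher = SequenceMatcher(a=old_row, b=new_row, autojunk=False)
    for op, i1, i2, j1, j2 in matcher.get_opcodes():
        if op == "equal":
            cells += [cell(c) for c in old_row[i1:i2]]
        else:
            cells += [cell(c, "del") for c in old_row[i1:i2]]
            cells += [cell(c, "ins") for c in new_row[j1:j2]]
    return "<tr>" + "".join(cells) + "</tr>"


def diff_tables(old_html, new_html):
    """Align rows first, then diff the cells of rows that were paired up."""
    old, new = table_rows(old_html), table_rows(new_html)
    out = ["<table><tbody>"]
    matcher = SequenceMatcher(a=old, b=new, autojunk=False)
    for op, i1, i2, j1, j2 in matcher.get_opcodes():
        old_part, new_part = old[i1:i2], new[j1:j2]
        paired = min(len(old_part), len(new_part))
        for old_row, new_row in zip(old_part, new_part):
            out.append(diff_cells(old_row, new_row))
        # Rows with no counterpart are marked as fully deleted or inserted.
        out += [diff_cells(row, ()) for row in old_part[paired:]]
        out += [diff_cells((), row) for row in new_part[paired:]]
    out.append("</tbody></table>")
    return "\n".join(out)
```

Calling `print(diff_tables(doc1, doc2))` with the two strings from the post (old version first) should produce a single `<table>` in which each changed cell is wrapped in `<ins>` or `<del>`. Pairing changed rows by position is a simplification: when rows are inserted in the middle of a block of changed rows, the pairing can be off by one.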