Extracting a value after a find_all()


jehoshua

Nov 30, 2024, 1:18:36 AM
to beautifulsoup
Data ..

 <TRANSACTION id="T000000000000014727" entrydate="2024-04-20" memo="" commodity="AUD" postdate="2024-04-20">
   <SPLITS>
    <SPLIT id="S0001" number="" reconciledate="" price="1/1" payee="P000035" account="A000831" value="-1499/50" memo="AAA Duracell - 48 pack" action="" bankid="" reconcileflag="2" shares="-1499/50"/>
    <SPLIT id="S0002" number="" reconciledate="" price="1/1" payee="P000035" account="A000620" value="1499/50" memo="AAA Duracell - 48 pack" action="" bankid="" reconcileflag="0" shares="1499/50"/>
   </SPLITS>
  </TRANSACTION>

Code

#!/usr/bin/python

import bs4
from bs4 import BeautifulSoup

with open('testxml.xml', 'r') as f:
    file = f.read()

soup = BeautifulSoup(file, 'xml')

# only transactions with account A000831 or A000832
for tag in soup.find_all("SPLIT", account=["A000831","A000832"]):
    print("Anchor: ", tag)

print('finished')

OUTPUT

Anchor:  <SPLIT account="A000831" action="" bankid="" id="S0001" memo="AAA Duracell - 48 pack" number="" payee="P000035" price="1/1" reconciledate="" reconcileflag="2" shares="-1499/50" value="-1499/50"/>

How do I extract the "value"? Is it a tag within a tag? Once I can extract the "-1499/50", it is just a matter of converting it to a number and doing the math: -1499/50 = -29.98.

Isaac Muse

Nov 30, 2024, 12:46:30 PM
to beautifulsoup

See example below:

import bs4
from bs4 import BeautifulSoup

XML = """
<TRANSACTION id="T000000000000014727" entrydate="2024-04-20" memo="" commodity="AUD" postdate="2024-04-20">
  <SPLITS>
    <SPLIT id="S0001" number="" reconciledate="" price="1/1" payee="P000035" account="A000831" value="-1499/50" memo="AAA Duracell - 48 pack" action="" bankid="" reconcileflag="2" shares="-1499/50"/>
    <SPLIT id="S0002" number="" reconciledate="" price="1/1" payee="P000035" account="A000620" value="1499/50" memo="AAA Duracell - 48 pack" action="" bankid="" reconcileflag="0" shares="1499/50"/>
  </SPLITS>
</TRANSACTION>
"""

soup = BeautifulSoup(XML, 'xml')

# only transactions with account A000831 or A000832
for tag in soup.find_all("SPLIT", account=["A000831","A000832"]):
    print("Anchor: ", tag)
    print("Value:", tag['value'])

print('finished')
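For the follow-on step mentioned in the question (turning "-1499/50" into a number), one possible approach is Python's standard fractions module, which accepts "numerator/denominator" strings directly. This is only a sketch: the helper name parse_split_value is made up for illustration, and it assumes every value attribute is a non-empty string in that form.

```python
from fractions import Fraction


def parse_split_value(raw: str) -> Fraction:
    """Convert a 'numerator/denominator' string such as '-1499/50' to an exact Fraction.

    Hypothetical helper, not part of Beautiful Soup. Fraction() raises
    ValueError for an empty or malformed string.
    """
    return Fraction(raw.strip())


# The string below is what tag['value'] returns for the first SPLIT in the thread.
amount = parse_split_value("-1499/50")
print(amount)         # exact rational value
print(float(amount))  # -29.98
```

Keeping the amount as a Fraction until the final output avoids accumulating floating-point rounding when many splits are summed.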

jehoshua

Nov 30, 2024, 5:22:42 PM
to beautifulsoup
Great, that works fine. Thank you Isaac  :)

jehoshua

Nov 30, 2024, 6:56:15 PM
to beautifulsoup
Is there any difference between this ..

print("Value:", tag['value'])

and this ?

print("Value:", tag.get('value'))

Isaac Muse

Nov 30, 2024, 7:59:49 PM
to beautifulsoup

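tag['value'] raises a KeyError if the attribute is missing. tag.get('value') returns None instead, and .get() is also good for returning a default in case the attribute is not found: tag.get('attr', 'some-default').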
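A small sketch of the difference, under some assumptions: it uses the built-in 'html.parser' rather than 'xml' so that it runs without lxml installed (that parser lowercases tag names, hence "split"), and the attribute name "missing" and the default "0/1" are made up for illustration.

```python
from bs4 import BeautifulSoup

# A single element with a "value" attribute and no "missing" attribute.
soup = BeautifulSoup('<split id="S0001" value="-1499/50"></split>', "html.parser")
tag = soup.find("split")

present = tag["value"]                    # attribute exists: both forms return the string
also_present = tag.get("value")

absent = tag.get("missing")               # no such attribute: .get() returns None
with_default = tag.get("missing", "0/1")  # or the default you pass in

try:
    tag["missing"]                        # subscript access raises KeyError instead
    raised = False
except KeyError:
    raised = True

print(present, also_present, absent, with_default, raised)
```

Subscript access suits attributes that must be present, since a missing one fails loudly. .get() suits optional attributes.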

jehoshua

Nov 30, 2024, 8:07:44 PM
to beautifulsoup
Okay, thank you  :)