Web scraping of assets in Morningstar


Estefanía Chávez

Apr 3, 2024, 5:42:08 PM
to beautifulsoup
Hi! My task is to collect specific indicators such as Price/Earnings, Price/Book, Yield-to-Maturity, Effective Duration and credit Grades (in the case of assets with a certain percentage of fixed income). I've been working with the first form below, but it didn't allow me to extract the indicators mentioned above.

Form 1
# Install first (run in a shell, not inside Python):
# pip install beautifulsoup4 lxml html5lib
import requests
from bs4 import BeautifulSoup
import time
import json

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from contextlib import contextmanager
import sys, os
from functools import reduce
from datetime import datetime
import math

headers = {
    'authority' : 'tools.morningstar.co.uk',
    'accept':'application/json,text/plain,*/*',
    'accept-language':'es-ES,es;q=0.9',
    'origin': 'https://www.morningstar.es',
    'sec-ch-ua': '"Google Chrome";v="df107","Chromium";v="107","Not=A?Brand";v="24"',
    'sec-ch-ua-mobile':'?0',
    'sec-ch-ua-platform': '"Windows"',
    'sec-fetch-dest':'empty',
    'sec-fetch-mode':'cors',
    'sec-fetch-site':'cross-site'
}

Id = ['FOESP$$ALL', 'ETEXG$XMAD|ETEUR$$ALL', 'IXMSX$$ALL', 'E0WWE$$ALL']
cat = ['Fondos', 'ETF', 'INDICES', 'ACCIONES']

securityDataPoints = [
    'SecId|Name|PriceCurrency|TenforeId|LegalName|ClosePrice|Yield_M12|CategoryName|Medalist_RatingNumber|StarRatingM255|SustainabilityRank|ReturnD1|ReturnW1|ReturnM1|ReturnM3|ReturnM6|ReturnM0|ReturnM12|ReturnM36|ReturnM60|ReturnM120|FeeLevel|ManagerTenure|MaxDeferredLoad|InitialPurchase|FundTNAV|EquityStyleBox|BondStyleBox|AverageMarketCapital|AverageCreditQualityCode|EffectiveDuration|MorningstarRiskM255|AlphaM36|BetaM36|R2M36|StandardDeviationM36|SharpeM36|TrackRecordExtension',
    'SecId|Name|PriceCurrency|TenforeId|LegalName|ClosePrice|Yield_M12|OngoingCharge|CategoryName|Medalist_RatingNumber|StarRatingM255|SustainabilityRank|GBRReturnD1|GBRReturnW1|GBRReturnM1|GBRReturnM3|GBRReturnM6|GBRReturnM0|GBRReturnM12|GBRReturnM36|GBRReturnM60|GBRReturnM120|MaxFrontEndLoad|ManagerTenure|MaxDeferredLoad|InitialPurchase|FundTNAV|EquityStyleBox|BondStyleBox|AverageMarketCapital|AverageCreditQualityCode|EffectiveDuration|MorningstarRiskM255|AlphaM36|BetaM36|R2M36|StandardDeviationM36|SharpeM36|TrackRecordExtension',
    'SecId|Name|PriceCurrency|TenforeId|LegalName|HoldingTypeId|ClosePrice|GBRReturnD1|GBRReturnW1|GBRReturnM1|GBRReturnM3|GBRReturnM6|GBRReturnM0|GBRReturnM12|GBRReturnM36|GBRReturnM60|GBRReturnM120|TrackRecordExtension',
    'SecId|LegalName|Name|IndustryName|SectorName|TenforeId|Universe|ExchangeId|Ticker|ClosePrice|StarRatingM255|QuantitativeStarRating|MarketCap|DividendYield|PERatio|PEGRatio|MarketCountryName|EquityStyleBox|ReturnD1|ReturnW1|ReturnM1|ReturnM3|ReturnM6|ReturnM0|ReturnM12|ReturnM36|ReturnM60|ReturnM120|EBTMarginYear1|ROEYear1|ROICYear1|EPSGrowth3YYear1|RevenueGrowth3Y|DebtEquityRatio|NetMargin|ROATTM|ROETTM'
]

dfp = pd.DataFrame()

for i in range(4):
    time.sleep(2)
    params = {
        'page': '1',
        'pageSize': '10000',
        'sortOrder': 'LegalName asc',
        'outputType': 'json',
        'version': '1',
        'languageId': 'es-ES',
        'currencyId': 'EUR',
        'universeId': Id[i],
        'securityDataPoints': securityDataPoints[i],
        'term': '',
        'subUniverseId': '',
    }

    response = pd.DataFrame(
        requests.get(
            'https://tools.morningstar.co.uk/api/rest.svc/klr5zyak8x/security/screener',
            params=params,
            headers=headers,
        ).json()['rows']
    )
    response['Cat'] = cat[i]
    dfp = pd.concat([dfp, response])
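(A side note on the securityDataPoints values used above: they are just pipe-separated field lists, and they already name some of the wanted indicators, e.g. EffectiveDuration and AverageCreditQualityCode. A quick self-contained check, on a shortened copy of one of those strings:)

```python
# securityDataPoints is a pipe-separated field list; this is a shortened
# copy just to show how the names split out. EffectiveDuration and
# AverageCreditQualityCode are among the columns requested above.
datapoints = 'SecId|Name|ClosePrice|EffectiveDuration|AverageCreditQualityCode'
fields = datapoints.split('|')
print(fields)
# ['SecId', 'Name', 'ClosePrice', 'EffectiveDuration', 'AverageCreditQualityCode']
```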

Then I realized that the first method gave me a huge list of assets (photo 1):
[Screenshot 2024-04-03 161749.png]
The SecId (column 2) coincides with part of the URL:
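(If that is the case, a small helper can build the snapshot URL from any SecId. This is only a sketch based on the snapshot link used further down this post; the example id and the tab number are assumptions and may differ per security or indicator page.)

```python
# Hypothetical helper: build a Morningstar snapshot URL from a SecId.
# URL pattern copied from the link used later in this post; the tab
# number (3) is an assumption and may vary by indicator page.
def snapshot_url(sec_id: str, tab: int = 3) -> str:
    return (
        "https://www.morningstar.ch/ch/funds/snapshot/snapshot.aspx"
        f"?id={sec_id}&tab={tab}"
    )

print(snapshot_url("F00000THSB"))
# https://www.morningstar.ch/ch/funds/snapshot/snapshot.aspx?id=F00000THSB&tab=3
```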

Therefore I coded this:

#1 Libraries
# Install first (run in a shell, not inside Python):
# pip install beautifulsoup4 lxml html5lib
import requests
from bs4 import BeautifulSoup
import time
import json


import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from contextlib import contextmanager
import sys, os
from functools import reduce
from datetime import datetime
import math
       
url = "https://www.morningstar.ch/ch/funds/snapshot/snapshot.aspx?id=F00000THSB&tab=3"        
       
response = requests.get(url)
soup = BeautifulSoup(response.content,"url. parser")

Nevertheless, the output in the kernel was this:
FeatureNotFound: Couldn't find a tree builder with the features you requested: url. parser. Do you need to install a parser library?

I know that APIs exist, but in the case of Morningstar I would need an account. Furthermore, with BeautifulSoup4 I might be able to extract the data from the HTML.

In essence, the data is presented in this way: 
[Screenshot 2024-04-03 162540.png]

Thank you so much in advance. Estefanía


Carlos

Apr 8, 2024, 7:25:01 AM
to beautifulsoup
Hello, the FeatureNotFound error in that case means that the parser you passed to the BeautifulSoup object (second argument) is not supported. You specified "url. parser", but the supported ones are "html5lib", "lxml" and "html.parser". So just change that line to

soup = BeautifulSoup(response.content, "lxml")

and it should work.
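Once the parser argument is fixed, the indicator values can be pulled out of the page's tables. The HTML below is a made-up stand-in for the real snapshot markup (which will differ), so treat the row/cell selectors as illustrative only:

```python
from bs4 import BeautifulSoup

# Made-up HTML fragment standing in for the real Morningstar snapshot
# page; the actual markup will differ, so adjust the selectors there.
html = """
<table>
  <tr><td>Price/Earnings</td><td>15.2</td></tr>
  <tr><td>Price/Book</td><td>1.8</td></tr>
</table>
"""

# "lxml" or "html5lib" also work here if installed; "html.parser" is built in.
soup = BeautifulSoup(html, "html.parser")

ratios = {}
for row in soup.find_all("tr"):
    cells = [td.get_text(strip=True) for td in row.find_all("td")]
    if len(cells) == 2:                 # label cell + value cell
        ratios[cells[0]] = cells[1]

print(ratios)
# {'Price/Earnings': '15.2', 'Price/Book': '1.8'}
```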
