Web scraping of assets in Morningstar


Estefanía Chávez

Apr 3, 2024, 5:42:08 PM
to beautifulsoup
Hi! My task is to collect specific indicators such as Price/Earnings, Price/Book, Yield-to-Maturity, Effective Duration and credit Grades (in the case of assets with a certain percentage of fixed income). I've been working with the first form below, but it didn't allow me to extract the indicators mentioned above.

Form 1
# Install first (run in a shell, not inside Python):
# pip install beautifulsoup4 lxml html5lib
import requests
from bs4 import BeautifulSoup
import time
import json

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from contextlib import contextmanager
import sys, os
from functools import reduce
from datetime import datetime
import math

headers = {
    'authority' : 'tools.morningstar.co.uk',
    'accept':'application/json,text/plain,*/*',
    'accept-language':'es-ES,es;q=0.9',
    'origin': 'https://www.morningstar.es',
    'sec-ch-ua': '"Google Chrome";v="df107","Chromium";v="107","Not=A?Brand";v="24"',
    'sec-ch-ua-mobile':'?0',
    'sec-ch-ua-platform': '"Windows"',
    'sec-fetch-dest':'empty',
    'sec-fetch-mode':'cors',
    'sec-fetch-site':'cross-site'
}

Id = ['FOESP$$ALL', 'ETEXG$XMAD|ETEUR$$ALL', 'IXMSX$$ALL', 'E0WWE$$ALL']
cat = ['Fondos', 'ETF', 'INDICES', 'ACCIONES']

securityDataPoints = [
    'SecId|Name|PriceCurrency|TenforeId|LegalName|ClosePrice|Yield_M12|CategoryName|Medalist_RatingNumber|StarRatingM255|SustainabilityRank|ReturnD1|ReturnW1|ReturnM1|ReturnM3|ReturnM6|ReturnM0|ReturnM12|ReturnM36|ReturnM60|ReturnM120|FeeLevel|ManagerTenure|MaxDeferredLoad|InitialPurchase|FundTNAV|EquityStyleBox|BondStyleBox|AverageMarketCapital|AverageCreditQualityCode|EffectiveDuration|MorningstarRiskM255|AlphaM36|BetaM36|R2M36|StandardDeviationM36|SharpeM36|TrackRecordExtension',
    'SecId|Name|PriceCurrency|TenforeId|LegalName|ClosePrice|Yield_M12|OngoingCharge|CategoryName|Medalist_RatingNumber|StarRatingM255|SustainabilityRank|GBRReturnD1|GBRReturnW1|GBRReturnM1|GBRReturnM3|GBRReturnM6|GBRReturnM0|GBRReturnM12|GBRReturnM36|GBRReturnM60|GBRReturnM120|MaxFrontEndLoad|ManagerTenure|MaxDeferredLoad|InitialPurchase|FundTNAV|EquityStyleBox|BondStyleBox|AverageMarketCapital|AverageCreditQualityCode|EffectiveDuration|MorningstarRiskM255|AlphaM36|BetaM36|R2M36|StandardDeviationM36|SharpeM36|TrackRecordExtension',
    'SecId|Name|PriceCurrency|TenforeId|LegalName|HoldingTypeId|ClosePrice|GBRReturnD1|GBRReturnW1|GBRReturnM1|GBRReturnM3|GBRReturnM6|GBRReturnM0|GBRReturnM12|GBRReturnM36|GBRReturnM60|GBRReturnM120|TrackRecordExtension',
    'SecId|LegalName|Name|IndustryName|SectorName|TenforeId|Universe|ExchangeId|Ticker|ClosePrice|StarRatingM255|QuantitativeStarRating|MarketCap|DividendYield|PERatio|PEGRatio|MarketCountryName|EquityStyleBox|ReturnD1|ReturnW1|ReturnM1|ReturnM3|ReturnM6|ReturnM0|ReturnM12|ReturnM36|ReturnM60|ReturnM120|EBTMarginYear1|ROEYear1|ROICYear1|EPSGrowth3YYear1|RevenueGrowth3Y|DebtEquityRatio|NetMargin|ROATTM|ROETTM'
]

dfp = pd.DataFrame()

for i in range(4):
    time.sleep(2)
    params = {
        'page': '1',
        'pageSize': '10000',
        'sortOrder': 'LegalName asc',
        'outputType': 'json',
        'version': '1',
        'languageId': 'es-ES',
        'currencyId': 'EUR',
        'universeId': Id[i],
        'securityDataPoints': securityDataPoints[i],
        'term': '',
        'subUniverseId': '',
    }

    response = pd.DataFrame(
        requests.get(
            'https://tools.morningstar.co.uk/api/rest.svc/klr5zyak8x/security/screener',
            params=params,
            headers=headers,
        ).json()['rows']
    )
    response['Cat'] = cat[i]
    dfp = pd.concat([dfp, response])
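(A side note on the securityDataPoints values used above: they are just pipe-separated field lists, and they already name some of the wanted indicators, e.g. EffectiveDuration and AverageCreditQualityCode. A quick self-contained check, on a shortened copy of one of those strings:)

```python
# securityDataPoints is a pipe-separated field list; this is a shortened
# copy just to show how the names split out. EffectiveDuration and
# AverageCreditQualityCode are among the columns requested above.
datapoints = 'SecId|Name|ClosePrice|EffectiveDuration|AverageCreditQualityCode'
fields = datapoints.split('|')
print(fields)
# ['SecId', 'Name', 'ClosePrice', 'EffectiveDuration', 'AverageCreditQualityCode']
```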

Then I realized that the first method gave me a huge list of assets (photo 1):
[Screenshot 2024-04-03 161749.png]
The SecId (column 2) coincides with part of the URL:
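(If that is the case, a small helper can build the snapshot URL from any SecId. This is only a sketch based on the snapshot link used further down this post; the example id and the tab number are assumptions and may differ per security or indicator page.)

```python
# Hypothetical helper: build a Morningstar snapshot URL from a SecId.
# URL pattern copied from the link used later in this post; the tab
# number (3) is an assumption and may vary by indicator page.
def snapshot_url(sec_id: str, tab: int = 3) -> str:
    return (
        "https://www.morningstar.ch/ch/funds/snapshot/snapshot.aspx"
        f"?id={sec_id}&tab={tab}"
    )

print(snapshot_url("F00000THSB"))
# https://www.morningstar.ch/ch/funds/snapshot/snapshot.aspx?id=F00000THSB&tab=3
```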

Therefore I coded this:

#1 Libraries
# Install first (run in a shell, not inside Python):
# pip install beautifulsoup4 lxml html5lib
import requests
from bs4 import BeautifulSoup
import time
import json


import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from contextlib import contextmanager
import sys, os
from functools import reduce
from datetime import datetime
import math
       
url = "https://www.morningstar.ch/ch/funds/snapshot/snapshot.aspx?id=F00000THSB&tab=3"        
       
response = requests.get(url)
soup = BeautifulSoup(response.content,"url. parser")

Nevertheless, the output in the kernel was this:
FeatureNotFound: Couldn't find a tree builder with the features you requested: url. parser. Do you need to install a parser library?

I know that APIs exist, but in the case of Morningstar I would need an account. Furthermore, with BeautifulSoup4 I might be able to extract the data from the HTML.

In essence, the data is presented in this way: 
[Screenshot 2024-04-03 162540.png]

Thank you so much in advance. Estefanía


Carlos

Apr 8, 2024, 7:25:01 AM
to beautifulsoup
Hello, the FeatureNotFound error in that case means that the parser you passed to the BeautifulSoup object (second argument) is not supported. You specified "url. parser", but the supported ones are "html5lib", "lxml" and "html.parser". So just change that line to

soup = BeautifulSoup(response.content, "lxml")

and it should work.
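Once the parser argument is fixed, the indicator values can be pulled out of the page's tables. The HTML below is a made-up stand-in for the real snapshot markup (which will differ), so treat the row/cell selectors as illustrative only:

```python
from bs4 import BeautifulSoup

# Made-up HTML fragment standing in for the real Morningstar snapshot
# page; the actual markup will differ, so adjust the selectors there.
html = """
<table>
  <tr><td>Price/Earnings</td><td>15.2</td></tr>
  <tr><td>Price/Book</td><td>1.8</td></tr>
</table>
"""

# "lxml" or "html5lib" also work here if installed; "html.parser" is built in.
soup = BeautifulSoup(html, "html.parser")

ratios = {}
for row in soup.find_all("tr"):
    cells = [td.get_text(strip=True) for td in row.find_all("td")]
    if len(cells) == 2:                 # label cell + value cell
        ratios[cells[0]] = cells[1]

print(ratios)
# {'Price/Earnings': '15.2', 'Price/Book': '1.8'}
```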
