So, I'm getting a ton of text out:
```python
from pathlib import Path
from bs4 import BeautifulSoup
import re
html_path = Path('Doc10000000.htm')
charset = 'windows-1252'
# Read line by line, dropping the trailing newline from each line before joining.
with html_path.open('r', encoding=charset, newline=None, errors='replace') as f:
    data = ''.join([re.sub(r'\n$', '', line) for line in f])
soup = BeautifulSoup(data, 'lxml')
print(soup.text)
print(soup.select('img'))
```
Truncated, but you get the idea.
```text
MRTRS RXCVXCRTMXR RXRTCRTX RR RXMVCRMRRTR RR MRTRCMRSRY R RR VCRYTRRMÓR RR YRCMMRMXY Rzgzqxzz #: RC1876031_______ Txeex Rxxezxvx: 1 bx xvzdzz bx 2009 Txeex bx Rxqxqxexóg: 1 bx xvzdzz bx 2010 Rzgzqxzz bx ezgxxbxgexxxxbxb #: 6535402 RXMVCRRXC: Mgzxx Rzqqzqxzxzg (g zzbxd dad daqdxbxxqxxd g xxxxxxbxd bx Mgzxx, x xxd qax xg xz daexdxvz dx xxd bxgzqxgxqé xgbxdzxgzxqxgzx ezqz xx “Rzqqqxbzq” z “Mgzxx”). MRTRS RXCVXCRTMXR VXCRPRYR RDCRRMRRT – DXXRY RRR YRCMMRRY Rvqxxqxgz #: RC1876031_________ Yxvgxzaqx Rxzx: Ravadz 1dz 2009 Rxqxqxzxzg Rxzx: Ravadz 1dz 2010 RRRR # 6535402 MXRRC: Mgzxx Rzqqzqxzxzg (xgb xxx Mgzxx daqdxbxxqxxd xgb xxxxxxxzxd, exqxxgxxzxq "Magxq" zq "Mgzxx"). VCXMRRRXC: Mxxvzd YR.(xg xz daexdxvz xx "Vqzvxxbzq"). Rxqxeexóg: Mzqxgz 1145, 3xq Vxdz Maxgzd Rxqxd, Rqvxgzxgx. R1091RRC (xg xz daexdxvz xx "Vqzvxxbzq"). Rgxxzd qax dx xzagzxg x xgezqqzqxg xx qqxdxgzx (Mxqqax ezg agx “Y” xzd qax qxdaxzxg xqxxexqxxd): Téqqxgzd g ezgbxexzgxd bxx Rzgzqxzz bx Rzqqqxvxgzx bx Mxzxqxxxxd g bx qqxdzxexóg bx dxqvxexzd. R Rxdeqxqexóg bxx Vqzbaezz g Rxxxgbxqxz bx qxvzd. M Rdzégbxqxd bx Rxdxqqxñz / Cxqaxqxqxxgzz bx exxxbxb. R Rxdqzdxexzgxd ezqqxxqxgzxqxxd. R Rxqxezxvxd qxqx ag xqqxxgzx bx zqxqxzz xxqqx bx xxezezx g bx bqzvxd. R Sxexgexx bx adz bx dzxzwxqx. T Vqzzxeexóg bx xzd xezxvzd bx xgxzqqxexóg bxx Rzqqqxbzq. D
```
So I'm not quite sure what the issue is. Is the content purposely garbled or encoded like this? I don't much care what the data actually says, only whether it is supposed to look like what is shown.
With `lxml`, `soup.select('img')` gave me no images. After switching to `html.parser` or `html5lib` I was able to get them, so `lxml` seems to have an issue with something in the file:
```text
[<img border="0width=328" height="183" id="Picture 1" src="Doc10000000_files/image001.png"/>, <img src="Doc10000000_files/image002.png" width="329height=1"/>, <img height='1src="Doc10000000_files/image002.png"' width="329"/>, <img border="0width=329" height="193" id="Picture 6" src="Doc10000000_files/image003.jpg"/>, <img height='1src="Doc10000000_files/image002.png"' width="329"/>, <img height='1src="Doc10000000_files/image002.png"' width="329"/>, <img height='1src="Doc10000000_files/image002.png"' width="329"/>, <img height='1src="Doc10000000_files/image002.png"' width="329"/>]
```
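One thing worth checking: values such as `border="0width=328"` and `width="329height=1"` look like two attributes fused together. That would happen if the original tags wrap their attributes across lines, because the snippet above deletes each newline without putting anything in its place. Here is a minimal sketch of that effect. It uses the standard library's `html.parser.HTMLParser` (the parser that BeautifulSoup's `html.parser` option is built on) so it runs without extra packages; `ImgAttrCollector`, `img_attrs`, and the sample tag are made up for illustration and are not taken from the real file.

```python
from html.parser import HTMLParser


class ImgAttrCollector(HTMLParser):
    """Hypothetical helper: records the attributes of every <img> tag."""

    def __init__(self):
        super().__init__()
        self.images = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            self.images.append(dict(attrs))


def img_attrs(markup):
    """Parse markup and return one attribute dict per <img> tag."""
    parser = ImgAttrCollector()
    parser.feed(markup)
    parser.close()
    return parser.images


# Made-up tag whose attributes are split across two lines.
raw = '<img border=0\nwidth=328 height=183 src="image001.png">'

# For these lines this matches re.sub(r'\n$', '', line):
# the newline disappears and nothing replaces it.
fused = "".join(line.rstrip("\n") for line in raw.splitlines(keepends=True))

fused_attrs = img_attrs(fused)   # parsed after the newline was deleted
intact_attrs = img_attrs(raw)    # parsed with the newline left in place

print(fused_attrs)
print(intact_attrs)
```

If the real file behaves the same way, passing `f.read()` to BeautifulSoup unchanged, or joining the lines with a space, should keep the attributes apart. I can't say from this alone whether that is also why `lxml` found no images.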