Issue #356: same code base for fetching prices and importing transactions? (blais/beancount)

Johannes Harms

Dec 28, 2018, 8:49:28 AM
to bean...@googlegroups.com
New issue 356: same code base for fetching prices and importing transactions?
https://bitbucket.org/blais/beancount/issues/356/same-code-base-for-fetching-prices-and

Johannes Harms:

Could / should we use the same code base for fetching prices and importing transactions? Using importers for importing downloaded prices could help reduce duplicate code. The only missing piece would be extending the importer base classes to support fetching, not only of prices but also of transactions.

@blais: I would be glad to hear your thoughts on this, including the previous design rationale to split price-fetching and transaction-importing into separate modules.
@seltzered: I am posting this as a follow-up to #329 "Bean-price: support fetch over range of dates", because I did not want to take the other issue off-topic.

Motivation:
My personal observation is that importing prices is very similar to importing transactions.

* Fetching: Prices are usually fetched automatically, but I found this is not always possible. Similarly, transactions could be fetched automatically, but that is not always possible either, and often not worth the effort, because screen-scraping implementations break easily.
* Importing (identifying, filing, extracting): These steps are nearly identical for prices and transactions.

Example: I wrote this importer for Yahoo prices:
1. Fetching: As mentioned above, automatic fetching is not always easy. In this case, I found it easier to manually fetch prices by scraping the HTML table using artoo.js.
2. Importing: The code snippet below illustrates that it makes perfect sense to import prices using the import functionality.


```
#!python

"""
Imports prices from CSV that was scraped from Yahoo Finance
"""
# pylint: disable=C0411,C0330


import csv
import logging
from decimal import DecimalException
from typing import Dict, Iterable, NamedTuple

from beancount.core.amount import Amount
from beancount.core.data import Price, new_metadata, sorted as sorted_entries
from beancount.core.number import D
from beancount.ingest.cache import _FileMemo
from beancount.ingest.importer import ImporterProtocol
from beancount.ingest.importers.csv import Col
from beancount.ingest.importers.mixins.identifier import IdentifyMixin
from beancount.utils.date_utils import parse_date_liberally

logger = logging.getLogger(__name__)  # pylint: disable=C0103

Row = NamedTuple(
    "Row", [("file_name", str), ("line_number", int), ("data", Dict)]
)


class PricesImporter(ImporterProtocol):
    """Imports prices from CSV"""

    def __init__(self, **kwargs):  # pylint: disable=R0913
        """
        Initializes the importer.
        """
        # gets required arguments:
        self.columns = kwargs.pop("columns")
        self.commodity = kwargs.pop("commodity")
        self.currency = kwargs.pop("currency")

        # gets optional arguments (popped so they are not passed on to super):
        self.debug = kwargs.pop("debug", False)
        self.csv_dialect = kwargs.pop("csv_dialect", None)
        self.dateutil_kwds = kwargs.pop("dateutil_kwds", None)
        super().__init__(**kwargs)

    def extract(self, file: _FileMemo, existing_entries=None):
        """Extracts price entries from CSV file"""
        rows = self.read_lines(file.name)
        price_entries = sorted_entries(self.get_price_entries(rows))
        return price_entries

    def read_lines(self, file_name: str) -> Iterable[Row]:
        """Parses CSV lines into Row objects"""
        with open(file_name) as file:
            reader = csv.DictReader(file, dialect=self.csv_dialect)
            for row in reader:
                yield Row(file_name, reader.line_num, row)

    def get_price_entries(self, lines: Iterable[Row]) -> Iterable[Price]:
        """Converts Row objects to beancount Price objects"""
        for line in lines:
            try:
                self.validate_line(line)
                meta = self.build_metadata(line.file_name, line.line_number)
                date = self.parse_date(line.data[self.columns[Col.DATE]])
                amount = self.parse_amount(line.data[self.columns[Col.AMOUNT]])
                amount_with_currency = Amount(amount, self.currency)
                yield Price(  # pylint: disable=E1102
                    meta, date, self.commodity, amount_with_currency
                )
            except (ValueError, DecimalException, AssertionError) as exception:
                # Unparseable rows are logged and skipped, not fatal.
                logger.warning(
                    "Skipped CSV line due to %s exception at %s line %d: %s",
                    exception.__class__.__name__,
                    line.file_name,
                    line.line_number,
                    line.data,
                )

    def validate_line(self, row):
        """Validates CSV rows. If invalid, an AssertionError is raised."""
        data = row.data
        assert data[self.columns[Col.AMOUNT]]

    def build_metadata(self, file_name, line_number):
        """Constructs beancount metadata"""
        # lineno stays an int; the debug metadata value is stored as a string.
        return new_metadata(
            file_name,
            line_number,
            {"source_file": file_name, "source_line": str(line_number)}
            if self.debug
            else None,
        )

    def parse_date(self, date_str):
        """Parses the date string"""
        return parse_date_liberally(date_str, self.dateutil_kwds)

    def parse_amount(self, amount_str):  # pylint: disable=R0201
        """Parses an amount string to decimal"""
        return D(amount_str)


class YahooFinancePricesImporter(IdentifyMixin, PricesImporter):
    """
    Imports CSV scraped from finance.yahoo.com

    Usage:

    Scrape historical prices using artoo.js, for example from:
    https://finance.yahoo.com/quote/EXS2.DE/history?p=EXS2.DE

        artoo.scrapeTable('table[data-test="historical-prices"]', {
            headers: 'th',
            done: artoo.saveCsv
        })

    Then run this importer to convert the scraped csv file to beancount prices.
    """

    def __init__(self, **kwargs):
        kwargs.setdefault(
            "columns", {Col.DATE: "Date", Col.AMOUNT: "Adj Close**"}
        )
        self.matchers = [
            ("content", r"Date,Open,High,Low,Close\*,Adj Close.*")
        ]
        super().__init__(**kwargs)


class TecdaxImporter(YahooFinancePricesImporter):
    """
    Imports CSV scraped from:
    https://finance.yahoo.com/quote/EXS2.DE/history?p=EXS2.DE
    """

    def __init__(self, **kwargs):
        kwargs.setdefault("commodity", "TECDAX")
        kwargs.setdefault("currency", "EUR")
        super().__init__(**kwargs)
```
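
For anyone who wants to see the effect without a beancount installation at hand, the conversion done by `get_price_entries` can be roughly sketched with the standard library alone. This is only an illustration and not part of the importer above: the function name `rows_to_price_lines`, the sample CSV values and the `%b %d, %Y` date format are my assumptions; only the column names `Date` and `Adj Close**` are taken from the code above. It emits plain-text `price` directives instead of `Price` objects.

```python
import csv
import io
from datetime import datetime
from decimal import Decimal, InvalidOperation


def rows_to_price_lines(csv_text, commodity, currency,
                        date_col="Date", amount_col="Adj Close**",
                        date_format="%b %d, %Y"):
    """Convert scraped CSV text to sorted beancount price directive lines.

    Rows whose date or amount cannot be parsed are skipped, similar to the
    skip-and-warn behaviour of the importer above.
    """
    prices = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        try:
            date = datetime.strptime(row.get(date_col) or "", date_format).date()
            # Drop thousands separators before converting to Decimal.
            amount = Decimal((row.get(amount_col) or "").replace(",", ""))
        except (ValueError, InvalidOperation):
            continue
        prices.append((date, amount))
    return [
        f"{date.isoformat()} price {commodity} {amount} {currency}"
        for date, amount in sorted(prices)
    ]


# Made-up sample data in the shape of the scraped table (not real quotes).
SAMPLE = '''Date,Open,High,Low,Close*,Adj Close**,Volume
"Dec 27, 2018",23.10,23.40,22.90,23.05,23.05,"12,345"
"Dec 21, 2018",22.80,23.00,22.50,22.95,22.95,"10,000"
"Dec 24, 2018",-,-,-,-,-,-
'''

for line in rows_to_price_lines(SAMPLE, "TECDAX", "EUR"):
    print(line)
```

The row for Dec 24 has no usable amount and is dropped; the remaining rows come out in date order, one `price` directive per line.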

In my opinion, the above code illustrates that prices and transactions could use the same import process. I would therefore like to propose: Let's use importers for importing downloaded prices. And let's extend the importer base classes to support fetching of not only prices, but also transactions.
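
One possible shape for such an extension, as a rough sketch only: a hypothetical `FetchMixin` adds an optional `fetch()` step in front of the existing identify and extract steps. None of the names used here (`FetchMixin`, `fetch`, `run_import`, `StaticPriceImporter`) exist in beancount, and the stand-in `Importer` base class merely mimics the `identify`/`extract` methods of `ImporterProtocol` so that the example runs without beancount installed. The price value is made up.

```python
import os
import tempfile
from abc import ABC, abstractmethod
from typing import List, Optional


class Importer(ABC):
    """Stand-in for beancount's ImporterProtocol (identify + extract only)."""

    @abstractmethod
    def identify(self, path: str) -> bool:
        """Return True if this importer can handle the file."""

    @abstractmethod
    def extract(self, path: str) -> List[str]:
        """Return the entries found in the file (plain strings here)."""


class FetchMixin:
    """Hypothetical optional step: download a file before importing it."""

    def fetch(self, target_dir: str) -> Optional[str]:
        """Return the path of a downloaded file, or None if fetching is manual."""
        return None


class StaticPriceImporter(FetchMixin, Importer):
    """Toy importer whose 'download' just writes a fixed CSV file."""

    def fetch(self, target_dir):
        path = os.path.join(target_dir, "prices.csv")
        with open(path, "w") as out:
            out.write("date,price\n2018-12-27,23.05\n")
        return path

    def identify(self, path):
        return path.endswith("prices.csv")

    def extract(self, path):
        with open(path) as lines:
            next(lines)  # skip the header row
            return [
                "{} price TECDAX {} EUR".format(*line.strip().split(","))
                for line in lines
                if line.strip()
            ]


def run_import(importer, target_dir, manual_files=()):
    """Fetch if the importer supports it, then identify and extract as usual."""
    fetched = importer.fetch(target_dir) if hasattr(importer, "fetch") else None
    files = list(manual_files) + ([fetched] if fetched else [])
    entries = []
    for path in files:
        if importer.identify(path):
            entries.extend(importer.extract(path))
    return entries


with tempfile.TemporaryDirectory() as tmp:
    print(run_import(StaticPriceImporter(), tmp))
```

Because `fetch()` returns `None` by default, importers that rely on manually downloaded files would keep working unchanged; the same mixin could be applied to transaction importers as well as price importers.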

