(base) MacBook-Air:beandata jonathan$ pip show smart_importer
Name: smart-importer
Version: 0.3
Summary: Augment Beancount importers with machine learning functionality.
Home-page: https://github.com/beancount/smart_importer
Author: Johannes Harms
Author-email: UNKNOWN
License: MIT
Location: /Users/jonathan/opt/miniconda3/lib/python3.8/site-packages
Requires: scikit-learn, beancount, numpy, scipy
4. I created a new config file called jonathan_smart.import:
(base) MacBook-Air:beandata jonathan$ more jonathan_smart.import
#!/usr/bin/env python3
"""Import configuration."""
import sys
from os import path
sys.path.insert(0, path.join(path.dirname(__file__)))
from beancount_reds_importers import vanguard
from myimporters.bfsfcu import bfsfcu_bank
from myimporters.anz import anz_bank
from fund_info import *
from smart_importer import apply_hooks, PredictPayees, PredictPostings
myBank_smart_importer = my_bank.Importer({
    'main_account'  : 'Assets:US:Banks:Checking:myBank',
    'account_number' : 'xxx',
    'transfer'    : 'Assets:US:Zero-Sum-Accounts:Transfers:Bank-Account',
    'income'     : 'Income:US:Interest:myBank',
    'fees'      : 'Expenses:US:Bank-Fees:myBank',
    'rounding_error' : 'Equity:US:Rounding-Errors:Imports',
  })
apply_hooks(myBank_smart_importer, [PredictPayees(), PredictPostings()])
CONFIG = [myBank_smart_importer, ...(other importers)]
5. The README says to run bean-extract with -f to invoke it on existing data, so I tried the following. Is this right?
bean-extract jonathan_smart.import ~/staging/new_bank_data.qfx -f journal/myledger.beancount > ~/staging/dud.txt
Cannot train the machine learning model because the training data is empty.
Cannot train the machine learning model because the training data is empty.
The output is just like the normal output without all the smart_importer stuff. Seems I'm doing something wrong, as staging/dud.txt doesn't have any predictions.
Appreciate any assistance on this!
thanks,
Jonathan
Hi,
I think your setup looks good. The smart importer hook is in there; otherwise you would not get the errors about not being able to train.
I think the issue is in your call:
bean-extract jonathan_smart.import ~/staging/new_bank_data.qfx -f journal/myledger.beancount > ~/staging/dud.txt
My guess is that the -f argument needs to come before the import config and the file location, so:
bean-extract -f journal/myledger.beancount jonathan_smart.import ~/staging/new_bank_data.qfx > ~/staging/dud.txt
Regards,
Patrick
--
You received this message because you are subscribed to the Google Groups "Beancount" group.
To unsubscribe from this group and stop receiving emails from it, send an email to beancount+...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/beancount/820ef641-8178-47d1-9e97-afbc709e6a83n%40googlegroups.com.
bean-extract -f journal/myledger.beancount jonathan_smart.import ~/staging/62090_818496_1013051ofxdl.qfx > ~/staging/dud.txt
I get these messages:
Cannot train the machine learning model because the training data is empty.
Cannot train the machine learning model because the training data is empty.
bean-extract -e journal/accounts.beancount jonathan_smart.import ~/staging/mydata.qfx > ~/staging/dud.txt
gives 2 printouts of
Cannot train the machine learning model because the training data is empty.
Cannot train the machine learning model because the training data is empty.
I just remembered something. The issue could be that the importer you're trying to use does not have the new interface and instead still uses the old (legacy) interface.
The new one looks like this:
def extract(self, file, existing_entries):
The old one looks like this:
def extract(self, file):
Smart importer uses the existing_entries for training its model.
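A quick way to check which interface an importer exposes is to inspect the signature of its extract method. This is only a minimal sketch: LegacyImporter, NewStyleImporter and accepts_existing_entries are made-up names for illustration and are not part of beancount or smart_importer.

```python
import inspect


class LegacyImporter:
    """Hypothetical importer using the old interface."""

    def extract(self, file):
        return []


class NewStyleImporter:
    """Hypothetical importer using the new interface."""

    def extract(self, file, existing_entries=None):
        return []


def accepts_existing_entries(importer):
    """Return True if importer.extract has an existing_entries parameter."""
    # inspect.signature on a bound method leaves out `self`.
    params = inspect.signature(importer.extract).parameters
    return "existing_entries" in params


if __name__ == "__main__":
    for imp in (LegacyImporter(), NewStyleImporter()):
        print(type(imp).__name__, accepts_existing_entries(imp))
```

Running the same check against the importer instances in your CONFIG list should show whether each one can receive the existing entries at all.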
Regards,
Patrick
Hm, actually that looks ok, it has the existing_entries on the interface. But to be honest I'm not super familiar with how the apply hook is hooking this in, so there might be an issue.
Maybe someone more familiar with this can respond on that.
Otherwise, if you could install smart_importer from git and then add a bit more debug output in hooks.py and predictor.py to make sure that the existing entries arrive, this would give a better idea how to progress.
Ready with 1344 directives (2266 postings in 1133 transactions).
Hi Jonathan,
Let's try to figure this out. In smart importer, can you print out the following:
in smart_importer/predictor.py
in __call__ around line 64
print(self.account)
print(existing_entries)
in load_training_data around line 91
print(training_data)
and around line 95
print(training_data)
That should give an idea where the information is "lost".
Depending on where the information is lost, you can then dig a bit
deeper into what is happening.
Regards,
Patrick
------CHECKPOINT1-------
1353
1133
0
------CHECKPOINT2-------
[]
---__call__----
Assets:US:Banks:Checking:myBank
------CHECKPOINT1-------
1353
1133
0
------CHECKPOINT2-------
[]
---__call__----
Assets:US:Banks:Checking:myBank
Here is the code I added to predictor.py:
#beg
    print('---__call__----')
    print(self.account)
    #print(existing_entries)
#end
    with self.lock:
      self.define_pipeline()
      self.train_pipeline()
      return self.process_entries(imported_entries)
  def load_open_accounts(self, existing_entries):
    """Return map of accounts which have been opened but not closed."""
    account_map = {}
    if not existing_entries:
      return
    for entry in beancount_sorted(existing_entries):
      # pylint: disable=isinstance-second-argument-not-valid-type
      if isinstance(entry, Open):
        account_map[entry.account] = entry
      elif isinstance(entry, Close):
        account_map.pop(entry.account)
    self.open_accounts = account_map
  def load_training_data(self, existing_entries):
    """Load training data, i.e., a list of Beancount entries."""
    training_data = existing_entries or []
    self.load_open_accounts(existing_entries)
#beg1
    print('------CHECKPOINT1-------')
    print(len(training_data))
#end1
    training_data = list(filter_txns(training_data))
    print(len(training_data))
    length_all = len(training_data)
    training_data = [
      txn for txn in training_data if self.training_data_filter(txn)
    ]
    print(len(training_data))
#beg2
    print('------CHECKPOINT2-------')
    print(training_data)
#end2
--------
I'm now checking that every account in the config file is present in my beancount file. I noticed one was missing, and adding it changed what was in the training_data, but I'm still getting the warning about training data being empty. I'll keep digging as best I can, but I can definitely use any additional help.
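For anyone wanting to automate that check, here is a rough sketch of comparing the accounts named in the importer config against the accounts that are open in the ledger. To keep it self-contained it uses plain namedtuples as stand-ins for beancount's Open and Close directives, and open_account_names and missing_accounts are hypothetical helper names. With a real ledger you would feed it the loaded entries instead.

```python
from collections import namedtuple

# Stand-ins for beancount directives (assumption: only date and account matter).
Open = namedtuple("Open", "date account")
Close = namedtuple("Close", "date account")


def open_account_names(entries):
    """Return the set of accounts that were opened and not closed again."""
    opened = set()
    for entry in sorted(entries, key=lambda e: e.date):
        if isinstance(entry, Open):
            opened.add(entry.account)
        elif isinstance(entry, Close):
            opened.discard(entry.account)
    return opened


def missing_accounts(config_accounts, entries):
    """Return config accounts that are not currently open, sorted by name."""
    return sorted(set(config_accounts) - open_account_names(entries))


# Account names taken from the importer config earlier in this thread.
config = {
    "main_account": "Assets:US:Banks:Checking:myBank",
    "transfer": "Assets:US:Zero-Sum-Accounts:Transfers:Bank-Account",
    "income": "Income:US:Interest:myBank",
    "fees": "Expenses:US:Bank-Fees:myBank",
    "rounding_error": "Equity:US:Rounding-Errors:Imports",
}

# Made-up ledger entries, for illustration only.
entries = [
    Open("2020-01-01", "Assets:US:Banks:Checking:myBank"),
    Open("2020-01-01", "Income:US:Interest:myBank"),
    Open("2020-01-01", "Expenses:US:Bank-Fees:myBank"),
    Close("2021-01-01", "Expenses:US:Bank-Fees:myBank"),
]

if __name__ == "__main__":
    for name in missing_accounts(config.values(), entries):
        print("not open:", name)
```

Note that an account that was opened and later closed is reported as well, since a closed account is dropped from the open-accounts map in the same way.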
So if I see this correctly, after the filtering of the training data, there is never any data left.
The logic looks like this
   def training_data_filter(self, txn):
       """Filter function for the training data."""
       found_import_account = False
       for pos in txn.postings:
           if pos.account not in self.open_accounts:
               return False
           if self.account == pos.account:
               found_import_account = True
       return found_import_account or not self.account
And from the printout you have something in self.account. So if I see this correctly, either none of your training data is matching the account or the account is actually no longer open.
It may be worth printing out self.open_accounts and perhaps adding some debug logging in that training_data_filter code.
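To see which of the two conditions drops a given transaction, something along these lines could help. It is a standalone sketch that mirrors the filter logic quoted above but also returns a reason. Posting and Txn are simplified stand-ins for beancount's types, and explain_filter is a hypothetical helper, not a function in smart_importer.

```python
from collections import namedtuple

# Simplified stand-ins for beancount's Posting and Transaction.
Posting = namedtuple("Posting", "account")
Txn = namedtuple("Txn", "narration postings")


def explain_filter(txn, import_account, open_accounts):
    """Return (kept, reason), following the same rules as the quoted filter."""
    found_import_account = False
    for pos in txn.postings:
        if pos.account not in open_accounts:
            return False, "posting account not open: " + pos.account
        if pos.account == import_account:
            found_import_account = True
    if found_import_account or not import_account:
        return True, "kept"
    return False, "no posting on " + import_account


# Made-up data, for illustration only.
account = "Assets:US:Banks:Checking:myBank"
open_accounts = {account, "Expenses:Food"}
txns = [
    Txn("groceries", [Posting(account), Posting("Expenses:Food")]),
    Txn("other bank", [Posting("Assets:US:Banks:Other"), Posting("Expenses:Food")]),
    Txn("cash", [Posting("Expenses:Food"), Posting("Expenses:Food")]),
]

if __name__ == "__main__":
    for txn in txns:
        print(txn.narration, explain_filter(txn, account, open_accounts))
```

If every transaction in the ledger reports "posting account not open", the open-accounts map is the place to look. If they all report "no posting on ...", the account name in the importer config probably does not match the one used in the ledger.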
Regards,
Patrick
Probably the easiest examples are the data-driven tests, which you can find here:
https://github.com/beancount/smart_importer/tree/master/tests/data
The simplest of them is probably:
https://github.com/beancount/smart_importer/blob/master/tests/data/multiaccounts.beancount