Hi Lisa,
Thank you for putting time into scraping the data for Mission 609.
You’re definitely correct in your assessment of the data source. Regarding the groupings and sub-groupings, we usually handle these with a field in each row that indicates the entity “type”. I would use the bottom-level groupings (e.g. “Commercial Banks”) as the value of the “type” field.
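In case it helps, here is a minimal sketch of that flattening in Python. It assumes the scraped groupings end up in a nested dictionary; the structure, the entity names, and the "Merchant Banks" grouping are placeholders of mine, not values from the source, and your bot may well hold the data differently.

```python
import json

# Hypothetical scraped structure: top-level grouping -> bottom-level grouping -> entity names.
# All names here are placeholders for illustration only.
scraped = {
    "Banks": {
        "Commercial Banks": ["Example Bank A", "Example Bank B"],
        "Merchant Banks": ["Example Bank C"],
    },
}


def flatten(groupings):
    """Yield one row per entity, tagged with its bottom-level grouping."""
    for _top_level, sub_groupings in groupings.items():
        for sub_grouping, names in sub_groupings.items():
            for name in names:
                # The bottom-level grouping becomes the value of the "type" field.
                yield {"name": name, "type": sub_grouping}


rows = list(flatten(scraped))
for row in rows:
    # Printing one JSON object per row is just for illustration.
    print(json.dumps(row))
```

The top-level grouping is dropped here on purpose; if it is useful, it could be kept in an extra field of the primary data.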
Regarding headers, it’s important to distinguish between primary data, which stays as close to the original page as possible, and transformed data, which is standardised and can be used more readily. In primary data it’s best to capture data (and headers) as close to the source as possible; if no headers are present, anything that makes sense can be used, as you have done. Transformed data is stricter about headers. You can find the headers for the simple licence schema here: http://turbot.opencorporates.com/docs/supported_data_types - as an example, the “type” header for types of entity would be “jurisdiction_classification” in simple licence output.
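As a rough illustration of the difference, the sketch below (in Python, with a made-up primary row) renames the “type” header to “jurisdiction_classification”. That rename is the only mapping taken from this email; the function name and the sample row are hypothetical, and the remaining fields the simple licence schema expects should come from the documentation linked above rather than from this example.

```python
def to_transformed(primary_row):
    """Return a copy of a primary row with "type" renamed for transformed output.

    Only the "type" -> "jurisdiction_classification" rename is shown; any other
    fields required by the schema are not covered by this sketch.
    """
    transformed = dict(primary_row)  # copy so the primary row is left untouched
    transformed["jurisdiction_classification"] = transformed.pop("type")
    return transformed


# Hypothetical primary row, using the header names chosen at scrape time.
primary = {"name": "Example Bank A", "type": "Commercial Banks"}
transformed = to_transformed(primary)
print(transformed)
```

Keeping the primary row unchanged and building the transformed row from a copy means both versions can be output side by side.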
There is an example of adding a licence transformer here: http://turbot.opencorporates.com/docs/examples#structured-bots - this allows us to output transformed data as well as primary data.
I hope that answers your questions; do feel free to get in touch.
Best,
Peter