Thanks for the update!
One thing I disagree with is your assumption that the only way to allow languages (or other similar fields) to be queried precisely is to store that data in a separate table. That is simply not true. All you need to do is store those strings in an array, and Postgres provides a native array type, as well as various efficient ways to query data in an array column.
In the Ohana API database, languages, funding_sources, accreditations, and the other “multiple” type of fields that are just a list of strings, are all stored in a column that is either of the native Posgtres array type, or in a text column that is serialized by Rails. It was a design decision not to create separate tables for those fields because it doesn’t make sense to me to have a table that only has one column.
Ohana comes out of the box with the ability to filter locations by language and other array fields, and it does so without any issues. The ability to query on a specific language does not and should not depend on languages being stored in a separate table, whether it’s Ohana or any other app that uses Postgres.
In Ohana API, languages are exposed as an array in the JSON output, not as one long string of text. If languages are entered as comma-separated strings in a spreadsheet, that doesn’t mean that they will be retrieved as such because thought was put into Ohana to make sure those strings were stored properly.
In my opinion, the spec’s priority should be to make it as easy as possible for data to be entered in a spreadsheet, since many people will be entering data manually in a spreadsheet. How the data is then processed and entered into a DB is up to the entity doing that import. I don’t think the spec should concern itself with how the data is going to be saved in a DB and queried.
The user experience of being able to enter multiple attributes, such as languages, in one field is exponentially greater than having to enter that data in a separate spreadsheet, which will end up containing just a bunch of repeated languages and foreign key ids. Here are the step-by-step differences between the two methods:
Method 1: All in one field
1. Open the locations.csv spreadsheet
2. Under the “languages” column, enter the languages as so: English, French, Spanish
Method 2: Separate spreadsheet
1. Open the languages.csv spreadsheet
2. Open the locations.csv spreadsheet to make sure you are entering the language for the right location, or create a macro that will allow you to choose the location from a dropdown
3. Create a new row
4. Enter the location_id
5. Enter “English”
6. Create a new row
7. Enter the same location_id as in step 4
5. Enter “French”
6. Create a new row
7. Enter the same location_id as in step 4
8. Enter “Spanish"
My 2 cents,
Moncef