Hello Everyone,
I'm looking for guidance, recommendations, or reading on designing APIs for large datasets that are often difficult to model. In this case I'm struggling to model these things as resources, because we have many different classifications of data and datasets.
A Dataset may:
- be raw or cooked
- contain personally identifiable information (or not)
- contain sensitive personally identifiable information (or not)
- be verified or unverified - by "verified" we mean it has:
  - data lineage
  - data provenance
  - a data schema
- be published in multiple formats
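For what it's worth, one way to think about the attributes above is as a single classification value object rather than separate resource types per combination. A rough sketch in Python (all names here are made up, not anything from an existing system):

```python
from dataclasses import dataclass
from enum import Enum


class Sensitivity(Enum):
    NONE = "none"
    PII = "pii"                      # personally identifiable information
    SENSITIVE_PII = "sensitive_pii"  # sensitive PII


@dataclass(frozen=True)
class DatasetClassification:
    """Flattens the attribute list into one value object."""
    raw: bool                    # raw vs. cooked
    sensitivity: Sensitivity
    has_lineage: bool
    has_provenance: bool
    has_schema: bool
    formats: tuple              # e.g. ("csv", "parquet")

    @property
    def verified(self) -> bool:
        # "verified" = lineage + provenance + schema are all present
        return self.has_lineage and self.has_provenance and self.has_schema
```

This keeps the combinatorial explosion of classifications out of the resource hierarchy: the dataset is one resource, and the classification is just data on it.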
For some of these datasets we want to build a lightweight API for product teams. Mostly we are providing aggregated views of entities, or we are aggregating several data sources to produce a new list of IDs.
Some examples are:
* monthly citation counts for a given author ID
* monthly citation counts for a given article ID or other work
* total reads of an article
Often the result is either a single number for a period of time, or a list of other IDs that you can then use to query elsewhere.
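To make the question concrete, the two result shapes could look something like this (hypothetical endpoint paths and field names, just to illustrate the metric-vs-ID-list split):

```python
from typing import TypedDict


class MetricResponse(TypedDict):
    """Single number for a period, e.g. GET /authors/{id}/citations?period=2023-06"""
    entity_id: str
    period: str   # e.g. "2023-06"
    value: int


class IdListResponse(TypedDict):
    """List of IDs to query elsewhere, e.g. GET /articles/{id}/citing-works"""
    entity_id: str
    ids: list


def monthly_citations(author_id: str, period: str) -> MetricResponse:
    # Stub: a real implementation would read from the aggregation store.
    return {"entity_id": author_id, "period": period, "value": 0}
```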
Any nice guidelines out there?
Thanks