Hello Everyone,
I'm looking for guidance, recommendations, or reading on designing APIs for large datasets that are often difficult to model. In this case I'm struggling to model these things as resources, because we have many different classifications of data and datasets.
A Dataset may:
- be raw or cooked
- contain personally identifiable information (or not)
- contain sensitive personally identifiable information (or not)
- be verified or unverified - by "verified" we mean it has:
  - data lineage
  - data provenance
  - a data schema
- be published in multiple formats
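For what it's worth, one way to think about the attributes above is as a single classification value object rather than separate resource types per combination. A rough sketch in Python (all names here are made up, not anything from an existing system):

```python
from dataclasses import dataclass
from enum import Enum


class Sensitivity(Enum):
    NONE = "none"
    PII = "pii"                      # personally identifiable information
    SENSITIVE_PII = "sensitive_pii"  # sensitive PII


@dataclass(frozen=True)
class DatasetClassification:
    """Flattens the attribute list into one value object."""
    raw: bool                    # raw vs. cooked
    sensitivity: Sensitivity
    has_lineage: bool
    has_provenance: bool
    has_schema: bool
    formats: tuple              # e.g. ("csv", "parquet")

    @property
    def verified(self) -> bool:
        # "verified" = lineage + provenance + schema are all present
        return self.has_lineage and self.has_provenance and self.has_schema
```

This keeps the combinatorial explosion of classifications out of the resource hierarchy: the dataset is one resource, and the classification is just data on it.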
For some of these datasets we want to build a lightweight API for product teams. Mostly we are providing aggregated views of entities, or we are aggregating several data sources to produce a new list of IDs.
Some examples are:
* monthly citation counts for a given author ID
* monthly citation counts for a given article ID or other work
* total reads of an article
Often the result is either a single number for a period of time, or a list of other IDs that you can then use to query elsewhere.
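To make the question concrete, the two result shapes could look something like this (hypothetical endpoint paths and field names, just to illustrate the metric-vs-ID-list split):

```python
from typing import TypedDict


class MetricResponse(TypedDict):
    """Single number for a period, e.g. GET /authors/{id}/citations?period=2023-06"""
    entity_id: str
    period: str   # e.g. "2023-06"
    value: int


class IdListResponse(TypedDict):
    """List of IDs to query elsewhere, e.g. GET /articles/{id}/citing-works"""
    entity_id: str
    ids: list


def monthly_citations(author_id: str, period: str) -> MetricResponse:
    # Stub: a real implementation would read from the aggregation store.
    return {"entity_id": author_id, "period": period, "value": 0}
```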
Any nice guidelines out there?
Thanks