Pandas x Cube.js: I need your help! 🐍

Philippe Hebert

Feb 17, 2021, 12:07:00 AM
to PyData
Hi PyData community!

I'm reaching out to the community today because I would like your help and your opinion!

Our team at Arthur Intelligence has been using Pandas x Cube.js for nearly a year now, and we have come to a point where we think there is a use case for building a proper Python client for Cube.js.

What is Cube.js?

Cube.js is an open source analytical API platform dedicated to helping data analysts and data scientists build business intelligence viz & tools faster and solve one of the two hardest problems in computer science: cache invalidation!

In their own words:

"""
Why Cube.js?

If you are building your own business intelligence tool or customer-facing analytics most probably you'll face the following problems:

  1. Performance. Most of effort time in modern analytics software development is spent to provide adequate time to insight. In the world where every company data is a big data writing just SQL query to get insight isn't enough anymore.
  2. SQL code organization. Modelling even a dozen of metrics with a dozen of dimensions using pure SQL queries sooner or later becomes a maintenance nightmare which ends up in building modelling framework.
  3. Infrastructure. Key components every production-ready analytics solution requires: analytic SQL generation, query results caching and execution orchestration, data pre-aggregation, security, API for query results fetch, and visualization.

Cube.js has necessary infrastructure for every analytic application that heavily relies on its caching and pre-aggregation layer to provide several minutes raw data to insight delay and sub second API response times on a trillion of data points scale.
"""

If you are curious, please take a look:

Official website: https://cube.dev/
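To make this concrete, here is a minimal sketch of what querying Cube.js from plain Python can look like today, using only the standard library. It assumes a Cube.js server exposing its REST API at `/cubejs-api/v1/load` (on a hypothetical `localhost:4000`), an API token sent in the `Authorization` header, and a hypothetical `Orders` cube with `count`, `status` and `createdAt` members. Swap in your own deployment and schema.

```python
import json
import urllib.parse
import urllib.request

# Hypothetical deployment URL; point this at your own Cube.js server.
API_URL = "http://localhost:4000/cubejs-api/v1/load"


def build_query():
    """Return a Cube.js query dict; the `Orders` cube and its members are hypothetical."""
    return {
        "measures": ["Orders.count"],
        "dimensions": ["Orders.status"],
        "timeDimensions": [
            {
                "dimension": "Orders.createdAt",
                "granularity": "month",
                "dateRange": ["2020-01-01", "2020-12-31"],
            }
        ],
        "filters": [
            {"member": "Orders.status", "operator": "equals", "values": ["completed"]}
        ],
    }


def build_request(query, token, api_url=API_URL):
    """Encode the query as a JSON `query` URL parameter and attach the API token."""
    url = api_url + "?" + urllib.parse.urlencode({"query": json.dumps(query)})
    return urllib.request.Request(url, headers={"Authorization": token})


def load(query, token, api_url=API_URL):
    """Send the GET request and return the decoded JSON payload (needs a live server)."""
    with urllib.request.urlopen(build_request(query, token, api_url)) as resp:
        return json.load(resp)


# Usage (requires a running Cube.js server and a valid token):
# payload = load(build_query(), token="<your API token>")
# rows = payload["data"]
```

This sketch sends a single request and stops there. For long-running queries, Cube.js may answer with a "Continue wait" error while the result is still being computed, so a real client would have to poll until the data is ready. That kind of plumbing is exactly what a proper Python client should hide.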

Why post in the PyData/Pandas community?

Pandas is an amazing tool with amazing power, but not all scientists are well-versed in SQL. With Cube.js we can easily abstract SQL away and make the interface between data scientists and their data compatriots easier.
I also think the flow from Cube.js to Pandas is fantastic: it lets me filter and pre-aggregate my results before loading them into a DataFrame for the final touch-ups, like adding cross-column computed fields, computing statistics, pivoting, using recordlinkage, etc.
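Here is a small sketch of that last step. It assumes a payload shaped like a Cube.js `/load` response, with one record per row under `data`, keyed by member name; the `Orders` members and the values below are made up for illustration. Depending on the database driver, measures can come back as strings, so the sketch coerces them with `pd.to_numeric` before adding a computed column and a pivot.

```python
import pandas as pd


def to_dataframe(payload, measures, time_columns=()):
    """Convert a Cube.js-style payload into a DataFrame with proper dtypes."""
    df = pd.DataFrame.from_records(payload["data"])
    for column in measures:
        # Measures may be serialized as strings; coerce them to numbers.
        df[column] = pd.to_numeric(df[column])
    for column in time_columns:
        # Time dimensions arrive as ISO-8601 strings; parse them as datetimes.
        df[column] = pd.to_datetime(df[column])
    return df


# Hypothetical payload shaped like a Cube.js /load response.
payload = {
    "data": [
        {"Orders.status": "completed", "Orders.createdAt.month": "2020-01-01T00:00:00.000",
         "Orders.count": "10", "Orders.totalAmount": "250.0"},
        {"Orders.status": "completed", "Orders.createdAt.month": "2020-02-01T00:00:00.000",
         "Orders.count": "4", "Orders.totalAmount": "120.0"},
        {"Orders.status": "shipped", "Orders.createdAt.month": "2020-01-01T00:00:00.000",
         "Orders.count": "6", "Orders.totalAmount": "90.0"},
    ]
}

df = to_dataframe(
    payload,
    measures=["Orders.count", "Orders.totalAmount"],
    time_columns=["Orders.createdAt.month"],
)

# Cross-column computed field.
df["avg_order_value"] = df["Orders.totalAmount"] / df["Orders.count"]

# Pivot: one row per month, one column per status, missing combinations filled with 0.
pivot = df.pivot_table(
    index="Orders.createdAt.month",
    columns="Orders.status",
    values="Orders.count",
    aggfunc="sum",
    fill_value=0,
)
```

Because Cube.js has already filtered and aggregated the data server-side, the DataFrame stays small and pandas only handles the final touch-ups.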

Why don't I contribute to the repository myself?

Well that's exactly what I intend to do! 😁
This being said, I am only a fledgling Pythonista, and I need your help and experience to design a client API that is Pythonic to the core 👌💋

The GitHub issue for the client implementation is here:


I want to make Cube.js easier to integrate into a Python codebase, so that every data analyst, scientist and engineer out there using Python can benefit from the goodness of Cube.js.

What do you think, is this something you'd like to contribute to?
I'd love to hear your opinion and see what you'll come up with!

Have a great day!
Cheers,
Philippe
@philippefutureboy