Athena Workgroup Cloudformation

Angelique Syria

Aug 5, 2024, 3:04:51 PM

to cofftrabhearta

Can someone help me write a CloudFormation script to update the output location of the Athena primary workgroup? When I run the code below, I get the error message "Invalid request provided: primary workGroup could not be created (Service: Athena, Status Code: 400, Request ID: 9945209c-6999-4e8b-bd3d-a3af13b4ac4f)"

You need to "create a change set to import an existing resource into a new or existing stack", because the 'primary' workgroup was created outside your stack. So you need to use the option "Create stack with existing resources".
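A minimal sketch of what that import could look like, written in Python so the template and the import descriptor are generated side by side. The logical ID `PrimaryWorkGroup`, the bucket URL, and the helper names are assumptions made up for illustration; the resource type `AWS::Athena::WorkGroup` and its `Name` identifier come from CloudFormation. Imported resources need a `DeletionPolicy`, and it is generally safer to import the workgroup with its current settings and change the output location in a later stack update.

```python
import json

LOGICAL_ID = "PrimaryWorkGroup"  # hypothetical logical ID, pick your own


def workgroup_template(output_location, name="primary"):
    """Return a CloudFormation template (as a dict) describing one Athena workgroup."""
    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Resources": {
            LOGICAL_ID: {
                "Type": "AWS::Athena::WorkGroup",
                # A DeletionPolicy is required on resources that are imported.
                "DeletionPolicy": "Retain",
                "Properties": {
                    "Name": name,
                    "WorkGroupConfiguration": {
                        "ResultConfiguration": {"OutputLocation": output_location}
                    },
                },
            }
        },
    }


def resources_to_import(name="primary"):
    """Return the descriptor that maps the existing workgroup to the logical ID."""
    return [
        {
            "ResourceType": "AWS::Athena::WorkGroup",
            "LogicalResourceId": LOGICAL_ID,
            "ResourceIdentifier": {"Name": name},
        }
    ]


if __name__ == "__main__":
    # The bucket below is a placeholder, not a real location.
    print(json.dumps(workgroup_template("s3://example-bucket/athena-results/"), indent=2))
    print(json.dumps(resources_to_import(), indent=2))
```

The two JSON documents can be saved to files and passed to an import change set, for example through the console option quoted above or through `aws cloudformation create-change-set` with `--change-set-type IMPORT` and `--resources-to-import`.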

Why not name the workgroup in your CloudFormation stack something other than "primary", and let the stack manage the workgroup resource completely? Then, depending on how the rest of the system is set up, it may work out to use this new workgroup instead of the primary one. Even if you do get it working to alter the primary workgroup output location in CloudFormation, I think it's clear that this swims upstream against CloudFormation's most natural usage patterns.
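As a rough sketch of that alternative, the helper below builds the resource entry for a workgroup that the stack owns outright. The function name, the example workgroup name, and the bucket URL are hypothetical; the property names are those of `AWS::Athena::WorkGroup`.

```python
def managed_workgroup(name, output_location):
    """Return a CloudFormation resource (as a dict) for a stack-owned workgroup."""
    if name == "primary":
        # "primary" already exists in the account, so asking CloudFormation
        # to create it fails with the error quoted in the question.
        raise ValueError("choose a name other than 'primary'")
    return {
        "Type": "AWS::Athena::WorkGroup",
        "Properties": {
            "Name": name,
            "State": "ENABLED",
            "WorkGroupConfiguration": {
                # Make clients use the workgroup's output location.
                "EnforceWorkGroupConfiguration": True,
                "ResultConfiguration": {"OutputLocation": output_location},
            },
        },
    }


if __name__ == "__main__":
    # Placeholder names for illustration only.
    print(managed_workgroup("analytics", "s3://example-bucket/athena-results/"))
```

Queries then have to be submitted against the new workgroup name, which is the part that depends on how the rest of the system is set up.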

In a previous blog post (Building QuickSight Datasets with CDK - S3) we had a look at how files in S3 could be loaded into a QuickSight dataset. In practice data in S3 is often accessed using Athena. In this new blog post we will see how to build a QuickSight Dataset with CDK directly making use of Athena.

The only parameter specific to Athena data sources is the workgroup. We need to make sure the selected workgroup stores query results in a location accessible to the QuickSight service role. We therefore define an Athena workgroup that writes its results within our bucket at the location athena-prefix/.

We now have a workgroup and can define our data source. When creating the data source, QuickSight will check access by creating and reading a file in the workgroup output folder. We add the managed policy as a dependency of the data source to make sure this happens after permissions have been granted.
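The post itself uses CDK; the sketch below is not that code but a guess at the CloudFormation-level shape it produces, expressed as a Python dict. The logical ID `QuickSightAthenaPolicy`, the data source ID, and the workgroup name are assumptions. The dependency on the managed policy is expressed with `DependsOn`.

```python
def athena_data_source(workgroup_name, policy_logical_id, data_source_id="athena-source"):
    """Return a QuickSight data source resource that waits for the policy."""
    return {
        "Type": "AWS::QuickSight::DataSource",
        # Create the data source only after the permissions exist, because
        # QuickSight tests access to the workgroup output folder on creation.
        "DependsOn": [policy_logical_id],
        "Properties": {
            "AwsAccountId": {"Ref": "AWS::AccountId"},
            "DataSourceId": data_source_id,
            "Name": data_source_id,
            "Type": "ATHENA",
            "DataSourceParameters": {
                "AthenaParameters": {"WorkGroup": workgroup_name}
            },
        },
    }


if __name__ == "__main__":
    # Hypothetical workgroup name and policy logical ID.
    print(athena_data_source("quicksight-workgroup", "QuickSightAthenaPolicy"))
```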

In this post we briefly prepared an Athena table and learned how to build a QuickSight dataset with CDK using Athena as a data source. We experimented with both table loading (RelationalTable) and custom SQL.

As with QuickSight Datasets built with S3 as data source, this may seem like much work when compared to directly using the QuickSight web console. However, keep in mind that the goal is to be able to automate it.

AWS BI solution Amazon QuickSight provides a neat and powerful web console to handle most use cases. Nevertheless, as soon as a need for automation appears, relying on IaC can help increase productivity. - by Franck Awounang Nekdem

AWS BI Solution Amazon QuickSight is a powerful tool to build interactive analyses or dashboards. In this blog post we will see how to get started with designing such analyses. - by Franck Awounang Nekdem

Almost 2 years ago, I started experimenting with QuickSight to solve some of the BI issues of the company I was working for. I appreciated QuickSight's first-class integration with many AWS data services and its low cost in comparison to other similar tools. It afforded us the ability to rapidly prototype analyses and dashboards. One glaring missing feature that left us scratching our heads was the lack of DynamoDB as a data source option.

I asked the StackOverflow hive-mind how to Visualize DynamoDB in AWS Quicksight, and it has become the most upvoted QuickSight question on the platform because there is a demand for this feature and there was no direct answer... until recently.

Most of the work-around solutions proposed involved exporting (i.e. duplicating) your DynamoDB data to a different repository such as S3 or RDS that could then be added as a data source in QuickSight. We ended up creating scheduled Glue jobs that would move DynamoDB data to S3. The S3 data was then crawled via AWS Glue Crawlers and exposed as AWS Athena tables, which were then added as QuickSight data sets. This worked but was more custom infrastructure than was desirable and also didn't allow for real-time direct queries.

In late 2019, AWS announced you could query any data source with Amazon Athena's new federated query feature. This was cool, and they even showed in diagrams the concept of querying DynamoDB, but it required us to develop and maintain our own implementation of the connector. The feature was also only in Preview, and the AWS QuickSight integration had not been updated to allow the usage of this new Athena feature.

In March 2020, I saw that AWS Athena announced a prebuilt Data Connector for DynamoDB. This was exciting, as I was able to quickly set up a data connector in Athena that could actually view and query DynamoDB data without any custom code, but this feature was still in Preview and there was no support in QuickSight yet.

Fast forward a year: Athena Data Connectors have become GA with the Athena Engine Version 2 in a handful of regions, and the integration features in QuickSight that allow you to select the required Athena workgroups and data sources are present.

Using the Athena Data Connectors as part of Athena Engine Version 2, I was able to finally visualize DynamoDB data in QuickSight without creating any bespoke resources or duplicating the data to another data source.

Now that our AthenaDynamoDBConnector function is deployed, click the refresh icon next to the "Choose Lambda function" dropdown list and you should now see the newly deployed Lambda function. Select your function, give the catalog a name, and click the "Connect" button.

Now that we have a DynamoDB data set (via Athena and the DynamoDB data connectors) created, we can finally visualize DynamoDB data via analyses and dashboards.

I am just using a tiny sample DynamoDB dataset for this example so it's not the most interesting visualization but hopefully you get the idea!

It's been a long time coming, but I'm glad it's finally possible to get DynamoDB data into QuickSight without custom resources and duplicate data. This is just one example of the many data sources that can now be added more easily via the Athena Engine Version 2 data connectors and the new ability to choose Athena workgroups and Athena catalogs in QuickSight.

I'm sure this article will be obsolete sooner rather than later as technology changes and Amazon continues to release new features, but it was a good exercise to explore some of these new features.

When the Connector (Lambda function) is deployed, it will scan your DynamoDB table to work out the table schema, but I'm pretty sure it only takes the first 3 items or so. I found my Athena tables were missing a lot of attributes for this reason. There is a workaround: go into the DynamoDB console and create a new item, adding every possible attribute to it. This item should appear first in your next scan. Then head into Athena and deploy the connector; it should identify all attributes from your dummy item.
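If the table has many attributes, the dummy item can be assembled from a handful of real items rather than typed by hand. The sketch below is plain Python with no AWS calls; the function name and the sample data are invented for illustration, and it takes the schema-sampling behaviour described above on trust.

```python
def build_dummy_item(sample_items, key_values):
    """Merge every attribute seen in sample_items into one item.

    The first value seen for each attribute is kept so that the dummy
    item carries a representative type for it. key_values supplies the
    dummy item's own primary key and overrides any sampled key values.
    """
    dummy = {}
    for item in sample_items:
        for name, value in item.items():
            dummy.setdefault(name, value)
    dummy.update(key_values)
    return dummy


if __name__ == "__main__":
    # Invented sample items with different attribute sets.
    samples = [
        {"pk": "user#1", "email": "a@example.com"},
        {"pk": "user#2", "age": 41, "tags": ["x"]},
    ]
    print(build_dummy_item(samples, {"pk": "000-dummy"}))
```

The resulting dict can then be written to the table through the console or any DynamoDB client.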

I haven't looked into this specific issue, but often anything that isn't covered by a CloudFormation resource can be targeted with a Lambda function through a custom resource. It's a bit more work and there are a few interesting behaviours, but if you want everything in code I think it's the only way to go.

The recommended way to attach the AWS Lambda policy is to navigate to the QuickSight management console and change permissions in "Security & permissions." From here, "Add or remove" QuickSight access to AWS services. For S3, navigate to details and select the buckets to write to (the spill bucket). At this time, you should also be prompted to grant Lambda access.

Setting up Athena as your data source requires that you set up the proper permissions within AWS for GrowthBook to access, and then provide the correct credentials so that GrowthBook can make use of those permissions. There is also optional connection data that will help GrowthBook create the correct default SQL to analyze your data.

The managed AWSQuicksightAthenaAccess policy is a good starting point. You will also need to give it permission to read from the S3 tables that hold your event data, by taking a modified version of the AmazonS3ReadOnlyAccess policy whose resources are confined to only those tables that hold your event data. For example with the following policy after replacing the :
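The policy referred to above is not reproduced here. As a stand-in, the sketch below generates a narrowed read-only policy document; the bucket name, the prefix, and the exact choice of actions are assumptions and may differ from the policy the original text had in mind.

```python
import json


def read_only_policy(bucket, prefix=""):
    """Return an IAM policy document limited to reading one bucket/prefix."""
    bucket_arn = f"arn:aws:s3:::{bucket}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Bucket-level actions apply to the bucket ARN itself.
                "Effect": "Allow",
                "Action": ["s3:GetBucketLocation", "s3:ListBucket"],
                "Resource": bucket_arn,
            },
            {
                # Object-level reads apply to keys under the prefix.
                "Effect": "Allow",
                "Action": ["s3:GetObject"],
                "Resource": f"{bucket_arn}/{prefix}*",
            },
        ],
    }


if __name__ == "__main__":
    # "example-event-data" and "events/" are placeholders.
    print(json.dumps(read_only_policy("example-event-data", "events/"), indent=2))
```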

You must then create an access key by clicking on Security Credentials, then "Create Access Key". You can then choose "Third-party service". It will warn you that this is not best practice, but unfortunately this is the only way to give GrowthBook access at the moment. We are working on other ways to connect in the future. You can then confirm and click next. You can add a tag if you like and then press "Create access key". You should then see the following screen:

In another browser tab, open the GrowthBook Data Sources tab and choose your event tracker. Select Athena as your data source type. You can then copy the Access Key and Secret Access Key from the AWS browser tab to their corresponding fields.

If you are self-hosting, then in addition to the method above you can also pass the credentials in via environment variables or as part of the instance metadata. You can select which method you want in the Authentication Method field.

S3 Results URL - This is the S3 URL where the results of Athena queries get saved. When setting up Athena for the S3 results URL, we recommend naming your bucket with the prefix aws-athena-query-results-, as the AWSQuicksightAthenaAccess policy gives permission to write to any bucket with this prefix. If GrowthBook warns you that it cannot write to an S3 location other than the one you select here, it is most likely because you have set the workgroup to override client-side settings. If that is the case, you would either need to change that setting or add the permissions for GrowthBook to also write to the S3 results URL saved there.
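A small helper can flag a results URL whose bucket falls outside that prefix before it is saved. This is only a sketch using the Python standard library; the function name and the example URLs are made up.

```python
from urllib.parse import urlparse

DEFAULT_PREFIX = "aws-athena-query-results-"


def bucket_matches_default_prefix(s3_url):
    """Return True if the bucket in an s3:// URL starts with the prefix above."""
    parsed = urlparse(s3_url)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an s3:// URL: {s3_url!r}")
    # For s3:// URLs the bucket name is the network-location part.
    return parsed.netloc.startswith(DEFAULT_PREFIX)


if __name__ == "__main__":
    # Placeholder bucket names for illustration.
    print(bucket_matches_default_prefix("s3://aws-athena-query-results-example/growthbook/"))
    print(bucket_matches_default_prefix("s3://my-other-bucket/results/"))
```

A `False` result does not mean the location is unusable, only that write access to it has to be granted separately.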

Recently Amazon Athena introduced Federated Query, which can be used to run SQL queries across data stored in relational, non-relational, object, and custom data sources. Athena uses data source connectors that run on AWS Lambda to execute federated queries. The data source connectors help connect with data sources like CloudWatch, DocumentDB, DynamoDB, HBase, and JDBC data sources (like Redshift, MySQL, and SQL Server), among others.