Building a database

In this lab, once the input data is in place at the S3 path described at the beginning of the lab, we will configure an AWS Glue crawler to run on a schedule, *once a day*. The crawler scans the S3 path that holds the input Parquet file, then creates a database with the accompanying tables. When a new version of the report is available, the table is automatically updated.

Amazon Athena lets us access and view the contents of the Parquet file using SQL. It is a serverless service for running SQL queries over large amounts of data. Unlike a traditional database solution, Athena charges only for the data each query scans.
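
To make the pay-per-scan model concrete, here is a minimal sketch that estimates the cost of a single query from the number of bytes it scanned. The price of 5 USD per TB, the 10 MB per-query minimum, and the binary definition of a terabyte are assumptions for illustration; check the current Athena pricing for your Region. The helper names are hypothetical.

```python
# Assumed pricing values for illustration only; verify against current Athena pricing.
PRICE_PER_TB_USD = 5.00
MIN_BYTES_PER_QUERY = 10 * 1024**2  # assumed 10 MB minimum billed per query
BYTES_PER_TB = 1024**4              # assumed binary terabyte


def estimate_query_cost(bytes_scanned: int) -> float:
    """Estimate the cost in USD of one query from the bytes it scanned."""
    billable = max(bytes_scanned, MIN_BYTES_PER_QUERY)
    return billable / BYTES_PER_TB * PRICE_PER_TB_USD


def scanned_bytes(execution: dict) -> int:
    """Read DataScannedInBytes from an Athena get_query_execution response."""
    return execution["QueryExecution"]["Statistics"]["DataScannedInBytes"]
```

For example, a query that scans 200 GB would cost roughly `estimate_query_cost(200 * 1024**3)` USD under these assumptions, which is why columnar formats such as Parquet, where a query reads only the columns it needs, help keep Athena costs low.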

The detailed configuration steps for Amazon Athena to access data files through AWS Glue are as follows:

Go to the AWS Management Console
- Search for AWS Glue
- Select AWS Glue

In the AWS Glue interface
- Select Crawlers
- Select Create crawler

Configure the crawler: enter Cost_MasterCrawler as the Name, then select Next

Select Add a data source

Configure data source

Select S3 path

Complete the data source configuration.

After configuring the data source, select Next

In the security settings, select Create new IAM role

Enter the role name and select Create
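
For reference, the role creation step can also be sketched with boto3. This is an illustration under stated assumptions, not what the console does internally: the function name and the role name in the usage line are hypothetical, and the sketch only attaches the AWS managed policy `AWSGlueServiceRole`. The role additionally needs read access to the S3 path that holds the report, which is not shown here.

```python
import json

# AWS managed policy commonly attached to Glue crawler roles.
GLUE_MANAGED_POLICY_ARN = "arn:aws:iam::aws:policy/service-role/AWSGlueServiceRole"


def glue_trust_policy() -> dict:
    """Trust policy that lets the AWS Glue service assume the role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Service": "glue.amazonaws.com"},
                "Action": "sts:AssumeRole",
            }
        ],
    }


def create_glue_role(role_name: str, iam=None) -> None:
    """Create the role and attach the managed Glue policy (hypothetical helper)."""
    if iam is None:
        import boto3  # imported lazily so the policy builder works without boto3

        iam = boto3.client("iam")
    iam.create_role(
        RoleName=role_name,
        AssumeRolePolicyDocument=json.dumps(glue_trust_policy()),
    )
    iam.attach_role_policy(RoleName=role_name, PolicyArn=GLUE_MANAGED_POLICY_ARN)
```

A call such as `create_glue_role("AWSGlueServiceRole-CostMaster")` would create a role with a placeholder name; use whatever name you entered in the console step.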

After creating the role, select Next

Next, add a database

Select Add database

Enter costmaster as the database name, then select Create database

Complete database creation.

Once the database has been added, select Next

Review the configuration and select Create crawler

Complete crawler creation.
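
The console steps above can be approximated with boto3 as a minimal sketch. The crawler and database names come from this lab; the S3 path and role are placeholders you must supply, and the helper names are hypothetical. The daily schedule uses Glue's cron syntax, and the time of day (01:00 UTC) is an assumption, since the lab only specifies once a day.

```python
def crawler_request(s3_path: str, role: str) -> dict:
    """Build the parameters for glue.create_crawler (hypothetical helper)."""
    return {
        "Name": "Cost_MasterCrawler",
        "Role": role,
        "DatabaseName": "costmaster",
        "Targets": {"S3Targets": [{"Path": s3_path}]},
        # Once a day; the 01:00 UTC run time is an assumed value.
        "Schedule": "cron(0 1 * * ? *)",
    }


def create_database_and_crawler(s3_path: str, role: str, glue=None) -> None:
    """Create the costmaster database, then the crawler that populates it."""
    if glue is None:
        import boto3  # imported lazily so the request builder works without boto3

        glue = boto3.client("glue")
    glue.create_database(DatabaseInput={"Name": "costmaster"})
    glue.create_crawler(**crawler_request(s3_path, role))
```

Keeping the request in a plain dictionary makes it easy to review the configuration before sending it, much like the review page in the console.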

Select Run crawler

The crawler run takes about 1 minute to start.

The crawler run is starting.

The crawler run completes successfully.
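
Starting the crawler and waiting for it to finish can also be scripted. The sketch below assumes a boto3 Glue client is passed in; the function name, polling interval, and timeout are illustrative choices, not values from the lab.

```python
import time


def run_crawler_and_wait(
    glue,
    name: str = "Cost_MasterCrawler",
    poll_seconds: float = 15.0,
    timeout_seconds: float = 600.0,
) -> str:
    """Start the crawler, poll until it is READY again, and return the last crawl status."""
    glue.start_crawler(Name=name)
    deadline = time.monotonic() + timeout_seconds
    while time.monotonic() < deadline:
        crawler = glue.get_crawler(Name=name)["Crawler"]
        # The crawler state returns to READY once the run has finished.
        if crawler["State"] == "READY":
            return crawler.get("LastCrawl", {}).get("Status", "UNKNOWN")
        time.sleep(poll_seconds)
    raise TimeoutError(f"Crawler {name} did not finish within {timeout_seconds} seconds")
```

With real credentials this would be called as `run_crawler_and_wait(boto3.client("glue"))`; a returned status of `SUCCEEDED` corresponds to the successful run shown in the console.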

Check the AWS Glue tables. A table named monthly_report is listed.

View the detailed table information.
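
The same table details can be read from the Glue Data Catalog in code. This is a minimal sketch, assuming a boto3 Glue client is passed in and that the crawler created the monthly_report table in the costmaster database; the function name is hypothetical.

```python
def describe_table(glue, database: str = "costmaster", table: str = "monthly_report"):
    """Return (column name, column type) pairs for a table in the Glue Data Catalog."""
    response = glue.get_table(DatabaseName=database, Name=table)
    columns = response["Table"]["StorageDescriptor"]["Columns"]
    return [(column["Name"], column.get("Type", "")) for column in columns]
```

Printing the result of `describe_table(boto3.client("glue"))` lists the schema the crawler inferred from the Parquet file, which is useful to check before writing Athena queries against the table.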
