Building a database

In this lab, once the input data is in place at the path set up at the beginning of the lab, we configure an AWS Glue crawler to run on a schedule, *once a day*. The crawler scans the S3 path that holds the input Parquet file and records the resulting tables in a database in the AWS Glue Data Catalog. When a new version of the report is available, the table is updated automatically on the next crawler run.
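For reference, a once-a-day crawler schedule is written in AWS Glue as a six-field cron expression (minutes, hours, day of month, month, day of week, year) evaluated in UTC. The sketch below builds such an expression. `daily_glue_schedule` is a hypothetical helper, and the default run time of 00:00 UTC is an assumption, since the lab does not state one.

```python
def daily_glue_schedule(hour: int = 0, minute: int = 0) -> str:
    """Return a Glue cron expression that fires once a day at hour:minute UTC.

    Hypothetical helper; the 00:00 UTC default is an assumption.
    """
    if not 0 <= hour <= 23:
        raise ValueError("hour must be between 0 and 23")
    if not 0 <= minute <= 59:
        raise ValueError("minute must be between 0 and 59")
    # Fields: minutes hours day-of-month month day-of-week year.
    # Day-of-week is '?' because day-of-month is already set to '*'
    # (Glue cron does not allow both fields to be specified).
    return f"cron({minute} {hour} * * ? *)"


print(daily_glue_schedule())  # cron(0 0 * * ? *)
```

The resulting string is what a crawler's schedule setting holds; choosing a daily frequency in the console's crawler schedule settings has the same effect.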

Amazon Athena lets us view the contents of the Parquet file with SQL. Athena is a serverless service for running SQL queries over large amounts of data stored in Amazon S3. Unlike a traditional database, Athena charges only for the data each query scans.
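As a sketch of what querying through Athena looks like programmatically, the snippet below prepares a request for boto3's Athena `start_query_execution` call. It assumes the `costmaster` database and `monthly_report` table that the crawler creates in this lab; the results bucket `s3://example-athena-results/queries/` and the helper names are hypothetical.

```python
def build_athena_request(sql: str, database: str, output_location: str) -> dict:
    """Keyword arguments for boto3's Athena start_query_execution call."""
    if not output_location.startswith("s3://"):
        raise ValueError("output_location must be an s3:// URI")
    return {
        "QueryString": sql,
        "QueryExecutionContext": {"Database": database},
        "ResultConfiguration": {"OutputLocation": output_location},
    }


def run_query(request: dict) -> str:
    """Submit the query and return its execution ID (needs AWS credentials)."""
    import boto3  # imported lazily so the request builder works without boto3

    athena = boto3.client("athena")
    return athena.start_query_execution(**request)["QueryExecutionId"]


# Hypothetical results bucket; replace it with a bucket you own.
request = build_athena_request(
    sql="SELECT * FROM monthly_report LIMIT 10",
    database="costmaster",
    output_location="s3://example-athena-results/queries/",
)
print(request["QueryExecutionContext"])  # {'Database': 'costmaster'}
```

`start_query_execution` only submits the query and returns an execution ID; the results are fetched afterwards (for example with `get_query_results`) once the query has finished. Because Parquet is a columnar format, selecting only the columns you need instead of `SELECT *` reduces the data scanned and therefore the cost.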

The detailed steps for configuring AWS Glue so that Amazon Athena can access the data files are as follows:

  1. Go to the AWS Management Console

    • Find AWS Glue
    • Select AWS Glue

  1. In the AWS Glue console

    • Select Crawlers
    • Select Create crawler

  1. Configure the crawler: enter Cost_MasterCrawler as the Name, then select Next

  1. Select Add a data source

  1. Configure the data source

  1. Select the S3 path that contains the input Parquet file

  1. Complete the data source configuration.

  1. After configuring the data source, select Next

  1. In the security settings, select Create new IAM role

  1. Enter the role name and select Create
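The Create new IAM role option produces a role that the AWS Glue service can assume. A rough boto3 equivalent is sketched below, assuming the AWS managed policy `AWSGlueServiceRole` for the base crawler permissions. `create_crawler_role` is a hypothetical helper, and the role name is yours to choose.

```python
import json

# AWS managed policy with the base permissions a Glue crawler needs.
GLUE_SERVICE_POLICY_ARN = "arn:aws:iam::aws:policy/service-role/AWSGlueServiceRole"


def glue_trust_policy() -> str:
    """Trust policy (as a JSON string) that lets the AWS Glue service assume the role."""
    return json.dumps({
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Principal": {"Service": "glue.amazonaws.com"},
            "Action": "sts:AssumeRole",
        }],
    })


def create_crawler_role(role_name: str) -> str:
    """Create the role, attach the Glue service policy, and return the role ARN.

    Hypothetical helper; needs AWS credentials with IAM permissions.
    """
    import boto3  # lazy import: only needed when actually calling AWS

    iam = boto3.client("iam")
    role = iam.create_role(
        RoleName=role_name,
        AssumeRolePolicyDocument=glue_trust_policy(),
    )
    iam.attach_role_policy(RoleName=role_name, PolicyArn=GLUE_SERVICE_POLICY_ARN)
    return role["Role"]["Arn"]
```

On its own, the managed policy only allows reading objects in buckets or paths whose names contain `aws-glue-`, so the role also needs `s3:GetObject` on the S3 path holding the Parquet file; the console option typically adds a policy scoped to the selected path for this.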

  1. After creating the role, select Next

  1. Set up the target database
    • Select Add database

  1. Enter costmaster as the database name, then select Create database

  1. The database is created.
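Adding the `costmaster` database in the console corresponds to a Glue `create_database` call. The sketch below builds its arguments. The helper names and the lowercase-only naming check are assumptions of this example (a conservative choice that keeps the name easy to use from Athena), not a rule stated in the lab.

```python
import re


def build_database_input(name: str, description: str = "") -> dict:
    """Keyword arguments for boto3's Glue create_database call."""
    # Conservative naming rule assumed for this sketch:
    # lowercase letters, digits and underscores only.
    if not re.fullmatch(r"[a-z0-9_]+", name):
        raise ValueError("use lowercase letters, digits and underscores only")
    database_input = {"Name": name}
    if description:
        database_input["Description"] = description
    return {"DatabaseInput": database_input}


def create_database(name: str) -> None:
    """Create the database in the Glue Data Catalog (needs AWS credentials)."""
    import boto3  # lazy import: only needed when actually calling AWS

    boto3.client("glue").create_database(**build_database_input(name))


print(build_database_input("costmaster"))  # {'DatabaseInput': {'Name': 'costmaster'}}
```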

  1. Once the database has been added, select Next

  1. Review the settings and select Create crawler
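The settings reviewed on this page map onto the parameters of Glue's `create_crawler` API. The sketch below assembles them. The role name `AWSGlueServiceRole-CostMaster` and the S3 path `s3://example-bucket/cost-report/` are hypothetical placeholders for the values chosen earlier, and the daily cron schedule is an assumption matching the once-a-day goal of the lab.

```python
from typing import Optional


def build_crawler_request(name: str, role: str, database: str, s3_path: str,
                          schedule: Optional[str] = None) -> dict:
    """Keyword arguments for boto3's Glue create_crawler call."""
    if not s3_path.startswith("s3://"):
        raise ValueError("s3_path must be an s3:// URI")
    request = {
        "Name": name,
        "Role": role,  # IAM role name or ARN
        "DatabaseName": database,
        "Targets": {"S3Targets": [{"Path": s3_path}]},
    }
    if schedule:
        request["Schedule"] = schedule  # omit for an on-demand crawler
    return request


def create_crawler(request: dict) -> None:
    """Create the crawler (needs AWS credentials)."""
    import boto3  # lazy import: only needed when actually calling AWS

    boto3.client("glue").create_crawler(**request)


request = build_crawler_request(
    name="Cost_MasterCrawler",
    role="AWSGlueServiceRole-CostMaster",        # hypothetical role name
    database="costmaster",
    s3_path="s3://example-bucket/cost-report/",  # hypothetical S3 path
    schedule="cron(0 0 * * ? *)",                # assumed: daily at 00:00 UTC
)
```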

  1. The crawler is created.

  1. Select Run crawler

  1. The crawler takes about 1 minute to start.

  1. The crawler starts running.

  1. The crawler run completes successfully.
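Starting the crawler and waiting for it to finish can also be scripted. The sketch below calls `start_crawler`, then polls `get_crawler` until the crawler's `State` returns to `READY` and reports the status of the last crawl. The Glue client is passed in (in practice `boto3.client("glue")`) so the polling logic can be exercised with a stand-in; `run_crawler_and_wait` and its timing defaults are assumptions.

```python
import time
from typing import Any, Callable


def run_crawler_and_wait(glue: Any, name: str, poll_seconds: float = 15.0,
                         timeout_seconds: float = 900.0,
                         sleep: Callable[[float], None] = time.sleep) -> str:
    """Start a Glue crawler and block until it is READY again.

    Returns the LastCrawl status (e.g. "SUCCEEDED" or "FAILED").
    Hypothetical helper; `glue` is a boto3 Glue client or a compatible stand-in.
    """
    glue.start_crawler(Name=name)
    waited = 0.0
    while waited < timeout_seconds:
        # Sleep before the first check so the crawler has left READY.
        sleep(poll_seconds)
        waited += poll_seconds
        crawler = glue.get_crawler(Name=name)["Crawler"]
        # State cycles READY -> RUNNING -> STOPPING -> READY.
        if crawler["State"] == "READY":
            return crawler.get("LastCrawl", {}).get("Status", "UNKNOWN")
    raise TimeoutError(f"crawler {name} still running after {timeout_seconds} s")
```

For example, `run_crawler_and_wait(boto3.client("glue"), "Cost_MasterCrawler")` should return `"SUCCEEDED"` after a run like the one above. The 15-second interval is a guess that fits the roughly one-minute startup time.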

  1. Open Tables in AWS Glue. The crawler has created a table named monthly_report

  1. Select the table to view its details.
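The table list and table details shown in the console come from the Glue Data Catalog, which can also be read with the Glue `get_tables` and `get_table` calls. The sketch below extracts table names and column names and types from those responses. The helper names are hypothetical, and the actual columns of `monthly_report` depend on the report file.

```python
from typing import List, Tuple


def table_names(get_tables_response: dict) -> List[str]:
    """Table names from one page of a get_tables response (pagination not handled)."""
    return [table["Name"] for table in get_tables_response.get("TableList", [])]


def table_columns(get_table_response: dict) -> List[Tuple[str, str]]:
    """(name, type) pairs for a table's data columns followed by its partition keys."""
    table = get_table_response["Table"]
    columns = table.get("StorageDescriptor", {}).get("Columns", [])
    # Partition keys are stored separately from the data columns.
    columns = columns + table.get("PartitionKeys", [])
    return [(col["Name"], col["Type"]) for col in columns]


def fetch_report_schema(database: str = "costmaster",
                        table: str = "monthly_report") -> List[Tuple[str, str]]:
    """Print the database's tables and return the crawled table's schema.

    Needs AWS credentials.
    """
    import boto3  # lazy import: only needed when actually calling AWS

    glue = boto3.client("glue")
    print(table_names(glue.get_tables(DatabaseName=database)))
    return table_columns(glue.get_table(DatabaseName=database, Name=table))
```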
