In this lab, once the input data is available at the path described at the beginning of the lab, we will configure an AWS Glue crawler to run on a schedule, *once a day*. The crawler scans the S3 path containing the input Parquet files and then creates a database with accompanying tables. When a new version of the report is available, the tables are updated automatically.
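The same crawler setup can also be expressed in code. The following is a minimal sketch using boto3, not the procedure this lab follows (the lab uses the console). The crawler name `Cost_MasterCrawler` and database name `costmaster` are taken from the steps in this lab as best they can be read; the role ARN, the S3 path, the 01:00 UTC run time, and the function names `build_crawler_request` and `create_daily_crawler` are assumptions made for illustration.

```python
def build_crawler_request(name, role_arn, database, s3_path):
    """Build the keyword arguments for the Glue CreateCrawler API call."""
    return {
        "Name": name,
        "Role": role_arn,
        "DatabaseName": database,
        "Targets": {"S3Targets": [{"Path": s3_path}]},
        # Glue uses a six-field cron expression (UTC).
        # This one runs once a day at 01:00 UTC (the hour is an assumption).
        "Schedule": "cron(0 1 * * ? *)",
        # Update existing tables when the crawler sees a changed schema.
        "SchemaChangePolicy": {
            "UpdateBehavior": "UPDATE_IN_DATABASE",
            "DeleteBehavior": "LOG",
        },
    }


def create_daily_crawler(request):
    """Send the request to AWS Glue. Requires boto3 and AWS credentials."""
    import boto3  # imported here so the builder above works without boto3

    glue = boto3.client("glue")
    glue.create_crawler(**request)


if __name__ == "__main__":
    request = build_crawler_request(
        name="Cost_MasterCrawler",
        role_arn="arn:aws:iam::123456789012:role/ExampleGlueRole",  # placeholder
        database="costmaster",
        s3_path="s3://example-bucket/example-report-path/",  # placeholder
    )
    print(request)
    # create_daily_crawler(request)  # uncomment to create the crawler
```

Keeping the request builder separate from the API call makes it possible to inspect the configuration before anything is created in the account.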
Amazon Athena lets us access and view the contents of the Parquet files with SQL. Athena is a serverless service that runs SQL queries over large amounts of data. Unlike a traditional database solution, Athena charges only for the data each query scans.
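As an illustration of querying the crawled data from code rather than the console, here is a minimal sketch using boto3. The table name `example_cost_table`, the results location, and the function names `build_query_request` and `run_query` are hypothetical; the database name `costmaster` is the one that appears in this lab's steps.

```python
def build_query_request(database, table, output_location, limit=10):
    """Build the keyword arguments for the Athena StartQueryExecution call."""
    return {
        # A LIMIT keeps the preview small.
        "QueryString": f'SELECT * FROM "{table}" LIMIT {int(limit)}',
        "QueryExecutionContext": {"Database": database},
        # Athena writes query results to this S3 location.
        "ResultConfiguration": {"OutputLocation": output_location},
    }


def run_query(request):
    """Start the query and return its execution ID. Requires boto3 and AWS credentials."""
    import boto3  # imported here so the builder above works without boto3

    athena = boto3.client("athena")
    response = athena.start_query_execution(**request)
    return response["QueryExecutionId"]


if __name__ == "__main__":
    request = build_query_request(
        database="costmaster",
        table="example_cost_table",  # hypothetical table name
        output_location="s3://example-bucket/athena-results/",  # placeholder
    )
    print(request["QueryString"])
    # print(run_query(request))  # uncomment to run against your account
```

Once a query has finished, `get_query_execution` reports the bytes scanned under `QueryExecution.Statistics.DataScannedInBytes`, which is the quantity that billing is based on.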
The detailed configuration steps for Amazon Athena to access data files through AWS Glue are as follows:
Go to the AWS Management Console.
In the AWS Glue interface, create a crawler named Cost_MasterCrawler, then select Next.
Enter costmaster as the database name, then select Create database.