If you’re storing large amounts of data in an Amazon S3 bucket, managing and finding specific files can become challenging. Amazon S3 offers a powerful feature to automatically capture and query metadata using Iceberg tables. In this article, we will see how you can set this up step by step for a bucket named comp-manufacturing-us.
Metadata helps you store information about your files, like size, creation date, tags, and more. Querying metadata allows you to quickly find files or analyze their properties without manually searching through thousands of objects.
Before starting, make sure:
You have an S3 bucket named “comp-manufacturing-us”.
1. Create a table bucket. The table bucket stores the metadata tables in Iceberg format.
aws s3tables create-table-bucket --name comp-manufacturing-tables --region us-east-1
2. Note the ARN (Amazon Resource Name) for this bucket. The create command prints it. To look it up again later, list your table buckets:
aws s3tables list-table-buckets --region us-east-1
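A table bucket ARN has a different shape from a general purpose bucket ARN: it uses the s3tables service prefix and carries the Region and account ID. The following is a minimal Python sketch for building and checking such an ARN before you paste it into a configuration file. The function names and the account ID 111122223333 are placeholders invented for illustration, and the name pattern is a loose approximation of the naming rules rather than the official validation.

```python
import re

# Assumed layout: arn:aws:s3tables:<region>:<account-id>:bucket/<name>
# The name pattern is a loose approximation (lowercase letters, digits, hyphens).
_TABLE_BUCKET_ARN = re.compile(
    r"^arn:aws:s3tables:(?P<region>[a-z0-9-]+):(?P<account_id>\d{12}):"
    r"bucket/(?P<name>[a-z0-9][a-z0-9-]{1,61}[a-z0-9])$"
)


def table_bucket_arn(region: str, account_id: str, name: str) -> str:
    """Build a table bucket ARN and reject obviously malformed input."""
    arn = f"arn:aws:s3tables:{region}:{account_id}:bucket/{name}"
    if _TABLE_BUCKET_ARN.match(arn) is None:
        raise ValueError(f"not a well-formed table bucket ARN: {arn}")
    return arn


def parse_table_bucket_arn(arn: str) -> dict:
    """Split a table bucket ARN into region, account_id and name."""
    match = _TABLE_BUCKET_ARN.match(arn)
    if match is None:
        raise ValueError(f"not a table bucket ARN: {arn}")
    return match.groupdict()


if __name__ == "__main__":
    # 111122223333 is a placeholder account ID.
    print(table_bucket_arn("us-east-1", "111122223333", "comp-manufacturing-tables"))
```

A general purpose bucket ARN such as arn:aws:s3:::comp-manufacturing-tables is rejected by parse_table_bucket_arn, which makes this kind of mix-up easy to catch early.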
Now we will link the metadata table bucket (comp-manufacturing-tables) to your main data bucket (comp-manufacturing-us).
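1. Create a file named metadata-config.json with the following content. A table bucket ARN uses the s3tables service prefix and includes the Region and account ID, so replace 111122223333 with your own account ID:
{
  "S3TablesDestination": {
    "TableBucketArn": "arn:aws:s3tables:us-east-1:111122223333:bucket/comp-manufacturing-tables",
    "TableName": "comp_manufacturing_metadata"
  }
}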
2. Run the following command to apply this configuration:
aws s3api create-bucket-metadata-table-configuration \
  --bucket comp-manufacturing-us \
  --metadata-table-configuration file://metadata-config.json \
  --region us-east-1
3. Confirm the setup by checking the configuration:
aws s3api get-bucket-metadata-table-configuration \
  --bucket comp-manufacturing-us \
  --region us-east-1
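The configuration and confirmation steps above can also be sketched in Python. This example generates the same JSON structure as metadata-config.json and interprets the JSON that the confirmation command prints. The response layout used here (a GetBucketMetadataTableConfigurationResult object with Status and Error fields) is an assumption about the CLI output, and the function names are hypothetical, so compare against what your CLI version actually returns.

```python
import json


def build_metadata_config(table_bucket_arn: str, table_name: str) -> dict:
    """Return the structure expected in metadata-config.json."""
    return {
        "S3TablesDestination": {
            "TableBucketArn": table_bucket_arn,
            "TableName": table_name,
        }
    }


def metadata_table_status(response: dict) -> str:
    """Extract the status from the confirmation output.

    Assumed layout: {"GetBucketMetadataTableConfigurationResult":
                     {"Status": "...", "Error": {"ErrorCode": ..., "ErrorMessage": ...}}}
    Raises RuntimeError when the reported status is FAILED.
    """
    result = response.get("GetBucketMetadataTableConfigurationResult", {})
    status = result.get("Status", "UNKNOWN")
    if status == "FAILED":
        error = result.get("Error", {})
        code = error.get("ErrorCode", "unknown")
        message = error.get("ErrorMessage", "")
        raise RuntimeError(f"metadata table creation failed: {code}: {message}")
    return status


if __name__ == "__main__":
    # 111122223333 is a placeholder account ID.
    config = build_metadata_config(
        "arn:aws:s3tables:us-east-1:111122223333:bucket/comp-manufacturing-tables",
        "comp_manufacturing_metadata",
    )
    print(json.dumps(config, indent=2))
```

Save the printed JSON as metadata-config.json, and pass the parsed output of the confirmation command to metadata_table_status to see whether the table is ready to query.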
Now, add or modify files in your comp-manufacturing-us bucket. The metadata for these objects will be automatically captured.
For example, upload a file:
aws s3 cp manufacturing_report.csv s3://comp-manufacturing-us/reports/manufacturing_report.csv
Amazon Athena allows you to query the metadata table to find information about the files in your bucket.
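2. Locate the Metadata Table in Athena:
You do not need to define the table yourself. S3 creates and maintains it in the aws_s3_metadata namespace of the table bucket, and a hand-written CREATE EXTERNAL TABLE statement would not point Athena at it. After the table bucket is integrated with AWS analytics services, select the catalog s3tablescatalog/comp-manufacturing-tables and the database aws_s3_metadata in the Athena query editor. Then run this SQL query to preview some of the available columns:
SELECT key, size, last_modified_date, storage_class, encryption_status FROM comp_manufacturing_metadata LIMIT 5;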
3. Run Queries to Explore Your Metadata. For example, list the ten largest objects:
SELECT key, size FROM comp_manufacturing_metadata ORDER BY size DESC LIMIT 10;
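If you run such queries from a script instead of the query editor, the table usually has to be referenced by its full name. The Python sketch below composes that reference and the "largest objects" query as strings. It assumes the catalog is exposed as s3tablescatalog/<table bucket name> and that the metadata table lives in the aws_s3_metadata namespace. The helper names are hypothetical, and nothing here contacts Athena.

```python
def metadata_table_ref(table_bucket: str, table_name: str,
                       namespace: str = "aws_s3_metadata") -> str:
    """Return a fully qualified, double-quoted table reference for Athena.

    Assumes the catalog name has the form s3tablescatalog/<table bucket name>.
    """
    for part in (table_bucket, table_name, namespace):
        if '"' in part:
            raise ValueError(f"unexpected quote in identifier: {part!r}")
    return f'"s3tablescatalog/{table_bucket}"."{namespace}"."{table_name}"'


def largest_objects_query(table_ref: str, limit: int = 10) -> str:
    """Build the query that lists the largest objects by size."""
    if isinstance(limit, bool) or not isinstance(limit, int) or limit <= 0:
        raise ValueError("limit must be a positive integer")
    return f"SELECT key, size FROM {table_ref} ORDER BY size DESC LIMIT {limit}"


if __name__ == "__main__":
    ref = metadata_table_ref("comp-manufacturing-tables", "comp_manufacturing_metadata")
    print(largest_objects_query(ref, limit=10))
```

Keeping the limit as a validated integer and rejecting quotes in identifiers avoids building malformed SQL from user-supplied values.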
By setting up queryable metadata in S3 with Iceberg tables, you can make managing and analyzing your files much easier. Whether you’re dealing with manufacturing data or any other large datasets, this solution helps you find and understand your data quickly.