AWS Athena Overview

Trushang SutharTrushang Suthar
3 min read

Key Features

  1. Serverless – No need to manage infrastructure; AWS handles provisioning, scaling, and maintenance.

  2. SQL-Based Queries – Uses standard SQL (Presto/Trino) to analyze data stored in Amazon S3.

  3. Pay-Per-Query – Charges are based on the amount of data scanned per query.

  4. Schema-on-Read – Unlike traditional databases, Athena does not require predefined schema; it infers the schema when querying data.

  5. Integration with AWS Services – Works well with AWS Glue for metadata cataloging, AWS QuickSight for visualization, and AWS Lake Formation for data governance.

  6. Secure and Compliant – Supports IAM-based access control, data encryption with AWS KMS, and integration with AWS Macie for security insights.

  7. Supports Multiple Data Formats – Can query structured and semi-structured data in formats like CSV, JSON, ORC, Avro, and Parquet.

  8. Federated Query Support – Can query data across multiple sources, including relational databases, NoSQL databases, and on-premises data stores.


How AWS Athena Works

  1. Data Storage:

    • Athena queries data stored in Amazon S3.

    • Data remains in S3, and Athena does not store data separately.

  2. Schema Definition:

    • Users define a schema using AWS Glue or manually in Athena’s console.

    • Schema-on-Read approach means you don’t need to load data before querying.

  3. Query Execution:

    • Users submit SQL queries via the AWS Console, API, or SDKs.

    • Athena uses a distributed query engine (Presto/Trino) to process the request.

    • The result is returned to the user and optionally stored in S3.

  4. Performance Optimization:

    • Use partitioning to reduce the amount of data scanned.

    • Convert data to columnar formats like Parquet or ORC to improve query performance.

    • Utilize AWS Glue Catalog for schema management.


Use Case Example

Scenario: Analyzing Web Logs Stored in Amazon S3

Context:
A company stores its web server logs in Amazon S3 and wants to analyze user behavior without setting up a traditional database.

Solution Using AWS Athena:

  1. Upload log files (e.g., JSON or CSV format) to Amazon S3.

  2. Create a table in Athena with a schema matching the log file structure.

  3. Run SQL queries like:

     SELECT user_ip, COUNT(*) AS visit_count 
     FROM web_logs 
     WHERE request_url = '/checkout' 
     GROUP BY user_ip 
     ORDER BY visit_count DESC 
     LIMIT 10;
    
  4. Results show the top 10 users accessing the checkout page, helping the business optimize conversions.

Benefits:
✅ No need to set up a database.
✅ Pay only for the queries run.
✅ Scale automatically for large datasets.


Topics for AWS DVA-C02

For the AWS Developer Associate exam (DVA-C02), focus on the following Athena-related topics:

  1. Integration with AWS S3

    • How Athena queries S3 data

    • Best practices for storing and structuring S3 data

  2. Query Performance Optimization

    • Using partitions and bucketing

    • Converting data to Parquet/ORC for efficiency

  3. Security & Access Control

    • Managing IAM policies for Athena

    • Encrypting query results with AWS KMS

  4. AWS Glue & Schema Management

    • Using AWS Glue Catalog to define schemas

    • Automating schema discovery with Glue Crawlers

  5. Federated Queries

    • Querying external databases using Athena Federated Query
  6. Monitoring & Logging

    • Using AWS CloudTrail to audit Athena queries

    • Tracking query execution history in AWS Console


0
Subscribe to my newsletter

Read articles from Trushang Suthar directly inside your inbox. Subscribe to the newsletter, and don't miss out.

Written by

Trushang Suthar
Trushang Suthar

Code is like a puzzle—sometimes you just need to step back, take a breath, and the solution clicks.