Build Real-Time Analytics with Java 21, Apache Kafka, Flink, and AWS QuickSight

Vitali R

Real-time analytics is at the heart of many modern applications, from fraud detection to personalized dashboards. In this guide, we'll walk through how to build a real-time analytics pipeline using:

  • 🔧 Java 21

  • 📬 Apache Kafka

  • 🔁 Apache Flink

  • ☁️ AWS S3 & QuickSight


🔗 Architecture Overview

[Java App] β†’ [Apache Kafka] β†’ [Flink Job] β†’ [Amazon S3 or Redshift] β†’ [AWS QuickSight]

This pipeline ingests events, processes them in real time using Flink, stores results in S3, and visualizes aggregated data in QuickSight.


🧱 Key Components

✅ Kafka: Event Ingestion

We use Kafka to collect and stream real-time events.

Properties props = new Properties();
props.put("bootstrap.servers", "localhost:9092");
props.put("key.serializer", StringSerializer.class.getName());
props.put("value.serializer", StringSerializer.class.getName());

// try-with-resources closes the producer on exit, which also flushes any buffered records
try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
    producer.send(new ProducerRecord<>("transactions", "txn123",
        "{\"transactionId\":\"txn123\",\"customerId\":\"cust001\",\"amount\":99.99,\"timestamp\":\"2025-06-09T13:00:00Z\"}"));
}

We'll consume transactions, group them by customer, and aggregate all amounts in a list.

🚚 1. Transaction POJO

public class Transaction {
    private String transactionId;
    private String customerId;
    private double amount;
    private Instant timestamp;
    // No-arg constructor, getters and setters (both Flink's POJO serializer and Jackson need them)
}

📈 2. AggregatedEvent POJO

public class AggregatedEvent {
    private String customerId;
    private List<Double> amounts;
    // Constructor, getters, setters
}

Map Kafka message (JSON) to Transaction

public class TransactionMapper implements MapFunction<String, Transaction> {
    // JavaTimeModule (from jackson-datatype-jsr310) lets Jackson parse the ISO-8601 timestamp into an Instant
    private static final ObjectMapper mapper = new ObjectMapper()
            .registerModule(new JavaTimeModule());

    @Override
    public Transaction map(String value) throws Exception {
        return mapper.readValue(value, Transaction.class);
    }
}

Map Transaction to AggregatedEvent

public class TransactionToAggregatedEventMapper implements MapFunction<Transaction, AggregatedEvent> {
    @Override
    public AggregatedEvent map(Transaction txn) {
        return new AggregatedEvent(txn.getCustomerId(), new ArrayList<>(List.of(txn.getAmount())));
    }
}

Aggregate all transaction amounts

public class AggregatedTransactionReducer implements ReduceFunction<AggregatedEvent> {
    @Override
    public AggregatedEvent reduce(AggregatedEvent e1, AggregatedEvent e2) {
        List<Double> combined = new ArrayList<>(e1.getAmounts());
        combined.addAll(e2.getAmounts());
        return new AggregatedEvent(e1.getCustomerId(), combined);
    }
}

StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

KafkaSource<String> source = KafkaSource.<String>builder()
    .setBootstrapServers("localhost:9092")
    .setTopics("transactions")
    .setGroupId("flink-analytics")
    .setValueOnlyDeserializer(new SimpleStringSchema())
    .build();

DataStream<Transaction> transactions = env
    .fromSource(source, WatermarkStrategy.noWatermarks(), "Kafka Source")
    .map(new TransactionMapper())
    // Event-time windows never fire without timestamps and watermarks, so assign them here
    .assignTimestampsAndWatermarks(
        WatermarkStrategy.<Transaction>forBoundedOutOfOrderness(Duration.ofSeconds(5))
            .withTimestampAssigner((txn, recordTs) -> txn.getTimestamp().toEpochMilli()));

DataStream<AggregatedEvent> aggregated = transactions
    .map(new TransactionToAggregatedEventMapper())
    .keyBy(AggregatedEvent::getCustomerId)
    .window(TumblingEventTimeWindows.of(Time.minutes(1)))
    .reduce(new AggregatedTransactionReducer());

aggregated.print(); // Replace with S3 or Redshift sink
env.execute("Real-time Analytics Pipeline");
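
One detail worth adding before running this for real: the Kafka source only commits its consumer offsets when a checkpoint completes, and file-based sinks only finalize their output files on checkpoints. A minimal sketch, assuming a 30-second interval fits your latency needs:

// Call before env.execute(): checkpoint every 30 seconds (an illustrative interval)
// so Kafka offsets are committed and downstream file sinks can finalize their part files.
env.enableCheckpointing(30_000);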

☁️ Sinking to AWS and Visualizing

Option A: Write to Amazon S3

  • Use Flink's FileSink (the successor to the deprecated StreamingFileSink) to store windowed aggregates; a sketch follows after these bullets.

  • Partition data by dt=YYYY-MM-DD/HH.
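
Here is a minimal sketch of Option A using FileSink, assuming the flink-s3-fs-hadoop filesystem plugin is installed and checkpointing is enabled (part files are only finalized when a checkpoint completes). The bucket name and the one-JSON-object-per-line output format are illustrative choices, not part of the original pipeline:

import org.apache.flink.api.common.serialization.SimpleStringEncoder;
import org.apache.flink.connector.file.sink.FileSink;
import org.apache.flink.core.fs.Path;
import org.apache.flink.streaming.api.functions.sink.filesystem.bucketassigners.DateTimeBucketAssigner;

// "my-analytics-bucket" is a placeholder bucket name
FileSink<String> s3Sink = FileSink
    .forRowFormat(new Path("s3a://my-analytics-bucket/aggregates"),
        new SimpleStringEncoder<String>("UTF-8"))
    // Produces folders like dt=2025-06-09/13, matching the partitioning scheme above
    .withBucketAssigner(new DateTimeBucketAssigner<>("'dt='yyyy-MM-dd/HH"))
    .build();

aggregated
    // One JSON object per line, keeping the per-window list of amounts as a JSON array
    .map(event -> "{\"customerId\":\"" + event.getCustomerId()
        + "\",\"amounts\":" + event.getAmounts() + "}")
    .sinkTo(s3Sink);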

Option B: Load to Redshift

  • Write to S3, then trigger COPY commands into Redshift (see the JDBC sketch after these bullets).

  • Define schema with Glue Crawler and connect from QuickSight.
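
And here is a minimal sketch of the COPY step from Option B, issued over JDBC. It assumes the Redshift JDBC driver is on the classpath and an aggregated_events table whose definition lines up with whatever format the sink wrote to S3; the cluster endpoint, database, S3 prefix, IAM role, and credentials are all placeholders:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

// All identifiers below (cluster endpoint, database, table, bucket, role) are placeholders
String url = "jdbc:redshift://my-cluster.abc123.eu-west-1.redshift.amazonaws.com:5439/analytics";
String copySql =
    "COPY aggregated_events " +
    "FROM 's3://my-analytics-bucket/aggregates/dt=2025-06-09/13/' " +
    "IAM_ROLE 'arn:aws:iam::123456789012:role/redshift-copy-role' " +
    "FORMAT AS JSON 'auto'";

try (Connection conn = DriverManager.getConnection(url, "admin", System.getenv("REDSHIFT_PASSWORD"));
     Statement stmt = conn.createStatement()) {
    stmt.execute(copySql);  // Redshift loads the partition straight from S3
}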


📈 Build QuickSight Dashboards

  1. Connect to S3 (via Athena) or Redshift.

  2. Import your Glue table or Redshift schema.

  3. Create visualizations with customer-wise aggregations and time-based filters.


🗜️ Class Diagram

+---------------------+
|     Transaction     |
+---------------------+
| transactionId       |
| customerId          |
| amount              |
| timestamp           |
+---------------------+

+--------------------------+
|    AggregatedEvent       |
+--------------------------+
| customerId               |
| List<Double> amounts     |
+--------------------------+

🧩 Component Diagram

+-------------+       +-------+       +-----------+       +---------------+       +------------+
| Java Client | ----> | Kafka | ----> | Flink Job | ----> | S3 / Redshift | ----> | QuickSight |
+-------------+       +-------+       +-----------+       +---------------+       +------------+

🧠 Final Thoughts

This setup is:

  • Streaming-first: near real-time insights

  • Memory-efficient: thanks to Flink and Kafka

  • Scalable: integrates with the full AWS analytics stack

You can take this further by adding:

  • Schema Registry (for Avro/Protobuf support)

  • Flink SQL or CEP patterns

  • Alerting via AWS SNS or EventBridge


Got questions or want to see this deployed in production? Drop them in the comments or reach out! 🚀
