Build Real-Time Analytics with Java 21, Apache Kafka, Flink, and AWS QuickSight

Design
Real-time analytics is at the heart of many modern applications, from fraud detection to personalized dashboards. In this guide, we'll walk through how to build a real-time analytics pipeline using:
Java 21
Apache Kafka
Apache Flink
AWS S3 & QuickSight
Architecture Overview
[Java App] → [Apache Kafka] → [Flink Job] → [Amazon S3 or Redshift] → [AWS QuickSight]
This pipeline ingests events, processes them in real time using Flink, stores results in S3, and visualizes aggregated data in QuickSight.
Key Components
Kafka: Event Ingestion
We use Kafka to collect and stream real-time transaction events.
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

Properties props = new Properties();
props.put("bootstrap.servers", "localhost:9092");
props.put("key.serializer", StringSerializer.class.getName());
props.put("value.serializer", StringSerializer.class.getName());

KafkaProducer<String, String> producer = new KafkaProducer<>(props);
producer.send(new ProducerRecord<>("transactions", "txn123", "{\"transactionId\":\"txn123\",\"customerId\":\"cust001\",\"amount\":99.99,\"timestamp\":\"2025-06-09T13:00:00Z\"}"));
Flink Streaming Job (Java 21)
We'll consume transactions, group them by customer over one-minute tumbling windows, and collect all amounts into a list.
1. Transaction POJO
public class Transaction {
    private String transactionId;
    private String customerId;
    private double amount;
    private Instant timestamp;
    // Getters and setters (plus a public no-arg constructor, so Flink can treat this as a POJO)
}
2. AggregatedEvent POJO
public class AggregatedEvent {
    private String customerId;
    private List<Double> amounts;
    // Constructor, getters, and setters
}
3. Flink Mapper and Reducer
Map a Kafka message (JSON) to a Transaction:
public class TransactionMapper implements MapFunction<String, Transaction> {
    // JavaTimeModule lets Jackson parse the ISO-8601 timestamp string into an Instant
    private static final ObjectMapper mapper = new ObjectMapper().registerModule(new JavaTimeModule());

    @Override
    public Transaction map(String value) throws Exception {
        return mapper.readValue(value, Transaction.class);
    }
}
Map a Transaction to an AggregatedEvent:
public class TransactionToAggregatedEventMapper implements MapFunction<Transaction, AggregatedEvent> {
    @Override
    public AggregatedEvent map(Transaction txn) {
        return new AggregatedEvent(txn.getCustomerId(), new ArrayList<>(List.of(txn.getAmount())));
    }
}
Aggregate all transaction amounts:
public class AggregatedTransactionReducer implements ReduceFunction<AggregatedEvent> {
    @Override
    public AggregatedEvent reduce(AggregatedEvent e1, AggregatedEvent e2) {
        List<Double> combined = new ArrayList<>(e1.getAmounts());
        combined.addAll(e2.getAmounts());
        return new AggregatedEvent(e1.getCustomerId(), combined);
    }
}
Full Flink Job Pipeline
StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

KafkaSource<String> source = KafkaSource.<String>builder()
        .setBootstrapServers("localhost:9092")
        .setTopics("transactions")
        .setGroupId("flink-analytics")
        .setValueOnlyDeserializer(new SimpleStringSchema())
        .build();

// Event-time windows need watermarks, so derive them from each transaction's timestamp.
DataStream<Transaction> transactions = env
        .fromSource(source, WatermarkStrategy.noWatermarks(), "Kafka Source")
        .map(new TransactionMapper())
        .assignTimestampsAndWatermarks(
                WatermarkStrategy.<Transaction>forBoundedOutOfOrderness(Duration.ofSeconds(5))
                        .withTimestampAssigner((txn, ts) -> txn.getTimestamp().toEpochMilli()));

DataStream<AggregatedEvent> aggregated = transactions
        .map(new TransactionToAggregatedEventMapper())
        .keyBy(AggregatedEvent::getCustomerId)
        .window(TumblingEventTimeWindows.of(Time.minutes(1)))
        .reduce(new AggregatedTransactionReducer());

aggregated.print(); // Replace with an S3 or Redshift sink

env.execute("Real-time Analytics Pipeline");
Sinking to AWS and Visualizing
Option A: Write to Amazon S3
Use Flink's StreamingFileSink to store the windowed aggregates. Partition the data by dt=YYYY-MM-DD/HH.
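Below is a rough sketch of such a sink, assuming the aggregates are first serialized to JSON strings; the bucket name and path are placeholders, and an S3 filesystem plugin (e.g. flink-s3-fs-hadoop) is assumed to be installed on the Flink cluster.
import org.apache.flink.api.common.serialization.SimpleStringEncoder;
import org.apache.flink.core.fs.Path;
import org.apache.flink.streaming.api.functions.sink.filesystem.StreamingFileSink;
import org.apache.flink.streaming.api.functions.sink.filesystem.bucketassigners.DateTimeBucketAssigner;

// Serialize each windowed aggregate to a JSON line (a per-record ObjectMapper keeps the sketch short).
DataStream<String> jsonAggregates = aggregated
        .map(event -> new ObjectMapper().writeValueAsString(event));

// Row-format sink that buckets files under dt=YYYY-MM-DD/HH ("my-analytics-bucket" is a placeholder).
StreamingFileSink<String> s3Sink = StreamingFileSink
        .forRowFormat(new Path("s3://my-analytics-bucket/aggregates"),
                new SimpleStringEncoder<String>("UTF-8"))
        .withBucketAssigner(new DateTimeBucketAssigner<>("'dt='yyyy-MM-dd/HH"))
        .build();

jsonAggregates.addSink(s3Sink);
Note that StreamingFileSink only finalizes part files when a checkpoint completes, so enable checkpointing (for example env.enableCheckpointing(60_000)) before expecting finished files to appear in S3.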
Option B: Load to Redshift
Write to S3 first, then trigger COPY commands to load the data into Redshift. Define the schema with a Glue Crawler and connect to it from QuickSight.
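A minimal sketch of triggering that load over JDBC is shown below; the cluster endpoint, credentials, target table, S3 prefix, and IAM role are all placeholders, and the Redshift JDBC driver is assumed to be on the classpath.
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class RedshiftLoader {
    public static void main(String[] args) throws Exception {
        // Placeholder endpoint and credentials; use your cluster's JDBC URL and a secrets store in practice.
        String url = "jdbc:redshift://my-cluster.abc123.us-east-1.redshift.amazonaws.com:5439/analytics";
        try (Connection conn = DriverManager.getConnection(url, "awsuser", "password");
             Statement stmt = conn.createStatement()) {
            // COPY one hourly partition written by the Flink job into an aggregates table.
            stmt.execute(
                "COPY customer_aggregates " +
                "FROM 's3://my-analytics-bucket/aggregates/dt=2025-06-09/13/' " +
                "IAM_ROLE 'arn:aws:iam::123456789012:role/RedshiftCopyRole' " +
                "FORMAT AS JSON 'auto'");
        }
    }
}
In practice this step is usually scheduled (for example from AWS Lambda or Airflow) once each hourly partition has landed in S3.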
Build QuickSight Dashboards
Connect to S3 (via Athena) or Redshift.
Import your Glue table or Redshift schema.
Create visualizations with customer-wise aggregations and time-based filters.
Class Diagram
+---------------------+
| Transaction |
+---------------------+
| transactionId |
| customerId |
| amount |
| timestamp |
+---------------------+
+--------------------------+
| AggregatedEvent |
+--------------------------+
| customerId |
| List<Double> amounts |
+--------------------------+
Component Diagram
+-------------+       +-------+       +-----------+       +-------------+       +------------+
| Java Client | ----> | Kafka | ----> | Flink Job | ----> | S3/Redshift | ----> | QuickSight |
+-------------+       +-------+       +-----------+       +-------------+       +------------+
Final Thoughts
This setup is:
Streaming-first: insights arrive in near real time
Memory-efficient: Flink and Kafka process events incrementally rather than buffering everything in memory
Scalable: it plugs into the wider AWS analytics stack
You can take this further by adding:
Schema Registry (for Avro/Protobuf support)
Flink SQL or CEP patterns (a rough Flink SQL sketch follows after this list)
Alerting via AWS SNS or EventBridge
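For illustration, here is a rough Flink SQL sketch of the same per-customer aggregation, summing amounts rather than collecting them into a list; the connector options mirror the Kafka setup above, processing-time windows are used to keep the example short, and the field names are assumed to match the transaction JSON.
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

public class FlinkSqlSketch {
    public static void main(String[] args) {
        TableEnvironment tEnv = TableEnvironment.create(EnvironmentSettings.inStreamingMode());

        // Map the Kafka topic to a table; proc_time provides a processing-time attribute for windowing.
        tEnv.executeSql(
            "CREATE TABLE transactions (" +
            "  transactionId STRING," +
            "  customerId STRING," +
            "  amount DOUBLE," +
            "  proc_time AS PROCTIME()" +
            ") WITH (" +
            "  'connector' = 'kafka'," +
            "  'topic' = 'transactions'," +
            "  'properties.bootstrap.servers' = 'localhost:9092'," +
            "  'properties.group.id' = 'flink-sql-analytics'," +
            "  'scan.startup.mode' = 'earliest-offset'," +
            "  'format' = 'json'" +
            ")");

        // One-minute tumbling windows, summing amounts per customer.
        tEnv.executeSql(
            "SELECT customerId," +
            "       TUMBLE_START(proc_time, INTERVAL '1' MINUTE) AS window_start," +
            "       SUM(amount) AS total_amount " +
            "FROM transactions " +
            "GROUP BY customerId, TUMBLE(proc_time, INTERVAL '1' MINUTE)").print();
    }
}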
Got questions or want to see this deployed in production? Drop them in the comments or reach out!