Build Real-Time Analytics with Java 21, Apache Kafka, Flink, and AWS QuickSight

Vitali R

Real-time analytics is at the heart of many modern applications, from fraud detection to personalized dashboards. In this guide, we'll walk through how to build a real-time analytics pipeline using:

  • 🔧 Java 21

  • 📬 Apache Kafka

  • 🔁 Apache Flink

  • ☁️ AWS S3 & QuickSight


🔗 Architecture Overview

[Java App] β†’ [Apache Kafka] β†’ [Flink Job] β†’ [Amazon S3 or Redshift] β†’ [AWS QuickSight]

This pipeline ingests events, processes them in real time using Flink, stores results in S3, and visualizes aggregated data in QuickSight.


🧱 Key Components

✅ Kafka: Event Ingestion

We use Kafka to collect and stream real-time events.

Properties props = new Properties();
props.put("bootstrap.servers", "localhost:9092");
props.put("key.serializer", StringSerializer.class.getName());
props.put("value.serializer", StringSerializer.class.getName());

// try-with-resources closes the producer on exit, which also flushes any buffered records
try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
    producer.send(new ProducerRecord<>("transactions", "txn123",
        "{\"transactionId\":\"txn123\",\"customerId\":\"cust001\",\"amount\":99.99,\"timestamp\":\"2025-06-09T13:00:00Z\"}"));
}

We'll consume transactions, group them by customer, and aggregate all amounts in a list.

🚚 1. Transaction POJO

public class Transaction {
    private String transactionId;
    private String customerId;
    private double amount;
    private Instant timestamp;
    // No-arg constructor, getters and setters (both Flink's POJO serializer and Jackson need them)
}

📈 2. AggregatedEvent POJO

public class AggregatedEvent {
    private String customerId;
    private List<Double> amounts;
    // Constructor, getters, setters
}

Map Kafka message (JSON) to Transaction

public class TransactionMapper implements MapFunction<String, Transaction> {
    // JavaTimeModule (from jackson-datatype-jsr310) lets Jackson parse the ISO-8601 timestamp into an Instant
    private static final ObjectMapper mapper = new ObjectMapper()
            .registerModule(new JavaTimeModule());

    @Override
    public Transaction map(String value) throws Exception {
        return mapper.readValue(value, Transaction.class);
    }
}

Map Transaction to AggregatedEvent

public class TransactionToAggregatedEventMapper implements MapFunction<Transaction, AggregatedEvent> {
    @Override
    public AggregatedEvent map(Transaction txn) {
        return new AggregatedEvent(txn.getCustomerId(), new ArrayList<>(List.of(txn.getAmount())));
    }
}

Aggregate all transaction amounts

public class AggregatedTransactionReducer implements ReduceFunction<AggregatedEvent> {
    @Override
    public AggregatedEvent reduce(AggregatedEvent e1, AggregatedEvent e2) {
        List<Double> combined = new ArrayList<>(e1.getAmounts());
        combined.addAll(e2.getAmounts());
        return new AggregatedEvent(e1.getCustomerId(), combined);
    }
}

StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

KafkaSource<String> source = KafkaSource.<String>builder()
    .setBootstrapServers("localhost:9092")
    .setTopics("transactions")
    .setGroupId("flink-analytics")
    .setValueOnlyDeserializer(new SimpleStringSchema())
    .build();

DataStream<Transaction> transactions = env
    .fromSource(source, WatermarkStrategy.noWatermarks(), "Kafka Source")
    .map(new TransactionMapper())
    // Event-time windows never fire without timestamps and watermarks, so assign them here
    .assignTimestampsAndWatermarks(
        WatermarkStrategy.<Transaction>forBoundedOutOfOrderness(Duration.ofSeconds(5))
            .withTimestampAssigner((txn, recordTs) -> txn.getTimestamp().toEpochMilli()));

DataStream<AggregatedEvent> aggregated = transactions
    .map(new TransactionToAggregatedEventMapper())
    .keyBy(AggregatedEvent::getCustomerId)
    .window(TumblingEventTimeWindows.of(Time.minutes(1)))
    .reduce(new AggregatedTransactionReducer());

aggregated.print(); // Replace with S3 or Redshift sink
env.execute("Real-time Analytics Pipeline");
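
One detail worth adding before running this for real: the Kafka source only commits its consumer offsets when a checkpoint completes, and file-based sinks only finalize their output files on checkpoints. A minimal sketch, assuming a 30-second interval fits your latency needs:

// Call before env.execute(): checkpoint every 30 seconds (an illustrative interval)
// so Kafka offsets are committed and downstream file sinks can finalize their part files.
env.enableCheckpointing(30_000);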

☁️ Sinking to AWS and Visualizing

Option A: Write to Amazon S3

  • Use Flink's FileSink (the successor to the deprecated StreamingFileSink) to store windowed aggregates; a sketch follows after these bullets.

  • Partition data by dt=YYYY-MM-DD/HH.
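
Here is a minimal sketch of Option A using FileSink, assuming the flink-s3-fs-hadoop filesystem plugin is installed and checkpointing is enabled (part files are only finalized when a checkpoint completes). The bucket name and the one-JSON-object-per-line output format are illustrative choices, not part of the original pipeline:

import org.apache.flink.api.common.serialization.SimpleStringEncoder;
import org.apache.flink.connector.file.sink.FileSink;
import org.apache.flink.core.fs.Path;
import org.apache.flink.streaming.api.functions.sink.filesystem.bucketassigners.DateTimeBucketAssigner;

// "my-analytics-bucket" is a placeholder bucket name
FileSink<String> s3Sink = FileSink
    .forRowFormat(new Path("s3a://my-analytics-bucket/aggregates"),
        new SimpleStringEncoder<String>("UTF-8"))
    // Produces folders like dt=2025-06-09/13, matching the partitioning scheme above
    .withBucketAssigner(new DateTimeBucketAssigner<>("'dt='yyyy-MM-dd/HH"))
    .build();

aggregated
    // One JSON object per line, keeping the per-window list of amounts as a JSON array
    .map(event -> "{\"customerId\":\"" + event.getCustomerId()
        + "\",\"amounts\":" + event.getAmounts() + "}")
    .sinkTo(s3Sink);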

Option B: Load to Redshift

  • Write to S3, then trigger COPY commands into Redshift (see the JDBC sketch after these bullets).

  • Define schema with Glue Crawler and connect from QuickSight.
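
And here is a minimal sketch of the COPY step from Option B, issued over JDBC. It assumes the Redshift JDBC driver is on the classpath and an aggregated_events table whose definition lines up with whatever format the sink wrote to S3; the cluster endpoint, database, S3 prefix, IAM role, and credentials are all placeholders:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

// All identifiers below (cluster endpoint, database, table, bucket, role) are placeholders
String url = "jdbc:redshift://my-cluster.abc123.eu-west-1.redshift.amazonaws.com:5439/analytics";
String copySql =
    "COPY aggregated_events " +
    "FROM 's3://my-analytics-bucket/aggregates/dt=2025-06-09/13/' " +
    "IAM_ROLE 'arn:aws:iam::123456789012:role/redshift-copy-role' " +
    "FORMAT AS JSON 'auto'";

try (Connection conn = DriverManager.getConnection(url, "admin", System.getenv("REDSHIFT_PASSWORD"));
     Statement stmt = conn.createStatement()) {
    stmt.execute(copySql);  // Redshift loads the partition straight from S3
}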


📈 Build QuickSight Dashboards

  1. Connect to S3 (via Athena) or Redshift.

  2. Import your Glue table or Redshift schema.

  3. Create visualizations with customer-wise aggregations and time-based filters.


🗜️ Class Diagram

+---------------------+
|     Transaction     |
+---------------------+
| transactionId       |
| customerId          |
| amount              |
| timestamp           |
+---------------------+

+--------------------------+
|    AggregatedEvent       |
+--------------------------+
| customerId               |
| List<Double> amounts     |
+--------------------------+

🧩 Component Diagram

+-------------+       +-------+       +-----------+       +---------------+       +------------+
| Java Client | ----> | Kafka | ----> | Flink Job | ----> | S3 / Redshift | ----> | QuickSight |
+-------------+       +-------+       +-----------+       +---------------+       +------------+

🧠 Final Thoughts

This setup is:

  • Streaming-first: near real-time insights

  • Memory-efficient: thanks to Flink and Kafka

  • Scalable: integrates with the full AWS analytics stack

You can take this further by adding:

  • Schema Registry (for Avro/Protobuf support)

  • Flink SQL or CEP patterns

  • Alerting via AWS SNS or EventBridge


Got questions or want to see this deployed in production? Drop them in the comments or reach out! 🚀
