Apache Beam: WithKeys Transform
Nikhil Rao
1 min read
Overview
If you need to convert a PCollection into a KV pair using dynamically generated keys derived from the input element, you should check out the WithKeys
transform.
When You Should Use the WithKeys Transform
When you only want to convert a PCollection<V>
to a PCollection<KV<K, V>>
. You can then use aggregations or transforms your newly generated KV's.
How to Use the WithKeys Transform
Just apply the built-in Transform to any PCollection and supply a function which will take the input element as an argument and return a value which will be used as a key.
Example: Create a KV pair from a list of Words
PCollection<String> words = pipeline.apply(Create.of("Hello", "World", "Beam", "is", "fun"));
PCollection<KV<Integer, String>> lengthAndWord =
words.apply(
WithKeys.of(
new SerializableFunction<String, Integer>() {
@Override
public Integer apply(String s) {
return s.length();
}
}));
// returns
// KV{5, World}
// KV{5, Hello}
// ... etc
Conclusion
Check out other useful transforms from the official Apache Beam documentation.
0
Subscribe to my newsletter
Read articles from Nikhil Rao directly inside your inbox. Subscribe to the newsletter, and don't miss out.
Written by
Nikhil Rao
Nikhil Rao
Los Angeles