Apache Beam: ToString Transform
Overview
If you want to convert every element of a PCollection to a string, check out the ToString
transforms. They cover everything from simply calling an element's toString() method
to joining the key and value of a KV pair with a custom delimiter.
When You Should Use the ToString Transform
You should use the ToString transforms when you want to perform one of the following conversions on your input data:
Transform each element into a string using the Object.toString() method
Transform an input element of type Iterable into a string using a delimiter
Transform an input element of type KV into a string using a delimiter
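To make the first two cases concrete, here is a minimal plain-Java sketch of the strings they produce. It is a model, not Beam code: the class ToStringModel and its methods are hypothetical names made up for illustration, and the comments name the Beam factory methods (ToString.elements() and ToString.iterables()) that each one stands in for.

```java
import java.util.Arrays;
import java.util.stream.Collectors;
import java.util.stream.StreamSupport;

// Hypothetical model class; it does not use or depend on Apache Beam.
public class ToStringModel {

    // Stands in for ToString.elements(): the element's own toString() result.
    public static String element(Object element) {
        return element.toString();
    }

    // Stands in for ToString.iterables(delimiter): each item is converted
    // with toString() and the results are joined with the delimiter.
    public static String iterable(Iterable<?> items, String delimiter) {
        return StreamSupport.stream(items.spliterator(), false)
            .map(Object::toString)
            .collect(Collectors.joining(delimiter));
    }

    public static void main(String[] args) {
        System.out.println(element(42));                                    // 42
        System.out.println(iterable(Arrays.asList("apple", "pear"), ","));  // apple,pear
        System.out.println(iterable(Arrays.asList(1, 2, 3), " | "));        // 1 | 2 | 3
    }
}
```

In a real pipeline you would not write these helpers yourself; you would apply the corresponding ToString transform to the PCollection and let Beam do the conversion per element.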
How to Use the ToString Transform
Just apply the matching built-in transform (ToString.elements(), ToString.iterables(), or ToString.kvs()) to a PCollection of elements, iterables, or KVs. The output is always a PCollection of String.
Example: Convert KV to String
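import org.apache.beam.sdk.transforms.Create;
import org.apache.beam.sdk.transforms.ToString;
import org.apache.beam.sdk.values.KV;
import org.apache.beam.sdk.values.PCollection;

// Create key-value pairs ("pipeline" is an existing Pipeline instance)
PCollection<KV<String, String>> pairs =
    pipeline.apply(
        Create.of(
            KV.of("fall", "apple"),
            KV.of("spring", "strawberry"),
            KV.of("winter", "orange"),
            KV.of("summer", "peach"),
            KV.of("spring", "cherry"),
            KV.of("fall", "pear")));
// Use ToString on key-value pairs; the default delimiter is a comma
PCollection<String> result = pairs.apply(ToString.kvs());
// results in a PCollection containing (in no guaranteed order)
// fall,apple
// spring,strawberry
// ...etc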
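The overview mentions a custom delimiter, which the example above does not show. In Beam that is the ToString.kvs(String delimiter) overload. The plain-Java sketch below only models the resulting format (key, then delimiter, then value); KvToStringModel is a hypothetical class and Map.Entry is used as a stand-in for Beam's KV, so treat it as an illustration rather than Beam's implementation.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical model class; Map.Entry stands in for Beam's KV.
public class KvToStringModel {

    // Models the output format of ToString.kvs(delimiter): key + delimiter + value.
    public static String kv(Map.Entry<?, ?> pair, String delimiter) {
        return pair.getKey().toString() + delimiter + pair.getValue().toString();
    }

    // Applies the same formatting to every pair, like a per-element transform.
    public static List<String> format(List<? extends Map.Entry<?, ?>> pairs, String delimiter) {
        return pairs.stream().map(p -> kv(p, delimiter)).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Map.Entry<String, String>> pairs = Arrays.asList(
            new SimpleEntry<String, String>("fall", "apple"),
            new SimpleEntry<String, String>("spring", "strawberry"));

        format(pairs, ",").forEach(System.out::println);   // fall,apple then spring,strawberry
        format(pairs, ": ").forEach(System.out::println);  // fall: apple then spring: strawberry
    }
}
```

In the pipeline itself, the equivalent change would be replacing ToString.kvs() with ToString.kvs(": ").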
Conclusion
Check out other useful transforms from the official Apache Beam documentation.
Written by
Nikhil Rao