Data Professionals spend an average of 23% of their time cleaning data — here’s a BulkTextReplaceValue function to save you some time 🕒

Ian Santillan

Proportion of time spent on data science activities

In Bob Hayes’s article from earlier this year, a recent study of over 23,000 data professionals found that data scientists spend about 40% of their time gathering and cleaning data, 20% of their time building and selecting models, and 11% of their time finding insights and communicating them to stakeholders.

With the recent announcement that Automated Machine Learning (AutoML) in Power BI is going GA (general availability), we’ll want to spend less time cleaning data and more time exploring AutoML in Power BI!

Note that it’s best practice to do your transformations as close to the data source as possible. If, for any reason, you can’t do your transformations closer to the data source, this post will show you how to use the BulkTextReplaceValue function to save some time!

If you’ve ever had to replace values in a column numerous times, you know how tedious it can be, especially when you want to document each step. Right-clicking each step and opening Properties just takes too much time…

From here, you move on to writing the replacements by hand in the Advanced Editor, where you already know to be careful that each new step references the previous step’s identifier.

As shown in Miguel Escobar’s video, by providing pairs of old values and new values as a conversion table, we can use the BulkTextReplaceValue function to replace all the required values in one step!
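As a rough illustration, a conversion table of this kind can be built inline with #table. The column names OldText and NewText match the ones the BulkTextReplaceValue function refers to further down in this post; the rows are made-up sample values, not the ones from the sample files.

```powerquery
// ConversionTable: one row per replacement pair.
// The rows below are hypothetical sample values.
let
    ConversionTable = #table(
        type table [OldText = text, NewText = text],
        {
            {"XL", "Microsoft Excel"},
            {"PBI", "Power BI"},
            {"PQ", "Power Query"}
        }
    )
in
    ConversionTable
```

In practice the table would more likely come from an Excel sheet or another query, as long as it exposes the same two columns.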

👩‍💻 You can download the sample files here → https://github.com/ievsantillan/PowerBI/tree/master/PowerQuery/BulkTextReplaceValue

In Miguel’s video he used the Text.Replace function, but I ran into an issue: the text to be replaced in one row also appeared as part of the text in another row, as shown in the screenshot here.

The Feature column holds the old text and the Software Name column holds the new text.

I needed to replace Text.Replace with Replacer.ReplaceValue.

The signatures of the two functions are nearly identical. The difference is that Text.Replace replaces every occurrence of the old text inside a value, whereas Replacer.ReplaceValue replaces a value only when it matches the entire cell contents.

Text.Replace

Text.Replace(text as nullable text, old as text, new as text) as nullable text

Returns the result of replacing all occurrences of text value old in text value text with text value new. This function is case sensitive.

Replacer.ReplaceValue

Replacer.ReplaceValue(value as any, old as any, new as any) as any

Replaces the old value in the original value with the new value. This replacer function can be used in List.ReplaceValue and Table.ReplaceValue.
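To make the difference concrete, here is a small sketch that runs both functions on the same inputs. The sample strings are hypothetical and were picked only because one is contained in the other, which is the situation described above.

```powerquery
let
    // Hypothetical sample values, chosen so one is a substring of the other
    Whole = "Excel",
    Partial = "Excel Online",

    // Text.Replace rewrites the substring wherever it appears
    TextReplacePartial = Text.Replace(Partial, "Excel", "Microsoft Excel"),
    // Replacer.ReplaceValue only swaps a value that equals the old value exactly
    ReplaceValuePartial = Replacer.ReplaceValue(Partial, "Excel", "Microsoft Excel"),
    ReplaceValueWhole = Replacer.ReplaceValue(Whole, "Excel", "Microsoft Excel"),

    Comparison = [
        ViaTextReplace = TextReplacePartial,        // "Microsoft Excel Online"
        ViaReplaceValue = ReplaceValuePartial,      // "Excel Online" (unchanged)
        ViaReplaceValueExact = ReplaceValueWhole    // "Microsoft Excel"
    ]
in
    Comparison
```

With Text.Replace, the "Excel Online" row is altered even though only the "Excel" row was meant to change, which is why whole-value matching is the safer choice here.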

Let’s take a look at the BulkTextReplaceValue function

https://github.com/datascience-ninja/PowerBI/blob/master/PowerQuery/BulkTextReplace/BulkTextReplaceValue.m

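(x as text) as text =>
let
    // One iteration per row of the conversion table
    maxIterations = Table.RowCount(ConversionTable),
    Iterations = List.Generate(
        // Start by applying the first replacement pair
        () => [
            Result = Replacer.ReplaceValue(x, ConversionTable[OldText]{0}, ConversionTable[NewText]{0}),
            Counter = 0
        ],
        each [Counter] < maxIterations,
        // Counter (no brackets) is the new record's own field, i.e. the next row index
        each [
            Result = Replacer.ReplaceValue([Result], ConversionTable[OldText]{Counter}, ConversionTable[NewText]{Counter}),
            Counter = [Counter] + 1
        ],
        each [Result]
    ),
    // The last element holds the value after every pair has been applied
    output = Iterations{maxIterations - 1}
in
    output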

For each input value (x), the function loops through the rows of the ConversionTable and, whenever the value matches an OldText entry, replaces it with the corresponding NewText using the Replacer.ReplaceValue function.
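The same loop can also be written as a fold with List.Accumulate. This is only an alternative sketch, not the function from the repository: the name-less function below is hypothetical, and it assumes, like the original, that a query called ConversionTable with OldText and NewText columns exists in the workbook.

```powerquery
// A hypothetical variant, not the function from the repository.
// Assumes a query named ConversionTable with OldText and NewText columns.
(x as text) as text =>
let
    // Turn the table into a list of {old, new} pairs
    Pairs = Table.ToRows(Table.SelectColumns(ConversionTable, {"OldText", "NewText"})),
    // Fold over the pairs, feeding each result into the next replacement
    output = List.Accumulate(
        Pairs,
        x,
        (current, pair) => Replacer.ReplaceValue(current, pair{0}, pair{1})
    )
in
    output
```

One design difference worth noting: if the conversion table is empty, this version simply returns the input unchanged, because List.Accumulate returns its seed for an empty list.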

We can replace all of the Replace Value query steps in the screenshot above with a single line: use the Table.AddColumn function to invoke BulkTextReplaceValue on the column that needs cleaning up, [OldName], and provide a new column name, [Result Column Name], for the results.

= Table.AddColumn(#"Previous line identifier", "Result Column Name", each #"BulkTextReplaceValue"([OldName]))
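For anyone who wants to try the step in isolation, the sketch below puts everything into a single query that can be pasted into a blank query. The Source rows and conversion pairs are hypothetical sample data, and the function is defined inline here only so the example is self-contained; in the sample files it lives in its own query.

```powerquery
let
    // Hypothetical sample data; real column and step names will differ
    Source = #table(
        type table [OldName = text],
        {{"XL"}, {"PBI"}, {"XL Online"}}
    ),
    ConversionTable = #table(
        type table [OldText = text, NewText = text],
        {{"XL", "Microsoft Excel"}, {"PBI", "Power BI"}}
    ),
    // Inline copy of the function shown earlier in this post
    BulkTextReplaceValue = (x as text) as text =>
        let
            maxIterations = Table.RowCount(ConversionTable),
            Iterations = List.Generate(
                () => [Result = Replacer.ReplaceValue(x, ConversionTable[OldText]{0}, ConversionTable[NewText]{0}), Counter = 0],
                each [Counter] < maxIterations,
                each [Result = Replacer.ReplaceValue([Result], ConversionTable[OldText]{Counter}, ConversionTable[NewText]{Counter}), Counter = [Counter] + 1],
                each [Result]
            )
        in
            Iterations{maxIterations - 1},
    // Same shape as the one-line step above, with an explicit column type
    AddedResult = Table.AddColumn(Source, "Result Column Name", each BulkTextReplaceValue([OldName]), type text)
in
    AddedResult
```

With this sample data, "XL" and "PBI" are replaced, while "XL Online" is left untouched because it does not match any OldText value in full. Note that the function is typed as text, so null values in the column would need to be handled before invoking it.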

Alternatively, you can follow the steps below to accomplish the same thing without using the Advanced Editor.
