top of page
  • Writer's pictureOscar Martinez

Power Query Text Cleaning Function

Updated: Aug 15, 2023

Our job as data analysts often involves many hours of manually cleaning up messy text data. Throughout my career analysing data and using Power Query, I've developed a Power Query function for cleaning up text columns.

In this blog post, I will unveil the magic behind my Power Query function, which performs many text-cleaning operations effortlessly. Say goodbye to tedious tasks such as replacing characters, eliminating pesky elements, and trimming text individually. My function does it all with a single, powerful sweep!

The steps this function performs are the following:

  1. Replace characters.

  2. Replaces matching words.

  3. Remove characters.

  4. Trims text.

  5. Replace full cell contents.

Instead of a boring explanation, see the video below that shows how the text "University fo Zürich" goes through different transformations; all column rows will go through the same transformation.

Instructions for using the Power Query Text Cleaning Function function

1. Copy and paste the following code into a new query, creating a new function.

2. In the table containing the column you want to clean, paste the following code at the end; make sure you replace the "Previous Step" and "Column to be cleaned" parameters to match your table and previous code.

3. Depending on your needs, your will need to modify the code.

  • If you need to replace different/more characters, add more rows to this part of the code.

//This step replaces characters
    #"ReplaceMatchingChar" = Text.Combine(
        List.ReplaceMatchingItems(
            Text.ToList (Source), { 
                {"ü", "u"},
                {"ö", "o"}
            }
        )
    ),
  • If you need to replace more complete words, modify this section.

//This step replaces words
    #"ReplaceMatchingWord" = Text.Combine(
        List.ReplaceMatchingItems(
            Text.Split(#"ReplaceMatchingChar"," "),
            {
                {"fo", "of"},
                {"teh", "the"}              
            }
        )," "
    ),
  • If you need to remove more characters, add more between the double-quotes.

//This step removes characters
    #"Cleaned" = 
        Table.TransformColumns(
            #table(1, {{#"ReplaceMatchingWord"}}),
            {{
                    "Column1", 
                    each 
                        List.Accumulate(Text.ToList(",."),_,(String,Remove) => Text.Replace(String,Remove,""))
            }}
        ),
  • If you need to replace more complete cell contents, add more rows to this part of the code

//This step replaces full cell contents
    #"ReplaceFullContent" = List.ReplaceMatchingItems(
        {#"Trimmed Text"{0}[Column1]},
        {
            {"University of Zurich","UZH"},
            {"University of the Abroad","University abroad"}
        }
    ),

We hope this function came in handy, please let us know your experiences with, if you have a different approach or if there is something else you would be interested in learning.

910 views

Related Posts

See All

Comments


bottom of page