top of page
  • Writer's pictureOscar Martinez

Power Query Text Cleaning Function

Updated: May 4

Power Query Text Cleaning Function

Exploring Power Query Text Cleaning Function: Unlocking Efficient Text Manipulation in Power Query

As data analysts, we often spend numerous hours manually cleaning up messy text data. Throughout my career, I've refined a specific Power Query text cleaning function that streamlines the cleaning of text columns in your datasets. This function offers a robust solution for conducting multiple text cleaning operations with ease, reducing the necessity for manual character replacement, unwanted element removal, or individual trimming

Introduction to the Text.Clean Function

The Text.CleanString function is designed to streamline text manipulation, allowing users to replace characters, swap words, eliminate specific characters, and even replace entire cell contents based on user-defined parameters. This function is versatile and can be adapted to various text cleaning needs in Power Query.

How the Function Works

The function operates through several steps, each tailored to handle different aspects of text cleaning:

  1. Character Replacement: The function allows for the replacement of individual characters based on a user-defined list.

  2. Word Replacement: It also supports the replacement of whole words, which is particularly useful for correcting common typographical errors.

  3. Character Removal: Unwanted characters can be removed entirely from the text.

  4. Word removal: Words that are not needed are removed from the text.

  5. Full Cell Content Replacement: For more comprehensive modifications, the function can replace the entire contents of a cell with new text.

Text.CleanString Function Code

Expand the item below to see the full code:

Text.CleanString Function

Using the Text.Clean Function

To utilize the Text.CleanString function, follow these instructions:

  1. Copy the Function: Copy and rename the provided function code into a new query in Power Query. This code defines the function and its parameters, which include the text to be cleaned and optional parameters for characters to replace, words to replace, characters to remove, words to remove, and full-cell content replacements.

  2. Apply the Function: Apply the function to your dataset's text or column(s) that require cleaning. The function will process each row in the column according to the specified parameters, ensuring your data is cleaned consistently and accurately.

Example

After copying the code and renaming it "CleanString," we added a new custom function column and filled in the paragraphs.

  • Replace characters:"ü" for "u"

  • Replace words: Zuerich for Zurich and Uni for University

  • Remove characters: ","

  • Remove Words: null

  • Replace full cell contents: "UZH" for "University of Zurich".


Conclusion

The Text.CleanString function is a powerful tool for any data professional looking to improve the quality of text data. With its flexible parameter settings and comprehensive cleaning capabilities, it significantly reduces the effort required to clean data. Try it out in your next data project and experience the efficiency it brings to your workflow!

1,520 views

Related Posts

See All

Comments


bottom of page