Text Cleaner
Remove extra spaces, empty lines, special characters, and HTML tags from text.
Text Cleaner removes unwanted formatting, extra whitespace, and common junk from text in a single click. When you copy text from PDFs, Word documents, web pages, or emails, the result often contains extra line breaks, multiple consecutive spaces, invisible Unicode characters, smart quotes that break code, or HTML entities like &amp;amp; and &amp;nbsp;. Cleaning this manually is tedious; Text Cleaner automates it.
Select the cleaning operations you need and click Clean. Available operations include: removing extra whitespace and blank lines, trimming leading and trailing spaces, collapsing multiple spaces into one, converting smart/curly quotes to straight quotes, removing HTML tags, decoding HTML entities, stripping non-printable characters, and converting Windows-style line endings (\r\n) to Unix-style (\n). You can combine multiple operations in one pass.
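Three of the operations listed above (straightening quotes, removing HTML tags, and decoding HTML entities) can be sketched with the Python standard library. This is a minimal illustration rather than the tool's actual implementation: the function names and the `SMART_QUOTES` mapping are assumptions made for this example, and the regex-based tag stripper is deliberately naive and is not a substitute for a real HTML parser.

```python
import html
import re

# Hypothetical mapping for this sketch: common curly quotes -> ASCII quotes.
SMART_QUOTES = {
    "\u2018": "'",  # left single quotation mark
    "\u2019": "'",  # right single quotation mark (also used as apostrophe)
    "\u201c": '"',  # left double quotation mark
    "\u201d": '"',  # right double quotation mark
}
_QUOTE_TABLE = str.maketrans(SMART_QUOTES)

# Naive pattern: anything between "<" and the next ">".
_TAG_RE = re.compile(r"<[^>]+>")


def straighten_quotes(text: str) -> str:
    """Convert curly quotes to their straight ASCII equivalents."""
    return text.translate(_QUOTE_TABLE)


def strip_html_tags(text: str) -> str:
    """Remove anything that looks like an HTML tag (regex-based, not a parser)."""
    return _TAG_RE.sub("", text)


def decode_html_entities(text: str) -> str:
    """Decode entities such as &amp; and &lt; into the characters they name."""
    return html.unescape(text)


snippet = "<p>\u201cFish &amp; chips\u201d &lt;today&gt;</p>"
# Strip tags before decoding entities, so text that was escaped as
# &lt;...&gt; is not mistaken for a real tag and deleted.
cleaned = decode_html_entities(strip_html_tags(straighten_quotes(snippet)))
print(cleaned)
```

The order matters here: decoding entities first would turn `&lt;today&gt;` into `<today>`, which the tag stripper would then remove.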
Text cleaning is a common first step in data processing pipelines. When importing text data into databases, machine learning models, or APIs, unexpected whitespace and special characters are a frequent source of parsing errors. Cleaning the text first prevents these issues downstream. For everyday writing tasks, clean text pastes more reliably into editors and publishing tools without introducing hidden formatting artifacts.
Code Implementation
import re
import unicodedata


def remove_control_chars(text: str) -> str:
    """Remove control (Cc) and format (Cf) characters; keep tab, newline, carriage return."""
    return "".join(
        ch for ch in text
        if unicodedata.category(ch) not in ("Cc", "Cf") or ch in ("\t", "\n", "\r")
    )


def normalize_line_endings(text: str, style: str = "lf") -> str:
    """Normalize line endings to LF (Unix) or CRLF (Windows)."""
    # Convert everything to LF first so a lone CR or a CRLF is never doubled.
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    if style == "crlf":
        text = text.replace("\n", "\r\n")
    return text


def collapse_whitespace(text: str) -> str:
    """Replace each run of spaces/tabs on a line with a single space."""
    return "\n".join(re.sub(r"[ \t]+", " ", line) for line in text.splitlines())


def trim_lines(text: str) -> str:
    """Strip leading and trailing whitespace from each line."""
    return "\n".join(line.strip() for line in text.splitlines())


def remove_blank_lines(text: str) -> str:
    """Collapse runs of consecutive blank lines into a single blank line."""
    return re.sub(r"\n{3,}", "\n\n", text)


def clean_text(text: str,
               control_chars: bool = True,
               normalize_endings: bool = True,
               collapse_spaces: bool = True,
               trim: bool = True,
               blank_lines: bool = True) -> str:
    """Run the enabled cleaning steps in sequence."""
    if control_chars:
        text = remove_control_chars(text)
    if normalize_endings:
        text = normalize_line_endings(text)
    if collapse_spaces:
        text = collapse_whitespace(text)
    if trim:
        text = trim_lines(text)
    if blank_lines:
        text = remove_blank_lines(text)
    return text


sample = " Hello\t world! \n\n\nExtra blank lines \n\x00Null byte here "
print(repr(clean_text(sample)))
# 'Hello world!\n\nExtra blank lines\nNull byte here'
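One gap worth noting: the `[ \t]+` pattern used in `collapse_whitespace` matches only ASCII spaces and tabs, so non-breaking spaces and other Unicode space characters pasted from PDFs or web pages pass through unchanged. A possible extra step, sketched below under the assumption that every character in Unicode category Zs should become a plain space, is to normalize those characters before collapsing. The helper name `normalize_unicode_spaces` is hypothetical and is not part of the listing above.

```python
import re
import unicodedata


def normalize_unicode_spaces(text: str) -> str:
    """Replace every Unicode space separator (category Zs) with an ASCII space."""
    return "".join(
        " " if unicodedata.category(ch) == "Zs" else ch
        for ch in text
    )


# U+00A0 is a non-breaking space and U+2003 is an em space; both are category Zs.
raw = "price:\u00a0 42\u2003USD"
spaced = normalize_unicode_spaces(raw)
# Once everything is an ASCII space, the usual collapse pattern applies.
collapsed = re.sub(r"[ \t]+", " ", spaced)
print(repr(collapsed))
```

Running this step before whitespace collapsing means mixed runs such as a non-breaking space followed by a regular space shrink to one space instead of surviving as two visually identical characters.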