By the DedupeLines engineering team · Published 2026-05-16 · Updated 2026-05-17 · 8 min read · release

Introducing DedupeLines — a browser-local line cleaner

DedupeLines is a focused toolbox for cleaning line-oriented text — emails, SKUs, URLs, log lines, CSV columns — without ever uploading them. The whole engine runs in your browser tab. Pull the Ethernet cable after the page loads and every tool still works.

This post is the v1.0 launch note: why we built it, what shipped, and where it sits next to the existing PDF-converter-shaped industry. If you just want the tool, open the homepage.

Why a browser-local deduper

Every data person has hit the same wall:

You have a list of 50K emails / SKUs / URLs / log lines.
You need to dedupe it, count duplicates, or strip blank rows.
Excel’s Remove Duplicates changes column order under the hood.
Google Sheets UNIQUE() gets sluggish above 50K rows and offers no case-sensitive option.
Online tools want you to upload your customer list to a server you’ve never heard of.

That last one is the killer. Most “dedupe online” tools work by accepting your file on a backend. For a personal todo list that’s fine. For a B2B exported customer list, an internal SKU set, or a production log dump, that’s a compliance conversation you don’t want to have.

DedupeLines runs 100% in the browser tab. There is no server endpoint that accepts your text. Open DevTools → Network, paste 80 MB of input, click Run, and watch: not a single new request fires. The engine ships as a static, unminified file at /static/js/dedupe.js (so no source map is needed to read it) and runs in a Web Worker for any input above 2 MB or 100K lines.

What ships in v1.0

Six tools, all sharing the same engine, all browser-local, all free:

Remove duplicate lines — the homepage tool. Order-preserving dedupe with optional case / trim / blank-line / shuffle toggles. 80 MB hard cap per run.
Regex extractor — paste a list, write one regex, keep only matching lines. JavaScript-flavour RegExp (lookbehind, named groups, Unicode property escapes all supported).
Reverse lines — flip line order. Useful for log review (newest-first), commit-history flips, and chained workflows like the “keep last occurrence” pattern.
Remove blank lines — strip empty and whitespace-only lines (catches NBSP, en-space, em-space, ideographic space, all the invisible Unicode whitespaces).
Shuffle lines — Fisher-Yates fair random reorder. Good for raffles, A/B sample selection, ML data prep.
Trim lines — strip leading and trailing whitespace from every line.
Add line numbers — cat -n in a browser tab. Tab-separated by default so the output drops straight into Excel column A.
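Several of the transforms listed above are small enough to sketch in a few lines. The following is an illustrative sketch, not the shipped dedupe.js code: the function names (splitLines, removeBlankLines, shuffleLines, addLineNumbers) and the injectable random parameter are assumptions made for the example.

```javascript
// Illustrative sketch only: names and signatures are hypothetical, not taken from dedupe.js.

// Split on CRLF, LF, or lone CR so exports from different platforms behave the same.
function splitLines(text) {
  return text.split(/\r\n|\n|\r/);
}

// Drop lines that are empty or whitespace-only.
// JavaScript's \s already matches NBSP (U+00A0), en space (U+2002),
// em space (U+2003) and ideographic space (U+3000).
function removeBlankLines(lines) {
  return lines.filter((line) => !/^\s*$/.test(line));
}

// Fisher-Yates shuffle: walk from the end and swap each slot with a
// randomly chosen slot at or before it. Returns a new array.
function shuffleLines(lines, random = Math.random) {
  const out = lines.slice();
  for (let i = out.length - 1; i > 0; i--) {
    const j = Math.floor(random() * (i + 1));
    [out[i], out[j]] = [out[j], out[i]];
  }
  return out;
}

// Prefix each line with its 1-based number, tab-separated by default.
function addLineNumbers(lines, separator = "\t") {
  return lines.map((line, i) => `${i + 1}${separator}${line}`);
}
```

Passing the random source as a parameter keeps the shuffle testable. Math.random is adequate for sampling, but a raffle with real stakes would be better served by a source built on crypto.getRandomValues.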

The toolbox also ships in six languages (English, Deutsch, Español, Français, 日本語, 中文), so the tools themselves are accessible in your team’s native language. (This blog stays English; see “What we won’t do” below for why.)

How the engine works

The whole thing is one file: dedupe.js. Roughly 350 lines of vanilla JavaScript. Same engine handles dedupe, frequency counting, regex line filtering, reverse, shuffle, blank-line removal, trim, and line numbering — selected via a mode option.
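One way to picture a single engine selected by a mode option is a lookup table of line transforms. This is a sketch under assumptions: the runEngine name, the mode strings, and the shape of the options object are hypothetical, and the real dedupe.js may be organised quite differently.

```javascript
// Hypothetical sketch of a mode-dispatched line engine; not the actual dedupe.js API.
const MODES = new Map([
  ["reverse", (lines) => lines.slice().reverse()],
  ["trim", (lines) => lines.map((line) => line.trim())],
  [
    "regex",
    (lines, opts) => {
      // Strip g and y: they make RegExp.prototype.test stateful between calls,
      // which would silently skip matching lines.
      const flags = (opts.flags || "").replace(/[gy]/g, "");
      const re = new RegExp(opts.pattern, flags);
      return lines.filter((line) => re.test(line));
    },
  ],
]);

function runEngine(text, opts) {
  const handler = MODES.get(opts.mode);
  if (!handler) throw new Error(`Unknown mode: ${opts.mode}`);
  return handler(text.split("\n"), opts).join("\n");
}
```

A call such as runEngine(input, { mode: "regex", pattern: "@example\\.com$", flags: "i" }) would keep only the lines ending in that domain. A Map is used rather than a plain object so that a mode string like "toString" cannot resolve to an inherited property.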

The dedupe path is a single O(n) pass over a hash table built on a prototype-free Object.create(null) map, so lines such as “constructor” or “__proto__” cannot collide with inherited properties. Total complexity is O(n + u log u + s log s), where n = input lines, u = unique lines, and s = output length. On a 2024 M3 MacBook Air it processes 100K lines in ~380 ms.
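The core idea can be sketched in a few lines. This is a minimal sketch, assuming a dedupeLines function and option names (caseSensitive, trim) that are illustrative rather than the real dedupe.js signature. It shows only the order-preserving O(n) pass and the occurrence counts; the sort terms in the complexity figure are not modelled here.

```javascript
// Minimal sketch, not the shipped engine: function and option names are assumptions.
function dedupeLines(lines, { caseSensitive = true, trim = false } = {}) {
  // No prototype, so keys like "constructor" or "__proto__" are ordinary entries.
  const counts = Object.create(null);
  const unique = [];
  for (const raw of lines) {
    const line = trim ? raw.trim() : raw;
    const key = caseSensitive ? line : line.toLowerCase();
    if (key in counts) {
      counts[key] += 1; // repeat: count it, keep the first occurrence only
    } else {
      counts[key] = 1;
      unique.push(line); // first sighting: preserve input order
    }
  }
  return { unique, counts };
}

// "Keep last occurrence" as a chained workflow: reverse, dedupe, reverse back.
function keepLastOccurrence(lines) {
  return dedupeLines(lines.slice().reverse()).unique.reverse();
}
```

With this sketch, dedupeLines(["b", "a", "b"]).unique yields ["b", "a"], while keepLastOccurrence(["a", "b", "a", "c"]) yields ["b", "a", "c"].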

For inputs above 2 MB (or 100K lines) on desktop, or 300 KB (5K lines) on mobile, the tool skips the <textarea> entirely (browser native word-wrap is O(n) on insert and locks the tab on multi-megabyte text) and routes processing to a Web Worker. The result downloads as .txt instead of rendering inline. The hard ceiling is 80 MB per run — large enough for a multi-million-row log, conservative enough that a stray gigabyte input doesn’t crash the tab.
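The routing rule reads naturally as a small predicate. The sketch below uses the thresholds quoted above; the function name chooseRoute, the isMobile flag, the return strings, and the choice of 1 MB = 1024 × 1024 bytes measured as UTF-8 are assumptions for illustration.

```javascript
// Hypothetical sketch of the size-based routing; thresholds come from the post,
// everything else (names, units, return values) is assumed.
const KB = 1024;
const MB = 1024 * KB;
const LIMITS = {
  desktop: { bytes: 2 * MB, lines: 100000 },
  mobile: { bytes: 300 * KB, lines: 5000 },
};
const HARD_CAP_BYTES = 80 * MB;

// Count lines by scanning for LF (char code 10); an empty string has zero lines.
function countLines(text) {
  if (text.length === 0) return 0;
  let n = 1;
  for (let i = 0; i < text.length; i++) {
    if (text.charCodeAt(i) === 10) n++;
  }
  return n;
}

// Returns "inline" (render in the textarea), "worker" (process off the main
// thread and offer a download), or "reject" (over the hard cap).
function chooseRoute(text, isMobile = false) {
  const bytes = new TextEncoder().encode(text).length; // UTF-8 size
  if (bytes > HARD_CAP_BYTES) return "reject";
  const limit = isMobile ? LIMITS.mobile : LIMITS.desktop;
  const tooBig = bytes > limit.bytes || countLines(text) > limit.lines;
  return tooBig ? "worker" : "inline";
}
```

In a browser, the "worker" branch would hand the text to a Web Worker and wrap the result in a Blob for the .txt download; those steps are left out here so the sketch stays runnable outside a page.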

How it compares to existing tools

The closest comparisons:

Excel “Remove Duplicates” — same intent, but case-insensitive only, no frequency table, and re-sorts the column under the hood. Hard 1,048,576 row limit and gets sluggish well below that. DedupeLines preserves the original row order by default and shows you exact occurrence counts when the freq toggle is on.
Google Sheets UNIQUE() — live formula, no case-sensitive option, starts to lag every keystroke past 50K rows. DedupeLines runs in a worker thread and stays responsive on million-row inputs.
SmallPDF / iLovePDF (the obvious comparison for “text utility online”) — different category (PDF), but worth noting they both upload your file to a server. That’s the industry default. DedupeLines is the “why upload at all” rebuttal for line-level text work.
Command line (sort -u, awk '!seen[$0]++', grep) — faster on raw throughput (grep wins ~2× on a 100K-line regex test). DedupeLines wins on “no SSH, no install, works on a locked-down corporate Windows or an iPad”. Different audience.

What we won’t do

No upload of your input. Already covered. The whole reason to use this tool is that your pasted text never leaves your machine.
No login, no signup. Nothing to gate, nothing to recover, nothing to phish.
No tracking of your input text. Your text is processed entirely by JavaScript inside the browser tab and never reaches our servers, our analytics, or our ad partners. We measure anonymous page-view counts (which page you opened, country, browser class) separately from your content — the two streams are never combined. Full details in our privacy policy.
No paid tier. Six languages, six tools, in-depth posts, all free, no “upgrade for more”. Operating costs are covered by anonymous display advertising (when enabled) and keeping the infrastructure simple.
No multi-language blog. The tools are 6-locale. The blog — this one — stays English. Looking at the broader tool-site landscape (TextMechanic, JSONFormatter, CleanCSS, Calculator.net, even mid-size Tinify), small tool sites overwhelmingly run English-only blogs. We’re following that pattern: invest in the tool localization, keep the deep articles single-locale until traffic data justifies otherwise.
No AI rewriting your data. The engine is hand-written deterministic JavaScript. No LLMs, no servers, no surprises.

Where this is going

The shortlist for the next few months:

A few more single-purpose tools (line sort, word-level dedup, simple find-and-replace) when there’s a clear use case.
One or two long-form deep-dive posts per month on this blog — tool combinations, engine internals, real workflows. (Two are already drafted: keep last occurrence with reverse + dedupe and regex email filter recipes.)
If a particular non-English locale shows real traffic (1K+ MAU on the tools), we’ll consider translating that blog.

What we are deliberately not adding: an account system, a paid tier, or anything that asks you to upload your data. If you need to pipe data through this engine programmatically, the source is right there — /static/js/dedupe.js — fork it.

Get involved

Bug reports, feature suggestions, blog topic requests, translation corrections, or just a hello: [email protected].

Frequently asked questions

Is DedupeLines really free, with no signup or paid tier?

Yes. Six tools, six languages, no account system, no “upgrade for more” gate. Operating costs are covered by anonymous display advertising (Google AdSense, when enabled) and keeping the infrastructure as simple as a static HTML/JS site behind a CDN. We chose ads over a paid tier because we want everyone — including students, hobbyists, and people doing one-off list cleanups — to use the full toolbox without a signup wall. Your text never reaches the ad network; the ads target page content and country, not your input.

Can I use DedupeLines offline?

Mostly yes. Load the homepage once with a connection, then disconnect — every tool still works for the rest of the session. The browser holds the JavaScript engine and the CSS in memory. Closing the tab loses the offline state; reopening with no connection won’t load the page. We don’t ship a service worker for full PWA offline-first behaviour, but it’s on the shortlist if there’s real demand.

How does DedupeLines compare to SmallPDF or iLovePDF for text files?

Different categories — SmallPDF and iLovePDF are PDF utilities, not line-of-text utilities. The relevant comparison is the architecture: both upload your file to a server, process it there, then return a download. DedupeLines never has an upload step. If your input contains anything you wouldn’t willingly hand to a third-party SaaS (customer lists, internal SKUs, raw logs), the architectural difference matters more than the feature comparison.

What languages does DedupeLines support?

The tools and homepage ship in English, Deutsch, Español, Français, 日本語, and 中文. Six languages, full UI localisation. The blog (this one and the others on /updates) is English-only by design — small tool sites overwhelmingly do this because per-language blog maintenance scales badly and the ROI on translating engineering blog posts is low.

Where can I read the engine source?

It ships as plain JavaScript at /static/js/dedupe.js. No minifier, no source map. We’ve also written a walkthrough of how it works: Six tools, one engine — a tour of dedupe.js.

Related guides

Six tools, one engine — a tour of dedupe.js — engine architecture, complexity, design trade-offs.
Filter 50K emails in 5 minutes — regex extractor recipes — practical patterns for one of the included tools.
The blank-line trap — Unicode whitespace handling across the toolbox.

The fastest way to see what this is about: paste a list, hit Run, copy the output back. No upload, no signup, ~380 ms for 100K lines.

Open the tool