HTML to Markdown: A Practical Guide to Clean, Portable Content
Reviewing content in Markdown removes most tag noise from Git diffs versus reviewing raw HTML.
Markdown typically uses far fewer characters than equivalent HTML, cutting tokens for AI prompts.
Conversion happens client-side, so your HTML is never uploaded — safe for private drafts.
HTML is how the web renders content, but it is a terrible format to write, version, or move between systems. Markdown is the opposite: plain text that is easy to read, diff in Git, and re-render anywhere. An HTML to Markdown converter bridges the two — taking the markup your CMS, email editor, or web page produces and turning it into clean, portable Markdown you can paste into a docs repo, a static-site generator, or an AI prompt.
This guide explains when to convert HTML to Markdown, how a good converter actually works, and how to get the cleanest possible output without hand-editing every line.
Why Convert HTML to Markdown?
Teams reach for an HTML to Markdown converter whenever content needs to leave a rich editor and live somewhere lighter. The most common reasons:
- Content migration: moving posts out of WordPress, Contentful, or a legacy CMS into a static-site generator or framework such as Hugo, Astro, Jekyll, or Next.js, all of which can render Markdown.
- Documentation: turning a rendered help-center page back into the `.md` files that power tools like Docusaurus, GitBook, or a README.
- Version control: Markdown diffs cleanly in Git, so reviewers see real content changes instead of a wall of tag noise.
- AI and LLM prompts: Markdown is usually more compact than the equivalent HTML, so feeding a model Markdown instead of raw HTML uses fewer tokens and leaves less markup noise for the model to read through.
- Repurposing: pulling a clean, style-free version of a page so you can rewrite it for a newsletter, a Notion doc, or a social post.
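To make the size difference concrete, the sketch below compares character counts for one snippet in both formats. The sample strings are made up for illustration, and character count is only a rough proxy for tokens: the real number depends on the tokenizer your model uses.

```python
# Illustrative sample: the same content as CMS-style HTML and as Markdown.
html_snippet = (
    '<div class="post"><h2 style="margin:0">Release notes</h2>'
    '<p>See the <a href="/changelog">changelog</a> for '
    '<strong>breaking</strong> changes.</p></div>'
)
markdown_snippet = (
    "## Release notes\n\n"
    "See the [changelog](/changelog) for **breaking** changes."
)


def size_reduction(before: str, after: str) -> float:
    """Return the fraction of characters saved going from `before` to `after`."""
    return 1 - len(after) / len(before)


print(f"HTML: {len(html_snippet)} chars, Markdown: {len(markdown_snippet)} chars")
print(f"Saved: {size_reduction(html_snippet, markdown_snippet):.0%}")
```

Running the same comparison on a few of your own pages gives a better estimate than any general figure, since savings depend heavily on how much wrapper markup your CMS emits.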
What a Good HTML to Markdown Converter Handles
Real-world HTML is messy. It carries inline styles, nested wrappers, and tags that have no Markdown equivalent. A reliable converter maps the structural elements faithfully and quietly discards the rest:
- Headings: `<h1>` through `<h6>` become `#` to `######` (or Setext underlines for the top two levels).
- Text emphasis: `<strong>` and `<b>` become `**bold**`; `<em>` and `<i>` become `_italic_`; `<del>` becomes `~~strikethrough~~`.
- Links and images: anchors become `[text](url)` and images become `![alt](src)`, preserving the title attribute when present.
- Lists: ordered, unordered, and nested lists are indented correctly, and GitHub task-list checkboxes become `- [ ]` and `- [x]`.
- Blockquotes and code: `<blockquote>` becomes `>` lines, inline `<code>` is wrapped in backticks, and `<pre>` blocks become fenced code with the language preserved.
- Tables: `<table>` markup becomes GitHub-flavored pipe tables.
The table below summarizes the core mappings most teams rely on:
| HTML | Markdown |
|---|---|
| `<h2>Title</h2>` | `## Title` |
| `<strong>text</strong>` | `**text**` |
| `<a href="/x">link</a>` | `[link](/x)` |
| `<ul><li>item</li></ul>` | `- item` |
| `<blockquote>quote</blockquote>` | `> quote` |
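The mappings in the table above can be sketched with Python's standard `html.parser` module. This is a toy, not how any particular converter is implemented: the `MiniConverter` class and `convert` function are hypothetical names, and the sketch ignores nesting, ordered lists, images, tables, and escaping.

```python
from html.parser import HTMLParser

HEADINGS = {"h1", "h2", "h3", "h4", "h5", "h6"}
BLOCKS = HEADINGS | {"p", "li", "blockquote"}


class MiniConverter(HTMLParser):
    """Toy converter covering only the mappings in the table above."""

    INLINE = {"strong": "**", "b": "**", "em": "_", "i": "_"}

    def __init__(self):
        super().__init__()
        self.out = []      # pieces of Markdown, joined at the end
        self.href = ""     # href of the <a> currently open, if any

    def handle_starttag(self, tag, attrs):
        if tag in self.INLINE:
            self.out.append(self.INLINE[tag])
        elif tag in HEADINGS:
            self.out.append("#" * int(tag[1]) + " ")
        elif tag == "li":
            self.out.append("- ")
        elif tag == "blockquote":
            self.out.append("> ")
        elif tag == "a":
            self.href = dict(attrs).get("href") or ""
            self.out.append("[")
        # Any other tag (div, span, ul, ...) is dropped; its text still flows through.

    def handle_endtag(self, tag):
        if tag in self.INLINE:
            self.out.append(self.INLINE[tag])
        elif tag == "a":
            self.out.append(f"]({self.href})")
            self.href = ""
        elif tag in BLOCKS:
            self.out.append("\n")

    def handle_data(self, data):
        self.out.append(data)


def convert(html: str) -> str:
    """Convert a small HTML fragment to Markdown using the toy mappings."""
    parser = MiniConverter()
    parser.feed(html)
    parser.close()
    return "".join(parser.out).strip()


print(convert('<h2>Title</h2><p>A <strong>bold</strong> <a href="/x">link</a></p>'))
```

Unknown tags fall through silently, which mirrors the "map the structure, discard the rest" behavior described above. A production converter also has to track nesting depth and escape literal Markdown characters.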
How to Convert HTML to Markdown, Step by Step
1. Paste your HTML
Copy the HTML you want to convert — from a CMS export, your browser's "view source," or an editor's raw-HTML view — and paste it into the input box. A good tool parses it instantly, in your browser, so nothing is uploaded to a server.
2. Review the live preview
Watch the Markdown render as you type. Confirm that headings, lists, and links survived the conversion and that no important structure was flattened into plain paragraphs.
3. Tune the options
Choose underscore or asterisk emphasis, set your preferred bullet character, toggle fenced versus indented code blocks, and decide whether to keep link titles. These small choices keep the output consistent with whatever style guide or linter your repo enforces.
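One way to think about these options is as a small, immutable settings object that every rendering step consults. The sketch below is an assumption about how such options could be modeled, not a description of this tool's internals; `MarkdownStyle` and the `render_*` helpers are hypothetical names.

```python
from dataclasses import dataclass

FENCE = "`" * 3  # built programmatically to keep this example easy to embed


@dataclass(frozen=True)
class MarkdownStyle:
    emphasis: str = "_"        # "_" or "*"
    bullet: str = "-"          # "-", "*", or "+"
    fenced_code: bool = True   # False means four-space indented code blocks


def render_emphasis(text: str, style: MarkdownStyle) -> str:
    """Wrap text in the configured emphasis marker."""
    return f"{style.emphasis}{text}{style.emphasis}"


def render_bullets(items: list[str], style: MarkdownStyle) -> str:
    """Render a flat unordered list with the configured bullet character."""
    return "\n".join(f"{style.bullet} {item}" for item in items)


def render_code(code: str, style: MarkdownStyle, language: str = "") -> str:
    """Render a code block as fenced or indented, depending on the style."""
    if style.fenced_code:
        return f"{FENCE}{language}\n{code}\n{FENCE}"
    return "\n".join("    " + line for line in code.splitlines())


house_style = MarkdownStyle(emphasis="*", bullet="*", fenced_code=False)
print(render_emphasis("note", house_style))
print(render_bullets(["one", "two"], house_style))
print(render_code("x = 1\ny = 2", house_style))
```

Keeping the style frozen means one configuration applies to an entire batch, which is what makes the output consistent across pages.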
4. Copy or download
Copy the result to your clipboard or download it as a .md file ready to commit. Because the conversion is deterministic, the same input always yields the same Markdown — which matters when you are migrating hundreds of pages.
Best Practices for Clean Markdown Output
- Strip presentational markup first: inline styles, empty spans, and tracking `<div>` wrappers add nothing to Markdown and only clutter the parse.
- Match your repo's linter rules: pick the emphasis and bullet markers your `markdownlint` or Prettier config expects so the output passes CI on the first commit.
- Spot-check tables and code blocks, the two structures most likely to need a tweak after any HTML-to-Markdown conversion.
- Keep raw text safe: characters like `*`, `_`, and backticks are escaped so prose is not accidentally reinterpreted as formatting.
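The escaping mentioned in the last point can be sketched as a single regular-expression substitution. The `escape_markdown` name and the exact character set are assumptions for illustration. Real converters tend to escape more selectively, based on where a character appears, to avoid cluttering the output with backslashes.

```python
import re

# Characters that can trigger inline formatting in CommonMark-style Markdown.
_SPECIAL = re.compile(r"([\\`*_\[\]])")


def escape_markdown(text: str) -> str:
    """Backslash-escape characters that Markdown could treat as formatting."""
    return _SPECIAL.sub(r"\\\1", text)


print(escape_markdown("Use snake_case and 2 * 3, not [brackets]."))
```

This should only be applied to plain text nodes. Running it over text that is already inside a code span or fenced block would add backslashes that show up literally.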
Common Mistakes to Avoid
- Pasting an entire page including navigation, ads, and footers — convert only the article body for usable output.
- Assuming every HTML element has a Markdown equivalent; complex layouts, forms, and embeds will be simplified or dropped by design.
- Forgetting to re-check internal links after migration, since relative URLs may need rewriting for the new site structure.
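For the link check in the last point, one option is to resolve every link target against the new site's base URL after conversion. The sketch below assumes simple inline links of the form `[text](url)`; the `absolutize_links` name and the example base URL are hypothetical, and reference-style links are not handled.

```python
import re
from urllib.parse import urljoin

# Captures "[text](" and then the URL up to the first space or ")".
LINK = re.compile(r"(\[[^\]]*\]\()([^)\s]+)")


def absolutize_links(markdown: str, base_url: str) -> str:
    """Resolve relative link and image targets against `base_url`."""
    def fix(match: re.Match) -> str:
        return match.group(1) + urljoin(base_url, match.group(2))
    return LINK.sub(fix, markdown)


text = 'See [setup](/docs/setup "Setup guide") and [the spec](https://example.org/spec).'
print(absolutize_links(text, "https://docs.example.com/guides/"))
```

Targets that are already absolute pass through unchanged, and optional link titles are left intact because the pattern stops at the first space. Image targets are rewritten too, since `![alt](src)` contains the same bracket-and-parenthesis shape.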
Expert Tips
Convert the body, not the whole page
Paste only the article content — not the nav, ads, or footer. You will get usable Markdown the first time instead of paragraphs of boilerplate to delete.
Match your linter before you commit
Pick the emphasis and bullet markers your markdownlint or Prettier config expects. The output then passes CI on the first push with zero hand-editing.
Frequently Asked Questions
Is converting HTML to Markdown lossless?
It is lossless for structural content — headings, text, links, images, lists, quotes, code, and tables all map cleanly. Purely presentational HTML such as inline styles, custom classes, and layout wrappers has no Markdown equivalent and is intentionally dropped, which is usually what you want.
Is my HTML uploaded to a server?
No. This converter runs entirely in your browser using deterministic JavaScript, so your HTML never leaves your device. That makes it safe for internal documentation and unpublished drafts.
Does it support tables and code blocks?
Yes. HTML tables become GitHub-flavored pipe tables, and `<pre><code>` blocks become fenced code blocks with the language preserved when it is present in a `language-*` class.
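As a rough illustration of the language detection described here, the sketch below pulls the language out of a `class` attribute value and builds the fence. The `fence_code` helper is a hypothetical name, and the pattern is an assumption about what a `language-*` class looks like in practice.

```python
import re

FENCE = "`" * 3  # built programmatically to keep this example easy to embed
LANG_CLASS = re.compile(r"\blanguage-([\w+#-]+)")


def fence_code(code: str, class_attr: str = "") -> str:
    """Wrap code in a fenced block, tagging it if a language-* class is present."""
    match = LANG_CLASS.search(class_attr)
    language = match.group(1) if match else ""
    return f"{FENCE}{language}\n{code.rstrip()}\n{FENCE}"


print(fence_code("print('hi')\n", "hljs language-python"))
print(fence_code("plain text block"))
```

When no matching class is found the fence is left untagged, which is a safer default than guessing the language.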
What Markdown flavor does it output?
It produces CommonMark with popular GitHub-flavored extensions — strikethrough, task lists, and tables — which renders correctly on GitHub, in static-site generators, and in most documentation tools.
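The three GitHub-flavored extensions named here are small enough to sketch as helper functions. The function names are hypothetical, and the table helper assumes every row has the same number of cells as the header.

```python
def strikethrough(text: str) -> str:
    """GFM strikethrough: wrap text in double tildes."""
    return f"~~{text}~~"


def task_item(text: str, done: bool = False) -> str:
    """GFM task-list item: a bullet followed by a checkbox."""
    box = "x" if done else " "
    return f"- [{box}] {text}"


def pipe_table(header: list[str], rows: list[list[str]]) -> str:
    """GFM pipe table: header row, divider row, then one line per data row."""
    def line(cells: list[str]) -> str:
        # A literal pipe inside a cell must be escaped or it splits the cell.
        return "| " + " | ".join(c.replace("|", "\\|") for c in cells) + " |"

    divider = "|" + "|".join(["---"] * len(header)) + "|"
    return "\n".join([line(header), divider, *(line(r) for r in rows)])


print(strikethrough("old price"))
print(task_item("write docs", done=True))
print(pipe_table(["HTML", "Markdown"], [["<del>", "~~text~~"]]))
```

Renderers that implement only core CommonMark will show these constructs as literal text, so it is worth confirming that your target platform supports the extensions.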