Transform documentation sites
into AI-ready markdown

Most AI models can't navigate a documentation site the way humans do. DocFetch converts an entire documentation website into a single, clean markdown file with an optional llm.txt index.

$ npm install -g doc-fetch
$ doc-fetch --url https://react.dev/learn --output docs.md --llm-txt

Why DocFetch?

🤖

AI/LLM Optimized

Produces clean, structured markdown in a single file, so an LLM can load the documentation into its context window without spending tokens on navigation or layout markup.

📑

llm.txt Indexing

Intelligent semantic categorization. Your AI agents know whether they're reading an API reference or a tutorial.

⚡

One Command

Replace hours of manual copy-pasting with a single CLI command. Concurrent fetching with configurable depth.

🧹

Clean Extraction

Automatically strips navigation, headers, footers, ads, and buttons. Only the content matters.

๐Ÿท๏ธ

Smart Classification

Automatically classifies each page as an API, guide, reference, or example and adds a semantic description.

🔧

Production Ready

Respects robots.txt, applies rate limiting, and runs cross-platform. Multiple installation options are available.

Installation

Python (recommended)
pip install doc-fetch
Node.js
npm install -g doc-fetch
pnpm add -g doc-fetch
Go
go install github.com/AlphaTechini/doc-fetch/cmd/docfetch@latest
Binary
# Download from GitHub Releases
# Windows, macOS, Linux
Download binaries →

Usage

Basic Usage

# Fetch entire documentation site
doc-fetch --url https://golang.org/doc/ --output ./docs/golang-full.md

# With llm.txt generation for AI optimization
doc-fetch --url https://react.dev/learn --output docs.md --llm-txt

Advanced Usage

doc-fetch \
  --url https://docs.example.com \
  --depth 4 \
  --concurrent 10 \
  --llm-txt \
  --user-agent "MyBot/1.0"

Command Options

| Flag | Short | Description | Default |
| --- | --- | --- | --- |
| --url | -u | Base URL to fetch documentation from | Required |
| --output | -o | Output file path | docs.md |
| --depth | -d | Maximum crawl depth | 2 |
| --concurrent | -c | Number of concurrent fetchers | 3 |
| --llm-txt | | Generate AI-friendly llm.txt index | false |
| --user-agent | | Custom user agent string | DocFetch/1.0 |
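
When DocFetch is driven from a script rather than a terminal, the documented flags map directly onto an argument list. The sketch below is one possible wrapper, not something DocFetch ships: `build_command` and `run` are hypothetical helper names, the defaults simply mirror the options listed above, and it assumes the `doc-fetch` executable is on your PATH.

```python
import subprocess


def build_command(url, output="docs.md", depth=2, concurrent=3,
                  llm_txt=False, user_agent=None):
    """Assemble a doc-fetch argument list from the documented flags.

    Hypothetical helper (not part of DocFetch); defaults mirror the options above.
    """
    cmd = ["doc-fetch", "--url", url, "--output", output,
           "--depth", str(depth), "--concurrent", str(concurrent)]
    if llm_txt:
        cmd.append("--llm-txt")  # used as a bare switch in the examples
    if user_agent is not None:
        cmd += ["--user-agent", user_agent]
    return cmd


def run(url, **options):
    """Run doc-fetch and raise CalledProcessError on a non-zero exit code."""
    return subprocess.run(build_command(url, **options), check=True)
```

Calling `run("https://react.dev/learn", llm_txt=True)` would then be equivalent to the second basic-usage command. Passing the arguments as a list avoids shell quoting problems with URLs and user agent strings.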

Real-World Examples

Go Documentation

doc-fetch --url https://golang.org/doc/ \
  --output ./docs/go-documentation.md \
  --depth 4 --llm-txt

Complete Go documentation with language spec, tutorials, and API references.

React Documentation

doc-fetch --url https://react.dev/learn \
  --output ./docs/react-learn.md \
  --concurrent 10 --llm-txt

React learn section with all tutorials and guides in one file.

Your Project Docs

doc-fetch --url https://your-project.com/docs/ \
  --output ./internal/docs.md \
  --llm-txt

Fetch your own project's documentation for AI agent training.

How It Works

1

Link Discovery

Parses the page at the base URL to find all internal documentation links
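
To make the idea of link discovery concrete, here is a minimal sketch using only the Python standard library. It is an illustration of the general technique, not DocFetch's actual implementation: `LinkCollector` and `internal_links` are hypothetical names, and "internal" is assumed to mean "same host and under the base URL's path".

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def internal_links(base_url, html):
    """Return sorted absolute links on base_url's host and under its path."""
    collector = LinkCollector()
    collector.feed(html)
    base = urlparse(base_url)
    found = set()
    for href in collector.hrefs:
        # Resolve relative links, then drop "#fragment" so pages are not duplicated.
        absolute, _fragment = urldefrag(urljoin(base_url, href))
        target = urlparse(absolute)
        if target.netloc == base.netloc and target.path.startswith(base.path):
            found.add(absolute)
    return sorted(found)
```

A crawler would repeat this on each discovered page until the configured depth is reached, keeping a set of visited URLs to avoid loops.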

2

Content Fetching

Downloads all pages concurrently while respecting robots.txt
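
As a rough picture of what robots-aware concurrent fetching involves, the following sketch filters URLs through Python's standard `urllib.robotparser` and downloads the rest with a bounded thread pool. The function names are hypothetical and this is not DocFetch's code; the worker count and user agent are taken from the documented defaults, and rate limiting is left out for brevity.

```python
from concurrent.futures import ThreadPoolExecutor
from urllib.request import Request, urlopen
from urllib.robotparser import RobotFileParser

USER_AGENT = "DocFetch/1.0"  # documented default user agent


def allowed_urls(urls, robots_lines, user_agent=USER_AGENT):
    """Keep only the URLs that the given robots.txt lines permit."""
    rules = RobotFileParser()
    rules.parse(robots_lines)
    return [url for url in urls if rules.can_fetch(user_agent, url)]


def fetch(url, user_agent=USER_AGENT, timeout=10):
    """Download one page and return its decoded body."""
    request = Request(url, headers={"User-Agent": user_agent})
    with urlopen(request, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


def fetch_all(urls, workers=3):
    """Fetch pages with at most `workers` concurrent requests; returns {url: html}."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(zip(urls, pool.map(fetch, urls)))
```

Here the robots.txt content is passed in as a list of lines so the filter can be exercised without network access; against a live site, `RobotFileParser.set_url()` followed by `read()` loads the file directly.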

3

HTML Cleaning

Removes non-content elements (navigation, headers, footers)
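
One simple way to approximate this kind of cleaning is to drop everything nested inside structural tags such as `nav`, `header`, and `footer`. The sketch below does that with the standard library parser. The tag list and the names `ContentExtractor` and `extract_content` are assumptions for illustration; DocFetch's real rules (for example, how it detects ads) are not described here and may differ.

```python
from html.parser import HTMLParser

# Assumed set of non-content elements; adjust for the site being fetched.
SKIPPED_TAGS = {"nav", "header", "footer", "aside", "script", "style", "button"}


class ContentExtractor(HTMLParser):
    """Collect text that is not nested inside a skipped element."""

    def __init__(self):
        super().__init__()
        self.skip_depth = 0  # > 0 while inside any skipped element
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIPPED_TAGS:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in SKIPPED_TAGS and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self.skip_depth == 0:
            self.chunks.append(text)


def extract_content(html):
    """Return the visible text outside skipped elements, one chunk per line."""
    extractor = ContentExtractor()
    extractor.feed(html)
    extractor.close()
    return "\n".join(extractor.chunks)
```

The depth counter relies on the skipped elements being properly closed, which holds for most generated documentation sites but not for arbitrary HTML.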

4

Markdown Conversion

Converts cleaned HTML to structured markdown
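
The conversion step can be illustrated with a deliberately small converter that handles only headings, paragraphs, and inline code. This is a sketch of the technique under those assumptions, with hypothetical names (`MarkdownConverter`, `to_markdown`); a production converter, DocFetch's included, would need to cover lists, links, tables, and code blocks as well.

```python
from html.parser import HTMLParser

HEADING_PREFIX = {"h1": "# ", "h2": "## ", "h3": "### "}


class MarkdownConverter(HTMLParser):
    """Convert headings, paragraphs, and inline code to markdown blocks."""

    def __init__(self):
        super().__init__()
        self.blocks = []   # finished markdown blocks
        self.current = []  # text pieces of the block being built
        self.prefix = ""

    def handle_starttag(self, tag, attrs):
        if tag in HEADING_PREFIX:
            self.prefix = HEADING_PREFIX[tag]
        elif tag == "code":
            self.current.append("`")

    def handle_endtag(self, tag):
        if tag == "code":
            self.current.append("`")
        elif tag in HEADING_PREFIX or tag == "p":
            text = "".join(self.current).strip()
            if text:
                self.blocks.append(self.prefix + text)
            self.current, self.prefix = [], ""

    def handle_data(self, data):
        self.current.append(data)


def to_markdown(html):
    """Return markdown with one blank line between blocks."""
    converter = MarkdownConverter()
    converter.feed(html)
    converter.close()
    return "\n\n".join(converter.blocks)
```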

5

Classification

Categorizes pages as API, GUIDE, REFERENCE, or EXAMPLE
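
How pages end up in one of the four categories is not specified here, so the following is only a guess at the simplest possible approach: keyword matching on the URL and title. The rule table and the `classify` function are hypothetical, and the GUIDE fallback is an assumption.

```python
# Hypothetical keyword rules, checked in order; the first match wins.
RULES = [
    ("API", ("api", "pkg.go.dev")),
    ("REFERENCE", ("reference", "/ref/", "spec")),
    ("EXAMPLE", ("example", "sample", "playground")),
]
DEFAULT_CATEGORY = "GUIDE"  # assumed fallback when nothing matches


def classify(url, title):
    """Return API, REFERENCE, EXAMPLE, or GUIDE for a page."""
    haystack = f"{url} {title}".lower()
    for category, keywords in RULES:
        if any(keyword in haystack for keyword in keywords):
            return category
    return DEFAULT_CATEGORY
```

Plain substring matching is crude (a title containing "rapid" would match "api"), so a real classifier would likely use word boundaries or inspect page content as well.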

6

Output Generation

Combines all pages into one file and generates the llm.txt index

How llm.txt Supercharges Your AI

Without DocFetch

"What does the net/http package do?"

Your AI has no idea. It needs to guess or ask you to provide context.

With DocFetch

"Check the [API] net/http section in llm.txt"

Your AI knows exactly where to look: "HTTP client/server implementation"

Example llm.txt Output

# llm.txt - AI-friendly documentation index

[GUIDE] Getting Started
https://golang.org/doc/install
Covers installation, setup, and first program.

[REFERENCE] Language Specification  
https://golang.org/ref/spec
Complete Go language specification and syntax.

[API] net/http
https://pkg.go.dev/net/http
HTTP client/server implementation.
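
An agent or script can read this index with a few lines of parsing. The sketch below assumes the layout shown in the example output: blank-line separated entries, each made of a `[CATEGORY] Title` line, a URL line, and a description. `Entry` and `parse_llm_txt` are hypothetical names, and a real llm.txt file may contain variations this does not handle.

```python
import re
from typing import NamedTuple

ENTRY_HEADER = re.compile(r"^\[(?P<category>[A-Z]+)\]\s+(?P<title>.+)$")


class Entry(NamedTuple):
    category: str
    title: str
    url: str
    description: str


def parse_llm_txt(text):
    """Parse '[CATEGORY] Title / URL / description' entries into Entry tuples."""
    entries = []
    for block in re.split(r"\n\s*\n", text.strip()):
        lines = [line.strip() for line in block.splitlines() if line.strip()]
        if len(lines) < 3:
            continue  # skips the leading '# llm.txt' comment line
        match = ENTRY_HEADER.match(lines[0])
        if match:
            entries.append(Entry(match["category"], match["title"],
                                 lines[1], " ".join(lines[2:])))
    return entries
```

Filtering the result, for example `[e for e in parse_llm_txt(text) if e.category == "API"]`, gives an agent the list of API sections to consult before it searches the full markdown file.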

Stop wasting time copying documentation

Start building AI agents with complete knowledge.