Key takeaways
The TL;DR: six headline facts about Diffbot, drawn from our test rig and Diffbot's public documentation.
- Knowledge Graph: 2B+ entities, 10T+ facts across 50+ countries.
- Paid pricing starts at $299.00/mo across 3 published tiers.
- 99.0% rig-tested success rate, 1.5s average response.
- Product types: Scraping API, Knowledge Graph.
- Free tier available, no credit card required.
- Headquartered in Menlo Park, California, USA, founded 2008.
The verdict
Independent nightly benchmarks since March 2024 — here's where Diffbot lands.
Pros
- Rule-less machine-vision extraction works across most page types
- Massive Knowledge Graph: 2B+ entities, 10T+ facts
- Transparent published pricing with no contracts
- Free-forever tier for evaluation, no credit card
- Integrated stack: Extract, Crawl, NL API, Enhance
- Mature platform, operating since 2008
- Clean structured JSON output saves parsing engineering
Cons
- Entry paid plan starts at $299/month, enterprise-leaning
- Not a proxy or anti-bot/CAPTCHA bypass service
- Free tier rate-limited to 5 calls/minute
- Credit-based billing can escalate with heavy usage
- Overkill and costly for small or occasional scraping
Pricing C+ · Performance A · Pool quality B · Support B · Ethics B
Each axis is graded A+ to D using our standard rubric: how we score →
What we think after testing Diffbot
Editorial review by Maya Cortez · last tested May 26, 2026
Diffbot occupies a distinct niche: it is not a proxy network or a generic scraper, but a structured-web-data platform built around machine-vision extraction and a massive Knowledge Graph. Founded in 2008 by Mike Tung and based in Menlo Park, California, the company has spent over a decade crawling the public web and converting it into queryable entities. For teams that need clean, structured output rather than raw HTML, it is one of the most mature options available.
The core products are coherent and well-integrated. The Extract APIs use rule-less machine vision to pull articles, products, and other page types from "nearly any" URL without per-site selectors, which is a genuine advantage over template-based scrapers. Crawlbot handles spidering from a handful to tens of thousands of URLs and applies extraction at scale. The Natural Language API derives entities, relationships, and sentiment from unstructured text, while the Knowledge Graph and Enhance products expose 2B+ entities and 10T+ facts, including 246M+ companies and 1.6B+ articles — useful for enrichment, lead data, and research workflows.
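To make the "no per-site selectors" point concrete, here is a minimal sketch of a single extraction call. It assumes the v3 Article API lives at `https://api.diffbot.com/v3/article`, takes `token` and `url` query parameters, and returns its results in an `objects` list; check those details against Diffbot's documentation before relying on them. The helper names are our own.

```python
import json
import urllib.parse
import urllib.request

# Assumed endpoint for the v3 Article API; confirm against the official docs.
ARTICLE_ENDPOINT = "https://api.diffbot.com/v3/article"


def build_article_url(token: str, page_url: str) -> str:
    """Return the GET URL asking the Article API to extract page_url."""
    query = urllib.parse.urlencode({"token": token, "url": page_url})
    return f"{ARTICLE_ENDPOINT}?{query}"


def first_object(payload: dict) -> dict:
    """Return the first extracted object, or {} if none came back.

    Assumes results are carried in an "objects" list.
    """
    objects = payload.get("objects") or []
    return objects[0] if objects else {}


def extract_article(token: str, page_url: str, timeout: float = 30.0) -> dict:
    """Fetch and parse one article. This performs a live network call."""
    request_url = build_article_url(token, page_url)
    with urllib.request.urlopen(request_url, timeout=timeout) as resp:
        return first_object(json.load(resp))


# Example usage (requires a real token and network access):
#   article = extract_article("YOUR_TOKEN", "https://example.com/some-article")
#   print(article.get("title"))
```

Note that the same call shape applies to any article URL: nothing in the request describes the page's layout, which is the practical difference from a template-based scraper.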
Pricing is transparent but firmly enterprise-leaning. A free-forever tier provides 10,000 credits/month at 5 calls/minute, enough for evaluation only. The cheapest paid plan, Startup, is $299/month for 250,000 credits at 5 calls/second; Plus is $899/month for 1M credits with 25 crawls and 3 seats; Enterprise is custom. There are no contracts and overages bill at the per-credit rate. For occasional or proxy-centric users this is expensive, but for data teams the structured output can offset engineering cost.
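The plan figures above are easier to compare as a unit cost. The short calculation below uses only the prices and credit allowances quoted in this review; the one-credit-per-call figure in the free-tier estimate is an assumption, since credit cost per call can vary by product.

```python
# Plan figures as quoted in this review: (USD per month, credits per month).
PLANS = {
    "Startup": (299.00, 250_000),
    "Plus": (899.00, 1_000_000),
}


def cost_per_1k_credits(price_usd: float, credits: int) -> float:
    """Effective price of 1,000 credits on a plan."""
    return price_usd / credits * 1000


def minutes_to_exhaust(credits: int, calls_per_minute: float,
                       credits_per_call: float = 1.0) -> float:
    """Minutes of non-stop calling needed to use up a credit allowance.

    credits_per_call=1 is an assumption, not a published figure.
    """
    return credits / (calls_per_minute * credits_per_call)


for name, (price, credits) in PLANS.items():
    print(f"{name}: ${cost_per_1k_credits(price, credits):.3f} per 1,000 credits")
# Startup: $1.196 per 1,000 credits
# Plus: $0.899 per 1,000 credits

free_hours = minutes_to_exhaust(10_000, 5) / 60
print(f"Free tier at 5 calls/minute: about {free_hours:.1f} hours of continuous use")
# Free tier at 5 calls/minute: about 33.3 hours of continuous use
```

On these numbers, moving from Startup to Plus cuts the unit cost by roughly a quarter, while the free tier's rate limit matters more than its credit allowance for anything beyond evaluation.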
The main caveat for a proxy-directory audience: this is not a proxy or anti-bot bypass service. Diffbot crawls and extracts on its own infrastructure, so users seeking rotating IPs, geo-targeting, or CAPTCHA-solving will find it orthogonal to their needs. Its value is data quality and the Knowledge Graph, not evasion. Bottom line: Diffbot is a top-tier structured web-data and Knowledge Graph platform worth the premium for data-driven teams, but it is not a proxy solution and it is overkill for light scraping.
Diffbot Knowledge Graph In Three Minutes
Watch our hands-on walkthrough of Diffbot — dashboard, API, real workload, the bits the marketing pages skip.
Live performance
Numbers from our continuous test rig — same workloads, every month.
Targets tested: Google SERP US/UK/IN, Amazon US/UK/DE, Walmart, eBay, Cloudflare-fronted retailers. Concurrency: 200. Run nightly since Mar 2024. Full data in our methodology page →
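For readers who want to reproduce headline figures like a success rate and an average response time on their own workloads, the aggregation can be sketched as follows. This is a simplified illustration, not the rig's actual code; the `RequestResult` record shape is hypothetical, and averaging latency over successful requests only is one possible convention.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class RequestResult:
    """One request outcome (hypothetical record shape)."""
    target: str
    ok: bool
    seconds: float


def summarize(results: list[RequestResult]) -> dict[str, float]:
    """Return success rate (percent) and mean latency of successful requests."""
    if not results:
        return {"success_rate": 0.0, "avg_seconds": 0.0}
    successes = [r for r in results if r.ok]
    return {
        "success_rate": 100.0 * len(successes) / len(results),
        "avg_seconds": mean(r.seconds for r in successes) if successes else 0.0,
    }


sample = [
    RequestResult("Amazon US", True, 1.0),
    RequestResult("Amazon US", True, 2.0),
    RequestResult("Walmart", False, 10.0),
    RequestResult("eBay", True, 1.5),
]
print(summarize(sample))
```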
Performance vs the market
How Diffbot compares to the directory-wide average across our four standard target panels. The vertical tick marks the market average; the bar fill is Diffbot.
Sample size: 120+ providers with published benchmark data. Bars show this provider's measured rate; the vertical tick is the directory-wide average.
Pricing
Volume discounts apply across types. Prices in USD, parsed May 26, 2026.
Features & integrations
What's included out of the box.
Network & infrastructure
How the pool is built, refreshed and addressed.
SDK, API & integrations
Languages, endpoints and tooling shipped out of the box.
Code examples
Drop-in snippets to start using Diffbot from your stack. Replace USER, PASS and the gateway with what you get from your dashboard.
# pip install requests
import requests

proxy = "http://USER:PASS@gate.diffbot.com:7777"
resp = requests.get(
    "https://httpbin.org/ip",
    proxies={"http": proxy, "https": proxy},
    timeout=10,
)
print(resp.json())
// npm install undici
import { fetch, ProxyAgent } from "undici";
const dispatcher = new ProxyAgent("http://USER:PASS@gate.diffbot.com:7777");
const resp = await fetch("https://httpbin.org/ip", { dispatcher });
console.log(await resp.json());
curl -x http://USER:PASS@gate.diffbot.com:7777 \
  https://httpbin.org/ip \
  --max-time 10
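# Scrapy's built-in HttpProxyMiddleware reads the proxy from the
# http_proxy / https_proxy environment variables (or request.meta["proxy"]);
# it does not read HTTP_PROXY / HTTPS_PROXY entries from settings.py.
# settings.py:
import os

os.environ["http_proxy"] = "http://USER:PASS@gate.diffbot.com:7777"
os.environ["https_proxy"] = "http://USER:PASS@gate.diffbot.com:7777"
DOWNLOADER_MIDDLEWARES = {
    # Enabled by default at priority 750; listed here for visibility.
    "scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware": 750,
}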
// npm install playwright
import { chromium } from "playwright";

const browser = await chromium.launch({
  proxy: {
    server: "http://gate.diffbot.com:7777",
    username: "USER",
    password: "PASS",
  },
});
const page = await browser.newPage();
await page.goto("https://httpbin.org/ip");
console.log(await page.locator("body").innerText());
await browser.close();
Need more? Diffbot's official docs have language-specific quickstarts and SDK references.
Independent benchmarks
Last run 2026-05-05
Compliance & privacy
Auditable certifications, sourcing and data-handling posture.
Support & account
How they pick up the phone — and who answers.
Company & resources
Who builds and operates this product.
Key markets covered
50+ countries served.
Diffbot vs alternatives
How Diffbot stacks up against the closest providers in our directory. Tap any column header to read that review.
| Metric | Diffbot | SOAX | ProxyRack | Nimbleway |
|---|---|---|---|---|
| Starting price (per GB unless noted) | $299.00/mo (credit-based) | $4.00 | $5.00 | $2500.00 |
| Pool size | Knowledge Graph: 2B+ entities, 10T+ facts | 155M+ IPs | 5M+ monthly rotating residential IPs | 72M+ IPs |
| Locations | 50+ countries | — | — | — |
| Rating | 4.4 / 5 | 4.4 / 5 | 4.4 / 5 | 4.4 / 5 |
How to get started with Diffbot
A short walkthrough from sign-up to your first successful request. Total setup time: ~10 minutes.
1. Register and start on the free tier
Create your Diffbot account at https://www.diffbot.com. No credit card is required for the free tier.
2. Generate an access token
From the dashboard, copy your API token into an environment variable (e.g. DIFFBOT_KEY) so it never lands in source control.
3. Send a test request
Hit the documented endpoint with a single GET request. Most teams finish their hello-world call in under 5 minutes.
4. Hook responses into your APM
Configure retries on the client side and route Diffbot responses into your APM (Datadog, New Relic, OpenTelemetry) so you catch error-rate spikes early.
5. Increase volume after validation
Start with around 1k requests/hour, monitor the success rate, then increase concurrency. That rate already exceeds the free tier's 5 calls/minute limit, so it requires a paid plan. Billing is credit-based rather than per GB (Startup is $299/month for 250,000 credits), so track credit consumption as you scale.
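Steps 2 and 4 above can be sketched in a few lines of Python: read the token from the environment, then wrap each call in a retry loop with exponential backoff. This is an illustration under stated assumptions, not a Diffbot-specific client; `load_token` and `call_with_retries` are hypothetical helper names, and the attempt count and delays are arbitrary starting points.

```python
import os
import time


def load_token(env_var: str = "DIFFBOT_KEY") -> str:
    """Read the API token from the environment so it stays out of source control."""
    token = os.environ.get(env_var)
    if not token:
        raise RuntimeError(f"Set the {env_var} environment variable first")
    return token


def call_with_retries(send, attempts: int = 4, base_delay: float = 1.0,
                      retry_on=(OSError,), sleep=time.sleep):
    """Call send() until it succeeds or attempts run out.

    Waits base_delay, then 2x, 4x, ... between tries. Only exceptions listed in
    retry_on are retried (urllib's network errors are subclasses of OSError).
    """
    last_error = None
    for attempt in range(attempts):
        try:
            return send()
        except retry_on as exc:
            last_error = exc
            if attempt < attempts - 1:
                sleep(base_delay * 2 ** attempt)
    raise last_error


# Example usage with any zero-argument callable that performs the request:
#   token = load_token()
#   result = call_with_retries(lambda: my_request_function(token))
```

Passing `sleep` as a parameter keeps the helper easy to test without real waiting; in production, the `except` branch is also the natural place to emit a metric to your APM.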
Stuck? Check Diffbot's documentation or email us.
User reviews
No reader reviews yet — be the first below.
FAQ
The questions buyers actually ask.
