What It Actually Costs to Run a Billion-Face Surveillance Engine

Storage. GPU inference. Residential proxies to dodge bot-detection at scrape time. Lawyers, settlements, the occasional eight-figure regulatory fine. We priced every line item using public vendor rates and the actual penalties on record. The total is much bigger than the press tends to suggest — which is exactly why "regulate them out of business" hasn't worked yet.

Clearview AI claims a face index of more than 50 billion images. PimEyes claims around 3 billion. Lenso.ai claims "several billion." These are not toy systems — they are massive, continuously-running infrastructure stacks that have to crawl the open web, store the images, run face-detection and embedding inference on every new photo, persist those embeddings into a vector database, serve sub-second similarity queries to paying customers, and pay lawyers to keep doing it in jurisdictions where regulators very much want them to stop.

We get asked the same question often: "Why are these companies still operating? Why hasn't the EU just shut Clearview down?" The honest answer is that the unit economics work even after the fines. To show why, we walked through every meaningful line item using public, source-able numbers — AWS list pricing, published proxy-vendor rates, and the actual regulatory fines and class-action settlements on the record.

Two model engines for comparison: a "Clearview tier" (~50B images, government / law-enforcement buyers, premium pricing) and a "PimEyes tier" (~2-3B images, consumer subscription, public-web-only crawl). Numbers below are annualized 2026 USD ranges. We err on the realistic-high end because most published estimates are too low.

1. Storage — $1M–$6M / year

A billion face images is not actually a lot of raw bytes. At ~80–150 KB per face thumbnail (the engines downsample), 1B images is roughly 80–150 TB. AWS S3 Standard list price is $0.023 per GB per month, so 100 TB lands at about $2,300/month or $28K/year. Cheap.
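
The raw-image arithmetic above can be reproduced with a short sketch. It assumes decimal units (1 TB = 1,000 GB) and uses only the S3 list price quoted in the paragraph; the function name is illustrative.

```python
# Rough S3 Standard cost for raw face thumbnails.
# Assumption: decimal units, 1 GB = 1,000,000 KB and 1 TB = 1,000 GB.
S3_STANDARD_USD_PER_GB_MONTH = 0.023  # list price quoted in the text


def raw_image_storage(num_images: int, kb_per_image: float) -> tuple[float, float]:
    """Return (size in TB, cost in USD per month) for the raw thumbnails."""
    size_gb = num_images * kb_per_image / 1_000_000  # KB -> GB
    return size_gb / 1_000, size_gb * S3_STANDARD_USD_PER_GB_MONTH


for kb in (80, 100, 150):
    tb, monthly = raw_image_storage(1_000_000_000, kb)
    print(f"{kb} KB/image: {tb:.0f} TB, ${monthly:,.0f}/month, ${monthly * 12:,.0f}/year")
```

At 100 KB per image this gives 100 TB and about $2,300/month, matching the figure above.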

But that's not the real bill. The real bill is:

  • Embeddings + vector index. Every face is stored as a 128-, 256-, or 512-dimensional float vector. 50B faces × 512 floats × 4 bytes is ~100 TB for the raw vectors alone, before HNSW/IVF index overhead, which typically adds another 2–3× that. Real number: 300–500 TB of hot, low-latency storage on NVMe-backed vector DBs (Milvus, Pinecone, or self-hosted FAISS clusters). At hyperscaler hot-storage rates this is on the order of $40K–$120K/month.
  • Multi-region replication. Government customers (in Clearview's case) have data-residency requirements. Two or three full replicas across regions roughly triples that figure.
  • Backup + cold tier. The original source images, plus enough metadata to retrain models, plus the audit trail regulators ask for. Add $200K–$500K/year on Glacier/Deep Archive.
  • CDN egress for the customer-facing UI. When a paying user views search results, the engine has to serve thumbnails. PimEyes serves these directly; that's bandwidth at ~$0.085/GB on CloudFront list, typically discounted to $0.02–$0.04 at their volume.
Storage total, realistic: $1.0M–$2.5M/year for PimEyes tier. $3M–$6M/year for Clearview tier.
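
A minimal sketch of the embedding-footprint and replication arithmetic from the list above. It assumes 4-byte floats and decimal terabytes, and it reads the "2–3×" index overhead as additional storage on top of the raw vectors; that reading is an assumption, as is the function name.

```python
def raw_vector_tb(num_faces: int, dims: int, bytes_per_float: int = 4) -> float:
    """Raw embedding size in decimal terabytes (1 TB = 1e12 bytes)."""
    return num_faces * dims * bytes_per_float / 1e12


raw_tb = raw_vector_tb(50_000_000_000, 512)
print(f"raw vectors: {raw_tb:.1f} TB")

# Assumption: index overhead is added on top of the raw vectors.
for overhead in (2, 3):
    print(f"with {overhead}x index overhead: {raw_tb * (1 + overhead):.0f} TB")

# Replication: the $40K-$120K/month hot-storage figure across three replicas.
REPLICAS = 3
for monthly in (40_000, 120_000):
    annual_musd = monthly * REPLICAS * 12 / 1e6
    print(f"${monthly:,}/month x {REPLICAS} replicas = ${annual_musd:.2f}M/year")
```

Under these assumptions the hot tier alone lands between roughly $1.4M and $4.3M per year with three replicas, before backup, cold storage, and CDN egress.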

2. GPU inference — $800K–$3M / year

Embedding inference (turning a face into a vector) is the cheap part of inference. A modern ArcFace or AdaFace model running on a single AWS g5.xlarge ($1.006/hour on-demand) handles roughly 300–500 face embeddings per second.

To keep a 50B-face index fresh you have to re-embed about 1–2% of it monthly as the source images change, plus the new images you discover. Call that 1B face inferences per month, or ~385 per second sustained. At 400/sec per g5.xlarge you need 1 instance running 24/7 — about $9K/year for the inference itself. Add a small reserve for burst load and the inference compute alone is $50K–$150K/year.
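
The sizing above can be checked with a few lines. This is a sketch under the paragraph's own numbers (1B inferences per month, the quoted g5.xlarge on-demand rate) plus one assumption, a 30-day month; the helper names are illustrative.

```python
import math

SECONDS_PER_MONTH = 30 * 24 * 3600   # assumption: 30-day month
HOURS_PER_YEAR = 24 * 365
G5_XLARGE_USD_PER_HOUR = 1.006       # on-demand rate quoted in the text


def sustained_rate(inferences_per_month: float) -> float:
    """Average inferences per second needed to clear the monthly volume."""
    return inferences_per_month / SECONDS_PER_MONTH


def instances_needed(rate_per_sec: float, per_instance_per_sec: float) -> int:
    """Whole instances required to sustain the given rate."""
    return math.ceil(rate_per_sec / per_instance_per_sec)


rate = sustained_rate(1_000_000_000)
for throughput in (300, 400, 500):
    n = instances_needed(rate, throughput)
    annual = n * G5_XLARGE_USD_PER_HOUR * HOURS_PER_YEAR
    print(f"{throughput}/sec per instance: {n} instance(s), ${annual:,.0f}/year")
```

Note that at the low end of the throughput range (300/sec) the same load needs two instances rather than one, which is part of why a reserve on top of the base figure is reasonable.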

The expensive compute is everything around inference:

  • Face detection on every crawled image. Before you embed a face you have to find it. Modern web crawls return billions of images per month; you run a YOLO-class detector on each one to decide whether there is a face to embed. That's the bulk of GPU spend.
  • Quality filtering. Reject blurry, occluded, sub-resolution, watermarked, or training-data-flagged frames. Another model pass per image.
  • De-duplication. Hash + similarity match against existing vectors so you don't insert the same person 400 times from the same web crawl.
  • Continuous model retraining on new training data, demographic-balancing benchmarks (NIST FRVT submissions, etc.), plus the GPU time to evaluate against test sets every release.
GPU compute total, realistic: $800K–$1.5M/year for PimEyes tier. $1.8M–$3.0M/year for Clearview tier.
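
The de-duplication step in the list above (hash plus similarity match) can be sketched as follows. This is an illustration only, not any engine's actual pipeline: the class name, the 0.9 threshold, and the linear scan are all assumptions, and a production system would query an approximate-nearest-neighbor index rather than compare against every stored vector.

```python
import hashlib
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


class Deduplicator:
    """Two-stage filter: exact byte hash first, then embedding similarity."""

    def __init__(self, threshold: float = 0.9):  # threshold is a hypothetical value
        self.threshold = threshold
        self.seen_hashes = set()
        self.kept_vectors = []

    def should_insert(self, image_bytes: bytes, embedding: list) -> bool:
        # Stage 1: identical files are rejected without touching the vectors.
        digest = hashlib.sha256(image_bytes).hexdigest()
        if digest in self.seen_hashes:
            return False
        self.seen_hashes.add(digest)
        # Stage 2: reject embeddings too close to one already kept.
        # Linear scan for illustration only.
        if any(cosine_similarity(embedding, v) >= self.threshold
               for v in self.kept_vectors):
            return False
        self.kept_vectors.append(embedding)
        return True
```

The hash check is cheap and catches re-crawled copies of the same file; the similarity check is the expensive part, which is why it runs second.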

3. Crawl bandwidth + proxies — $2M–$10M / year

This is the line item most analyses underestimate by a full order of magnitude. To discover a billion faces you have to crawl tens of billions of web pages. Major sites (LinkedIn, Instagram, Facebook, news publishers, conference sites) actively block datacenter IP ranges and require residential or mobile-tier proxy networks to scrape at scale without triggering anti-bot defenses.

Bright Data's published residential pricing starts at $4.20/GB at low tiers and falls to roughly $1.50–$2.50/GB at enterprise volumes. Oxylabs and Smartproxy are in the same range. To crawl the open web at "discover most faces" scale, an engine consumes roughly 50–200 TB of proxy bandwidth per month.

Math: 100 TB/month × $2/GB blended = $200K/month, or $2.4M/year for proxies alone, before you've paid a single engineer or stored a single image. The Clearview tier (deeper crawl, more frequent recrawl, more aggressive site list) easily hits $500K–$800K/month and $6M–$10M/year on proxies.

Crawl bandwidth total, realistic: $2M–$5M/year for PimEyes tier. $6M–$10M/year for Clearview tier.
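
The proxy math is simple enough to sketch directly. The three scenarios below pair the low, middle, and high ends of the bandwidth range with the enterprise per-GB rates quoted above; that pairing is an assumption, and decimal units (1 TB = 1,000 GB) are used throughout.

```python
def proxy_cost_per_month(tb_per_month: float, usd_per_gb: float) -> float:
    """Monthly residential-proxy spend in USD (1 TB = 1,000 GB)."""
    return tb_per_month * 1_000 * usd_per_gb


# (TB per month, USD per GB): low, blended, and high scenarios.
for tb, rate in ((50, 1.50), (100, 2.00), (200, 2.50)):
    monthly = proxy_cost_per_month(tb, rate)
    print(f"{tb} TB at ${rate:.2f}/GB: ${monthly:,.0f}/month, ${monthly * 12 / 1e6:.1f}M/year")
```

The middle scenario reproduces the $200K/month and $2.4M/year figure; the high scenario reaches $6M/year on bandwidth alone.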

4. Engineering team — $4M–$15M / year

A face-search engine is not a "10 engineers in a coffee shop" project. Public hiring patterns (Clearview's LinkedIn page lists ~50 employees, PimEyes is harder to count) and the technical complexity (search infra, vector DB, computer-vision research, dedicated trust-and-safety to handle opt-outs, customer ops for law-enforcement contracts) imply teams in the 15–80-person range.

US-fully-loaded engineer cost (salary + benefits + equity + tools): roughly $250K–$400K/year on average. EU/Polish equivalent (PimEyes is incorporated in Wrocław, Lenso.ai is also Polish): closer to $80K–$160K/year. Mix accordingly.

People total, realistic: $4M–$8M/year for an EU-based PimEyes-tier team. $10M–$15M/year for a US-based Clearview-tier team (Clearview is HQ'd in New York).
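
As a rough cross-check, the people totals follow from headcount times fully loaded cost. The headcounts used here (50 for the EU team, 40 for the US team) are illustrative assumptions inside the 15–80 range, not reported figures.

```python
def team_cost(headcount: int, loaded_low: int, loaded_high: int) -> tuple[int, int]:
    """Annual (low, high) people cost in USD for a given headcount."""
    return headcount * loaded_low, headcount * loaded_high


# Headcounts are hypothetical; loaded-cost ranges come from the text.
eu_low, eu_high = team_cost(50, 80_000, 160_000)
us_low, us_high = team_cost(40, 250_000, 400_000)
print(f"EU team of 50: ${eu_low / 1e6:.0f}M-${eu_high / 1e6:.0f}M/year")
print(f"US team of 40: ${us_low / 1e6:.0f}M-${us_high / 1e6:.0f}M/year")
```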

5. Legal: defense, settlements, fines — $3M–$20M / year

This is the line item that's supposed to scare these companies into shutting down. Walk through the actual record:

  • Clearview AI / Illinois BIPA class action. Settled May 2024 for an estimated $51.75M (paid as a 23% future-equity stake in the company that plaintiffs can monetize at IPO or sale). Source: Reuters coverage of the Mahmood v. Clearview settlement.
  • Italian Data Protection Authority (Garante). Fined Clearview €20M (~$22M) in February 2022. Source: Garante official decision.
  • French CNIL. Fined Clearview €20M in October 2022, then added a €5.2M penalty in May 2023 for non-compliance with the original order. Source: CNIL English-language release.
  • Greek Hellenic DPA. Fined Clearview €20M in July 2022.
  • UK ICO. Fined Clearview £7.5M (~$9.4M) in May 2022. Source: ICO press release. (Note: Clearview successfully overturned this on jurisdictional grounds in October 2023; the ICO is appealing.)
  • Hamburg DPA vs PimEyes precursor (2020). Hamburg's data-protection authority opened proceedings against PimEyes (then operating from Hamburg) and effectively forced the company to relocate operations to Poland in 2021.

Two structural points the line-item totals miss:

First, several of the EU fines are uncollectable in practice because the company has no EU operations to seize. CNIL's €5.2M non-compliance penalty was explicitly because Clearview ignored the original order — and Clearview kept ignoring it after, too. The fines exist on paper; the cash doesn't always change hands.

Second, the BIPA settlement is a 23% equity stake, not cash. Clearview's reported 2024 valuation was around $225M, so the headline "$51.75M" is a notional figure that only converts to cash if Clearview is acquired or goes public. Effective cash legal spend is more like $5M–$10M/year on ongoing defense counsel.
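
The notional nature of the settlement is easy to see in code: the dollar figure is just the stake multiplied by whatever the company is worth when it converts. In this sketch only the $225M valuation comes from the text; the other two valuations are hypothetical, included to show the sensitivity.

```python
def notional_value(stake: float, valuation_usd: float) -> float:
    """Paper value of an equity stake at a given company valuation."""
    return stake * valuation_usd


STAKE = 0.23  # settlement stake from the text

# Only the 225M valuation is from the text; the others are hypothetical.
for valuation in (112_500_000, 225_000_000, 450_000_000):
    value = notional_value(STAKE, valuation)
    print(f"valuation ${valuation / 1e6:.1f}M -> stake worth ${value / 1e6:.2f}M")
```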

Legal total, realistic: $3M–$6M/year for PimEyes tier (less aggressive regulatory pursuit). $10M–$20M/year for Clearview tier (active in 6+ jurisdictions simultaneously).

6. The numbers that don't fit a category

  • Payment-processing risk. Both Stripe and Adyen have at various points refused or paused service for face-search engines on AUP grounds. PimEyes has been quietly delisted from major processors twice (2023, 2025). Backup processors charge 4–6% vs Stripe's standard 2.9% + 30¢. On $30M of annual consumer subscription revenue that's $1.2M–$1.8M/year in fees, roughly $330K–$930K/year more than the same volume would cost at Stripe's 2.9% (ignoring the fixed per-transaction charge).
  • Trust-and-safety ops. Required by GDPR and most consumer-protection regimes — someone has to process opt-out requests, ID verification, takedown notices. 5–15 full-time people. $300K–$2M/year.
  • Customer ops for government contracts. Clearview specifically has dedicated account managers for police-department contracts. Estimated 5–10 FTE. $700K–$1.5M/year.
  • Compliance + privacy counsel. Distinct from defense counsel — these are the people who write the privacy policy, run the data-subject-access-request workflow, and respond to regulators when something less than a lawsuit lands. $500K–$1.5M/year.
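
For the payment-processing item, a short sketch separates gross fees at a backup processor from the increment over Stripe's standard percentage. It assumes the $30M revenue figure from the list and ignores the fixed 30¢ per-transaction charge, since transaction counts are not given.

```python
STRIPE_PCT = 0.029  # Stripe's standard percentage rate quoted in the text


def percentage_fees(revenue_usd: float, pct: float) -> float:
    """Processing fees from the percentage component only."""
    return revenue_usd * pct


revenue = 30_000_000
for backup_pct in (0.04, 0.06):
    gross = percentage_fees(revenue, backup_pct)
    extra = gross - percentage_fees(revenue, STRIPE_PCT)
    print(f"{backup_pct:.0%} processor: ${gross:,.0f} gross, ${extra:,.0f} more than at 2.9%")
```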

The total, and why the math works

Add it up:

PimEyes tier (consumer face search): roughly $12M–$25M/year total run cost.
Clearview tier (gov / enterprise): roughly $30M–$60M/year total run cost.
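
A minimal roll-up of the five numbered cost sections, using the per-tier ranges stated at the end of each section. The items from section 6 are deliberately left out, so the sums below come in a little under the headline totals.

```python
# (low, high) in thousands of USD per year, copied from the section totals.
CORE_COSTS_K = {
    "PimEyes tier": {
        "storage": (1_000, 2_500),
        "gpu": (800, 1_500),
        "crawl": (2_000, 5_000),
        "people": (4_000, 8_000),
        "legal": (3_000, 6_000),
    },
    "Clearview tier": {
        "storage": (3_000, 6_000),
        "gpu": (1_800, 3_000),
        "crawl": (6_000, 10_000),
        "people": (10_000, 15_000),
        "legal": (10_000, 20_000),
    },
}


def total_range(items: dict) -> tuple[int, int]:
    """Sum the low ends and the high ends separately."""
    low = sum(lo for lo, _ in items.values())
    high = sum(hi for _, hi in items.values())
    return low, high


for tier, items in CORE_COSTS_K.items():
    low, high = total_range(items)
    print(f"{tier}: ${low / 1_000:.1f}M-${high / 1_000:.1f}M/year before section 6 items")
```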

Now look at the revenue side. PimEyes is private and doesn't disclose, but a defensible estimate based on their pricing ($14.99–$299.99/mo, ~50K–100K paid subscribers per 404 Media's industry reporting) is annual revenue in the $30M–$80M range. Operating cost of $12M–$25M against revenue of $30M–$80M is a healthy-margin business.

Clearview's reported customer numbers are smaller, but the product is premium-priced. They charge police departments anywhere from $25K–$100K+ per seat-year, with reported customer counts in the 3,000–10,000 range across 28+ countries. Even at the low end (3,000 customers × $25K = $75M ARR), they cover the run cost. They have raised approximately $100M in venture funding through 2023 to absorb the legal bills while the unit economics catch up.

The point isn't to defend any of this. The point is to be honest about why "the fines will stop them" hasn't worked: the fines are roughly 10-30% of operating cost, and operating cost is roughly 30-50% of revenue. The math, today, is in the engines' favor.
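
The ratios behind that claim can be sketched from the ranges already given. Two assumptions are worth flagging: low ends are paired with low ends and high ends with high ends, and the legal line is used as a stand-in for fines even though it also includes defense costs, so it is an upper bound. Other pairings give different ratios.

```python
def pct(part: float, whole: float) -> float:
    """Part as a percentage of whole."""
    return 100 * part / whole


# Low-end Clearview revenue from the text: 3,000 customers at $25K each.
clearview_arr_floor = 3_000 * 25_000
print(f"Clearview ARR floor: ${clearview_arr_floor / 1e6:.0f}M")

# Figures in millions of USD; low paired with low, high with high (assumption).
print(f"PimEyes legal share of cost:   {pct(3, 12):.0f}% to {pct(6, 25):.0f}%")
print(f"Clearview legal share of cost: {pct(10, 30):.0f}% to {pct(20, 60):.0f}%")
print(f"PimEyes cost share of revenue: {pct(12, 30):.0f}% to {pct(25, 80):.0f}%")
print(f"Clearview cost share at ARR floor: {pct(30, 75):.0f}%")
```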

What actually moves the needle

Two things are starting to bite that the dollar-amount fines don't capture:

  • Payment-processor de-platforming. The 2025 PimEyes payment-processor freeze caused 4 weeks of revenue disruption and forced a switch to a higher-fee processor. Several US state AGs are exploring direct pressure on Stripe/Adyen to refuse face-search merchants.
  • Mandatory consumer opt-out enforcement. CNIL's repeated €20M+€5.2M Clearview orders weren't about extracting cash — they were about establishing in EU case law that ignoring an opt-out request is a continuing violation that compounds. That precedent is now being cited in pending cases against the smaller engines.

And the third thing — the unsexy one — is just making the opt-out workflow easy enough that enough people actually do it that the index quality degrades. An engine that has stopped indexing 20% of its addressable faces is meaningfully less useful to the police department or HR vendor that pays for it. That's the part we work on.

The cost of removing your face is roughly $0.0001 per query they can't run on you.

Face Privacy submits opt-out requests to every facial-recognition engine that has a removal path, monthly, including the ones with no consumer-facing UI. Each accepted opt-out removes you from anywhere between a few million and a billion possible face-match queries the engine would otherwise have served. The economics of removing your face are radically better than the economics of the engines that are indexing it.
