
How Face Search Engines Actually Get Your Photos

You never uploaded yourself to PimEyes. So how does it have you? Face-search engines build their index by scraping the open web at scale and converting every face they find into a searchable mathematical fingerprint. Once you understand the pipeline, two things become obvious: why deleting one post doesn't work, and why removal has to be a standing process.

A face-search engine is not searching the live internet when someone uploads a photo of you. It's searching its own pre-built database — a private copy of faces it harvested ahead of time and indexed for instant lookup. The whole business depends on having already collected you before anyone searches. Here's how that collection actually happens, stage by stage.

Stage 1 — Crawling: where the images come from

Engines run web crawlers that pull images from the parts of the internet that are publicly reachable. The richest sources, roughly in order of value to them:

  • Public social media. Open profiles, public posts, and anything not locked down. Profile photos are gold — frontal, well-lit, labeled with a name.
  • Other people's posts of you. Tagged photos, group shots, event galleries. You don't control these, which is why a private account doesn't make you safe.
  • Professional and institutional pages. Company "team" pages, conference speaker lists, university directories, bar-association and licensing profiles, news articles. Your LinkedIn headshot is likely among the most-scraped photos of you.
  • News, blogs, and forums. Any site that published a photo with your face — a local paper, a race-results page, a hobby forum avatar.
  • Aggregated and resold image sets. Some engines bootstrap from large existing image datasets rather than crawling everything from scratch.

Crucially, the crawler doesn't need your permission and doesn't ask. If an image is reachable by a public URL, it's collectible.
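
To make "reachable by a public URL" concrete, here is a minimal sketch of the image-discovery step, using only Python's standard library. The class name `ImageLinkCollector`, the sample HTML, and the example.com URLs are illustrative assumptions; this is not any particular engine's code, and a real crawler also handles fetching, politeness rules, and deduplication.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ImageLinkCollector(HTMLParser):
    """Collects absolute image URLs from one page's HTML (hypothetical helper)."""

    def __init__(self, page_url):
        super().__init__()
        self.page_url = page_url
        self.image_urls = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        src = dict(attrs).get("src")
        if src:
            # Resolve relative paths against the page the image appeared on.
            self.image_urls.append(urljoin(self.page_url, src))


def extract_image_urls(page_url, html):
    collector = ImageLinkCollector(page_url)
    collector.feed(html)
    return collector.image_urls


# Made-up page content standing in for a public "team" page.
sample_html = """
<html><body>
  <img src="/team/jane.jpg" alt="Jane Doe, Engineering">
  <img src="https://cdn.example.com/events/gala-042.jpg">
  <p>No image here.</p>
</body></html>
"""

found = extract_image_urls("https://example.com/about", sample_html)
for url in found:
    print(url)
```

Nothing in this step checks who is in the picture or whether they agreed to it. The only test is whether the page and its image links can be read.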

Stage 2 — Face detection: finding the faces in the images

A crawled image is just pixels. The engine runs a face-detection model over every image to locate the faces in it — drawing a box around each one and cropping it out. A single group photo can yield a dozen separate face crops, each treated as its own candidate. This is the step that turns "a photo that happens to contain you" into "a record about your face specifically."
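
The "one photo, many records" effect can be sketched as follows. The detection model itself is stubbed out: `detected_boxes` is a hard-coded stand-in for what a real detector would return, and `FaceCrop` and `crop_faces` are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class FaceCrop:
    source_url: str  # where the original image was found
    box: tuple       # (top, left, bottom, right) in pixel coordinates
    pixels: list     # cropped rows of pixel values


def crop_faces(source_url, image, boxes):
    """Cut one crop per detected box out of a row-major pixel grid."""
    crops = []
    for top, left, bottom, right in boxes:
        region = [row[left:right] for row in image[top:bottom]]
        crops.append(FaceCrop(source_url, (top, left, bottom, right), region))
    return crops


# A tiny 4x6 grid of grayscale values, standing in for a real photo.
image = [
    [0, 1, 2, 3, 4, 5],
    [10, 11, 12, 13, 14, 15],
    [20, 21, 22, 23, 24, 25],
    [30, 31, 32, 33, 34, 35],
]

# Assumption: a detector returned two face boxes for this group photo.
detected_boxes = [(0, 0, 2, 2), (1, 3, 4, 6)]

crops = crop_faces("https://example.com/gallery/photo1.jpg", image, detected_boxes)
print(len(crops))  # one record per detected face
```

Each crop keeps a pointer back to its source URL, which is what later lets a match lead to the page the face came from.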

Stage 3 — Embedding: turning your face into a faceprint

This is the heart of it. Each cropped face is fed through a neural network that outputs a faceprint — a vector of numbers (a "face embedding") that encodes the geometry of your face. Two photos of the same person produce two faceprints that sit very close together in that mathematical space; two different people sit far apart.

Why this is the part that matters: the faceprint, not the photo, is what makes you searchable. It's compact, it's durable (adult facial geometry changes slowly), and it lets the engine match you against a photo it has never seen, including one you never posted. Once your faceprint exists in their database, deleting the original image does little: the searchable record already lives in their system.
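
"Close together" and "far apart" can be made concrete with a similarity score. The sketch below uses cosine similarity, one common way to compare embeddings. The four-number vectors are made up for illustration (real faceprints typically have hundreds of dimensions and come from a trained network), and the 0.8 threshold is an assumption, not a value any engine is known to use.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: near 1.0 means very similar."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical faceprints: two photos of one person, one photo of another.
person_a_photo_1 = [0.9, 0.1, 0.3, 0.2]
person_a_photo_2 = [0.8, 0.2, 0.35, 0.15]
person_b_photo = [0.1, 0.9, 0.2, 0.7]

same = cosine_similarity(person_a_photo_1, person_a_photo_2)
different = cosine_similarity(person_a_photo_1, person_b_photo)

MATCH_THRESHOLD = 0.8  # illustrative cut-off only

print(f"same person:      {same:.2f} match={same >= MATCH_THRESHOLD}")
print(f"different people: {different:.2f} match={different >= MATCH_THRESHOLD}")
```

Note that the comparison never touches the original images. Once the vectors exist, the photos are no longer needed for matching.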

Stage 4 — Indexing: making billions of faceprints instantly searchable

Storing faceprints isn't enough; the engine has to find matches in milliseconds across potentially billions of records. It builds a specialized index (a vector / nearest-neighbor index) so that when someone uploads a query photo, the engine can compute that photo's faceprint and near-instantly return the stored faces closest to it — along with the source URLs where those faces were found.
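
As a rough illustration of what the index does, the sketch below stores faceprints alongside their source URLs and returns the closest matches for a query. It scans every record by brute force, which is a deliberate simplification: at real scale an engine would need an approximate nearest-neighbor structure instead. `FaceIndex`, the vectors, and the URLs are all hypothetical.

```python
import heapq
import math


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


class FaceIndex:
    """Toy in-memory index: brute-force scan, not a real ANN structure."""

    def __init__(self):
        self._records = []  # list of (faceprint, source_url) pairs

    def add(self, faceprint, source_url):
        self._records.append((faceprint, source_url))

    def search(self, query, k=2):
        # Score every stored faceprint against the query, keep the top k.
        scored = (
            (cosine_similarity(query, faceprint), url)
            for faceprint, url in self._records
        )
        return heapq.nlargest(k, scored)


index = FaceIndex()
index.add([0.9, 0.1, 0.3, 0.2], "https://example.com/team")
index.add([0.8, 0.2, 0.35, 0.15], "https://example.org/conference/speakers")
index.add([0.1, 0.9, 0.2, 0.7], "https://example.net/race-results")

# Faceprint computed from a newly uploaded photo (made-up numbers).
query = [0.85, 0.15, 0.3, 0.2]
results = index.search(query, k=2)
for score, url in results:
    print(f"{score:.3f}  {url}")
```

The query photo is not in the index, yet the two records closest to it come back with their source pages attached. That pairing of vector and URL is the whole product.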

Stage 5 — The lookup: a photo becomes your identity

When someone runs a search, the engine: (1) detects the face in their uploaded photo, (2) embeds it into a faceprint, (3) finds the nearest stored faceprints, and (4) returns the matches plus the web pages they came from. Those source pages are what carry your name, your accounts, and your context. The face is the key; the source URLs are the payoff. This is the de-anonymization chain in action.

Why "just delete the photo" is whack-a-mole

Now the pipeline makes the futility obvious:

  • They already have a copy and a faceprint. Deleting your source post doesn't reach into their database. The faceprint they extracted persists independently, which is why deleted photos can still show up in results.
  • You don't control most of the inputs. Even a perfectly scrubbed personal presence leaves the tagged photos, the event galleries, and the institutional pages other people posted.
  • They recrawl continuously. The web keeps producing new images of you, and the crawlers keep coming back. A successful removal today can be undone by a freshly scraped photo next month.

That last point is the one people miss. Because collection is continuous, removal can't be a one-time act — it has to be a standing process that re-checks and re-files whenever you reappear.
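
The re-check logic behind a standing process can be sketched in a few lines. This is an illustrative outline only, not a description of any real service's system: `plan_requests` and the engine labels are hypothetical, and the inputs are assumed to come from some periodic scan that reports which engines currently return a match.

```python
def plan_requests(previously_removed, current_hits):
    """Split engines that currently match into re-filings and first-time filings."""
    removed = set(previously_removed)
    hits = set(current_hits)
    return {
        "refile": sorted(hits & removed),  # reappeared after an earlier removal
        "new": sorted(hits - removed),     # never filed with this engine before
    }


# Hypothetical state: removals were confirmed at two engines last month,
# and this month's scan finds matches at two engines.
plan = plan_requests(
    previously_removed=["engine_a", "engine_b"],
    current_hits=["engine_b", "engine_c"],
)
print(plan)  # {'refile': ['engine_b'], 'new': ['engine_c']}
```

Run once, this is a snapshot. Run on a schedule, it is what catches the reappearance that a recrawl causes.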

What removal actually targets

Given the pipeline, getting "out" means getting your faceprint deleted from each engine's database — not just hiding a photo. Each engine has an opt-out / erasure process (usually requiring a reference photo so they can scope the deletion to your faces). We identify which engines hold you, file those requests, fight the rejections, and keep monitoring so the next recrawl doesn't quietly put you back.

Scope, stated plainly: this removes your faceprint from the facial-recognition engines that can identify you from a photo. It's a different thing from getting a page out of Google's index, or making a website delete an image — those are deindexing and content-removal problems with their own tools. The faceprint removal is the one that addresses the "any photo → your identity" capability at its source.

The searchable thing isn't your photo — it's the faceprint they built from it.

We find the engines that hold a faceprint of you, file the erasure requests, handle rejections, and keep monitoring as the crawlers come back around. That's what the pipeline requires.
