
How GleanMark's AI Trademark Search Works: Phonetic, Visual, and Semantic Matching

By GleanMark Team
March 9, 2026

AI trademark search is fundamentally different from the keyword-based tools that trademark professionals have relied on for decades. Rather than requiring users to imagine every phonetic variant, construct dozens of manual queries, and sort through unranked results, modern AI search applies multiple similarity algorithms simultaneously -- phonetic, visual, and semantic -- across a database of nearly 14 million USPTO records.

The DuPont factors formalize likelihood of confusion analysis into thirteen considerations, but at its core, the question is deceptively simple: could a reasonable consumer mistake one mark for another? Answering that question at scale requires the kind of multi-layered similarity analysis that, until recently, only a trained human could perform.

This article explains how AI-powered trademark search works under the hood: the specific algorithms that power phonetic matching, visual similarity detection, and goods-and-services overlap scoring, and why these techniques produce fundamentally different results than traditional keyword search.

Why Keyword Search Fails for Trademarks

Before examining what modern trademark search does, it is worth understanding why the most intuitive approach -- typing a word and looking for exact matches -- misses so many real conflicts.

Trademark law does not require marks to be identical to create a likelihood of confusion. Marks that are similar in sound, appearance, or meaning can conflict if they cover related goods or services. A search for "SPARKLE" that only returns exact matches will miss "SPRKLE," "SPARKEL," "SPARKL," and dozens of other variations that an examining attorney would flag without hesitation.

The traditional approach to this problem -- the one practitioners used for decades with TESS and now use with the USPTO's replacement search tool -- is manual query construction. You think of every possible phonetic variant, every plausible misspelling, every truncation and substitution, and you run a separate search for each one. An experienced attorney searching "FOTOGRAF" might construct queries for "PHOTOGRAPH," "PHOTOGRAF," "FOTOGRAPH," "PHOTOGRAFF," and a dozen other permutations. Miss one, and you miss the conflict.

This approach has three fundamental problems:

  1. It depends on human imagination. You can only find variants you think to search for. The marks you do not imagine are the ones that become Section 2(d) refusals.
  2. It does not scale. A multi-word mark like "BRIGHT HARVEST KITCHEN" requires permuting variants across each word, so the query count multiplies with every added word: three words with ten variants each already means 1,000 combinations.
  3. It produces unranked results. Even when you find similar marks, you have no systematic way to determine which ones pose the greatest risk. Every result looks equally relevant -- or equally irrelevant.
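The scaling problem in the second point can be made concrete with a small sketch. The variant lists below are invented for illustration (a real searcher would brainstorm their own), and `manual_queries` is a hypothetical helper name.

```python
from itertools import product

# Hypothetical hand-brainstormed variants for each word of a three-word mark.
variants = {
    "BRIGHT": ["BRIGHT", "BRITE", "BRYTE"],
    "HARVEST": ["HARVEST", "HARVST", "HARVIST"],
    "KITCHEN": ["KITCHEN", "KITCHN", "KITCHIN", "KYTCHEN"],
}


def manual_queries(variants_by_word):
    """Return every full-mark query a manual searcher would have to run."""
    return [" ".join(combo) for combo in product(*variants_by_word.values())]


queries = manual_queries(variants)
print(len(queries))  # 3 * 3 * 4 = 36 separate searches
```

Even with only three or four variants per word, the manual approach already requires dozens of searches, and each additional word or variant multiplies that number again.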

Modern AI trademark search eliminates all three of these problems by applying multiple similarity algorithms simultaneously and combining their results into a ranked, scored output.

Layer 1: Phonetic Matching with the Metaphone Algorithm

The first and most critical layer of AI trademark search is phonetic matching -- the ability to find marks that sound alike regardless of how they are spelled.

The search engine uses the Metaphone algorithm, a phonetic encoding system that converts words into standardized sound codes. Metaphone applies English pronunciation rules to a word and produces a compact code, built mostly from its consonant sounds, that represents how the word sounds when spoken aloud. Words that sound similar produce identical or nearly identical codes, even when their spellings differ dramatically.

Here is how this works in practice:

| Written Mark | Metaphone Code | What It Catches |
|--------------|----------------|-----------------|
| FOTOGRAF | FTKRF | Matches PHOTOGRAPH, PHOTOGRAF, FOTOGRAPH |
| KLEAR | KLR | Matches CLEAR, CLEER, KLEER |
| NITE | NT | Matches NIGHT, KNIGHT, NYTE |
| FONE | FN | Matches PHONE, PHON, FONNE |
| TUFF | TF | Matches TOUGH, TOUGHE, TUGH |

The power of phonetic matching is that it automates what practitioners previously did by intuition. Instead of manually brainstorming every possible sound-alike spelling, the algorithm reduces each mark to its phonetic essence and compares those essences directly. A single search query for "FOTOGRAF" automatically surfaces "PHOTOGRAPH" and every other phonetically equivalent mark in the database -- without requiring the user to think of the variant.
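The idea of reducing marks to a shared sound code can be sketched with a deliberately simplified encoder. The rule list below is an assumption for illustration and covers only a handful of Metaphone-style rewrites. It is not the full Metaphone algorithm and not GleanMark's implementation, so the keys it produces may differ from real Metaphone output. `RULES` and `phonetic_key` are hypothetical names.

```python
import re

# Assumption: a tiny, illustrative subset of Metaphone-style rewrite rules,
# applied in order. Real Metaphone has many more rules and exceptions.
RULES = [
    (r"^KN", "N"),         # silent initial K: KNIGHT -> NIGHT
    (r"PH", "F"),          # PHONE -> FONE
    (r"GHT", "T"),         # silent GH before T: NIGHT -> NIT
    (r"C(?=[EIY])", "S"),  # soft C
    (r"C", "K"),           # hard C: CLEAR -> KLEAR
    (r"G(?=[EIY])", "J"),  # soft G
    (r"G", "K"),           # hard G
    (r"Q", "K"),
    (r"Z", "S"),
]


def phonetic_key(mark: str) -> str:
    """Reduce a mark to a rough consonant-based sound key."""
    word = re.sub(r"[^A-Z]", "", mark.upper())
    for pattern, replacement in RULES:
        word = re.sub(pattern, replacement, word)
    if not word:
        return ""
    # Keep the first letter, then drop vowels and the weak letters H, W, Y.
    key = word[0] + re.sub(r"[AEIOUHWY]", "", word[1:])
    # Collapse doubled letters so FONNE and FONE share a key.
    return re.sub(r"(.)\1+", r"\1", key)


print(phonetic_key("FOTOGRAF") == phonetic_key("PHOTOGRAPH"))  # True
print(phonetic_key("NITE") == phonetic_key("KNIGHT"))          # True
```

Because comparison happens on the keys rather than the spellings, one lookup by key stands in for every sound-alike spelling that maps to it.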

This matters because phonetic similarity is one of the most heavily weighted factors in likelihood of confusion analysis. The USPTO's Trademark Manual of Examining Procedure (TMEP Section 1207.01(b)(iv)) explicitly states that marks may be found confusingly similar based on sound alone, even when the spelling and appearance differ significantly. An AI search that captures phonetic similarity is not a convenience feature. It is matching how the law actually evaluates trademark conflicts.

Layer 2: Trigram Similarity for Visual and Spelling Variants

Phonetic matching catches marks that sound alike. But trademark conflicts can also arise from visual similarity -- marks that look alike on paper or screen, even when they do not sound identical.

Trigram similarity addresses this by comparing marks at the character level. A trigram is a sequence of three consecutive characters. The word "SPARKLE" contains the trigrams: "SPA," "PAR," "ARK," "RKL," "KLE." To measure similarity between two marks, the algorithm counts how many trigrams they share relative to the total trigrams in both marks, producing a score between 0 (completely different) and 1 (identical).

This technique catches a category of conflicts that phonetic matching alone would miss:

| Mark A | Mark B | Trigram Similarity | Why It Matters |
|--------|--------|--------------------|----------------|
| SPARKLE | SPRKLE | 0.73 | Dropped vowel -- visually similar |
| AMAZON | AMAZONE | 0.82 | Trailing letter -- easy to confuse in print |
| NETFLIX | NETFLIXX | 0.85 | Double letter -- nearly identical appearance |
| BRIGHT | BRITE | 0.50 | Phonetically identical, visually distinct |
| GOOGLE | GOOOGLE | 0.77 | Extra character -- visually deceptive |

Trigram similarity is particularly effective at catching the kind of typographical variations that trademark squatters exploit: minor additions, deletions, or substitutions of characters that make a mark look almost identical to an existing one while being technically "different" in exact-match search systems.
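A minimal sketch of trigram comparison, assuming the simplest possible variant: unpadded character trigrams scored with Jaccard similarity (shared trigrams divided by all distinct trigrams). Production implementations often pad word boundaries or normalize differently, so the scores from this sketch will not necessarily match the illustrative figures in the table above. The function names are hypothetical.

```python
def trigrams(mark: str) -> set:
    """Return the set of three-character sequences in a mark."""
    text = mark.upper()
    return {text[i:i + 3] for i in range(len(text) - 2)}


def trigram_similarity(a: str, b: str) -> float:
    """Jaccard similarity of two marks' trigram sets, from 0.0 to 1.0."""
    ta, tb = trigrams(a), trigrams(b)
    if not ta and not tb:
        # Marks shorter than three characters have no trigrams at all.
        return 1.0 if a.upper() == b.upper() else 0.0
    return len(ta & tb) / len(ta | tb)


print(sorted(trigrams("SPARKLE")))              # ['ARK', 'KLE', 'PAR', 'RKL', 'SPA']
print(trigram_similarity("AMAZON", "AMAZONE"))  # 0.8 (4 shared of 5 distinct)
```

The fallback for very short marks hints at why short queries need separate handling: a two-character mark produces no trigrams to compare.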

The combination of phonetic matching and trigram similarity creates a two-dimensional detection system. Marks that sound alike are caught by Metaphone. Marks that look alike are caught by trigram analysis. Marks that do both -- which represent the highest-risk conflicts -- score highly on both dimensions.

Layer 3: Word-Level Analysis and First-Word Weighting

Single-word marks are straightforward to compare. Multi-word marks introduce a layer of complexity that neither phonetic codes nor trigram scores fully address on their own.

Consider a search for "HARVEST MOON BAKERY." The relevant conflicts might include "HARVEST MOON KITCHEN," "MOON HARVEST CO," "HARVEST BAKERY," or even just "HARVEST MOON" by itself. A single similarity score comparing the full strings would underweight matches that share critical words but differ in less important ones.

The search addresses this by decomposing multi-word marks into their component words and scoring each component independently. The individual word scores are then combined with weighting that reflects how trademark law actually evaluates multi-word marks.

The most significant weighting factor is first-word emphasis. In trademark practice, the first word of a multi-word mark carries disproportionate weight in likelihood of confusion analysis. The reason is practical: consumers tend to recall and refer to brands by their first word. "HARVEST MOON BAKERY" becomes "Harvest Moon" or just "Harvest" in casual conversation. The TMEP and decades of TTAB case law support this principle.

The scoring reflects this reality:

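| Query | Result Mark | First Word Match | Overall Score |
|-------|-------------|------------------|---------------|
| HARVEST MOON | HARVEST KITCHEN | Yes (HARVEST) | High |
| HARVEST MOON | MOON HARVEST | No (reordered) | Medium |
| HARVEST MOON | AUTUMN MOON | No (different first word) | Medium-Low |
| HARVEST MOON | MOON LIGHT | No | Lower |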

This word-level decomposition also handles articles and common modifiers intelligently. Words like "THE," "A," and "AN" at the beginning of a mark are stripped before comparison, because trademark law generally disregards these in confusion analysis. "THE HARVEST" is evaluated the same as "HARVEST" -- which is exactly how an examining attorney would approach it.
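One way to sketch word-level scoring with first-word emphasis and article stripping is shown below. Everything numeric here is an assumption for illustration: the first-word weight, the penalty for a first word that appears out of position, and the use of Python's `difflib.SequenceMatcher` as a stand-in for the phonetic and trigram word comparisons. The helper names are hypothetical.

```python
from difflib import SequenceMatcher

ARTICLES = {"THE", "A", "AN"}
FIRST_WORD_WEIGHT = 2.0       # assumed: first word counts double
OUT_OF_POSITION_FACTOR = 0.5  # assumed: half credit if it appears later


def split_mark(mark: str) -> list:
    """Upper-case, split into words, and strip leading articles."""
    words = mark.upper().split()
    while len(words) > 1 and words[0] in ARTICLES:
        words = words[1:]
    return words


def word_similarity(a: str, b: str) -> float:
    """Stand-in word comparison; a real system would use its own layers."""
    return SequenceMatcher(None, a, b).ratio()


def multiword_score(query: str, candidate: str) -> float:
    q_words, c_words = split_mark(query), split_mark(candidate)
    if not q_words or not c_words:
        return 0.0
    # First query word: full credit in first position, reduced credit elsewhere.
    in_position = word_similarity(q_words[0], c_words[0])
    elsewhere = max(
        (word_similarity(q_words[0], w) for w in c_words[1:]), default=0.0
    )
    first = max(in_position, OUT_OF_POSITION_FACTOR * elsewhere)
    weighted = [(FIRST_WORD_WEIGHT, first)]
    # Remaining query words: best match anywhere in the candidate.
    for word in q_words[1:]:
        weighted.append((1.0, max(word_similarity(word, w) for w in c_words)))
    return sum(w * s for w, s in weighted) / sum(w for w, _ in weighted)


for candidate in ["HARVEST KITCHEN", "MOON HARVEST", "AUTUMN MOON", "MOON LIGHT"]:
    print(candidate, round(multiword_score("HARVEST MOON", candidate), 2))
```

Under these assumed weights the four candidates come out in the same relative order as the table above, and "THE HARVEST" scores identically to "HARVEST" because the leading article is removed before comparison.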

Layer 4: Goods and Services Overlap Scoring

Two identical marks can coexist peacefully if they cover completely unrelated goods and services. "DELTA" is simultaneously a registered trademark for an airline, a faucet manufacturer, and a dental insurance company. The marks do not conflict because no reasonable consumer would confuse an airline ticket with a kitchen faucet.

This is why trademark search cannot stop at mark-to-mark similarity. The goods and services dimension is equally important, and it is the dimension that most free trademark search tools ignore entirely.

The search evaluates goods and services overlap in two ways:

Nice Classification overlap. The international Nice Classification system divides all goods and services into 45 classes. Marks in the same class face heightened scrutiny for confusion. Marks in related classes -- such as Class 25 (clothing) and Class 35 (retail store services for clothing) -- also receive elevated conflict scores. The search engine understands these class relationships and factors them into the ranking.

Description similarity. Beyond class numbers, the actual text of the goods and services description matters. Two marks might both be in Class 9 (electronics), but one covers "downloadable mobile applications for fitness tracking" while the other covers "computer hardware for industrial manufacturing." The description-level comparison catches this distinction and adjusts the conflict score accordingly.

The result is a ranking that reflects what trademark attorneys call the "relatedness of goods" factor -- one of the most important DuPont factors in likelihood of confusion analysis. A phonetically identical mark in an unrelated field drops in the rankings, while a moderately similar mark in an overlapping goods category rises.
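The two-part overlap check can be sketched as follows. This is an illustration under stated assumptions: the related-class table contains only the Class 25 / Class 35 pairing mentioned above, descriptions are compared with simple word-set overlap rather than whatever text similarity the real system uses, and the 50/50 weighting is invented. All names are hypothetical.

```python
import re

# Assumption: a curated table of related Nice classes; only one pair shown.
RELATED_CLASSES = {frozenset({25, 35})}
STOPWORDS = {"FOR", "AND", "OF", "THE", "IN", "A"}


def class_score(classes_a: set, classes_b: set) -> float:
    """1.0 for a shared class, 0.5 for related classes, otherwise 0.0."""
    if classes_a & classes_b:
        return 1.0
    related = any(
        frozenset({a, b}) in RELATED_CLASSES for a in classes_a for b in classes_b
    )
    return 0.5 if related else 0.0


def description_score(desc_a: str, desc_b: str) -> float:
    """Word-set (Jaccard) overlap between two goods/services descriptions."""
    tokens_a = set(re.findall(r"[A-Z]+", desc_a.upper())) - STOPWORDS
    tokens_b = set(re.findall(r"[A-Z]+", desc_b.upper())) - STOPWORDS
    if not tokens_a or not tokens_b:
        return 0.0
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


def goods_overlap(classes_a, desc_a, classes_b, desc_b) -> float:
    # Assumed equal weighting of class overlap and description similarity.
    return 0.5 * class_score(classes_a, classes_b) + 0.5 * description_score(
        desc_a, desc_b
    )


fitness = "downloadable mobile applications for fitness tracking"
hardware = "computer hardware for industrial manufacturing"
print(goods_overlap({9}, fitness, {9}, hardware))  # 0.5: same class, no shared terms
```

In this sketch the two Class 9 marks receive credit for the shared class but none for their descriptions, which is the distinction the description-level comparison is meant to capture.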

How Multiple Scoring Layers Combine

Each of these four layers -- phonetic matching, trigram similarity, word-level analysis, and goods/services overlap -- produces an independent signal. The real power of AI trademark search is in how these signals are combined into a single relevance ranking.
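As a rough illustration of combining the four signals, the sketch below uses a plain weighted sum. The weights and the example scores are assumptions chosen for demonstration; the article does not disclose how GleanMark actually weights or combines its layers.

```python
from dataclasses import dataclass


@dataclass
class LayerScores:
    phonetic: float
    trigram: float
    word_level: float
    goods: float


# Assumed weights that sum to 1.0; purely illustrative.
WEIGHTS = {"phonetic": 0.35, "trigram": 0.25, "word_level": 0.15, "goods": 0.25}


def composite(scores: LayerScores) -> float:
    """Weighted sum of the four layer scores."""
    return sum(weight * getattr(scores, name) for name, weight in WEIGHTS.items())


def rank(candidates: dict) -> list:
    """Return candidate labels ordered from highest to lowest composite score."""
    return sorted(candidates, key=lambda label: composite(candidates[label]), reverse=True)


candidates = {
    "identical mark, unrelated goods": LayerScores(1.0, 0.9, 1.0, 0.0),
    "similar mark, overlapping goods": LayerScores(0.7, 0.6, 0.7, 1.0),
}
print(rank(candidates))
# ['similar mark, overlapping goods', 'identical mark, unrelated goods']
```

With these assumed numbers, the moderately similar mark in an overlapping goods category edges ahead of the near-identical mark in an unrelated field (0.75 versus 0.725).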

Rather than relying on a single algorithm, the search uses multiple scoring functions optimized for different query types:

| Query Type | Optimization Strategy | Example |
|------------|-----------------------|---------|
| Short marks (1-3 characters) | Fast pattern matching, exact and near-exact hits | "IBM," "AT&T," "UPS" |
| Single-word marks | Full phonetic + trigram + goods overlap | "SPARKLE," "FOTOGRAF" |
| Multi-word marks | Word decomposition, first-word weighting, article stripping | "THE LION KING," "BRIGHT HARVEST" |

This adaptive approach matters because the challenges of trademark search vary dramatically by query type. A three-character mark like "IBM" has thousands of potential trigram matches that would overwhelm a general-purpose algorithm. A multi-word mark like "THE LION KING" needs article stripping and word-level analysis that would add unnecessary overhead for a single-word query. By routing each query through the algorithm best suited to its structure, the search produces faster, more relevant results across all query types.
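Query routing of this kind can be sketched as a small classifier plus a lookup table. The thresholds and strategy labels below are assumptions that mirror the three query types described above, with punctuation ignored when counting characters so that "AT&T" counts as three. `classify_query`, `STRATEGIES`, and `route` are hypothetical names.

```python
# Assumed mapping from query type to the processing steps it receives.
STRATEGIES = {
    "short": ["exact", "near_exact"],
    "single_word": ["phonetic", "trigram", "goods_overlap"],
    "multi_word": ["article_stripping", "word_decomposition", "first_word_weighting"],
}


def classify_query(mark: str) -> str:
    """Classify a query as 'short', 'single_word', or 'multi_word'."""
    if len(mark.split()) > 1:
        return "multi_word"
    # Count only letters and digits, so punctuation does not add length.
    compact = [ch for ch in mark if ch.isalnum()]
    return "short" if len(compact) <= 3 else "single_word"


def route(mark: str) -> list:
    """Return the list of processing steps assumed for this query type."""
    return STRATEGIES[classify_query(mark)]


for mark in ["IBM", "AT&T", "SPARKLE", "THE LION KING"]:
    print(mark, "->", classify_query(mark))
```

Keeping the routing decision separate from the scoring functions means each strategy can be tuned or benchmarked on its own query type.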

How Search Quality Is Measured

Building a search system is one thing. Knowing whether it actually works is another.

GleanMark measures search quality by benchmarking against actual USPTO Section 2(d) citation pairs -- a rigorous, data-driven approach that most search tools do not employ.

Here is what that means. When a USPTO examining attorney reviews a trademark application and finds an existing registration that creates a likelihood of confusion, they issue a Section 2(d) refusal citing the conflicting mark. These citation pairs -- the applied-for mark and the cited conflicting mark -- represent ground-truth determinations by trained government examiners that two marks are confusingly similar.

By testing against hundreds of these citation pairs, the system measures objective recall: what percentage of the time does the search engine surface the same conflict that the USPTO examiner identified? If an examiner cited "PHOTOGRAF" as confusingly similar to "PHOTOGRAPH," does a search for "PHOTOGRAPH" return "PHOTOGRAF" in its results? If so, at what rank?
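The two questions above, whether the cited mark is found and at what rank, correspond to standard retrieval metrics. The sketch below computes recall at a cutoff and mean reciprocal rank over citation pairs. The pairs and the canned search results are invented toy data, and `toy_search` stands in for a real search engine; none of this reflects GleanMark's actual benchmark set.

```python
def recall_at_k(citation_pairs, search, k=10):
    """Fraction of pairs whose cited mark appears in the top-k results."""
    if not citation_pairs:
        return 0.0
    hits = sum(1 for applied, cited in citation_pairs if cited in search(applied)[:k])
    return hits / len(citation_pairs)


def mean_reciprocal_rank(citation_pairs, search):
    """Average of 1/rank of the cited mark (0 when it is not returned)."""
    if not citation_pairs:
        return 0.0
    total = 0.0
    for applied, cited in citation_pairs:
        results = search(applied)
        if cited in results:
            total += 1.0 / (results.index(cited) + 1)
    return total / len(citation_pairs)


# Toy data: (applied-for mark, cited registration) and canned ranked results.
pairs = [("KLEAR", "CLEAR"), ("NITE", "NIGHT"), ("TUFF", "TOUGH")]
canned = {
    "KLEAR": ["CLEAR", "KLEER"],
    "NITE": ["KNIGHT", "NYTE", "NIGHT"],
    "TUFF": ["TUFT", "TUFA"],
}


def toy_search(mark):
    return canned.get(mark, [])


print(recall_at_k(pairs, toy_search, k=1))  # one of three pairs found at rank 1
print(recall_at_k(pairs, toy_search, k=3))  # two of three pairs found in the top 3
```

Reporting recall at several cutoffs alongside a rank-sensitive metric shows both whether conflicts are found and how far down the list a reviewer has to read to see them.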

This methodology provides a level of empirical validation that is rare in trademark search tools. Rather than relying on anecdotal testing or subjective quality assessments, the system is measured against the decisions of the very examiners whose judgments determine whether a trademark application succeeds or fails.

How This Compares to the USPTO's Search Tool

The USPTO's current search system -- the cloud-based tool that replaced TESS in late 2023 -- is a significant improvement over its predecessor. It offers field-tag searches, regular expression support, Coordinated Class searching, and a more modern interface.

But the fundamental architecture of the government search is keyword-based. It finds what you tell it to find. The burden of constructing the right queries, imagining the right phonetic variants, and manually evaluating the relatedness of goods and services falls entirely on the user.

| Capability | USPTO Search Tool | AI-Powered Search |
|------------|-------------------|-------------------|
| Database coverage | Nearly 14M federal records | Nearly 14M federal records |
| Phonetic matching | Manual (user constructs variants) | Automatic (Metaphone algorithm) |
| Visual similarity | Manual (user constructs patterns) | Automatic (trigram analysis) |
| Multi-word handling | Manual (user searches each word) | Automatic (word decomposition + weighting) |
| Goods/services overlap | Filter by class (no scoring) | Scored by class + description similarity |
| Result ranking | Unranked list | Ranked by composite similarity score |
| Query construction | Complex field tags and regex | Single natural-language query |
| Section 2(d) validation | N/A | Benchmarked against USPTO citation pairs |

This is not a criticism of the USPTO's system. The government tool is designed as a record retrieval system -- it helps you look up trademarks. AI-powered search is designed as a conflict detection system -- it helps you find the marks most likely to create legal problems. These are different goals, and they call for different architectures.

For a more detailed comparison of the two systems, see the full TESS comparison guide.

What AI Search Cannot Do

Intellectual honesty requires acknowledging the boundaries of what algorithmic search can and cannot accomplish.

AI trademark search excels at detecting textual similarity: marks that sound alike, look alike, or share structural components. It excels at identifying goods and services overlap. It excels at ranking results by composite risk.

It does not replace legal judgment. The likelihood of confusion analysis involves factors that no algorithm fully captures: the strength of the cited mark, the sophistication of the relevant consumers, the conditions under which purchases are made, the fame of the senior mark. These require contextual reasoning that remains the domain of experienced trademark attorneys.

AI search also has inherent limitations with conceptual similarity. "DAKSHIN" (meaning "south" in Hindi) and "SOUTH SPICE" might be conceptually related, but no phonetic or trigram algorithm will detect that relationship. Conceptual equivalents, foreign-language translations, and marks with meaning-based connections require human analysis that textual algorithms cannot replicate.

The purpose of AI search is not to eliminate the attorney from the analysis. It is to eliminate the hours of manual query construction that precede the analysis, so attorneys can spend their time on the judgment calls that actually require legal expertise.

Putting It Into Practice

Understanding how the search works is useful. Using it effectively is what matters.

For practitioners conducting trademark clearance searches, the multi-layered approach means a single query does the work that previously required dozens. Instead of brainstorming phonetic variants of "FOTOGRAF" and running separate searches for each one, you enter the mark once. The phonetic, visual, and goods/services scoring layers run simultaneously, and the results come back ranked by composite similarity -- highest-risk conflicts first.

For founders and brand managers who may be searching trademarks for the first time, the same technology works without requiring any knowledge of search operators, field tags, or trademark classification systems. The AI handles the complexity; the user sees a clean, ranked list of potential conflicts with similarity scores that make the risk level immediately legible.

For attorneys building a comprehensive search strategy, AI search does not replace the full clearance process -- it accelerates the most time-consuming part of it. The hours previously spent constructing manual queries and sorting through unranked results can now be spent on the substantive legal analysis that clients actually value: evaluating the strength of potential conflicts, assessing the relatedness of goods, and advising on the strategic path forward.

The technology behind AI trademark search is not simple. Phonetic algorithms, trigram analysis, word-level decomposition, goods-and-services scoring, and adaptive query routing each solve a specific piece of the trademark similarity puzzle. But for the practitioner sitting at their desk with a new client's proposed mark, the experience is simple: type the mark, review the ranked results, and start the analysis where it matters -- at the top of the list.


Ready to see it in action? Try a free trademark search and see what the algorithms find.
