DOCUMENTATION

Data Guide

Understand where Stratensight data comes from, how scores are computed, and what the system can and cannot do.

Supported data sources

Stratensight automatically detects your export format and maps columns.

Derwent Innovation

✅ Full support · High confidence

All 4 scores available with strong citation data

PatSnap

✅ Full support · High confidence

All 4 scores available

Questel Orbit Intelligence

✅ Full support · High confidence

Full 4 scores with FAMPAT family dedup. Covers 100+ countries worldwide.

PatentSight

✅ Full support · High confidence

Professional export with all 4 scores and family dedup

Espacenet / EPO

✅ Full support · High confidence

Official EPO database, recommended starting point

Google Patents

✅ Full support · High confidence

Free access, excellent for broad technology coverage

TotalPatent One

✅ Full support · High confidence

Standardized Assignee + Family ID + Application Date

Generic CSV

⚡ Basic support · Variable confidence

Any CSV with patent data — auto-detection maps columns

Required fields

FIELD NAME	DESCRIPTION
patent_id / Publication Number	Unique identifier for each patent
title	Patent title (used for clustering)
abstract	Abstract (for AI concept extraction)
filing_date	Filing date (for Momentum Index™ calculation)
assignee / Current Assignee	Patent owner (for Openness Score™ competitive analysis)
cpc_codes / CPC Classifications	Technology classification, i.e. patent technology categories (for Lifecycle Position™)

Auto-detection: Stratensight automatically detects your export format and maps columns. No manual configuration required.
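The column mapping described above can be pictured with a small sketch. This is not Stratensight's Source Detection Engine: the alias table is hypothetical and contains only the header variants listed in the Required fields table, and `normalize` and `map_columns` are illustrative names.

```python
import re

# Hypothetical alias table: only the header variants shown in the
# Required fields table are taken from this guide.
ALIASES = {
    "patent_id": {"patent_id", "publication_number"},
    "title": {"title"},
    "abstract": {"abstract"},
    "filing_date": {"filing_date"},
    "assignee": {"assignee", "current_assignee"},
    "cpc_codes": {"cpc_codes", "cpc_classifications"},
}


def normalize(header):
    """Lowercase a header and collapse non-alphanumeric runs to underscores."""
    return re.sub(r"[^a-z0-9]+", "_", header.strip().lower()).strip("_")


def map_columns(headers):
    """Return (canonical field -> source header, list of missing fields)."""
    mapping = {}
    for header in headers:
        key = normalize(header)
        for canonical, variants in ALIASES.items():
            if key in variants and canonical not in mapping:
                mapping[canonical] = header
    missing = [name for name in ALIASES if name not in mapping]
    return mapping, missing
```

Calling `map_columns` on the header row of an export returns both the mapping and the required fields that could not be found, which is the information a contextual warning would need.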

Optional fields — improve score accuracy

Including these fields improves Intelligence Grade™ accuracy.

FIELD	EFFECT	BENEFIT
forward_citations	Improves Momentum Index™	Stronger signal
family_id	Enables patent family deduplication	Reduces noise
priority_date	Improves Lifecycle Position™ calculation	More precise staging
inventors	Enables inventor network analysis	Network signals
ipc_codes	Secondary classification support	Broader coverage
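To illustrate why family_id reduces noise, here is a minimal deduplication sketch. It assumes each record is a dict with an ISO-formatted `filing_date`, and it keeps the earliest filing per family; which family member Stratensight actually keeps is not stated in this guide, so that choice is an assumption.

```python
def dedupe_by_family(records):
    """Keep one record per family_id (assumption: the earliest filing).

    Records without a family_id cannot be grouped and are kept unchanged.
    """
    earliest = {}
    ungrouped = []
    for record in records:
        family = record.get("family_id")
        if not family:
            ungrouped.append(record)
            continue
        current = earliest.get(family)
        # ISO dates (YYYY-MM-DD) sort correctly as plain strings.
        if current is None or record["filing_date"] < current["filing_date"]:
            earliest[family] = record
    return list(earliest.values()) + ungrouped
```

Without this step, one invention filed in several jurisdictions is counted several times, which inflates volume-based signals.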

Self-audit — Signal Integrity™

Every analysis audits itself before it is presented. The Critical Reader™ layer surfaces mathematical inconsistencies, data-quality artifacts and scoring contradictions on every plan, with no gating.

9 DETERMINISTIC RULES

CE1 — Momentum vs YoY divergence
CE2 — Verdict vs AND-logic criteria
CE3 — White Space saturation impossibility
CE4 — CPC diversity contradiction
CE5 — Temporal span vs phase consistency
S1 — Source coverage anomaly
M_CAGR_LAST_YEAR_ARTIFACT — CAGR distortion from last-year filing lag
L_ACADEMIC_DOMINANCE — Lifecycle bias from academic fallback
D_ABSTRACT_FILL_CRITICAL — Abstract fill rate below clustering threshold

severity = critical

Mathematical or pipeline inconsistency. Verdict shown but should be reviewed before action.

severity = warning

Data-quality concern that may bias the signal.

severity = info

Legitimate downgrade by a Layer B guard. Surfaces the constraint, never blocks.

Why this layer exists: a verdict you cannot audit is a verdict you cannot trust. Stratensight shows the audit before the verdict, not after — on every analysis, every plan, with no gating. Read the full mechanics in the methodology page.

Layer C — Tier gate (Phase 5.3): adds a final coherence check on top of Layer B. The verdict shown to users is subordinated to evidence_certainty. With HIGH certainty, the raw verdict is preserved. With MODERATE or LOW certainty, the verdict is mapped to a directional signal (e.g. OPPORTUNITY_SIGNAL instead of INVEST). With VERY_LOW certainty, the verdict is fully withheld and replaced by INSUFFICIENT_DATA. See the methodology page for the complete Tier gate logic.
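The Tier gate rule can be sketched as a small function. Only the INVEST to OPPORTUNITY_SIGNAL mapping is given in this guide; the `DIRECTIONAL_SIGNAL` placeholder used for other verdicts is hypothetical, and the methodology page remains the reference for the complete mapping.

```python
# Documented in this guide: INVEST becomes OPPORTUNITY_SIGNAL.
DIRECTIONAL = {"INVEST": "OPPORTUNITY_SIGNAL"}
# Hypothetical placeholder for verdicts whose directional label is not given here.
UNDOCUMENTED_DIRECTIONAL = "DIRECTIONAL_SIGNAL"


def apply_tier_gate(raw_verdict, evidence_certainty):
    """Subordinate the displayed verdict to evidence_certainty."""
    if evidence_certainty == "HIGH":
        return raw_verdict  # raw verdict preserved
    if evidence_certainty in ("MODERATE", "LOW"):
        return DIRECTIONAL.get(raw_verdict, UNDOCUMENTED_DIRECTIONAL)
    if evidence_certainty == "VERY_LOW":
        return "INSUFFICIENT_DATA"  # verdict fully withheld
    raise ValueError(f"unknown evidence_certainty: {evidence_certainty!r}")
```

For example, `apply_tier_gate("INVEST", "LOW")` returns the directional label instead of the raw verdict.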

File format

CSV

Recommended

UTF-8 encoding
Comma or semicolon separator
First row = headers
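If you want to check a CSV export locally before uploading, a sketch like the following reads either separator. The heuristic (whichever of comma or semicolon is more frequent in the header row) is an assumption for illustration, not a description of how Stratensight parses files.

```python
import csv
import io


def read_patent_csv(text):
    """Parse CSV text whose separator is a comma or a semicolon.

    The first row is treated as the header row.
    """
    lines = text.splitlines()
    if not lines:
        return []
    header = lines[0]
    # Assumed heuristic: pick the separator that is more frequent in the header.
    delimiter = ";" if header.count(";") > header.count(",") else ","
    return list(csv.DictReader(io.StringIO(text), delimiter=delimiter))
```

When reading from disk, open the file with `encoding="utf-8"` and `newline=""` and pass its contents to the function.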

Excel (.xlsx)

Supported

Single sheet
Headers in row 1
Up to 2 GB depending on your plan

JSON

Supported

Array of objects
Camel or snake_case keys
UTF-8 encoding

Critical fields for signal quality

A missing field does not prevent the analysis. Stratensight displays a contextual warning and indicates exactly which score is affected.

CRITICAL · SCORE DEGRADED IF MISSING

filing_date → Momentum Index™

assignee → Openness Score™

title / abstract → Clustering + relevance

IMPORTANT · GRADE REDUCED IF MISSING

publication_date · cpc_codes · jurisdiction

OPTIONAL · ENRICHMENT ONLY

citations · family_id · legal_status


How to prepare your export

Export your search results as CSV or Excel from your patent search tool. Include as many fields as available. No configuration required.

1. Run your search in your patent database

2. Export results as CSV or Excel

3. Upload the file to Stratensight

4. Source Detection Engine maps and normalizes columns automatically

5. Analysis starts, with results in 2–5 minutes

No dataset? Use the Query Engine

Type a technology keyword and Stratensight retrieves patent data automatically from open sources.


Recommended volume

The number of patents in your dataset directly affects score reliability.

VOLUME	RESULT
< 50 patents	Analysis impossible
50 – 200	Directional scores — Intelligence Grade™ reduced
200 – 500	Reliable analysis for niche technologies
500 – 3,000	Optimal zone — all scores fully calibrated
> 10,000	Comprehensive analysis — longer processing time
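The table can be read as a lookup. In the sketch below the band names are shorthand invented for illustration, shared boundaries (200 and 500) are assigned to the lower band as an assumption, and the range between 3,000 and 10,000 returns `None` because the table does not describe it.

```python
def volume_band(n_patents):
    """Map a dataset size to the bands of the Recommended volume table."""
    if n_patents < 50:
        return "impossible"        # analysis impossible
    if n_patents <= 200:
        return "directional"       # Intelligence Grade reduced
    if n_patents <= 500:
        return "reliable_niche"    # reliable for niche technologies
    if n_patents <= 3000:
        return "optimal"           # all scores fully calibrated
    if n_patents > 10000:
        return "comprehensive"     # longer processing time
    return None                    # 3,001 to 10,000: not covered by the table
```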

Recommended time window

The filing date range in your export affects Momentum Index™ accuracy. Below 6 years, the score may underestimate actual innovation velocity.

Recent technology (< 5 years of activity): 6–8 years back

Growing technology: 10–12 years back

Mature technology: 12–15 years back

Best practice: Use the filing date (Application Date), not the publication date. Publication dates lag by 12–18 months on average, which compresses the innovation curve and underestimates Momentum Index™.

How your data quality affects Intelligence Grade™

Intelligence Grade™ gates the confidence of all other scores. Below 45%, scores are flagged LOW CONFIDENCE.

DATA QUALITY	INTELLIGENCE GRADE™	IMPACT
Questel Orbit / PatentSight full export	85–99%	Full analysis: all 4 scores and Decision Engine™
Partial fields (no citations)	65–80%	Core scores only
Generic CSV	50–70%	Basic analysis
< 100 patents	40–60%	Low confidence, directional only

QUERY ENGINE — TIME WINDOW

Time Window Selection

When using /explore (Query Engine), Stratensight automatically selects the optimal patent time window based on the technology lifecycle stage detected by AI. You can always override it manually.

AUTO MODE (default)

Stratensight AI estimates the lifecycle stage and suggests the window automatically. The active button is highlighted with an AI badge.

More mature = shorter window (old patents = noise, not signal)

MANUAL OVERRIDE

Click any pill button (1y · 3y · 5y · 10y · all) to override the AI suggestion. The display shows the selected year range and the source (AI suggested / user selected).

Format: YYYY – YYYY (X years — AI suggested / user selected)

LIFECYCLE STAGE	SUGGESTED WINDOW	RATIONALE
research	all	Very few patents — use all available history
emerging	10y	Growing signal — wide window to capture trajectory
acceleration	7y	Fast growth — focus on recent surge
growth	5y	Active competition — recent filings most relevant
mature	3y	Old patents = noise — only recent matters
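The stage-to-window table and the manual override can be sketched as follows. The function name, the returned keys, and the way the year count is computed (end year minus start year) are assumptions; the `earliest_year` argument stands in for the start of "all available history".

```python
# Suggested windows, copied from the table above.
SUGGESTED_WINDOW = {
    "research": "all",
    "emerging": "10y",
    "acceleration": "7y",
    "growth": "5y",
    "mature": "3y",
}


def resolve_window(stage, current_year, earliest_year, override=None):
    """Return the year range for the suggested window or a manual override."""
    window = override or SUGGESTED_WINDOW[stage]
    if window == "all":
        start = earliest_year
    else:
        start = current_year - int(window.rstrip("y"))
    return {
        "window": window,
        "start_year": start,
        "end_year": current_year,
        "years": current_year - start,  # assumed definition of the year count
        "source": "user selected" if override else "AI suggested",
    }
```

The returned dictionary holds the pieces that the display format combines: the year range, the number of years, and whether the window was AI suggested or user selected.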

Regional Coverage

Stratensight retrieves patent data from EPO OPS and Google Patents. Coverage varies by region and filing route.

International filings (EP, US, PCT-visible): Well covered

Chinese international filings: Partially visible

Chinese domestic-only activity (CNIPA-only): Not directly covered

Japanese and Korean filings: Well covered via PCT + EPO

Interpretation note: Directional signal remains useful across all sectors. Absolute volume may underestimate China-heavy sectors where domestic-only filings represent a significant share of activity. For maximum China coverage, upload your own export from Derwent, PatSnap, or Questel with CNIPA data included.

The quality of the decision depends on the quality of the dataset.

Open data provides directional signals. Uploaded datasets provide higher-confidence analysis.

When to be cautious with your analysis

Not all analyses carry the same weight. These conditions reduce signal certainty.

Dataset < 200 patents: Conservative signals; scores are directional only

Temporal window < 7 years: CAGR unreliable; insufficient baseline for trend measurement

Mixed CPC classes: Possible off-topic patents; clusters may not be coherent

Open data source (EPO, Google Patents): Directional signal only; upload a professional dataset for higher confidence

Momentum N/A: Verdict is conservative; filing velocity cannot be measured

What this analysis does NOT capture

Stratensight analyses patent filing patterns only. The following dimensions are outside the analytical scope.

Market size: No revenue, sales, or TAM estimation

Revenue & profitability: Patent signals do not correlate with financial performance

Technology adoption: Filing activity reflects R&D intent, not market penetration

Cost structure: No manufacturing, licensing, or deployment cost modeling

Regulation & policy: Regulatory approvals, trade barriers, and subsidies are not captured

GUIDE 1

Which source for which objective

Your objective determines the right data source. Quick exploration and strategic decisions require different levels of data quality and coverage.

OBJECTIVE	RECOMMENDED SOURCE	INTELLIGENCE GRADE™
Quick signal on a technology	Explorer: Open source query	50–70%
Reliable strategic decision	Upload: Premium dataset (Derwent, PatSnap, Questel)	85–99%
Global US / EP / PCT landscape	Explorer: Sufficient coverage via EPO OPS	60–75%
Asia / emerging market analysis	Upload: Required, because open sources miss domestic CN/IN filings	80–95%
Reproducible, auditable analysis	Upload: Dated export for full traceability	85–99%
Technology monitoring / watch	Explorer: Real-time open data, repeat periodically	55–70%
Competitive benchmarking	Upload: Full assignee data with normalized names	80–95%
Due diligence / M&A context	Upload: Tier 1 source with citation + family data	90–99%

Rule of thumb: Explorer is ideal for fast directional signals on Western markets. For any decision with material consequences, upload a professional dataset to reach Intelligence Grade™ above 80%.

GUIDE 2

Understanding CPC classification

The Cooperative Patent Classification (CPC) system is a hierarchical taxonomy of 250,000+ technology codes used by the EPO and USPTO to classify every patent.

CPC sections (A–H)

A Human Necessities

Agriculture, food, health

B Operations & Transport

Separating, shaping, vehicles

C Chemistry & Metallurgy

Organic chemistry, alloys

D Textiles & Paper

Weaving, papermaking

E Fixed Constructions

Building, mining

F Mechanical Engineering

Engines, pumps, weapons

G Physics

Optics, computing, control

H Electricity

Electronics, semiconductors

How to read a CPC code

Each level adds specificity. A broader code captures more patents; a narrower code isolates a precise technology.

CODE	LEVEL	MEANING	SCALE
H	Section	Electricity	Broadest level: 8 sections (A–H), plus section Y used for cross-sectional tagging
H01	Class	Basic electric elements	~130 classes
H01L	Subclass	Semiconductor devices	~640 subclasses
H01L 29/00	Main group	Semiconductor device structures	Thousands of groups
H01L 29/66	Subgroup	FET-specific structures	250,000+ leaf codes
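The hierarchy above can be made concrete by splitting a code into its levels. This is an illustrative parser, not part of Stratensight: it expects at least a subclass (such as `G06N`), accepts an optional group part (such as `29/66`), and also allows section Y, which CPC uses for tagging.

```python
import re

# Section letter, two-digit class, subclass letter, optional "group/subgroup".
CPC_PATTERN = re.compile(r"^([A-HY])(\d{2})([A-Z])(?:\s*(\d{1,4})/(\d{2,6}))?$")


def parse_cpc(code):
    """Split a CPC code into section, class, subclass, main group, subgroup."""
    match = CPC_PATTERN.match(code.strip().upper())
    if match is None:
        raise ValueError(f"not a CPC subclass or group code: {code!r}")
    section, cls, sub, group, subgroup = match.groups()
    subclass = section + cls + sub
    levels = {"section": section, "class": section + cls, "subclass": subclass}
    if group is not None:
        levels["main_group"] = f"{subclass} {group}/00"
        if subgroup != "00":
            levels["subgroup"] = f"{subclass} {group}/{subgroup}"
    return levels
```

Truncating a code at a higher level (for example, searching on the subclass instead of the subgroup) is how a query is broadened.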

CPC examples by technology domain

DOMAIN	KEY CPC CODES	DESCRIPTION
AI / Machine Learning	G06N	Computing arrangements based on specific computational models
CRISPR / Gene Editing	C12N 15/11	DNA or RNA fragments; modified forms thereof
Solid-State Batteries	H01M 10/0562	Solid electrolytes for secondary cells
Autonomous Vehicles	G05D 1/02	Control of position or course in two dimensions
Quantum Computing	G06N 10/00	Quantum computing; quantum information processing
mRNA Therapeutics	A61K 48/00	Medicinal preparations containing genetic material
Carbon Capture	B01D 53/62	Carbon dioxide removal from gas mixtures
5G / 6G Networks	H04W 72/04	Wireless resource management in multi-carrier systems

Why Stratensight uses CPC

CPC codes are language-independent, hierarchical, and examiner-assigned. They eliminate keyword ambiguity and provide consistent technology mapping across all patent offices. In Explorer, check the “View query” section of Source Coverage to see exactly which CPC codes were used.

GUIDE 3

Preparing your dataset

Follow these recommendations for the best possible Intelligence Grade™. The more complete your export, the higher the analytical confidence.

Required columns

These fields are mandatory for Stratensight to generate a valid analysis.

COLUMN	PURPOSE	SCORE IMPACT
Title	Patent title text. Used for AI clustering and topic extraction.	Clustering quality
Abstract	Full abstract text. Enables semantic analysis and concept mapping.	+20–30% cluster accuracy
CPC Codes	Cooperative Patent Classification codes. Technology taxonomy backbone.	Lifecycle Position™

Optional columns — improve score accuracy

Including these fields significantly improves Intelligence Grade™ and unlocks advanced analytics.

COLUMN	PURPOSE	BOOST
Filing Date	Application date. Core input for temporal analysis and trend calculation.	Momentum Index™ precision
Assignee	Patent owner / applicant. Required for competitive landscape analysis.	Openness Score™
Priority Date	Earliest priority filing date. Improves lifecycle staging accuracy.	Lifecycle Position™
Forward Citations	Number of times cited by later patents. Strengthens momentum signal.	Stronger Momentum signal
Family ID	Patent family identifier. Enables deduplication across jurisdictions.	Reduces noise
Inventors	Inventor names. Enables inventor network analysis.	Network signals
IPC Codes	International Patent Classification. Secondary classification fallback.	Broader coverage

Compatible sources

Stratensight auto-detects export format from these platforms. No manual column mapping required.

SOURCE	TYPICAL GRADE	NOTES
Derwent Innovation	90–99%	Full fields including citations and family data
PatSnap	88–97%	Complete export with assignee normalization
Questel Orbit	90–99%	FAMPAT family dedup, 100+ countries
PatentSight	88–96%	Professional export, family dedup included
TotalPatent One	85–95%	Standardized assignee + Family ID
Espacenet	60–75%	Free, good starting point, limited citation data
Google Patents	55–70%	Free, broad coverage, variable assignee quality

How to improve Intelligence Grade™

✓ Include Abstract field (improves clustering quality by 20–30%)

✓ Include CPC codes (enables accurate Lifecycle Position™)

✓ Include Filing Date, not just Publication Date (Momentum precision)

✓ Include Assignee/Applicant (required for Openness Score™)

✓ Include forward citations if available (boosts Momentum signal)

✓ Include Family ID if available (enables deduplication, reduces noise)

OPTIMAL SIZE: 200–3,000 patents for best score calibration

RECOMMENDED WINDOW: 8–12 years of filing history for reliable Momentum

GUIDE 4

Interpreting scores

Each score measures a distinct dimension of the technology landscape. Understanding what HIGH, MEDIUM, and LOW mean for each is essential for correct interpretation.

Momentum Index™

Measures the velocity and acceleration of patent filing activity over time. Derived from CAGR, year-over-year trends, and citation weighting.

HIGH 65–100

Strong, sustained filing growth. Technology attracting increasing R&D investment from multiple actors. E.g. Quantum Computing ~93

MEDIUM 35–64

Moderate activity. Filing rate is stable or showing early growth signals. Worth monitoring. E.g. Solid-State Batteries ~58

LOW 0–34

Declining or stagnant filing activity. Technology may be mature, niche, or losing momentum. E.g. Legacy lithography ~12
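CAGR is one of the inputs named above. The sketch below shows the standard CAGR formula and the band thresholds from this section; it does not reproduce how Stratensight combines CAGR, year-over-year trends, and citation weighting into the final index, which this guide does not specify.

```python
def cagr(first_year_count, last_year_count, n_years):
    """Compound annual growth rate of yearly filing counts.

    Standard formula: (last / first) ** (1 / n_years) - 1.
    """
    if first_year_count <= 0 or n_years <= 0:
        raise ValueError("first_year_count and n_years must be positive")
    return (last_year_count / first_year_count) ** (1 / n_years) - 1


def momentum_band(score):
    """Map a Momentum Index value (0 to 100) to the bands listed above."""
    if not 0 <= score <= 100:
        raise ValueError("score must be between 0 and 100")
    if score >= 65:
        return "HIGH"
    if score >= 35:
        return "MEDIUM"
    return "LOW"
```

Filings that double over five years, for instance, correspond to a CAGR of roughly 15% per year.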

Lifecycle Position™

Identifies the maturity phase of the technology based on filing patterns, growth trajectory, and actor concentration. Determines strategic timing for market entry.

Research: Early academic exploration. Few filings, high diversity. Entry cost is low.

Emerging: First commercial interest. Growing volume, early consolidation. Prime window for first-movers.

Acceleration: Rapid expansion. New entrants flooding in. High competition, high opportunity.

Growth: Established ecosystem. Clear leaders, stable dynamics. Best for strategic positioning.

Mature: Saturated domain. Declining novel filings. Incremental innovation only.

Openness Score™

Measures how concentrated or fragmented the competitive landscape is. Based on the Herfindahl-Hirschman Index (HHI) of patent assignees, transformed to a 0–100 scale where higher = more open.

OPEN 80–100: Highly fragmented market. Hundreds of actors, no dominant player. Low barriers to entry.

CONTESTED 55–79: Active competition. Multiple significant players but no single dominant force.

CONCENTRATED 30–54: Few dominant actors control most filings. Market entry requires differentiation or licensing.

DOMINATED 0–29: One or two players hold the vast majority. High barriers, high IP risk for new entrants.
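The HHI is the sum of squared shares of each assignee. The exact transform to the 0–100 scale is not given in this guide, so the sketch below assumes the simplest one, 100 × (1 − HHI), purely for illustration; the band thresholds are the ones listed above.

```python
from collections import Counter


def hhi(assignees):
    """Herfindahl-Hirschman Index of assignee shares, between 0 and 1."""
    counts = Counter(assignees)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no assignees provided")
    return sum((count / total) ** 2 for count in counts.values())


def openness_score(assignees):
    """Assumed transform: 100 * (1 - HHI), so that higher means more open."""
    return 100 * (1 - hhi(assignees))


def openness_band(score):
    """Map a 0 to 100 openness value to the bands listed above."""
    if score >= 80:
        return "OPEN"
    if score >= 55:
        return "CONTESTED"
    if score >= 30:
        return "CONCENTRATED"
    return "DOMINATED"
```

Under this assumed transform, two assignees with equal shares give an HHI of 0.5 and a score of 50, while a single assignee gives an HHI of 1 and a score of 0.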

Intelligence Grade™

Meta-score that evaluates the quality and completeness of the underlying dataset. Gates the confidence of all other scores. Below 45%, all scores are flagged LOW CONFIDENCE.

HIGH 70–100%: All fields populated, sufficient volume, good temporal coverage. All scores reliable. Decision Engine™ verdict carries full confidence.

MEDIUM 45–69%: Some fields missing or limited temporal depth. Core scores are valid but edge cases may be imprecise. Review outlier scores manually.

LOW 0–44%: Significant data gaps. Scores are indicative only. Do not use for strategic decisions without supplementary data.
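The gating rule can be expressed in a few lines. The function names are illustrative, and treating fractional values between bands (such as 69.5) as belonging to the lower band is an assumption.

```python
def grade_band(grade_percent):
    """Map an Intelligence Grade percentage to HIGH, MEDIUM, or LOW."""
    if not 0 <= grade_percent <= 100:
        raise ValueError("grade must be between 0 and 100")
    if grade_percent >= 70:
        return "HIGH"
    if grade_percent >= 45:
        return "MEDIUM"
    return "LOW"


def is_low_confidence(grade_percent):
    """Below 45%, all scores are flagged LOW CONFIDENCE."""
    return grade_percent < 45
```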

GUIDE 5

Limits to know

Transparency is a core principle. Understand these limitations before making decisions.

SOURCE	LIMITATION	IMPACT ON ANALYSIS	SEVERITY
EPO OPS	2,000 patent cap per query	Large domains may be undersampled. Momentum and Openness affected.	Medium
EPO OPS	3–6 month indexing delay	Very recent filings missing. Short-term momentum may be understated.	Low
EPO OPS	18-month indexation delay (full)	CAGR on short windows may appear negative while the market is actually growing. A warning flag is displayed.	Medium
Stratensight	Technology maturity priors	For well-known technologies (e.g. Wind Energy, Li-Ion), lifecycle may be adjusted to the industry consensus. A transparency flag is always displayed when a prior is applied.	Low
Stratensight	Output Intelligence flags	Signals cases where results require caution (CAGR indexation, lifecycle adjusted, short window). Flags are non-blocking and shown as an additional intelligence panel.	Low
Stratensight	Signal Summary (plain language)	Deterministic 3-line summary generated from verdict × lifecycle × momentum. No AI involved. Never replaces full score analysis — shown as novice-layer guidance only.	Low
Google Patents	Assignee auto-normalization varies	Corporate group precision varies. Openness Score™ may be imprecise.	Medium
Google Patents	No citation data in export	Citation-weighted Momentum unavailable. Score relies on volume only.	Medium
Espacenet	Limited bulk export capabilities	Manual export caps at 500 results. Insufficient for broad domains.	Medium
All open sources	No domestic CN/IN/TR filings	Asia and emerging markets underrepresented. Upload required for coverage.	High
OpenAlex (fallback)	Academic publications, not patents	Scores reflect research activity, not commercial IP strategy.	High
Any source	< 50 patents in dataset	Statistical reliability insufficient. All scores flagged LOW CONFIDENCE.	Critical

When to trust the verdict

Intelligence Grade™ ≥ 70% (HIGH) + uploaded dataset from a Tier 1 source (Derwent, PatSnap, Questel). All four scores are reliable and the Decision Engine™ verdict carries full analytical weight.

When results are directional only

Intelligence Grade™ between 45% and 69%. Explorer-based analysis. Dataset under 200 patents. Scores indicate direction but not magnitude. Use as a starting point, not a final answer.

When to upload your own data

When analyzing Asia/emerging markets. When reproducibility matters. When Intelligence Grade™ on Explorer is below 65%. When the EPO cap warning appears. For any strategic decision with material consequences.

Warning signals in a report

Academic source fallback banner (orange). EPO cap warning. Intelligence Grade™ below 45%. Coverage gap alert showing missing actors. Any of these should prompt verification with a premium data source before making decisions.


Building a Reliable Dataset

The Intelligence Grade™ layer detects 12 analytical tensions. Follow this checklist to maximize your analysis quality.

Pre-analysis checklist

☐ Volume minimum: 100+ patents recommended

☐ Time coverage: 5+ years recommended

☐ Filing dates: less than 20% missing

☐ CPC scope: neither too broad nor too narrow

☐ Source: indicate EPO / Derwent / PatSnap / other

☐ Regional bias: check CN proportion if relevant

☐ Duplicates: enable family deduplication
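The three quantitative items of the checklist (volume, time coverage, missing filing dates) can be checked locally before upload. The sketch below assumes filing dates are ISO strings (YYYY-MM-DD) or `None` when missing, and measures time coverage as the difference between the latest and earliest filing year; both are assumptions made for illustration.

```python
def pre_analysis_issues(filing_dates):
    """Return checklist items that a dataset fails.

    filing_dates: one entry per patent, an ISO date string or None if missing.
    """
    issues = []
    total = len(filing_dates)
    if total < 100:
        issues.append("volume: fewer than 100 patents")
    known = [d for d in filing_dates if d]
    missing_share = 1 - len(known) / total if total else 1.0
    if missing_share >= 0.20:
        issues.append("filing dates: 20% or more missing")
    if known:
        years = [int(d[:4]) for d in known]
        if max(years) - min(years) < 5:
            issues.append("time coverage: under 5 years")
    return issues
```

An empty list means the dataset passes these three items; the remaining checklist items (CPC scope, source, regional bias, duplicates) need a human decision.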

Very small dataset (fewer than 30 patents)

Why it matters: Scores are statistically unreliable. A single outlier can shift the entire analysis.

How to fix: Broaden your search query, add broader CPC classes, or upload a larger export from a Tier 1 source.

Limited dataset (30–99 patents)

Why it matters: Directional signal is valid but statistical confidence improves significantly above 100 patents.

How to fix: Add broader keywords to your search. Consider combining multiple CPC codes.

Dataset covers less than 3 years

Why it matters: CAGR and Lifecycle signals require temporal depth. Short windows produce unreliable trend signals.

How to fix: Filter your export to include filings from at least 5 years ago. The Query Engine suggests appropriate time windows automatically.

No historical baseline (all recent filings)

Why it matters: Without historical comparison, momentum direction is speculative. The system cannot distinguish acceleration from emergence.

How to fix: Extend your date range to include pre-2020 filings. Historical context is essential for lifecycle accuracy.

Over 20% of patents lack filing dates

Why it matters: Missing dates create blind spots in temporal analysis. Lifecycle and Momentum scores lose precision.

How to fix: Re-export your dataset with complete filing date coverage. Most premium sources include dates by default.

High CN activity with open-source data

Why it matters: EPO and Google Patents capture international PCT filings but may miss domestic CNIPA-only activity.

How to fix: For comprehensive CN coverage, use Derwent Innovation or PatSnap and upload the export directly.

Growth lifecycle with Momentum below 20

Why it matters: This combination is analytically unusual. Common causes: limited recent filing coverage or overly broad query scope.

How to fix: Verify your query captures recent filings (2020-present) and refine your CPC scope.

High CAGR on fewer than 50 patents

Why it matters: Small samples amplify statistical noise. One abnormal filing period can distort the growth rate.

How to fix: Increase dataset size above 100 patents before trusting CAGR direction.

Open market score but concentrated key players

Why it matters: Accessibility and competitive equality are different. Low barriers coexist with established dominance.

How to fix: Analyze individual clusters separately to identify genuinely open sub-domains.

Low Intelligence Grade with INVEST or AVOID verdict

Why it matters: A strong verdict on weak data is risky. The direction may be correct but the conviction is premature.

How to fix: Improve dataset completeness: add missing dates, increase volume, use a richer source.

Mature technology with strong momentum

Why it matters: Unusual but analytically interesting. May signal a second innovation cycle, regulatory trigger, or disruptive variant.

How to fix: Investigate sub-domains and recent cluster formation to identify the source of renewed activity.

One actor holds over 75% of patents

Why it matters: The analysis reflects one player's IP strategy, not the broader technology ecosystem.

How to fix: Broaden your query scope or exclude the dominant assignee to reveal ecosystem dynamics.
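Several of the tensions above are simple threshold rules, which the following sketch illustrates. It covers only a subset, and two thresholds are assumptions: "strong momentum" is read as the HIGH band (65 or more) and "low Intelligence Grade" as the LOW band (below 45). The function and its labels are hypothetical, not Stratensight's implementation.

```python
def detect_tensions(lifecycle, momentum, grade, verdict, top_assignee_share, n_patents):
    """Return labels for a subset of the analytical tensions listed above.

    momentum may be None when it is N/A; top_assignee_share is a 0 to 1 fraction.
    """
    tensions = []
    if n_patents < 30:
        tensions.append("very small dataset")
    elif n_patents < 100:
        tensions.append("limited dataset")
    if lifecycle == "growth" and momentum is not None and momentum < 20:
        tensions.append("growth lifecycle with low momentum")
    # Assumption: "strong momentum" means the HIGH band (65 or more).
    if lifecycle == "mature" and momentum is not None and momentum >= 65:
        tensions.append("mature technology with strong momentum")
    # Assumption: "low Intelligence Grade" means the LOW band (below 45).
    if grade < 45 and verdict in ("INVEST", "AVOID"):
        tensions.append("strong verdict on low Intelligence Grade")
    if top_assignee_share > 0.75:
        tensions.append("one actor holds over 75% of patents")
    return tensions
```

Each returned label corresponds to one entry above, whose "How to fix" guidance then applies.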

What Stratensight does NOT do

Transparency is a core principle. Here is what Stratensight explicitly does not claim to do.

No prediction

Stratensight does not predict the future. Scores reflect current and historical patent signals, not market forecasts.

No guesswork

Every score is deterministic and computed from explicit formulas. There is no hidden model, no opaque weighting, no proprietary black box.

No hallucination

AI is used for text interpretation only (Claude Haiku). Scores are never generated by AI. If a score cannot be computed, it is absent — never fabricated.

No investment advice

Verdicts (INVEST, MONITOR, EXPLORE, AVOID) are analytical signals based on patent data. They are not financial recommendations.

No complete market coverage

Patent data reflects internationally visible filing activity. Domestic-only filings (e.g. CNIPA) may be underrepresented in open sources.

Understanding the limits of your signal

Stratensight produces directional signals, not absolute truth. Here's what to keep in mind:

Coverage

EPO and Google Patents capture internationally visible filings. Domestic-only activity in China, Japan, or Korea may be underrepresented.

Scope dependency

Your scores reflect the patents in your dataset. A narrow query produces a narrow signal.

Momentum reliability

Requires at least 3–5 years of filing history. Short datasets produce N/A — not zero.

Intelligence Grade™

Every analysis automatically flags these limitations. Watch for LIMITED or FRAGILE badges.

Confidence is a measure of analytical reliability, not market probability

Intelligence Grade™ reflects the reliability of the analysis based on dataset quality, coverage sufficiency, temporal depth, and signal consistency. It is not a probability of commercial success, market prediction, or AI certainty.

Chinese domestic patent activity may be underrepresented in open sources. The directional signal may remain useful, but coverage is not complete.

Stratensight provides patent intelligence signals, not legal opinions or freedom-to-operate assessments. Not a substitute for IP counsel.