
SATIR

Precision Patient–Trial Matching

Scalable High-Recall Constraint-Satisfaction-Based
Information Retrieval for Clinical Trials Matching

Stanford NLP Group
Stanford OVAL Lab
Mayo Clinic
Cyrus Zhou1
Yufei Jin2*
Yilin Xu3*
Yu-Chiang Wang4
Chieh-Ju Chao4
Monica S. Lam1

1Stanford   2UCLA   3Emory   4Mayo Clinic

*Equal contribution

+72% more eligible trials found per patient vs. TrialGPT
93.7% recall (vs. 62.4% for TrialGPT)
2.95 s per-patient retrieval over 3,621 trials
59 patients evaluated across 3,621 trials

SatIR in action. A tofersen SOD1-ALS study showing how SatIR turns free-text eligibility criteria into formal, interpretable constraints executable in a database. Terms in blue are ontology-grounded. Clauses not safely translatable to recall-safe predicates are dropped, preserving recall.

Formal Methods Meet Clinical Trial Matching
Clinical trials are central to evidence-based medicine, yet many fail to meet enrollment targets despite the more than half a million trials listed on ClinicalTrials.gov. Existing retrieval techniques, largely based on keyword and embedding-similarity matching, suffer from low recall, low precision, and limited interpretability when faced with complex eligibility constraints.

We propose SatIR, a scalable clinical trial retrieval method based on constraint satisfaction. Our approach uses formal methods — Satisfiability Modulo Theories (SMT) and relational algebra — to efficiently represent and match key constraints from clinical trials and patient records. We use Large Language Models (LLMs) to convert informal reasoning — regarding ambiguity, implicit clinical assumptions, and incomplete patient records — into explicit, precise, controllable, and interpretable formal constraints.

Evaluated on 59 patients and 3,621 trials, SatIR outperforms TrialGPT on all three retrieval objectives. It retrieves 32%–72% more relevant-and-eligible trials per patient, improves recall by 22–38 points, and serves more patients with at least one useful trial — in just 2.95 seconds per patient.
Keywords: Constraint Satisfaction · Clinical Trial Matching · SMT · Formal Methods · High-Recall Retrieval · SNOMED Ontology · LLM + Formal Methods

How SatIR Works
Trials are processed offline into formal constraint representations stored in a database. At query time, patient records are formalized and matched against the database.
Figure 1. SatIR overview. Clinical trial listings and patient records are independently parsed into formal SMT representations. Trial constraints are stored offline; patient constraints are executed as SQL queries for fast, high-recall retrieval.

Ontology-Grounded Representation

An SMT-based representation grounded in SNOMED CT captures Boolean logic, numeric thresholds, temporal windows, and ontology-aware concept identity and entailment.
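As a toy sketch of this idea (not SatIR's actual SMT encoding: plain Python closures stand in for SMT predicates, and a tiny hand-made map stands in for SNOMED CT; all concept names are illustrative), Boolean structure, numeric thresholds, and ontology entailment compose like this:

```python
from dataclasses import dataclass

# Toy ontology map (child -> parent); a stand-in for SNOMED CT.
# All concept names here are illustrative, not real SNOMED codes.
ONTOLOGY = {
    "ruptured_appendicitis": "acute_appendicitis",
    "acute_appendicitis": "appendicitis",
    "sod1_als": "als",
}

def entails(concept, ancestor):
    """True if `concept` equals `ancestor` or is a descendant of it."""
    while concept is not None:
        if concept == ancestor:
            return True
        concept = ONTOLOGY.get(concept)
    return False

@dataclass
class Patient:
    conditions: set       # documented concepts
    measurements: dict    # e.g. {"age_years": 54}

# Constraints are Patient -> bool closures; the combinators mirror
# the Boolean structure an SMT encoding would express.
def has(concept):
    return lambda p: any(entails(c, concept) for c in p.conditions)

def at_least(field, threshold):
    return lambda p: p.measurements.get(field, float("-inf")) >= threshold

def all_of(*constraints):
    return lambda p: all(c(p) for c in constraints)

def neg(constraint):
    return lambda p: not constraint(p)

# Example trial: adults with SOD1-ALS and no documented appendicitis.
trial = all_of(has("als"), at_least("age_years", 18), neg(has("appendicitis")))

patient = Patient(conditions={"sod1_als"}, measurements={"age_years": 54})
print(trial(patient))  # True: sod1_als entails als, 54 >= 18, no appendicitis
```

The ontology walk is what makes concept identity entailment-aware: a patient documented with the specific `sod1_als` satisfies a trial's broader `als` requirement.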

Accurate Semantic Parsing

LLM-based parsers translate complex clinical text — handling underspecified criteria, negation, temporality, and implicit assumptions — into executable formal constraints.
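To illustrate the target of that parsing, here is a minimal sketch in which a regex stands in for the LLM parser; the `(field, op, value)` output shape and the field names are illustrative assumptions, not SatIR's actual schema. It also shows the recall-safe fallback: clauses with no safe translation are dropped rather than guessed.

```python
import re

# A regex stands in for the LLM parser, only to show the target output
# shape: each criterion becomes a (field, op, value) predicate.
PATTERN = re.compile(
    r"(?P<field>[A-Za-z ]+?)\s*(?P<op>>=|<=|>|<)\s*(?P<value>\d+(?:\.\d+)?)")

def parse_criterion(text):
    m = PATTERN.search(text)
    if m is None:
        # No recall-safe translation: drop the clause, preserving recall.
        return None
    return (m.group("field").strip().lower().replace(" ", "_"),
            m.group("op"), float(m.group("value")))

print(parse_criterion("Age >= 18 years"))        # ('age', '>=', 18.0)
print(parse_criterion("Ejection fraction <= 40%"))
print(parse_criterion("history of smoking"))     # None (dropped, recall-safe)
```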

Salience-Based Missingness

SatIR introduces salience to resolve ambiguity from incomplete records — using LLMs to assess whether missing data is clinically significant enough to affect eligibility, with explicit auditable judgments.

Scalable Database Retrieval

SMT constraints are projected into relational algebra and stored in SQL, enabling fast recall-preserving retrieval across hundreds of thousands of trials in under 3 seconds.
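A minimal sketch of that relational projection, using Python's stdlib `sqlite3`; the schema and the `>=`/`<=` predicate encoding are illustrative assumptions, not SatIR's database design. The query keeps any trial that is not provably excluded, so predicates on fields absent from the record never cost recall:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE trial_constraints (trial_id TEXT, field TEXT, op TEXT, value REAL);
CREATE TABLE patient (field TEXT, value REAL);
""")
conn.executemany("INSERT INTO trial_constraints VALUES (?,?,?,?)", [
    ("NCT001", "age_years", ">=", 18),
    ("NCT001", "ef_percent", "<=", 40),
    ("NCT002", "age_years", ">=", 65),
])
conn.executemany("INSERT INTO patient VALUES (?,?)", [
    ("age_years", 54), ("ef_percent", 35),
])

# Recall-preserving retrieval: a trial is rejected only if some stored
# predicate is provably violated by a value actually present in the
# record; constraints on missing fields never exclude a trial.
QUERY = """
SELECT DISTINCT c.trial_id
FROM trial_constraints c
WHERE c.trial_id NOT IN (
    SELECT v.trial_id
    FROM trial_constraints v
    JOIN patient p ON p.field = v.field
    WHERE (v.op = '>=' AND p.value < v.value)
       OR (v.op = '<=' AND p.value > v.value))
ORDER BY c.trial_id
"""
matches = [row[0] for row in conn.execute(QUERY)]
print(matches)  # ['NCT001']  (NCT002 requires age >= 65)
```

Pushing the violation check into a single SQL query is what lets the index, rather than per-trial reasoning, do the work at retrieval time.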



Handling Missingness with Salience
Patient records are routinely incomplete. SatIR introduces salience — a principled, interpretable mechanism for deciding how missing information should affect eligibility.

The core assumption: any clinically salient fact about a patient will be documented in their medical record. If it is absent, that absence is itself informative.

SatIR uses targeted LLM queries to assess the salience of each missing concept — whether it is the kind of thing a physician would always record if present. These judgments are compiled into explicit, auditable policy decisions rather than left as implicit LLM reasoning.

Why this matters: end-to-end LLM matchers make implicit missingness judgments that are hard to inspect or audit. SatIR externalizes these as explicit salience assessments, making it possible to review, override, or standardize how incomplete records are handled — a critical property for clinical deployment.
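As a sketch of what such an externalized policy could look like (the function names, labels, and the lookup-table stub are hypothetical; in SatIR the salience judgment would come from a targeted LLM query):

```python
# Illustrative salience policy, not SatIR's actual interface. The LLM's
# per-concept salience judgment is stubbed as a lookup table so the
# resulting policy is explicit and auditable.
SALIENCE = {
    "ruptured_suppurative_appendicitis": "high",  # would always be charted
    "prior_aspirin_use": "low",                   # often goes unrecorded
}

def decide(required_concept, record):
    """Resolve a trial requirement against a possibly incomplete record.

    Returns 'satisfied', 'violated', or 'unknown-include'. The recall-safe
    default for low-salience missing facts is to keep the trial.
    """
    if required_concept in record:
        return "satisfied"
    if SALIENCE.get(required_concept, "low") == "high":
        # High salience: absence from the record is evidence of absence.
        return "violated"
    # Low salience: the record may simply be silent; remain inclusive.
    return "unknown-include"

record = {"appendicitis"}
print(decide("ruptured_suppurative_appendicitis", record))  # violated
print(decide("prior_aspirin_use", record))                  # unknown-include
```

Because the policy is a plain table plus a small decision rule, a reviewer can inspect, override, or standardize any individual salience judgment without touching the rest of the system.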
Whole-fact missingness. When a trial condition has no support in the record, salience determines whether absence is evidence (high salience) or the record is simply silent (low salience — remain inclusive).
Specificity mismatch. A patient documented with appendicitis may match Acute Appendicitis (low salience), but not Ruptured Suppurative Appendicitis (high salience — would be explicitly recorded).

State-of-the-Art on All Objectives
SatIR outperforms all baselines — including TrialGPT — across three retrieval objectives on 59 patients and 3,621 trials.
Recall (%) by objective:

Objective       | SatIR | TrialGPT | BGE-large | SatIR vs. TrialGPT
Treat-Chief     | 93.66 | 62.44    | 59.51     | +31.2 pts
Treat-Any       | 92.07 | 53.66    | 51.22     | +38.4 pts
Relevant-to-Any | 92.53 | 70.27    | 68.27     | +22.3 pts
Mean Relevant-and-Eligible Trials Retrieved per Patient
Objective       | Retrieved/Patient | BMRetriever | ClinBest | BGE-large | PubMedBERT | TrialGPT | SatIR
Treat-Chief     | 36                | 1.44        | 1.75     | 2.07      | 2.00       | 2.17     | 3.25
Treat-Any       | 64                | 2.00        | 2.54     | 2.85      | 2.68       | 2.98     | 5.12
Relevant-to-Any | 121               | 6.39        | 7.24     | 8.68      | 8.92       | 8.93     | 11.76
Patient-Level Outcomes vs. TrialGPT (59 patients)
Objective       | SatIR wins | Tie | TrialGPT wins | Served by SatIR | Served by TrialGPT
Treat-Chief     | 20         | 33  | 6             | 39              | 35
Treat-Any       | 29         | 28  | 2             | 42              | 39
Relevant-to-Any | 35         | 17  | 7             | 54              | 50

Interpretable, auditable, and permanently correctable. Because every eligibility decision traces back to explicit symbolic constraints — not opaque model weights — failures are diagnosable. When a parsing error or an ontology gap is identified and fixed, that fix propagates immediately and permanently across every patient and every trial that touches the corrected predicate. There is no need to retrain, re-embed, or re-evaluate a model. This is the core advantage of SatIR's formal approach: errors are local, fixes are global.

View failure case analysis →


BibTeX
@article{zhou2026satir,
  title  = {Scalable High-Recall Constraint-Satisfaction-Based
           Information Retrieval for Clinical Trials Matching},
  author = {Cyrus Zhou and Yufei Jin and Yilin Xu and
           Yu-Chiang Wang and Chieh-Ju Chao and Monica S. Lam},
  note   = {Under review},
  year   = {2026},
}

Support & Thanks

We gratefully acknowledge support from the Verdant Foundation, the Hasso Plattner Institute, Microsoft Azure AI credits, Itaú Unibanco, BMO Financial Group, and the Stanford HAI Institute. Cyrus Zhou is partially supported by the Stanford School of Engineering Fellowship.

We thank Yucheng Jiang for the clinician annotation interface, and our clinical collaborators at Stanford and Mayo Clinic for their expert guidance.