Handling Missingness with Salience
Patient records are often incomplete or under-specified. SatIR augments its constraint representation to handle such incompleteness through salience-based reasoning over missing or underspecified evidence, and in some cases by inferring diagnoses from patient notes.
SatIR's fundamental assumption is that any salient information about a patient's condition will be documented in their medical record. This lets us handle missing data by asking whether the absent information is salient: if it is, its absence from the record is informative; if it is not, the record is simply silent on it.
Because salience is not formally defined in the medical literature, SatIR uses targeted LLM queries to assess concept importance. These judgments are recorded explicitly, keeping matching decisions transparent, interpretable, and open to expert review. In contrast, end-to-end LLM matching is much harder to inspect.
For each clinical trial condition, SatIR uses the LLM to determine whether potentially missing information is salient, and correspondingly whether it should be interpreted as supporting the condition, refuting it, or remaining inconclusive.
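One way such a judgment could be recorded is sketched below. This is a minimal illustration, not SatIR's actual data model: the names `Reading`, `SalienceJudgment`, and `interpret_absence` are hypothetical, the LLM query is replaced by a `salient` boolean supplied by the caller, and the mapping from salience to a reading (salient and required means refuted, salient and excluded means supported, non-salient means inconclusive) is an assumption derived from the documentation assumption stated above.

```python
from dataclasses import dataclass
from enum import Enum


class Reading(Enum):
    SUPPORTS = "supports"
    REFUTES = "refutes"
    INCONCLUSIVE = "inconclusive"


@dataclass(frozen=True)
class SalienceJudgment:
    condition: str   # trial condition being evaluated
    concept: str     # concept that is absent from the patient record
    salient: bool    # stand-in for the LLM's answer: would this be documented if present?
    reading: Reading
    rationale: str   # free-text justification kept for expert review


def interpret_absence(condition, concept, salient, requires_concept, rationale=""):
    """Turn a salience answer into a three-way reading (assumed mapping).

    requires_concept is True when the trial condition needs the concept to
    hold, and False when the condition needs it to be absent.
    """
    if not salient:
        # A non-salient concept may simply have been left out of the record.
        reading = Reading.INCONCLUSIVE
    elif requires_concept:
        # A salient concept would have been documented, so treat it as absent.
        reading = Reading.REFUTES
    else:
        reading = Reading.SUPPORTS
    return SalienceJudgment(condition, concept, salient, reading, rationale)
```

Keeping the rationale alongside the reading is what would make each decision reviewable after the fact.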
A patient is logically eligible for trials whose targeted condition subsumes the patient's diagnosis. However, since medical records can be under-specified, it may also be reasonable to match a patient whose diagnosis subsumes the trial's targeted condition, that is, when the trial targets something more specific than what the record states. Whether we should do so depends on the salience of the targeted condition's extra specificity.
For example, a patient documented only with appendicitis may still match a trial for Acute Appendicitis, since the record may omit that extra specificity. But the same patient should not match a trial for Ruptured Suppurative Appendicitis, because such a salient condition would likely be explicitly recorded.
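The appendicitis example can be sketched as a two-direction subsumption check. The hierarchy and the salience table below are toy, hand-written assumptions (in SatIR the salience values would come from LLM judgments, and the hierarchy from a medical ontology); the function names are hypothetical, and requiring every step of added specificity to be low-salience is a design choice of this sketch.

```python
# Toy is-a hierarchy (child -> parent); illustrative only, not a real ontology.
PARENT = {
    "Acute Appendicitis": "Appendicitis",
    "Ruptured Suppurative Appendicitis": "Acute Appendicitis",
}

# Hypothetical salience judgments: would the extra specificity of this
# condition, relative to its parent, be expected to appear in the record?
SALIENT = {
    "Acute Appendicitis": False,
    "Ruptured Suppurative Appendicitis": True,
}


def subsumes(general, specific):
    """True if `general` equals `specific` or is one of its ancestors."""
    node = specific
    while node is not None:
        if node == general:
            return True
        node = PARENT.get(node)
    return False


def matches(patient_dx, trial_condition):
    # Strict direction: the trial's condition subsumes the patient's diagnosis.
    if subsumes(trial_condition, patient_dx):
        return True
    # Relaxed direction: the patient's diagnosis subsumes the trial's condition.
    # Allowed only if every step of added specificity is low-salience.
    if subsumes(patient_dx, trial_condition):
        node = trial_condition
        while node != patient_dx:
            if SALIENT.get(node, True):  # unknown concepts are treated as salient
                return False
            node = PARENT[node]
        return True
    return False


print(matches("Appendicitis", "Acute Appendicitis"))
print(matches("Appendicitis", "Ruptured Suppurative Appendicitis"))
```

Under these toy tables, the first call matches and the second does not, mirroring the example in the text.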
Augmentation compiles into ORs. Rather than bending the matching logic at runtime, SatIR encodes salience decisions directly into the trial-side formula as disjunctive clauses. When the extra specificity of a condition relative to its ontology parent is low-salience, the condition is expanded to specific_condition OR parent_concept; when it is salient, the condition is left unchanged. The augmented formula is still a well-formed formula of the same kind, so the solver's guarantees are preserved.
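A minimal sketch of this expansion follows. The `Atom` and `Or` classes, the `augment` and `holds` functions, and the toy tables are all hypothetical; `holds` is a trivial evaluator standing in for the solver, which the text does not specify.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Atom:
    concept: str


@dataclass(frozen=True)
class Or:
    options: tuple


# Toy ontology and salience table (assumptions for illustration).
PARENT = {
    "Acute Appendicitis": "Appendicitis",
    "Ruptured Suppurative Appendicitis": "Acute Appendicitis",
}
LOW_SALIENCE = {"Acute Appendicitis"}


def augment(concept):
    """Expand a trial-side condition to `specific OR parent` when its extra
    specificity is low-salience; otherwise leave it as a single atom."""
    parent = PARENT.get(concept)
    if parent is not None and concept in LOW_SALIENCE:
        return Or((Atom(concept), Atom(parent)))
    return Atom(concept)


def holds(formula, patient_facts):
    """Evaluate a formula against the set of concepts in the patient record."""
    if isinstance(formula, Atom):
        return formula.concept in patient_facts
    return any(holds(option, patient_facts) for option in formula.options)


record = {"Appendicitis"}
print(holds(augment("Acute Appendicitis"), record))
print(holds(augment("Ruptured Suppurative Appendicitis"), record))
```

Because the salience decision is baked into the formula before evaluation, the evaluator itself needs no special cases for missing data.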
Some patient notes describe symptoms or clinical findings without stating a diagnosis explicitly. In these cases, SatIR augments its representation by inferring likely diagnoses using the LLM parsing pipeline. This allows constraints that reference specific diagnoses to be evaluated even when the record documents only the underlying clinical evidence.