Handling Missingness with Salience

Patient records are often incomplete or under-specified. SatIR augments its constraint representation to handle such incompleteness through salience-based reasoning over missing or underspecified evidence, and in some cases by inferring diagnoses from patient notes.


Salience as a Principle for Handling Missingness

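SatIR's fundamental assumption is that any salient information about a patient's condition will be documented in their medical record. Under this assumption, the absence of a salient fact is itself evidence that the fact does not hold, while the absence of a non-salient fact is uninformative. Handling missing data therefore reduces to asking whether the absent information is salient.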

Because salience is not formally defined in the medical literature, SatIR uses targeted LLM queries to assess concept importance. These judgments are recorded explicitly, keeping matching decisions transparent, interpretable, and open to expert review.

Why this matters: end-to-end LLM matchers make missingness judgments implicitly, inside model weights, where they are hard to inspect. SatIR externalizes them as explicit salience assessments that can be reviewed, overridden, or standardized, which is a critical property for clinical deployment.
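
As a rough illustration of what an explicitly recorded salience judgment could look like, the sketch below stores each assessment as a small record that a reviewer can override. The class name SalienceJudgment, its fields, and the override helper are hypothetical and chosen for this example only; they are not taken from SatIR.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass(frozen=True)
class SalienceJudgment:
    """One recorded answer to: would this concept be charted if it were true?"""
    concept: str
    salient: bool
    rationale: str
    source: str = "llm"              # "llm" or "expert" (hypothetical labels)
    reviewer: Optional[str] = None   # set only when an expert overrides


def override(judgment: SalienceJudgment, salient: bool,
             reviewer: str, rationale: str) -> SalienceJudgment:
    """Return a new record carrying the expert's decision; the original is kept."""
    return SalienceJudgment(
        concept=judgment.concept,
        salient=salient,
        rationale=rationale,
        source="expert",
        reviewer=reviewer,
    )


# Illustrative values only; a real judgment would come from an LLM query.
llm_judgment = SalienceJudgment(
    concept="acute (vs. unspecified) appendicitis",
    salient=True,
    rationale="Placeholder rationale returned with the LLM answer.",
)
reviewed = override(
    llm_judgment,
    salient=False,
    reviewer="reviewer_01",
    rationale="Acuity is often left implicit in the record.",
)

# Both the original and the override stay in the log for later audit.
audit_log = [asdict(llm_judgment), asdict(reviewed)]
print(json.dumps(audit_log, indent=2))
```

Keeping the original LLM judgment next to the override means a later reader can see both what the model said and who changed it.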

Whole-Fact Missingness in Records
When a patient record lacks information to support or refute a specific trial constraint, salience determines how that absence should be interpreted.

For each clinical trial condition, SatIR uses the LLM to determine whether the potentially missing information is salient, and correspondingly whether its absence should be interpreted as supporting the condition, refuting it, or leaving it inconclusive.

Fig. — Whole-fact missingness. When a queried trial condition has no direct support in the patient record, salience determines whether the missing fact is tolerable (low salience — remain inconclusive) or decision-critical (high salience — absence is evidence).
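
The three-way interpretation described above can be sketched as a small function. This is a minimal illustration under stated assumptions, not SatIR's implementation: the record is modeled as a dictionary from concept to a documented true/false value, and is_salient is a hypothetical stub standing in for the targeted LLM query.

```python
from enum import Enum
from typing import Callable, Dict


class Verdict(Enum):
    SUPPORTED = "supported"
    REFUTED = "refuted"
    INCONCLUSIVE = "inconclusive"


def evaluate(concept: str, must_be_present: bool, record: Dict[str, bool],
             is_salient: Callable[[str], bool]) -> Verdict:
    """Evaluate one trial condition of the form 'concept is (not) present'."""
    if concept in record:
        holds = record[concept]      # documented either way: use the record
    elif is_salient(concept):
        holds = False                # salient but undocumented: absence is evidence
    else:
        return Verdict.INCONCLUSIVE  # non-salient and undocumented: no conclusion
    return Verdict.SUPPORTED if holds == must_be_present else Verdict.REFUTED


# Hypothetical salience stub; in practice this answer would come from the LLM.
SALIENT_CONCEPTS = {"prior myocardial infarction"}


def is_salient(concept: str) -> bool:
    return concept in SALIENT_CONCEPTS


record = {"type 2 diabetes": True}

# Exclusion-style condition: the patient must NOT have a prior MI.
no_prior_mi = evaluate("prior myocardial infarction", False, record, is_salient)
# Inclusion-style condition on the same missing, salient concept.
needs_prior_mi = evaluate("prior myocardial infarction", True, record, is_salient)
# Missing and non-salient: the condition stays open.
allergies = evaluate("seasonal allergies", True, record, is_salient)
# Documented fact: salience is never consulted.
diabetes = evaluate("type 2 diabetes", True, record, is_salient)
```

In this sketch the same missing salient fact supports an exclusion-style condition and refutes an inclusion-style one, which is why the interpretation is made per condition rather than per concept.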

Salience of Specificity
When a patient's documented diagnosis is less specific than a trial's target, salience determines whether the coarser evidence is sufficient.

A patient is logically eligible for trials targeting conditions that subsume the patient's diagnosis: a patient with Acute Appendicitis satisfies a trial targeting Appendicitis. However, since medical records can be under-specified, it may also be reasonable to match patients in the reverse direction, when their documented diagnosis subsumes the trial's targeted condition. Whether to do so depends on the salience of the extra specificity in the targeted condition.

For example, a patient documented only with appendicitis may still match a trial for Acute Appendicitis, since the record may omit that extra specificity. But the same patient should not match a trial for Ruptured Suppurative Appendicitis, because such a salient condition would likely be explicitly recorded.

Augmentation compiles into ORs. Rather than bending the matching logic at runtime, SatIR encodes salience decisions directly into the trial-side formula as disjunctive clauses. When the extra specificity of a condition relative to its ontology parent is low-salience, the condition is expanded to specific_condition OR parent_concept. The augmented formula is still an ordinary constraint formula, so the solver's guarantees are preserved.

Fig. — Specificity salience. A patient documented with appendicitis may match Acute Appendicitis (low salience — specificity gap is tolerable), but not Ruptured Suppurative Appendicitis (high salience — would be explicitly recorded).
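
The disjunctive expansion can be sketched with a toy formula representation. Everything here is an assumption made for illustration: a trial formula is a list of clauses (each clause a set of concepts joined by OR, the clauses joined by AND), PARENT is a made-up two-entry ontology, LOW_SALIENCE_SPECIFICITY stands in for the recorded salience judgments, and satisfies uses exact concept matching in place of a real solver and ontology reasoning.

```python
from typing import Dict, FrozenSet, List, Set

Clause = FrozenSet[str]  # a disjunction of concepts

# Toy ontology fragment (child -> parent); not a real terminology.
PARENT: Dict[str, str] = {
    "acute appendicitis": "appendicitis",
    "ruptured suppurative appendicitis": "appendicitis",
}

# Concepts whose extra specificity over the parent was judged low-salience.
LOW_SALIENCE_SPECIFICITY: Set[str] = {"acute appendicitis"}


def augment(formula: List[Clause]) -> List[Clause]:
    """Expand each low-salience concept to (specific_condition OR parent_concept)."""
    augmented = []
    for clause in formula:
        expanded = set(clause)
        for concept in clause:
            if concept in LOW_SALIENCE_SPECIFICITY and concept in PARENT:
                expanded.add(PARENT[concept])
        augmented.append(frozenset(expanded))
    return augmented


def satisfies(facts: Set[str], formula: List[Clause]) -> bool:
    """True if every clause shares at least one concept with the patient facts."""
    return all(clause & facts for clause in formula)


patient = {"appendicitis"}
trial_acute = [frozenset({"acute appendicitis"})]
trial_ruptured = [frozenset({"ruptured suppurative appendicitis"})]

matches_acute = satisfies(patient, augment(trial_acute))        # parent was added
matches_ruptured = satisfies(patient, augment(trial_ruptured))  # left unchanged
matches_unaugmented = satisfies(patient, trial_acute)           # no expansion
```

Because the salience decision is applied to the trial formula before matching, the matching step itself never needs to know about salience.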

Inferring Diagnoses from Patient Notes

Some patient notes describe symptoms or clinical findings without stating a diagnosis explicitly. In these cases, SatIR augments its representation by inferring likely diagnoses using the LLM parsing pipeline. This allows constraints that reference specific diagnoses to be evaluated even when the record documents only the underlying clinical evidence.
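
One way to picture this augmentation is to add inferred diagnoses to the patient's fact set while tagging where each fact came from. The sketch below is only an illustration: Fact, augment_record, and stub_infer are hypothetical names, and stub_infer is a toy keyword rule standing in for the LLM parsing pipeline, not clinical logic.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class Fact:
    concept: str
    provenance: str  # "documented" or "inferred" (hypothetical labels)


def augment_record(documented: Iterable[str], note: str,
                   infer: Callable[[str], List[str]]) -> List[Fact]:
    """Add inferred diagnoses to the record without overwriting documented ones."""
    facts = [Fact(concept, "documented") for concept in documented]
    known = {fact.concept for fact in facts}
    for diagnosis in infer(note):
        if diagnosis not in known:
            facts.append(Fact(diagnosis, "inferred"))
            known.add(diagnosis)
    return facts


def stub_infer(note: str) -> List[str]:
    """Toy placeholder for the LLM call; a single keyword rule for demonstration."""
    if "right lower quadrant pain" in note.lower():
        return ["appendicitis"]
    return []


note = "Patient presents with fever and right lower quadrant pain."
facts = augment_record(["fever"], note, stub_infer)
```

Tagging provenance keeps inferred diagnoses distinguishable from documented ones, so they remain open to the same expert review as the salience judgments.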