# BK-73 KG-Aware Retrieval Evaluation

Date: 2026-05-04

Issue: BK-73

## Goal

Test graph-aware retrieval as a support layer for the SOTA Bilgi Kitabı conversational agent without replacing the current hybrid baseline or promoting model-derived KG candidates to source truth.

## Fixed Evaluation Surface

- Pack: `docs/verification/sota-conversational-agent-pack-2026-05-03.json`
- Baseline: `docs/verification/generated/hybrid-retrieval-eval-2026-05-03.json`
- Baseline metrics: top-k hit rate `1.0`, MRR `1.0`
- Role: `support`
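
For reference, the two baseline metrics can be computed roughly as follows. This is a minimal sketch, not the code in `scripts/run_kg_aware_retrieval_eval.py`: the function names, the input shapes (one ranked list of block IDs and one set of expected block IDs per case), and the value of `k` are all assumptions made for illustration.

```python
from typing import Sequence, Set


def top_k_hit_rate(
    ranked: Sequence[Sequence[str]], expected: Sequence[Set[str]], k: int
) -> float:
    """Fraction of cases with at least one expected block in the top k results."""
    if not ranked:
        return 0.0
    hits = sum(
        1
        for results, wanted in zip(ranked, expected)
        if any(block_id in wanted for block_id in results[:k])
    )
    return hits / len(ranked)


def mean_reciprocal_rank(
    ranked: Sequence[Sequence[str]], expected: Sequence[Set[str]]
) -> float:
    """Mean of 1/rank of the first expected block; a case with no hit adds 0."""
    if not ranked:
        return 0.0
    total = 0.0
    for results, wanted in zip(ranked, expected):
        for position, block_id in enumerate(results, start=1):
            if block_id in wanted:
                total += 1.0 / position
                break
    return total / len(ranked)
```

Both metrics reach `1.0` only when every case ranks an expected block first, which is why a perfect baseline leaves no room for a ranking improvement to show up.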

## Editable Surface

- Added `scripts/kg_aware_message_block_retrieval.py`
- Added `scripts/run_kg_aware_retrieval_eval.py`
- Added tests in `tests/test_kg_aware_retrieval.py`

The KG-aware layer attaches:

- canonical source graph context from `docs/models/generated/kg-source-graph-2026-05-04.json`
- unreviewed candidate term/claim support from `docs/models/generated/bk72-kg-extraction-candidates-2026-05-04.json`
- an explicit rank contract that keeps answer grounding tied to source surfaces
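
One possible reading of the three attachments above is sketched below. It assumes the rank contract means that enrichment never reorders or rescored blocks; the report does not spell the contract out. The `RetrievedBlock` type, the `review_status` field, and the dictionary shapes are hypothetical and are not taken from `scripts/kg_aware_message_block_retrieval.py`.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class RetrievedBlock:
    # Hypothetical shape of one retrieved message block.
    block_id: str
    score: float
    source_context: Dict[str, str] = field(default_factory=dict)
    candidate_support: List[Dict[str, str]] = field(default_factory=list)


def attach_kg_context(
    blocks: List[RetrievedBlock],
    source_graph: Dict[str, Dict[str, str]],
    candidates: Dict[str, List[Dict[str, str]]],
) -> List[RetrievedBlock]:
    """Return enriched copies of the blocks in the same order, with the same scores."""
    enriched = []
    for block in blocks:
        enriched.append(
            RetrievedBlock(
                block_id=block.block_id,
                score=block.score,  # assumption: ranking signal is left untouched
                source_context=dict(source_graph.get(block.block_id, {})),
                # Every candidate entry is explicitly marked as unreviewed.
                candidate_support=[
                    {**entry, "review_status": "unreviewed"}
                    for entry in candidates.get(block.block_id, [])
                ],
            )
        )
    return enriched
```

Keeping enrichment separate from scoring makes it easy to check that the support layer cannot change which block is ranked first.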

## Result

Output: `docs/verification/generated/bk73-kg-aware-retrieval-eval-2026-05-04.json`

| Mode | Top-k hit rate | MRR |
| --- | ---: | ---: |
| Hybrid baseline | 1.0 | 1.0 |
| KG-aware support | 1.0 | 1.0 |

Trace coverage:

- Source graph trace on expected-hit cases: `1.0`
- Candidate support on expected-hit cases: `0.8333`
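
A coverage figure of this kind can be computed as the share of expected-hit cases that carry a given trace. The sketch below is illustrative only: the case fields (`expected_hit`, `source_trace`, `candidate_support`) are hypothetical names, and the six-case input in the usage note is invented, since the report does not state the pack size.

```python
from typing import Dict, List


def trace_coverage(cases: List[Dict[str, bool]], trace_key: str) -> float:
    """Share of expected-hit cases where `trace_key` is truthy, rounded to 4 places."""
    hit_cases = [case for case in cases if case.get("expected_hit")]
    if not hit_cases:
        return 0.0
    covered = sum(1 for case in hit_cases if case.get(trace_key))
    return round(covered / len(hit_cases), 4)
```

With an invented set of six expected-hit cases where all six have a source trace and five have candidate support, `trace_coverage` would return `1.0` and `0.8333` respectively.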

Disposition: `keep_as_support`

## Interpretation

The KG-aware layer should be kept as an auditable retrieval support layer: it matches the frozen hybrid baseline on both top-k hit rate and MRR, and it adds visible provenance that downstream answer planning can use.

This is not evidence that graph-aware retrieval is better at finding blocks on the current pack. The baseline already scores `1.0` on both metrics, so the pack leaves no headroom to show an improvement. The useful gain is auditability: retrieved blocks now carry canonical source-graph context, adjacent source-neighborhood metadata, and clearly marked unreviewed candidate support.
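
The reasoning behind the disposition can be written as a small decision rule. This is a sketch under stated assumptions, not the evaluation script's logic: only the `keep_as_support` label comes from this report, while `reject_regression`, `no_added_value`, the function name, and the metric keys are hypothetical.

```python
from typing import Dict


def decide_disposition(
    baseline: Dict[str, float],
    kg_aware: Dict[str, float],
    adds_provenance: bool,
) -> str:
    """Keep the layer only if no baseline metric drops and it adds provenance."""
    regressed = any(kg_aware[name] < value for name, value in baseline.items())
    if regressed:
        return "reject_regression"  # hypothetical label
    if adds_provenance:
        return "keep_as_support"  # label used in this report
    return "no_added_value"  # hypothetical label


baseline_metrics = {"top_k_hit_rate": 1.0, "mrr": 1.0}
kg_aware_metrics = {"top_k_hit_rate": 1.0, "mrr": 1.0}
disposition = decide_disposition(baseline_metrics, kg_aware_metrics, adds_provenance=True)
```

Note that the rule treats equal metrics as acceptable, which matches the interpretation above: the layer is kept for auditability, not for a ranking gain.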

## Restraint Rule

Unreviewed BK-72 terms and claims remain candidate support only. They may help retrieval inspection and future answer planning, but canonical answer grounding must cite source surfaces, not model-derived candidates.
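
A rule like this can be enforced with a check that runs before an answer is finalized. The sketch below assumes each citation is a dictionary with hypothetical `id` and `kind` fields, where `kind` is `"source"` for a source surface and anything else (for example `"candidate"`) for model-derived material; none of these names come from the repository.

```python
from typing import Dict, List


def validate_grounding(citations: List[Dict[str, str]]) -> None:
    """Raise ValueError if any grounding citation is not a source surface."""
    offending = [
        citation.get("id", "<missing id>")
        for citation in citations
        if citation.get("kind") != "source"
    ]
    if offending:
        raise ValueError(
            "Answer grounding must cite source surfaces only; "
            f"non-source citations: {offending}"
        )
```

Candidate support can still travel alongside a retrieved block for inspection; the check only rejects it when it is used as a grounding citation.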
