Audrey HTTP Score
57.5%, up from 34.3% before the tag isolation fix (+23.2 pts).
Audrey Hit Rate
85.7%: the expected memory appeared in the top-k results on 6 of 7 probes, up from 3 of 7.
Audrey P95 Recall
3.8 ms. P95 recall latency changed by +0.1 ms after the fix.
Before And After
Current Graphs
Adapter Results After Fix
| Adapter | Score | Hit Rate | P95 Recall | Precision | Contamination Penalty |
|---|---|---|---|---|---|
| typed-semantic | 72.7% | 100.0% | 0.4 ms | 34.5% | 45.2% |
| hybrid | 72.7% | 100.0% | 0.3 ms | 34.5% | 45.2% |
| audrey-http | 57.5% | 85.7% | 3.8 ms | 29.8% | 50.0% |
What Changed
Audrey recall previously treated multiple tags as an OR filter. MemoryGym passes three tags on recall: memorygym, the run id, and the scenario id. Because every row carries the memorygym tag, OR matching let memories from unrelated scenarios through. Audrey now requires all requested tags to match (an AND filter), which raised the Audrey HTTP score from 34.3% to 57.5%.
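The difference between the two filters can be shown with a minimal sketch. This is not Audrey's implementation: the `Memory` class, the `recall_any` and `recall_all` helpers, and the `run-1` tag are hypothetical names chosen for illustration. Only the memory IDs and scenario names come from this report.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Memory:
    id: str
    tags: frozenset


def recall_any(store, requested):
    """OR filter: keep a memory if it shares at least one requested tag."""
    wanted = frozenset(requested)
    return [m.id for m in store if m.tags & wanted]


def recall_all(store, requested):
    """AND filter: keep a memory only if it carries every requested tag."""
    wanted = frozenset(requested)
    return [m.id for m in store if wanted <= m.tags]


# Hypothetical rows from one run ("run-1" is an assumed run id), two scenarios.
store = [
    Memory("maya-workspace-current",
           frozenset({"memorygym", "run-1", "typed-profile-updates"})),
    Memory("audrey-benchmark-gate",
           frozenset({"memorygym", "run-1", "retrieval-context-routing"})),
]
query_tags = ["memorygym", "run-1", "typed-profile-updates"]

print(recall_any(store, query_tags))  # ['maya-workspace-current', 'audrey-benchmark-gate']
print(recall_all(store, query_tags))  # ['maya-workspace-current']
```

With the OR filter, the shared memorygym tag alone is enough to admit the memory from the other scenario. The AND filter admits it only if the scenario tag also matches.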
Remaining Failures
The next benchmark work is current-belief precedence and context-aware ranking. Audrey now usually finds the expected memory, but still ranks stale or wrong-project memories too high.
| Probe | Query | Score | Hit | Returned IDs | Top Recall |
|---|---|---|---|---|---|
| typed-profile-updates / current-workspace | Which collaboration environment should Maya use right now? | 0.0% | 0.0% | maya-workspace-old, jonas-distractor-workspace, maya-pref-morning | Maya's active workspace used to be Atlas for the onboarding sprint. |
| typed-profile-updates / meeting-preference | How should Maya receive meeting preparation? | 70.7% | 100.0% | maya-pref-morning, maya-workspace-old, maya-workspace-current | Maya prefers morning async notes before any meeting-heavy work. |
| retrieval-context-routing / audrey-release-gate | Which project requires a benchmark gate and doctor check for release validation? | 59.0% | 100.0% | termivibe-first-contact, audrey-benchmark-gate, repopulse-trace-contract | TermiVibe first contact validation should cover config precedence, typed mode, no wake word mode, and telemetry persistence. |
| retrieval-context-routing / repopulse-trace | Which memory mentions pipeline version and dense rerank trace boundaries? | 57.0% | 100.0% | audrey-benchmark-gate, repopulse-trace-contract, termivibe-first-contact | Audrey release validation includes build, typecheck, benchmark gate, doctor, demo, pack dry-run, and host smoke coverage. |
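The gap between finding the expected memory and ranking it first can be checked with a short sketch over the Returned IDs column. It makes two assumptions. The expected ID for each probe is inferred from the probe name, not read from the benchmark files. The Returned IDs are treated as rank-ordered, which the table suggests because each Top Recall text matches the first ID. The `rank_of` helper is a hypothetical name.

```python
def rank_of(expected_id, returned_ids):
    """Return the 1-based rank of the expected memory, or None if it is absent."""
    try:
        return returned_ids.index(expected_id) + 1
    except ValueError:
        return None


# probe name -> (assumed expected ID, returned IDs as listed in the table)
probes = {
    "current-workspace": (
        "maya-workspace-current",
        ["maya-workspace-old", "jonas-distractor-workspace", "maya-pref-morning"],
    ),
    "meeting-preference": (
        "maya-pref-morning",
        ["maya-pref-morning", "maya-workspace-old", "maya-workspace-current"],
    ),
    "audrey-release-gate": (
        "audrey-benchmark-gate",
        ["termivibe-first-contact", "audrey-benchmark-gate", "repopulse-trace-contract"],
    ),
    "repopulse-trace": (
        "repopulse-trace-contract",
        ["audrey-benchmark-gate", "repopulse-trace-contract", "termivibe-first-contact"],
    ),
}

ranks = {name: rank_of(expected, ids) for name, (expected, ids) in probes.items()}
hits = sum(rank is not None for rank in ranks.values())  # expected memory in top-k
top1 = sum(rank == 1 for rank in ranks.values())         # expected memory ranked first

print(ranks)
print(f"hits: {hits} of {len(probes)}, ranked first: {top1} of {len(probes)}")
```

Under these assumptions, three of the four listed probes are hits but only one puts the expected memory first, which is consistent with the hit values in the table and with the ranking problem described above.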
Submission Status
| Target | Status | URL / next action |
|---|---|---|
| MCP.Directory | Submitted for review | Review promised within 24 hours. |
| MCP.so | Signed-in server record created and GitHub issue opened | mcpso issue 2198 |
| Glama | Submitted for review | Signed-in form completed. |
| CodeSOTA | Coverage request submitted, not a fake score | Waiting for editorial reply. |
| AMB | Provider/leaderboard issue opened and updated with this run | AMB issue 11 |
| Hugging Face | Dataset and Space updated with current evidence | Space and Dataset |
Files In This Evidence Bundle
- memorygym-before-tag-filter.json
- memorygym-after-tag-filter.json
- memorygym-current-run.json
- memorygym-current-summary.json
- memorygym-before-after.svg
- memorygym-score.svg
- memorygym-hit-rate.svg
- memorygym-p95-recall-latency.svg
- audrey-benchmark-report.png