Retrieval agent implementation and benchmark updates by Tianyang-Zhang · Pull Request #1103 · MemMachine/MemMachine

Tianyang-Zhang · 2026-02-11T01:46:49Z

Purpose of the change

Adds an option to route memory search through a retrieval agent to improve memory retrieval accuracy.
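A minimal sketch of what opting in could look like at the memory layer. `LongTermMemorySketch` and its two retriever callables are hypothetical stand-ins, not the PR's `LongTermMemory`. The sketch also assumes `agent_mode` defaults to off, which fits this being labeled a non-breaking change.

```python
from dataclasses import dataclass
from typing import Callable

# A scored memory hit: (relevance score, memory text).
ScoredResult = tuple[float, str]
Retriever = Callable[[str], list[ScoredResult]]


@dataclass
class LongTermMemorySketch:
    """Hypothetical stand-in for LongTermMemory, for illustration only."""

    direct_search: Retriever  # existing single-shot search path
    retrieval_agent: Retriever  # agent orchestration (tool select, CoQ, split)

    def search_scored(self, query: str, agent_mode: bool = False) -> list[ScoredResult]:
        # With agent_mode off (assumed default), callers see no change.
        # With it on, the same query is delegated to the retrieval agent.
        retriever = self.retrieval_agent if agent_mode else self.direct_search
        # Highest score first.
        return sorted(retriever(query), reverse=True)


ltm = LongTermMemorySketch(
    direct_search=lambda q: [(0.4, f"direct hit for {q!r}")],
    retrieval_agent=lambda q: [(0.9, f"agent hit for {q!r}")],
)
print(ltm.search_scored("Where was Ann born?"))
print(ltm.search_scored("Where was Ann born?", agent_mode=True))
```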

Description

Implement three agents to help resolve complex multi-hop queries.
Add llm_model field in LongTermMemory configuration.
Refactor evaluation logic and add more result details. Add WikiMultiHop and HotpotQA benchmarks.
Simplify the benchmark command lines.
Add corresponding unit tests and update existing unit tests.
Update Python SDK and TypeScript SDK.
Update stale documentation.

Type of change

New feature (non-breaking change which adds functionality)
Documentation update

How Has This Been Tested?

Tested via benchmark scripts and unit tests.

Unit Test
End-to-end Test
Test Script

Test Results: see the Sample output section in evaluation/README.md.

Checklist

I have signed the commit(s) within this pull request
My code follows the style guidelines of this project (See STYLE_GUIDE.md)
I have performed a self-review of my own code
I have commented my code
I have made corresponding changes to the documentation
My changes generate no new warnings
I have added unit tests that prove my fix is effective or that my feature works
New and existing unit tests pass locally with my changes
I have checked my code and corrected any misspellings

Maintainer Checklist

Confirmed all checks passed
Contributor has signed the commit(s)
Reviewed the code
Run, Tested, and Verified the change(s) work as expected

Further comments

Future improvements:

More agent types for different question types.
Configurable reasoning effort for the LLM.
Configurable time-cost and token-cost thresholds for the retrieval agent.
Self-adapting search limit/top-k.

Copilot

Pull request overview

This PR adds an “agent_mode” option to route long-term memory search through a new retrieval-agent orchestration (multi-hop, split-query, tool-selection), updates benchmarks/scripts, and propagates config + SDK changes to support an LLM model for the retrieval agent.
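The split-query idea can be sketched as follows. The sketch assumes an LLM splits a multi-entity question into single-hop sub-queries (replaced here by a plain callable) and that results are merged by keeping each memory's best score. The function and parameter names are hypothetical and are not taken from `split_query_agent.py`.

```python
from typing import Callable

ScoredResult = tuple[float, str]


def split_query_search(
    query: str,
    split: Callable[[str], list[str]],
    retrieve: Callable[[str], list[ScoredResult]],
    limit: int = 5,
) -> list[ScoredResult]:
    """Search each sub-query of a multi-entity question and merge the hits.

    `split` stands in for the LLM rewrite and `retrieve` for a plain memory
    search; both are hypothetical hooks.
    """
    sub_queries = split(query) or [query]  # fall back to the original query
    best: dict[str, float] = {}
    for sub_query in sub_queries:
        for score, text in retrieve(sub_query):
            # The same memory may match several sub-queries; keep its best score.
            if score > best.get(text, float("-inf")):
                best[text] = score
    merged = sorted(((score, text) for text, score in best.items()), reverse=True)
    return merged[:limit]


memories = {
    "When was Alice born?": [(0.9, "Alice was born in 1990"), (0.2, "Bob likes tea")],
    "When was Bob born?": [(0.8, "Bob was born in 1985"), (0.3, "Bob likes tea")],
}
hits = split_query_search(
    "When were Alice and Bob born?",
    split=lambda q: ["When was Alice born?", "When was Bob born?"],
    retrieve=lambda q: memories.get(q, []),
    limit=2,
)
print(hits)
```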

Changes:

Add retrieval-agent framework (ToolSelect / ChainOfQuery / SplitQuery / MemMachine retriever) and wire it into LongTermMemory.search_scored(..., agent_mode=...).
Extend configs + SDKs (Python + TS) to support agent_mode in search and add llm_model to long-term memory config/resources.
Add/refresh evaluation scripts + docs for new benchmark workflows (LoCoMo, WikiMultiHop, HotpotQA) and simplify CLI usage.

Reviewed changes

Copilot reviewed 69 out of 89 changed files in this pull request and generated 12 comments.

File	Description
tests/memmachine/server/api_v2/test_spec.py	Asserts default `agent_mode` behavior in request spec parsing.
tests/memmachine/server/api_v2/test_router.py	Verifies router passes `agent_mode` through to service layer.
tests/memmachine/retrieval_agent/test_retrieval_agent.py	New unit tests covering retrieval-agent behaviors and reranking.
tests/memmachine/rest_client/test_memory.py	Adds Python REST client tests for `agent_mode` payload field.
tests/memmachine/main/test_memmachine_mock.py	Updates mocked config defaults to include `llm_model` for LTM.
tests/memmachine/common/session_manager/test_session_manager.py	Adds `llm_model` to episodic memory test configuration.
tests/memmachine/common/configuration/test_configuration.py	Validates merge/update behavior for new `llm_model` field.
src/memmachine/server/api_v2/service.py	Passes `agent_mode` through to `MemMachine.query_search()`; adds logging.
src/memmachine/retrieval_agent/common/agent_api.py	Introduces retrieval-agent base API, policies, and rerank helper.
src/memmachine/retrieval_agent/agents/tool_select_agent.py	Adds LLM-based tool routing agent.
src/memmachine/retrieval_agent/agents/split_query_agent.py	Adds query splitting agent for multi-entity single-hop queries.
src/memmachine/retrieval_agent/agents/memmachine_retriever.py	Adds direct MemMachine memory retriever agent wrapper.
src/memmachine/retrieval_agent/agents/coq_agent.py	Adds chain-of-query rewriting + sufficiency-check agent.
src/memmachine/retrieval_agent/agents/__init__.py	Exposes retrieval-agent classes in package exports.
src/memmachine/rest_client/memory.py	Adds `agent_mode` parameter to Python SDK `Memory.search()`.
src/memmachine/rest_client/README.md	Documents `agent_mode` usage in Python REST client README.
src/memmachine/main/memmachine.py	Propagates `agent_mode` through episodic search; validates `llm_model`.
src/memmachine/episodic_memory/long_term_memory/service_locator.py	Loads `llm_model` resource for long-term memory params.
src/memmachine/episodic_memory/long_term_memory/long_term_memory.py	Implements agent-mode branching and initializes retrieval agent.
src/memmachine/episodic_memory/episodic_memory.py	Plumbs `agent_mode` through episodic query path; adds logging.
src/memmachine/common/language_model/openai_responses_language_model.py	Adds `reasoning_effort` and token-usage-returning API method.
src/memmachine/common/filter/filter_parser.py	Marks `FilterExpr` as `@runtime_checkable`.
src/memmachine/common/errors.py	Adds language-model config errors for missing/default LLM model.
src/memmachine/common/configuration/language_model_conf.py	Adds `contains_language_model()` helper.
src/memmachine/common/configuration/episodic_config.py	Adds `llm_model` to long-term memory config/partial config.
src/memmachine/common/configuration/__init__.py	Adds default/check helpers for long-term-memory `llm_model`.
src/memmachine/common/api/spec.py	Adds `agent_mode` field to `SearchMemoriesSpec`.
src/memmachine/common/api/doc.py	Documents `agent_mode` and adds examples.
src/memmachine-ts/rest_client/tests/memmachine-memory.spec.ts	Adds TS client test asserting `agent_mode` in search payload.
src/memmachine-ts/rest_client/src/memory/memmachine-memory.types.ts	Adds `agent_mode` option to TS `SearchMemoriesOptions`.
src/memmachine-ts/rest_client/src/memory/memmachine-memory.ts	Includes `agent_mode` field in TS search request payload.
src/memmachine-ts/rest_client/README.md	Documents TS usage example for `agent_mode`.
sample_configs/episodic_memory_config.gpu.sample	Adds `llm_model` to long-term memory sample config.
sample_configs/episodic_memory_config.cpu.sample	Adds `llm_model` to long-term memory sample config.
memmachine-compose.sh	Updates config generation to rewrite LTM `llm_model` value.
evaluation/utils/memmachine_helper_restapiv2.py	Adds `agent_mode` to evaluation REST helper payload.
evaluation/utils/agent_utils.py	New helper utilities for retrieval-agent benchmark workflows.
evaluation/retrieval_agent/wikimultihop_search.py	Adds WikiMultiHop search benchmark script (agent/memmachine/llm modes).
evaluation/retrieval_agent/wikimultihop_ingest.py	Adds WikiMultiHop ingestion script.
evaluation/retrieval_agent/run_test.sh	New unified runner for locomo/wikimultihop/hotpotqa ingest/search.
evaluation/retrieval_agent/locomo_search.py	Adds LoCoMo search benchmark script using new agent utils.
evaluation/retrieval_agent/locomo_ingest.py	Adds LoCoMo ingestion script using new agent utils.
evaluation/retrieval_agent/locomo_delete.py	Simplifies LoCoMo delete script signature/typing.
evaluation/retrieval_agent/llm_judge.py	Updates judging model + parsing; simplifies IO paths.
evaluation/retrieval_agent/hotpotQA_test.py	Adds HotpotQA ingest/search benchmark script.
evaluation/retrieval_agent/generate_scores.py	Adds consolidated scoring report for retrieval-agent benchmark outputs.
evaluation/retrieval_agent/evaluate.py	Updates evaluation pipeline to new output schema fields.
evaluation/locomo/episodic_memory/locomo_ingest.py	Removes older legacy ingestion script.
evaluation/locomo/episodic_memory/locomo_config.yaml	Removes legacy YAML config stub.
evaluation/locomo/episodic_memory/README.md	Removes legacy locomo episodic_memory README.
evaluation/locomo/episodic_agent/run_experiments.py	Removes legacy episodic agent runner.
evaluation/locomo/episodic_agent/restapiv2_locomo_search_agent.py	Removes legacy restapiv2 search agent implementation.
evaluation/locomo/episodic_agent/memmachine_locomo.py	Removes legacy MemMachine locomo runner.
evaluation/locomo/episodic_agent/locomo_agent.py	Removes legacy OpenAI Agents-sdk locomo agent code.
evaluation/locomo/episodic_agent/generate_scores.py	Removes legacy scoring script.
evaluation/locomo/episodic_agent/__init__.py	Removes legacy package marker.
evaluation/locomo/episodic_agent/README.md	Removes legacy episodic agent README.
evaluation/episodic_memory/restapiv2_locomo_search.py	Adds CLI flag to enable `agent_mode` in legacy episodic workflow.
evaluation/episodic_memory/README.md	New README for legacy episodic workflow (with agent-mode option).
evaluation/README.md	Replaces top-level evaluation guide with retrieval-agent workflow docs.
docs/open_source/configuration.mdx	Documents `llm_model` under long-term memory config.
docs/install_guide/integrate/GPTStore.mdx	Documents `agent_mode` in schema snippet for GPT Store integration.
docs/getting_started/benchmarks.mdx	Points users to updated evaluation guide; expands benchmark mentions.
docs/examples/rest.mdx	Documents `agent_mode` in REST search example.
docs/api_reference/ts-rest/interfaces/SearchMemoryResult.mdx	Updates TS REST docs for renamed interfaces/options and adds `agent_mode`.
docs/api_reference/ts-rest/classes/MemMachineMemory.mdx	Updates TS class docs for new option type names and response structure.
docs/api_reference/python/memory_api.mdx	Updates Python SDK docs for expanded `search()` signature + `agent_mode`.
USAGE.md	Documents REST + Python SDK defaults including `agent_mode`.
README.md	Adds a BibTeX reference section relevant to the agent prompts.
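The chain-of-query rewriting and sufficiency-check loop described for `coq_agent.py` can be sketched as below. The sketch assumes each hop retrieves, checks whether the evidence so far answers the original question, and otherwise rewrites the next query. `is_sufficient` and `rewrite` are hypothetical hooks standing in for LLM calls, so this is not the PR's implementation.

```python
from typing import Callable

ScoredResult = tuple[float, str]


def chain_of_query_search(
    query: str,
    retrieve: Callable[[str], list[ScoredResult]],
    is_sufficient: Callable[[str, list[str]], bool],
    rewrite: Callable[[str, list[str]], str],
    max_hops: int = 3,
) -> list[str]:
    """Multi-hop retrieval: search, check sufficiency, rewrite, repeat."""
    evidence: list[str] = []
    current = query
    for _ in range(max_hops):
        for _score, text in retrieve(current):
            if text not in evidence:  # avoid duplicate evidence across hops
                evidence.append(text)
        if is_sufficient(query, evidence):
            break
        # Ask for the next hop's query, conditioned on what was found so far.
        current = rewrite(query, evidence)
    return evidence


store = {
    "Where was the director of Film X born?": [(0.7, "Film X was directed by Ann")],
    "Where was Ann born?": [(0.9, "Ann was born in Oslo")],
}
evidence = chain_of_query_search(
    "Where was the director of Film X born?",
    retrieve=lambda q: store.get(q, []),
    is_sufficient=lambda q, ev: any("born" in e for e in ev),
    rewrite=lambda q, ev: "Where was Ann born?",
)
print(evidence)
```

Capping the loop with `max_hops` bounds the number of LLM calls per search. That bound relates to the time and token cost thresholds listed under future improvements.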

Copilot · 2026-02-11T01:50:11Z

src/memmachine/episodic_memory/long_term_memory/long_term_memory.py

+        param: AgentToolBaseParam = AgentToolBaseParam(
+            model=model,
+            children_tools=[split_agent, coq_agent, memory_agent],
+            backend=None,
+            extra_params={"default_tool_name": coq_agent.agent_name},
+        )


Same issue as above: `backend` is not a valid field of `AgentToolBaseParam`, so this will fail during initialization. Suggestion: remove `backend=None` (or add the field to `AgentToolBaseParam` and update all call sites consistently).
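A minimal reproduction of the failure mode and the suggested fix. It assumes `AgentToolBaseParam` behaves like a dataclass that rejects unknown keyword arguments. Its real definition is not shown here, and a Pydantic model would only reject `backend` if configured to forbid extra fields. The stand-in class and the `build_param` helper are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class AgentToolBaseParam:
    """Hypothetical stand-in; the real class is not shown on this page."""

    model: Any
    children_tools: list[Any] = field(default_factory=list)
    extra_params: dict[str, Any] = field(default_factory=dict)


def build_param(model: Any, tools: list[Any], default_tool: str) -> AgentToolBaseParam:
    # Suggested fix: omit backend=None, since the class declares no such field.
    return AgentToolBaseParam(
        model=model,
        children_tools=tools,
        extra_params={"default_tool_name": default_tool},
    )


try:
    # Reproduces the reported problem: an undeclared keyword argument.
    AgentToolBaseParam(model=None, children_tools=[], backend=None)  # type: ignore[call-arg]
except TypeError as exc:
    print(f"init failed: {exc}")
```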

Tianyang-Zhang requested review from SarahScargall, Copilot, edwinyyyu, jealous, malatewang, o-love and sscargal February 11, 2026 01:46

Copilot AI reviewed Feb 11, 2026

SarahScargall mentioned this pull request Feb 11, 2026

[Docs]: Implement Retrieval Agent Updates #1104 (Open)

Tianyang-Zhang force-pushed the retrieval_agent branch 2 times, most recently from d47be46 to af8e56a, on February 12, 2026 01:15

Tianyang-Zhang added 12 commits February 12, 2026 21:40

Add retrieval agent for complex query search (646094c)
Add retrieval agent unit tests (f38aa99)
Refactor and add evaluation benchmarks (5d8779a)
Add llm baseline tests and update README (7e4faec)
Update agent_mode in Python SDK, type script, and docs (bd727e1)
Fix unit tests (7ac79af)
Fix lint and integration test (29c3144)
Fix tests (952ac9b)
Ruff code format (c5157d2)
Fix ty static check (8db30e7)
Fix REST client tests (4324660)
Fix lint, ruff, and unit tests (53a2d98)

Tianyang-Zhang force-pushed the retrieval_agent branch from af8e56a to 53a2d98 on February 12, 2026 21:40

Tianyang-Zhang wants to merge 12 commits into main from retrieval_agent.
