Retrieval agent implementation and benchmark updates #1103
Tianyang-Zhang wants to merge 12 commits into main
Pull request overview
This PR adds an “agent_mode” option to route long-term memory search through a new retrieval-agent orchestration (multi-hop, split-query, tool-selection), updates benchmarks/scripts, and propagates config + SDK changes to support an LLM model for the retrieval agent.
Changes:
- Add retrieval-agent framework (ToolSelect / ChainOfQuery / SplitQuery / MemMachine retriever) and wire it into `LongTermMemory.search_scored(..., agent_mode=...)`.
- Extend configs + SDKs (Python + TS) to support `agent_mode` in search and add `llm_model` to long-term memory config/resources.
- Add/refresh evaluation scripts + docs for new benchmark workflows (LoCoMo, WikiMultiHop, HotpotQA) and simplify CLI usage.
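The agent-mode branching described above can be pictured with a small, self-contained sketch. It is not the PR's implementation: every name in it (`direct_retriever`, `split_query_retriever`, `select_tool`, `search`) is hypothetical, the LLM-based tool selection is replaced by a trivial keyword rule, and the chain-of-query agent is left out entirely. It only illustrates the idea of falling back to a default tool when agent mode is on, and bypassing the agent when it is off.

```python
from typing import Callable

# A retriever takes a query string and returns a list of hits.
Retriever = Callable[[str], list[str]]


def direct_retriever(query: str) -> list[str]:
    """Stand-in for a plain memory search (hypothetical)."""
    return [f"hit for: {query}"]


def split_query_retriever(query: str) -> list[str]:
    """Split a multi-entity query on ' and ' and search each part."""
    parts = [part.strip() for part in query.split(" and ") if part.strip()]
    results: list[str] = []
    for part in parts:
        results.extend(direct_retriever(part))
    return results


def select_tool(
    query: str, tools: dict[str, Retriever], default_tool_name: str
) -> Retriever:
    """Pick a tool; a keyword rule stands in for the LLM routing call."""
    name = "split_query" if " and " in query else default_tool_name
    # Fall back to the default tool if the chosen name is unknown.
    return tools.get(name, tools[default_tool_name])


def search(query: str, agent_mode: bool = False) -> list[str]:
    """Route through the toy agent only when agent_mode is enabled."""
    if not agent_mode:
        return direct_retriever(query)
    tools: dict[str, Retriever] = {
        "split_query": split_query_retriever,
        "memory": direct_retriever,
    }
    return select_tool(query, tools, default_tool_name="memory")(query)
```

With this toy routing, `search("Alice and Bob", agent_mode=True)` issues two sub-queries, while the same call without `agent_mode` issues one.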
Reviewed changes
Copilot reviewed 69 out of 89 changed files in this pull request and generated 12 comments.
| File | Description |
|---|---|
| tests/memmachine/server/api_v2/test_spec.py | Asserts default agent_mode behavior in request spec parsing. |
| tests/memmachine/server/api_v2/test_router.py | Verifies router passes agent_mode through to service layer. |
| tests/memmachine/retrieval_agent/test_retrieval_agent.py | New unit tests covering retrieval-agent behaviors and reranking. |
| tests/memmachine/rest_client/test_memory.py | Adds Python REST client tests for agent_mode payload field. |
| tests/memmachine/main/test_memmachine_mock.py | Updates mocked config defaults to include llm_model for LTM. |
| tests/memmachine/common/session_manager/test_session_manager.py | Adds llm_model to episodic memory test configuration. |
| tests/memmachine/common/configuration/test_configuration.py | Validates merge/update behavior for new llm_model field. |
| src/memmachine/server/api_v2/service.py | Passes agent_mode through to MemMachine.query_search(); adds logging. |
| src/memmachine/retrieval_agent/common/agent_api.py | Introduces retrieval-agent base API, policies, and rerank helper. |
| src/memmachine/retrieval_agent/agents/tool_select_agent.py | Adds LLM-based tool routing agent. |
| src/memmachine/retrieval_agent/agents/split_query_agent.py | Adds query splitting agent for multi-entity single-hop queries. |
| src/memmachine/retrieval_agent/agents/memmachine_retriever.py | Adds direct MemMachine memory retriever agent wrapper. |
| src/memmachine/retrieval_agent/agents/coq_agent.py | Adds chain-of-query rewriting + sufficiency-check agent. |
| src/memmachine/retrieval_agent/agents/\_\_init\_\_.py | Exposes retrieval-agent classes in package exports. |
| src/memmachine/rest_client/memory.py | Adds agent_mode parameter to Python SDK Memory.search(). |
| src/memmachine/rest_client/README.md | Documents agent_mode usage in Python REST client README. |
| src/memmachine/main/memmachine.py | Propagates agent_mode through episodic search; validates llm_model. |
| src/memmachine/episodic_memory/long_term_memory/service_locator.py | Loads llm_model resource for long-term memory params. |
| src/memmachine/episodic_memory/long_term_memory/long_term_memory.py | Implements agent-mode branching and initializes retrieval agent. |
| src/memmachine/episodic_memory/episodic_memory.py | Plumbs agent_mode through episodic query path; adds logging. |
| src/memmachine/common/language_model/openai_responses_language_model.py | Adds reasoning_effort and token-usage-returning API method. |
| src/memmachine/common/filter/filter_parser.py | Marks FilterExpr as @runtime_checkable. |
| src/memmachine/common/errors.py | Adds language-model config errors for missing/default LLM model. |
| src/memmachine/common/configuration/language_model_conf.py | Adds contains_language_model() helper. |
| src/memmachine/common/configuration/episodic_config.py | Adds llm_model to long-term memory config/partial config. |
| src/memmachine/common/configuration/\_\_init\_\_.py | Adds default/check helpers for long-term-memory llm_model. |
| src/memmachine/common/api/spec.py | Adds agent_mode field to SearchMemoriesSpec. |
| src/memmachine/common/api/doc.py | Documents agent_mode and adds examples. |
| src/memmachine-ts/rest_client/tests/memmachine-memory.spec.ts | Adds TS client test asserting agent_mode in search payload. |
| src/memmachine-ts/rest_client/src/memory/memmachine-memory.types.ts | Adds agent_mode option to TS SearchMemoriesOptions. |
| src/memmachine-ts/rest_client/src/memory/memmachine-memory.ts | Includes agent_mode field in TS search request payload. |
| src/memmachine-ts/rest_client/README.md | Documents TS usage example for agent_mode. |
| sample_configs/episodic_memory_config.gpu.sample | Adds llm_model to long-term memory sample config. |
| sample_configs/episodic_memory_config.cpu.sample | Adds llm_model to long-term memory sample config. |
| memmachine-compose.sh | Updates config generation to rewrite LTM llm_model value. |
| evaluation/utils/memmachine_helper_restapiv2.py | Adds agent_mode to evaluation REST helper payload. |
| evaluation/utils/agent_utils.py | New helper utilities for retrieval-agent benchmark workflows. |
| evaluation/retrieval_agent/wikimultihop_search.py | Adds WikiMultiHop search benchmark script (agent/memmachine/llm modes). |
| evaluation/retrieval_agent/wikimultihop_ingest.py | Adds WikiMultiHop ingestion script. |
| evaluation/retrieval_agent/run_test.sh | New unified runner for locomo/wikimultihop/hotpotqa ingest/search. |
| evaluation/retrieval_agent/locomo_search.py | Adds LoCoMo search benchmark script using new agent utils. |
| evaluation/retrieval_agent/locomo_ingest.py | Adds LoCoMo ingestion script using new agent utils. |
| evaluation/retrieval_agent/locomo_delete.py | Simplifies LoCoMo delete script signature/typing. |
| evaluation/retrieval_agent/llm_judge.py | Updates judging model + parsing; simplifies IO paths. |
| evaluation/retrieval_agent/hotpotQA_test.py | Adds HotpotQA ingest/search benchmark script. |
| evaluation/retrieval_agent/generate_scores.py | Adds consolidated scoring report for retrieval-agent benchmark outputs. |
| evaluation/retrieval_agent/evaluate.py | Updates evaluation pipeline to new output schema fields. |
| evaluation/locomo/episodic_memory/locomo_ingest.py | Removes older legacy ingestion script. |
| evaluation/locomo/episodic_memory/locomo_config.yaml | Removes legacy YAML config stub. |
| evaluation/locomo/episodic_memory/README.md | Removes legacy locomo episodic_memory README. |
| evaluation/locomo/episodic_agent/run_experiments.py | Removes legacy episodic agent runner. |
| evaluation/locomo/episodic_agent/restapiv2_locomo_search_agent.py | Removes legacy restapiv2 search agent implementation. |
| evaluation/locomo/episodic_agent/memmachine_locomo.py | Removes legacy MemMachine locomo runner. |
| evaluation/locomo/episodic_agent/locomo_agent.py | Removes legacy OpenAI Agents-sdk locomo agent code. |
| evaluation/locomo/episodic_agent/generate_scores.py | Removes legacy scoring script. |
| evaluation/locomo/episodic_agent/\_\_init\_\_.py | Removes legacy package marker. |
| evaluation/locomo/episodic_agent/README.md | Removes legacy episodic agent README. |
| evaluation/episodic_memory/restapiv2_locomo_search.py | Adds CLI flag to enable agent_mode in legacy episodic workflow. |
| evaluation/episodic_memory/README.md | New README for legacy episodic workflow (with agent-mode option). |
| evaluation/README.md | Replaces top-level evaluation guide with retrieval-agent workflow docs. |
| docs/open_source/configuration.mdx | Documents llm_model under long-term memory config. |
| docs/install_guide/integrate/GPTStore.mdx | Documents agent_mode in schema snippet for GPT Store integration. |
| docs/getting_started/benchmarks.mdx | Points users to updated evaluation guide; expands benchmark mentions. |
| docs/examples/rest.mdx | Documents agent_mode in REST search example. |
| docs/api_reference/ts-rest/interfaces/SearchMemoryResult.mdx | Updates TS REST docs for renamed interfaces/options and adds agent_mode. |
| docs/api_reference/ts-rest/classes/MemMachineMemory.mdx | Updates TS class docs for new option type names and response structure. |
| docs/api_reference/python/memory_api.mdx | Updates Python SDK docs for expanded search() signature + agent_mode. |
| USAGE.md | Documents REST + Python SDK defaults including agent_mode. |
| README.md | Adds bibtex reference section relevant to agent prompts. |
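Several rows above mention an `agent_mode` field in the search payload. A minimal sketch of what building such a payload might look like follows; only the `agent_mode` field name comes from this PR. The helper name `build_search_payload`, the `query` key, and the `False` default are assumptions made for illustration, so check `SearchMemoriesSpec` for the actual schema and default.

```python
from typing import Any


def build_search_payload(query: str, agent_mode: bool = False) -> dict[str, Any]:
    """Build a search request body (hypothetical helper, assumed field names).

    Only `agent_mode` is taken from the PR; `query` and the False default
    are placeholders for whatever SearchMemoriesSpec actually defines.
    """
    return {"query": query, "agent_mode": agent_mode}


# Opt in to the retrieval agent for a single search.
payload = build_search_payload("Where did Alice move?", agent_mode=True)
```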
    param: AgentToolBaseParam = AgentToolBaseParam(
        model=model,
        children_tools=[split_agent, coq_agent, memory_agent],
        backend=None,
        extra_params={"default_tool_name": coq_agent.agent_name},
    )
Same issue as above: backend is not a valid field for AgentToolBaseParam, so this will fail during initialization. Suggestion: remove backend=None (or add the field in AgentToolBaseParam and update all call sites consistently).
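To illustrate the failure mode this comment describes, here is a sketch using a plain dataclass as a stand-in. `ToolParamSketch` and `try_build` are hypothetical names; whether the real `AgentToolBaseParam` rejects unknown keyword arguments depends on how it is defined (a dataclass does, while some model libraries silently ignore extras unless configured otherwise), so treat this as an assumption rather than a description of the PR's class.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class ToolParamSketch:
    """Hypothetical stand-in for AgentToolBaseParam; note: no `backend` field."""

    model: Any
    children_tools: list[Any] = field(default_factory=list)
    extra_params: dict[str, Any] = field(default_factory=dict)


def try_build(**kwargs: Any) -> str:
    """Return 'ok' if construction succeeds, else a 'rejected: ...' message."""
    try:
        ToolParamSketch(**kwargs)
    except TypeError as exc:
        # A dataclass __init__ raises TypeError on unexpected keywords.
        return f"rejected: {exc}"
    return "ok"
```

Under that assumption, `try_build(model="m")` succeeds, while `try_build(model="m", backend=None)` is rejected at construction time, which matches the reviewer's suggestion to drop `backend=None` or add the field consistently.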
d47be46 to af8e56a
af8e56a to 53a2d98
Purpose of the change
Add an option to route memory search through the retrieval agent, with the goal of improving retrieval accuracy.
Description
- `llm_model` field in `LongTermMemory` configuration.
- `WikiMultihop` and `HotpotQA` benchmarks.
Type of change
How Has This Been Tested?
Tested via benchmark scripts and unit tests.
Test Results:
As shown in the `Sample output` section of `evaluation/README.md`.
Checklist
Maintainer Checklist
Further comments
Future improvements: