Retrieval agent implementation and benchmark updates#1103

Open
Tianyang-Zhang wants to merge 12 commits into main from retrieval_agent

@Tianyang-Zhang Tianyang-Zhang commented Feb 11, 2026

Purpose of the change

Add an option to route memory search through the retrieval agent, with the goal of improving retrieval accuracy.

Description

  1. Implement three agents to help resolve complex multi-hop queries.
  2. Add an llm_model field to the LongTermMemory configuration.
  3. Refactor the evaluation logic and report more detailed results. Add WikiMultiHop and HotpotQA benchmarks.
  4. Simplify the benchmark command lines.
  5. Add corresponding unit tests and update existing unit tests.
  6. Update Python SDK and TypeScript SDK.
  7. Update stale documentation.
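
As a rough illustration of the new search option, the sketch below builds a search request body that carries an agent_mode flag. It is a minimal sketch under assumptions: `SearchRequest`, `build_search_payload`, and the `top_k` field are hypothetical stand-ins rather than the project's actual `SearchMemoriesSpec`, and agent_mode is assumed to be a boolean that defaults to off.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class SearchRequest:
    """Hypothetical stand-in for the search request body (not the real spec)."""

    query: str
    top_k: int = 10  # assumed field name and default
    agent_mode: bool = False  # assumed: boolean flag, off unless requested


def build_search_payload(query: str, agent_mode: bool = False) -> str:
    """Serialize a search request to the JSON string a client would send."""
    return json.dumps(asdict(SearchRequest(query=query, agent_mode=agent_mode)))


# A plain search leaves the flag off; a multi-hop question opts in.
plain = json.loads(build_search_payload("What is Alice's favorite food?"))
agentic = json.loads(
    build_search_payload("Where was the director of Alice's favorite film born?", agent_mode=True)
)
```

Keeping the flag optional with an off default would let existing callers continue to work unchanged, which matches the "non-breaking change" classification of this PR.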

Type of change

[Please delete options that are not relevant.]

  • New feature (non-breaking change which adds functionality)
  • Documentation update

How Has This Been Tested?

Tested via benchmark scripts and unit tests.

[Please delete options that are not relevant.]

  • Unit Test
  • End-to-end Test
  • Test Script (please provide)

Test Results: [Attach logs, screenshots, or relevant output]
See the Sample output section of evaluation/README.md.

Checklist

[Please delete options that are not relevant.]

  • I have signed the commit(s) within this pull request
  • My code follows the style guidelines of this project (See STYLE_GUIDE.md)
  • I have performed a self-review of my own code
  • I have commented my code
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added unit tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes
  • I have checked my code and corrected any misspellings

Maintainer Checklist

  • Confirmed all checks passed
  • Contributor has signed the commit(s)
  • Reviewed the code
  • Run, Tested, and Verified the change(s) work as expected

Further comments

Future improvements:

  1. More agent types for different question types.
  2. Configurable reasoning effort for the LLM.
  3. Configurable time and token cost thresholds for the retrieval agent.
  4. Self-adapting search limit/top-k.

Copilot AI left a comment

Pull request overview

This PR adds an “agent_mode” option to route long-term memory search through a new retrieval-agent orchestration (multi-hop, split-query, tool-selection), updates benchmarks/scripts, and propagates config + SDK changes to support an LLM model for the retrieval agent.

Changes:

  • Add retrieval-agent framework (ToolSelect / ChainOfQuery / SplitQuery / MemMachine retriever) and wire it into LongTermMemory.search_scored(..., agent_mode=...).
  • Extend configs + SDKs (Python + TS) to support agent_mode in search and add llm_model to long-term memory config/resources.
  • Add/refresh evaluation scripts + docs for new benchmark workflows (LoCoMo, WikiMultiHop, HotpotQA) and simplify CLI usage.
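
The tool-selection idea summarized above can be sketched roughly as follows: a router asks a chooser (an LLM in the PR, a toy keyword function here) to pick one child tool per query, and falls back to a default tool when the choice is not recognized. Every name in this sketch (`Tool`, `ToolSelectRouter`, `keyword_chooser`, and the tool names) is hypothetical and is not taken from the PR's code.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Tool:
    """Hypothetical retrieval tool: a name plus a search function."""

    name: str
    search: Callable[[str], List[str]]


class ToolSelectRouter:
    """Pick one child tool per query, falling back to a default tool."""

    def __init__(
        self,
        tools: List[Tool],
        default_tool_name: str,
        choose: Callable[[str, List[str]], str],
    ) -> None:
        self._tools: Dict[str, Tool] = {t.name: t for t in tools}
        if default_tool_name not in self._tools:
            raise ValueError(f"unknown default tool: {default_tool_name}")
        self._default = default_tool_name
        self._choose = choose  # stands in for the LLM routing call

    def search(self, query: str) -> List[str]:
        picked = self._choose(query, list(self._tools))
        # An unrecognized answer from the chooser routes to the default tool.
        tool = self._tools.get(picked, self._tools[self._default])
        return tool.search(query)


def keyword_chooser(query: str, names: List[str]) -> str:
    """Toy replacement for the LLM: route on surface cues only."""
    if " and " in query:
        return "split_query"
    if " of the " in query:
        return "chain_of_query"
    return "not_a_tool"  # deliberately invalid, to exercise the fallback


router = ToolSelectRouter(
    tools=[
        Tool("split_query", lambda q: [f"split:{p.strip()}" for p in q.split(" and ")]),
        Tool("chain_of_query", lambda q: [f"chain:{q}"]),
        Tool("memory", lambda q: [f"direct:{q}"]),
    ],
    default_tool_name="chain_of_query",
    choose=keyword_chooser,
)
```

The default-tool fallback mirrors the `default_tool_name` entry visible in the reviewed snippet further down; how the real agent handles an invalid LLM answer is not shown in this page, so that behavior is an assumption.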

Reviewed changes

Copilot reviewed 69 out of 89 changed files in this pull request and generated 12 comments.

| File | Description |
| --- | --- |
| `tests/memmachine/server/api_v2/test_spec.py` | Asserts default agent_mode behavior in request spec parsing. |
| `tests/memmachine/server/api_v2/test_router.py` | Verifies router passes agent_mode through to service layer. |
| `tests/memmachine/retrieval_agent/test_retrieval_agent.py` | New unit tests covering retrieval-agent behaviors and reranking. |
| `tests/memmachine/rest_client/test_memory.py` | Adds Python REST client tests for agent_mode payload field. |
| `tests/memmachine/main/test_memmachine_mock.py` | Updates mocked config defaults to include llm_model for LTM. |
| `tests/memmachine/common/session_manager/test_session_manager.py` | Adds llm_model to episodic memory test configuration. |
| `tests/memmachine/common/configuration/test_configuration.py` | Validates merge/update behavior for new llm_model field. |
| `src/memmachine/server/api_v2/service.py` | Passes agent_mode through to MemMachine.query_search(); adds logging. |
| `src/memmachine/retrieval_agent/common/agent_api.py` | Introduces retrieval-agent base API, policies, and rerank helper. |
| `src/memmachine/retrieval_agent/agents/tool_select_agent.py` | Adds LLM-based tool routing agent. |
| `src/memmachine/retrieval_agent/agents/split_query_agent.py` | Adds query splitting agent for multi-entity single-hop queries. |
| `src/memmachine/retrieval_agent/agents/memmachine_retriever.py` | Adds direct MemMachine memory retriever agent wrapper. |
| `src/memmachine/retrieval_agent/agents/coq_agent.py` | Adds chain-of-query rewriting + sufficiency-check agent. |
| `src/memmachine/retrieval_agent/agents/__init__.py` | Exposes retrieval-agent classes in package exports. |
| `src/memmachine/rest_client/memory.py` | Adds agent_mode parameter to Python SDK Memory.search(). |
| `src/memmachine/rest_client/README.md` | Documents agent_mode usage in Python REST client README. |
| `src/memmachine/main/memmachine.py` | Propagates agent_mode through episodic search; validates llm_model. |
| `src/memmachine/episodic_memory/long_term_memory/service_locator.py` | Loads llm_model resource for long-term memory params. |
| `src/memmachine/episodic_memory/long_term_memory/long_term_memory.py` | Implements agent-mode branching and initializes retrieval agent. |
| `src/memmachine/episodic_memory/episodic_memory.py` | Plumbs agent_mode through episodic query path; adds logging. |
| `src/memmachine/common/language_model/openai_responses_language_model.py` | Adds reasoning_effort and token-usage-returning API method. |
| `src/memmachine/common/filter/filter_parser.py` | Marks FilterExpr as @runtime_checkable. |
| `src/memmachine/common/errors.py` | Adds language-model config errors for missing/default LLM model. |
| `src/memmachine/common/configuration/language_model_conf.py` | Adds contains_language_model() helper. |
| `src/memmachine/common/configuration/episodic_config.py` | Adds llm_model to long-term memory config/partial config. |
| `src/memmachine/common/configuration/__init__.py` | Adds default/check helpers for long-term-memory llm_model. |
| `src/memmachine/common/api/spec.py` | Adds agent_mode field to SearchMemoriesSpec. |
| `src/memmachine/common/api/doc.py` | Documents agent_mode and adds examples. |
| `src/memmachine-ts/rest_client/tests/memmachine-memory.spec.ts` | Adds TS client test asserting agent_mode in search payload. |
| `src/memmachine-ts/rest_client/src/memory/memmachine-memory.types.ts` | Adds agent_mode option to TS SearchMemoriesOptions. |
| `src/memmachine-ts/rest_client/src/memory/memmachine-memory.ts` | Includes agent_mode field in TS search request payload. |
| `src/memmachine-ts/rest_client/README.md` | Documents TS usage example for agent_mode. |
| `sample_configs/episodic_memory_config.gpu.sample` | Adds llm_model to long-term memory sample config. |
| `sample_configs/episodic_memory_config.cpu.sample` | Adds llm_model to long-term memory sample config. |
| `memmachine-compose.sh` | Updates config generation to rewrite LTM llm_model value. |
| `evaluation/utils/memmachine_helper_restapiv2.py` | Adds agent_mode to evaluation REST helper payload. |
| `evaluation/utils/agent_utils.py` | New helper utilities for retrieval-agent benchmark workflows. |
| `evaluation/retrieval_agent/wikimultihop_search.py` | Adds WikiMultiHop search benchmark script (agent/memmachine/llm modes). |
| `evaluation/retrieval_agent/wikimultihop_ingest.py` | Adds WikiMultiHop ingestion script. |
| `evaluation/retrieval_agent/run_test.sh` | New unified runner for locomo/wikimultihop/hotpotqa ingest/search. |
| `evaluation/retrieval_agent/locomo_search.py` | Adds LoCoMo search benchmark script using new agent utils. |
| `evaluation/retrieval_agent/locomo_ingest.py` | Adds LoCoMo ingestion script using new agent utils. |
| `evaluation/retrieval_agent/locomo_delete.py` | Simplifies LoCoMo delete script signature/typing. |
| `evaluation/retrieval_agent/llm_judge.py` | Updates judging model + parsing; simplifies IO paths. |
| `evaluation/retrieval_agent/hotpotQA_test.py` | Adds HotpotQA ingest/search benchmark script. |
| `evaluation/retrieval_agent/generate_scores.py` | Adds consolidated scoring report for retrieval-agent benchmark outputs. |
| `evaluation/retrieval_agent/evaluate.py` | Updates evaluation pipeline to new output schema fields. |
| `evaluation/locomo/episodic_memory/locomo_ingest.py` | Removes older legacy ingestion script. |
| `evaluation/locomo/episodic_memory/locomo_config.yaml` | Removes legacy YAML config stub. |
| `evaluation/locomo/episodic_memory/README.md` | Removes legacy locomo episodic_memory README. |
| `evaluation/locomo/episodic_agent/run_experiments.py` | Removes legacy episodic agent runner. |
| `evaluation/locomo/episodic_agent/restapiv2_locomo_search_agent.py` | Removes legacy restapiv2 search agent implementation. |
| `evaluation/locomo/episodic_agent/memmachine_locomo.py` | Removes legacy MemMachine locomo runner. |
| `evaluation/locomo/episodic_agent/locomo_agent.py` | Removes legacy OpenAI Agents-sdk locomo agent code. |
| `evaluation/locomo/episodic_agent/generate_scores.py` | Removes legacy scoring script. |
| `evaluation/locomo/episodic_agent/__init__.py` | Removes legacy package marker. |
| `evaluation/locomo/episodic_agent/README.md` | Removes legacy episodic agent README. |
| `evaluation/episodic_memory/restapiv2_locomo_search.py` | Adds CLI flag to enable agent_mode in legacy episodic workflow. |
| `evaluation/episodic_memory/README.md` | New README for legacy episodic workflow (with agent-mode option). |
| `evaluation/README.md` | Replaces top-level evaluation guide with retrieval-agent workflow docs. |
| `docs/open_source/configuration.mdx` | Documents llm_model under long-term memory config. |
| `docs/install_guide/integrate/GPTStore.mdx` | Documents agent_mode in schema snippet for GPT Store integration. |
| `docs/getting_started/benchmarks.mdx` | Points users to updated evaluation guide; expands benchmark mentions. |
| `docs/examples/rest.mdx` | Documents agent_mode in REST search example. |
| `docs/api_reference/ts-rest/interfaces/SearchMemoryResult.mdx` | Updates TS REST docs for renamed interfaces/options and adds agent_mode. |
| `docs/api_reference/ts-rest/classes/MemMachineMemory.mdx` | Updates TS class docs for new option type names and response structure. |
| `docs/api_reference/python/memory_api.mdx` | Updates Python SDK docs for expanded search() signature + agent_mode. |
| `USAGE.md` | Documents REST + Python SDK defaults including agent_mode. |
| `README.md` | Adds bibtex reference section relevant to agent prompts. |

Comment on lines 144 to 149
param: AgentToolBaseParam = AgentToolBaseParam(
    model=model,
    children_tools=[split_agent, coq_agent, memory_agent],
    backend=None,
    extra_params={"default_tool_name": coq_agent.agent_name},
)

Copilot AI Feb 11, 2026

Same issue as above: backend is not a valid field for AgentToolBaseParam, so this will fail during initialization. Suggestion: remove backend=None (or add the field in AgentToolBaseParam and update all call sites consistently).
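
To illustrate the kind of failure this comment describes, here is a minimal sketch that assumes the parameter class behaves like a standard-library dataclass. `ToolParam` and `try_build` are hypothetical stand-ins, and the real `AgentToolBaseParam` may be defined differently. If it were a Pydantic model with default settings, for example, an unknown keyword would be silently ignored instead of raising, so the behavior is worth confirming against the actual class.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class ToolParam:
    """Hypothetical stand-in for AgentToolBaseParam; the real fields may differ."""

    model: Optional[str] = None
    children_tools: List[str] = field(default_factory=list)
    extra_params: Dict[str, Any] = field(default_factory=dict)


def try_build(**kwargs: Any) -> str:
    """Report whether the constructor accepts the given keyword arguments."""
    try:
        ToolParam(**kwargs)
    except TypeError as exc:
        # Dataclass constructors reject keywords that are not declared fields.
        return f"rejected: {exc}"
    return "ok"


with_backend = try_build(model="m", backend=None)
without_backend = try_build(model="m", extra_params={"default_tool_name": "coq"})
```

Under that assumption, dropping the undeclared keyword (or declaring the field on the class and updating every call site) makes construction succeed.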
