SEP-1627: Conformance Testing #1627

@pcarleton

Preamble

Title: Conformance Testing
Author(s): Olivier Chafik (@ochafik), Paul Carleton (@pcarleton)
Sponsor: @pcarleton
Status: Draft
Type: Process
Created: 2025-10-08

Abstract

This SEP proposes creating a conformance test suite and harness, aimed primarily at easing the development of MCP SDKs and keeping them at feature parity by making it easy to compare their behaviour across languages in a wide range of scenarios. Associated tooling will also simplify debugging MCP interactions, including authentication and authorization flows.

Motivation

Translating the human-readable specification into code requires significant time and energy. Currently, each new implementation (e.g. a new SDK) has to do this work independently. While the specification aims to be unambiguous, this repeated translation has resulted in slightly different behavior across implementations. The goal of this proposal is to provide tools that reduce or remove these differences, maximizing compatibility between independently built implementations.

A secondary goal is to facilitate debugging of any MCP interaction, beyond SDK development use cases. While we can be more prescriptive about how the official SDKs should behave, the spec often leaves implementations considerable freedom. A tool that can tell us when the interaction between two arbitrary MCP systems violates the spec can help users who integrate those systems.

Specification (WIP)

Overview

#948 contains an early prototype for this SEP, whose details are still largely in flux (feedback welcome). Here are the main ideas:

  • Conformance test suite & harness:
    • We build a large set of English-written test scenarios
    • We implement test cases that exercise each of these scenarios in each SDK
    • We record the transport layer chatter of client & server connections and store it as human-reviewable protocol trace “goldens”
      • The golden for a given test case is shared across SDKs
    • We can develop / update new SDKs against these golden traces
  • Protocol Debugger:
    • Takes in a trace (the set of JSON-RPC messages of an MCP connection), flags any compliance violations (broken MUSTs), and warns about unimplemented recommendations (broken SHOULDs).
    • Beyond JSON-RPC, can analyze transport-specific compliance and auth compliance.
    • Used in the conformance test harness to pre-analyze new goldens before letting humans review them
    • Can be used on live traffic (e.g. hard-wired in an SDK, or as a man-in-the-middle debugger)
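
To make the Protocol Debugger idea more concrete, here is a minimal sketch of a trace checker, not the prototype from #948. Everything in it is an assumption made for illustration: the trace shape (a list of `(sender, message)` pairs), the `Finding` type, the `check_trace` name, and the small rule set. The rules reflect one reading of JSON-RPC 2.0 and the MCP lifecycle text (the version field, non-null and non-reused request ids, exactly one of `result`/`error` in a response, and no client requests other than `ping` before `initialize` has been answered). A real checker would cover far more.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    severity: str  # "violation" (broken MUST) or "warning" (broken SHOULD)
    index: int     # position of the offending message in the trace
    detail: str


def check_trace(trace: list[tuple[str, dict]]) -> list[Finding]:
    """Check a trace of (sender, message) pairs; sender is "client" or "server".

    Hypothetical rule set, chosen only to illustrate the MUST/SHOULD split.
    """
    findings: list[Finding] = []
    seen_ids: dict[str, set[str]] = {"client": set(), "server": set()}
    init_id = None       # id of the client's `initialize` request, once seen
    initialized = False  # True once the server has answered `initialize`

    for i, (sender, msg) in enumerate(trace):
        if msg.get("jsonrpc") != "2.0":
            findings.append(Finding("violation", i, 'missing "jsonrpc": "2.0"'))

        is_request = "method" in msg and "id" in msg
        is_response = "method" not in msg and "id" in msg

        if is_request:
            key = repr(msg["id"])  # repr keeps 1 and "1" distinct and hashable
            if msg["id"] is None:
                findings.append(Finding("violation", i, "request id is null"))
            elif key in seen_ids[sender]:
                findings.append(Finding("violation", i, "request id reused by sender"))
            else:
                seen_ids[sender].add(key)

            if sender == "client":
                if msg["method"] == "initialize":
                    init_id = msg["id"]
                elif not initialized and msg["method"] != "ping":
                    findings.append(
                        Finding("warning", i, "request sent before initialize was answered")
                    )

        if is_response:
            if ("result" in msg) == ("error" in msg):
                findings.append(
                    Finding("violation", i, "response needs exactly one of result/error")
                )
            if (
                sender == "server"
                and init_id is not None
                and msg["id"] == init_id
                and "result" in msg
            ):
                initialized = True

    return findings
```

In a conformance harness, a function of this kind could be run over each newly recorded golden before human review, with violations blocking the golden and warnings only annotated. The same function could equally be fed messages captured from live traffic.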
