datafusion.context

Session Context and its associated configuration.

Classes

ArrowArrayExportable

Type hint for object exporting Arrow C Array via Arrow PyCapsule Interface.

ArrowStreamExportable

Type hint for object exporting Arrow C Stream via Arrow PyCapsule Interface.

CatalogProviderExportable

Type hint for object that has __datafusion_catalog_provider__ PyCapsule.

RuntimeConfig

See RuntimeEnvBuilder.

RuntimeEnvBuilder

Runtime configuration options.

SQLOptions

Options to be used when performing SQL queries.

SessionConfig

Session configuration options.

SessionContext

This is the main interface for executing queries and creating DataFrames.

TableProviderExportable

Type hint for object that has __datafusion_table_provider__ PyCapsule.

Module Contents

class datafusion.context.ArrowArrayExportable

Bases: Protocol

Type hint for object exporting Arrow C Array via Arrow PyCapsule Interface.

https://arrow.apache.org/docs/format/CDataInterface/PyCapsuleInterface.html

__arrow_c_array__(requested_schema: object | None = None) → tuple[object, object]
class datafusion.context.ArrowStreamExportable

Bases: Protocol

Type hint for object exporting Arrow C Stream via Arrow PyCapsule Interface.

https://arrow.apache.org/docs/format/CDataInterface/PyCapsuleInterface.html

__arrow_c_stream__(requested_schema: object | None = None) → object
class datafusion.context.CatalogProviderExportable

Bases: Protocol

Type hint for object that has __datafusion_catalog_provider__ PyCapsule.

https://docs.rs/datafusion/latest/datafusion/catalog/trait.CatalogProvider.html

__datafusion_catalog_provider__(session: Any) → object
class datafusion.context.RuntimeConfig

Bases: RuntimeEnvBuilder

See RuntimeEnvBuilder.

Create a new RuntimeEnvBuilder with default values.

class datafusion.context.RuntimeEnvBuilder

Runtime configuration options.

Create a new RuntimeEnvBuilder with default values.

with_disk_manager_disabled() → RuntimeEnvBuilder

Disable the disk manager; any attempt to create temporary files will error.

Returns:

A new RuntimeEnvBuilder object with the updated setting.

with_disk_manager_os() → RuntimeEnvBuilder

Use the operating system's temporary directory for the disk manager.

Returns:

A new RuntimeEnvBuilder object with the updated setting.

with_disk_manager_specified(*paths: str | pathlib.Path) → RuntimeEnvBuilder

Use the specified paths for the disk manager's temporary files.

Parameters:

paths – Paths to use for the disk manager's temporary files.

Returns:

A new RuntimeEnvBuilder object with the updated setting.

with_fair_spill_pool(size: int) → RuntimeEnvBuilder

Use a fair spill pool with the specified size.

This pool works best when you know beforehand that the query has multiple spillable operators that will likely all need to spill. Sometimes it will cause spills even when there was sufficient memory (reserved for other operators) to avoid doing so:

┌───────────────────────z──────────────────────z───────────────┐
│                       z                      z               │
│                       z                      z               │
│       Spillable       z       Unspillable    z     Free      │
│        Memory         z        Memory        z    Memory     │
│                       z                      z               │
│                       z                      z               │
└───────────────────────z──────────────────────z───────────────┘
Parameters:

size – Size of the memory pool in bytes.

Returns:

A new RuntimeEnvBuilder object with the updated setting.

Example usage:

config = RuntimeEnvBuilder().with_fair_spill_pool(1024)
with_greedy_memory_pool(size: int) → RuntimeEnvBuilder

Use a greedy memory pool with the specified size.

This pool works well for queries that do not need to spill or have a single spillable operator. See with_fair_spill_pool() if there are multiple spillable operators that all will spill.

Parameters:

size – Size of the memory pool in bytes.

Returns:

A new RuntimeEnvBuilder object with the updated setting.

Example usage:

config = RuntimeEnvBuilder().with_greedy_memory_pool(1024)
with_temp_file_path(path: str | pathlib.Path) → RuntimeEnvBuilder

Use the specified path to create any needed temporary files.

Parameters:

path – Path to use for temporary files.

Returns:

A new RuntimeEnvBuilder object with the updated setting.

Example usage:

config = RuntimeEnvBuilder().with_temp_file_path("/tmp")
with_unbounded_memory_pool() → RuntimeEnvBuilder

Use an unbounded memory pool.

Returns:

A new RuntimeEnvBuilder object with the updated setting.
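Example usage, combining several builder methods and passing the result to a SessionContext (a sketch; the 2 GiB memory limit is only illustrative):

from datafusion import SessionContext
from datafusion.context import RuntimeEnvBuilder

# Each builder method returns a new RuntimeEnvBuilder, so calls can be chained.
runtime = (
    RuntimeEnvBuilder()
    .with_greedy_memory_pool(2 * 1024**3)  # cap memory usage at roughly 2 GiB
    .with_disk_manager_os()                # spill to the operating system's temp directory
)
ctx = SessionContext(runtime=runtime)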

config_internal
class datafusion.context.SQLOptions

Options to be used when performing SQL queries.

Create a new SQLOptions with default values.

The default values are:

  • DDL commands are allowed

  • DML commands are allowed

  • Statements are allowed

with_allow_ddl(allow: bool = True) → SQLOptions

Should DDL (Data Definition Language) commands be run?

Examples of DDL commands include CREATE TABLE and DROP TABLE.

Parameters:

allow – Allow DDL commands to be run.

Returns:

A new SQLOptions object with the updated setting.

Example usage:

options = SQLOptions().with_allow_ddl(True)
with_allow_dml(allow: bool = True) → SQLOptions

Should DML (Data Manipulation Language) commands be run?

Examples of DML commands include INSERT INTO and DELETE.

Parameters:

allow – Allow DML commands to be run.

Returns:

A new SQLOptions object with the updated setting.

Example usage:

options = SQLOptions().with_allow_dml(True)
with_allow_statements(allow: bool = True) → SQLOptions

Should statements such as SET VARIABLE and BEGIN TRANSACTION be run?

Parameters:

allow – Allow statements to be run.

Returns:

A new SQLOptions object with the updated setting.

Example usage:

options = SQLOptions().with_allow_statements(True)
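Example usage, restricting a context to read-only queries before executing SQL (a sketch; the query shown is only illustrative):

from datafusion import SessionContext
from datafusion.context import SQLOptions

ctx = SessionContext()
# Reject DDL and DML so only read-only statements are accepted.
options = SQLOptions().with_allow_ddl(False).with_allow_dml(False)
df = ctx.sql_with_options("SELECT 1 AS one", options)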
options_internal
class datafusion.context.SessionConfig(config_options: dict[str, str] | None = None)

Session configuration options.

Create a new SessionConfig with the given configuration options.

Parameters:

config_options – Configuration options.

set(key: str, value: str) → SessionConfig

Set a configuration option.

Parameters:
  • key – Option key.

  • value – Option value.

Returns:

A new SessionConfig object with the updated setting.

with_batch_size(batch_size: int) → SessionConfig

Customize batch size.

Parameters:

batch_size – Batch size.

Returns:

A new SessionConfig object with the updated setting.

with_create_default_catalog_and_schema(enabled: bool = True) → SessionConfig

Control if the default catalog and schema will be automatically created.

Parameters:

enabled – Whether the default catalog and schema will be automatically created.

Returns:

A new SessionConfig object with the updated setting.

with_default_catalog_and_schema(catalog: str, schema: str) → SessionConfig

Select a name for the default catalog and schema.

Parameters:
  • catalog – Catalog name.

  • schema – Schema name.

Returns:

A new SessionConfig object with the updated setting.

with_information_schema(enabled: bool = True) → SessionConfig

Enable or disable the inclusion of information_schema virtual tables.

Parameters:

enabled – Whether to include information_schema virtual tables.

Returns:

A new SessionConfig object with the updated setting.

with_parquet_pruning(enabled: bool = True) → SessionConfig

Enable or disable the use of the pruning predicate for Parquet readers.

Pruning predicates allow the reader to skip row groups.

Parameters:

enabled – Whether to use the pruning predicate for Parquet readers.

Returns:

A new SessionConfig object with the updated setting.

with_repartition_aggregations(enabled: bool = True) → SessionConfig

Enable or disable the use of repartitioning for aggregations.

Enabling this improves parallelism.

Parameters:

enabled – Whether to use repartitioning for aggregations.

Returns:

A new SessionConfig object with the updated setting.

with_repartition_file_min_size(size: int) → SessionConfig

Set the minimum file range size for repartitioning scans.

Parameters:

size – Minimum file range size.

Returns:

A new SessionConfig object with the updated setting.

with_repartition_file_scans(enabled: bool = True) → SessionConfig

Enable or disable the use of repartitioning for file scans.

Parameters:

enabled – Whether to use repartitioning for file scans.

Returns:

A new SessionConfig object with the updated setting.

with_repartition_joins(enabled: bool = True) → SessionConfig

Enable or disable the use of repartitioning for joins to improve parallelism.

Parameters:

enabled – Whether to use repartitioning for joins.

Returns:

A new SessionConfig object with the updated setting.

with_repartition_sorts(enabled: bool = True) → SessionConfig

Enable or disable the use of repartitioning for sorts.

This may improve parallelism.

Parameters:

enabled – Whether to use repartitioning for sorts.

Returns:

A new SessionConfig object with the updated setting.

with_repartition_windows(enabled: bool = True) → SessionConfig

Enable or disable the use of repartitioning for window functions.

This may improve parallelism.

Parameters:

enabled – Whether to use repartitioning for window functions.

Returns:

A new SessionConfig object with the updated setting.

with_target_partitions(target_partitions: int) → SessionConfig

Customize the number of target partitions for query execution.

Increasing partitions can increase concurrency.

Parameters:

target_partitions – Number of target partitions.

Returns:

A new SessionConfig object with the updated setting.
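Example usage, chaining several options and passing the configuration to a SessionContext (a sketch; the specific values are only illustrative):

from datafusion import SessionContext
from datafusion.context import SessionConfig

config = (
    SessionConfig()
    .with_batch_size(8192)          # number of rows per record batch
    .with_target_partitions(8)      # parallelism target for query execution
    .with_information_schema(True)  # expose information_schema virtual tables
)
ctx = SessionContext(config)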

config_internal
class datafusion.context.SessionContext(config: SessionConfig | None = None, runtime: RuntimeEnvBuilder | None = None)

This is the main interface for executing queries and creating DataFrames.

See Concepts in the online documentation for more information.

Main interface for executing queries with DataFusion.

Maintains the state of the connection between a user and an instance of the DataFusion engine.

Parameters:
  • config – Session configuration options.

  • runtime – Runtime configuration options.

Example usage:

The following example demonstrates how to use the context to execute a query against a CSV data source using the DataFrame API:

from datafusion import SessionContext

ctx = SessionContext()
df = ctx.read_csv("data.csv")
__datafusion_logical_extension_codec__() → Any

Access the PyCapsule FFI_LogicalExtensionCodec.

__datafusion_task_context_provider__() → Any

Access the PyCapsule FFI_TaskContextProvider.

__repr__() → str

Return a string representation of the SessionContext.

static _convert_file_sort_order(file_sort_order: collections.abc.Sequence[collections.abc.Sequence[datafusion.expr.SortKey]] | None) → list[list[datafusion._internal.expr.SortExpr]] | None

Convert nested SortKey sequences into raw sort expressions.

Each SortKey can be a column name string, an Expr, or a SortExpr and will be converted using datafusion.expr.sort_list_to_raw_sort_list().

static _convert_table_partition_cols(table_partition_cols: list[tuple[str, str | pyarrow.DataType]]) → list[tuple[str, pyarrow.DataType]]
catalog(name: str = 'datafusion') → datafusion.catalog.Catalog

Retrieve a catalog by name.

catalog_names() → set[str]

Return the set of catalog names in this context.

create_dataframe(partitions: list[list[pyarrow.RecordBatch]], name: str | None = None, schema: pyarrow.Schema | None = None) → datafusion.dataframe.DataFrame

Create and return a dataframe using the provided partitions.

Parameters:
  • partitions – pa.RecordBatch partitions to register.

  • name – Resultant dataframe name.

  • schema – Schema for the partitions.

Returns:

DataFrame representation of the provided partitions.
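Example usage (a sketch; the single record batch shown is only illustrative):

import pyarrow as pa
from datafusion import SessionContext

ctx = SessionContext()
batch = pa.RecordBatch.from_pydict({"a": [1, 2, 3]})
# Each inner list is one partition of record batches.
df = ctx.create_dataframe([[batch]], name="t")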

create_dataframe_from_logical_plan(plan: datafusion.plan.LogicalPlan) → datafusion.dataframe.DataFrame

Create a DataFrame from an existing plan.

Parameters:

plan – Logical plan.

Returns:

DataFrame representation of the logical plan.

deregister_table(name: str) → None

Remove a table from the session.

empty_table() → datafusion.dataframe.DataFrame

Create an empty DataFrame.

enable_url_table() → SessionContext

Control if local files can be queried as tables.

Returns:

A new SessionContext object with url table enabled.
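Example usage, querying a local file directly by quoting its path in SQL (a sketch; it assumes a file named data.csv exists in the working directory and that quoted paths are resolved as described in the online documentation):

from datafusion import SessionContext

ctx = SessionContext().enable_url_table()
df = ctx.sql("SELECT * FROM 'data.csv'")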

execute(plan: datafusion.plan.ExecutionPlan, partitions: int) → datafusion.record_batch.RecordBatchStream

Execute the plan and return the results.

from_arrow(data: ArrowStreamExportable | ArrowArrayExportable, name: str | None = None) → datafusion.dataframe.DataFrame

Create a DataFrame from an Arrow source.

The Arrow data source can be any object that implements either __arrow_c_stream__ or __arrow_c_array__. For the latter, it must return a struct array.

The Arrow data can come from Polars, Pandas, PyArrow, or any other library that supports the Arrow PyCapsule interface.

Parameters:
  • data – Arrow data source.

  • name – Name of the DataFrame.

Returns:

DataFrame representation of the Arrow table.
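Example usage with a PyArrow table, which exports its data via __arrow_c_stream__ (a sketch; the data is only illustrative):

import pyarrow as pa
from datafusion import SessionContext

ctx = SessionContext()
table = pa.table({"a": [1, 2, 3], "b": ["x", "y", "z"]})
df = ctx.from_arrow(table, name="t")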

from_arrow_table(data: pyarrow.Table, name: str | None = None) → datafusion.dataframe.DataFrame

Create a DataFrame from an Arrow table.

This is an alias for from_arrow().

from_pandas(data: pandas.DataFrame, name: str | None = None) → datafusion.dataframe.DataFrame

Create a DataFrame from a Pandas DataFrame.

Parameters:
  • data – Pandas DataFrame.

  • name – Name of the DataFrame.

Returns:

DataFrame representation of the Pandas DataFrame.

from_polars(data: polars.DataFrame, name: str | None = None) → datafusion.dataframe.DataFrame

Create a DataFrame from a Polars DataFrame.

Parameters:
  • data – Polars DataFrame.

  • name – Name of the DataFrame.

Returns:

DataFrame representation of the Polars DataFrame.

from_pydict(data: dict[str, list[Any]], name: str | None = None) → datafusion.dataframe.DataFrame

Create a DataFrame from a dictionary.

Parameters:
  • data – Dictionary of lists.

  • name – Name of the DataFrame.

Returns:

DataFrame representation of the dictionary of lists.
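Example usage (a sketch with illustrative data):

from datafusion import SessionContext

ctx = SessionContext()
df = ctx.from_pydict({"a": [1, 2, 3], "b": ["x", "y", "z"]}, name="t")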

from_pylist(data: list[dict[str, Any]], name: str | None = None) → datafusion.dataframe.DataFrame

Create a DataFrame from a list.

Parameters:
  • data – List of dictionaries.

  • name – Name of the DataFrame.

Returns:

DataFrame representation of the list of dictionaries.

classmethod global_ctx() → SessionContext

Retrieve the global context as a SessionContext wrapper.

Returns:

A SessionContext object that wraps the global SessionContextInternal.

read_avro(path: str | pathlib.Path, schema: pyarrow.Schema | None = None, file_partition_cols: list[tuple[str, str | pyarrow.DataType]] | None = None, file_extension: str = '.avro') → datafusion.dataframe.DataFrame

Create a DataFrame for reading an Avro data source.

Parameters:
  • path – Path to the Avro file.

  • schema – The data source schema.

  • file_partition_cols – Partition columns.

  • file_extension – File extension to select.

Returns:

DataFrame representation of the read Avro file.

read_csv(path: str | pathlib.Path | list[str] | list[pathlib.Path], schema: pyarrow.Schema | None = None, has_header: bool = True, delimiter: str = ',', schema_infer_max_records: int = DEFAULT_MAX_INFER_SCHEMA, file_extension: str = '.csv', table_partition_cols: list[tuple[str, str | pyarrow.DataType]] | None = None, file_compression_type: str | None = None, options: datafusion.options.CsvReadOptions | None = None) → datafusion.dataframe.DataFrame

Read a CSV data source.

Parameters:
  • path – Path to the CSV file.

  • schema – An optional schema representing the CSV files. If None, the CSV reader will try to infer it based on data in the file.

  • has_header – Whether the CSV file has a header. If schema inference is run on a file with no header, default column names are created.

  • delimiter – An optional column delimiter.

  • schema_infer_max_records – Maximum number of rows to read from CSV files for schema inference if needed.

  • file_extension – File extension; only files with this extension are selected for data input.

  • table_partition_cols – Partition columns.

  • file_compression_type – File compression type.

  • options – Set advanced options for CSV reading. This cannot be combined with any of the other options in this method.

Returns:

DataFrame representation of the read CSV files.

read_json(path: str | pathlib.Path, schema: pyarrow.Schema | None = None, schema_infer_max_records: int = 1000, file_extension: str = '.json', table_partition_cols: list[tuple[str, str | pyarrow.DataType]] | None = None, file_compression_type: str | None = None) → datafusion.dataframe.DataFrame

Read a line-delimited JSON data source.

Parameters:
  • path – Path to the JSON file.

  • schema – The data source schema.

  • schema_infer_max_records – Maximum number of rows to read from JSON files for schema inference if needed.

  • file_extension – File extension; only files with this extension are selected for data input.

  • table_partition_cols – Partition columns.

  • file_compression_type – File compression type.

Returns:

DataFrame representation of the read JSON files.

read_parquet(path: str | pathlib.Path, table_partition_cols: list[tuple[str, str | pyarrow.DataType]] | None = None, parquet_pruning: bool = True, file_extension: str = '.parquet', skip_metadata: bool = True, schema: pyarrow.Schema | None = None, file_sort_order: collections.abc.Sequence[collections.abc.Sequence[datafusion.expr.SortKey]] | None = None) → datafusion.dataframe.DataFrame

Read a Parquet source into a DataFrame.

Parameters:
  • path – Path to the Parquet file.

  • table_partition_cols – Partition columns.

  • parquet_pruning – Whether the Parquet reader should use the predicate to prune row groups.

  • file_extension – File extension; only files with this extension are selected for data input.

  • skip_metadata – Whether the Parquet reader should skip any metadata that may be in the file schema. This can help avoid schema conflicts due to metadata.

  • schema – An optional schema representing the Parquet files. If None, the Parquet reader will try to infer it based on data in the file.

  • file_sort_order – Sort order for the file. Each sort key can be specified as a column name (str), an expression (Expr), or a SortExpr.

Returns:

DataFrame representation of the read Parquet files.
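Example usage, passing the sort order as plain column names (a sketch; the file path and column name are only illustrative):

from datafusion import SessionContext

ctx = SessionContext()
# Each sort key may be given as a column name string, an Expr, or a SortExpr.
df = ctx.read_parquet("data.parquet", file_sort_order=[["ts"]])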

read_table(table: datafusion.catalog.Table | TableProviderExportable | datafusion.dataframe.DataFrame | pyarrow.dataset.Dataset) → datafusion.dataframe.DataFrame

Create a DataFrame from a table.

register_avro(name: str, path: str | pathlib.Path, schema: pyarrow.Schema | None = None, file_extension: str = '.avro', table_partition_cols: list[tuple[str, str | pyarrow.DataType]] | None = None) → None

Register an Avro file as a table.

The registered table can be referenced from SQL statements executed against this context.

Parameters:
  • name – Name of the table to register.

  • path – Path to the Avro file.

  • schema – The data source schema.

  • file_extension – File extension to select.

  • table_partition_cols – Partition columns.

register_catalog_provider(name: str, provider: CatalogProviderExportable | datafusion.catalog.CatalogProvider | datafusion.catalog.Catalog) → None

Register a catalog provider.

register_csv(name: str, path: str | pathlib.Path | list[str | pathlib.Path], schema: pyarrow.Schema | None = None, has_header: bool = True, delimiter: str = ',', schema_infer_max_records: int = DEFAULT_MAX_INFER_SCHEMA, file_extension: str = '.csv', file_compression_type: str | None = None, options: datafusion.options.CsvReadOptions | None = None) → None

Register a CSV file as a table.

The registered table can be referenced from SQL statements executed against this context.

Parameters:
  • name – Name of the table to register.

  • path – Path to the CSV file. It also accepts a list of paths.

  • schema – An optional schema representing the CSV file. If None, the CSV reader will try to infer it based on data in the file.

  • has_header – Whether the CSV file has a header. If schema inference is run on a file with no header, default column names are created.

  • delimiter – An optional column delimiter.

  • schema_infer_max_records – Maximum number of rows to read from CSV files for schema inference if needed.

  • file_extension – File extension; only files with this extension are selected for data input.

  • file_compression_type – File compression type.

  • options – Set advanced options for CSV reading. This cannot be combined with any of the other options in this method.

register_dataset(name: str, dataset: pyarrow.dataset.Dataset) → None

Register a pa.dataset.Dataset as a table.

Parameters:
  • name – Name of the table to register.

  • dataset – PyArrow dataset.
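Example usage with a PyArrow dataset (a sketch; the directory path and table name are only illustrative):

import pyarrow.dataset as ds
from datafusion import SessionContext

ctx = SessionContext()
dataset = ds.dataset("data/", format="parquet")  # hypothetical directory of Parquet files
ctx.register_dataset("events", dataset)
df = ctx.sql("SELECT COUNT(*) FROM events")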

register_json(name: str, path: str | pathlib.Path, schema: pyarrow.Schema | None = None, schema_infer_max_records: int = 1000, file_extension: str = '.json', table_partition_cols: list[tuple[str, str | pyarrow.DataType]] | None = None, file_compression_type: str | None = None) → None

Register a JSON file as a table.

The registered table can be referenced from SQL statements executed against this context.

Parameters:
  • name – Name of the table to register.

  • path – Path to the JSON file.

  • schema – The data source schema.

  • schema_infer_max_records – Maximum number of rows to read from JSON files for schema inference if needed.

  • file_extension – File extension; only files with this extension are selected for data input.

  • table_partition_cols – Partition columns.

  • file_compression_type – File compression type.

register_listing_table(name: str, path: str | pathlib.Path, table_partition_cols: list[tuple[str, str | pyarrow.DataType]] | None = None, file_extension: str = '.parquet', schema: pyarrow.Schema | None = None, file_sort_order: collections.abc.Sequence[collections.abc.Sequence[datafusion.expr.SortKey]] | None = None) → None

Register multiple files as a single table.

Registers a Table that can assemble multiple files from locations in an ObjectStore instance.

Parameters:
  • name – Name of the resultant table.

  • path – Path to the file to register.

  • table_partition_cols – Partition columns.

  • file_extension – File extension of the provided table.

  • schema – The data source schema.

  • file_sort_order – Sort order for the file. Each sort key can be specified as a column name (str), an expression (Expr), or a SortExpr.
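Example usage (a sketch; the path, the partition column, and its type are illustrative assumptions about the data layout):

import pyarrow as pa
from datafusion import SessionContext

ctx = SessionContext()
ctx.register_listing_table(
    "events",
    "data/events/",  # hypothetical directory of partitioned Parquet files
    table_partition_cols=[("year", pa.string())],
    file_extension=".parquet",
)
df = ctx.sql("SELECT year, COUNT(*) FROM events GROUP BY year")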

register_object_store(schema: str, store: Any, host: str | None = None) → None

Add a new object store into the session.

Parameters:
  • schema – The data source schema.

  • store – The ObjectStore to register.

  • host – URL for the host.

register_parquet(name: str, path: str | pathlib.Path, table_partition_cols: list[tuple[str, str | pyarrow.DataType]] | None = None, parquet_pruning: bool = True, file_extension: str = '.parquet', skip_metadata: bool = True, schema: pyarrow.Schema | None = None, file_sort_order: collections.abc.Sequence[collections.abc.Sequence[datafusion.expr.SortKey]] | None = None) → None

Register a Parquet file as a table.

The registered table can be referenced from SQL statements executed against this context.

Parameters:
  • name – Name of the table to register.

  • path – Path to the Parquet file.

  • table_partition_cols – Partition columns.

  • parquet_pruning – Whether the Parquet reader should use the predicate to prune row groups.

  • file_extension – File extension; only files with this extension are selected for data input.

  • skip_metadata – Whether the Parquet reader should skip any metadata that may be in the file schema. This can help avoid schema conflicts due to metadata.

  • schema – The data source schema.

  • file_sort_order – Sort order for the file. Each sort key can be specified as a column name (str), an expression (Expr), or a SortExpr.

register_record_batches(name: str, partitions: list[list[pyarrow.RecordBatch]]) → None

Register record batches as a table.

This function will convert the provided partitions into a table and register it into the session using the given name.

Parameters:
  • name – Name of the resultant table.

  • partitions – Record batches to register as a table.
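Example usage (a sketch with illustrative data and table name):

import pyarrow as pa
from datafusion import SessionContext

ctx = SessionContext()
batch = pa.RecordBatch.from_pydict({"a": [1, 2, 3]})
ctx.register_record_batches("t", [[batch]])  # one partition containing one batch
df = ctx.sql("SELECT SUM(a) FROM t")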

register_table(name: str, table: datafusion.catalog.Table | TableProviderExportable | datafusion.dataframe.DataFrame | pyarrow.dataset.Dataset) → None

Register a Table with this context.

The registered table can be referenced from SQL statements executed against this context.

Parameters:
  • name – Name of the resultant table.

  • table – Any object that can be converted into a Table.

register_table_provider(name: str, provider: datafusion.catalog.Table | TableProviderExportable | datafusion.dataframe.DataFrame | pyarrow.dataset.Dataset) → None

Register a table provider.

Deprecated: use register_table() instead.

register_udaf(udaf: datafusion.user_defined.AggregateUDF) → None

Register a user-defined aggregation function (UDAF) with the context.

register_udf(udf: datafusion.user_defined.ScalarUDF) → None

Register a user-defined function (UDF) with the context.

register_udtf(func: datafusion.user_defined.TableFunction) → None

Register a user-defined table function with the context.

register_udwf(udwf: datafusion.user_defined.WindowUDF) → None

Register a user-defined window function (UDWF) with the context.

register_view(name: str, df: datafusion.dataframe.DataFrame) → None

Register a DataFrame as a view.

Parameters:
  • name (str) – The name to register the view under.

  • df (DataFrame) – The DataFrame to be converted into a view and registered.
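Example usage (a sketch with illustrative data and view name):

from datafusion import SessionContext

ctx = SessionContext()
df = ctx.from_pydict({"a": [1, 2, 3]})
ctx.register_view("my_view", df)
result = ctx.sql("SELECT * FROM my_view")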

session_id() → str

Return an id that uniquely identifies this SessionContext.

sql(query: str, options: SQLOptions | None = None, param_values: dict[str, Any] | None = None, **named_params: Any) → datafusion.dataframe.DataFrame

Create a DataFrame from SQL query text.

See the online documentation for a description of how to perform parameterized substitution via either the param_values option or passing in named_params.

Note: This API implements DDL statements such as CREATE TABLE and CREATE VIEW, and DML statements such as INSERT INTO, with an in-memory default implementation. See sql_with_options().

Parameters:
  • query – SQL query text.

  • options – If provided, the query will be validated against these options.

  • param_values – Provides substitution of scalar values in the query after parsing.

  • named_params – Provides string or DataFrame substitution in the query string.

Returns:

DataFrame representation of the SQL query.

sql_with_options(query: str, options: SQLOptions, param_values: dict[str, Any] | None = None, **named_params: Any) → datafusion.dataframe.DataFrame

Create a DataFrame from SQL query text.

This function will first validate that the query is allowed by the provided options.

Parameters:
  • query – SQL query text.

  • options – SQL options.

  • param_values – Provides substitution of scalar values in the query after parsing.

  • named_params – Provides string or DataFrame substitution in the query string.

Returns:

DataFrame representation of the SQL query.

table(name: str) → datafusion.dataframe.DataFrame

Retrieve a previously registered table by name.

table_exist(name: str) → bool

Return whether a table with the given name exists.

with_logical_extension_codec(codec: Any) → SessionContext

Create a new session context with the specified codec.

This only supports codecs that have been implemented using the FFI interface.

ctx
class datafusion.context.TableProviderExportable

Bases: Protocol

Type hint for object that has __datafusion_table_provider__ PyCapsule.

https://datafusion.apache.org/python/user-guide/io/table_provider.html

__datafusion_table_provider__(session: Any) → object