Input/Output#
Synthetic Data#
| Creates a  | |
| Creates a  | 
Python Objects#
| Create a  | 
Parquet#
| Creates a  | |
| Create  | |
| Writes the  | 
CSV#
| Creates a  | |
| Writes the  | 
JSON#
| Creates a  | |
| Writes the  | 
Text#
| Create a  | 
Audio#
| Creates a  | 
Avro#
| Create a  | 
Images#
| Creates a  | |
| Writes the  | 
Binary#
| Create a  | 
TFRecords#
| Create a  | |
| Write the  | |
| Specifies read options when reading TFRecord files with TFX. | 
Pandas#
| Create a  | |
| Create a  | |
| Convert this  | |
| Converts this  | 
NumPy#
| Create an Arrow dataset from numpy files. | |
| Creates a  | |
| Creates a  | |
| Writes a column of the  | |
| Converts this  | 
Arrow#
| Create a  | |
| Create a  | |
| Convert this  | 
MongoDB#
| Create a  | |
| Writes the  | 
BigQuery#
| Create a dataset from BigQuery. | |
| Write the dataset to a BigQuery dataset table. | 
SQL Databases#
| Read from a database that provides a Python DB API2-compliant connector. | |
| Write to a database that provides a Python DB API2-compliant connector. | 
Databricks#
| Read a Databricks unity catalog table or Databricks SQL execution result. | 
Delta Sharing#
| Read data from a Delta Sharing table. | 
Hudi#
| Create a  | 
Iceberg#
| Create a  | 
Lance#
| Create a  | 
ClickHouse#
| Create a  | 
Dask#
| Create a  | |
| Convert this  | 
Spark#
| Create a  | |
| Convert this  | 
Modin#
| Create a  | |
| Convert this  | 
Mars#
| Create a  | |
| Convert this  | 
Torch#
| Create a  | 
Hugging Face#
| Create a  | 
TensorFlow#
| Create a  | 
Video#
| Creates a  | 
WebDataset#
| Create a  | 
Datasource API#
| Read a stream from a custom  | |
| Interface for defining a custom  | |
| A function used to read blocks from the  | |
| Generates filenames when you write a  | 
Datasink API#
| Writes the dataset to a custom  | |
| Interface for defining write-related logic. | |
| A datasink that writes one row to each file. | |
| A datasink that writes multiple rows to each file. | |
| File-based datasource for reading files. | |
| Aggregated result of the Datasink write operations. | |
| Type variable. | 
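The difference between a datasink that writes one row to each file and one that writes multiple rows to each file can be pictured with plain Python. This is a minimal sketch under stated assumptions, not the interface listed above: the functions `write_row_per_file` and `write_block_per_file`, the JSON output format, and the index-based filename scheme are all hypothetical.

```python
import json
import os
import tempfile
from typing import Dict, List

Row = Dict[str, object]


def write_row_per_file(block: List[Row], out_dir: str, block_index: int = 0) -> List[str]:
    """Write every row of the block to its own JSON file."""
    paths = []
    for row_index, row in enumerate(block):
        # Hypothetical naming scheme: block index, then row index.
        path = os.path.join(out_dir, f"{block_index:06}_{row_index:06}.json")
        with open(path, "w") as f:
            json.dump(row, f)
        paths.append(path)
    return paths


def write_block_per_file(block: List[Row], out_dir: str, block_index: int = 0) -> str:
    """Write all rows of the block to a single JSON Lines file."""
    path = os.path.join(out_dir, f"{block_index:06}.jsonl")
    with open(path, "w") as f:
        for row in block:
            f.write(json.dumps(row) + "\n")
    return path


# Small demonstration with a three-row block.
out_dir = tempfile.mkdtemp()
block = [{"id": 0}, {"id": 1}, {"id": 2}]
row_paths = write_row_per_file(block, out_dir)
block_path = write_block_per_file(block, out_dir)
```

Row-per-file output suits formats where each record is naturally its own file, such as images; block-per-file output suits tabular formats where many rows share one file.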
Partitioning API#
| Partition scheme used to describe path-based partitions. | |
| Supported dataset partition styles. | |
| Partition parser for path-based partition formats. | |
| Partition filter for path-based partition formats. | 
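Path-based partitioning encodes column values in directory names. The sketch below illustrates the two common layouts: Hive style (`key=value` directories) and directory style (values only, with field names supplied separately), plus a filter that keeps only matching paths. It is a simplified illustration; the helpers `parse_hive_partitions`, `parse_directory_partitions`, and `filter_paths` are hypothetical and do not reproduce the classes listed above.

```python
from typing import Callable, Dict, List


def _partition_dirs(path: str, base_dir: str) -> List[str]:
    """Return the directory segments between base_dir and the file name."""
    prefix = base_dir.rstrip("/") + "/" if base_dir else ""
    relative = path[len(prefix):] if prefix and path.startswith(prefix) else path
    return relative.strip("/").split("/")[:-1]  # drop the file name


def parse_hive_partitions(path: str, base_dir: str = "") -> Dict[str, str]:
    """Parse key=value directories, e.g. 'year=2023/month=01'."""
    partitions = {}
    for segment in _partition_dirs(path, base_dir):
        if "=" in segment:
            key, value = segment.split("=", 1)
            partitions[key] = value
    return partitions


def parse_directory_partitions(
    path: str, base_dir: str, field_names: List[str]
) -> Dict[str, str]:
    """Parse value-only directories, e.g. '2023/01', using the given field names."""
    directories = _partition_dirs(path, base_dir)
    if len(directories) != len(field_names):
        raise ValueError(f"Expected {len(field_names)} partition dirs in {path!r}")
    return dict(zip(field_names, directories))


def filter_paths(
    paths: List[str],
    parser: Callable[[str], Dict[str, str]],
    predicate: Callable[[Dict[str, str]], bool],
) -> List[str]:
    """Keep only the paths whose parsed partition values satisfy the predicate."""
    return [path for path in paths if predicate(parser(path))]


paths = [
    "data/year=2022/month=12/x.parquet",
    "data/year=2023/month=01/a.parquet",
]
hive = parse_hive_partitions(paths[1], base_dir="data")
directory = parse_directory_partitions(
    "data/2023/01/a.parquet", base_dir="data", field_names=["year", "month"]
)
selected = filter_paths(
    paths,
    lambda p: parse_hive_partitions(p, base_dir="data"),
    lambda partitions: partitions["year"] == "2023",
)
```

Filtering on parsed path values lets a reader skip whole files before opening them, which is the main reason to partition by path in the first place.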
MetadataProvider API#
| Abstract callable that provides metadata for the files of a single dataset block. | |
| Abstract callable that provides metadata for  | |
| Default metadata provider for  | |
| Provides block metadata for Arrow Parquet file fragments. | |
| Fast Metadata provider for  | 
Shuffling API#
| Configuration for file shuffling. |