Build Your Own Connection
Overview
Delta Sharing is an open protocol for secure real-time data sharing that lets organizations share data across different computing platforms. This guide walks you through connecting to a Delta Sharing server and accessing shared data.
Resources
Delta Sharing Connector Options
Python Connector
Apache Spark Connector
Set up an Interactive Shell
Set up a Standalone Project
Python Connector
The Delta Sharing Python Connector is a Python library that implements the Delta Sharing Protocol to read tables from a Delta Sharing server. You can load shared tables as a pandas DataFrame, or as an Apache Spark DataFrame if running in PySpark with the Apache Spark Connector installed.
System Requirements
Python 3.8+ for delta-sharing version 1.1+
Python 3.6+ for older versions
If running Linux, glibc version >= 2.31
For automatic installation of the delta-kernel-rust-sharing-wrapper package, see the next section for details.
Installation Process
pip3 install delta-sharing
If you are using Databricks Runtime, you can follow the Databricks Libraries doc to install the library on your clusters.
If installation fails because of an issue downloading delta-kernel-rust-sharing-wrapper, try the following (example commands below):
Check that your python3 version is >= 3.8
Upgrade pip3 to the latest version
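For reference, a minimal sequence of shell commands covering both checks (assuming python3 and pip3 are on your PATH):
python3 --version
pip3 install --upgrade pip
pip3 install delta-sharing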
Accessing Shared Data
The connector accesses shared tables based on profile files, which are JSON files containing a user's credentials for a Delta Sharing server. There are several ways to get started:
Before You Begin
Download a profile file from your data provider.
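For illustration, a downloaded profile file is a small JSON document along these lines (the endpoint, token, and expiration shown here are placeholders, not working credentials):
{
  "shareCredentialsVersion": 1,
  "endpoint": "https://sharing.delta.io/delta-sharing/",
  "bearerToken": "<token>",
  "expirationTime": "2021-11-12T00:12:29.0Z"
}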
Accessing Shared Data Options
After you save the profile file, you can use it in the connector to access shared tables.
import delta_sharing
Point to the profile file. It can be a file on the local file system or on remote storage.
profile_file = "<profile-file-path>"
Create a SharingClient.
client = delta_sharing.SharingClient(profile_file)
List all shared tables.
client.list_all_tables()
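Besides listing everything at once, the client can walk the share/schema/table hierarchy step by step. A brief sketch using the same client; the indexing assumes the credential can see at least one share and one schema:
# List the shares this credential can access.
shares = client.list_shares()
# List the schemas in the first share, then the tables in that schema.
schemas = client.list_schemas(shares[0])
tables = client.list_tables(schemas[0])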
Create a URL to access a shared table.
A table path is the profile file path followed by `#` and the fully qualified name of a table (`<share-name>.<schema-name>.<table-name>`).
table_url = profile_file + "#<share-name>.<schema-name>.<table-name>"
Fetch 10 rows from a table and convert them to a Pandas DataFrame. This can be used to read sample data from a table that does not fit in memory.
delta_sharing.load_as_pandas(table_url, limit=10)
Load a table as a Pandas DataFrame. This can be used to process tables that fit in memory.
delta_sharing.load_as_pandas(table_url)
Load a table as a Pandas DataFrame, explicitly using the Delta format.
delta_sharing.load_as_pandas(table_url, use_delta_format=True)
If the code is running with PySpark, you can use `load_as_spark` to load the table as a Spark DataFrame.
delta_sharing.load_as_spark(table_url)
If the table supports history sharing (tableConfig.cdfEnabled=true in the OSS Delta Sharing Server), the connector can query table changes.
Load table changes from version 0 to version 5, as a Pandas DataFrame.
delta_sharing.load_table_changes_as_pandas(table_url, starting_version=0, ending_version=5)
If the code is running with PySpark, you can load table changes as a Spark DataFrame.
delta_sharing.load_table_changes_as_spark(table_url, starting_version=0, ending_version=5)
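Changes can also be bounded by timestamps instead of versions. A minimal sketch; the timestamp strings are placeholders and must fall within the table's change history:
# Load table changes between two timestamps, as a Pandas DataFrame.
delta_sharing.load_table_changes_as_pandas(
    table_url,
    starting_timestamp="2021-11-11T00:00:00Z",
    ending_timestamp="2021-11-12T00:00:00Z",
)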
Apache Spark Connector
The Apache Spark Connector implements the Delta Sharing Protocol to read shared tables from a Delta Sharing server. It can be used in SQL, Python, Java, Scala, and R.
System Requirements
Java 8+
Scala 2.12.x
Apache Spark 3+ or Databricks Runtime 9+
Accessing Shared Data
The connector loads user credentials from profile files.
Configuring Apache Spark
You can set up Apache Spark to load the Delta Sharing connector in the following two ways:
Run interactively: Start the Spark shell (Scala or Python) with the Delta Sharing connector and run the code snippets interactively in the shell.
Run as a project: Set up a Maven or SBT project (Scala or Java) with the Delta Sharing connector, copy the code snippets into a source file, and run the project.
If you are using Databricks Runtime, you can skip this section and follow the Databricks Libraries doc to install the connector on your clusters.
Set up an interactive shell
To use the Delta Sharing connector interactively within Spark's Scala or Python shell, launch the shell as follows.
PySpark Shell
pyspark --packages io.delta:delta-sharing-spark_2.12:3.1.0
Scala Shell
bin/spark-shell --packages io.delta:delta-sharing-spark_2.12:3.1.0
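Once a shell is up, you can read a shared table through the deltaSharing data source. A minimal PySpark sketch; the table URL placeholder follows the same <profile-file-path>#<share-name>.<schema-name>.<table-name> convention as in the Python connector section:
# Read a shared table as a Spark DataFrame via the deltaSharing data source.
table_url = "<profile-file-path>#<share-name>.<schema-name>.<table-name>"
df = spark.read.format("deltaSharing").load(table_url)
df.show(10)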
Set up a standalone project
To build a Java/Scala project using the Delta Sharing connector from the Maven Central Repository, use the following coordinates.
Maven
You can include the Delta Sharing connector in your Maven project by adding it as a dependency in your POM file. The Delta Sharing connector is compiled with Scala 2.12.
<dependency>
  <groupId>io.delta</groupId>
  <artifactId>delta-sharing-spark_2.12</artifactId>
  <version>3.1.0</version>
</dependency>
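SBT
For an SBT project (SBT setups are also supported, as noted above), the equivalent dependency in build.sbt would look like the line below; the %% operator appends the Scala 2.12 suffix automatically:
libraryDependencies += "io.delta" %% "delta-sharing-spark" % "3.1.0"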