Build Your Own Connection

Overview

Delta Sharing is an open protocol for secure real-time data sharing, allowing organizations to share data across different computing platforms. This guide will walk you through the process of connecting to and accessing data through Delta Sharing.

Resources

Delta Sharing Connector Options

  • Python Connector

  • Apache Spark Connector

  • Set up an Interactive Shell

  • Set up a Standalone Project

Python Connector

The Delta Sharing Python Connector is a Python library that implements the Delta Sharing Protocol to read tables from a Delta Sharing server. You can load shared tables as a pandas DataFrame, or as an Apache Spark DataFrame if running in PySpark with the Apache Spark Connector installed.

System Requirements

  • Python 3.8+ for delta-sharing version 1.1+

  • Python 3.6+ for older versions

  • If running Linux, glibc version >= 2.31

  • For automatic installation of the delta-kernel-rust-sharing-wrapper package, see the next section for details.
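
These requirements can be checked programmatically before installing. A minimal sketch (illustrative only, not part of the delta-sharing package):

```python
import sys
import platform

def meets_requirements():
    """Return True if this interpreter meets the delta-sharing 1.1+ requirements."""
    # delta-sharing 1.1+ requires Python 3.8+
    if sys.version_info < (3, 8):
        return False
    if platform.system() == "Linux":
        # On Linux, glibc >= 2.31 is required.
        libc_name, libc_version = platform.libc_ver()
        if libc_name == "glibc" and libc_version:
            major, minor = (libc_version.split(".") + ["0"])[:2]
            if (int(major), int(minor)) < (2, 31):
                return False
    return True
```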

Installation Process

pip3 install delta-sharing

  • If you are using Databricks Runtime, you can follow Databricks Libraries doc to install the library on your clusters.

  • If installation fails due to an issue downloading delta-kernel-rust-sharing-wrapper, try the following:

    • Check that your python3 version is >= 3.8

    • Upgrade pip3 to the latest version

Accessing Shared Data

The connector accesses shared tables based on profile files, which are JSON files containing a user's credentials to access a Delta Sharing server. We have several ways to get started:

Before You Begin

  • Download a profile file from your data provider.
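
For context, a profile file is a small JSON document containing the sharing server endpoint and the credentials to reach it. A representative example (all values here are placeholders; your data provider supplies the real ones):

```json
{
  "shareCredentialsVersion": 1,
  "endpoint": "https://sharing.example.com/delta-sharing/",
  "bearerToken": "<token>",
  "expirationTime": "2025-12-31T00:00:00.0Z"
}
```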

Accessing Shared Data Options

After you save the profile file, you can use it in the connector to access shared tables.

import delta_sharing

# Point to the profile file. It can be a file on the local file system or a file on remote storage.
profile_file = "<profile-file-path>"

# Create a SharingClient.
client = delta_sharing.SharingClient(profile_file)

# List all shared tables.
client.list_all_tables()

# Create a url to access a shared table.
# A table path is the profile file path following with `#` and the fully qualified
# name of a table (`<share-name>.<schema-name>.<table-name>`).
table_url = profile_file + "#<share-name>.<schema-name>.<table-name>"

# Fetch 10 rows from a table and convert it to a pandas DataFrame. This can be used
# to read sample data from a table that cannot fit in memory.
delta_sharing.load_as_pandas(table_url, limit=10)

# Load a table as a pandas DataFrame. This can be used to process tables that can fit in memory.
delta_sharing.load_as_pandas(table_url)

# Load a table as a pandas DataFrame explicitly using the Delta format.
delta_sharing.load_as_pandas(table_url, use_delta_format=True)

# If the code is running with PySpark, you can use `load_as_spark` to load the table
# as a Spark DataFrame.
delta_sharing.load_as_spark(table_url)

# If the table supports history sharing (tableConfig.cdfEnabled=true in the OSS
# Delta Sharing Server), the connector can query table changes.
# Load table changes from version 0 to version 5, as a pandas DataFrame.
delta_sharing.load_table_changes_as_pandas(table_url, starting_version=0, ending_version=5)

# If the code is running with PySpark, you can load table changes as a Spark DataFrame.
delta_sharing.load_table_changes_as_spark(table_url, starting_version=0, ending_version=5)
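
The table URL convention described above can be illustrated with a small helper (hypothetical, not part of the delta-sharing API):

```python
def split_table_url(table_url):
    """Split '<profile-file-path>#<share>.<schema>.<table>' into its four parts."""
    profile_file, fqn = table_url.split("#", 1)
    share, schema, table = fqn.split(".")
    return profile_file, share, schema, table

# Example with a hypothetical profile path and table name:
parts = split_table_url("/tmp/config.share#my_share.my_schema.my_table")
# parts == ("/tmp/config.share", "my_share", "my_schema", "my_table")
```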

Apache Spark Connector

The Apache Spark Connector implements the Delta Sharing Protocol to read shared tables from a Delta Sharing Server. It can be used in SQL, Python, Java, Scala and R.

System Requirements

Accessing Shared Data

The connector loads user credentials from profile files.

Configuring Apache Spark

You can set up Apache Spark to load the Delta Sharing connector in the following two ways:

  • Run interactively: Start the Spark shell (Scala or Python) with the Delta Sharing connector and run the code snippets interactively in the shell.

  • Run as a project: Set up a Maven or SBT project (Scala or Java) with the Delta Sharing connector, copy the code snippets into a source file, and run the project.

If you are using Databricks Runtime, you can skip this section and follow Databricks Libraries doc to install the connector on your clusters.

Set up an interactive shell

To use the Delta Sharing connector interactively within Spark's Scala or Python shell, launch the shells as follows.

PySpark Shell

pyspark --packages io.delta:delta-sharing-spark_2.12:3.1.0

Scala Shell

bin/spark-shell --packages io.delta:delta-sharing-spark_2.12:3.1.0
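
Once a shell is running with the connector package, shared tables can be read through the `deltaSharing` data source. A minimal sketch (the helper name is illustrative; it assumes an active SparkSession):

```python
def read_shared_table(spark, table_url):
    """Read a shared table as a Spark DataFrame via the Delta Sharing connector.

    table_url follows the '<profile-file-path>#<share>.<schema>.<table>' convention.
    """
    # 'deltaSharing' is the data source format name registered by the connector package.
    return spark.read.format("deltaSharing").load(table_url)
```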

Set up a standalone project

If you want to build a Java/Scala project using the Delta Sharing connector from the Maven Central Repository, you can use the following Maven coordinates.

Maven

You can include the Delta Sharing connector in your Maven project by adding it as a dependency to your POM file. The connector is compiled with Scala 2.12.

<dependency>
  <groupId>io.delta</groupId>
  <artifactId>delta-sharing-spark_2.12</artifactId>
  <version>3.1.0</version>
</dependency>
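
If you are using SBT instead, the equivalent dependency line (matching the Maven coordinates above; `%%` appends the Scala 2.12 suffix automatically) is typically:

```scala
libraryDependencies += "io.delta" %% "delta-sharing-spark" % "3.1.0"
```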