Connecting Cloud Object Storage with Databricks Unity Catalog

  • 2024-10-22 12:09:01


This article is an excerpt from the book,

Figure 10.1 – Add a storage credential 

3. Enter storage credential details: Provide a name for the credential, the ARN of the IAM role that allows Unity Catalog to access the storage location in your cloud tenant, and an optional comment, then click on Create.


Figure 10.4 – Updated trust policy with External ID 
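The figure is not reproduced in this excerpt. As a rough sketch of what an IAM trust policy with an External ID condition generally looks like, the Python snippet below builds one as a dictionary. The helper name `build_trust_policy`, the principal ARN, and the External ID are placeholders chosen for illustration; the real values come from the storage credential you created and from your own AWS account.

```python
import json


def build_trust_policy(principal_arns, external_id):
    """Return an IAM trust policy (as a dict) that only allows
    sts:AssumeRole when the caller supplies the given External ID."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                # Principals that are allowed to assume the role.
                "Principal": {"AWS": list(principal_arns)},
                "Action": "sts:AssumeRole",
                # The External ID must match for the assume-role call to succeed.
                "Condition": {
                    "StringEquals": {"sts:ExternalId": external_id}
                },
            }
        ],
    }


if __name__ == "__main__":
    policy = build_trust_policy(
        # Hypothetical principal ARN; replace with the one shown for your setup.
        ["arn:aws:iam::111111111111:role/example-unity-catalog-role"],
        # Placeholder External ID; copy the real one from the storage credential.
        "00000000-0000-0000-0000-000000000000",
    )
    print(json.dumps(policy, indent=2))
```

The printed JSON can be compared against the trust relationship of the IAM role in the AWS console to check that the External ID condition is present.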

Creating an external location 

An external location contains a reference to a storage credential and a cloud storage path. You need an external location before Unity Catalog can access data in a custom storage location, such as the path that backs an external table. In this example, you will create an external location that points to the de-book-ext-loc folder in an S3 bucket. To create an external location, follow these steps:

1. Go to Catalog Explorer: Click on Catalog in the left panel to go to Catalog Explorer

2. Create external location: Click on +Add and select Add an external location
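The steps above use the Catalog Explorer UI. As an alternative sketch, an external location can also be declared with a `CREATE EXTERNAL LOCATION` SQL statement. The Python helper below only assembles that statement as a string; the function name, the location name `de_book_ext_loc`, the bucket `example-bucket`, and the credential name `de_book_credential` are hypothetical and are not taken from the book.

```python
def _quote_identifier(name):
    """Wrap an identifier in backticks; reject names containing backticks."""
    if "`" in name:
        raise ValueError(f"unsupported character in identifier: {name!r}")
    return f"`{name}`"


def _quote_string(value):
    """Wrap a value in single quotes; reject quotes and backslashes
    rather than guessing at escaping rules."""
    if "'" in value or "\\" in value:
        raise ValueError(f"unsupported character in string: {value!r}")
    return f"'{value}'"


def build_create_external_location_sql(location_name, url, credential_name,
                                       comment=None):
    """Build a CREATE EXTERNAL LOCATION statement for Unity Catalog."""
    sql = (
        f"CREATE EXTERNAL LOCATION IF NOT EXISTS {_quote_identifier(location_name)}\n"
        f"URL {_quote_string(url)}\n"
        f"WITH (STORAGE CREDENTIAL {_quote_identifier(credential_name)})"
    )
    if comment:
        sql += f"\nCOMMENT {_quote_string(comment)}"
    return sql


if __name__ == "__main__":
    # All names below are illustrative assumptions.
    print(build_create_external_location_sql(
        "de_book_ext_loc",
        "s3://example-bucket/de-book-ext-loc",
        "de_book_credential",
        comment="External location for the de-book-ext-loc folder",
    ))
```

In a Databricks notebook, the resulting string could be passed to `spark.sql(...)`, assuming the storage credential already exists and you have the privileges required to create external locations.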


Figure 10.8 – Test connection for external location 

If everything is set up correctly, you should see a screen like the following. Click on Done.

https://docs.databricks.com/en/compute/access-mode-limitations.html

  • Step-by-step guide on Databricks Unity Catalog setup and its features (Medium): https://medium.com/@sauravkum780/step-by-step-guide-on-databricks-unitycatalog-setup-and-its-features-1d0366c282b7
Conclusion

In summary, Databricks Unity Catalog provides a streamlined way to manage and access data in cloud object storage such as AWS S3, Azure Blob Storage, and Google Cloud Storage. With a unified namespace, consistent APIs, and built-in governance features, it simplifies creating and managing storage credentials and external locations. Fine-grained access controls let you securely manage data stored in different formats and cloud environments while still using Databricks' data analytics capabilities. This guide walked through setting up an IAM role and creating an external location for an S3 bucket, showing how cloud storage can be connected to Unity Catalog.

Author Bio

Pulkit Chadha is a seasoned technologist with over 15 years of experience in data engineering. His proficiency in crafting and refining data pipelines has been instrumental in driving success across diverse sectors such as healthcare, media and entertainment, hi-tech, and manufacturing. Pulkit’s tailored data engineering solutions are designed to address the unique challenges and aspirations of each enterprise he collaborates with.
