Blog

How to upgrade your Spark Streaming application with a new checkpoint!

How to upgrade your Spark Streaming application with a new checkpoint, with working code. Sometimes we need to make breaking changes that require creating a new checkpoint. Some example scenarios:

- You are making a code/application change that alters the processing logic
- A major Spark version upgrade, e.g. from Spark 2.x to Spark 3.x
- The previous deployment was wrong, and you want to reprocess from a certain point

There are plenty of scenarios like these where you want precise control over which data (Kafka offsets) gets processed.
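
The gist of the approach can be sketched as follows. This is a minimal PySpark sketch, not the post's working code: the broker, topic, offsets, and paths are illustrative assumptions.

```python
# A minimal sketch, not the post's exact code. Topic, broker, offsets,
# and paths below are illustrative assumptions.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("checkpoint-upgrade").getOrCreate()

stream_df = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")
    .option("subscribe", "orders")
    # Pin the exact offsets to resume from, per topic and partition,
    # instead of inheriting them from the old (incompatible) checkpoint.
    .option("startingOffsets", '{"orders": {"0": 1500, "1": 1500}}')
    .load()
)

query = (
    stream_df.writeStream.format("delta")
    # A brand-new checkpoint directory: the old checkpoint is abandoned,
    # so startingOffsets above decides where processing resumes.
    .option("checkpointLocation", "/tmp/checkpoints/orders_v2")
    .start("/tmp/delta/orders")
)
```

Note that startingOffsets is only honored on the first run against a given checkpoint, which is precisely why a fresh checkpoint directory is needed here.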

Continue reading

How to parameterize Delta Live Tables and import reusable functions

How to parameterize Delta Live Tables and import reusable functions, with working code. This blog discusses passing custom parameters to a Delta Live Tables (DLT) pipeline, as well as importing functions defined in other files or locations. You can import files from the current directory or from a specified location using sys.path.append(). Update: as of December 2022, if reusable_functions.py exists in the same repository, you can import it directly with a plain import statement, which is the preferred approach.
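
A minimal sketch of both techniques, assuming a DLT pipeline whose settings define a hypothetical configuration key pipeline.source_path, an illustrative repo path, and a hypothetical clean_columns() helper (spark here is the session Databricks provides to the notebook):

```python
# A minimal sketch, assuming a DLT pipeline whose settings define a
# configuration key "pipeline.source_path", and a reusable_functions.py at
# an illustrative repo path with a hypothetical clean_columns() helper.
import sys

# Pre-Dec-2022 approach: make modules at another location importable.
sys.path.append("/Workspace/Repos/user@example.com/my_repo/utils")
import reusable_functions  # hypothetical module

import dlt

@dlt.table
def raw_events():
    # Custom parameters arrive through the pipeline's configuration block
    # and are read back with spark.conf.get().
    source_path = spark.conf.get("pipeline.source_path")
    df = spark.read.format("json").load(source_path)
    return reusable_functions.clean_columns(df)  # hypothetical helper
```

With the post-December-2022 behavior, the sys.path.append() line can be dropped whenever the module sits in the same repository.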

Continue reading

Merge Multiple Spark Streams Into A Delta Table

Merge Multiple Spark Streams Into A Delta Table, with working code. This blog discusses how to read from multiple Spark streams and merge/upsert the data into a single Delta table, and how to optimize/cluster the Delta table's data. Overall, the process works as follows: read data from a streaming source, then use the special foreachBatch function to call a user-defined function that is responsible for all the processing.
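
The pattern can be sketched as follows, assuming an id-keyed target table at an illustrative path and two made-up source streams; this is not the post's exact code:

```python
# A minimal sketch, assuming an id-keyed target Delta table and two
# illustrative source streams; all paths and column names are assumptions.
from delta.tables import DeltaTable
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
TARGET_PATH = "/tmp/delta/customers"

def upsert_to_delta(micro_batch_df, batch_id):
    # The user-defined function that foreachBatch calls for every
    # micro-batch; it merges (upserts) the batch into the target table.
    target = DeltaTable.forPath(spark, TARGET_PATH)
    (target.alias("t")
        .merge(micro_batch_df.alias("s"), "t.id = s.id")
        .whenMatchedUpdateAll()
        .whenNotMatchedInsertAll()
        .execute())

# Several streams write through the same foreachBatch function, so they
# all converge on one Delta table (each stream needs its own checkpoint).
for name in ["stream_a", "stream_b"]:
    (spark.readStream.format("delta").load(f"/tmp/raw/{name}")
        .writeStream
        .foreachBatch(upsert_to_delta)
        .option("checkpointLocation", f"/tmp/checkpoints/{name}")
        .start())
```

Routing every stream through one merge function keeps the upsert logic in a single place, at the cost of possible write conflicts between concurrent merges on the same table.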

Continue reading

Using Spark Streaming to merge/upsert data into a Delta Lake with working code

Using Spark Streaming to merge/upsert data into a Delta Lake, with working code. This blog discusses how to read from a Spark stream and merge/upsert the data into a Delta Lake, and how to optimize/cluster the Delta table's data. In the end, we show how to start a streaming pipeline with the previous target table as the source. Overall, the process works as follows: we read data from a streaming source and use the special foreachBatch function.
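
A minimal sketch of the chaining step, assuming the upserted Delta table from above sits at an illustrative path and is Z-ordered by a hypothetical id column:

```python
# A minimal sketch, assuming the upserted Delta table from the previous
# step lives at an illustrative path; all names are assumptions.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Optimize/cluster the table's files, Z-ordering by an assumed key column.
spark.sql("OPTIMIZE delta.`/tmp/delta/customers` ZORDER BY (id)")

# Chain pipelines: the previous target table becomes a streaming source.
# ignoreChanges is needed because MERGE rewrites files, which a plain
# Delta streaming read would otherwise reject.
downstream = (
    spark.readStream.format("delta")
    .option("ignoreChanges", "true")
    .load("/tmp/delta/customers")
    .writeStream.format("delta")
    .option("checkpointLocation", "/tmp/checkpoints/customers_gold")
    .start("/tmp/delta/customers_gold")
)
```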

Continue reading

ARC Uses a Lakehouse Architecture for Real-time Data Insights That Optimize Drilling Performance and Lower Carbon Emissions

ARC has deployed the Databricks Lakehouse Platform to enable its drilling engineers to monitor operational metrics in near real time, so that we can proactively identify potential issues and take agile mitigation measures. In addition to improving drilling precision, this solution has helped us reduce drilling time in one of our fields. That time saving translates to a reduction in fuel used, and therefore a reduction in the CO2 footprint that results from drilling operations.

Continue reading

How Audantic Uses Databricks Delta Live Tables to Increase Productivity for Real Estate Market Segments

To support our data-driven initiatives, we had 'stitched' together various services for ETL, orchestration, and ML on AWS with Airflow. We saw some success, but it quickly turned into an overly complex system that took nearly five times as long to develop as the new solution. Our team captured high-level metrics comparing our previous implementation with the current lakehouse solution. As the table below shows, we spent months developing our previous solution and had to write approximately three times as much code.

Continue reading

1 on 1 Interview Coaching

Welcome! I am a Super Coach. I have worked at some of the biggest tech companies, including Databricks and Amazon. I use my platform to share resources and to show folks a path of growth that exists without taking the management-ladder track, while inspiring many immigrants and people of colour to understand and realize their true market potential. To further my mission, I help folks negotiate job offers: $2+ million negotiated so far.

Continue reading