streaming

How to write your first Spark application with Stream-Stream Joins with working code.

How to write your first Spark application with Stream-Stream Joins with working code. Have you been waiting to try Streaming but cannot take the plunge? In a single blog, we will teach you whatever needs to be understood about Streaming Joins. We will give you a working code which you can use for your next Streaming Pipeline. The steps involved: Create a fake dataset at scale Set a baseline using traditional SQL Define Temporary Streaming Views Inner Joins with optional Watermarking Left Joins with Watermarking The cold start edge case: withEventTimeOrder Cleanup What is Stream-Stream Join?

Continue reading

Dive Deep into Spark Streaming Checkpoint

From Beginner to Pro: A Comprehensive Guide to understanding the Spark Streaming Checkpoint Spark is a distributed computing framework that allows for processing large datasets in parallel across a cluster of computers. When running a Spark job, it is not uncommon to encounter failures due to various issues such as network or hardware failures, software bugs, or even insufficient memory. One way to address these issues is to re-run the entire job from the beginning, which can be time-consuming and inefficient.

Continue reading

How I wrote my first Spark Streaming Application with Joins?

How I wrote my first Spark Streaming Application with Joins with working code When I started learning about Spark Streaming, I could not find enough code/material which could kick-start my journey and build my confidence. I wrote this blog to fill this gap which could help beginners understand how simple streaming is and build their first application. In this blog, I will explain most things by first principles to increase your understanding and confidence and you walk away with code for your first Streaming application.

Continue reading