A Video Streaming Website Keeps Count


Wesley Godinez

Aug 3, 2024, 1:03:36 PM

to lourssourcjuncha

Other times, when I am listening to web streaming and the phone goes into lock mode, I tap on the screen and find iTunes open, and then it plays audio!? Each and every time, I had not had iTunes open on my phone before it went into lock mode.

If you are on a website playing audio or video, but your screen locks/goes to sleep, it is normal for the audio and video to stop playing at that time. You would want to adjust the Auto-Lock setting under Settings > Display & Brightness > Auto-Lock, if you wish to keep your iPhone from going to sleep. You can also look into enabling 'Attention Aware Features' if you are indeed using an iPhone XS as your post suggests: About Attention Aware features on your iPhone X or iPad Pro - Apple Support.

Internally, it works as follows. Spark Streaming receives live input data streams and divides the data into batches, which are then processed by the Spark engine to generate the final stream of results in batches.
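To make the batching idea concrete, here is a minimal plain-Python sketch. It does not use the Spark API; `micro_batches` is a hypothetical helper, and it assumes events arrive as `(timestamp_in_seconds, record)` pairs in non-decreasing time order.

```python
from typing import Iterable, Iterator, List, Tuple


def micro_batches(
    events: Iterable[Tuple[float, str]], batch_interval: float
) -> Iterator[List[str]]:
    """Group timestamped records into consecutive fixed-length batches.

    Hypothetical helper for illustration only. In this model an interval
    with no records still produces an (empty) batch.
    """
    batch: List[str] = []
    batch_end = batch_interval
    for timestamp, record in events:
        # Close every batch whose interval ended before this record arrived.
        while timestamp >= batch_end:
            yield batch
            batch = []
            batch_end += batch_interval
        batch.append(record)
    # Emit the batch that was still open when the input ended.
    yield batch


events = [(0.1, "a"), (0.7, "b"), (1.2, "c"), (3.5, "d")]
batches = list(micro_batches(events, batch_interval=1.0))
print(batches)
```

Each inner list plays the role of one batch handed to the engine; the interval between 2 s and 3 s has no input, so its batch is empty.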

Spark Streaming provides a high-level abstraction called discretized stream or DStream, which represents a continuous stream of data. DStreams can be created either from input data streams from sources such as Kafka and Kinesis, or by applying high-level operations on other DStreams. Internally, a DStream is represented as a sequence of RDDs.

This guide shows you how to start writing Spark Streaming programs with DStreams. You can write Spark Streaming programs in Scala, Java or Python (introduced in Spark 1.2), all of which are presented in this guide. You will find tabs throughout this guide that let you choose between code snippets of different languages.

flatMap is a one-to-many DStream operation that creates a new DStream by generating multiple new records from each record in the source DStream. In this case, each line will be split into multiple words and the stream of words is represented as the words DStream. Next, we want to count these words.

The words DStream is further mapped (one-to-one transformation) to a DStream of (word, 1) pairs, which is then reduced to get the frequency of words in each batch of data. Finally, wordCounts.pprint() will print a few of the counts generated every second.
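The three steps just described (split lines into words, pair each word with 1, reduce per word) can be sketched for a single batch in plain Python. This is a model of the per-batch semantics, not Spark code; `word_counts` is a hypothetical name, and dropping empty strings after the split is an assumption added here for tidier output.

```python
from typing import Dict, List


def word_counts(batch: List[str]) -> Dict[str, int]:
    """Count words in one batch of lines (plain-Python model, not the Spark API)."""
    # flatMap step: one line becomes zero or more words.
    # Assumption: empty strings from repeated spaces are dropped.
    words = [word for line in batch for word in line.split(" ") if word]
    # map step: one word becomes exactly one (word, 1) pair.
    pairs = [(word, 1) for word in words]
    # reduce-by-key step: add up the 1s for each distinct word.
    counts: Dict[str, int] = {}
    for word, one in pairs:
        counts[word] = counts.get(word, 0) + one
    return counts


# Two batches of lines; each batch is counted independently of the other.
batches = [["hello world", "hello spark"], ["spark streaming"]]
results = [word_counts(batch) for batch in batches]
print(results)
```

Note that "spark" is counted once in each batch rather than twice overall, because the counts are per batch.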

Note that when these lines are executed, Spark Streaming only sets up the computation it will perform when it is started, and no real processing has started yet. To start the processing after all the transformations have been set up, we finally call the start method.

First, we import the names of the Spark Streaming classes and some implicit conversions from StreamingContext into our environment in order to add useful methods to other classes we need (like DStream). StreamingContext is the main entry point for all streaming functionality. We create a local StreamingContext with two execution threads, and a batch interval of 1 second.

This lines DStream represents the stream of data that will be received from the data server. Each record in this DStream is a line of text. Next, we want to split the lines by space characters into words.

The words DStream is further mapped (one-to-one transformation) to a DStream of (word, 1) pairs, which is then reduced to get the frequency of words in each batch of data. Finally, wordCounts.print() will print a few of the counts generated every second.

First, we create a JavaStreamingContext object, which is the main entry point for all streaming functionality. We create a local StreamingContext with two execution threads, and a batch interval of 1 second.

flatMap is a DStream operation that creates a new DStream by generating multiple new records from each record in the source DStream. In this case, each line will be split into multiple words and the stream of words is represented as the words DStream. Note that we defined the transformation using a FlatMapFunction object. As we will discover along the way, there are a number of such convenience classes in the Java API that help define DStream transformations.

The words DStream is further mapped (one-to-one transformation) to a DStream of (word, 1) pairs, using a PairFunction object. Then, it is reduced to get the frequency of words in each batch of data, using a Function2 object. Finally, wordCounts.print() will print a few of the counts generated every second.

Note that when these lines are executed, Spark Streaming only sets up the computation it will perform after it is started, and no real processing has started yet. To start the processing after all the transformations have been set up, we finally call the start method.
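The "set up first, run only on start" behaviour can be illustrated with a small plain-Python model. `LazyPipeline` is a hypothetical class written for this sketch and is not part of Spark; it only records transformations until `start` is called.

```python
from typing import Any, Callable, List


class LazyPipeline:
    """Records transformations and applies them only when start() is called."""

    def __init__(self) -> None:
        self._steps: List[Callable[[Any], Any]] = []

    def map(self, fn: Callable[[Any], Any]) -> "LazyPipeline":
        # Only remember the function; nothing is computed here.
        self._steps.append(fn)
        return self

    def start(self, batches: List[List[Any]]) -> List[List[Any]]:
        # Processing happens here, one batch at a time.
        results = []
        for batch in batches:
            for fn in self._steps:
                batch = [fn(item) for item in batch]
            results.append(batch)
        return results


log: List[str] = []


def shout(text: str) -> str:
    log.append(text)  # side effect so we can see when the function really runs
    return text.upper()


pipeline = LazyPipeline().map(shout)
calls_before_start = len(log)  # the transformation has not run yet
output = pipeline.start([["a", "b"], ["c"]])
calls_after_start = len(log)
print(calls_before_start, calls_after_start, output)
```

Declaring the `map` step leaves `log` empty; the function is only invoked once `start` feeds batches through the recorded steps.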

For ingesting data from sources like Kafka and Kinesis that are not present in the Spark Streaming core API, you will have to add the corresponding artifact spark-streaming-xyz_2.12 to the dependencies. For example, some of the common ones are as follows.

Any operation applied on a DStream translates to operations on the underlying RDDs. For example, in the earlier example of converting a stream of lines to words, the flatMap operation is applied on each RDD in the lines DStream to generate the RDDs of the words DStream. This is shown in the following figure.
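As a rough plain-Python picture of the same idea, a stream can be modelled as a list of batches, with each stream-level operation implemented by applying the matching operation to every batch. `BatchStream` and `flat_map` are hypothetical names for this sketch, and lists stand in for RDDs.

```python
from typing import Callable, Generic, Iterable, List, TypeVar

T = TypeVar("T")
U = TypeVar("U")


class BatchStream(Generic[T]):
    """A stream modelled as a sequence of batches (lists stand in for RDDs)."""

    def __init__(self, batches: List[List[T]]) -> None:
        self.batches = batches

    def flat_map(self, fn: Callable[[T], Iterable[U]]) -> "BatchStream[U]":
        # The stream-level operation is just the per-batch operation,
        # applied independently to every batch.
        return BatchStream(
            [[out for item in batch for out in fn(item)] for batch in self.batches]
        )


lines = BatchStream([["to be", "or not"], ["to be"]])
words = lines.flat_map(lambda line: line.split(" "))
print(words.batches)
```

The number of batches is unchanged: each batch of lines maps to exactly one batch of words.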

These underlying RDD transformations are computed by the Spark engine. The DStream operations hide most of these details and provide the developer with a higher-level API for convenience. These operations are discussed in detail in later sections.

Note that, if you want to receive multiple streams of data in parallel in your streaming application, you can create multiple input DStreams (discussed further in the Performance Tuning section). This will create multiple receivers which will simultaneously receive multiple data streams. But note that a Spark worker/executor is a long-running task, hence it occupies one of the cores allocated to the Spark Streaming application. Therefore, it is important to remember that a Spark Streaming application needs to be allocated enough cores (or threads, if running locally) to process the received data, as well as to run the receiver(s).

Extending the logic to running on a cluster, the number of cores allocated to the Spark Streaming application must be more than the number of receivers. Otherwise the system will receive data, but not be able to process it.
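The core-budget rule is simple arithmetic, sketched here as a sanity check one might run before launching a job. `cores_left_for_processing` is a hypothetical helper, and it assumes each receiver permanently occupies exactly one core (or local thread).

```python
def cores_left_for_processing(total_cores: int, num_receivers: int) -> int:
    """Return how many cores remain for processing after receivers take theirs.

    Hypothetical helper. Assumes one core per receiver. Raises ValueError
    when nothing would be left to process the received data.
    """
    if total_cores <= num_receivers:
        raise ValueError(
            f"{total_cores} core(s) cannot serve {num_receivers} receiver(s) "
            "and still process data; allocate more cores than receivers"
        )
    return total_cores - num_receivers


# Two local threads with one receiver leave one thread for processing.
print(cores_left_for_processing(total_cores=2, num_receivers=1))
```

With a single core and a single receiver the function raises, matching the case where data would be received but never processed.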

We have already taken a look at the ssc.socketTextStream(...) in the quick example which creates a DStream from text data received over a TCP socket connection. Besides sockets, the StreamingContext API provides methods for creating DStreams from files as input sources.

For reading data from files on any file system compatible with the HDFS API (that is, HDFS, S3, NFS, etc.), a DStream can be created via StreamingContext.fileStream[KeyClass, ValueClass, InputFormatClass].

To guarantee that changes are picked up in a window, write the file to an unmonitored directory, then, immediately after the output stream is closed, rename it into the destination directory. Provided the renamed file appears in the scanned destination directory during the window of its creation, the new data will be picked up.
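On a local or POSIX-style file system, the write-then-rename pattern might look like the following sketch. `publish_by_rename` and the directory names are hypothetical, and the sketch assumes both directories live on the same file system so that the rename is a cheap metadata operation.

```python
import os
import tempfile


def publish_by_rename(staging_dir: str, monitored_dir: str, name: str, data: str) -> str:
    """Write a file in an unmonitored directory, then rename it into place.

    Hypothetical helper. Assumes staging_dir and monitored_dir are on the
    same file system.
    """
    staging_path = os.path.join(staging_dir, name)
    with open(staging_path, "w", encoding="utf-8") as handle:
        handle.write(data)
    # The output stream is closed at this point, so the file is complete.
    final_path = os.path.join(monitored_dir, name)
    os.rename(staging_path, final_path)
    return final_path


with tempfile.TemporaryDirectory() as root:
    staging = os.path.join(root, "staging")    # not scanned by the stream
    monitored = os.path.join(root, "incoming")  # the scanned destination
    os.makedirs(staging)
    os.makedirs(monitored)
    publish_by_rename(staging, monitored, "part-0001.txt", "hello world\n")
    published = os.listdir(monitored)
    left_in_staging = os.listdir(staging)

print(published, left_in_staging)
```

A reader scanning the monitored directory never sees a half-written file, because the file only appears there after it has been fully written and closed.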

In contrast, object stores such as Amazon S3 and Azure Storage usually have slow rename operations, as the data is actually copied. Furthermore, a renamed object may have the time of the rename() operation as its modification time, so it may not be considered part of the window that its original creation time implied.

Careful testing is needed against the target object store to verify that the timestamp behavior of the store is consistent with that expected by Spark Streaming. It may be that writing directly into a destination directory is the appropriate strategy for streaming data via the chosen object store.

For testing a Spark Streaming application with test data, one can also create a DStream based on a queue of RDDs, using streamingContext.queueStream(queueOfRDDs). Each RDD pushed into the queue will be treated as a batch of data in the DStream, and processed like a stream.
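The queue-as-stream idea can be modelled in plain Python with a `deque` of lists, where each list stands in for one RDD. `drain` is a hypothetical helper, and the model ignores timing: it simply treats every queued item as one batch, in order.

```python
from collections import deque
from typing import Any, Callable, Deque, List


def drain(queue: Deque[List[Any]], process: Callable[[List[Any]], Any]) -> List[Any]:
    """Treat each queued list as one batch and process the batches in order.

    Hypothetical helper; lists stand in for RDDs and timing is ignored.
    """
    results = []
    while queue:
        batch = queue.popleft()  # oldest batch first
        results.append(process(batch))
    return results


test_queue: Deque[List[int]] = deque([[1, 2, 3], [4, 5]])
test_queue.append([6])  # batches pushed later are processed later
sums = drain(test_queue, sum)
print(sums)
```

This makes test input deterministic: the test controls exactly which records land in which batch.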

This category of sources requires interfacing with external non-Spark libraries, some of them with complex dependencies (e.g., Kafka). Hence, to minimize issues related to version conflicts of dependencies, the functionality to create DStreams from these sources has been moved to separate libraries that can be linked to explicitly when necessary.

Input DStreams can also be created out of custom data sources. All you have to do is implement a user-defined receiver (see next section to understand what that is) that can receive data from the custom sources and push it into Spark. See the Custom Receiver Guide for details.

There can be two kinds of data sources based on their reliability. Sources (like Kafka) allow the transferred data to be acknowledged. If the system receiving data from these reliable sources acknowledges the received data correctly, it can be ensured that no data will be lost due to any kind of failure. This leads to two kinds of receivers:

In every batch, Spark will apply the state update function for all existing keys, regardless of whether they have new data in a batch or not. If the update function returns None then the key-value pair will be eliminated.
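The two rules in the preceding paragraph (the update function runs for every existing key, and returning None removes the key) can be modelled in plain Python as follows. `update_state`, `running_count`, and `drop_idle` are hypothetical names; the update functions take the new values for a key and its previous state, in that order, which is an assumption of this sketch.

```python
from typing import Callable, Dict, List, Optional, Tuple

UpdateFn = Callable[[List[int], Optional[int]], Optional[int]]


def update_state(
    state: Dict[str, int], batch: List[Tuple[str, int]], update_fn: UpdateFn
) -> Dict[str, int]:
    """Apply update_fn to every key seen so far or in this batch (model only)."""
    new_values: Dict[str, List[int]] = {}
    for key, value in batch:
        new_values.setdefault(key, []).append(value)
    next_state: Dict[str, int] = {}
    # Existing keys are visited even when the batch has no data for them.
    for key in set(state) | set(new_values):
        updated = update_fn(new_values.get(key, []), state.get(key))
        if updated is not None:  # None means: eliminate this key
            next_state[key] = updated
    return next_state


def running_count(values: List[int], previous: Optional[int]) -> Optional[int]:
    return (previous or 0) + sum(values)


def drop_idle(values: List[int], previous: Optional[int]) -> Optional[int]:
    # Hypothetical policy: forget keys that received nothing in this batch.
    return running_count(values, previous) if values else None


state1 = update_state({}, [("a", 1), ("b", 1), ("a", 1)], running_count)
state2 = update_state(state1, [("a", 1)], running_count)  # "b" kept, unchanged
state3 = update_state(state2, [("a", 1)], drop_idle)      # "b" eliminated
print(state1, state2, state3)
```

Because the function is invoked for every stored key in every batch, the cost of each update grows with the number of keys held in state, not just with the size of the batch.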

The transform operation (along with its variations like transformWith) allows arbitrary RDD-to-RDD functions to be applied on a DStream. It can be used to apply any RDD operation that is not exposed in the DStream API. For example, the functionality of joining every batch in a data stream with another dataset is not directly exposed in the DStream API. However, you can easily use transform to do this. This enables very powerful possibilities. For example, one can do real-time data cleaning by joining the input data stream with precomputed spam information (maybe generated with Spark as well) and then filtering based on it.
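The spam-cleaning example can be sketched per batch in plain Python: join each record with the precomputed spam information by key, then filter on the joined value. This models the batch-level function one would hand to transform, not the Spark API itself. `drop_spam`, the 0.5 threshold, the sample users, and treating unknown users as score 0.0 are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple


def drop_spam(
    batch: List[Tuple[str, str]],
    spam_scores: Dict[str, float],
    threshold: float = 0.5,
) -> List[Tuple[str, str]]:
    """Join one batch of (user, message) pairs with spam scores, then filter.

    Hypothetical batch-level function. Users missing from spam_scores are
    assumed to have score 0.0 (a left-outer-join style default).
    """
    # Join step: attach the precomputed score to every record by key.
    joined = [(user, (message, spam_scores.get(user, 0.0))) for user, message in batch]
    # Filter step: keep only records whose score is below the threshold.
    return [(user, message) for user, (message, score) in joined if score < threshold]


spam_scores = {"bot42": 0.97, "alice": 0.01}  # precomputed elsewhere
batches = [
    [("alice", "hi"), ("bot42", "buy now")],
    [("bob", "hello")],
]
# The same batch-level function is applied to every batch of the stream.
cleaned = [drop_spam(batch, spam_scores) for batch in batches]
print(cleaned)
```

Since the function runs once per batch, the spam dataset it closes over could be swapped between batches without changing the stream definition.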