Question:
How can you filter dates with an OR clause in PySpark?

In PySpark, you can filter dates with an OR clause by combining multiple filter conditions using the | (bitwise OR) operator. Here's how you can do it:


Python

from pyspark.sql import SparkSession
from pyspark.sql.functions import col

# Create a Spark session
spark = SparkSession.builder.appName("DateFilterExample").getOrCreate()

# Sample DataFrame with a date column
data = [("2023-09-19",), ("2023-09-20",), ("2023-09-21",), ("2023-09-22",)]
columns = ["date"]
df = spark.createDataFrame(data, columns)

# Define the individual date filter conditions
condition1 = (col("date") == "2023-09-20")
condition2 = (col("date") == "2023-09-22")

# Apply the OR clause to keep rows matching either condition
filtered_df = df.filter(condition1 | condition2)

# Show the filtered DataFrame
filtered_df.show()
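Running the snippet above should print something like the following (exact spacing depends on Spark's default show formatting, and row order is not guaranteed):

+----------+
|      date|
+----------+
|2023-09-20|
|2023-09-22|
+----------+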

In this example, we first import the necessary modules and create a Spark session. We then create a sample DataFrame df with a date column. Next, we define two filter conditions using the col function and the == operator. Finally, we pass both conditions to the filter method joined with the | operator, which applies the OR clause so that rows matching either condition1 or condition2 are kept. The resulting filtered_df contains only the rows where the date is "2023-09-20" or "2023-09-22".
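As a side note, the same OR logic can be written in a few other ways. The sketch below is a minimal illustration assuming the same df as above; the variable names filtered_sql, filtered_isin, df_dates, and filtered_range are just illustrative.

Python

from pyspark.sql.functions import col, to_date

# The same OR clause written as a SQL-style expression string
filtered_sql = df.filter("date = '2023-09-20' OR date = '2023-09-22'")

# isin is a concise alternative when matching against a list of specific dates
filtered_isin = df.filter(col("date").isin("2023-09-20", "2023-09-22"))

# For range-based conditions, cast the string column to a proper DateType first,
# then combine comparisons with | in the same way
df_dates = df.withColumn("date", to_date(col("date")))
filtered_range = df_dates.filter(
    (col("date") < "2023-09-20") | (col("date") > "2023-09-21")
)

Any of these produce an equivalent OR filter; which one to use is mostly a matter of readability and whether the column is stored as a string or a proper date type.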



Ritu Singh