Ritu Singh
1. You can use RecordBatch.to_pylist()
to get each row as a dict, then use yield
to turn the batches into a row iterator.
import pyarrow.parquet as pq

def file_iterator(file_name, batch_size):
    parquet_file = pq.ParquetFile(file_name)
    # Stream the file in record batches rather than loading it all at once
    for record_batch in parquet_file.iter_batches(batch_size=batch_size):
        for d in record_batch.to_pylist():
            yield d

for row in file_iterator("file.parquet", 100):
    print(row)
Answer by:
>0x26res
Credit:
>Stack Overflow
To read multiple .parquet files from multiple directories into a single pandas dataframe, follow these
steps:
1. Import the required libraries
2. Create a list of directories containing .parquet files
3. Loop through the list of directories and read the .parquet files into separate dataframes
4. Concatenate the dataframes into a single dataframe