Question:

How to iterate and read one row at a time from multiple parquet files?

You can use RecordBatch.to_pylist to convert each record batch into a list of rows (one dict per row), then use yield to turn the function into an iterator.

import pyarrow.parquet as pq

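def file_iterator(file_name, batch_size):
    """Yield one row (as a dict) at a time from a single parquet file."""
    parquet_file = pq.ParquetFile(file_name)
    # Only batch_size rows are loaded into memory at a time.
    for record_batch in parquet_file.iter_batches(batch_size=batch_size):
        for d in record_batch.to_pylist():
            yield d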

for row in file_iterator("file.parquet", 100):
    print(row)

Answer by: 0x26res

Credit: Stack Overflow

To read multiple .parquet files from multiple directories into a single pandas dataframe, we will use the following steps:

1. Import the required libraries
2. Create a list of directories containing .parquet files
3. Loop through the list of directories and read the .parquet files into separate dataframes
4. Concatenate the dataframes into a single dataframe

Python

Ritu Singh
