Why Does Concatenation Of Data Frames Get Exponentially Slower?

In this post, we will learn why concatenating DataFrames gets slower and slower as the data grows. The main reason is the way pandas handles memory allocation and copying during the concatenation process.

Concatenation Of Data Frames

There are situations where we have to add one DataFrame to another. This is not the same as adding rows to the same DataFrame: the data of both DataFrames has to be copied into a new DataFrame, and the memory used grows with it, because pandas does not know in advance how large the DataFrame we are going to add will be.

pandas cannot grow an existing DataFrame in place. Each column's values are stored in fixed-size arrays, so there is no spare room at the end for new rows. When we concatenate, pandas has to allocate new memory large enough for the combined result and copy the data from every input DataFrame into it; the old memory is freed once nothing refers to it any more. A single concatenation costs time proportional to the size of the result, but the cost adds up quickly when it is repeated: if we keep appending to a growing DataFrame inside a loop, every step copies all the rows accumulated so far, so the total amount of copying grows roughly with the square of the number of pieces. This is the slowdown that is often loosely described as exponential.

Because the combined DataFrame is larger than either input, pandas needs to allocate new memory for it and copy the existing data across every time we concatenate. This allocation and copying is one of the main reasons the operation becomes so slow.
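To make the cost concrete, here is a minimal sketch that counts how many rows are written when a DataFrame is grown one piece at a time. The function name rows_copied_by_repeated_concat and the piece sizes are hypothetical, chosen only for illustration, and counting the rows of each result is used as a rough stand-in for the copying work rather than a real timing measurement.

```python
import pandas as pd

def rows_copied_by_repeated_concat(n_pieces, rows_per_piece):
    """Grow a DataFrame piece by piece and count the rows each concat writes.

    Hypothetical helper for illustration: len(result) after each concat is
    used as a proxy for how much data had to be copied into the new result.
    """
    result = pd.DataFrame({'A': range(rows_per_piece)})
    copied = 0
    for _ in range(n_pieces - 1):
        piece = pd.DataFrame({'A': range(rows_per_piece)})
        # pd.concat builds a brand-new DataFrame holding old rows plus new rows
        result = pd.concat([result, piece], ignore_index=True)
        copied += len(result)
    return copied, len(result)

copied_10, final_10 = rows_copied_by_repeated_concat(10, 1000)
copied_20, final_20 = rows_copied_by_repeated_concat(20, 1000)

print(final_10, copied_10)
print(final_20, copied_20)
```

Doubling the number of pieces doubles the final size but roughly quadruples the rows copied along the way, which is why repeated concatenation feels much slower than the growth in data alone would suggest.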

The following example shows a single concatenation of two large DataFrames.

import pandas as pd

# Generate two large DataFrames with sequential integer data
df1 = pd.DataFrame({'A': range(10000000), 'B': range(10000000)})
df2 = pd.DataFrame({'A': range(10000000, 20000000), 'B': range(10000000, 20000000)})

# Concatenate the DataFrames; pandas allocates a new DataFrame and copies
# the rows of df1 and df2 into it, renumbering the index from 0
concatenated_df = pd.concat([df1, df2], ignore_index=True)

# Print the shape of the concatenated DataFrame
print(concatenated_df.shape)

 

Here we concatenated two DataFrames, df1 and df2, and stored the result in a new DataFrame named concatenated_df. At the end we simply printed its shape.
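When there are many pieces rather than two, a common way to avoid the repeated copying is to collect them in a list and call pd.concat once. The sketch below assumes ten small DataFrames built in a loop; the names pieces and combined and the sizes are made up for illustration.

```python
import pandas as pd

# Build the pieces first and keep them in an ordinary Python list
pieces = []
for i in range(10):
    start = i * 1000
    pieces.append(pd.DataFrame({'A': range(start, start + 1000)}))

# One concat call: the result is allocated once and each row is copied once
combined = pd.concat(pieces, ignore_index=True)

print(combined.shape)
```

Appending to a list does not copy any DataFrame data, so the only large allocation happens in the final pd.concat call.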

 

To learn more about why concatenation of DataFrames gets slower, visit: Stack Overflow.

For more Python solutions, tutorials, and answers to commonly asked problems, see: How To Pass-Variables From A Java & Python Client To A Linux/Ubuntu Server Which Is Running C?.
