R Tips: how to initialize an empty data frame

If you use R for your daily work, you might sometimes need to initialize an empty data frame and then append data to it with rbind(). This is very inefficient, because every call like a <- rbind(a, b) makes R allocate a new data frame and copy all of the existing rows into it. However, there are many situations where you cannot preallocate memory for your data frame and need to start with an empty one and grow it gradually.
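To make the cost concrete, here is a minimal sketch that contrasts the two approaches when the number of rows *is* known in advance. The size `n` and the column names `i` and `sq` are arbitrary choices for illustration. The first loop grows a data frame with `rbind()`; the second fills preallocated vectors by index and builds the data frame once at the end. Both produce the same data.

```r
n <- 500

# Growing: every rbind() builds a new data frame and copies the old rows.
grown <- data.frame(i = integer(0), sq = numeric(0))
for (k in seq_len(n)) {
  grown <- rbind(grown, data.frame(i = k, sq = k^2))
}

# Preallocating: the vectors are allocated once at full length,
# filled by index, and turned into a data frame in a single step.
i_vals  <- integer(n)
sq_vals <- numeric(n)
for (k in seq_len(n)) {
  i_vals[k]  <- k
  sq_vals[k] <- k^2
}
prealloc <- data.frame(i = i_vals, sq = sq_vals)
```

Wrapping each loop in `system.time()` is an easy way to see the gap grow as `n` increases.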

One scenario where this approach comes in handy is when you have a bunch of similar CSV files that contain more or less the same information, but each file is for a different date and the number of rows in each file might vary slightly. For example, assume you are storing the stock prices for different companies in files like “2012-3-4.csv” and “2012-3-5.csv”. Below are the contents of the two files. Note that the first file has 3 rows while the second one has 2. The number of rows in each file is not known to the programmer a priori, so it is not feasible to preallocate memory for the data frame properly.
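The following sketch creates two files with this layout so you can follow along. The column names `company` and `price`, the company names and the prices are all hypothetical, made up for illustration; only the file names and the row counts (3 and 2) follow the description above. The files are written to R's session temporary directory.

```r
# Hypothetical sample data: company names and prices are invented.
data_dir <- tempdir()

writeLines(c("company,price",
             "ACME,10.5",
             "GLOBEX,20.1",
             "INITECH,5.2"),
           file.path(data_dir, "2012-3-4.csv"))

writeLines(c("company,price",
             "ACME,10.9",
             "INITECH,5.0"),
           file.path(data_dir, "2012-3-5.csv"))

# Read them back to confirm the differing row counts.
day1 <- read.csv(file.path(data_dir, "2012-3-4.csv"), stringsAsFactors = FALSE)
day2 <- read.csv(file.path(data_dir, "2012-3-5.csv"), stringsAsFactors = FALSE)
nrow(day1)  # 3
nrow(day2)  # 2
```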

 

You may want a single data frame that concatenates the two files, with the date attached as a third column.

We can initialize an empty data frame with the proper column types to store all of the data.
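A minimal sketch of such an empty data frame, assuming the hypothetical column names `company`, `price` and `date`: each column is created as a zero-length vector of the desired type, so the data frame has zero rows but already knows its column types. `as.Date(character(0))` is used to get an empty `Date` vector.

```r
# Zero rows, three typed columns: character, numeric and Date.
prices <- data.frame(company = character(0),
                     price   = numeric(0),
                     date    = as.Date(character(0)),
                     stringsAsFactors = FALSE)

str(prices)  # shows 0 obs. of 3 variables with their types
```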

Now we can open each file, read its contents and append them to our empty data frame.
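Here is a self-contained sketch of that loop. It assumes the files hold hypothetical `company` and `price` columns (the sample values are invented) and that the date can be recovered from each file name by stripping the `.csv` extension. The files are placed in a scratch subdirectory so that `list.files()` only picks up these two.

```r
# Hypothetical setup: two small CSV files with made-up prices.
data_dir <- file.path(tempdir(), "prices")
dir.create(data_dir, showWarnings = FALSE)
writeLines(c("company,price", "ACME,10.5", "GLOBEX,20.1", "INITECH,5.2"),
           file.path(data_dir, "2012-3-4.csv"))
writeLines(c("company,price", "ACME,10.9", "INITECH,5.0"),
           file.path(data_dir, "2012-3-5.csv"))

# Start from an empty, correctly typed data frame.
prices <- data.frame(company = character(0),
                     price   = numeric(0),
                     date    = as.Date(character(0)),
                     stringsAsFactors = FALSE)

for (f in list.files(data_dir, pattern = "\\.csv$", full.names = TRUE)) {
  day <- read.csv(f, stringsAsFactors = FALSE)
  # Derive the date from the file name, e.g. "2012-3-4.csv" -> 2012-03-04.
  day$date <- as.Date(sub("\\.csv$", "", basename(f)), format = "%Y-%m-%d")
  # Append this file's rows; each rbind() copies the whole data frame.
  prices <- rbind(prices, day)
}

print(prices)
```

`rbind()` on data frames matches columns by name, which is why the new `date` column is given the same name as in the empty data frame.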

Resulting data frame

This way you can initialize an empty data frame, then loop through the files and append to it. As mentioned before, this pattern is very inefficient and should be avoided for large amounts of data. But it is a handy little trick when performance is not an immediate concern.
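When the data does get large, one common alternative (a different technique from the empty-data-frame pattern described here) is to read each file into its own data frame, collect the pieces in a list, and combine them with a single `do.call(rbind, ...)` call. The sketch below uses the same hypothetical `company`/`price` layout with invented values, and again takes the date from the file name.

```r
# Hypothetical setup: two small CSV files with made-up prices.
data_dir <- file.path(tempdir(), "prices_list")
dir.create(data_dir, showWarnings = FALSE)
writeLines(c("company,price", "ACME,10.5", "GLOBEX,20.1", "INITECH,5.2"),
           file.path(data_dir, "2012-3-4.csv"))
writeLines(c("company,price", "ACME,10.9", "INITECH,5.0"),
           file.path(data_dir, "2012-3-5.csv"))

files <- list.files(data_dir, pattern = "\\.csv$", full.names = TRUE)

# Read every file into its own data frame, with its date attached...
pieces <- lapply(files, function(f) {
  day <- read.csv(f, stringsAsFactors = FALSE)
  day$date <- as.Date(sub("\\.csv$", "", basename(f)), format = "%Y-%m-%d")
  day
})

# ...then bind all pieces in one call instead of once per file.
prices <- do.call(rbind, pieces)
```

No empty starting data frame is needed here, because the column types come from the files themselves.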
