How to add a new column to a data frame in R
One of the first things I struggled with when learning to write R code was how to add a new column to a data frame.
For example, we have a data frame, userdata, with the following four columns: user_id, first_name, last_name, and fav_food. We will create two new columns: full_name and pizza_lover.
In R, you can access a particular column in a data frame with the following syntax: data frame name + $ + column name. If we want to access all of the values in the first_name column, we would write userdata$first_name. This returns the first name for all rows.
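As a minimal sketch of that syntax, the snippet below builds a small userdata data frame and pulls out one column. The three rows are made-up sample values, used only for illustration.

```r
# Hypothetical sample data; the names and foods are invented for this example.
userdata = data.frame(
  user_id    = c(1, 2, 3),
  first_name = c("Ana", "Ben", "Cleo"),
  last_name  = c("Lopez", "Ng", "Smith"),
  fav_food   = c("Pizza", "Tacos", "Pizza"),
  stringsAsFactors = FALSE
)

# Data frame name + $ + column name returns that column as a vector,
# with one element per row.
first_names = userdata$first_name
print(first_names)
```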
When it’s all said and done, creating a new column in a data frame consists of three small steps that all happen in a single one-line statement.
1. Name the Column
To create a new column, you’ll use the same syntax you use to access a column, but instead of an existing column name, you’ll provide a new one: userdata$full_name.
2. Assign the Column to the Data Frame
Now that you’ve decided on a column name, you need to assign it to the data frame. This is where it was weird for me the first time. You are going to add the column to the data frame the same way you would declare a new variable. Take your new column name, userdata$full_name, and add the assignment operator. I prefer to just use an equal sign, but old-school R convention suggests the arrow operator, <-. For me, it takes too many keystrokes, and I usually end up fat-fingering the shift key to get the less-than sign.
userdata$full_name =
3. Fill the Column with Data
The last step is to fill your column with data. You can enter a static value by putting anything after the assignment operator, which is sometimes useful, but not in the case of our full name column.
userdata$full_name = "Mitch Craver"
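To see what a static value does, here is a small sketch with a made-up two-row data frame. The single string is repeated for every row, which is why it is not what we want for a full name column.

```r
# Hypothetical two-row data frame, invented for this example.
userdata = data.frame(
  first_name = c("Ana", "Ben"),
  last_name  = c("Lopez", "Ng"),
  stringsAsFactors = FALSE
)

# A single value is recycled, so every row gets the same string.
userdata$full_name = "Mitch Craver"
print(userdata)
```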
To get the full name for each record, we’ll use the paste() function. The paste() function takes multiple values and concatenates them with your choice of separator.
userdata$full_name = paste(userdata$first_name, userdata$last_name, sep = " ")
Using a function when adding a column to a data frame allows you to dynamically fill the column with relevant data. In this case, the paste() function will grab the first_name value and the last_name value for each row, join them, and add the result to that row’s cell in the new column.
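Here is a runnable sketch of the paste() step on a small, made-up version of userdata (the rows are invented for illustration):

```r
# Hypothetical sample data, invented for this example.
userdata = data.frame(
  user_id    = c(1, 2, 3),
  first_name = c("Ana", "Ben", "Cleo"),
  last_name  = c("Lopez", "Ng", "Smith"),
  stringsAsFactors = FALSE
)

# paste() works row by row: it joins each first name to the last name
# in the same row, with a single space between them.
userdata$full_name = paste(userdata$first_name, userdata$last_name, sep = " ")
print(userdata$full_name)
```

A single space is also the default separator for paste(), so leaving out the sep argument would give the same result here. Spelling it out just makes the intent obvious.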
Taking this new information, let’s create a new column to identify pizza lovers. For this example, I’ll use the ifelse() function. This is one of my favorite functions because it condenses a lot of other code into a single line.
Creating our next column with a different function:
userdata$pizza_lover = ifelse(userdata$fav_food == "Pizza", 1, 0)
The ifelse() function requires three parameters:
- The statement being evaluated. In our case, whether or not the value in the fav_food column is equal to “Pizza”.
- The TRUE value: 1
- The FALSE value: 0
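The three parameters can be seen in action in the sketch below. The data is made up, and the lowercase "pizza" in the last row is there on purpose to show that the comparison is case-sensitive.

```r
# Hypothetical sample data, invented for this example.
userdata = data.frame(
  user_id  = c(1, 2, 3),
  fav_food = c("Pizza", "Tacos", "pizza"),
  stringsAsFactors = FALSE
)

# ifelse(test, value if TRUE, value if FALSE), evaluated for every row.
userdata$pizza_lover = ifelse(userdata$fav_food == "Pizza", 1, 0)

# Only the exact string "Pizza" matches; lowercase "pizza" gets 0.
print(userdata)
```

If the capitalization in your data is inconsistent, one option is to compare against tolower(userdata$fav_food) == "pizza" instead.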
If you make a mistake when assigning data to the new column, you can simply overwrite it by running your bit of code on the column again.
If we ran userdata$full_name = "Mitch" and didn’t want “Mitch” to appear for all rows in the column, we would update the code with the correct data and run it again.
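A short sketch of that fix, again with made-up rows: the second assignment replaces the values in the existing column and does not create a duplicate column.

```r
# Hypothetical two-row data frame, invented for this example.
userdata = data.frame(
  first_name = c("Ana", "Ben"),
  last_name  = c("Lopez", "Ng"),
  stringsAsFactors = FALSE
)

# The mistake: every row ends up with the same name.
userdata$full_name = "Mitch"

# The fix: assign to the same column name again to overwrite it.
userdata$full_name = paste(userdata$first_name, userdata$last_name, sep = " ")
print(userdata)
```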
Summary
Creating a new column for a data frame in R only requires three steps that are combined into one piece of code.
- Decide on a new column name
- Assign the new column to the data frame with the assignment operator
- Fill it with data: either by using a static value, or by dynamically assigning the data with a function. In our case, we used the paste() and ifelse() functions.
userdata$full_name = paste(…, sep = " ")
and userdata$pizza_lover = ifelse(…)
Finally, if you make a mistake, correct the code, and rerun the statement. Doing this will overwrite what is in the column with the new values from your code.