triopapers.blogg.se - Frame subsume


Often, when you’re working with a large data set, you will only be interested in a small portion of it for your particular analysis. So, how do you sort through all the extraneous variables and observations and extract only those you need? Well, R has several ways of doing this in a process it calls “subsetting.”

The most basic way of subsetting a data frame in R is by using square brackets such that in:

    example[x, y]

example is the data frame we want to subset, ‘x’ consists of the rows we want returned, and ‘y’ consists of the columns we want returned. Let’s pull some data from the web and see how this is done on a real data set.

    # import education expenditure data set and assign column names
    education <- read.csv("", stringsAsFactors = FALSE)
    colnames(education) <- c("X","State","Region","Urban.Population","","Minor.Population","Education.Expenditures")

With the data imported and its columns appropriately named, calling head(education) shows what the first part of our data set looks like.

Now, let’s suppose we oversee the Midwestern division of schools and that we are charged with calculating how much money was spent per child for each state in our region. We would need three variables: State, Minor.Population, and Education.Expenditures. However, we would only need the observations from the rows that correspond to Region 2. Here’s the basic way to retrieve that data in R:

    ed_exp1 <- education[10:21, c(2, 6, 7)]

To create the new data frame ‘ed_exp1,’ we subsetted the ‘education’ data frame by extracting rows 10-21 and columns 2, 6, and 7. Pretty simple, right?

Another way to subset the data frame with brackets is by omitting row and column references. Take a look at this code:

    ed_exp2 <- education[-c(1:9, 22:50), -c(1, 3:5)]

Here, instead of subsetting the rows and columns we wanted returned, we subsetted the rows and columns we did not want returned and then omitted them with the “-” sign. If we now call ed_exp1 and ed_exp2, we can see that both data frames return the same subset of the original education data frame.

Now, these basic ways of subsetting a data frame in R can become tedious with large data sets. You have to know the exact column and row references you want to extract. It’s pretty easy with 7 columns and 50 rows, but what if you have 70 columns and 5,000 rows? How do you find which columns and rows you need in that case? Here’s another way to subset a data frame in R…

    ed_exp3 <- education[which(education$Region == 2), names(education) %in% c("State","Minor.Population","Education.Expenditures")]

First, we are using the same basic bracketing technique to subset the education data frame as we did with the first two examples. This time, however, we are extracting the rows we need by using the which() function. This function returns the indices of the rows where the Region column of the education data frame is 2. We retrieve the columns of the subset by using the %in% operator on the names of the education data frame, which gives a TRUE/FALSE value for each column.

Now, you may look at this line of code and think that it’s too complicated. There’s got to be an easier way to do that. There is another basic function in R, subset(), that allows us to subset a data frame without knowing the row and column references.

    ed_exp4 <- subset(education, Region == 2, select = c("State","Minor.Population","Education.Expenditures"))

The subset() function takes 3 arguments: the data frame you want subsetted, the condition that picks out the rows you want, and the columns you want returned. In our case, we take a subset of education where “Region” is equal to 2 (note the double equals sign, which tests for equality) and then we select the “State,” “Minor.Population,” and “Education.Expenditures” columns.

When we subset the education data frame with either of these last two methods, we get the same result as we did with the first two.

Now, there’s just one more method to share with you.
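As a quick check on the four approaches described above, here is a minimal sketch you can run without downloading anything. The data frame `toy` and all of its values are made up for illustration; only the column names mirror the ones used for the education data, and the row and column positions are chosen to fit this small example rather than the real data set.

```r
# Hypothetical stand-in for the education data (all values are invented).
toy <- data.frame(
  X = 1:6,
  State = c("AA", "BB", "CC", "DD", "EE", "FF"),
  Region = c(1, 1, 2, 2, 2, 3),
  Urban.Population = c(500, 600, 700, 800, 900, 650),
  Minor.Population = c(320, 330, 340, 350, 360, 370),
  Education.Expenditures = c(190, 200, 210, 220, 230, 240),
  stringsAsFactors = FALSE
)

wanted <- c("State", "Minor.Population", "Education.Expenditures")

# 1. Brackets with the rows and columns we want (rows 3-5; columns 2, 5, 6).
by_index <- toy[3:5, c(2, 5, 6)]

# 2. Brackets with the rows and columns we do NOT want, dropped with "-".
by_omission <- toy[-c(1:2, 6), -c(1, 3:4)]

# 3. which() finds the matching rows, %in% flags the matching columns.
by_which <- toy[which(toy$Region == 2), names(toy) %in% wanted]

# 4. subset() takes the condition and the columns by name.
by_subset <- subset(toy, Region == 2, select = wanted)

# Every approach should hand back the same three rows and three columns.
stopifnot(
  isTRUE(all.equal(by_index, by_omission)),
  isTRUE(all.equal(by_index, by_which)),
  isTRUE(all.equal(by_index, by_subset))
)
```

If the script finishes without an error, all four subsets agree. Notice that the first two approaches only work because we already knew which positions held Region 2, while the last two find those rows from the data itself.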