select() is a function from the dplyr R package that selects data frame variables by name or by index. It can also rename variables while selecting them and drop variables by name. In this article, I will explain the syntax of the select() function and its usage with examples such as selecting specific variables by name, by position, from a list of names, and more. Note that in R, columns are referred to as variables and rows are referred to as observations.
dplyr is an R package that provides a grammar of data manipulation: a set of commonly used verbs that help data analysts solve the most common data manipulation tasks. To use it, you first have to install the package and then load it. Sometimes you may need to change the variable names; if so, read rename data frame columns in R.

1. dplyr select() Syntax

Following is the syntax of the select() function of the dplyr package in R. It returns an object of the same class as its input.
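As a minimal sketch of the install, load, and call steps described above, the snippet below uses R's built-in mtcars data set (chosen here only for illustration; it is not part of the original article). The install line is commented out on the assumption that dplyr may already be installed.

```r
# install.packages("dplyr")  # run once if dplyr is not yet installed
library(dplyr)

# General shape of a call: select(.data, ...)
# .data is a data frame (or tibble); ... are the columns to keep.
cars_subset <- select(mtcars, mpg, cyl)

# The result has the same class as the input
class(cars_subset)
```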
Let’s create an R data frame, run these examples, and explore the output. If you already have data in a CSV file, you can easily import the CSV file into an R data frame. Also, refer to Import Excel File into R.
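The original sample data is not shown here, so the data frame below is a hypothetical stand-in: the name df, the column names, and the values are all assumptions made purely for illustration.

```r
# Hypothetical sample data; column names and values are illustrative only
df <- data.frame(
  id     = c(10, 11, 12),
  name   = c("sai", "ram", "deepika"),
  gender = c("M", "M", "F"),
  dob    = as.Date(c("1990-10-02", "1981-03-24", "1987-06-14")),
  state  = c("CA", "NY", "TX")
)

# Inspect the column names and types
str(df)
```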
2. Select Variables by Index Position

The select() verb can select variables by their index position in the data frame.
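A short sketch of positional selection, assuming a hypothetical data frame df whose columns are invented for this example:

```r
library(dplyr)

# Hypothetical sample data (illustrative only)
df <- data.frame(
  id     = c(10, 11, 12),
  name   = c("sai", "ram", "deepika"),
  gender = c("M", "M", "F"),
  dob    = as.Date(c("1990-10-02", "1981-03-24", "1987-06-14")),
  state  = c("CA", "NY", "TX")
)

# Select the second and third columns by position
by_pos <- df %>% select(2, 3)

# Select a contiguous range of positions (columns 2 through 4)
by_pos_range <- df %>% select(2:4)
```

Positions are counted from 1, so 2 refers to the second column.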
3. Select Variables by Name

You can also select variables by name: a single variable, multiple variables, or all variables whose names appear in a list. Passing comma-separated names to select() returns exactly those variables; passing a vector of names returns every variable in that list.
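Both forms can be sketched as follows. The data frame df and the vector cols are hypothetical, and wrapping the vector in all_of() is one option among several, assuming dplyr 1.0.0 or later.

```r
library(dplyr)

# Hypothetical sample data (illustrative only)
df <- data.frame(
  id     = c(10, 11, 12),
  name   = c("sai", "ram", "deepika"),
  gender = c("M", "M", "F"),
  dob    = as.Date(c("1990-10-02", "1981-03-24", "1987-06-14")),
  state  = c("CA", "NY", "TX")
)

# Select variables by passing their names, separated by commas
by_name <- df %>% select(name, gender)

# Select all variables named in a character vector
cols <- c("id", "state")
from_list <- df %>% select(all_of(cols))
```

all_of() raises an error if any listed name is missing from the data frame, which tends to surface typos early.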
4. Drop Variables

By using select() you can also drop columns from the data frame by name. To drop variables, prefix their names with a minus sign (-).
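A minimal sketch of dropping columns with the minus sign, again on a hypothetical data frame df:

```r
library(dplyr)

# Hypothetical sample data (illustrative only)
df <- data.frame(
  id     = c(10, 11, 12),
  name   = c("sai", "ram", "deepika"),
  gender = c("M", "M", "F"),
  dob    = as.Date(c("1990-10-02", "1981-03-24", "1987-06-14")),
  state  = c("CA", "NY", "TX")
)

# Drop two variables by negating each name
dropped <- df %>% select(-name, -gender)

# Equivalent form: negate a vector of names
dropped_vec <- df %>% select(-c(name, gender))
```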
5. Select All Variables Between 2 Variables

You can also select all variables between two variables; to do so, use the range operator (:).
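The range form can be sketched like this, assuming the same hypothetical data frame df:

```r
library(dplyr)

# Hypothetical sample data (illustrative only)
df <- data.frame(
  id     = c(10, 11, 12),
  name   = c("sai", "ram", "deepika"),
  gender = c("M", "M", "F"),
  dob    = as.Date(c("1990-10-02", "1981-03-24", "1987-06-14")),
  state  = c("CA", "NY", "TX")
)

# Select every variable from name through dob, inclusive
between <- df %>% select(name:dob)
```

The range follows the column order of the data frame, so both endpoints are included.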
6. Select All Variables That Start With

Use starts_with() to select all variables whose names start with a given prefix.
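A small sketch of prefix matching, with a hypothetical data frame df and the prefix "s" chosen only for illustration:

```r
library(dplyr)

# Hypothetical sample data (illustrative only)
df <- data.frame(
  id     = c(10, 11, 12),
  name   = c("sai", "ram", "deepika"),
  gender = c("M", "M", "F"),
  dob    = as.Date(c("1990-10-02", "1981-03-24", "1987-06-14")),
  state  = c("CA", "NY", "TX")
)

# Select variables whose names begin with "s"
starts <- df %>% select(starts_with("s"))
```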
7. Select All Variables That End With

Use ends_with() to select all variables whose names end with a given suffix.
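Suffix matching works the same way; the data frame df and the suffix "e" below are illustrative assumptions:

```r
library(dplyr)

# Hypothetical sample data (illustrative only)
df <- data.frame(
  id     = c(10, 11, 12),
  name   = c("sai", "ram", "deepika"),
  gender = c("M", "M", "F"),
  dob    = as.Date(c("1990-10-02", "1981-03-24", "1987-06-14")),
  state  = c("CA", "NY", "TX")
)

# Select variables whose names end with "e"
ends <- df %>% select(ends_with("e"))
```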
8. Select Variables Containing a Character

If you want to select all variables whose names contain a character or string, use contains(). For example, contains("a") selects all variables whose names contain the character a.
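The contains("a") case might look like this on a hypothetical data frame df:

```r
library(dplyr)

# Hypothetical sample data (illustrative only)
df <- data.frame(
  id     = c(10, 11, 12),
  name   = c("sai", "ram", "deepika"),
  gender = c("M", "M", "F"),
  dob    = as.Date(c("1990-10-02", "1981-03-24", "1987-06-14")),
  state  = c("CA", "NY", "TX")
)

# Select variables whose names contain the character "a"
has_a <- df %>% select(contains("a"))
```

The match is made against column names, not against the values stored in the columns.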
9. Select All Numeric Variables

Selecting all numeric variables is one of the most common operations. If a data frame has both string and numeric variables, certain statistical operations on the entire data frame result in an error; hence, you first need to select all numeric columns and then perform the operation on the result.
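One way to sketch this is with where(is.numeric), which assumes dplyr 1.0.0 or later; this is not necessarily the approach the original example used. The data frame df is hypothetical.

```r
library(dplyr)

# Hypothetical sample data (illustrative only)
df <- data.frame(
  id     = c(10, 11, 12),
  name   = c("sai", "ram", "deepika"),
  gender = c("M", "M", "F"),
  dob    = as.Date(c("1990-10-02", "1981-03-24", "1987-06-14")),
  state  = c("CA", "NY", "TX")
)

# Keep only the columns for which is.numeric() returns TRUE
numeric_only <- df %>% select(where(is.numeric))

# A statistical operation that would fail on the mixed-type data frame
col_means <- colMeans(numeric_only)
```

Date columns such as dob are not treated as numeric by is.numeric(), so only id is kept here.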
10. Complete Example
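The script below gathers the operations covered in this article into one place. It is a sketch on a hypothetical data frame df, not the original complete example, and it assumes dplyr 1.0.0 or later for all_of() and where().

```r
library(dplyr)

# Hypothetical sample data (illustrative only)
df <- data.frame(
  id     = c(10, 11, 12),
  name   = c("sai", "ram", "deepika"),
  gender = c("M", "M", "F"),
  dob    = as.Date(c("1990-10-02", "1981-03-24", "1987-06-14")),
  state  = c("CA", "NY", "TX")
)

results <- list(
  by_position = df %>% select(2, 3),                    # by index position
  by_name     = df %>% select(name, gender),            # by name
  from_list   = df %>% select(all_of(c("id", "state"))),# from a list of names
  renamed     = df %>% select(full_name = name, id),    # rename while selecting
  dropped     = df %>% select(-name, -gender),          # drop by name
  between     = df %>% select(name:dob),                # range of variables
  starts      = df %>% select(starts_with("s")),        # names starting with "s"
  ends        = df %>% select(ends_with("e")),          # names ending with "e"
  has_a       = df %>% select(contains("a")),           # names containing "a"
  numeric     = df %>% select(where(is.numeric))        # numeric variables only
)

# Show which columns each call returned
lapply(results, names)
```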
11. Conclusion

In this article, you have learned the syntax of the select() function from the dplyr package and how to select variables by index position and by name, select variables that start or end with a given string, and so on.
Which of the following are examples of variable names that can be used?
The following are examples of valid variable names: age, gender, x25, age_of_hh_head. The following are examples of invalid variable names: age_ (ends with an underscore); 0st (starts with a digit).
What are the variables used in R?
Variables in R can store numbers (real and complex), words, matrices, and even tables. R is a dynamically typed language, which means that, unlike in some other programming languages, we do not have to declare the data type of a variable before we use it in a program.
How do you use variable names in R?
The rules for R variable names are: a name must start with a letter or a period (.) and can be a combination of letters, digits, periods (.), and underscores (_). If it starts with a period, the period cannot be followed by a digit.
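These rules can be checked with a rough sketch like the one below. It relies on base R's make.names(), which rewrites a string into a syntactically valid name, so a candidate that comes back unchanged already satisfies the rules. The candidate names are illustrative, and using make.names() as a validity test is an assumption of this sketch rather than an official validator.

```r
# Candidate names to test (illustrative only)
candidates <- c("age_of_hh_head", "x25", ".rate", "0st", ".2x")

# make.names() leaves valid names untouched and alters invalid ones
is_valid <- make.names(candidates) == candidates

# Pair each candidate with the result of the check
setNames(is_valid, candidates)
```

Here "0st" fails because it starts with a digit, and ".2x" fails because the leading period is followed by a digit.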
Can variable names have spaces in R?
R supports rather long variable names, and these names can even contain spaces and punctuation when quoted with backticks, but short variable names make coding easier.