 R Data Frame

Data Frame

• A data frame in R is two-dimensional like a matrix, but it can contain heterogeneous data: each column may have a different type
• In a way, a data frame is like a list whose components are its columns
• The components of a data frame must be vectors (numeric, character or logical), factors, numeric matrices, lists or other data frames
• Vectors appearing as variables of a data frame must all have the same length, and matrices must all have the same number of rows
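
The points above can be checked directly in an R session. The following is a minimal sketch; the column names and values are illustrative and do not come from the text.

```r
# A data frame whose columns have different types (illustrative values).
df <- data.frame(
  id     = c(1L, 2L, 3L),
  name   = c("a", "b", "c"),
  passed = c(TRUE, FALSE, TRUE)
)

is.list(df)        # TRUE: a data frame is a list of columns
length(df)         # 3: one list component per column
sapply(df, class)  # class of each column

# Columns whose lengths cannot be reconciled are rejected with an error.
res <- try(data.frame(x = 1:3, y = 1:2), silent = TRUE)
inherits(res, "try-error")  # TRUE
```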
Syntax
data.frame(vectors, row.names = NULL, etc…)

as.data.frame(x)
where x can be a vector, list, factor or matrix

is.data.frame(x)
checks whether x is a data frame and returns TRUE or FALSE
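
As a short sketch of how the two functions fit together, the example below converts a plain list and checks the result. The `scores` list is a hypothetical example, not part of the original tutorial.

```r
# Illustrative list with two components of equal length.
scores <- list(subject = c("Math", "Science"), marks = c(99, 67))

is.data.frame(scores)      # FALSE: still a plain list

scores_df <- as.data.frame(scores)
is.data.frame(scores_df)   # TRUE
dim(scores_df)             # 2 rows, 2 columns
```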

Creating a data frame named myScore with the data.frame() function

CODE/PROGRAM/EXAMPLE
Subjects <- c("Math","Science","English","Social")
Marks <- c(99,67,74,62)
myScore <- data.frame(Subjects, Marks)
myScore

Consider the ‘mat1’ matrix. Use the as.data.frame() function to convert it to a data frame named ‘NewDF’

CODE/PROGRAM/EXAMPLE
NewDF <- as.data.frame(mat1)
NewDF
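
The definition of ‘mat1’ is not shown here, so the sketch below uses a hypothetical 2 x 3 numeric matrix as a stand-in to make the conversion runnable.

```r
# Hypothetical stand-in for 'mat1': a 2 x 3 matrix without column names.
mat1 <- matrix(1:6, nrow = 2, ncol = 3)

NewDF <- as.data.frame(mat1)
is.data.frame(NewDF)  # TRUE
names(NewDF)          # "V1" "V2" "V3": default names for an unnamed matrix
```

If the matrix already has column names, as.data.frame() keeps them instead of generating V1, V2, and so on.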

Consider the two heterogeneous vectors, Subjects and Marks, of character and numeric type respectively, and the myScore data frame created in the previous example. In R versions before 4.0.0, the ‘Subjects’ character vector is converted to a factor when the data frame is created, because stringsAsFactors defaulted to TRUE. From R 4.0.0 onward the default is FALSE and the vector stays character. To ensure that ‘Subjects’ remains a character vector regardless of version, pass stringsAsFactors = FALSE

CODE/PROGRAM/EXAMPLE
data.frame(Subjects, Marks, stringsAsFactors = FALSE)
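
One way to see the effect of the option is to set it explicitly both ways and inspect the column class. This is a small sketch using the same two vectors; the names `as_factor` and `as_char` are introduced here only for illustration.

```r
Subjects <- c("Math", "Science", "English", "Social")
Marks <- c(99, 67, 74, 62)

# Set the option explicitly so the result does not depend on the default.
as_factor <- data.frame(Subjects, Marks, stringsAsFactors = TRUE)
as_char   <- data.frame(Subjects, Marks, stringsAsFactors = FALSE)

class(as_factor$Subjects)  # "factor"
class(as_char$Subjects)    # "character"
```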

names() function

can be used to retrieve the column names

can be used to modify the column names

Consider the previous GMAT example. Suppose we want to change the name ‘Tom’ to ‘John’

CODE/PROGRAM/EXAMPLE
names(GMAT.df)
[1] "Jane"  "Tom"   "Katy"  "James"
names(GMAT.df) <- c("Jane", "John", "Katy", "James")
GMAT.df
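
Rewriting the whole vector of names works, but a single column can also be renamed by matching on its current name. The sketch below assumes a hypothetical GMAT.df: only Katy's scores appear in the text, and the other values are placeholders.

```r
# Hypothetical stand-in for GMAT.df; only Katy's column comes from the text.
GMAT.df <- data.frame(
  Jane  = c(98.0, 97.5, 96.0),
  Tom   = c(95.0, 94.5, 93.0),
  Katy  = c(99.4, 99.7, 98.9),
  James = c(92.0, 91.5, 90.0)
)

# Rename only the column called "Tom", leaving the others untouched.
names(GMAT.df)[names(GMAT.df) == "Tom"] <- "John"
names(GMAT.df)
```

Matching on the name avoids retyping every column and keeps working if the column order changes.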

colnames() & rownames() functions

can be used to retrieve or modify the column and row names respectively
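
A brief sketch of both functions on a hypothetical GMAT.df follows. Only Katy's scores and the fact that the second row holds the math score come from the text; the other values and the "verbal" and "writing" labels are placeholders.

```r
# Hypothetical stand-in for GMAT.df; only Katy's column comes from the text.
GMAT.df <- data.frame(
  Jane  = c(98.0, 97.5, 96.0),
  John  = c(95.0, 94.5, 93.0),
  Katy  = c(99.4, 99.7, 98.9),
  James = c(92.0, 91.5, 90.0)
)

colnames(GMAT.df)   # for a data frame this matches names(GMAT.df)
rownames(GMAT.df)   # default row names: "1" "2" "3"

# Assign descriptive row names (labels other than "math" are assumptions).
rownames(GMAT.df) <- c("verbal", "math", "writing")
GMAT.df["math", "Katy"]  # 99.7
```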

CODE/PROGRAM/EXAMPLE
colnames(GMAT.df)
[1] "Jane"  "John"  "Katy"  "James"

The ‘$’ symbol can be used to access a specific column by name

CODE/PROGRAM/EXAMPLE
GMAT.df$Katy
[1] 99.4 99.7 98.9

dataFrameName[row, column]

an element can be retrieved using its row and column position in the data frame

CODE/PROGRAM/EXAMPLE
GMAT.df[2,3]
[1] 99.7

# 99.7 is the math score of Katy
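
The access forms above have a few close relatives that are worth comparing side by side. This sketch again uses a hypothetical GMAT.df in which only Katy's column is taken from the text.

```r
# Hypothetical stand-in for GMAT.df; only Katy's column comes from the text.
GMAT.df <- data.frame(
  Jane  = c(98.0, 97.5, 96.0),
  John  = c(95.0, 94.5, 93.0),
  Katy  = c(99.4, 99.7, 98.9),
  James = c(92.0, 91.5, 90.0)
)

by_dollar  <- GMAT.df$Katy        # column as a plain vector
by_bracket <- GMAT.df[["Katy"]]   # same column, name given as a string

cell_pos  <- GMAT.df[2, 3]        # row 2, column 3: 99.7
cell_name <- GMAT.df[2, "Katy"]   # same cell, column selected by name

row2 <- GMAT.df[2, ]              # whole second row, still a data frame
col3 <- GMAT.df[3]                # single index selects columns, still a data frame
```

Selecting columns by name rather than by position tends to be more robust when columns are added or reordered.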

The dim() function returns the dimensions of the data frame (number of rows, then number of columns). Unlike a matrix, a data frame cannot be reliably reshaped by assigning to dim()

CODE/PROGRAM/EXAMPLE
dim(GMAT.df)
[1] 3 4
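
For completeness, nrow() and ncol() return the two dimensions separately, and one way to obtain a data frame with swapped dimensions is to transpose it and convert back. This is a sketch on a hypothetical GMAT.df (only Katy's column comes from the text); the transpose route assumes all columns are numeric.

```r
# Hypothetical stand-in for GMAT.df; only Katy's column comes from the text.
GMAT.df <- data.frame(
  Jane  = c(98.0, 97.5, 96.0),
  John  = c(95.0, 94.5, 93.0),
  Katy  = c(99.4, 99.7, 98.9),
  James = c(92.0, 91.5, 90.0)
)

dim(GMAT.df)   # 3 4
nrow(GMAT.df)  # 3
ncol(GMAT.df)  # 4

# t() returns a matrix, so convert back to get a 4 x 3 data frame.
flipped <- as.data.frame(t(GMAT.df))
dim(flipped)   # 4 3
```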

One way to subset the data frame is by using ‘subset()’ function

Syntax
subset(x, condition, select, ...)

Where,
x: the data frame
condition: the subset condition
select: columns to be displayed in the output

Consider the ‘math’ data frame. Suppose we want the details of students who scored 96 marks or more

CODE/PROGRAM/EXAMPLE
subset(math, math$marks >= 96)

If we want the names and marks of students who scored more than 96 and less than 99

CODE/PROGRAM/EXAMPLE
subset(math, math$marks > 96 & math$marks < 99, select = c(name, marks))
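
The contents of the ‘math’ data frame are not shown here, so the sketch below builds a small hypothetical one with invented names and marks to make both calls runnable. Inside subset(), columns can be referred to by bare name, so the `math$` prefix is optional.

```r
# Hypothetical 'math' data frame; names and marks are invented for illustration.
math <- data.frame(
  name  = c("Asha", "Ben", "Chen", "Dina"),
  marks = c(95, 96, 98, 99),
  stringsAsFactors = FALSE
)

# Students with 96 marks or more (all columns kept).
high <- subset(math, marks >= 96)

# Names and marks of students strictly between 96 and 99.
mid <- subset(math, marks > 96 & marks < 99, select = c(name, marks))

# The first query written with logical indexing instead of subset().
high_idx <- math[math$marks >= 96, ]
```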

What happens if there are missing values in a data frame?

Most computations performed on missing data (NA), such as arithmetic or sum(), result in NA, but there is an option to resolve this issue. Let us see what it is.
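
The text does not name the option at this point. One common candidate, shown here as an assumption rather than as the author's intended answer, is the na.rm argument accepted by summary functions such as sum() and mean(). The marks below are illustrative.

```r
# Illustrative marks with one missing value.
marks <- c(99, NA, 74, 62)

sum(marks)                  # NA: the missing value propagates
total <- sum(marks, na.rm = TRUE)    # 235: NA is dropped before summing
average <- mean(marks, na.rm = TRUE) # mean of the three known marks

is.na(marks)                # FALSE TRUE FALSE FALSE

# For a whole data frame, na.omit() drops rows that contain any NA.
df <- data.frame(marks)
complete <- na.omit(df)
nrow(complete)              # 3
```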
