R Basics 203: Built-in Functions
Preface: This post explains how to use the functions built into R.
Built-in Functions
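This post draws on the following reference:
Almost everything in R is done through functions. This post covers only the numeric and character functions commonly used when creating or recoding variables. The functions below apply not only to single values but also to vectors and matrices.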
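As a small illustration of the last point, the sketch below passes a single value, a vector, and a matrix to the same function. The variable names `v` and `m` are made up for this example.

```r
# The same function works element-wise on scalars, vectors, and matrices
single <- sqrt(16)                     # 4

v <- c(1, 4, 9)
v_root <- sqrt(v)                      # 1 2 3

m <- matrix(c(1, 4, 9, 16), nrow = 2)  # filled column by column
m_root <- sqrt(m)                      # rows: 1 3 / 2 4, still a 2 x 2 matrix
dim(m_root)                            # 2 2
```

The shape of the input is preserved, so no explicit loop is needed.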
Numeric Functions
Function | Description |
---|---|
abs(x) | absolute value |
sqrt(x) | square root |
ceiling(x) | ceiling(3.475) is 4 |
floor(x) | floor(3.475) is 3 |
trunc(x) | trunc(5.99) is 5 |
round(x, digits=n) | round(3.475, digits=2) is 3.48 |
- | Note: round() follows the IEEE 754 rule of rounding halves to the even digit, so round(0.5) is 0 |
signif(x, digits=n) | signif(3.475, digits=2) is 3.5 |
cos(x), sin(x), tan(x) | also acos(x), cosh(x), acosh(x), etc. |
log(x) | natural logarithm |
log10(x) | common logarithm |
exp(x) | e^x |
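The rounding functions in the table are easy to confuse, so here is a short sketch that puts them side by side. The values are chosen only for illustration.

```r
x <- 3.475

up    <- ceiling(x)            # 4
down  <- floor(x)              # 3
cut5  <- trunc(5.99)           # 5
cutm5 <- trunc(-5.99)          # -5: trunc() drops the fraction (toward zero)
flrm5 <- floor(-5.99)          # -6: floor() always goes down
sig2  <- signif(x, digits = 2) # 3.5

# Halves are rounded to the even neighbor, not always up
halves <- round(c(0.5, 1.5, 2.5))   # 0 2 2

# log() and exp() are inverses; log10() is the base-10 logarithm
back  <- log(exp(2))           # 2
three <- log10(1000)           # 3
```

Because most decimal fractions are not stored exactly in binary floating point, results of `round(x, digits = n)` at an apparent tie can depend on the stored value, so it is worth checking such cases rather than assuming them.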
Character Functions
Function | Description |
---|---|
substr(x, start=n1, stop=n2) | Extract or replace substrings in a character vector. |
- | x <- "abcdef" |
- | substr(x, 2, 4) is "bcd" |
- | substr(x, 2, 4) <- "22222" changes x to "a222ef" |
grep(pattern, x, ignore.case=FALSE, fixed=FALSE) | Search for pattern in x. If fixed=FALSE then pattern is a regular expression. If fixed=TRUE then pattern is a text string. Returns matching indices. |
- | grep("A", c("b","A","c"), fixed=TRUE) returns 2 |
sub(pattern, replacement, x, ignore.case=FALSE, fixed=FALSE) | Find pattern in x and replace with replacement text. If fixed=FALSE then pattern is a regular expression. If fixed=TRUE then pattern is a text string. |
- | sub("\\s", ".", "Hello There") returns "Hello.There" (the backslash must be doubled inside an R string) |
strsplit(x, split) | Split the elements of character vector x at split. |
- | strsplit("abc", "") returns a list whose single element is the 3-element vector "a","b","c" |
paste(..., sep=" ") | Concatenate strings, using the sep string to separate them. |
- | paste("x", 1:3, sep="") returns c("x1", "x2", "x3") |
- | paste("x", 1:3, sep="M") returns c("xM1", "xM2", "xM3") |
- | paste("Today is", date()) |
toupper(x) | Uppercase |
tolower(x) | Lowercase |
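The examples from the table can be run together as the sketch below. It uses straight quotes and a doubled backslash, both of which R requires; the extra `sub()` call on "a b c" is added here only to show that `sub()` replaces the first match.

```r
x <- "abcdef"
piece <- substr(x, 2, 4)              # "bcd"
substr(x, 2, 4) <- "22222"            # only positions 2 to 4 are overwritten
x                                     # "a222ef"

hit <- grep("A", c("b", "A", "c"), fixed = TRUE)   # 2

dotted <- sub("\\s", ".", "Hello There")           # "Hello.There"
first_only <- sub("\\s", ".", "a b c")             # "a.b c"

# strsplit() returns a list; [[1]] pulls out the character vector
chars <- strsplit("abc", "")[[1]]                  # "a" "b" "c"

labels   <- paste("x", 1:3, sep = "")              # "x1" "x2" "x3"
labels_m <- paste("x", 1:3, sep = "M")             # "xM1" "xM2" "xM3"

shout <- toupper("abc")                            # "ABC"
quiet <- tolower("ABC")                            # "abc"
```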
Statistical Probability Functions
The table below introduces functions related to probability distributions. When generating random numbers, call set.seed(1234) (or any other number) first to make the results reproducible.
Function | Description |
---|---|
dnorm(x) | normal density function (by default mean=0, sd=1) |
- | # plot standard normal curve |
- | x <- pretty(c(-3,3), 30) |
- | y <- dnorm(x) |
- | plot(x, y, type="l", xlab="Normal Deviate", ylab="Density", yaxs="i") |
pnorm(q) | cumulative normal probability for q |
- | (area under the normal curve to the left of q) |
- | pnorm(1.96) is 0.975 |
qnorm(p) | normal quantile. |
- | value at the p percentile of normal distribution |
- | qnorm(.9) is 1.28 # 90th percentile |
rnorm(n, mean=0, sd=1) | n random normal deviates with the given mean and standard deviation sd. |
- | # 50 random normal variates with mean=50, sd=10 |
- | x <- rnorm(50, mean=50, sd=10) |
dbinom(x, size, prob) | binomial distribution, where size is the number of trials and prob is the probability of heads on each trial |
pbinom(q, size, prob) | |
qbinom(p, size, prob) | |
rbinom(n, size, prob) | |
- | # prob of 0 to 5 heads of fair coin out of 10 flips |
- | dbinom(0:5, 10, .5) |
- | # prob of 5 or fewer heads of fair coin out of 10 flips |
- | pbinom(5, 10, .5) |
dpois(x, lambda) | Poisson distribution with mean = variance = lambda |
ppois(q, lambda) | |
qpois(p, lambda) | |
rpois(n, lambda) | |
- | # probability of 0, 1, or 2 events with lambda=4 |
- | dpois(0:2, 4) |
- | # probability of at least 3 events with lambda=4 |
- | 1 - ppois(2, 4) |
dunif(x, min=0, max=1) | uniform distribution, follows the same pattern as the normal distribution above. |
punif(q, min=0, max=1) | |
qunif(p, min=0, max=1) | |
runif(n, min=0, max=1) | |
- | # 10 uniform random variates |
- | x <- runif(10) |
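The d/p/q/r naming pattern is the same for every distribution in the table. The sketch below checks a few of the relationships between them and shows the effect of `set.seed()`; the variable names are made up for this example.

```r
# Fixing the seed makes the random draws repeatable
set.seed(1234)
first  <- rnorm(5)
set.seed(1234)
second <- rnorm(5)
identical(first, second)                 # TRUE

# p* and q* functions are inverses of each other
p <- pnorm(1.96)                         # about 0.975
q <- qnorm(p)                            # 1.96, up to floating-point error

# Summing the density over 0..5 gives the cumulative probability
cum_from_d <- sum(dbinom(0:5, size = 10, prob = 0.5))
cum_from_p <- pbinom(5, size = 10, prob = 0.5)
all.equal(cum_from_d, cum_from_p)        # TRUE

# Poisson: probability of at least 3 events with lambda = 4
at_least_3 <- 1 - ppois(2, lambda = 4)   # about 0.762

# Uniform draws stay inside [min, max]
u <- runif(10, min = 0, max = 1)
all(u >= 0 & u <= 1)                     # TRUE
```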
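Other Statistical Functions
The table below introduces other useful statistical functions. Setting na.rm = TRUE excludes missing values (NA) from the calculation. If missing values are not excluded, the result is NA rather than a number. Also note that if you drop incomplete observations across several variables at once (for example with na.omit() on a data frame), the whole observation is removed, including its values for variables that have no missing data.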
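A minimal sketch of the difference, using a small made-up data frame `df`: `na.rm = TRUE` drops missing values per variable, while `na.omit()` drops the whole row.

```r
x <- c(2, 4, NA, 10)
plain   <- mean(x)                    # NA, because one value is missing
cleaned <- mean(x, na.rm = TRUE)      # 16 / 3, about 5.33

df <- data.frame(a = c(1, 2, 6), b = c(10, NA, 30))

# NA removed separately in each column: a keeps all three values
per_column <- colMeans(df, na.rm = TRUE)   # a = 3,   b = 20

# Row 2 removed entirely: a loses its (non-missing) value 2 as well
listwise <- colMeans(na.omit(df))          # a = 3.5, b = 20
```

The mean of `a` changes from 3 to 3.5 even though `a` itself has no missing values.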
Function | Description |
---|---|
mean(x, trim=0, na.rm=FALSE) | mean of object x |
- | # trimmed mean, removing any missing values and |
- | # 5 percent of highest and lowest scores |
- | mx <- mean(x,trim=.05,na.rm=TRUE) |
sd(x) | standard deviation of object x. Also see var(x) for variance and mad(x) for median absolute deviation. |
median(x) | median |
quantile(x, probs) | quantiles where x is the numeric vector whose quantiles are desired and probs is a numeric vector with probabilities in [0,1]. |
- | # 30th and 84th percentiles of x |
- | y <- quantile(x, c(.3,.84)) |
range(x) | range (returns the minimum and maximum) |
sum(x) | sum |
diff(x, lag=1) | lagged differences, with lag indicating which lag to use |
min(x) | minimum |
max(x) | maximum |
scale(x, center=TRUE, scale=TRUE) | column center or standardize a matrix. |
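The following sketch exercises several functions from the table on small made-up vectors, so the results can be checked by hand.

```r
scores <- c(1:9, 100)                 # one extreme value

avg     <- mean(scores)               # 14.5
trimmed <- mean(scores, trim = 0.1)   # 5.5: lowest and highest 10% dropped
mid     <- median(scores)             # 5.5
span    <- range(scores)              # 1 100

# 30th and 84th percentiles of 0, 1, ..., 100
pct <- quantile(0:100, c(0.3, 0.84))  # 30 and 84

# Differences between neighbors, then between values two apart
squares <- c(1, 4, 9, 16)
d1 <- diff(squares)                   # 3 5 7
d2 <- diff(squares, lag = 2)          # 8 12

# scale() returns a one-column matrix; as.vector() flattens it
z <- as.vector(scale(c(1, 2, 3)))     # -1 0 1
```

The trimmed mean and the median are barely affected by the value 100, while the plain mean is pulled up to 14.5.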
Other Useful Functions
Function | Description |
---|---|
seq(from , to, by) | generate a sequence |
- | indices <- seq(1,10,2) |
- | #indices is c(1, 3, 5, 7, 9) |
rep(x, times) | repeat x the given number of times |
- | y <- rep(1:3, 2) |
- | # y is c(1, 2, 3, 1, 2, 3) |
cut(x, n) | divide a continuous variable into a factor with n levels |
- | y <- cut(x, 5) |
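A short sketch of these three functions follows. The vector `ages` is made up for this example, and the `each` argument of `rep()` is shown in addition to what the table covers.

```r
indices <- seq(1, 10, 2)          # 1 3 5 7 9

whole   <- rep(1:3, 2)            # 1 2 3 1 2 3 (the whole vector, twice)
by_elem <- rep(1:3, each = 2)     # 1 1 2 2 3 3 (each element, twice)

# cut() bins a numeric vector into equal-width intervals
ages   <- c(1, 5, 10, 15, 20)
groups <- cut(ages, 5)
is.factor(groups)                 # TRUE
nlevels(groups)                   # 5
table(groups)                     # counts per interval
```

The interval labels that `cut()` generates depend on the data range, so it is usually easier to inspect them with `levels(groups)` than to predict them.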