R 기초 203 내장 함수들 (Built-in Functions)

R 기초 203 내장 함수들 (Built-in Functions)

preface 이번 포스트에서는 R에 내장된 함수를 사용하는 방법에 대하여 설명합니다.

Built-in Functions

다음 자료를 참고하였습니다:

R의 거의 모든 부분은 기능을 통해 수행됩니다. 여기에서는 변수를 작성하거나 다시 코딩할 때 일반적으로 사용되는 숫자 및 문자 함수만 언급합니다. 아래 함수들은 개별 변수 뿐만 아니라 vector, matrix 에도 적용됩니다.

숫자 함수 (Numeric Functions)

Function Description
abs(x) absolute value
sqrt(x) square root
ceiling(x) ceiling(3.475) is 4
floor(x) floor(3.475) is 3
trunc(x) trunc(5.99) is 5
round(x, digits=n) round(3.475, digits=2) is 3.48
. 0.5에서 반올림하면 0이 되는 IEEE rounding 을 사용하므로 주의
signif(x, digits=n) signif(3.475, digits=2) is 3.5
cos(x), sin(x), tan(x) also acos(x), cosh(x), acosh(x), etc.
log(x) natural logarithm
log10(x) common logarithm
exp(x) e^x

문자 함수 (Character Functions)

Function Description
substr(x, start=n1, stop=n2) Extract or replace substrings in a character vector.
- x <- “abcdef”
- substr(x, 2, 4) is “bcd”
- substr(x, 2, 4) <- “22222” is “a222ef”
grep(pattern, x , ignore.case=FALSE, fixed=FALSE) Search for pattern in x. If fixed =FALSE then pattern is a regular expression. If fixed=TRUE then pattern is a text string. Returns matching indices.
- grep(“A”, c(“b”,”A”,”c”), fixed=TRUE) returns 2
sub(pattern, replacement, x, ignore.case =FALSE, fixed=FALSE) Find pattern in x and replace with replacement text. If fixed=FALSE then pattern is a regular expression.
- If fixed = T then pattern is a text string.
- sub(“\s”,”.”,”Hello There”) returns “Hello.There”
strsplit(x, split) Split the elements of character vector x at split.
- strsplit(“abc”, “”) returns 3 element vector “a”,”b”,”c”
paste(…, sep=””) Concatenate strings after using sep string to seperate them.
- paste(“x”,1:3,sep=””) returns c(“x1”,”x2” “x3”)
- paste(“x”,1:3,sep=”M”) returns c(“xM1”,”xM2” “xM3”)
- paste(“Today is”, date())
toupper(x) Uppercase
tolower(x) Lowercase

통계적 확률 함수 (Statistical Probability Functions)

아래 표에서는 확률 분포와 관련된 함수들을 소개합니다. 난수 생성(random number generator)을 할 때, set.seed(1234) (혹은 다른 숫자) 를 이용해서 코드 재현성을 높일 수 있습니다.

Function Description
dnorm(x) normal density function (by default m=0 sd=1)
- # plot standard normal curve
- x <- pretty(c(-3,3), 30)
- y <- dnorm(x)
- plot(x, y, type=’l’, xlab=”Normal Deviate”, ylab=”Density”, yaxs=”i”)
pnorm(q) cumulative normal probability for q
- (area under the normal curve to the left of q)
- pnorm(1.96) is 0.975
qnorm(p) normal quantile.
- value at the p percentile of normal distribution
- qnorm(.9) is 1.28 # 90th percentile
rnorm(n, m=0,sd=1) n random normal deviates with mean m and standard deviation sd.
- #50 random normal variates with mean=50, sd=10
- x <- rnorm(50, m=50, sd=10)
dbinom(x, size, prob) binomial distribution where size is the sample size and prob is the probability of a heads (pi)
pbinom(q, size, prob) # prob of 0 to 5 heads of fair coin out of 10 flips: dbinom(0:5, 10, .5)
qbinom(p, size, prob) # prob of 5 or less heads of fair coin out of 10 flips: pbinom(5, 10, .5)
rbinom(n, size, prob)  
dpois(x, lamda) poisson distribution with m=std=lamda
ppois(q, lamda) #probability of 0,1, or 2 events with lamda=4: dpois(0:2, 4)
qpois(p, lamda) # probability of at least 3 events with lamda=4: 1- ppois(2,4)
rpois(n, lamda)  
dunif(x, min=0, max=1) uniform distribution, follows the same pattern as the normal distribution above.
punif(q, min=0, max=1) #10 uniform random variates: x <- runif(10)
qunif(p, min=0, max=1)  
runif(n, min=0, max=1)  

기타 통계 함수 (Other Statistical Functions)

아래 표에서는 기타 유용한 통계 함수들을 소개합니다. na.rm = TRUE 옵션을 활성화하면 통계량 계산에서 결측값(NA)이 제외됩니다. 결측값을 제외하지 않으면 에러가 발생합니다. 이때 여러 변수를 한꺼번에 계산한다면 결측치가 없는 변수의 해당 관측치도 함께 제외되니 주의합시다.

Function Description
mean(x, trim=0, na.rm=FALSE) mean of object x
- # trimmed mean, removing any missing values and
- # 5 percent of highest and lowest scores
- mx <- mean(x,trim=.05,na.rm=TRUE)
sd(x) standard deviation of object(x). also look at var(x) for variance and mad(x) for median absolute deviation.
median(x) median
quantile(x, probs) quantiles where x is the numeric vector whose quantiles are desired and probs is a numeric vector with probabilities in [0,1].
- # 30th and 84th percentiles of x
- y <- quantile(x, c(.3,.84))
range(x) range
sum(x) sum
diff(x, lag=1) lagged differences, with lag indicating which lag to use
min(x) minimum
max(x) maximum
scale(x, center=TRUE, scale=TRUE) column center or standardize a matrix.

기타 유용한 함수 (Other Useful Functions)

Function Description
seq(from , to, by) generate a sequence
- indices <- seq(1,10,2)
- #indices is c(1, 3, 5, 7, 9)
rep(x, ntimes) repeat x n times
- y <- rep(1:3, 2)
- # y is c(1, 2, 3, 1, 2, 3)
cut(x, n) divide continuous variable in factor with n levels
- y <- cut(x, 5)

Tag Cloud

R    SQL    classification    demension reduction    jekyll    python    regression    supervised   
Hyeongjun Kim

Hyeongjun Kim

Financial Economist, Data Scientist, and Hearthstone Player

rss facebook twitter github youtube mail spotify lastfm instagram linkedin google google-plus pinterest medium vimeo stackoverflow reddit quora quora