R 기초 203 내장 함수들 (Built-in Functions)

Friday. October 13, 2017

preface 이번 포스트에서는 R에 내장된 함수를 사용하는 방법에 대하여 설명합니다.

Built-in Functions

다음 자료를 참고하였습니다:

http://www.statmethods.net/management/functions.html

R의 거의 모든 부분은 기능을 통해 수행됩니다. 여기에서는 변수를 작성하거나 다시 코딩할 때 일반적으로 사용되는 숫자 및 문자 함수만 언급합니다. 아래 함수들은 개별 변수 뿐만 아니라 vector, matrix 에도 적용됩니다.

숫자 함수 (Numeric Functions)

Function	Description
abs(x)	absolute value
sqrt(x)	square root
ceiling(x)	ceiling(3.475) is 4
floor(x)	floor(3.475) is 3
trunc(x)	trunc(5.99) is 5
round(x, digits=n)	round(3.475, digits=2) is 3.48
.	0.5에서 반올림하면 0이 되는 IEEE rounding 을 사용하므로 주의
signif(x, digits=n)	signif(3.475, digits=2) is 3.5
cos(x), sin(x), tan(x)	also acos(x), cosh(x), acosh(x), etc.
log(x)	natural logarithm
log10(x)	common logarithm
exp(x)	e^x

문자 함수 (Character Functions)

Function	Description
substr(x, start=n1, stop=n2)	Extract or replace substrings in a character vector.
-	x <- “abcdef”
-	substr(x, 2, 4) is “bcd”
-	substr(x, 2, 4) <- “22222” is “a222ef”
grep(pattern, x , ignore.case=FALSE, fixed=FALSE)	Search for pattern in x. If fixed =FALSE then pattern is a regular expression. If fixed=TRUE then pattern is a text string. Returns matching indices.
-	grep(“A”, c(“b”,”A”,”c”), fixed=TRUE) returns 2
sub(pattern, replacement, x, ignore.case =FALSE, fixed=FALSE)	Find pattern in x and replace with replacement text. If fixed=FALSE then pattern is a regular expression.
-	If fixed = T then pattern is a text string.
-	sub(“\s”,”.”,”Hello There”) returns “Hello.There”
strsplit(x, split)	Split the elements of character vector x at split.
-	strsplit(“abc”, “”) returns 3 element vector “a”,”b”,”c”
paste(…, sep=””)	Concatenate strings after using sep string to seperate them.
-	paste(“x”,1:3,sep=””) returns c(“x1”,”x2” “x3”)
-	paste(“x”,1:3,sep=”M”) returns c(“xM1”,”xM2” “xM3”)
-	paste(“Today is”, date())
toupper(x)	Uppercase
tolower(x)	Lowercase

통계적 확률 함수 (Statistical Probability Functions)

아래 표에서는 확률 분포와 관련된 함수들을 소개합니다. 난수 생성(random number generator)을 할 때, set.seed(1234) (혹은 다른 숫자) 를 이용해서 코드 재현성을 높일 수 있습니다.

Function	Description
dnorm(x)	normal density function (by default m=0 sd=1)
-	# plot standard normal curve
-	x <- pretty(c(-3,3), 30)
-	y <- dnorm(x)
-	plot(x, y, type=’l’, xlab=”Normal Deviate”, ylab=”Density”, yaxs=”i”)
pnorm(q)	cumulative normal probability for q
-	(area under the normal curve to the left of q)
-	pnorm(1.96) is 0.975
qnorm(p)	normal quantile.
-	value at the p percentile of normal distribution
-	qnorm(.9) is 1.28 # 90th percentile
rnorm(n, m=0,sd=1)	n random normal deviates with mean m and standard deviation sd.
-	#50 random normal variates with mean=50, sd=10
-	x <- rnorm(50, m=50, sd=10)
dbinom(x, size, prob)	binomial distribution where size is the sample size and prob is the probability of a heads (pi)
pbinom(q, size, prob)	# prob of 0 to 5 heads of fair coin out of 10 flips: dbinom(0:5, 10, .5)
qbinom(p, size, prob)	# prob of 5 or less heads of fair coin out of 10 flips: pbinom(5, 10, .5)
rbinom(n, size, prob)
dpois(x, lamda)	poisson distribution with m=std=lamda
ppois(q, lamda)	#probability of 0,1, or 2 events with lamda=4: dpois(0:2, 4)
qpois(p, lamda)	# probability of at least 3 events with lamda=4: 1- ppois(2,4)
rpois(n, lamda)
dunif(x, min=0, max=1)	uniform distribution, follows the same pattern as the normal distribution above.
punif(q, min=0, max=1)	#10 uniform random variates: x <- runif(10)
qunif(p, min=0, max=1)
runif(n, min=0, max=1)

기타 통계 함수 (Other Statistical Functions)

아래 표에서는 기타 유용한 통계 함수들을 소개합니다. na.rm = TRUE 옵션을 활성화하면 통계량 계산에서 결측값(NA)이 제외됩니다. 결측값을 제외하지 않으면 에러가 발생합니다. 이때 여러 변수를 한꺼번에 계산한다면 결측치가 없는 변수의 해당 관측치도 함께 제외되니 주의합시다.

Function	Description
mean(x, trim=0, na.rm=FALSE)	mean of object x
-	# trimmed mean, removing any missing values and
-	# 5 percent of highest and lowest scores
-	mx <- mean(x,trim=.05,na.rm=TRUE)
sd(x)	standard deviation of object(x). also look at var(x) for variance and mad(x) for median absolute deviation.
median(x)	median
quantile(x, probs)	quantiles where x is the numeric vector whose quantiles are desired and probs is a numeric vector with probabilities in [0,1].
-	# 30th and 84th percentiles of x
-	y <- quantile(x, c(.3,.84))
range(x)	range
sum(x)	sum
diff(x, lag=1)	lagged differences, with lag indicating which lag to use
min(x)	minimum
max(x)	maximum
scale(x, center=TRUE, scale=TRUE)	column center or standardize a matrix.

기타 유용한 함수 (Other Useful Functions)

Function	Description
seq(from , to, by)	generate a sequence
-	indices <- seq(1,10,2)
-	#indices is c(1, 3, 5, 7, 9)
rep(x, ntimes)	repeat x n times
-	y <- rep(1:3, 2)
-	# y is c(1, 2, 3, 1, 2, 3)
cut(x, n)	divide continuous variable in factor with n levels
-	y <- cut(x, 5)

Hyeongjun Kim

Financial Economist, Data Scientist, and Hearthstone Player