问题

每当我想在 R 中做一些"map"py时,我通常尝试使用 apply family中的一个函数. (side问题:我仍然没有学习 plyr reshape - 将 plyr reshape 这些完全?)

然而,我从来没有理解它们之间的区别[如何{ sapply , lapply 等}将该函数应用到输入/输出将看起来像,甚至什么输入可以],所以我经常只是通过他们,直到我得到我想要的.

有人可以解释如何使用哪一个?

[我目前的(可能不正确/不完整)理解是...

  1. sapply(vec, f): input is a vector. output is a vector/matrix, where element i is f(vec[i]) [giving you a matrix if f has a multi-element output]
  2. lapply(vec, f): same as sapply, but output is a list?
  3. apply(matrix, 1/2, f): input is a matrix. output is a vector, where element i is f(row/col i of the matrix)
  4. tapply(vector, grouping, f): output is a matrix/array, where an element in the matrix/array is the value of f at a grouping g of the vector, and g gets pushed to the row/col names
  5. by(dataframe, grouping, f): let g be a grouping. apply f to each column of the group/dataframe. pretty print the grouping and the value of f at each column.
  6. aggregate(matrix, grouping, f): similar to by, but instead of pretty printing the output, aggregate sticks everything into a dataframe.]


解决方法

R有许多*应用函数,在帮助文件(例如?apply )中详细描述.有足够的他们,虽然,开始使用R可能有困难决定哪一个是适合他们的情况,甚至记住他们所有.他们可能有一个一般的意义,"我应该使用*应用函数在这里",但它可能很难保持它们一切顺利.

尽管事实(在其他答案中指出),* apply family的大部分功能被非常受欢迎的 plyr 包所覆盖,但基函数仍然有用且值得了解.

此答案旨在作为一种路标,用于新的useRs,以帮助他们为他们的特定问题指定正确的* apply函数.注意,这是,旨在简单地反复或更换R文档!希望这个答案可以帮助你决定哪个*应用函数适合你的情况,然后由你来进一步研究.除了一个例外,将不会解决性能差异.

  • apply - When you want to apply a function to the rows or columns of a matrix (and higher-dimensional analogues); not generally advisable for data frames as it will coerce to a matrix first.

    # Two dimensional matrix
    M <- matrix(seq(1,16), 4, 4)
    
    # apply min to rows
    apply(M, 1, min)
    [1] 1 2 3 4
    
    # apply max to columns
    apply(M, 2, max)
    [1]  4  8 12 16
    
    # 3 dimensional array
    M <- array( seq(32), dim = c(4,4,2))
    
    # Apply sum across each M[*, , ] - i.e Sum across 2nd and 3rd dimension
    apply(M, 1, sum)
    # Result is one-dimensional
    [1] 120 128 136 144
    
    # Apply sum across each M[*, *, ] - i.e Sum across 3rd dimension
    apply(M, c(1,2), sum)
    # Result is two-dimensional
         [,1] [,2] [,3] [,4]
    [1,]   18   26   34   42
    [2,]   20   28   36   44
    [3,]   22   30   38   46
    [4,]   24   32   40   48
    

    If you want row/column means or sums for a 2D matrix, be sure to investigate the highly optimized, lightning-quick colMeans, rowMeans, colSums, rowSums.

  • lapply - When you want to apply a function to each element of a list in turn and get a list back.

    This is the workhorse of many of the other *apply functions. Peel back their code and you will often find lapply underneath.

       x <- list(a = 1, b = 1:3, c = 10:100) 
       lapply(x, FUN = length) 
       $a 
       [1] 1
       $b 
       [1] 3
       $c 
       [1] 91
    
       lapply(x, FUN = sum) 
       $a 
       [1] 1
       $b 
       [1] 6
       $c 
       [1] 5005
    
  • sapply - When you want to apply a function to each element of a list in turn, but you want a vector back, rather than a list.

    If you find yourself typing unlist(lapply(...)), stop and consider sapply.

       x <- list(a = 1, b = 1:3, c = 10:100)
       #Compare with above; a named vector, not a list 
       sapply(x, FUN = length)  
       a  b  c   
       1  3 91
    
       sapply(x, FUN = sum)   
       a    b    c    
       1    6 5005 
    

    In more advanced uses of sapply it will attempt to coerce the result to a multi-dimensional array, if appropriate. For example, if our function returns vectors of the same length, sapply will use them as columns of a matrix:

       sapply(1:5,function(x) rnorm(3,x))
    

    If our function returns a 2 dimensional matrix, sapply will do essentially the same thing, treating each returned matrix as a single long vector:

       sapply(1:5,function(x) matrix(x,2,2))
    

    Unless we specify simplify = "array", in which case it will use the individual matrices to build a multi-dimensional array:

       sapply(1:5,function(x) matrix(x,2,2), simplify = "array")
    

    Each of these behaviors is of course contingent on our function returning vectors or matrices of the same length or dimension.

  • vapply - When you want to use sapply but perhaps need to squeeze some more speed out of your code.

    For vapply, you basically give R an example of what sort of thing your function will return, which can save some time coercing returned values to fit in a single atomic vector.

    x <- list(a = 1, b = 1:3, c = 10:100)
    #Note that since the advantage here is mainly speed, this
    # example is only for illustration. We're telling R that
    # everything returned by length() should be an integer of 
    # length 1. 
    vapply(x, FUN = length, FUN.VALUE = 0L) 
    a  b  c  
    1  3 91
    
  • mapply - For when you have several data structures (e.g. vectors, lists) and you want to apply a function to the 1st elements of each, and then the 2nd elements of each, etc., coercing the result to a vector/array as in sapply.

    This is multivariate in the sense that your function must accept multiple arguments.

    #Sums the 1st elements, the 2nd elements, etc. 
    mapply(sum, 1:5, 1:5, 1:5) 
    [1]  3  6  9 12 15
    #To do rep(1,4), rep(2,3), etc.
    mapply(rep, 1:4, 4:1)   
    [[1]]
    [1] 1 1 1 1
    
    [[2]]
    [1] 2 2 2
    
    [[3]]
    [1] 3 3
    
    [[4]]
    [1] 4
    
  • Map - A wrapper to mapply with SIMPLIFY = FALSE, so it is guaranteed to return a list.

    Map(sum, 1:5, 1:5, 1:5)
    [[1]]
    [1] 3
    
    [[2]]
    [1] 6
    
    [[3]]
    [1] 9
    
    [[4]]
    [1] 12
    
    [[5]]
    [1] 15
    
  • rapply - For when you want to apply a function to each element of a nested list structure, recursively.

    To give you some idea of how uncommon rapply is, I forgot about it when first posting this answer! Obviously, I'm sure many people use it, but YMMV. rapply is best illustrated with a user-defined function to apply:

    #Append ! to string, otherwise increment
    myFun <- function(x){
        if (is.character(x)){
        return(paste(x,"!",sep=""))
        }
        else{
        return(x + 1)
        }
    }
    
    #A nested list structure
    l <- list(a = list(a1 = "Boo", b1 = 2, c1 = "Eeek"), 
              b = 3, c = "Yikes", 
              d = list(a2 = 1, b2 = list(a3 = "Hey", b3 = 5)))
    
    
    #Result is named vector, coerced to character           
    rapply(l,myFun)
    
    #Result is a nested list like l, with values altered
    rapply(l, myFun, how = "replace")
    
  • tapply - For when you want to apply a function to subsets of a vector and the subsets are defined by some other vector, usually a factor.

    The black sheep of the *apply family, of sorts. The help file's use of the phrase "ragged array" can be a bit confusing, but it is actually quite simple.

    A vector:

       x <- 1:20
    

    A factor (of the same length!) defining groups:

       y <- factor(rep(letters[1:5], each = 4))
    

    Add up the values in x within each subgroup defined by y:

       tapply(x, y, sum)  
        a  b  c  d  e  
       10 26 42 58 74 
    

    More complex examples can be handled where the subgroups are defined by the unique combinations of a list of several factors. tapply is similar in spirit to the split-apply-combine functions that are common in R (aggregate, by, ave, ddply, etc.) Hence its black sheep status.




相关问题推荐