Why is sapply relatively slow when querying attributes of variables in a data.frame?

Something surprised me. Let's compare two ways of getting the classes of the variables in a big data frame with many columns: an sapply solution and a for loop solution.

bigDF <- as.data.frame( matrix( 0, nrow=1E5, ncol=1E3 ) )
library( microbenchmark )

for_soln <- function(x) {
  out <- character( ncol(x) )
  for( i in 1:ncol(x) ) {
    out[i] <- class(x[,i])
  }
  return( out )
}

microbenchmark( times=20,
  sapply( bigDF, class ),
  for_soln( bigDF )
)

gives me, on my machine,

Unit: milliseconds
                  expr       min        lq    median       uq      max
1      for_soln(bigDF)  21.26563  21.58688  26.03969 163.6544 300.6819
2 sapply(bigDF, class) 385.90406 405.04047 444.69212 471.8829 889.6217

Interestingly, if we convert bigDF to a list, sapply is nice and fast again.

bigList <- as.list( bigDF )
for_soln2 <- function(x) {
  out <- character( length(x) )
  for( i in 1:length(x) ) {
    out[i] <- class( x[[i]] )
  }
  return( out )
}

microbenchmark( sapply( bigList, class ), for_soln2( bigList ) )

gives me

Unit: milliseconds
                    expr      min       lq   median       uq      max
1     for_soln2(bigList) 1.887353 1.959856 2.010270 2.058968 4.497837
2 sapply(bigList, class) 1.348461 1.386648 1.401706 1.428025 3.825547

Why do these operations, especially sapply, take so much longer with a data.frame than with a list? And is there a more idiomatic solution?

bigDF <- as.data.frame(matrix(0, nrow=1E5, ncol=1E3))
t1 <- sapply(bigDF, class)
t2 <- for_soln(bigDF)
> head(t1)
       V1        V2        V3        V4        V5        V6
"numeric" "numeric" "numeric" "numeric" "numeric" "numeric"
> head(t2)
[1] "numeric" "numeric" "numeric" "numeric" "numeric" "numeric"
> identical(t1, t2)
[1] FALSE
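The `FALSE` from `identical(t1, t2)` looks like a matter of names rather than values: the `head(t1)` output carries the column names `V1`, `V2`, ... while `head(t2)` is a plain character vector. A minimal sketch of that check follows, using a small stand-in data frame (`smallDF` and its size are assumptions chosen only to keep the example fast):

```r
# Small stand-in for bigDF; the dimensions are arbitrary (assumption).
smallDF <- as.data.frame(matrix(0, nrow = 10, ncol = 5))

# sapply over the data frame keeps the column names on the result.
named <- sapply(smallDF, class)

# The for loop fills a plain, unnamed character vector.
unnamed <- character(ncol(smallDF))
for (i in seq_len(ncol(smallDF))) {
  unnamed[i] <- class(smallDF[, i])
}

identical(named, unnamed)          # differs only in the names attribute
identical(unname(named), unnamed)  # same values once the names are dropped
```

So when comparing the two approaches, `unname()` on the sapply result (or comparing with `all.equal(..., check.attributes = FALSE)`) puts them on equal footing.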

t3 <- sapply(1:ncol(bigDF), function(idx) class(bigDF[[idx]]))
> identical(t2, t3)
[1] TRUE
microbenchmark(times=20,
  sapply(bigDF, class),
  for_soln(bigDF),
  sapply(1:ncol(bigDF), function(idx) class(bigDF[[idx]]))
)
Unit: milliseconds
              expr       min         lq     median         uq       max
1    for-soln (t2)  38.31545   39.45940   40.48152   43.05400  313.9484
2  sapply-new (t3)  18.51510   18.82293   19.87947   26.10541  261.5233
3 sapply-orig (t1) 952.94612 1075.38915 1159.49464 1204.52747 1484.1522
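The index-based `sapply` above can also be written with `vapply`, which declares the result type up front. This is only a sketch under assumptions: `smallDF` and the helper `first_class` are hypothetical names, and keeping just the first class string is a design choice of the example, since `class()` can return more than one string (for instance for ordered factors). No timing is claimed here; rerunning `microbenchmark` on `bigDF` would be needed to compare it with the variants above.

```r
# Small stand-in for bigDF; the dimensions are arbitrary (assumption).
smallDF <- as.data.frame(matrix(0, nrow = 10, ncol = 5))

# Hypothetical helper: keep only the first class string so that every
# column yields exactly one value, as vapply's character(1) template requires.
first_class <- function(col) class(col)[1L]

# Index over the columns and extract each one with [[ ]], as in t3 above.
by_index <- vapply(
  seq_along(smallDF),
  function(idx) first_class(smallDF[[idx]]),
  character(1)
)

# Same idea on the list form, following the as.list() conversion in the question;
# unname() drops the column names so both results are directly comparable.
by_list <- unname(vapply(as.list(smallDF), first_class, character(1)))

identical(by_index, by_list)
```

Unlike `sapply`, `vapply` raises an error if any column produces something other than a single string, so a surprising column type shows up immediately instead of silently changing the shape of the result.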

Answer 1