GHC 7.10 генерирует более медленный код, чем более старые версии

Я понял, что последняя версия GHC (7.10.3) производит значительно более медленный код, чем более старая версия. Текущая версия на данный момент:

$ ghc --version
The Glorious Glasgow Haskell Compilation System, version 7.10.3

У меня есть еще две старые версии, установленные на моей локальной машине.

Мой тестовый код взят из здесь (код collatz1.hs):

import Data.Word
import Data.List
import System.Environment

collatzNext :: Word32 -> Word32
collatzNext a = (if even a then a else 3*a+1) `div` 2

-- new code
collatzLen :: Word32 -> Int
collatzLen a0 = lenIterWhile collatzNext (/= 1) a0

lenIterWhile :: (a -> a) -> (a -> Bool) -> a -> Int
lenIterWhile next notDone start = len start 0 where
    len n m = if notDone n
                then len (next n) (m+1)
                else m
-- End of new code

main = do
    [a0] <- getArgs
    let max_a0 = (read a0)::Word32
    print $ maximum $ map (\a0 -> (collatzLen a0, a0)) [1..max_a0]

Компиляция с GHC 7.4, 7.6 и 7.10 дает следующие моменты:

$ ~/Tools/ghc-7.4.2/bin/ghc -O2 Test.hs 
[1 of 1] Compiling Main             ( Test.hs, Test.o )
Linking Test ...

$ time ./Test 1000000
(329,837799)

real    0m1.879s
user    0m1.876s
sys     0m0.000s

$ ~/Tools/ghc-7.6.1/bin/ghc -O2 Test.hs 
[1 of 1] Compiling Main             ( Test.hs, Test.o )
Linking Test ...

$ time ./Test 1000000
(329,837799)

real    0m1.901s
user    0m1.896s
sys     0m0.000s

$ ~/Tools/ghc/bin/ghc -O2 Test.hs 
[1 of 1] Compiling Main             ( Test.hs, Test.o )
Linking Test ...

$ time ./Test 1000000
(329,837799)

real    0m10.562s
user    0m10.528s
sys     0m0.036s

Мы можем сказать, что нет сомнений в том, что последняя версия GHC создает худший код, чем предыдущие версии. Я не могу воспроизвести такую же эффективность, как блог, хотя, вероятно, потому, что у меня нет LLVM, и у Idon нет точной версии, которую использовал автор. Но все же, я считаю, что вывод очевиден.

Мой вопрос, в общем, почему это может произойти? Каким-то образом GHC становится хуже, чем раньше. И конкретно, если я хочу исследовать, как я должен начать себя?

1c1 < Fri Mar 11 19:58 2016 Time and Allocation Profiling Report (Final) --- > Fri Mar 11 19:59 2016 Time and Allocation Profiling Report (Final) 5,6c5,6 < total time = 2.40 secs (2400 ticks @ 1000 us, 1 processor) < total alloc = 256,066,744 bytes (excludes profiling overheads) --- > total time = 10.89 secs (10895 ticks @ 1000 us, 1 processor) > total alloc = 15,713,590,808 bytes (excludes profiling overheads) 10,13c10,12 < lenIterWhile.len Main 93.8 0.0 < collatzMax Main 2.2 93.7 < collatzNext Main 2.0 0.0 < lenIterWhile Main 1.5 6.2 --- > collatzNext Main 79.6 89.4 > lenIterWhile.len Main 18.9 8.8 > collatzMax Main 0.8 1.5

-- GHC 7.8.4 Rec { Main.$wlen [Occ=LoopBreaker] :: GHC.Prim.Word# -> GHC.Prim.Int# -> GHC.Prim.Int# [GblId, Arity=2, Caf=NoCafRefs, Str=DmdType <S,1*U><L,U>] Main.$wlen = \ (ww_s4Mn :: GHC.Prim.Word#) (ww1_s4Mr :: GHC.Prim.Int#) -> case ww_s4Mn of wild_XQ { __DEFAULT -> case GHC.Prim.remWord# wild_XQ (__word 2) of _ [Occ=Dead] { __DEFAULT -> Main.$wlen (GHC.Prim.quotWord# (GHC.Prim.narrow32Word# (GHC.Prim.plusWord# (GHC.Prim.narrow32Word# (GHC.Prim.timesWord# (__word 3) wild_XQ)) (__word 1))) (__word 2)) (GHC.Prim.+# ww1_s4Mr 1); __word 0 -> Main.$wlen (GHC.Prim.quotWord# wild_XQ (__word 2)) (GHC.Prim.+# ww1_s4Mr 1) }; __word 1 -> ww1_s4Mr } end Rec }

Rec { $wlen_r6Sy :: GHC.Prim.Word# -> GHC.Prim.Int# -> GHC.Prim.Int# [GblId, Arity=2, Str=DmdType <S,U><L,U>] $wlen_r6Sy = \ (ww_s60s :: GHC.Prim.Word#) (ww1_s60w :: GHC.Prim.Int#) -> case ww_s60s of wild_X1Z { __DEFAULT -> case even @ Word32 GHC.Word.$fIntegralWord32 (GHC.Word.W32# wild_X1Z) of _ [Occ=Dead] { False -> $wlen_r6Sy (GHC.Prim.quotWord# (GHC.Prim.narrow32Word# (GHC.Prim.plusWord# (GHC.Prim.narrow32Word# (GHC.Prim.timesWord# (__word 3) wild_X1Z)) (__word 1))) (__word 2)) (GHC.Prim.+# ww1_s60w 1); True -> $wlen_r6Sy (GHC.Prim.quotWord# wild_X1Z (__word 2)) (GHC.Prim.+# ww1_s60w 1) }; __word 1 -> ww1_s60w } end Rec }

$ stack --resolver=ghc-7.10 ghc -- Test.hs -O2 -fforce-recomp -prof -fprof-auto -auto-all $ ./Test +RTS -s -p -RTS (329,837799) 416,119,240 bytes allocated in the heap 69,760 bytes copied during GC 59,368 bytes maximum residency (2 sample(s)) 21,912 bytes maximum slop 1 MB total memory in use (0 MB lost due to fragmentation) Tot time (elapsed) Avg pause Max pause Gen 0 800 colls, 0 par 0.000s 0.002s 0.0000s 0.0001s Gen 1 2 colls, 0 par 0.000s 0.000s 0.0002s 0.0003s INIT time 0.000s ( 0.019s elapsed) MUT time 2.500s ( 2.546s elapsed) GC time 0.000s ( 0.003s elapsed) RP time 0.000s ( 0.000s elapsed) PROF time 0.000s ( 0.000s elapsed) EXIT time 0.000s ( 0.000s elapsed) Total time 2.500s ( 2.567s elapsed) %GC time 0.0% (0.1% elapsed) Alloc rate 166,447,696 bytes per MUT second Productivity 100.0% of total user, 97.4% of total elapsed $ cat Test.prof Fri Mar 11 20:22 2016 Time and Allocation Profiling Report (Final) Test.exe +RTS -s -p -RTS 1000000 total time = 2.54 secs (2535 ticks @ 1000 us, 1 processor) total alloc = 256,066,984 bytes (excludes profiling overheads) COST CENTRE MODULE %time %alloc lenIterWhile.len Main 94.4 0.0 main Main 1.9 93.7 collatzNext Main 1.8 0.0 lenIterWhile Main 1.3 6.2 individual inherited COST CENTRE MODULE no. entries %time %alloc %time %alloc MAIN MAIN 44 0 0.0 0.0 100.0 100.0 main Main 89 0 1.9 93.7 100.0 100.0 main.\ Main 92 1000000 0.4 0.0 98.1 6.2 collatzLen Main 93 1000000 0.2 0.0 97.8 6.2 lenIterWhile Main 94 1000000 1.3 6.2 97.5 6.2 lenIterWhile.len Main 95 88826840 94.4 0.0 96.2 0.0 collatzNext Main 96 87826840 1.8 0.0 1.8 0.0 main.max_a0 Main 90 1 0.0 0.0 0.0 0.0 CAF GHC.IO.Encoding.CodePage 73 0 0.0 0.0 0.0 0.0 CAF System.Environment 64 0 0.0 0.0 0.0 0.0 CAF GHC.IO.Handle.Text 62 0 0.0 0.0 0.0 0.0 CAF GHC.IO.Encoding 61 0 0.0 0.0 0.0 0.0

Ответ 1