Подтвердить что ты не робот

Оптимизируйте производительность членства в словаре для списка ключей

Я пытаюсь написать код, который должен возвращать true, если какой-либо элемент списка присутствует в словаре. Выполнение этой части действительно важно. Я знаю, что могу просто перебрать список и сломаться, если найду первый хит для поиска. Есть ли более быстрый или более путинский способ для этого, чем приведенный ниже?

for x in someList:
     if x in someDict:
           return True
return False

EDIT: Я использую Python 2.7. Мое первое предпочтение было бы более быстрым методом.

4b9b3361

Ответ 1

Использование встроенного any может иметь некоторое преимущество производительности над двумя циклами

any(x in someDict for x in someList)

но вам, возможно, потребуется измерить свой пробег. Если ваш список и dict остаются довольно статичными, и вам нужно выполнить сравнение несколько раз, вы можете рассмотреть возможность использования set

someSet = set(someList) 
someDict.viewkeys() & someSet 

Примечание Python 3.X по умолчанию возвращает представления, а не последовательность, поэтому при использовании Python 3.X

someSet = set(someList) 
someDict.keys() & someSet 

В обоих случаях вы можете обернуть результат с помощью bool для получения логического результата

bool(someDict.keys() & set(someSet ))

Heretic Note

Мое любопытство улучшило меня, и я приурочил все предлагаемые решения. Кажется, что ваше оригинальное решение лучше. Вот результат

Пример произвольно сгенерированного ввода

def test_data_gen():
    from random import sample
    for i in range(1,5):
        n = 10**i
        population = set(range(1,100000))
        some_list = sample(list(population),n)
        population.difference_update(some_list)
        some_dict = dict(zip(sample(population,n),
                             sample(range(1,100000),n)))
        yield "Population Size of {}".format(n), (some_list, some_dict), {}

Тест-модуль

Я переписал тестовую часть ответа, поскольку это было грязно, и ответ получал довольно приличное внимание. Я создал модуль сравнения python timeit и переместил его на github

Результат теста

Timeit repeated for 10 times
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
======================================
Test Run for Population Size of 10
======================================
|Rank  |FunctionName         |Result    |Description
+------+---------------------+----------+-----------------------------------------------
|     1|foo_nested           |0.000011  |Original OPs Code
+------+---------------------+----------+-----------------------------------------------
|     2|foo_ifilter_any      |0.000014  |any(ifilter(some_dict.__contains__, some_list))
+------+---------------------+----------+-----------------------------------------------
|     3|foo_ifilter_not_not  |0.000015  |not not next(ifilter(some_dict.__contains__...
+------+---------------------+----------+-----------------------------------------------
|     4|foo_imap_any         |0.000018  |any(imap(some_dict.__contains__, some_list))
+------+---------------------+----------+-----------------------------------------------
|     5|foo_any              |0.000019  |any(x in some_dict for x in some_list)
+------+---------------------+----------+-----------------------------------------------
|     6|foo_ifilter_next     |0.000022  |bool(next(ifilter(some_dict.__contains__...
+------+---------------------+----------+-----------------------------------------------
|     7|foo_set_ashwin       |0.000024  |not set(some_dct).isdisjoint(some_lst)
+------+---------------------+----------+-----------------------------------------------
|     8|foo_set              |0.000047  |some_dict.viewkeys() & set(some_list )
======================================
Test Run for Population Size of 100
======================================
|Rank  |FunctionName         |Result    |Description
+------+---------------------+----------+-----------------------------------------------
|     1|foo_ifilter_any      |0.000071  |any(ifilter(some_dict.__contains__, some_list))
+------+---------------------+----------+-----------------------------------------------
|     2|foo_nested           |0.000072  |Original OPs Code
+------+---------------------+----------+-----------------------------------------------
|     3|foo_ifilter_not_not  |0.000073  |not not next(ifilter(some_dict.__contains__...
+------+---------------------+----------+-----------------------------------------------
|     4|foo_ifilter_next     |0.000076  |bool(next(ifilter(some_dict.__contains__...
+------+---------------------+----------+-----------------------------------------------
|     5|foo_imap_any         |0.000082  |any(imap(some_dict.__contains__, some_list))
+------+---------------------+----------+-----------------------------------------------
|     6|foo_any              |0.000092  |any(x in some_dict for x in some_list)
+------+---------------------+----------+-----------------------------------------------
|     7|foo_set_ashwin       |0.000170  |not set(some_dct).isdisjoint(some_lst)
+------+---------------------+----------+-----------------------------------------------
|     8|foo_set              |0.000638  |some_dict.viewkeys() & set(some_list )
======================================
Test Run for Population Size of 1000
======================================
|Rank  |FunctionName         |Result    |Description
+------+---------------------+----------+-----------------------------------------------
|     1|foo_ifilter_not_not  |0.000746  |not not next(ifilter(some_dict.__contains__...
+------+---------------------+----------+-----------------------------------------------
|     2|foo_ifilter_any      |0.000746  |any(ifilter(some_dict.__contains__, some_list))
+------+---------------------+----------+-----------------------------------------------
|     3|foo_ifilter_next     |0.000752  |bool(next(ifilter(some_dict.__contains__...
+------+---------------------+----------+-----------------------------------------------
|     4|foo_nested           |0.000771  |Original OPs Code
+------+---------------------+----------+-----------------------------------------------
|     5|foo_set_ashwin       |0.000838  |not set(some_dct).isdisjoint(some_lst)
+------+---------------------+----------+-----------------------------------------------
|     6|foo_imap_any         |0.000842  |any(imap(some_dict.__contains__, some_list))
+------+---------------------+----------+-----------------------------------------------
|     7|foo_any              |0.000933  |any(x in some_dict for x in some_list)
+------+---------------------+----------+-----------------------------------------------
|     8|foo_set              |0.001702  |some_dict.viewkeys() & set(some_list )
======================================
Test Run for Population Size of 10000
======================================
|Rank  |FunctionName         |Result    |Description
+------+---------------------+----------+-----------------------------------------------
|     1|foo_nested           |0.007195  |Original OPs Code
+------+---------------------+----------+-----------------------------------------------
|     2|foo_ifilter_next     |0.007410  |bool(next(ifilter(some_dict.__contains__...
+------+---------------------+----------+-----------------------------------------------
|     3|foo_ifilter_any      |0.007491  |any(ifilter(some_dict.__contains__, some_list))
+------+---------------------+----------+-----------------------------------------------
|     4|foo_ifilter_not_not  |0.007671  |not not next(ifilter(some_dict.__contains__...
+------+---------------------+----------+-----------------------------------------------
|     5|foo_set_ashwin       |0.008385  |not set(some_dct).isdisjoint(some_lst)
+------+---------------------+----------+-----------------------------------------------
|     6|foo_imap_any         |0.011327  |any(imap(some_dict.__contains__, some_list))
+------+---------------------+----------+-----------------------------------------------
|     7|foo_any              |0.011533  |any(x in some_dict for x in some_list)
+------+---------------------+----------+-----------------------------------------------
|     8|foo_set              |0.018313  |some_dict.viewkeys() & set(some_list )
Timeit repeated for 100 times
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
======================================
Test Run for Population Size of 10
======================================
|Rank  |FunctionName         |Result    |Description
+------+---------------------+----------+-----------------------------------------------
|     1|foo_nested           |0.000098  |Original OPs Code
+------+---------------------+----------+-----------------------------------------------
|     2|foo_ifilter_any      |0.000124  |any(ifilter(some_dict.__contains__, some_list))
+------+---------------------+----------+-----------------------------------------------
|     3|foo_ifilter_not_not  |0.000131  |not not next(ifilter(some_dict.__contains__...
+------+---------------------+----------+-----------------------------------------------
|     4|foo_imap_any         |0.000142  |any(imap(some_dict.__contains__, some_list))
+------+---------------------+----------+-----------------------------------------------
|     5|foo_ifilter_next     |0.000151  |bool(next(ifilter(some_dict.__contains__...
+------+---------------------+----------+-----------------------------------------------
|     6|foo_any              |0.000158  |any(x in some_dict for x in some_list)
+------+---------------------+----------+-----------------------------------------------
|     7|foo_set_ashwin       |0.000186  |not set(some_dct).isdisjoint(some_lst)
+------+---------------------+----------+-----------------------------------------------
|     8|foo_set              |0.000496  |some_dict.viewkeys() & set(some_list )
======================================
Test Run for Population Size of 100
======================================
|Rank  |FunctionName         |Result    |Description
+------+---------------------+----------+-----------------------------------------------
|     1|foo_ifilter_any      |0.000661  |any(ifilter(some_dict.__contains__, some_list))
+------+---------------------+----------+-----------------------------------------------
|     2|foo_ifilter_not_not  |0.000677  |not not next(ifilter(some_dict.__contains__...
+------+---------------------+----------+-----------------------------------------------
|     3|foo_nested           |0.000683  |Original OPs Code
+------+---------------------+----------+-----------------------------------------------
|     4|foo_ifilter_next     |0.000684  |bool(next(ifilter(some_dict.__contains__...
+------+---------------------+----------+-----------------------------------------------
|     5|foo_imap_any         |0.000762  |any(imap(some_dict.__contains__, some_list))
+------+---------------------+----------+-----------------------------------------------
|     6|foo_any              |0.000854  |any(x in some_dict for x in some_list)
+------+---------------------+----------+-----------------------------------------------
|     7|foo_set_ashwin       |0.001291  |not set(some_dct).isdisjoint(some_lst)
+------+---------------------+----------+-----------------------------------------------
|     8|foo_set              |0.005018  |some_dict.viewkeys() & set(some_list )
======================================
Test Run for Population Size of 1000
======================================
|Rank  |FunctionName         |Result    |Description
+------+---------------------+----------+-----------------------------------------------
|     1|foo_ifilter_any      |0.007585  |any(ifilter(some_dict.__contains__, some_list))
+------+---------------------+----------+-----------------------------------------------
|     2|foo_nested           |0.007713  |Original OPs Code
+------+---------------------+----------+-----------------------------------------------
|     3|foo_set_ashwin       |0.008256  |not set(some_dct).isdisjoint(some_lst)
+------+---------------------+----------+-----------------------------------------------
|     4|foo_ifilter_not_not  |0.008526  |not not next(ifilter(some_dict.__contains__...
+------+---------------------+----------+-----------------------------------------------
|     5|foo_any              |0.009422  |any(x in some_dict for x in some_list)
+------+---------------------+----------+-----------------------------------------------
|     6|foo_ifilter_next     |0.010259  |bool(next(ifilter(some_dict.__contains__...
+------+---------------------+----------+-----------------------------------------------
|     7|foo_imap_any         |0.011414  |any(imap(some_dict.__contains__, some_list))
+------+---------------------+----------+-----------------------------------------------
|     8|foo_set              |0.019862  |some_dict.viewkeys() & set(some_list )
======================================
Test Run for Population Size of 10000
======================================
|Rank  |FunctionName         |Result    |Description
+------+---------------------+----------+-----------------------------------------------
|     1|foo_imap_any         |0.082221  |any(imap(some_dict.__contains__, some_list))
+------+---------------------+----------+-----------------------------------------------
|     2|foo_ifilter_any      |0.083573  |any(ifilter(some_dict.__contains__, some_list))
+------+---------------------+----------+-----------------------------------------------
|     3|foo_nested           |0.095736  |Original OPs Code
+------+---------------------+----------+-----------------------------------------------
|     4|foo_set_ashwin       |0.103427  |not set(some_dct).isdisjoint(some_lst)
+------+---------------------+----------+-----------------------------------------------
|     5|foo_ifilter_next     |0.104589  |bool(next(ifilter(some_dict.__contains__...
+------+---------------------+----------+-----------------------------------------------
|     6|foo_ifilter_not_not  |0.117974  |not not next(ifilter(some_dict.__contains__...
+------+---------------------+----------+-----------------------------------------------
|     7|foo_any              |0.127739  |any(x in some_dict for x in some_list)
+------+---------------------+----------+-----------------------------------------------
|     8|foo_set              |0.208228  |some_dict.viewkeys() & set(some_list )
Timeit repeated for 1000 times
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
======================================
Test Run for Population Size of 10
======================================
|Rank  |FunctionName         |Result    |Description
+------+---------------------+----------+-----------------------------------------------
|     1|foo_nested           |0.000953  |Original OPs Code
+------+---------------------+----------+-----------------------------------------------
|     2|foo_ifilter_any      |0.001134  |any(ifilter(some_dict.__contains__, some_list))
+------+---------------------+----------+-----------------------------------------------
|     3|foo_ifilter_not_not  |0.001213  |not not next(ifilter(some_dict.__contains__...
+------+---------------------+----------+-----------------------------------------------
|     4|foo_ifilter_next     |0.001340  |bool(next(ifilter(some_dict.__contains__...
+------+---------------------+----------+-----------------------------------------------
|     5|foo_imap_any         |0.001407  |any(imap(some_dict.__contains__, some_list))
+------+---------------------+----------+-----------------------------------------------
|     6|foo_any              |0.001535  |any(x in some_dict for x in some_list)
+------+---------------------+----------+-----------------------------------------------
|     7|foo_set_ashwin       |0.002252  |not set(some_dct).isdisjoint(some_lst)
+------+---------------------+----------+-----------------------------------------------
|     8|foo_set              |0.004701  |some_dict.viewkeys() & set(some_list )
======================================
Test Run for Population Size of 100
======================================
|Rank  |FunctionName         |Result    |Description
+------+---------------------+----------+-----------------------------------------------
|     1|foo_ifilter_any      |0.006209  |any(ifilter(some_dict.__contains__, some_list))
+------+---------------------+----------+-----------------------------------------------
|     2|foo_ifilter_next     |0.006411  |bool(next(ifilter(some_dict.__contains__...
+------+---------------------+----------+-----------------------------------------------
|     3|foo_ifilter_not_not  |0.006657  |not not next(ifilter(some_dict.__contains__...
+------+---------------------+----------+-----------------------------------------------
|     4|foo_nested           |0.006727  |Original OPs Code
+------+---------------------+----------+-----------------------------------------------
|     5|foo_imap_any         |0.007562  |any(imap(some_dict.__contains__, some_list))
+------+---------------------+----------+-----------------------------------------------
|     6|foo_any              |0.008262  |any(x in some_dict for x in some_list)
+------+---------------------+----------+-----------------------------------------------
|     7|foo_set_ashwin       |0.012260  |not set(some_dct).isdisjoint(some_lst)
+------+---------------------+----------+-----------------------------------------------
|     8|foo_set              |0.046773  |some_dict.viewkeys() & set(some_list )
======================================
Test Run for Population Size of 1000
======================================
|Rank  |FunctionName         |Result    |Description
+------+---------------------+----------+-----------------------------------------------
|     1|foo_ifilter_not_not  |0.071888  |not not next(ifilter(some_dict.__contains__...
+------+---------------------+----------+-----------------------------------------------
|     2|foo_ifilter_next     |0.072150  |bool(next(ifilter(some_dict.__contains__...
+------+---------------------+----------+-----------------------------------------------
|     3|foo_nested           |0.073382  |Original OPs Code
+------+---------------------+----------+-----------------------------------------------
|     4|foo_ifilter_any      |0.075698  |any(ifilter(some_dict.__contains__, some_list))
+------+---------------------+----------+-----------------------------------------------
|     5|foo_set_ashwin       |0.077367  |not set(some_dct).isdisjoint(some_lst)
+------+---------------------+----------+-----------------------------------------------
|     6|foo_imap_any         |0.090623  |any(imap(some_dict.__contains__, some_list))
+------+---------------------+----------+-----------------------------------------------
|     7|foo_any              |0.093301  |any(x in some_dict for x in some_list)
+------+---------------------+----------+-----------------------------------------------
|     8|foo_set              |0.177051  |some_dict.viewkeys() & set(some_list )
======================================
Test Run for Population Size of 10000
======================================
|Rank  |FunctionName         |Result    |Description
+------+---------------------+----------+-----------------------------------------------
|     1|foo_nested           |0.701317  |Original OPs Code
+------+---------------------+----------+-----------------------------------------------
|     2|foo_ifilter_next     |0.706156  |bool(next(ifilter(some_dict.__contains__...
+------+---------------------+----------+-----------------------------------------------
|     3|foo_ifilter_any      |0.723368  |any(ifilter(some_dict.__contains__, some_list))
+------+---------------------+----------+-----------------------------------------------
|     4|foo_ifilter_not_not  |0.746650  |not not next(ifilter(some_dict.__contains__...
+------+---------------------+----------+-----------------------------------------------
|     5|foo_set_ashwin       |0.776704  |not set(some_dct).isdisjoint(some_lst)
+------+---------------------+----------+-----------------------------------------------
|     6|foo_imap_any         |0.832117  |any(imap(some_dict.__contains__, some_list))
+------+---------------------+----------+-----------------------------------------------
|     7|foo_any              |0.881777  |any(x in some_dict for x in some_list)
+------+---------------------+----------+-----------------------------------------------
|     8|foo_set              |1.665962  |some_dict.viewkeys() & set(some_list )
Timeit repeated for 10000 times
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
======================================
Test Run for Population Size of 10
======================================
|Rank  |FunctionName         |Result    |Description
+------+---------------------+----------+-----------------------------------------------
|     1|foo_nested           |0.010581  |Original OPs Code
+------+---------------------+----------+-----------------------------------------------
|     2|foo_ifilter_any      |0.013512  |any(ifilter(some_dict.__contains__, some_list))
+------+---------------------+----------+-----------------------------------------------
|     3|foo_imap_any         |0.015321  |any(imap(some_dict.__contains__, some_list))
+------+---------------------+----------+-----------------------------------------------
|     4|foo_ifilter_not_not  |0.017680  |not not next(ifilter(some_dict.__contains__...
+------+---------------------+----------+-----------------------------------------------
|     5|foo_ifilter_next     |0.019334  |bool(next(ifilter(some_dict.__contains__...
+------+---------------------+----------+-----------------------------------------------
|     6|foo_any              |0.026274  |any(x in some_dict for x in some_list)
+------+---------------------+----------+-----------------------------------------------
|     7|foo_set_ashwin       |0.030881  |not set(some_dct).isdisjoint(some_lst)
+------+---------------------+----------+-----------------------------------------------
|     8|foo_set              |0.053605  |some_dict.viewkeys() & set(some_list )
======================================
Test Run for Population Size of 100
======================================
|Rank  |FunctionName         |Result    |Description
+------+---------------------+----------+-----------------------------------------------
|     1|foo_nested           |0.070194  |Original OPs Code
+------+---------------------+----------+-----------------------------------------------
|     2|foo_ifilter_not_not  |0.078524  |not not next(ifilter(some_dict.__contains__...
+------+---------------------+----------+-----------------------------------------------
|     3|foo_ifilter_any      |0.079499  |any(ifilter(some_dict.__contains__, some_list))
+------+---------------------+----------+-----------------------------------------------
|     4|foo_imap_any         |0.087349  |any(imap(some_dict.__contains__, some_list))
+------+---------------------+----------+-----------------------------------------------
|     5|foo_ifilter_next     |0.093970  |bool(next(ifilter(some_dict.__contains__...
+------+---------------------+----------+-----------------------------------------------
|     6|foo_any              |0.097948  |any(x in some_dict for x in some_list)
+------+---------------------+----------+-----------------------------------------------
|     7|foo_set_ashwin       |0.130725  |not set(some_dct).isdisjoint(some_lst)
+------+---------------------+----------+-----------------------------------------------
|     8|foo_set              |0.480841  |some_dict.viewkeys() & set(some_list )
======================================
Test Run for Population Size of 1000
======================================
|Rank  |FunctionName         |Result    |Description
+------+---------------------+----------+-----------------------------------------------
|     1|foo_ifilter_any      |0.754491  |any(ifilter(some_dict.__contains__, some_list))
+------+---------------------+----------+-----------------------------------------------
|     2|foo_ifilter_not_not  |0.756253  |not not next(ifilter(some_dict.__contains__...
+------+---------------------+----------+-----------------------------------------------
|     3|foo_ifilter_next     |0.771382  |bool(next(ifilter(some_dict.__contains__...
+------+---------------------+----------+-----------------------------------------------
|     4|foo_nested           |0.787152  |Original OPs Code
+------+---------------------+----------+-----------------------------------------------
|     5|foo_set_ashwin       |0.818520  |not set(some_dct).isdisjoint(some_lst)
+------+---------------------+----------+-----------------------------------------------
|     6|foo_imap_any         |0.902947  |any(imap(some_dict.__contains__, some_list))
+------+---------------------+----------+-----------------------------------------------
|     7|foo_any              |1.001810  |any(x in some_dict for x in some_list)
+------+---------------------+----------+-----------------------------------------------
|     8|foo_set              |2.012781  |some_dict.viewkeys() & set(some_list )
=======================================
Test Run for Population Size of 10000
=======================================
|Rank  |FunctionName         |Result     |Description
+------+---------------------+-----------+-----------------------------------------------
|     1|foo_imap_any         |10.071469  |any(imap(some_dict.__contains__, some_list))
+------+---------------------+-----------+-----------------------------------------------
|     2|foo_any              |11.127034  |any(x in some_dict for x in some_list)
+------+---------------------+-----------+-----------------------------------------------
|     3|foo_set              |18.881414  |some_dict.viewkeys() & set(some_list )
+------+---------------------+-----------+-----------------------------------------------
|     4|foo_nested           |8.731133   |Original OPs Code
+------+---------------------+-----------+-----------------------------------------------
|     5|foo_ifilter_not_not  |9.019190   |not not next(ifilter(some_dict.__contains__...
+------+---------------------+-----------+-----------------------------------------------
|     6|foo_ifilter_next     |9.189966   |bool(next(ifilter(some_dict.__contains__...
+------+---------------------+-----------+-----------------------------------------------
|     7|foo_set_ashwin       |9.363886   |not set(some_dct).isdisjoint(some_lst)
+------+---------------------+-----------+-----------------------------------------------
|     8|foo_ifilter_any      |9.442759   |any(ifilter(some_dict.__contains__, some_list))

И графическое сравнение с вышеупомянутым модулем

enter image description hereenter image description hereenter image description hereenter image description here

Заключение

Преждевременная оптимизация - это зло. Очевидно, что ни одно из решений не имеет оптимальной производительности при изменении тестового домена. В зависимости от размера популяции и частоты итераций производительность решений значительно варьируется. Результат снова говорит о том, что в Python следует убедиться, что код должен быть доступен для чтения, а не гарантировать, что этот код является отличным или оптимизированным для производительности в определенных случаях, но тогда он не может быть масштабируемым.

Примечание Были некоторые сомнения в том, почему не использовать ifilter лучше, чем остальные

"В ответе Абхита он приурочил различные подходы и обнаружил, что ifilter/next был не самым быстрым; любая идея, почему это будет так?"

Известно, что в python при вызове C-функций есть накладные расходы, и если размер популяции низкий, а частота итерации высока, накопление накладных расходов функции C будет медленно отображаться. Как видно на графиках, где размер населения мал, но итерация высока, используя ifilter, производительность значительно отличается.

Ответ 2

Самый чистый и быстрый способ - использовать any() и itertools.ifilter():

any(ifilter(someDict.__contains__, someList))

В этом коде используется:

  • связанный метод, someDict.__contains__ как предикат
  • ifilter() itertool лениво сканирует истинный результат при скорости C
  • любой() встроенный диск фильтрует, чтобы найти одно совпадающее вхождение или вернуть False, если итератор исчерпан.

Вы также можете использовать itertools.imap() вместо ifilter().

Ответ 3

s = set(someList)
return bool(s.intersection(someDict.keys()))