Caching IEnumerable

public IEnumerable<ModuleData> ListModules()
{
    foreach (XElement m in Source.Descendants("Module"))
    {
        yield return new ModuleData(m.Element("ModuleID").Value);
    }
}

Initially the above code is great, since there is no need to evaluate the entire collection if it is not needed.

However, once all the Modules have been enumerated once, it becomes more expensive to query the XDocument again and again when nothing has changed.
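To make that cost concrete, here is a minimal sketch of how an iterator method re-runs its body on every enumeration. The names DeferredDemo, ListItems and Evaluations are hypothetical; the counter only stands in for the work of querying the XDocument.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DeferredDemo
{
    // Hypothetical counter standing in for the cost of querying the XDocument.
    public static int Evaluations;

    // Iterator method: its body runs from the start on every enumeration.
    public static IEnumerable<int> ListItems()
    {
        for (int i = 1; i <= 5; i++)
        {
            Evaluations++;
            yield return i;
        }
    }

    public static int CountAfterTwoPasses()
    {
        Evaluations = 0;
        IEnumerable<int> items = ListItems();

        items.Take(2).ToList(); // evaluates only the first 2 items
        items.ToList();         // starts over and evaluates all 5 again

        return Evaluations;     // 2 + 5 = 7
    }
}
```

Nothing is remembered between the two passes, so the second pass repeats the work the first one already did.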

So, as a performance improvement:

public IEnumerable<ModuleData> ListModules()
{
    if (Modules == null)
    {
        Modules = new List<ModuleData>();
        foreach (XElement m in Source.Descendants("Module"))
        {
            Modules.Add(new ModuleData(m.Element("ModuleID").Value, 1, 1));
        }
    }
    return Modules;
}

That is great if I repeatedly use the entire list, but not so great otherwise.

Is there a middle ground where I can yield return until the entire list has been iterated, then cache it and serve the cache to subsequent requests?

Answer 1

You can look at "Saving the State of Enumerators", which describes how to create a lazy list (one that caches items once they have been iterated).

Answer 2

Check out MemoizeAll() in the Reactive Extensions for .NET library (Rx). Since it is evaluated lazily, you can safely set it up during construction and just return Modules from ListModules():

Modules = Source.
    Descendants("Module").
    Select(m => new ModuleData(m.Element("ModuleID").Value, 1, 1)).
    MemoizeAll();

There is a good explanation of MemoizeAll() (and some of the other less obvious Rx extensions) here.

Answer 3

I have seen a handful of implementations out there, some older and not taking advantage of the newest .NET classes, some too elaborate for my needs. I ended up with the most concise and declarative code I could muster, which added up to a class with roughly 15 lines of (actual) code. It seems to align well with the OP's needs:

Edit: second revision, with better support for empty enumerables

using System;
using System.Collections;
using System.Collections.Generic;

/// <summary>
/// A <see cref="IEnumerable{T}"/> that caches every item upon first enumeration.
/// </summary>
/// <seealso cref="http://blogs.msdn.com/b/matt/archive/2008/03/14/digging-deeper-into-lazy-and-functional-c.aspx"/>
/// <seealso cref="http://blogs.msdn.com/b/wesdyer/archive/2007/02/13/the-virtues-of-laziness.aspx"/>
public class CachedEnumerable<T> : IEnumerable<T> {
  private readonly bool _hasItem; // Needed so an empty enumerable will not return null but an actual empty enumerable.
  private readonly T _item;
  private readonly Lazy<CachedEnumerable<T>> _nextItems;

  /// <summary>
  /// Initialises a new instance of <see cref="CachedEnumerable{T}"/> using <paramref name="item"/> as the current item
  /// and <paramref name="nextItems"/> as a value factory for the <see cref="CachedEnumerable{T}"/> containing the next items.
  /// </summary>
  protected internal CachedEnumerable(T item, Func<CachedEnumerable<T>> nextItems) {
    _hasItem = true;
    _item = item;
    _nextItems = new Lazy<CachedEnumerable<T>>(nextItems);
  }

  /// <summary>
  /// Initialises a new instance of <see cref="CachedEnumerable{T}"/> with no current item and no next items.
  /// </summary>
  protected internal CachedEnumerable() {
    _hasItem = false;
  }

  /// <summary>
  /// Instantiates and returns a <see cref="CachedEnumerable{T}"/> for a given <paramref name="enumerable"/>.
  /// Notice: The first item is always iterated through.
  /// </summary>
  public static CachedEnumerable<T> Create(IEnumerable<T> enumerable) {
    return Create(enumerable.GetEnumerator());
  }

  /// <summary>
  /// Instantiates and returns a <see cref="CachedEnumerable{T}"/> for a given <paramref name="enumerator"/>.
  /// Notice: The first item is always iterated through.
  /// </summary>
  private static CachedEnumerable<T> Create(IEnumerator<T> enumerator) {
    return enumerator.MoveNext() ? new CachedEnumerable<T>(enumerator.Current, () => Create(enumerator)) : new CachedEnumerable<T>();
  }

  /// <summary>
  /// Returns an enumerator that iterates through the collection.
  /// </summary>
  public IEnumerator<T> GetEnumerator() {
    if (_hasItem) {
      yield return _item;

      var nextItems = _nextItems.Value;
      if (nextItems != null) {
        foreach (var nextItem in nextItems) {
          yield return nextItem;
        }
      }
    }
  }

  /// <summary>
  /// Returns an enumerator that iterates through a collection.
  /// </summary>
  IEnumerator IEnumerable.GetEnumerator() {
    return GetEnumerator();
  }
}

A useful extension method could be:

public static class IEnumerableExtensions {
  /// <summary>
  /// Instantiates and returns a <see cref="CachedEnumerable{T}"/> for a given <paramref name="enumerable"/>.
  /// Notice: The first item is always iterated through.
  /// </summary>
  public static CachedEnumerable<T> ToCachedEnumerable<T>(this IEnumerable<T> enumerable) {
    return CachedEnumerable<T>.Create(enumerable);
  }
}

And for the unit testers among you (if you don't use ReSharper, just take out the [SuppressMessage] attributes):

using System.Diagnostics.CodeAnalysis;
using System.Linq;
using NUnit.Framework;

/// <summary>
/// Tests the <see cref="CachedEnumerable{T}"/> class.
/// </summary>
[TestFixture]
public class CachedEnumerableTest {
  private int _count;

  /// <remarks>
  /// This test case is only here to emphasise the problem with <see cref="IEnumerable{T}"/> which <see cref="CachedEnumerable{T}"/> attempts to solve.
  /// </remarks>
  [Test]
  [SuppressMessage("ReSharper", "PossibleMultipleEnumeration")]
  [SuppressMessage("ReSharper", "ReturnValueOfPureMethodIsNotUsed")]
  public void MultipleEnumerationAreNotCachedForOriginalIEnumerable() {
    _count = 0;

    var enumerable = Enumerable.Range(1, 40).Select(IncrementCount);

    enumerable.Take(3).ToArray();
    enumerable.Take(10).ToArray();
    enumerable.Take(4).ToArray();

    Assert.AreEqual(17, _count);
  }

  /// <remarks>
  /// This test case is only here to emphasise the problem with <see cref="IList{T}"/> which <see cref="CachedEnumerable{T}"/> attempts to solve.
  /// </remarks>
  [Test]
  [SuppressMessage("ReSharper", "PossibleMultipleEnumeration")]
  [SuppressMessage("ReSharper", "ReturnValueOfPureMethodIsNotUsed")]
  public void EntireListIsEnumeratedForOriginalListOrArray() {
    _count = 0;
    Enumerable.Range(1, 40).Select(IncrementCount).ToList();
    Assert.AreEqual(40, _count);

    _count = 0;
    Enumerable.Range(1, 40).Select(IncrementCount).ToArray();
    Assert.AreEqual(40, _count);
  }

  [Test]
  [SuppressMessage("ReSharper", "ReturnValueOfPureMethodIsNotUsed")]
  public void MultipleEnumerationsAreCached() {
    _count = 0;

    var cachedEnumerable = Enumerable.Range(1, 40).Select(IncrementCount).ToCachedEnumerable();

    cachedEnumerable.Take(3).ToArray();
    cachedEnumerable.Take(10).ToArray();
    cachedEnumerable.Take(4).ToArray();

    Assert.AreEqual(10, _count);
  }

  [Test]
  public void FreshCachedEnumerableDoesNotEnumerateExceptFirstItem() {
    _count = 0;

    Enumerable.Range(1, 40).Select(IncrementCount).ToCachedEnumerable();

    Assert.AreEqual(1, _count);
  }

  /// <remarks>
  /// Based on Jon Skeet test mentioned here: http://www.siepman.nl/blog/post/2013/10/09/LazyList-A-better-LINQ-result-cache-than-List.aspx
  /// </remarks>
  [Test]
  [SuppressMessage("ReSharper", "LoopCanBeConvertedToQuery")]
  public void MatrixEnumerationIteratesAsExpectedWhileStillKeepingEnumeratedValuesCached() {
    _count = 0;

    var cachedEnumerable = Enumerable.Range(1, 5).Select(IncrementCount).ToCachedEnumerable();

    var matrixCount = 0;

    foreach (var x in cachedEnumerable) {
      foreach (var y in cachedEnumerable) {
        matrixCount++;
      }
    }

    Assert.AreEqual(5, _count);
    Assert.AreEqual(25, matrixCount);
  }

  [Test]
  public void OrderingCachedEnumerableWorksAsExpectedWhileStillKeepingEnumeratedValuesCached() {
    _count = 0;

    var cachedEnumerable = Enumerable.Range(1, 5).Select(IncrementCount).ToCachedEnumerable();

    var orderedEnumerated = cachedEnumerable.OrderBy(x => x);
    var orderedEnumeratedArray = orderedEnumerated.ToArray(); // Enumerated first time in ascending order.
    Assert.AreEqual(5, _count);

    for (int i = 0; i < orderedEnumeratedArray.Length; i++) {
      Assert.AreEqual(i + 1, orderedEnumeratedArray[i]);
    }

    var reorderedEnumeratedArray = orderedEnumerated.OrderByDescending(x => x).ToArray(); // Enumerated second time in descending order.
    Assert.AreEqual(5, _count);

    for (int i = 0; i < reorderedEnumeratedArray.Length; i++) {
      Assert.AreEqual(5 - i, reorderedEnumeratedArray[i]);
    }
  }

  private int IncrementCount(int value) {
    _count++;
    return value;
  }
}

Answer 4

I like @tsemer's answer. But I would like to propose my own solution, which has nothing to do with FP. It is a naive approach, but it generates far fewer allocations. It is not thread-safe.

using System;
using System.Collections;
using System.Collections.Generic;

public class CachedEnumerable<T> : IEnumerable<T>, IDisposable
{
    IEnumerator<T> _enumerator;
    readonly List<T> _cache = new List<T>();

    public CachedEnumerable(IEnumerable<T> enumerable) 
        : this(enumerable.GetEnumerator())
    {
    }

    public CachedEnumerable(IEnumerator<T> enumerator)
    {
        _enumerator = enumerator;
    }

    public IEnumerator<T> GetEnumerator()
    {
        // The index of the current item in the cache.
        int index = 0;

        // Enumerate the _cache first
        for (; index < _cache.Count; index++)
        {
            yield return _cache[index];
        }

        // Continue enumeration of the original _enumerator, 
        // until it is finished. 
        // This adds items to the cache and increments index.
        for (; _enumerator != null && _enumerator.MoveNext(); index++)
        {
            var current = _enumerator.Current;
            _cache.Add(current);
            yield return current;
        }

        if (_enumerator != null)
        {
            _enumerator.Dispose();
            _enumerator = null;
        }

        // Some other users of the same instance of CachedEnumerable
        // can add more items to the cache, 
        // so we need to enumerate them as well
        for (; index < _cache.Count; index++)
        {
            yield return _cache[index];
        }
    }

    public void Dispose()
    {
        if (_enumerator != null)
        {
            _enumerator.Dispose();
            _enumerator = null;
        }
    }

    IEnumerator IEnumerable.GetEnumerator()
    {
        return GetEnumerator();
    }
}

This is how the matrix test from @tsemer's answer will work:

var ints = new [] { 1, 2, 3, 4, 5 };
var cachedEnumerable = new CachedEnumerable<int>(ints); 
foreach (var x in cachedEnumerable)
{
    foreach (var y in cachedEnumerable)
    {
        //Do something
    }
}

1. The outer loop (x) skips the first for loop because _cache is empty;
2. x fetches one item from _enumerator into _cache;
3. x pauses inside the second for loop;
4. The inner loop (y) enumerates one element from _cache;
5. y fetches all the remaining elements from _enumerator into _cache;
6. y skips the third for loop because its index variable equals 5;
7. x resumes with index equal to 1. It leaves the second for loop because _enumerator is finished;
8. x enumerates one element from _cache using the third for loop;
9. x pauses inside the third for loop;
10. y enumerates 5 elements from _cache using the first for loop;
11. y skips the second for loop because _enumerator is finished;
12. y skips the third for loop because its index equals 5;
13. x resumes and increments index. It fetches one element from _cache using the third for loop;
14. x pauses;
15. if the index variable of x is less than 5, go to step 10;
16. end.

Answer 5

I quite like hazzik's answer... nice and simple is always the way. BUT there is a bug in GetEnumerator.

It sort of realises there is a problem, and that is why there is a strange third loop after the second (enumerator) loop... but it isn't quite as simple as that. The problem that triggers the need for the third loop is general, so the handling needs to be recursive.
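A minimal reproduction of the problem might look like the sketch below. SkippingCache is a hypothetical, condensed copy of the three-loop GetEnumerator from Answer 4, kept only to show the effect; SkipDemo is likewise an illustrative name. One reader (x) is paused while a second reader (y) pulls a new item into the cache and then stops early.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

// Condensed copy of the three-loop logic from Answer 4 (hypothetical name).
public class SkippingCache<T> : IEnumerable<T>
{
    IEnumerator<T> _enumerator;
    readonly List<T> _cache = new List<T>();

    public SkippingCache(IEnumerable<T> source) { _enumerator = source.GetEnumerator(); }

    public IEnumerator<T> GetEnumerator()
    {
        int index = 0;
        for (; index < _cache.Count; index++) yield return _cache[index];
        for (; _enumerator != null && _enumerator.MoveNext(); index++)
        {
            var current = _enumerator.Current;
            _cache.Add(current);
            yield return current;
        }
        if (_enumerator != null) { _enumerator.Dispose(); _enumerator = null; }
        for (; index < _cache.Count; index++) yield return _cache[index];
    }

    IEnumerator IEnumerable.GetEnumerator() { return GetEnumerator(); }
}

public static class SkipDemo
{
    public static List<int> Run()
    {
        var cached = new SkippingCache<int>(new[] { 1, 2, 3, 4, 5 });
        var seen = new List<int>();

        using (var x = cached.GetEnumerator())
        {
            // x pulls 1 from the source and pauses with index == 0.
            x.MoveNext();
            seen.Add(x.Current);

            // y replays 1 from the cache, pulls 2 from the source, then stops early.
            using (var y = cached.GetEnumerator())
            {
                y.MoveNext();
                y.MoveNext();
            }

            // x resumes in the second loop and never revisits _cache[1].
            while (x.MoveNext()) seen.Add(x.Current);
        }

        return seen; // 1, 3, 4, 5, 5
    }
}
```

Here x misses the 2 that y added to the cache, and because its index now lags behind the cache, the third loop replays the last element a second time.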

The fix, though, looks even simpler.

    public IEnumerator<T> GetEnumerator()
    {
        int index = 0;

        while (true)
        {
            if (index < _cache.Count)
            {
                yield return _cache[index];
                index = index + 1;
            }
            else
            {
                if (_enumerator.MoveNext())
                {
                    _cache.Add(_enumerator.Current);
                }
                else
                {
                    yield break;
                }
            }
        }
    }

Yes, you could make it a tiny bit more efficient by yielding the current item directly... but I'll take the microsecond hit; it only ever happens once per element.

And it is not thread-safe... but who cares about that.
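For readers who do care, one possible way to make the same index-based idea safe for concurrent readers is to guard the cache and the source enumerator with a lock. This is only a sketch under that assumption, not part of the answer above; LockedCachedEnumerable, TryGet and LockedDemo are hypothetical names.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical name: a lock-protected variant of the index-based cache.
public class LockedCachedEnumerable<T> : IEnumerable<T>
{
    private readonly object _gate = new object();
    private readonly List<T> _cache = new List<T>();
    private IEnumerator<T> _enumerator;

    public LockedCachedEnumerable(IEnumerable<T> source)
    {
        _enumerator = source.GetEnumerator();
    }

    // Returns the item at index, pulling at most one new item from the source.
    // All access to _cache and _enumerator happens under the lock.
    private bool TryGet(int index, out T item)
    {
        lock (_gate)
        {
            if (index >= _cache.Count && _enumerator != null)
            {
                if (_enumerator.MoveNext())
                {
                    _cache.Add(_enumerator.Current);
                }
                else
                {
                    _enumerator.Dispose();
                    _enumerator = null;
                }
            }

            if (index < _cache.Count)
            {
                item = _cache[index];
                return true;
            }

            item = default(T);
            return false;
        }
    }

    public IEnumerator<T> GetEnumerator()
    {
        // The yield happens outside the lock, so a slow consumer does not block other readers.
        int index = 0;
        T item;
        while (TryGet(index, out item))
        {
            yield return item;
            index++;
        }
    }

    IEnumerator IEnumerable.GetEnumerator()
    {
        return GetEnumerator();
    }
}

public static class LockedDemo
{
    // Enumerates one cache from several threads and returns how many
    // times the source projection ran.
    public static int Run()
    {
        int evaluations = 0;
        var source = Enumerable.Range(1, 1000)
            .Select(i => { Interlocked.Increment(ref evaluations); return i; });
        var cached = new LockedCachedEnumerable<int>(source);

        Parallel.For(0, 8, _ =>
        {
            if (!cached.SequenceEqual(Enumerable.Range(1, 1000)))
                throw new InvalidOperationException("a reader saw a wrong sequence");
        });

        return evaluations;
    }
}
```

The trade-off is that the source is advanced while the lock is held, so an expensive source item makes every other reader wait for it. In exchange, each source item is produced exactly once regardless of how many threads enumerate.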

Answer 6

I don't see any serious problem with the idea of caching the results in a list, just as in the code above. It would probably be better to build the list using the ToList() method.

public IEnumerable<ModuleData> ListModules()
{
    if (Modules == null)
    {
        Modules = Source.Descendants("Module")
                      .Select(m => new ModuleData(m.Element("ModuleID").Value, 1, 1))
                      .ToList();
    }
    return Modules;
}