Эффективный способ генерации комбинаций, упорядоченных путем увеличения суммы индексов

Для эвристического алгоритма мне нужно оценить один за другим комбинации определенного набора, пока не достигнет критерия остановки.

Поскольку их много, на данный момент я их генерирую, используя следующий блок памяти итератора с памятью (вдохновленный python itertools.combinations):

public static IEnumerable<T[]> GetCombinations<T>(this IList<T> pool, int r)
{
    int n = pool.Count;
    if (r > n)
        throw new ArgumentException("r cannot be greater than pool size");
    int[] indices = Enumerable.Range(0, r).ToArray();
    yield return indices.Select(idx => pool[idx]).ToArray();
    while (true)
    {
        int i;
        for (i = r - 1; i >= 0; i--)
            if (indices[i] != i + n - r)
                break;
        if (i < 0)
            break;
        indices[i] += 1;
        for (int j = i + 1; j < r; j++)
            indices[j] = indices[j - 1] + 1;
        yield return indices.Select(idx => pool[idx]).ToArray();
    }
}

Проблема заключается в том, чтобы значительно повысить эффективность моей эвристики, мне нужно было бы сгенерировать эти комбинации, отсортированные по сумме их индексов (другими словами, мне нужно сначала создать, комбинации, содержащие первые элементы набора).

например.
Рассмотрим множество S = {0,1,2,3,4,5}
(Я выбираю этот набор для простоты, так как элементы и их индексы совпадают).
Все возможные комбинации чисел r=4, генерируемые данным алгоритмом:

(0, 1, 2, 3)  SUM:  6
(0, 1, 2, 4)  SUM:  7
(0, 1, 2, 5)  SUM:  8
(0, 1, 3, 4)  SUM:  8
(0, 1, 3, 5)  SUM:  9
(0, 1, 4, 5)  SUM: 10
(0, 2, 3, 4)  SUM:  9
(0, 2, 3, 5)  SUM: 10
(0, 2, 4, 5)  SUM: 11
(0, 3, 4, 5)  SUM: 12
(1, 2, 3, 4)  SUM: 10
(1, 2, 3, 5)  SUM: 11
(1, 2, 4, 5)  SUM: 12
(1, 3, 4, 5)  SUM: 13
(2, 3, 4, 5)  SUM: 14

где, как вы видите, комбинации не сортируются строго по восходящей сумме.

Желаемый результат - это следующее:
(порядок комбинаций с одинаковой суммой не важен)

(0, 1, 2, 3)  SUM:  6
(0, 1, 2, 4)  SUM:  7
(0, 1, 2, 5)  SUM:  8
(0, 1, 3, 4)  SUM:  8
(0, 1, 3, 5)  SUM:  9
(0, 2, 3, 4)  SUM:  9
(0, 1, 4, 5)  SUM: 10
(0, 2, 3, 5)  SUM: 10
(1, 2, 3, 4)  SUM: 10
(0, 2, 4, 5)  SUM: 11
(1, 2, 3, 5)  SUM: 11
(0, 3, 4, 5)  SUM: 12
(1, 2, 4, 5)  SUM: 12
(1, 3, 4, 5)  SUM: 13
(2, 3, 4, 5)  SUM: 14

Тривиальное решение состоит в том, чтобы сгенерировать все комбинации, затем отсортировать их по их сумме; но это не очень эффективно/возможно, так как количество комбинаций становится огромным, поскольку n растет.

Я также быстро посмотрел на комбинаторные серые коды, но я не мог найти никого подходящего для этой проблемы.

У вас есть идея о том, как реализовать что-то вроде этого?

EDIT:

У этой проблемы есть альтернативная (к сожалению, не простая) формулировка.
Учитывая множество S и число r, все возможные суммы тривиальны, так как они всего лишь числа из суммы первых элементов r из S в сумму последних r элементов S.

Если сказать, что если для каждой суммы T мы можем эффективно найти все комбинации, имеющие сумму T, мы решим исходную задачу, так как просто сгенерируем их в порядке возрастания.

¹ эффективно означает, что я не хочу генерировать все комбинации и отбрасывать те, у которых есть другая сумма.

ИЗМЕНИТЬ 2:

После предложения @EricLippert я создал следующий код:

public static IEnumerable<T[]> 
GetCombinationsSortedByIndexSum<T>(this IList<T> pool, int r)
{
    int n = pool.Count;
    if (r > n)
        throw new ArgumentException("r cannot be greater than pool size");
    int minSum = ((r - 1) * r) / 2;
    int maxSum = (n * (n + 1)) / 2 - ((n - r - 1) * (n - r)) / 2;

    for (int sum = minSum; sum <= maxSum; sum++)
    {
        foreach (var indexes in AllMonotIncrSubseqOfLenMWhichSumToN(0, n - 1, r, sum))
            yield return indexes.Select(x => pool[x]).ToArray();
    }
}

static IEnumerable<IEnumerable<int>> 
AllMonotIncrSubseqOfLenMWhichSumToN(int seqFirstElement, int seqLastElement, int m, int n)
{
    for (int i = seqFirstElement; i <= seqLastElement - m + 1; i++)
    {
        if (m == 1)
        {
            if (i == n)
                yield return new int[] { i };
        }
        else
        {
            foreach (var el in AllMonotIncrSubseqOfLenMWhichSumToN(i + 1, seqLastElement, m - 1, n - i))
                yield return new int[] { i }.Concat(el);
        }
    }
}

Это прекрасно работает (надеюсь, это то, что имел в виду Эрик: P), но меня все еще беспокоит сложность рекурсивного метода. На самом деле кажется, что мы регенерируем все комбинации для каждой суммы, отбрасывая те, которые не суммируются до желаемого значения.

Чтобы уменьшить сложность внутренней функции, я нашел способ ограничить итерации, используя эффективные верхние и нижние границы (и теперь очень сложно сказать, в чем сложность этого).

Отметьте мой ответ, чтобы увидеть окончательный код.

Ответ 1

Решение, которое я имел в виду, было:

using System;
using System.Collections.Generic;
using System.Linq;
class Program
{
  // Preconditions:
  // * items is a sequence of non-negative monotone increasing integers
  // * n is the number of items to be in the subsequence
  // * sum is the desired sum of that subsequence.
  // Result:
  // A sequence of subsequences of the original sequence where each 
  // subsequence has n items and the given sum.
  static IEnumerable<IEnumerable<int>> M(IEnumerable<int> items, int sum, int n)
  {
    // Let start by taking some easy outs. If the sum is negative
    // then there is no solution. If the number of items in the
    // subsequence is negative then there is no solution.

    if (sum < 0 || n < 0)
      yield break;

    // If the number of items in the subsequence is zero then
    // the only possible solution is if the sum is zero.

    if (n == 0)
    {
      if (sum == 0)
        yield return Enumerable.Empty<int>();
      yield break;
    }

    // If the number of items is less than the required number of 
    // items, there is no solution.

    if (items.Count() < n)
      yield break;

    // We have at least n items in the sequence, and
    // and n is greater than zero, so First() is valid:

    int first = items.First();

    // We need n items from a monotone increasing subsequence
    // that have a particular sum. We might already be too 
    // large to meet that requirement:

    if (n * first > sum)
      yield break;

    // There might be some solutions that involve the first element.
    // Find them all.

    foreach(var subsequence in M(items.Skip(1), sum - first, n - 1))
      yield return new[]{first}.Concat(subsequence);      

    // And there might be some solutions that do not involve the first element.
    // Find them all.

    foreach(var subsequence in M(items.Skip(1), sum, n))
      yield return subsequence;
  }
  static void Main()
  {
    int[] x = {0, 1, 2, 3, 4, 5};
    for (int i = 0; i <= 15; ++i)
      foreach(var seq in M(x, i, 4))
        Console.WriteLine("({0}) SUM {1}", string.Join(",", seq), i);
  }
}

Выход - ваш желаемый результат.

Я не пытался оптимизировать это. Было бы интересно просмотреть его и посмотреть, где большая часть времени тратится.

UPDATE: просто для удовольствия я написал версию, которая использует неизменяемый стек вместо произвольного перечислимого. Наслаждайтесь!

using System;
using System.Collections.Generic;
using System.Linq;

abstract class ImmutableList<T> : IEnumerable<T>
{
  public static readonly ImmutableList<T> Empty = new EmptyList();
  private ImmutableList() {}  
  public abstract bool IsEmpty { get; }
  public abstract T Head { get; }
  public abstract ImmutableList<T> Tail { get; }
  public ImmutableList<T> Push(T newHead)
  {
    return new List(newHead, this);
  }  

  private sealed class EmptyList : ImmutableList<T>
  {
    public override bool IsEmpty { get { return true; } }
    public override T Head { get { throw new InvalidOperationException(); } }
    public override ImmutableList<T> Tail { get { throw new InvalidOperationException(); } }
  }
  private sealed class List : ImmutableList<T>
  {
    private readonly T head;
    private readonly ImmutableList<T> tail;
    public override bool IsEmpty { get { return false; } }
    public override T Head { get { return head; } }
    public override ImmutableList<T> Tail { get { return tail; } }
    public List(T head, ImmutableList<T> tail)
    {
      this.head = head;
      this.tail = tail;
    }
  }
  System.Collections.IEnumerator System.Collections.IEnumerable.GetEnumerator()
  {
    return this.GetEnumerator();
  }
  public IEnumerator<T> GetEnumerator()
  {
    for (ImmutableList<T> current = this; !current.IsEmpty; current = current.Tail)
      yield return current.Head;
  }
}  

class Program
{
  // Preconditions:
  // * items is a sequence of non-negative monotone increasing integers
  // * n is the number of items to be in the subsequence
  // * sum is the desired sum of that subsequence.
  // Result:
  // A sequence of subsequences of the original sequence where each 
  // subsequence has n items and the given sum.
  static IEnumerable<ImmutableList<int>> M(ImmutableList<int> items, int sum, int n)
  {
    // Let start by taking some easy outs. If the sum is negative
    // then there is no solution. If the number of items in the
    // subsequence is negative then there is no solution.

    if (sum < 0 || n < 0)
      yield break;

    // If the number of items in the subsequence is zero then
    // the only possible solution is if the sum is zero.
    if (n == 0)
    {
      if (sum == 0)
        yield return ImmutableList<int>.Empty;
      yield break;
    }

    // If the number of items is less than the required number of 
    // items, there is no solution.

    if (items.Count() < n)
      yield break;

    // We have at least n items in the sequence, and
    // and n is greater than zero.
    int first = items.Head;

    // We need n items from a monotone increasing subsequence
    // that have a particular sum. We might already be too 
    // large to meet that requirement:

    if (n * first > sum)
      yield break;

    // There might be some solutions that involve the first element.
    // Find them all.

    foreach(var subsequence in M(items.Tail, sum - first, n - 1))
      yield return subsequence.Push(first);      

    // And there might be some solutions that do not involve the first element.
    // Find them all.
    foreach(var subsequence in M(items.Tail, sum, n))
      yield return subsequence;
  }
  static void Main()
  {
    ImmutableList<int> x = ImmutableList<int>.Empty.Push(5).
                           Push(4).Push(3).Push(2).Push(1).Push(0);
    for (int i = 0; i <= 15; ++i)
      foreach(var seq in M(x, i, 4))
        Console.WriteLine("({0}) SUM {1}", string.Join(",", seq), i);
  }
}

Ответ 2

Если в худшем случае вы выбираете 35, выберите 10, что даст 183 579 396 уникальных комбинаций в соответствии с этим калькулятор биномиальных коэффициентов, который является лучшим бесплатным Я нашел в Интернете до сих пор. Большинство современных процессоров должны иметь возможность пройти через это не более секунды или 2 - в зависимости от языка и не считая времени для сортировки. С С++ это, вероятно, будет значительно ниже секунды. Если перейти на С++-маршрут, то вы, вероятно, захотите сделать его dll и вызвать его через платформу invoke (P/I). Есть также некоторые виды, которые обладают превосходной производительностью со списками, которые в основном отсортированы, что похоже на этот случай.

Если в течение секунды все еще слишком медленно, вы можете подумать о предварительном вычислении всех N выбранных случаев K, которые вам нужны, и записать их в файл (после применения сортировки на основе суммы k-индексов) и затем чтение файла (ов) при запуске программы. В зависимости от приложения и того, где он будет размещен, это может быть не слишком практичным, если это относится к платформе Windows CE с ограниченной памятью, например. Но для ПК или другой системы с хорошим объемом свободного места на жестком диске это не должно быть проблемой.

В ответ на ваш вопрос о том, что я имел в виду под "указать индекс в набор":

Я написал класс С#, который может принимать индекс в таблицу отсортированных биномиальных коэффициентов и возвращать соответствующие k-индексы для этого индекса без необходимости повторять все комбинации перед ним. Существует другой метод, который выполняет обратное и возвращает соответствующий индекс (или ранг) для заданных k-индексов. Ранг начинается с нуля, и в вашем примере выше будут указаны k-индексы 0, 1, 2, 3. Ранг 1 будет для k-индексов 0, 1, 2, 4 и т.д. Так, например, в примере 35 выберите 10, если вы знаете, что вам нужны k-индексы всего за 150 000 000, тогда вам не нужно проходить через первый 150M, чтобы получить значения после этого. Вы можете вызвать метод класса и передать 150000000 в качестве индекса, и он вернет k-индексы для этого индекса. Методы очень оптимизированы и основаны на математическом соотношении, которое можно увидеть в Pascal Triangle.

Класс написан на .NET С# и предоставляет способ управления объектами, связанными с проблемой (если есть), используя общий список. Конструктор этого класса принимает значение bool, называемое InitTable, которое, когда true, создаст общий список для хранения объектов, которые будут управляться. Если это значение ложно, то оно не создаст таблицу. Таблицу не нужно создавать, чтобы использовать перечисленные выше методы перевода. Для доступа к таблице предоставляются методы доступа.

Существует связанный тестовый класс, который показывает, как использовать класс и его методы. Он был широко протестирован, по крайней мере, в 2 случаях, и нет известных ошибок.

Чтобы прочитать об этом классе и загрузить код, см. Tablizing the Binomial Coeffieicent.

Следующий проверенный код будет проходить через каждую уникальную комбинацию:

public void Test10Choose5()
{
   String S;
   int Loop;
   int N = 10;  // Total number of elements in the set.
   int K = 5;  // Total number of elements in each group.
   // Create the bin coeff object required to get all
   // the combos for this N choose K combination.
   BinCoeff<int> BC = new BinCoeff<int>(N, K, false);
   int NumCombos = BinCoeff<int>.GetBinCoeff(N, K);
   // The Kindexes array specifies the indexes for a lexigraphic element.
   int[] KIndexes = new int[K];
   StringBuilder SB = new StringBuilder();
   // Loop thru all the combinations for this N choose K case.
   for (int Combo = 0; Combo < NumCombos; Combo++)
   {
      // Get the k-indexes for this combination.  
      BC.GetKIndexes(Combo, KIndexes);
      // Verify that the Kindexes returned can be used to retrive the
      // rank or lexigraphic order of the KIndexes in the table.
      int Val = BC.GetIndex(true, KIndexes);
      if (Val != Combo)
      {
         S = "Val of " + Val.ToString() + " != Combo Value of " + Combo.ToString();
         Console.WriteLine(S);
      }
      SB.Remove(0, SB.Length);
      for (Loop = 0; Loop < K; Loop++)
      {
         SB.Append(KIndexes[Loop].ToString());
         if (Loop < K - 1)
            SB.Append(" ");
      }
      S = "KIndexes = " + SB.ToString();
      Console.WriteLine(S);
   }
}

Убедитесь, что вы используете версию класса GetBinCoeff, которая реализует версию оценки количества комбинаций Mark Dominus. Он использует длинные значения, и код намного реже переполняется.

Ответ 3

Для полноты и ясности я выложу свой окончательный код:

// Given a pool of elements returns all the 
// combinations of the groups of lenght r in pool, 
// such that the combinations are ordered (ascending) by the sum of 
// the indexes of the elements.
// e.g. pool = {A,B,C,D,E} r = 3
// returns
// (A, B, C)   indexes: (0, 1, 2)   sum: 3
// (A, B, D)   indexes: (0, 1, 3)   sum: 4
// (A, B, E)   indexes: (0, 1, 4)   sum: 5
// (A, C, D)   indexes: (0, 2, 3)   sum: 5
// (A, C, E)   indexes: (0, 2, 4)   sum: 6
// (B, C, D)   indexes: (1, 2, 3)   sum: 6
// (A, D, E)   indexes: (0, 3, 4)   sum: 7
// (B, C, E)   indexes: (1, 2, 4)   sum: 7
// (B, D, E)   indexes: (1, 3, 4)   sum: 8
// (C, D, E)   indexes: (2, 3, 4)   sum: 9
public static IEnumerable<T[]>
GetCombinationsSortedByIndexSum<T>(this IList<T> pool, int r)
{
    int n = pool.Count;
    if (r > n)
        throw new ArgumentException("r cannot be greater than pool size");
    int minSum = F(r - 1);
    int maxSum = F(n) - F(n - r - 1);

    for (int sum = minSum; sum <= maxSum; sum++)
    {
        foreach (var indexes in AllSubSequencesWithGivenSum(0, n - 1, r, sum))
            yield return indexes.Select(x => pool[x]).ToArray();
    }
}


// Given a start element and a last element of a sequence of consecutive integers
// returns all the monotonically increasing subsequences of length "m" having sum "sum"
// e.g. seqFirstElement = 1, seqLastElement = 5, m = 3, sum = 8
//      returns {1,2,5} and {1,3,4}
static IEnumerable<IEnumerable<int>>
AllSubSequencesWithGivenSum(int seqFirstElement, int seqLastElement, int m, int sum)
{
    int lb = sum - F(seqLastElement) + F(seqLastElement - m + 1);
    int ub = sum - F(seqFirstElement + m - 1) + F(seqFirstElement);

    lb = Math.Max(seqFirstElement, lb);
    ub = Math.Min(seqLastElement - m + 1, ub);

    for (int i = lb; i <= ub; i++)
    {
        if (m == 1)
        {
            if (i == sum) // this check shouldn't be necessary anymore since LB/UB should automatically exclude wrong solutions
                yield return new int[] { i };
        }
        else
        {
            foreach (var el in AllSubSequencesWithGivenSum(i + 1, seqLastElement, m - 1, sum - i))
                yield return new int[] { i }.Concat(el);
        }
    }
}

// Formula to compute the sum of the numbers from 0 to n
// e.g. F(4) = 0 + 1 + 2 + 3 + 4 = 10
static int F(int n)
{
    return (n * (n + 1)) / 2;
}