Reading Excel with Open XML ignores blank cells

I'm using the accepted solution here to convert an Excel sheet into a DataTable. This works fine if I have "perfect" data, but if I have a blank cell in the middle of my data, it seems to put the wrong data in each column.

I think this is because in the code below:

row.Descendants<Cell>().Count()

is the number of populated cells (not all the columns), and:

GetCellValue(spreadSheetDocument, row.Descendants<Cell>().ElementAt(i));

seems to find the next populated cell (not necessarily what is at that index), so if the first column is empty and I call ElementAt(0), it returns the value in the second column.

Here is the full parsing code.

DataRow tempRow = dt.NewRow();

for (int i = 0; i < row.Descendants<Cell>().Count(); i++)
{
    tempRow[i] = GetCellValue(spreadSheetDocument, row.Descendants<Cell>().ElementAt(i));
    if (tempRow[i].ToString().IndexOf("Latency issues in") > -1)
    {
        Console.Write(tempRow[i].ToString());
    }
}

Answer 1

This makes sense, because Excel doesn't store a value for a cell that is null. If you open your file with the Open XML SDK 2.0 Productivity Tool and traverse the XML down to the cell level, you'll see that only the cells that have data are in that file.

Your options are either to insert blank data into the range of cells you're going to traverse, or to programmatically detect that a cell was skipped and adjust your index accordingly.

I made a sample Excel document with a string in cell references A1 and C1. I then opened the Excel document in the Open XML Productivity Tool, and here is the XML that was stored:

<x:row r="1" spans="1:3" 
   xmlns:x="http://schemas.openxmlformats.org/spreadsheetml/2006/main">
  <x:c r="A1" t="s">
    <x:v>0</x:v>
  </x:c>
  <x:c r="C1" t="s">
    <x:v>1</x:v>
  </x:c>
</x:row>

Here you can see that the data corresponds to the first row and that only two cells' worth of data are saved for that row. The saved data corresponds to A1 and C1; no cells with null values are saved.
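To see the gap programmatically, here is a small sketch that loads the row XML above with System.Xml.Linq (no Open XML SDK needed) and lists which cell references are physically stored. The class and method names are hypothetical; the namespace URI and the XML are taken from the sample above.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class StoredCellsDemo
{
    // SpreadsheetML main namespace, as declared on the sample <x:row> element
    private static readonly XNamespace X = "http://schemas.openxmlformats.org/spreadsheetml/2006/main";

    // Returns the r="..." attribute of every <c> element actually present in the row
    public static List<string> StoredReferences(string rowXml)
    {
        XElement row = XElement.Parse(rowXml);
        return row.Elements(X + "c")
                  .Select(c => (string)c.Attribute("r"))
                  .ToList();
    }
}
```

For the sample row this returns only "A1" and "C1". There is no element for B1, so the second element in the list (position 1) belongs to column C, not column B, which is exactly why positional indexing shifts the data.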

To get the functionality you need, you can iterate over the cells as you're doing above, but you'll need to check which cell each value references and determine whether any cells were skipped. To do that you'll need two utility functions: one to get the column name from the cell reference, and one to translate that column name into a zero-based index:

    // Requires: using System.Text.RegularExpressions;

    /// <summary>
    /// Given a cell name, parses the specified cell to get the column name.
    /// </summary>
    /// <param name="cellReference">Address of the cell (ie. B2)</param>
    /// <returns>Column Name (ie. B)</returns>
    public static string GetColumnName(string cellReference)
    {
        // Create a regular expression to match the column name portion of the cell name.
        Regex regex = new Regex("[A-Za-z]+");
        Match match = regex.Match(cellReference);

        return match.Value;
    }

    /// <summary>
    /// Given just the column name (no row index), it will return the zero based column index.
    /// Works for column names of any length (A to Z, AA to ZZ, AAA to XFD).
    /// </summary>
    /// <param name="columnName">Column Name (ie. A or AB)</param>
    /// <returns>Zero based index if the conversion was successful; otherwise null</returns>
    public static int? GetColumnIndexFromName(string columnName)
    {
        if (string.IsNullOrEmpty(columnName))
            return null;

        int columnNumber = 0;
        foreach (char letter in columnName.ToUpperInvariant())
        {
            if (letter < 'A' || letter > 'Z')
                return null;    // not a column letter

            // Column names count like a base-26 number without a zero digit: A=1 ... Z=26, AA=27
            columnNumber = columnNumber * 26 + (letter - 'A' + 1);
        }

        return columnNumber - 1;    // 1-based column number -> zero-based index
    }

You can then iterate over the cells and compare each cell's column index with columnIndex. If columnIndex is smaller, you add blank data to tempRow until the two match; then you just read the value contained in the cell. (Note: I haven't tested the code below, but the general idea should help):

DataRow tempRow = dt.NewRow();

int columnIndex = 0;
foreach (Cell cell in row.Descendants<Cell>())
{
    // Gets the column index of the cell with data
    int cellColumnIndex = (int)GetColumnIndexFromName(GetColumnName(cell.CellReference));

    // Excel skipped the blank cells before this one, so fill those columns with empty data
    while (columnIndex < cellColumnIndex)
    {
        tempRow[columnIndex] = string.Empty;
        columnIndex++;
    }

    tempRow[columnIndex] = GetCellValue(spreadSheetDocument, cell);

    if (tempRow[columnIndex].ToString().IndexOf("Latency issues in") > -1)
    {
        Console.Write(tempRow[columnIndex].ToString());
    }
    columnIndex++;
}
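The same gap-filling idea can be tried without the Open XML SDK. The sketch below assumes each stored cell has already been reduced to a (reference, value) pair such as ("C1", "foo"), and that cells arrive in increasing column order, as Excel normally writes them. The names `RowPadding`, `PadRow` and `ColumnIndex` are hypothetical.

```csharp
using System;
using System.Collections.Generic;

public static class RowPadding
{
    // Zero-based column index from a reference such as "C1" or "AB12" (A -> 0, Z -> 25, AA -> 26)
    public static int ColumnIndex(string cellReference)
    {
        int number = 0;
        foreach (char ch in cellReference)
        {
            if (!char.IsLetter(ch)) break;                      // stop at the row number
            number = number * 26 + (char.ToUpperInvariant(ch) - 'A' + 1);
        }
        if (number == 0) throw new ArgumentException("No column letters in " + cellReference);
        return number - 1;
    }

    // Returns one entry per column, inserting "" for every column Excel did not store
    public static List<string> PadRow(IEnumerable<(string Reference, string Value)> storedCells)
    {
        var values = new List<string>();
        foreach (var (reference, value) in storedCells)
        {
            int target = ColumnIndex(reference);
            while (values.Count < target)
                values.Add("");                                 // skipped (blank) column
            values.Add(value);
        }
        return values;
    }
}
```

Blank columns after the last stored cell are not added, because nothing in the row itself says how wide it should be.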

Answer 2

Here is an IEnumerable implementation that should do what you want; it has been compiled and tested.

    ///<summary>returns an empty cell when a blank cell is encountered
    ///</summary>
    public IEnumerator<Cell> GetEnumerator()
    {
        int currentCount = 0;

        // row is a class level variable representing the current
        // DocumentFormat.OpenXml.Spreadsheet.Row
        foreach (DocumentFormat.OpenXml.Spreadsheet.Cell cell in
            row.Descendants<DocumentFormat.OpenXml.Spreadsheet.Cell>())
        {
            string columnName = GetColumnName(cell.CellReference);

            int currentColumnIndex = ConvertColumnNameToNumber(columnName);

            for ( ; currentCount < currentColumnIndex; currentCount++)
            {
                yield return new DocumentFormat.OpenXml.Spreadsheet.Cell();
            }

            yield return cell;
            currentCount++;
        }
    }

Here are the functions it relies on:

    /// <summary>
    /// Given a cell name, parses the specified cell to get the column name.
    /// </summary>
    /// <param name="cellReference">Address of the cell (ie. B2)</param>
    /// <returns>Column Name (ie. B)</returns>
    public static string GetColumnName(string cellReference)
    {
        // Match the column name portion of the cell name.
        Regex regex = new Regex("[A-Za-z]+");
        Match match = regex.Match(cellReference);

        return match.Value;
    }

    /// <summary>
    /// Given just the column name (no row index),
    /// it will return the zero based column index.
    /// </summary>
    /// <param name="columnName">Column Name (ie. A or AB)</param>
    /// <returns>Zero based index if the conversion was successful</returns>
    /// <exception cref="ArgumentException">thrown if the given string
    /// contains characters other than uppercase letters</exception>
    public static int ConvertColumnNameToNumber(string columnName)
    {
        Regex alpha = new Regex("^[A-Z]+$");
        if (!alpha.IsMatch(columnName)) throw new ArgumentException();

        char[] colLetters = columnName.ToCharArray();
        Array.Reverse(colLetters);

        int convertedValue = 0;
        for (int i = 0; i < colLetters.Length; i++)
        {
            char letter = colLetters[i];
            int current = i == 0 ? letter - 65 : letter - 64; // ASCII 'A' = 65
            convertedValue += current * (int)Math.Pow(26, i);
        }

        return convertedValue;
    }

Drop it into a class and give it a try.

Answer 3

Here is a slightly modified version of Waylon's answer, which also built on the other answers. It encapsulates his method in a class.

I changed

IEnumerator<Cell> GetEnumerator()

to

IEnumerable<Cell> GetRowCells(Row row)

Here is the class. You don't need to instantiate it; it just serves as a utility class:

public class SpreedsheetHelper
{
    ///<summary>returns an empty cell when a blank cell is encountered
    ///</summary>
    public static IEnumerable<Cell> GetRowCells(Row row)
    {
        int currentCount = 0;

        foreach (DocumentFormat.OpenXml.Spreadsheet.Cell cell in
            row.Descendants<DocumentFormat.OpenXml.Spreadsheet.Cell>())
        {
            string columnName = GetColumnName(cell.CellReference);

            int currentColumnIndex = ConvertColumnNameToNumber(columnName);

            for (; currentCount < currentColumnIndex; currentCount++)
            {
                yield return new DocumentFormat.OpenXml.Spreadsheet.Cell();
            }

            yield return cell;
            currentCount++;
        }
    }

    /// <summary>
    /// Given a cell name, parses the specified cell to get the column name.
    /// </summary>
    /// <param name="cellReference">Address of the cell (ie. B2)</param>
    /// <returns>Column Name (ie. B)</returns>
    public static string GetColumnName(string cellReference)
    {
        // Match the column name portion of the cell name.
        var regex = new System.Text.RegularExpressions.Regex("[A-Za-z]+");
        var match = regex.Match(cellReference);

        return match.Value;
    }

    /// <summary>
    /// Given just the column name (no row index),
    /// it will return the zero based column index.
    /// </summary>
    /// <param name="columnName">Column Name (ie. A or AB)</param>
    /// <returns>Zero based index if the conversion was successful</returns>
    /// <exception cref="ArgumentException">thrown if the given string
    /// contains characters other than uppercase letters</exception>
    public static int ConvertColumnNameToNumber(string columnName)
    {
        var alpha = new System.Text.RegularExpressions.Regex("^[A-Z]+$");
        if (!alpha.IsMatch(columnName)) throw new ArgumentException();

        char[] colLetters = columnName.ToCharArray();
        Array.Reverse(colLetters);

        int convertedValue = 0;
        for (int i = 0; i < colLetters.Length; i++)
        {
            char letter = colLetters[i];
            int current = i == 0 ? letter - 65 : letter - 64; // ASCII 'A' = 65
            convertedValue += current * (int)Math.Pow(26, i);
        }

        return convertedValue;
    }
}

Now you can get the cells of every row like this:

// skip the part that retrieves the worksheet sheetData
IEnumerable<Row> rows = sheetData.Descendants<Row>();
foreach(Row row in rows)
{
    IEnumerable<Cell> cells = SpreedsheetHelper.GetRowCells(row);
    foreach (Cell cell in cells)
    {
         // skip part that reads the text according to the cell-type
    }
}

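The result will contain all cells up to the last stored cell in each row, including the empty ones; blank cells after the last stored cell are still not returned.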

Answer 4

See my implementation:

  Row[] rows = worksheet.GetFirstChild<SheetData>()
                .Elements<Row>()
                .ToArray();

  string[] columnNames = rows.First()
                .Elements<Cell>()
                .Select(cell => GetCellValue(cell, document))
                .ToArray();

  HeaderLetters = ExcelHeaderHelper.GetHeaderLetters((uint)columnNames.Count());

  if (columnNames.Count() != HeaderLetters.Count())
  {
       throw new ArgumentException("HeaderLetters");
  }

  IEnumerable<List<string>> cellValues = GetCellValues(rows.Skip(1), columnNames.Count(), document);

//Here you can enumerate through the cell values, based on the cell index the column names can be retrieved.

HeaderLetters are generated using this class:

    private static class ExcelHeaderHelper
    {
        public static string[] GetHeaderLetters(uint max)
        {
            var result = new List<string>();
            int i = 0;
            var columnPrefix = new Queue<string>();
            string prefix = null;
            int prevRoundNo = 0;
            uint maxPrefix = max / 26;

            while (i < max)
            {
                int roundNo = i / 26;
                if (prevRoundNo < roundNo)
                {
                    prefix = columnPrefix.Dequeue();
                    prevRoundNo = roundNo;
                }
                string item = prefix + ((char)(65 + (i % 26))).ToString(CultureInfo.InvariantCulture);
                if (i <= maxPrefix)
                {
                    columnPrefix.Enqueue(item);
                }
                result.Add(item);
                i++;
            }
            return result.ToArray();
        }
    }

And the helper methods:

    private static IEnumerable<List<string>> GetCellValues(IEnumerable<Row> rows, int columnCount, SpreadsheetDocument document)
    {
        var result = new List<List<string>>();
        foreach (var row in rows)
        {
            List<string> cellValues = new List<string>();
            var actualCells = row.Elements<Cell>().ToArray();

            int j = 0;
            for (int i = 0; i < columnCount; i++)
            {
                // Compare the exact column letters ("AB5" -> "AB"); StartsWith would let "AB5" match column "A"
                string cellColumn = j < actualCells.Length
                    ? new string(actualCells[j].CellReference.Value.TakeWhile(ch => char.IsLetter(ch)).ToArray())
                    : null;

                if (cellColumn != HeaderLetters[i])
                {
                    cellValues.Add(null);   // blank cell: Excel did not store it
                }
                else
                {
                    cellValues.Add(GetCellValue(actualCells[j], document));
                    j++;
                }
            }
            result.Add(cellValues);
        }
        return result;
    }
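Matching a cell to a header letter with a prefix test such as `StartsWith` is fragile: a reference like "AB5" also starts with "A", so once a sheet has more than 26 columns a cell in column AB can be mistaken for column A. The sketch below isolates an exact-match check; the class and method names are hypothetical.

```csharp
using System.Linq;

public static class ColumnMatch
{
    // Letters at the start of a cell reference: "AB5" -> "AB"
    public static string ColumnLetters(string cellReference) =>
        new string(cellReference.TakeWhile(ch => char.IsLetter(ch)).ToArray());

    // True only when the cell belongs exactly to the given column
    public static bool IsInColumn(string cellReference, string columnLetters) =>
        ColumnLetters(cellReference) == columnLetters;
}
```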


private static string GetCellValue(Cell cell, SpreadsheetDocument document)
{
    bool sstIndexedcell = GetCellType(cell);
    return sstIndexedcell
        ? GetSharedStringItemById(document.WorkbookPart, Convert.ToInt32(cell.InnerText))
        : cell.InnerText;
}

private static bool GetCellType(Cell cell)
{
    return cell.DataType != null && cell.DataType == CellValues.SharedString;
}

private static string GetSharedStringItemById(WorkbookPart workbookPart, int id)
{
    return workbookPart.SharedStringTablePart.SharedStringTable.Elements<SharedStringItem>().ElementAt(id).InnerText;
}

The solution also handles shared-string cells (cells whose value is an index into the shared string table, SST).
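The shared-string lookup above calls `ElementAt(id)` on the shared string table for every cell, which walks the table from the start each time. A common alternative, sketched here with plain .NET types instead of the SDK classes, is to copy the shared strings into an array once and index into it. The method name and the string-typed parameters are assumptions for illustration; "s" is the value of the cell's `t` attribute for shared-string cells.

```csharp
using System.Collections.Generic;
using System.Globalization;

public static class SharedStringDemo
{
    // dataType: the cell's t="..." attribute (null when absent); rawValue: the text of its <v> element
    public static string ResolveCellText(string dataType, string rawValue, IReadOnlyList<string> sharedStrings)
    {
        if (rawValue == null) return "";                        // cell has no <v> element
        if (dataType == "s")                                     // shared string: <v> holds an index
            return sharedStrings[int.Parse(rawValue, CultureInfo.InvariantCulture)];
        return rawValue;                                         // numbers and other types as stored
    }
}
```

With the SDK, the array would typically be built once per workbook, for example with `workbookPart.SharedStringTablePart.SharedStringTable.Elements<SharedStringItem>().Select(s => s.InnerText).ToArray()`, and then passed to every call.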

Answer 5

All good examples. Here's the one I use, since I need to keep track of all the rows, cells, values and headers for correlation and analysis.

The ReadSpreadsheet method opens an xlsx file and goes through every worksheet, row and column. Because values are stored in a shared string table, I also use that table explicitly for each sheet. There are also other classes: DSFunction and StaticVariables. The latter holds the values of parameters in use, e.g. "quotdouble" (quotdouble = "\u0022";) and "crlf" (crlf = "\u000D" + "\u000A";).

The relevant DSFunction method, GetIntColIndexForLetter, is shown below. It returns an integer column index for letter names such as A, B, AA, ADE, etc. This is used together with the "ncellcolref" variable to determine whether any columns were skipped and to add an empty string value for each one that is missing.

I also do some cleanup of the values (using the Replace method) before temporarily storing them in a List object.

Afterwards I use a hash table (Dictionary) of column names to extract values across the different sheets, correlate them, create normalized values, and then build an object used in our product, which is then saved as an XML file. None of that is shown, but it's why this approach is used.
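The header-name dictionary described above can be sketched like this: map each normalized header text to its zero-based column position, then look values up by name in a data row that has already been padded with blanks. The class, its methods and the sample headers are hypothetical; the normalization (lower-case, trimmed, spaces removed) mirrors the header clean-up in the code below, but this is not the answer's actual code.

```csharp
using System.Collections.Generic;

public static class HeaderLookup
{
    // Lower-case, trim and remove spaces, like the header handling for row 1
    public static string Normalize(string header) =>
        header.Trim().ToLowerInvariant().Replace(" ", "");

    // Maps each header name to its zero-based column position; the first occurrence wins
    public static Dictionary<string, int> BuildIndex(IList<string> headerRow)
    {
        var index = new Dictionary<string, int>();
        for (int i = 0; i < headerRow.Count; i++)
        {
            string key = Normalize(headerRow[i]);
            if (!index.ContainsKey(key))
                index.Add(key, i);
        }
        return index;
    }

    // Returns the value in the named column, or "" if the column or the cell is missing
    public static string GetValue(IList<string> paddedRow, Dictionary<string, int> index, string header)
    {
        if (!index.TryGetValue(Normalize(header), out int col) || col >= paddedRow.Count)
            return "";
        return paddedRow[col];
    }
}
```

This only works if the positions stored for the headers and the positions of the padded data values agree, which is why blank columns must be accounted for in both.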

    public static class DSFunction {

    /// <summary>
    /// Creates an integer value for a column letter name starting at 1 for 'a'
    /// (e.g. "A" -> 1, "Z" -> 26, "AA" -> 27, "ADE" -> 785). Reading stops at the first
    /// non-letter, so a full cell reference such as "AB12" also works.
    /// </summary>
    /// <param name="lettstr">Column name as letters</param>
    /// <returns>int value (0 if there are no leading letters)</returns>
    public static int GetIntColIndexForLetter(string lettstr) {
        int result = 0;
        if (string.IsNullOrEmpty(lettstr)) return result;
        foreach (char ch in lettstr.Trim().ToLowerInvariant()) {
            if (ch < 'a' || ch > 'z') break;          // reached the row number
            // Base 26 without a zero digit: shift the letters seen so far, then add this one (a = 1 ... z = 26)
            result = result * 26 + (ch - 'a' + 1);
        }
        return result;
    }


}


    public static class Extractor {

    public static string ReadSpreadsheet(string fileUri) {
        string msg = "", txt = "", txt1 = "";
        int i, n1, n2, nrow = -1, ncell = -1, ncellcolref = -1;
        Boolean haveheader = true;
        Dictionary<string, int> hashcolnames = new Dictionary<string, int>();
        List<string> colvalues = new List<string>();
        try {
            if (!File.Exists(fileUri)) { throw new Exception("file does not exist"); }
            using (SpreadsheetDocument ssdoc = SpreadsheetDocument.Open(fileUri, true)) {
                var stringTable = ssdoc.WorkbookPart.GetPartsOfType<SharedStringTablePart>().FirstOrDefault();
                foreach (Sheet sht in ssdoc.WorkbookPart.Workbook.Descendants<Sheet>()) {
                    nrow = 0;
                    foreach (Row ssrow in ((WorksheetPart)(ssdoc.WorkbookPart.GetPartById(sht.Id))).Worksheet.Descendants<Row>()) {
                        ncell = 0;
                        ncellcolref = 0;
                        nrow++;
                        colvalues.Clear();
                        foreach (Cell sscell in ssrow.Elements<Cell>()) {
                            ncell++;
                            n1 = DSFunction.GetIntColIndexForLetter(sscell.CellReference);
                            for (i = 0; i < (n1 - ncellcolref - 1); i++) {
                                if (nrow == 1 && haveheader) {
                                    txt1 = "-missing" + (ncellcolref + 1 + i).ToString() + "-";
                                    if (!hashcolnames.TryGetValue(txt1, out n2)) {
                                        // Zero-based position of the missing column (not the count of stored cells)
                                        hashcolnames.Add(txt1, ncellcolref + i);
                                    }
                                }
                                else {
                                    colvalues.Add("");
                                }
                            }
                            ncellcolref = n1;
                            if (sscell.DataType != null) {
                                if (sscell.DataType.Value == CellValues.SharedString && stringTable != null) {
                                    txt = stringTable.SharedStringTable.ElementAt(int.Parse(sscell.InnerText)).InnerText;
                                }
                                else if (sscell.DataType.Value == CellValues.String) {
                                    txt = sscell.InnerText;
                                }
                                else txt = sscell.InnerText.ToString();
                            }
                            else txt = sscell.InnerText;
                            if (txt != "") txt1 = txt.ToLower().Trim(); else txt1 = "";
                            if (nrow == 1 && haveheader) {
                                txt1 = txt1.Replace(" ", "");
                                if (txt1 == "table/viewname") txt1 = "tablename";
                                else if (txt1 == "schemaownername") txt1 = "schemaowner";
                                else if (txt1 == "subjectareaname") txt1 = "subjectarea";
                                else if (txt1.StartsWith("column")) {
                                    txt1 = txt1.Substring("column".Length);
                                }
                                if (!hashcolnames.TryGetValue(txt1, out n1)) {
                                    // Zero-based column position of this header, so it lines up with the padded data rows
                                    hashcolnames.Add(txt1, ncellcolref - 1);
                                }
                            }
                            else {
                                txt = txt.Replace(((char)8220).ToString(), "'");  //special "
                                txt = txt.Replace(((char)8221).ToString(), "'"); //special "
                                txt = txt.Replace(StaticVariables.quotdouble, "'");
                                txt = txt.Replace(StaticVariables.crlf, " ");
                                txt = txt.Replace("  ", " ");
                                txt = txt.Replace("<", "");
                                txt = txt.Replace(">", "");
                                colvalues.Add(txt);
                            }
                        }
                    }
                }
            }
        }
        catch (Exception ex) {
            msg = "notok:" + ex.Message;
        }
        return msg;
    }





}

Answer 6

The letter code is essentially a base-26 number, so something like this should work to convert it to a zero-based offset:

// Converts letter code (i.e. AA) to an offset
public int offset( string code)
{
    var offset = 0;
    var byte_array = Encoding.ASCII.GetBytes( code ).Reverse().ToArray();
    for( var i = 0; i < byte_array.Length; i++ )
    {
        offset += (byte_array[i] - 65 + 1) * Convert.ToInt32(Math.Pow(26.0, Convert.ToDouble(i)));
    }
    return offset - 1;
}
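Strictly speaking, column letters form a base-26 system with no zero digit (A stands for 1 and Z for 26), which is why the code adds 1 to each letter and subtracts 1 at the end. For a column name with letters $c_{k-1} \dots c_1 c_0$, where $c_0$ is the rightmost letter, the function computes:

$$
\text{offset} = \left( \sum_{i=0}^{k-1} \bigl(\operatorname{ord}(c_i) - 65 + 1\bigr)\, 26^{i} \right) - 1
$$

For example, "AA" gives $(1 \cdot 26^0 + 1 \cdot 26^1) - 1 = 26$ and "ZZ" gives $(26 \cdot 26^0 + 26 \cdot 26^1) - 1 = 701$, matching the zero-based positions of those columns. Lower-case input would need to be upper-cased first, since the formula assumes ASCII 'A' = 65.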

Answer 7

OK, I'm not exactly an expert at this, but the other answers seem like overkill to me, so here's my solution:

// Columns this sheet is expected to use; only single-letter columns A to O are supported
string[] letters = new string[15] {"A", "B", "C", "D", "E", "F", "G", "H", "I", "J", "K", "L", "M", "N", "O" };

// Loop through each row in the spreadsheet, skipping the header row
foreach (var row in sheetData.Elements<Row>().Skip(1))
{
    var i = 0;

    List<String> cellsList = new List<string>();
    foreach (var cell in row.Elements<Cell>().ToArray())
    {
        while (i < letters.Length && cell.CellReference.ToString()[0] != Convert.ToChar(letters[i]))
        {//accounts for multiple consecutive blank cells
            cellsList.Add("");
            i++;
        }
        // CellValue is null for cells with formatting but no value;
        // for shared-string cells this text is an index into the shared string table
        cellsList.Add(cell.CellValue?.Text ?? "");
        i++;
    }

    string[] cells = cellsList.ToArray();

    foreach(var cell in cellsList)
    {
        //display contents of cell, depending on the datatype you may need to call each of the cells manually
    }
}

Hope someone finds this useful!

Answer 8

You can use this function to get a cell from a row by passing the (1-based) header index:

public static Cell GetCellFromRow(Row r, int headerIdx)
{
    // headerIdx is 1-based: 1 -> "A", 2 -> "B", ...
    string cellname = GetNthColumnName(headerIdx) + r.RowIndex.ToString();

    // Returns null when Excel did not store the cell (i.e. it is blank)
    return r.Elements<Cell>().FirstOrDefault(x => x.CellReference != null && x.CellReference.Value == cellname);
}

public static string GetNthColumnName(int n)
{
    // Converts a 1-based column number to its letters (1 -> "A", 27 -> "AA")
    string name = "";
    while (n > 0)
    {
        n--;
        name = (char)('A' + n % 26) + name;
        n /= 26;
    }
    return name;
}
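`GetNthColumnName` is 1-based (1 gives "A", 0 gives an empty string), so headerIdx must be 1-based as well. A quick way to gain confidence in a column-name routine is a round-trip check against its inverse over every column Excel allows (1 to 16,384, i.e. A to XFD). The sketch below copies `GetNthColumnName` from this answer and pairs it with a hypothetical `ColumnNumber` inverse.

```csharp
public static class ColumnRoundTrip
{
    // From the answer: 1-based column number -> letters (1 -> "A", 27 -> "AA")
    public static string GetNthColumnName(int n)
    {
        string name = "";
        while (n > 0)
        {
            n--;
            name = (char)('A' + n % 26) + name;
            n /= 26;
        }
        return name;
    }

    // Inverse: upper-case letters -> 1-based column number ("A" -> 1, "XFD" -> 16384)
    public static int ColumnNumber(string letters)
    {
        int number = 0;
        foreach (char ch in letters)
            number = number * 26 + (ch - 'A' + 1);
        return number;
    }

    // Returns the first column number that does not survive the round trip, or 0 if all do
    public static int FirstMismatch(int maxColumn)
    {
        for (int n = 1; n <= maxColumn; n++)
            if (ColumnNumber(GetNthColumnName(n)) != n)
                return n;
        return 0;
    }
}
```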

Answer 9

Apologies for posting yet another answer to this question, but here is the code I used.

I had problems with OpenXML not working properly if the worksheet had a blank row at the top. It would sometimes just return a DataTable with 0 rows and 0 columns in it. The code below copes with this, as well as with all the other worksheets.

Here's how you would call my code. Just pass in the filename and the name of the worksheet to read:

DataTable dt = OpenXMLHelper.ExcelWorksheetToDataTable("C:\\SQL Server\\SomeExcelFile.xlsx", "Mikes Worksheet");

And here is the code itself:

    public class OpenXMLHelper
    {
        //  A helper function to open an Excel file using OpenXML, and return a DataTable containing all the data from one
        //  of the worksheets.
        //
        //  We've had lots of problems reading in Excel data using OLEDB (eg the ACE drivers no longer being present on new servers,
        //  OLEDB not working due to security issues, and blatantly ignoring blank rows at the top of worksheets), so this is a more 
        //  stable method of reading in the data.
        //
        public static DataTable ExcelWorksheetToDataTable(string pathFilename, string worksheetName)
        {
            DataTable dt = new DataTable(worksheetName);

            using (SpreadsheetDocument document = SpreadsheetDocument.Open(pathFilename, false))
            {
                // Find the sheet with the supplied name, and then use that 
                // Sheet object to retrieve a reference to the first worksheet.
                Sheet theSheet = document.WorkbookPart.Workbook.Descendants<Sheet>().Where(s => s.Name == worksheetName).FirstOrDefault();
                if (theSheet == null)
                    throw new Exception("Couldn't find the worksheet: " + worksheetName);

                // Retrieve a reference to the worksheet part.
                WorksheetPart wsPart = (WorksheetPart)(document.WorkbookPart.GetPartById(theSheet.Id));
                Worksheet workSheet = wsPart.Worksheet;

                string dimensions = workSheet.SheetDimension.Reference.InnerText;       //  Get the dimensions of this worksheet, eg "B2:F4"

                int numOfColumns = 0;
                int numOfRows = 0;
                CalculateDataTableSize(dimensions, ref numOfColumns, ref numOfRows);
                System.Diagnostics.Trace.WriteLine(string.Format("The worksheet \"{0}\" has dimensions \"{1}\", so we need a DataTable of size {2}x{3}.", worksheetName, dimensions, numOfColumns, numOfRows));

                SheetData sheetData = workSheet.GetFirstChild<SheetData>();
                IEnumerable<Row> rows = sheetData.Descendants<Row>();

                string[,] cellValues = new string[numOfColumns, numOfRows];

                int colInx = 0;
                int rowInx = 0;
                string value = "";
                SharedStringTablePart stringTablePart = document.WorkbookPart.SharedStringTablePart;

                //  Iterate through each row of OpenXML data, and store each cell value in the appropriate slot in our [,] string array.
                foreach (Row row in rows)
                {
                    //  *DON'T* assume there's going to be one XML element for each column in each row...
                    //  (enumerate the cells once, rather than calling ElementAt(i) inside a loop)
                    foreach (Cell cell in row.Descendants<Cell>())
                    {
                        if (cell.CellValue == null || cell.CellReference == null)
                            continue;                       //  eg when an Excel cell contains a blank string

                        //  Convert this Excel cell CellAddress into a 0-based offset into our array (eg "G13" -> [6, 12])
                        colInx = GetColumnIndexByName(cell.CellReference);             //  eg "C" -> 2  (0-based)
                        rowInx = GetRowIndexFromCellAddress(cell.CellReference)-1;     //  Needs to be 0-based

                        //  Fetch the value in this cell (InnerText, so characters such as & are not XML-escaped)
                        value = cell.CellValue.InnerText;
                        if (cell.DataType != null && cell.DataType.Value == CellValues.SharedString)
                        {
                            value = stringTablePart.SharedStringTable.ChildElements[Int32.Parse(value)].InnerText;
                        }

                        cellValues[colInx, rowInx] = value;
                    }
                }

                //  Copy the array of strings into a DataTable.
                //  We don't (currently) make any attempt to work out which columns should be numeric, rather than string.
                for (int col = 0; col < numOfColumns; col++)
                    dt.Columns.Add("Column_" + col.ToString());

                for (int row = 0; row < numOfRows; row++)
                {
                    DataRow dataRow = dt.NewRow();
                    for (int col = 0; col < numOfColumns; col++)
                    {
                        dataRow.SetField(col, cellValues[col, row]);
                    }
                    dt.Rows.Add(dataRow);
                }

#if DEBUG
                //  Write out the contents of our DataTable to the Output window (for debugging)
                string str = "";
                for (rowInx = 0; rowInx < numOfRows; rowInx++)
                {
                    for (colInx = 0; colInx < numOfColumns; colInx++)
                    {
                        object val = dt.Rows[rowInx].ItemArray[colInx];
                        str += (val == null) ? "" : val.ToString();
                        str += "\t";
                    }
                    str += "\n";
                }
                System.Diagnostics.Trace.WriteLine(str);
#endif
                return dt;
            }
        }

        private static void CalculateDataTableSize(string dimensions, ref int numOfColumns, ref int numOfRows)
        {
            //  How many columns & rows of data does this Worksheet contain ?  
            //  We'll read in the Dimensions string from the Excel file, and calculate the size based on that.
            //      eg "B1:F4" -> we'll need 6 columns and 4 rows.
            //
            //  (We deliberately ignore the top-left cell address, and just use the bottom-right cell address.)
            try
            {
                string[] parts = dimensions.Split(':');     // eg "B1:F4" 
                if (parts.Length != 2)
                    throw new Exception("Couldn't find exactly *two* CellAddresses in the dimension");

                numOfColumns = 1 + GetColumnIndexByName(parts[1]);     //  A=1, B=2, C=3  (1-based value), so F4 would return 6 columns
                numOfRows = GetRowIndexFromCellAddress(parts[1]);
            }
            catch
            {
                throw new Exception("Could not calculate maximum DataTable size from the worksheet dimension: " + dimensions);
            }
        }

        public static int GetRowIndexFromCellAddress(string cellAddress)
        {
            //  Convert an Excel CellReference column into a 1-based row index
            //  eg "D42"  ->  42
            //     "F123" ->  123
            string rowNumber = System.Text.RegularExpressions.Regex.Replace(cellAddress, "[^0-9 _]", "");
            return int.Parse(rowNumber);
        }

        public static int GetColumnIndexByName(string cellAddress)
        {
            //  Convert an Excel CellReference column into a 0-based column index
            //  eg "D42" ->  3
            //     "F123" -> 5
            var columnName = System.Text.RegularExpressions.Regex.Replace(cellAddress, "[^A-Z_]", "");
            int number = 0, pow = 1;
            for (int i = columnName.Length - 1; i >= 0; i--)
            {
                number += (columnName[i] - 'A' + 1) * pow;
                pow *= 26;
            }
            return number - 1;
        }
    }
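This code assumes the worksheet contains a `<dimension>` element. That element is optional in SpreadsheetML, so `workSheet.SheetDimension` may be null for files produced by some tools. One possible fallback, sketched here without the SDK, is to compute the table size from the largest column and row found in the cell references themselves. The class and method names are hypothetical.

```csharp
using System;
using System.Collections.Generic;

public static class SheetSize
{
    // Computes (columns, rows) needed to hold every cell, e.g. {"B2", "F4", "C9"} -> (6, 9)
    public static (int Columns, int Rows) FromReferences(IEnumerable<string> cellReferences)
    {
        int maxColumn = 0, maxRow = 0;
        foreach (string reference in cellReferences)
        {
            int column = 0, row = 0;
            foreach (char ch in reference)
            {
                if (ch >= 'A' && ch <= 'Z')
                    column = column * 26 + (ch - 'A' + 1);   // 1-based column number
                else if (ch >= '0' && ch <= '9')
                    row = row * 10 + (ch - '0');             // 1-based row number
            }
            maxColumn = Math.Max(maxColumn, column);
            maxRow = Math.Max(maxRow, row);
        }
        return (maxColumn, maxRow);
    }
}
```

With the SDK, the references would come from each cell's `CellReference.Value`. Because `CalculateDataTableSize` already ignores the top-left corner and sizes the table from 1 up to the bottom-right cell, this result can stand in for its output.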

Answer 10

I couldn't resist optimizing the routines from Amurra's answer to remove the need for Regexes.

The first function isn't actually needed, since the second one accepts either a cell reference (C3) or a column name (C), but it is still a handy helper. The indexes are also one-based (only because our implementation used one-based indexes so that rows matched Excel visually).

    /// <summary>
    /// Given a cell name, return the cell column name.
    /// </summary>
    /// <param name="cellReference">Address of the cell (e.g. B2)</param>
    /// <returns>Column name (e.g. B)</returns>
    /// <exception cref="ArgumentOutOfRangeException">cellReference</exception>
    public static string GetColumnName(string cellReference)
    {
        // Advance from L to R until a number, then return 0 through previous position
        //
        for (int lastCharPos = 0; lastCharPos <= 3; lastCharPos++)
            if (Char.IsNumber(cellReference[lastCharPos]))
                return cellReference.Substring(0, lastCharPos);

        throw new ArgumentOutOfRangeException("cellReference");
    }

    /// <summary>
    /// Return one-based column index given a cell name or column name
    /// </summary>
    /// <param name="columnNameOrCellReference">Column name or cell reference (e.g. A, AB3, or AB44)</param>
    /// <returns>One-based column index; 0 if the input contains no letters</returns>
    public static int GetColumnIndexFromName(string columnNameOrCellReference)
    {
        int columnIndex = 0;            
        int factor = 1;
        for (int pos = columnNameOrCellReference.Length - 1; pos >= 0; pos--)   // R to L
        {
            if (Char.IsLetter(columnNameOrCellReference[pos]))  // for letters (columnName)
            {
                columnIndex += factor * ((columnNameOrCellReference[pos] - 'A') + 1);
                factor *= 26;
            }
        }
        return columnIndex;
    }
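One quick way to check the one-based convention is a round trip through the inverse conversion. The sketch below is a standalone illustration, and its class and method names are made up for this example. `FromName` mirrors `GetColumnIndexFromName` but also accepts lower-case input. `ToName` turns a one-based index back into letters.

```csharp
using System;
using System.Text;

// Illustrative round trip for one-based column indexes; names are made up.
public static class ColumnNameSketch
{
    // "C" or "C3" -> 3, "AB44" -> 28; letters are read as base 26 with A = 1, digits are ignored.
    public static int FromName(string columnNameOrCellReference)
    {
        int index = 0;
        foreach (char ch in columnNameOrCellReference)
        {
            if (ch >= 'A' && ch <= 'Z')
                index = index * 26 + (ch - 'A' + 1);
            else if (ch >= 'a' && ch <= 'z')
                index = index * 26 + (ch - 'a' + 1);
        }
        return index;
    }

    // 3 -> "C", 28 -> "AB", 16384 -> "XFD" (the last column in current Excel versions).
    public static string ToName(int oneBasedIndex)
    {
        if (oneBasedIndex < 1)
            throw new ArgumentOutOfRangeException(nameof(oneBasedIndex));

        var name = new StringBuilder();
        for (int n = oneBasedIndex; n > 0; n /= 26)
        {
            n--;                                   // shift to 0..25 for the current letter
            name.Insert(0, (char)('A' + n % 26));
        }
        return name.ToString();
    }
}
```

The `n--` step handles the fact that there is no "zero" letter. For example, 26 maps to "Z" and 27 maps to "AA".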

Answer 11

Here is my solution. I found that the answers above don't seem to work well when the missing fields are at the end of the row.

Assuming the first row of the Excel sheet has ALL the columns (via the headers), grab the number of columns expected per row (row == 1). Then loop through the data rows (row > 1). The key to handling the missing cells is the getRowCells method, which is passed the known number of columns as well as the current row to process.

int columnCount = worksheetPart.Worksheet.Descendants<Row>().Where(r => r.RowIndex == 1).FirstOrDefault().Descendants<Cell>().Count();

IEnumerable<Row> rows = worksheetPart.Worksheet.Descendants<Row>().Where(r => r.RowIndex > 1);

List<List<string>> docData = new List<List<string>>();

foreach (Row row in rows)
{
    List<Cell> cells = getRowCells(columnCount, row);

    List<string> rowData = new List<string>();

    foreach (Cell cell in cells)
    {
        rowData.Add(getCellValue(workbookPart, cell));
    }

    docData.Add(rowData);
}
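Counting the header cells gives the right width only if every header cell is actually stored in the XML. A hedged alternative (my assumption, not part of this answer) is to take the right-most column referenced in the header row. The sketch below works on plain reference strings such as "A1", "B1" and "D1", so it needs no Open XML types. The class name is made up.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical alternative to counting header cells: use the right-most column
// that appears in the header row, so a blank (unstored) header cell does not shrink the count.
public static class HeaderWidthSketch
{
    public static int ExpectedColumnCount(IEnumerable<string> headerCellReferences)
    {
        // DefaultIfEmpty(0) makes an empty header row yield 0 instead of throwing.
        return headerCellReferences.Select(r => OneBasedColumn(r)).DefaultIfEmpty(0).Max();
    }

    // "D1" -> 4: letters as base 26 with A = 1, row digits ignored.
    private static int OneBasedColumn(string cellReference)
    {
        int index = 0;
        foreach (char ch in cellReference)
        {
            if (ch >= 'A' && ch <= 'Z')
                index = index * 26 + (ch - 'A' + 1);
        }
        return index;
    }
}
```

With a header stored as A1, B1 and D1, this returns 4, whereas counting the stored cells would give 3.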

The getRowCells method currently has one limitation: it only supports a sheet (row) with at most 26 columns (A to Z). A loop over the known number of columns is used to find the missing columns (cells). When a missing cell is found, a new cell is inserted into the cell collection, and the new cell has a default value of "" instead of null. The modified Cell collection is then returned.

private static List<Cell> getRowCells(int columnCount, Row row)
{
    const string COLUMN_LETTERS = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";

    if (columnCount > COLUMN_LETTERS.Length)
    {
        throw new ArgumentException(string.Format("Invalid columnCount ({0}).  Cannot be greater than {1}",
            columnCount, COLUMN_LETTERS.Length));
    }

    List<Cell> cells = row.Descendants<Cell>().ToList();

    for (int i = 0; i < columnCount; i++)
    {
        if (i < cells.Count)
        {
            // Single-letter columns only: compare the first character of the reference
            string cellColumnReference = cells[i].CellReference.ToString();
            if (cellColumnReference[0] != COLUMN_LETTERS[i])
            {
                // Column i is missing from the XML: insert an empty placeholder cell
                cells.Insert(i, new Cell() { CellValue = new CellValue("") });
            }
        }
        else
        {
            // Missing trailing cell
            cells.Insert(i, new Cell() { CellValue = new CellValue("") });
        }
    }

    return cells;
}

private static string getCellValue(WorkbookPart workbookPart, Cell cell)
{
    SharedStringTablePart stringTablePart = workbookPart.SharedStringTablePart;
    string value = (cell.CellValue != null) ? cell.CellValue.InnerText : string.Empty;

    if ((cell.DataType != null) && (cell.DataType.Value == CellValues.SharedString))
    {
        return stringTablePart.SharedStringTable.ChildElements[Int32.Parse(value)].InnerText;
    }
    else
    {
        return value;
    }
}
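The 26-column limit comes from comparing only the first letter of each cell reference. A sketch without that limit places every cell by its own parsed column index instead of by its position in the list. To keep it runnable without the SDK, cells are modelled as hypothetical (Reference, Value) pairs rather than `Cell` objects. In real code, each pair would come from `cell.CellReference` and a value lookup such as `getCellValue` above.

```csharp
using System.Collections.Generic;

// Sketch: cells are modelled as (Reference, Value) pairs, a hypothetical shape
// standing in for the SDK's Cell objects so the example runs on its own.
public static class RowPaddingSketch
{
    public static string[] PadRow(int columnCount, IEnumerable<(string Reference, string Value)> cells)
    {
        var row = new string[columnCount];
        for (int i = 0; i < row.Length; i++)
            row[i] = string.Empty;                 // missing cells default to ""

        foreach (var (reference, value) in cells)
        {
            int index = ZeroBasedColumn(reference);
            if (index >= 0 && index < columnCount) // ignore cells right of the last expected column
                row[index] = value;
        }
        return row;
    }

    // "AB7" -> 27: letters as base 26 with A = 1, minus one; digits ignored.
    private static int ZeroBasedColumn(string cellReference)
    {
        int index = 0;
        foreach (char ch in cellReference)
        {
            if (ch >= 'A' && ch <= 'Z')
                index = index * 26 + (ch - 'A' + 1);
        }
        return index - 1;
    }
}
```

Because the position comes from the reference itself, gaps at the start, in the middle and at the end of a row are all handled the same way.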

Answer 12

To read empty cells, I use a variable named "CN" that is initialized outside the row reader. Inside the while loop I check whether the column index is greater than my variable, which is incremented every time a cell is read. If it doesn't match, I fill the skipped columns with the value I want. This is the trick I used to catch up with the empty cells so that each value lands in its respective column. Here is the code:

public static DataTable ReadIntoDatatableFromExcel(string newFilePath)
        {
            /*Creating a table with 20 columns*/
            var dt = CreateProviderRvenueSharingTable();

            try
            {
                /*using stream so that if excel file is in another process then it can read without error*/
                using (Stream stream = new FileStream(newFilePath, FileMode.Open, FileAccess.Read, FileShare.ReadWrite))
                {
                    using (SpreadsheetDocument spreadsheetDocument = SpreadsheetDocument.Open(stream, false))
                    {
                        var workbookPart = spreadsheetDocument.WorkbookPart;
                        var workbook = workbookPart.Workbook;

                        /* get only sheets without a State attribute, i.e. not hidden */
                        var sheets = workbook.Descendants<Sheet>().Where(e => e.State == null);

                        foreach (var sheet in sheets)
                        {
                            var worksheetPart = (WorksheetPart)workbookPart.GetPartById(sheet.Id);

                            /* collect the non-empty rows; sheets with fewer than two are skipped below */
                            List<Row> rows = worksheetPart.Worksheet.Elements<SheetData>().First().Elements<Row>()
                                .Where(r => r.InnerText != string.Empty).ToList();

                            if (rows.Count > 1)
                            {
                                OpenXmlReader reader = OpenXmlReader.Create(worksheetPart);

                                int i = 0;
                                int BTR = 0;/*Break the reader while empty rows are found*/

                                while (reader.Read())
                                {
                                    if (reader.ElementType == typeof(Row))
                                    {
                                        /* skip the header rows: with i < 2 the first two rows of the sheet are skipped */
                                        if (i < 2)
                                        {
                                            i++;
                                            continue;
                                        }

                                        reader.ReadFirstChild();

                                        DataRow row = dt.NewRow();

                                        int CN = 0;

                                        if (reader.ElementType == typeof(Cell))
                                        {
                                            do
                                            {
                                                Cell c = (Cell)reader.LoadCurrentElement();

                                                /* stop if the first stored cell of the row has no value: treat it as an empty row */
                                                if (CN == 0 && c.DataType == null && c.CellValue == null)
                                                {
                                                    BTR++;
                                                    break;
                                                }

                                                /* the reader skips blank cells, so pad the skipped columns
                                                   (leading ones included) to keep the data aligned with the header */
                                                int cellColumnIndex =
                                                    ExcelHelper.GetColumnIndexFromName(
                                                        ExcelHelper.GetColumnName(c.CellReference));   // 1-based

                                                while (CN < cellColumnIndex - 1 && CN < 20)
                                                {
                                                    row[CN] = string.Empty;
                                                    CN++;
                                                }

                                                /* the cell lies beyond column T, which the 20-column table cannot hold */
                                                if (CN == 20)
                                                {
                                                    break;
                                                }

                                                string cellValue = GetCellValue(c, workbookPart);
                                                row[CN] = cellValue;
                                                CN++;

                                                /* column T (the 20th column) is filled: ignore any further cells in this row */
                                                if (CN == 20)
                                                {
                                                    break;
                                                }
                                            } while (reader.ReadNextSibling());
                                        }

                                        /* the reader also skips trailing blank cells, so fill the remaining columns up to index 19 */
                                        while (CN != 0 && CN < 20)
                                        {
                                            row[CN] = string.Empty;
                                            CN++;
                                        }

                                        if (CN == 20)
                                        {
                                            dt.Rows.Add(row);
                                        }
                                    }
                                    /* stop reading once more than 5 empty rows have been found below the data */
                                    if (BTR > 5)
                                        break;
                                }
                                reader.Close();
                            }                            
                        }
                    }
                }
            }
            catch (Exception)
            {
                throw; /* "throw;" keeps the original stack trace, unlike "throw ex;" */
            }
            return dt;
        }

  private static string GetCellValue(Cell c, WorkbookPart workbookPart)
        {
            string cellValue = string.Empty;
            if (c.DataType != null && c.DataType == CellValues.SharedString)
            {
                SharedStringItem ssi =
                    workbookPart.SharedStringTablePart.SharedStringTable
                        .Elements<SharedStringItem>()
                        .ElementAt(int.Parse(c.CellValue.InnerText));
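                if (ssi.Text != null)
                {
                    cellValue = ssi.Text.Text;
                }
                else
                {
                    /* rich-text entries keep their text in runs (<r><t>) instead of a single <t> */
                    cellValue = string.Concat(ssi.Elements<Run>().Select(r => r.Text != null ? r.Text.Text : string.Empty));
                }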
            }
            else
            {
                if (c.CellValue != null)
                {
                    cellValue = c.CellValue.InnerText;
                }
            }
            return cellValue;
        }

public static int GetColumnIndexFromName(string columnNameOrCellReference)
        {
            int columnIndex = 0;
            int factor = 1;
            for (int pos = columnNameOrCellReference.Length - 1; pos >= 0; pos--)   // R to L
            {
                if (Char.IsLetter(columnNameOrCellReference[pos]))  // for letters (columnName)
                {
                    columnIndex += factor * ((columnNameOrCellReference[pos] - 'A') + 1);
                    factor *= 26;
                }
            }
            return columnIndex;
        }

        public static string GetColumnName(string cellReference)
        {
            /* Advance from L to R until a number, then return 0 through previous position*/
            for (int lastCharPos = 0; lastCharPos <= 3; lastCharPos++)
                if (Char.IsNumber(cellReference[lastCharPos]))
                    return cellReference.Substring(0, lastCharPos);

            throw new ArgumentOutOfRangeException("cellReference");
        }

The code handles the following:

It reads empty cells.
It skips the empty rows once the data has been read.
It reads the sheets in ascending order, starting from the first one.
If the Excel file is in use by another process, it can still be read (the file is opened with FileShare.ReadWrite).
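The "CN" catch-up trick works for any forward-only stream of cells, not just `OpenXmlReader`. The following is a standalone sketch, and its names are hypothetical. Cells are modelled as (Reference, Value) pairs rather than `Cell` objects. The sketch keeps a counter of the next column to fill and pads the columns the reader skipped, including leading and trailing ones. It stops at a fixed width, which is 20 in the answer above.

```csharp
using System.Collections.Generic;

// Sketch of the "CN" catch-up counter, detached from OpenXmlReader.
// Assumes cells arrive in increasing column order, which is how they are stored in a row.
public static class CatchUpSketch
{
    public static string[] ReadFixedWidthRow(IEnumerable<(string Reference, string Value)> cellsInOrder, int maxColumns)
    {
        var row = new string[maxColumns];
        int next = 0;  // 0-based index of the next column to fill (the answer's CN)

        foreach (var (reference, value) in cellsInOrder)
        {
            int column = ZeroBasedColumn(reference);

            // Catch up: pad every column the reader skipped, but never past maxColumns.
            while (next < column && next < maxColumns)
                row[next++] = string.Empty;

            if (next >= maxColumns)
                break;  // this cell lies beyond the last wanted column

            row[next++] = value;
        }

        // Trailing blank cells are not stored either, so pad the rest of the row.
        while (next < maxColumns)
            row[next++] = string.Empty;

        return row;
    }

    // "D3" -> 3: letters as base 26 with A = 1, minus one; digits ignored.
    private static int ZeroBasedColumn(string cellReference)
    {
        int index = 0;
        foreach (char ch in cellReference)
        {
            if (ch >= 'A' && ch <= 'Z')
                index = index * 26 + (ch - 'A' + 1);
        }
        return index - 1;
    }
}
```

Unlike the dictionary-free placement by index, this version never looks back, so it suits a SAX-style reader that sees each cell exactly once.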

Answer 13

Here is another implementation, this time for when the number of columns is known in advance:

        /// <summary>
        /// Gets a list of cells that are padded with empty cells where necessary.
        /// </summary>
        /// <param name="numberOfColumns">The number of columns expected.</param>
        /// <param name="cells">The cells.</param>
        /// <returns>List of padded cells</returns>
        private static IList<Cell> GetPaddedCells(int numberOfColumns, IList<Cell> cells)
        {
            // Only perform the padding operation if existing column count is less than required
            if (cells.Count < numberOfColumns)
            {
                IList<Cell> padded = new List<Cell>();
                int cellIndex = 0;

                for (int paddedIndex = 0; paddedIndex < numberOfColumns; paddedIndex++)
                {
                    if (cellIndex < cells.Count)
                    {
                        // Grab column reference (ignore row) <seealso cref="https://stackoverflow.com/a/7316298/674776"/>
                        string columnReference = new string(cells[cellIndex].CellReference.ToString().Where(char.IsLetter).ToArray());

                        // Convert reference to index <seealso cref="https://stackoverflow.com/a/848552/674776"/>
                        int indexOfReference = columnReference.ToUpper().Aggregate(0, (column, letter) => (26 * column) + letter - 'A' + 1) - 1;

                        // Add padding cells where current cell index is less than required
                        while (indexOfReference > paddedIndex)
                        {
                            padded.Add(new Cell());
                            paddedIndex++;
                        }

                        padded.Add(cells[cellIndex++]);
                    }
                    else
                    {
                        // Add padding cells when passed existing cells
                        padded.Add(new Cell());
                    }
                }

                return padded;
            }
            else
            {
                return cells;
            }
        }

Call it with:

IList<Cell> cells = GetPaddedCells(38, row.Descendants<Cell>().ToList());

where 38 is the required number of columns.
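The column conversion inside `GetPaddedCells` is a left fold over the letters: base 26 with A = 1, then minus one for a zero-based index. For "AB44" the fold runs 0, then 1 for A, then 26 × 1 + 2 = 28 for B, and the final index is 27. The sketch below isolates that exact expression so it can be checked on its own. The wrapper class name is made up.

```csharp
using System.Linq;

// Isolates the letter-to-index expression used in GetPaddedCells; the class name is made up.
public static class AggregateIndexSketch
{
    public static int ZeroBasedColumnIndex(string cellReference)
    {
        // Keep only the letters ("AB44" -> "AB"), as GetPaddedCells does.
        string columnReference = new string(cellReference.Where(char.IsLetter).ToArray());

        // Left fold, base 26 with A = 1: "AB" -> 0 -> 1 -> 26 * 1 + 2 = 28, then -1 for a zero-based index.
        return columnReference.ToUpper().Aggregate(0, (column, letter) => (26 * column) + letter - 'A' + 1) - 1;
    }
}
```

Under this mapping, a call with 38 columns expects the last cell in column AL (zero-based index 37).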

Answer 14

I'm going to say that Open XML's built-in behaviour of skipping empty cells is absolutely stupid!! Did anyone think this design was good? How the &^%$ are you supposed to import an Excel file if not every cell is retrieved in the order in which it was created?

By the way, none of the workarounds presented here work.

Very annoying...

Jerry