How do I write a shell script to find the number of pages in a PDF?

I generate PDFs dynamically. How can I check the number of pages in a PDF from a shell script?

Answer 1

Without any additional package:

foo=$(strings < pdffile.pdf | sed -n 's|.*/Count -\{0,1\}\([0-9]\{1,\}\).*|\1|p' | sort -rn | head -n 1)
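To see what that pipeline does: each page tree node in a PDF carries a /Count entry, the sed expression pulls out the number that follows every /Count, and sort plus head keep the largest one. The sketch below runs the same sed expression on two made-up lines; the sample text and the max_count name are purely illustrative. Note that this only works when the page tree is stored uncompressed. In files that use compressed object streams, /Count may not be visible to strings at all.

```shell
#!/bin/sh
# max_count is a hypothetical helper name; it reads text on stdin and prints
# the largest number that follows a "/Count" key.
max_count() {
    sed -n 's|.*/Count -\{0,1\}\([0-9]\{1,\}\).*|\1|p' | sort -rn | head -n 1
}

# Made-up sample lines imitating what `strings` might print for a page tree.
printf '%s\n' \
    '<< /Type /Pages /Count 12 /Kids [ 3 0 R 9 0 R ] >>' \
    '<< /Type /Pages /Count 5 /Parent 2 0 R >>' \
    | max_count
```

This prints 12 for the sample. Against a real file it would be used as: strings < pdffile.pdf | max_count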

Using pdfinfo:

foo=$(pdfinfo pdffile.pdf | grep Pages | awk '{print $2}')
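Matching on the word Pages alone also matches any other line of pdfinfo output that contains it, for example a title. A minimal sketch of a stricter filter, anchored to the Pages: field and checking that the value is a number; the function name pages_from_pdfinfo and the sample text are made up, and the sample only imitates the general "Key: value" layout that pdfinfo prints.

```shell
#!/bin/sh
# pages_from_pdfinfo (hypothetical name) reads pdfinfo-style output on stdin
# and prints the value of the "Pages:" field; it fails if that is not a number.
pages_from_pdfinfo() {
    n=$(awk '/^Pages:/ {print $2; exit}')
    case $n in
        ''|*[!0-9]*) echo "could not determine page count" >&2; return 1 ;;
    esac
    echo "$n"
}

# Illustrative input, not real pdfinfo output for any particular file.
printf '%s\n' \
    'Title:          Yellow Pages 2014' \
    'Pages:          7' \
    'Page size:      612 x 792 pts (letter)' \
    | pages_from_pdfinfo
```

The sample prints 7 even though the title line also contains the word. With a real file: foo=$(pdfinfo pdffile.pdf | pages_from_pdfinfo) || exit 1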

Using pdftk:

foo=$(pdftk pdffile.pdf dump_data | grep NumberOfPages | awk '{print $2}')

Answer 2

ImageMagick provides a tool called identify which, combined with counting the lines of its output, gets you what you're after. ImageMagick is an easy install on OS X with brew.

Here is a working bash script that captures the count in a shell variable and prints it back to the screen:

#!/bin/bash
pdfFile=$1
echo "Processing $pdfFile"
# identify prints one line per page; send stderr to /dev/null,
# count the lines of output, and trim the whitespace from the wc -l output.
numberOfPages=$(/usr/local/bin/identify "$pdfFile" 2>/dev/null | wc -l | tr -d ' ')
echo "The number of pages is: $numberOfPages"

And the result of running it:

$ ./countPages.sh aSampleFile.pdf 
Processing aSampleFile.pdf
The number of pages is: 2
$
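The script above hard-codes /usr/local/bin/identify, which may not be where the binary lives on other systems. A small sketch that looks identify up on PATH instead and fails loudly if it is missing; the function name count_pages_identify is made up here. Keep in mind that identify typically has to process every page to produce its one line each, so this can be slow on large documents.

```shell
#!/bin/bash
# count_pages_identify is a hypothetical helper: it prints the number of lines
# that `identify` emits for a file, which is one line per page for a PDF.
count_pages_identify() {
    local pdf=$1
    if ! command -v identify >/dev/null 2>&1; then
        echo "identify (ImageMagick) not found on PATH" >&2
        return 1
    fi
    # Discard stderr and strip the padding some wc implementations add.
    identify "$pdf" 2>/dev/null | wc -l | tr -d ' '
}
```

Usage would look like: numberOfPages=$(count_pages_identify aSampleFile.pdf)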

Answer 3

The pdftotext utility converts a PDF file to plain text, inserting a page break (a form feed character, $'\f') between pages:

NAME
       pdftotext - Portable Document Format (PDF) to text converter.

SYNOPSIS
       pdftotext [options] [PDF-file [text-file]]

DESCRIPTION
       Pdftotext converts Portable Document Format (PDF) files to plain text.

       Pdftotext  reads  the PDF file, PDF-file, and writes a text file, text-file.  If text-file is
       not specified, pdftotext converts file.pdf to file.txt.  If text-file is  '-',  the  text  is
       sent to stdout.

There are several ways to combine this with other tools to solve your problem; pick one:

1) pdftotext + grep:

$ pdftotext file.pdf - | grep -c $'\f'

2) pdftotext + awk (v1):

$ pdftotext file.pdf - | awk 'BEGIN{n=0} {if(index($0,"\f")){n++}} END{print n}'

3) pdftotext + awk (v2):

$ pdftotext file.pdf - | awk 'BEGIN{ RS="\f" } END{ print NR }'

4) pdftotext + awk (v3):

$ pdftotext file.pdf - | awk -v RS="\f" 'END{ print NR }'
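All four variants rely on the same idea: pdftotext emits a form feed for each page, so counting form feeds counts pages. The sketch below checks that idea on made-up text rather than a real PDF, using printf to emit three "pages" that each end in \f. The count_form_feeds name is hypothetical. It counts bytes rather than lines, which could matter if two form feeds end up on the same line (for instance when a page produces no text).

```shell
#!/bin/sh
# count_form_feeds (hypothetical name) counts form feed bytes on stdin,
# regardless of how many of them share a line.
count_form_feeds() {
    tr -cd '\f' | wc -c | tr -d ' '
}

# Three made-up "pages", each terminated by a form feed.
printf 'page one\n\fpage two\n\fpage three\n\f' | count_form_feeds
```

The sample prints 3. With a real file: pdftotext file.pdf - | count_form_feeds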

Hope this helps!

Answer 4

I just dug up an old script (in ksh) that I had found:

#!/usr/bin/env ksh
# Usage: pdfcount.sh file.pdf
#
# Optimally, this would be a mere:
#       pdfinfo file.pdf | grep Pages | sed 's/[^0-9]*//'

[[ "$#" != "1" ]] && {
   printf "ERROR: No file specified\n"
   exit 1
}

numpages=0
while read line; do
   num=${line/*([[:print:]])+(Count )?(-)+({1,4}(\d))*([[:print:]])/\4}
   (( num > numpages)) && numpages=$num
done < <(strings "$@" | grep "/Count")
print $numpages

Answer 5

Here is a version to run directly on the command line (based on pdfinfo):

for f in *.pdf; do pdfinfo "$f" | grep Pages | awk '{print $2}'; done
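The one-liner prints only the numbers, so with several files it is hard to tell which count belongs to which file. A sketch that prints the file name next to each count and a total at the end; list_page_counts is a made-up name, and it assumes pdfinfo is on PATH.

```shell
#!/bin/sh
# list_page_counts (hypothetical name): print "file<TAB>pages" for each
# argument, then a final "total<TAB>N" line. Assumes pdfinfo is installed.
list_page_counts() {
    total=0
    for f in "$@"; do
        n=$(pdfinfo "$f" 2>/dev/null | awk '/^Pages:/ {print $2; exit}')
        # Skip files where pdfinfo gave no usable number.
        case $n in
            ''|*[!0-9]*) echo "skipping $f: no page count" >&2; continue ;;
        esac
        printf '%s\t%s\n' "$f" "$n"
        total=$((total + n))
    done
    printf 'total\t%s\n' "$total"
}
```

Usage would look like: list_page_counts *.pdf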