itextsharp - Error when extracting text -
Unable to cast object of type 'iTextSharp.text.pdf.PdfLiteral' to type 'iTextSharp.text.pdf.PdfNumber'.
I only get this error with some of my PDFs. Any thoughts?

Code:

    StringBuilder text = new StringBuilder();
    SimpleTextExtractionStrategy strategy = new SimpleTextExtractionStrategy();
    // iText page numbers are 1-based
    for (int p = 1; p <= reader.NumberOfPages; p++)
    {
        text.AppendLine(PdfTextExtractor.GetTextFromPage(reader, p, strategy));
    }
    reader.Close();
    return text.ToString();

Stack trace:

    at iTextSharp.text.pdf.parser.PdfContentStreamProcessor.ShowTextArray.Invoke(PdfContentStreamProcessor processor, PdfLiteral oper, List`1 operands)
    at iTextSharp.text.pdf.parser.PdfContentStreamProcessor.InvokeOperator(PdfLiteral oper, List`1 operands)
    at iTextSharp.text.pdf.parser.PdfContentStreamProcessor.ProcessContent(Byte[] contentBytes, PdfDictionary resources)
    at iTextSharp.text.pdf.parser.PdfReaderContentParser.ProcessContent[E](Int32 pageNumber, E renderListener)
    at iTextSharp.text.pdf.parser.PdfTextExtractor.GetTextFromPage(PdfReader reader, Int32 pageNumber, ITextExtractionStrategy strategy)
    at DCS.Common.PDF.Functions.GetTextFromPDF(PdfReader reader) in C:\Users\rmaldonado\Documents\Visual Studio 2008\Projects\DCS\Contract\Common\PDF\Functions.cs:line 35
    at DCS.Common.PDF.Functions.ParsePDF(Byte[] bytes) in C:\Users\rmaldonado\Documents\Visual Studio 2008\Projects\DCS\Contract\Common\PDF\Functions.cs:line 23
    at [...].Attachment.ReParseText() in C:\Users\rmaldonado\Documents\Visual Studio 2008\Projects\DCS\Contract\ContractBLL\Common\Common.cs:line 1120

The page content of your document is thoroughly broken. In fact it is so badly broken that even Adobe Preflight (Acrobat 9.5.4), like iText, runs into an error while trying to analyze it. A manual inspection shows that the most obvious errors are related to operators interspersed in the array operands of TJ operations. This is invalid, cf. section 7.8.2 of the PDF specification:

    In PDF, all of the operands needed by an operator shall immediately precede that operator. Operators do not return results, and operands shall not be left over when an operator finishes execution.
The TJ operations in the page content look like this:

    [(OMB) 0.0 Tc -278.0 (Approval) 0.0 Tc -278.0 (2700-0042)] TJ
    [(Array of Operand Array) 0.0 Tc -278.0 (Off) 0.0 Tc -277.0 (Solicitation / Modification) 0.0 Tc -277.0 (Off) 0.0 Tc -...

This pattern continues, i.e. every non-trivial [...] TJ operation has 0.0 Tc interspersed in its array operand.
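To spot this kind of damage without opening the file in a PDF tool, a rough scanner can look for operator-like tokens inside [...] arrays of a decompressed content stream. The following is only a sketch under assumptions: TjArrayCheck and FindOperatorsInArrays are made-up names (not part of iTextSharp), and the input is assumed to contain only literal (...) strings, numbers and purely alphabetic operators, with no hex strings, names or inline images.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not part of iTextSharp: reports alphabetic
// tokens (operator candidates such as Tc) that appear inside [...] arrays.
public static class TjArrayCheck
{
    public static List<string> FindOperatorsInArrays(string content)
    {
        var found = new List<string>();
        int arrayDepth = 0;
        int i = 0;
        while (i < content.Length)
        {
            char c = content[i];
            if (c == '(')
            {
                // Skip a literal string, honouring escapes and nested parentheses.
                int depth = 1;
                i++;
                while (i < content.Length && depth > 0)
                {
                    if (content[i] == '\\') i++;        // skip the escaped character
                    else if (content[i] == '(') depth++;
                    else if (content[i] == ')') depth--;
                    i++;
                }
            }
            else if (c == '[') { arrayDepth++; i++; }
            else if (c == ']') { if (arrayDepth > 0) arrayDepth--; i++; }
            else if (char.IsLetter(c))
            {
                // Read a run of letters; inside an array it should not be there.
                int start = i;
                while (i < content.Length && char.IsLetter(content[i])) i++;
                if (arrayDepth > 0) found.Add(content.Substring(start, i - start));
            }
            else i++;                                    // digits, signs, dots, whitespace
        }
        return found;
    }
}
```

For the first line of the sample above, FindOperatorsInArrays("[(OMB) 0.0 Tc -278.0 (Approval) 0.0 Tc -278.0 (2700-0042)] TJ") returns two "Tc" entries, while a well-formed "[(OMB) -278.0 (Approval)] TJ" returns an empty list.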
iTextSharp fails in PdfContentStreamProcessor.ShowTextArray.Invoke (which handles the TJ operation). Since the array operand of TJ may contain only strings and numbers, anything that is not a PdfString is cast to PdfNumber. But the interspersed Tc operator instances are PdfLiteral objects, hence the cast exception.
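The failure mode can be reproduced in isolation. The sketch below does not use iTextSharp at all: StubObject, StubString, StubNumber, StubLiteral and CastDemo are stand-in names invented for illustration, and the loop only mimics the "string, otherwise number" assumption described above. It is not the library's actual source.

```csharp
using System;
using System.Collections.Generic;

// Stand-in types for illustration only; these are NOT the iTextSharp classes.
public abstract class StubObject { }
public sealed class StubString : StubObject { public string Value; }
public sealed class StubNumber : StubObject { public float Value; }
public sealed class StubLiteral : StubObject { public string Value; }

public static class CastDemo
{
    // Strings are skipped; every other entry is assumed to be a number.
    public static float SumAdjustments(List<StubObject> tjArray)
    {
        float total = 0f;
        foreach (StubObject entry in tjArray)
        {
            if (entry is StubString) continue;
            // Throws InvalidCastException when entry is a StubLiteral.
            total += ((StubNumber)entry).Value;
        }
        return total;
    }
}
```

Passing a list that holds a StubString, a StubLiteral with the value "Tc" and a StubNumber makes SumAdjustments throw an InvalidCastException at the literal, which is the same kind of failure as the one in the stack trace.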