C#: Getting text from a PDF the way copy and paste does
I am trying to get the text from a PDF file. I tried iTextSharp and
SautinSoft PdfFocus.
Both return text, but not in the order I want: I want the same text I would
get by selecting all the text in the PDF, copying it, and pasting it.
The text I get with PdfFocus: pastebin.com/Cfcyvne0
The text I get with iTextSharp: pastebin.com/hsw8Q22i
The text I get from doing CTRL + A, CTRL + C, CTRL + V in Adobe Reader:
pastebin.com/Rr7WiTJ0
The PDF I want to read:
i.stack.imgur.com/6LKsr.png
If I use copy and paste, I get the first big column and then the second,
just like I want. If I use PdfFocus or iTextSharp, the text of the two
columns is merged line by line.
My code for PdfFocus:
    SautinSoft.PdfFocus f = new SautinSoft.PdfFocus();
    f.OpenPdf("50.pdf");
    // Convert the whole document to plain text
    string text = f.ToText();
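My code for iTextSharp:
    // Needs: using System.Text; using iTextSharp.text.pdf;
    //        using iTextSharp.text.pdf.parser;
    PdfReader reader = new PdfReader("50.pdf");
    StringBuilder sb = new StringBuilder();
    for (int page = 1; page <= reader.NumberOfPages; page++)
    {
        ITextExtractionStrategy its = new LocationTextExtractionStrategy();
        // Append every page; assigning inside the loop kept only the last page
        sb.AppendLine(PdfTextExtractor.GetTextFromPage(reader, page, its));
    }
    reader.Close();
    richTextBox1.Text = sb.ToString();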
What should I do?
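One direction that may be worth trying, as a sketch only: LocationTextExtractionStrategy sorts text chunks by their position on the page, so chunks from both columns that share a baseline end up on the same output line. SimpleTextExtractionStrategy instead keeps the chunks in the order they occur in the page content stream. Whether that order matches what Adobe Reader copies depends on how the PDF was produced, so treat that as an assumption to check against 50.pdf. The class and method names below are hypothetical; only the iTextSharp 5 types are real.

```csharp
using System.Text;
using iTextSharp.text.pdf;
using iTextSharp.text.pdf.parser;

static class ColumnOrderExtraction
{
    // Hypothetical helper, not part of iTextSharp.
    public static string ExtractInContentOrder(string path)
    {
        var sb = new StringBuilder();
        var reader = new PdfReader(path);
        try
        {
            for (int page = 1; page <= reader.NumberOfPages; page++)
            {
                // Keeps chunks in content-stream order instead of
                // sorting them by their position on the page.
                ITextExtractionStrategy strategy = new SimpleTextExtractionStrategy();
                sb.AppendLine(PdfTextExtractor.GetTextFromPage(reader, page, strategy));
            }
        }
        finally
        {
            reader.Close();
        }
        return sb.ToString();
    }
}
```

It would be called as richTextBox1.Text = ColumnOrderExtraction.ExtractInContentOrder("50.pdf");. If the content-stream order turns out not to follow the columns, another option in iTextSharp 5 is to extract each column separately by wrapping a strategy in a FilteredTextRenderListener with a RegionTextRenderFilter for the column's rectangle, which requires knowing or estimating the column bounds.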