Batch Convert PDF to PDF/A

I’m developing a paperless workflow for my home and office. I want to save all my documents in PDF/A-1b archival format so I will be able to open them for years to come. The PDFs should be searchable, meaning they contain not only images of the pages but also the text itself. This allows the documents to be indexed, so I can quickly find them by typing in Windows Explorer’s search box.

There are basically three types of documents that need to be archived:

Paper documents. These must be scanned and, in order for them to be searchable, have Optical Character Recognition (OCR) applied. I’ve found OmniPage 18 Standard to be pretty good at this, except for the annoying bug that white-on-black text (often used in column headings of printed documents) disappears.
Non-PDF electronic documents like emails, web pages, etc. These already have text; they just need to be converted to PDF/A. I’ve already blogged about using CutePDF to print these to PDF/A.
PDF documents. Once you opt out of paper statements, your bank, credit card company, telephone company, and utility will give you links to PDF files for download. Your tax software probably saved a PDF file too. You could re-print these to PDF/A using CutePDF, but I chose to write a batch file to quickly convert an existing PDF to PDF/A using Ghostscript. This batch process is the subject of this article.

Set up the Batch Components

Caveat: This approach should create valid PDF/A documents, but even among experts there is some disagreement about the PDF/A standard. Use this approach at your own risk. If you have Adobe Acrobat Professional, you can use its “pre-flight” validation to check the output. Or you may want to try a free online validator like the one at PDF-Tools.com or the one at intarsys.de (German). For more background on the process, see this Super User article and this Ghostscript bug report.
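A validator is the real test, but a quick local sanity check is to see whether an output file at least claims to be PDF/A-1b in its XMP metadata (the pdfaid:part and pdfaid:conformance properties). The Python sketch below does only that. It scans the raw bytes and assumes the XMP packet is stored uncompressed, as it normally is in PDF/A-1 files. The function names are my own, and a positive result says nothing about whether the file actually conforms.

```python
import re
from pathlib import Path
from typing import Optional

# Match both XMP spellings: <pdfaid:part>1</pdfaid:part> and pdfaid:part="1".
_PART = re.compile(rb'pdfaid:part(?:>\s*|\s*=\s*["\'])(\d)')
_CONF = re.compile(rb'pdfaid:conformance(?:>\s*|\s*=\s*["\'])([A-Za-z])')


def declared_pdfa_level(data: bytes) -> Optional[str]:
    """Return e.g. "1B" if the XMP metadata claims PDF/A, else None.

    This reads only the file's claim about itself; it is NOT a validator.
    """
    part = _PART.search(data)
    if part is None:
        return None
    level = part.group(1).decode("ascii")
    conf = _CONF.search(data)
    if conf is not None:
        level += conf.group(1).decode("ascii").upper()
    return level


def declared_pdfa_level_of_file(path: str) -> Optional[str]:
    """Convenience wrapper that reads a PDF from disk."""
    return declared_pdfa_level(Path(path).read_bytes())
```

For example, declared_pdfa_level_of_file("Utility Bill.pdf") should return "1B" after a successful conversion.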

The underlying technology for this batch file is the same as for the CutePDF process, so if you have already followed the other post, you can skip the identical steps.

1. Download the GNU Affero-licensed version of Ghostscript 9.07 here. I found that the 32-bit version works fine even under 64-bit Windows 7. Install Ghostscript to the default directory, C:\Program Files (x86)\gs\gs9.07. At the end of the install, go ahead and let it Generate cidfmap for Windows CJK TrueType fonts.


2. Create an empty folder on your C: drive called C:\GS_PDFA (Ghostscript PDF/A).

3. Go to Control Panel > System and Security > System. Click Advanced system settings, then Environment Variables. Under System variables, edit Path and add ;C:\GS_PDFA to the end of it (Path entries are separated by semicolons).
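To confirm the Path change took effect (in a newly opened command prompt), here is a small Python sketch that checks whether a folder is one of the semicolon-separated Path entries. It compares case-insensitively because Windows paths are, and ignores a trailing backslash; the function name is hypothetical.

```python
import ntpath


def dir_on_path(path_value: str, folder: str) -> bool:
    """True if `folder` is one of the ';'-separated entries in a Windows Path."""
    wanted = ntpath.normcase(folder.rstrip("\\"))
    for entry in path_value.split(";"):
        # Entries may be quoted or end with a backslash; normalise both.
        entry = entry.strip().strip('"').rstrip("\\")
        if entry and ntpath.normcase(entry) == wanted:
            return True
    return False


if __name__ == "__main__":
    import os
    print(dir_on_path(os.environ.get("PATH", ""), r"C:\GS_PDFA"))
```

ntpath is used instead of os.path so the comparison follows Windows rules even if the script is run elsewhere.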


4. Download PDFAbatch_1.1.zip and unzip it into C:\GS_PDFA. This will give you three files:

pdfa.cmd – the batch file
PDFA_def.ps – the prefix file for Ghostscript conversion to PDF/A
PDF_ShowBookmarksPanel.ps – a PostScript instruction that tells a PDF reader to show the Bookmarks panel when opening the document

Note that PDFA_def.ps is the same file described in the CutePDF post, so it’s okay to overwrite it.
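I haven’t reproduced the contents of PDF_ShowBookmarksPanel.ps here, but an instruction like this is normally written as a pdfmark: /DOCVIEW sets document-level view options, and /PageMode /UseOutlines asks the viewer to open with the Bookmarks panel. The Python sketch below writes such a file under that assumption; the actual file in the zip may differ, and the function name is hypothetical.

```python
from pathlib import Path

# pdfmark that asks the viewer to open with the Bookmarks (outline) panel.
# Assumption: this mirrors what PDF_ShowBookmarksPanel.ps does.
SHOW_BOOKMARKS_PS = "[ /PageMode /UseOutlines /DOCVIEW pdfmark\n"


def write_show_bookmarks_ps(folder: str, name: str = "ShowBookmarks_example.ps") -> Path:
    """Write a one-line PostScript file containing the pdfmark above."""
    target = Path(folder) / name
    target.write_text("%!PS\n" + SHOW_BOOKMARKS_PS, encoding="ascii")
    return target
```

The default file name is deliberately different from PDF_ShowBookmarksPanel.ps so the sketch doesn’t overwrite the downloaded file.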

5. Locate the path to Ghostscript’s gswin32c.exe on your system. pdfa.cmd assumes it is in C:\Program Files (x86)\gs\gs9.07\bin\. If it is somewhere else, update line 45 of pdfa.cmd to point to the correct path.
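If you’re not sure where Ghostscript landed, this Python sketch searches the usual layout, <program folder>\gs\gs<version>\bin, for the console executable. It assumes the 32-bit build installs gswin32c.exe and the 64-bit build gswin64c.exe; the function name is hypothetical.

```python
from pathlib import Path
from typing import Iterable, Optional

# Console (non-GUI) executables of the 32- and 64-bit Windows builds.
GS_EXE_NAMES = ("gswin32c.exe", "gswin64c.exe")


def find_ghostscript(program_dirs: Iterable[str]) -> Optional[Path]:
    """Look for <dir>\\gs\\gs<version>\\bin\\gswin??c.exe under each program dir."""
    hits = []
    for base in program_dirs:
        gs_root = Path(base) / "gs"
        if not gs_root.is_dir():
            continue
        for version_dir in gs_root.glob("gs*"):
            for name in GS_EXE_NAMES:
                exe = version_dir / "bin" / name
                if exe.is_file():
                    hits.append(exe)
    if not hits:
        return None
    # Plain string comparison of folder names such as "gs9.07": good enough
    # to pick between a few installs, but not a true version comparison.
    return max(hits, key=lambda exe: exe.parent.parent.name)
```

On Windows you might call find_ghostscript([r"C:\Program Files (x86)", r"C:\Program Files"]) and copy the resulting path into pdfa.cmd.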

6. Download the Adobe ICC profiles here. An ICC profile describes a “color space.” We’ll use the simplest one, Adobe RGB (1998). From the downloaded zip archive, extract AdobeRGB1998.icc to the C:\GS_PDFA folder. Again, this is the same file used in the CutePDF post so it’s okay to overwrite it. (You can use a different profile, e.g. sRGB_IEC61966-2-1_no black_scaling.icc from www.color.org; you’ll need to modify PDFA_def.ps accordingly.)
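If you do switch profiles, the main change is the file name in the /ICCProfile (...) string near the top of PDFA_def.ps (other entries may need attention as well). This Python sketch swaps that string, assuming the line looks like the one in Ghostscript’s sample prefix file. Backslashes and parentheses are escaped because they are special inside PostScript strings, which matters if you use a full Windows path. The function names are hypothetical.

```python
import re

# The string operand of the /ICCProfile definition, e.g.
#   /ICCProfile (AdobeRGB1998.icc) % Customise
_ICC_STRING = re.compile(r"(/ICCProfile\s*\()([^)]*)(\))")


def ps_string_escape(text: str) -> str:
    """Escape characters that are special inside a PostScript (...) string."""
    return text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")


def set_icc_profile(prefix_ps: str, profile_path: str) -> str:
    """Return PDFA_def.ps text with the /ICCProfile file name replaced."""
    new_text, count = _ICC_STRING.subn(
        # A function replacement is inserted literally (no backslash processing).
        lambda m: m.group(1) + ps_string_escape(profile_path) + m.group(3),
        prefix_ps,
        count=1,
    )
    if count == 0:
        raise ValueError("no /ICCProfile (...) definition found")
    return new_text
```

Read PDFA_def.ps, pass its text through set_icc_profile, and write it back.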

That’s it! You’re now ready to convert PDF files to PDF/A.

Use the Batch File

Since the batch file is in your path, you should be able to open a command prompt anywhere on your system, type pdfa <filename>, and watch it convert the file to PDF/A. Some notes and advanced usage:

Do not include the .pdf extension in the input file names. Just type the base name.
If the file name contains spaces, enclose it in quotation marks.
The batch file renames each input file from <name>.pdf to <name>.old.pdf and writes the PDF/A document as <name>.pdf. You can delete the .old.pdf file(s) once you are satisfied with the new PDF/A document.
You can concatenate up to five input PDFs into one output PDF/A, which takes the name of the first input file. Separate the input file names with spaces.
When conversion finishes, the PDF/A output file will open in the program on your computer that is registered for viewing PDF files (e.g. Adobe Reader).
To set the Initial View of the PDF to show the Bookmarks (outline) panel, set the last parameter to -sb (show bookmarks). The input file must already contain bookmarks. Bookmarks will not work properly when concatenating files because bookmarks copied from later files will point to incorrect page numbers.
Type pdfa by itself to see some usage notes.
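The file-naming rules in these notes can be modelled in a few lines of Python. This sketch only computes names and touches no files; it describes the behavior above and is not a transcription of pdfa.cmd. The class and function names are hypothetical.

```python
from typing import List, NamedTuple, Sequence, Tuple

MAX_INPUTS = 5  # pdfa accepts at most five input PDFs


class ConversionPlan(NamedTuple):
    renames: List[Tuple[str, str]]  # (original .pdf, backup .old.pdf) pairs
    output: str                     # the PDF/A file to be written


def plan_conversion(base_names: Sequence[str]) -> ConversionPlan:
    """Apply the naming rules to base names typed without ".pdf"."""
    if not 1 <= len(base_names) <= MAX_INPUTS:
        raise ValueError(f"expected 1 to {MAX_INPUTS} input files, got {len(base_names)}")
    for name in base_names:
        if name.lower().endswith(".pdf"):
            raise ValueError(f"leave off the .pdf extension: {name!r}")
    renames = [(f"{name}.pdf", f"{name}.old.pdf") for name in base_names]
    # A concatenated PDF/A takes the name of the first input file.
    return ConversionPlan(renames=renames, output=f"{base_names[0]}.pdf")
```

Renaming the inputs first presumably frees the original name, so the PDF/A output can take its place.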

Usage

pdfa file1 [file2|-sb] [file3|-sb] [file4|-sb] [file5|-sb]
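Read literally, the usage line allows five argument slots, and -sb can fill any slot after the first but only makes sense last. Here is a Python sketch of that grammar. Rejecting extra arguments and a misplaced -sb is this sketch’s choice; the real batch file may simply ignore them. The function name is hypothetical.

```python
from typing import List, Sequence, Tuple

SHOW_BOOKMARKS = "-sb"
MAX_ARGS = 5  # file1 plus four optional slots, per the usage line


def parse_pdfa_args(args: Sequence[str]) -> Tuple[List[str], bool]:
    """Split the arguments of `pdfa` into (input base names, show_bookmarks)."""
    args = list(args)
    if not args or args[0] == SHOW_BOOKMARKS:
        raise ValueError("usage: pdfa file1 [file2|-sb] [file3|-sb] [file4|-sb] [file5|-sb]")
    if len(args) > MAX_ARGS:
        raise ValueError(f"at most {MAX_ARGS} arguments are accepted")
    show_bookmarks = args[-1] == SHOW_BOOKMARKS
    files = args[:-1] if show_bookmarks else args
    if SHOW_BOOKMARKS in files:
        raise ValueError("-sb is only allowed as the last argument")
    return files, show_bookmarks
```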

Usage Examples

1. If you have a PDF utility bill, open a command prompt in the folder where the PDF file resides and use this command:

pdfa "Utility Bill"

Output

Utility Bill.pdf – the PDF/A document
Utility Bill.old.pdf – the original PDF document

2. If you have a credit card statement with two reconciliation reports to attach, use the following command:

pdfa CCstatement recon1 recon2

Output

CCstatement.pdf – the combined PDF/A document
CCstatement.old.pdf
recon1.old.pdf
recon2.old.pdf

3. If you have a tax return that includes bookmarks, use the following command:

pdfa "Tax Return" -sb

Output

Tax Return.pdf – the PDF/A document, which should open with the Bookmarks panel showing
Tax Return.old.pdf
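Behind each of these examples is a single Ghostscript call. The Python sketch below assembles one. The switches follow the PDF/A example in Ghostscript’s documentation, but pdfa.cmd may use a different set, and placing the bookmarks file after the inputs is my assumption. GS_EXE is the default path from step 5; the function name is hypothetical.

```python
from typing import List, Sequence

GS_EXE = r"C:\Program Files (x86)\gs\gs9.07\bin\gswin32c.exe"
GS_PDFA_DIR = r"C:\GS_PDFA"


def build_gs_command(output_pdf: str, old_inputs: Sequence[str],
                     show_bookmarks: bool = False) -> List[str]:
    """Assemble a pdfwrite command line that converts .old.pdf inputs to PDF/A."""
    cmd = [
        GS_EXE,
        "-dPDFA",                         # request PDF/A output
        "-dBATCH", "-dNOPAUSE",           # run non-interactively, exit when done
        "-dNOOUTERSAVE",
        "-sProcessColorModel=DeviceRGB",  # match the RGB ICC profile
        "-sDEVICE=pdfwrite",
        f"-sOutputFile={output_pdf}",
        GS_PDFA_DIR + r"\PDFA_def.ps",    # prefix file: ICC profile and OutputIntent
    ]
    cmd.extend(old_inputs)                # inputs are concatenated in this order
    if show_bookmarks:
        cmd.append(GS_PDFA_DIR + r"\PDF_ShowBookmarksPanel.ps")
    return cmd
```

On Windows you could run the result with subprocess.run(cmd, check=True) and then open the output in your default viewer with os.startfile. Passing a list instead of one string lets Python quote names with spaces such as "Tax Return.old.pdf".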
