Quickstart for PDF Accessibility Auto-Tag API (.NET)

To get started with the Adobe PDF Accessibility Auto-Tag API, let's walk through a simple scenario: taking an input PDF document and running the PDF Accessibility Auto-Tag API against it. Once the PDF has been tagged, the API returns the tagged document and, optionally, a report file. This guide walks you through the complete process of creating a program that accomplishes this task.

Prerequisites

To complete this guide, you will need:

  • .NET: version 6.0 or above
  • .NET SDK
  • A build tool: either Visual Studio or the .NET CLI.
  • An Adobe ID. If you do not have one, the credential setup will walk you through creating one.
  • A way to edit code. No specific editor is required for this guide.

Step One: Getting credentials

1) To begin, open your browser to https://documentservices.adobe.com/dc-integration-creation-app-cdn/main.html?api=pdf-extract-api. If you are not already logged in to Adobe.com, you will need to sign in or create a new user. Using a personal email account, rather than a federated ID, is recommended.

2) After registering or logging in, you will then be asked to name your new credentials. Use the name, "New Project".

3) Change the "Choose language" setting to ".Net".

4) Also note the checkbox next to "Create personalized code sample." Selecting it includes a large set of samples along with your credentials. These can be helpful for learning more later.

5) Click the checkbox saying you agree to the developer terms and then click "Create credentials."

6) After your credentials are created, they are automatically downloaded.

Step Two: Setting up the project

1) In your Downloads folder, find the ZIP file with your credentials: PDFServicesSDK-.NetSamples.zip. If you unzip that archive, you will find a README file, your private key, and a folder of samples.

2) We need two things from this download: the private.key file and the pdfservices-api-credentials.json file. You can find the latter in the adobe-DC.PDFServicesSDK.NET.Samples folder, inside any of the sample subdirectories, for example the CombinePDF folder.

3) Take these two files and place them in a new directory.

4) In your new directory, create a new file, ExtractTextInfoFromPDF.csproj. This file will declare our requirements as well as help define the application we're creating.

<Project Sdk="Microsoft.NET.Sdk">

  <PropertyGroup>
    <OutputType>Exe</OutputType>
    <TargetFramework>netcoreapp3.1</TargetFramework>
  </PropertyGroup>

  <ItemGroup>
    <PackageReference Include="log4net" Version="2.0.12" />
    <PackageReference Include="Adobe.PDFServicesSDK" Version="3.3.0" />
  </ItemGroup>

  <ItemGroup>
    <None Update="pdfservices-api-credentials.json">
      <CopyToOutputDirectory>Always</CopyToOutputDirectory>
    </None>
    <None Update="private.key">
      <CopyToOutputDirectory>Always</CopyToOutputDirectory>
    </None>
    <None Update="extractPDFInput.pdf">
      <CopyToOutputDirectory>Always</CopyToOutputDirectory>
    </None>
    <None Update="log4net.config">
      <CopyToOutputDirectory>Always</CopyToOutputDirectory>
    </None>
  </ItemGroup>

</Project>

Our application will take a PDF, Adobe Extract API Sample.pdf (downloadable from here), and extract its contents. The results will be saved as a ZIP file, ExtractTextInfoFromPDF.zip. We will then parse the results from the ZIP and print out the text of any H1 headers found in the PDF.

5) In your editor, open the directory where you previously copied the credentials and created the csproj file. Create a new file, Program.cs.

Now you're ready to begin coding.

Step Three: Creating the application

1) We'll begin by including our required dependencies:

using System.Text.Json;
using System.IO.Compression;
using System.IO;
using System;
using System.Collections.Generic;
using log4net.Repository;
using log4net.Config;
using log4net;
using System.Reflection;
using Adobe.PDFServicesSDK;
using Adobe.PDFServicesSDK.auth;
using Adobe.PDFServicesSDK.pdfops;
using Adobe.PDFServicesSDK.io;
using Adobe.PDFServicesSDK.exception;
using Adobe.PDFServicesSDK.options.extractpdf;

2) Now let's define our main class and Main method:

namespace ExtractTextInfoFromPDF
{
    class Program
    {
        private static readonly ILog log = LogManager.GetLogger(typeof(Program));
        static void Main()
        {
        }
    }
}

3) Now let's define our input and output:

String input = "Adobe Extract API Sample.pdf";

// The leading separator is needed because this name is appended
// directly to Directory.GetCurrentDirectory() wherever it is used.
String output = "/ExtractTextInfoFromPDF.zip";
if(File.Exists(Directory.GetCurrentDirectory() + output))
{
    File.Delete(Directory.GetCurrentDirectory() + output);
}

This defines the PDF that will be extracted (you can download the source we used here) and the name of the output ZIP, deleting the ZIP first if it already exists. In a real application, these values would typically be dynamic.
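Appending a file name to Directory.GetCurrentDirectory() only yields the intended path if a directory separator sits between the two parts. As an alternative sketch, the path can be built with Path.Combine, which inserts the separator itself. The class and method names below (OutputPaths, PrepareOutputPath) are hypothetical helpers for illustration and are not part of the Adobe SDK.

```csharp
using System;
using System.IO;

public static class OutputPaths
{
    // Builds an absolute path for fileName inside the current working
    // directory and deletes any file left there by a previous run.
    public static string PrepareOutputPath(string fileName)
    {
        string fullPath = Path.Combine(Directory.GetCurrentDirectory(), fileName);
        if (File.Exists(fullPath))
        {
            File.Delete(fullPath);
        }
        return fullPath;
    }
}
```

With a helper like this, the returned path could be passed directly wherever the output location is needed, instead of repeating the concatenation.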

4) Next, we set up the SDK to use our credentials.

// Initial setup, create credentials instance.
Credentials credentials = Credentials.ServiceAccountCredentialsBuilder()
    .FromFile(Directory.GetCurrentDirectory() + "/pdfservices-api-credentials.json")
    .Build();

// Create an ExecutionContext using credentials and create a new operation instance.
ExecutionContext executionContext = ExecutionContext.Create(credentials);

This code points to the credentials downloaded previously and sets up an execution context object that will be used later.
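Because the credentials are loaded from files in the working directory, it can help to confirm those files are present before calling the SDK. The following is a minimal sketch of such a pre-flight check using only the .NET base class library. The class and method names (CredentialFiles, FindMissing) are hypothetical and not part of the Adobe SDK.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class CredentialFiles
{
    // Returns the entries of requiredFiles that do not exist in directory.
    public static List<string> FindMissing(string directory, IEnumerable<string> requiredFiles)
    {
        var missing = new List<string>();
        foreach (string name in requiredFiles)
        {
            if (!File.Exists(Path.Combine(directory, name)))
            {
                missing.Add(name);
            }
        }
        return missing;
    }
}
```

For this guide, it could be called with Directory.GetCurrentDirectory() and the two file names copied in Step Two, pdfservices-api-credentials.json and private.key. An empty result means both files were found.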

5) Now, let's create the operation:

ExtractPDFOperation extractPdfOperation = ExtractPDFOperation.CreateNew();

// Provide an input FileRef for the operation.
FileRef sourceFileRef = FileRef.CreateFromLocalFile(input);
extractPdfOperation.SetInputFile(sourceFileRef);

// Build ExtractPDF options and set them into the operation.
ExtractPDFOptions extractPdfOptions = ExtractPDFOptions.ExtractPDFOptionsBuilder()
    .AddElementsToExtract(new List<ExtractElementType>(new []{ ExtractElementType.TEXT }))
    .Build();
extractPdfOperation.SetOptions(extractPdfOptions);

This set of code defines what we're doing (an Extract operation), points to our local file and specifies the input is a PDF, and then defines options for the Extract call. PDF Extract API has a few different options, but in this example, we're simply asking for the most basic of extractions, the textual content of the document.

6) The next code block executes the operation:

// Execute the operation.
FileRef result = extractPdfOperation.Execute(executionContext);

// Save the result to the specified location.
result.SaveAs(Directory.GetCurrentDirectory() + output);

This code runs the extraction process and then stores the resulting ZIP on the file system.

7) In this block, we read in the ZIP file, extract the JSON result file, and parse it:

// Open the ZIP and read the JSON entry; the using declarations
// close the archive and reader when they go out of scope.
using ZipArchive archive = ZipFile.OpenRead(Directory.GetCurrentDirectory() + output);
ZipArchiveEntry jsonEntry = archive.GetEntry("structuredData.json");
using StreamReader osr = new StreamReader(jsonEntry.Open());
String contents = osr.ReadToEnd();

JsonElement data = JsonSerializer.Deserialize<JsonElement>(contents);
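ZipArchive.GetEntry returns null when the archive has no entry with the requested name, so reading the entry without a check would fail with a NullReferenceException. A more defensive sketch of the same read, assuming only the .NET base class library, might look like the following. ZipText and ReadEntryText are hypothetical names for illustration.

```csharp
using System;
using System.IO;
using System.IO.Compression;

public static class ZipText
{
    // Reads a single entry from a ZIP archive as text.
    // Throws FileNotFoundException if the entry is absent.
    public static string ReadEntryText(string zipPath, string entryName)
    {
        using (ZipArchive archive = ZipFile.OpenRead(zipPath))
        {
            ZipArchiveEntry entry = archive.GetEntry(entryName);
            if (entry == null)
            {
                throw new FileNotFoundException(
                    $"Entry '{entryName}' was not found in '{zipPath}'.");
            }
            using (var reader = new StreamReader(entry.Open()))
            {
                return reader.ReadToEnd();
            }
        }
    }
}
```

In this guide, the entry name would be structuredData.json and the ZIP path would be the location the result was saved to.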

8) Finally, we can loop over the result and print out any found element that is an H1:

JsonElement elements = data.GetProperty("elements");
foreach(JsonElement element in elements.EnumerateArray()) {
    JsonElement pathElement = element.GetProperty("Path");
    String path = pathElement.GetString();
    if(path.EndsWith("/H1")) {
        JsonElement textElement = element.GetProperty("Text");
        Console.Write(textElement.GetString() + "\n");
    }
}
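JsonElement.GetProperty throws a KeyNotFoundException when the named property is missing. As a hedged variation on the loop above, the same H1 filter can be written with TryGetProperty so that elements lacking a Path or Text value are skipped rather than aborting the loop. HeadingFinder and FindH1Text are hypothetical names. The only assumption about the JSON is the shape the loop above already relies on: a top-level "elements" array whose items may carry "Path" and "Text" strings.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;

public static class HeadingFinder
{
    // Returns the Text of every element whose Path ends with "/H1".
    // Elements without a string Path or Text are ignored.
    public static List<string> FindH1Text(string json)
    {
        var headings = new List<string>();
        using (JsonDocument doc = JsonDocument.Parse(json))
        {
            JsonElement root = doc.RootElement;
            if (root.ValueKind != JsonValueKind.Object
                || !root.TryGetProperty("elements", out JsonElement elements)
                || elements.ValueKind != JsonValueKind.Array)
            {
                return headings;
            }

            foreach (JsonElement element in elements.EnumerateArray())
            {
                // TryGetProperty is only valid on JSON objects, so check first.
                if (element.ValueKind == JsonValueKind.Object
                    && element.TryGetProperty("Path", out JsonElement path)
                    && path.ValueKind == JsonValueKind.String
                    && path.GetString().EndsWith("/H1", StringComparison.Ordinal)
                    && element.TryGetProperty("Text", out JsonElement text)
                    && text.ValueKind == JsonValueKind.String)
                {
                    headings.Add(text.GetString());
                }
            }
        }
        return headings;
    }
}
```

For a small hand-written input such as {"elements":[{"Path":"//Document/H1","Text":"Title"},{"Path":"//Document/P","Text":"Body"}]} (an illustrative sample, not actual API output), the method returns a list containing only "Title".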

Here's the complete application (Program.cs):

using System.Text.Json;
using System.IO.Compression;
using System.IO;
using System;
using System.Collections.Generic;
using log4net.Repository;
using log4net.Config;
using log4net;
using System.Reflection;
using Adobe.PDFServicesSDK;
using Adobe.PDFServicesSDK.auth;
using Adobe.PDFServicesSDK.pdfops;
using Adobe.PDFServicesSDK.io;
using Adobe.PDFServicesSDK.exception;
using Adobe.PDFServicesSDK.options.extractpdf;

namespace ExtractTextInfoFromPDF
{
    class Program
    {
        private static readonly ILog log = LogManager.GetLogger(typeof(Program));
        static void Main()
        {
            // Configure the logging.
            ConfigureLogging();
            try
            {
                String input = "Adobe Extract API Sample.pdf";

                // The leading separator is needed because this name is appended
                // directly to Directory.GetCurrentDirectory() below.
                String output = "/ExtractTextInfoFromPDF.zip";
                if(File.Exists(Directory.GetCurrentDirectory() + output))
                {
                    File.Delete(Directory.GetCurrentDirectory() + output);
                }

                // Initial setup, create credentials instance.
                Credentials credentials = Credentials.ServiceAccountCredentialsBuilder()
                    .FromFile(Directory.GetCurrentDirectory() + "/pdfservices-api-credentials.json")
                    .Build();

                // Create an ExecutionContext using credentials and create a new operation instance.
                ExecutionContext executionContext = ExecutionContext.Create(credentials);
                ExtractPDFOperation extractPdfOperation = ExtractPDFOperation.CreateNew();

                // Provide an input FileRef for the operation.
                FileRef sourceFileRef = FileRef.CreateFromLocalFile(input);
                extractPdfOperation.SetInputFile(sourceFileRef);

                // Build ExtractPDF options and set them into the operation.
                ExtractPDFOptions extractPdfOptions = ExtractPDFOptions.ExtractPDFOptionsBuilder()
                    .AddElementsToExtract(new List<ExtractElementType>(new []{ ExtractElementType.TEXT }))
                    .Build();
                extractPdfOperation.SetOptions(extractPdfOptions);

                // Execute the operation.
                FileRef result = extractPdfOperation.Execute(executionContext);

                // Save the result to the specified location.
                result.SaveAs(Directory.GetCurrentDirectory() + output);

                Console.Write("Successfully extracted information from PDF. Printing H1 Headers:\n\n");

                // Open the ZIP and read the JSON entry; the using declarations
                // close the archive and reader when the try block ends.
                using ZipArchive archive = ZipFile.OpenRead(Directory.GetCurrentDirectory() + output);
                ZipArchiveEntry jsonEntry = archive.GetEntry("structuredData.json");
                using StreamReader osr = new StreamReader(jsonEntry.Open());
                String contents = osr.ReadToEnd();

                // Print the text of every element whose path ends in /H1.
                JsonElement data = JsonSerializer.Deserialize<JsonElement>(contents);
                JsonElement elements = data.GetProperty("elements");
                foreach(JsonElement element in elements.EnumerateArray()) {
                    JsonElement pathElement = element.GetProperty("Path");
                    String path = pathElement.GetString();
                    if(path.EndsWith("/H1")) {
                        JsonElement textElement = element.GetProperty("Text");
                        Console.Write(textElement.GetString() + "\n");
                    }
                }
            }
            catch (ServiceUsageException ex)
            {
                log.Error("Exception encountered while executing operation", ex);
            }
            catch (ServiceApiException ex)
            {
                log.Error("Exception encountered while executing operation", ex);
            }
            catch (SDKException ex)
            {
                log.Error("Exception encountered while executing operation", ex);
            }
            catch (IOException ex)
            {
                log.Error("Exception encountered while executing operation", ex);
            }
            catch (Exception ex)
            {
                log.Error("Exception encountered while executing operation", ex);
            }
        }

        static void ConfigureLogging()
        {
            ILoggerRepository logRepository = LogManager.GetRepository(Assembly.GetEntryAssembly());
            XmlConfigurator.Configure(logRepository, new FileInfo("log4net.config"));
        }
    }
}

Next Steps

Now that you've successfully performed your first operation, review the documentation for many other examples and reach out on our forums with any questions. Remember that the samples you downloaded while creating your credentials also include many demos.

Copyright © 2023 Adobe. All rights reserved.