PDF Morph: Features and Technology Stack

By G.N. Shah, Nellaiappan L, Ronald Mueller

This is the second paper in the series describing Macrosoft’s new Quadient Migration tool, PDF Morph (v1.0). Here we lay out the basic features of the app, along with a brief view of the underlying technology stack. We also describe how to use it effectively by outlining the basic process flow of the system.

PDF Morph is an automation tool intended to greatly accelerate the conversion of PDF templates to Quadient Inspire, whether those templates come from other CCM tools in a migration project or from the day-to-day development work of clients using Quadient. It eliminates most of the manual work involved in such conversions and produces much more accurate output. In the next paper in this series, we estimate the manual effort this application saves. We are currently using PDF Morph in our own Quadient migration and development support practices.

Our Quadient developers have registered and self-configured the application and are already using it directly. We are continuing to add new features and functionality and are planning a new release of the software in March 2021. We continue to invest in the tool to further automate our Quadient work, so that we can deliver work to our clients faster and at much higher throughput.

Feature Description

PDF Morph is a cloud-based tool that helps users mark and extract data from their PDF files and create the corresponding WFD in Inspire Designer. Data can be extracted from any kind of PDF, based on X, Y coordinates. The tool helps our Quadient developers extract text, fonts, barcodes, image coordinates, and other elements from the PDF, and create the corresponding objects in the Inspire Designer WFD.
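The article does not say which library PDF Morph uses to read the PDF. As a minimal sketch of what extracting data "based on X, Y coordinates" can mean, the Python below assumes a PDF parser has already produced a list of words, each with a bounding box and font, and keeps only the words that fall inside a rectangle the developer marked. The `Word` and `Region` types, the `extract_text` function, and the top-left coordinate origin are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Word:
    """A word on a PDF page with its bounding box in points (hypothetical structure)."""
    text: str
    x0: float
    y0: float
    x1: float
    y1: float
    font: str
    size: float


@dataclass(frozen=True)
class Region:
    """A rectangle the developer marked on the page (hypothetical structure)."""
    x0: float
    y0: float
    x1: float
    y1: float

    def contains(self, w: Word) -> bool:
        # A word belongs to the region only if its whole box lies inside it.
        return (self.x0 <= w.x0 and w.x1 <= self.x1
                and self.y0 <= w.y0 and w.y1 <= self.y1)


def extract_text(words: list[Word], region: Region) -> tuple[str, set[tuple[str, float]]]:
    """Return the text inside `region` in reading order, plus the (font, size) pairs used."""
    inside = [w for w in words if region.contains(w)]
    # Reading order, assuming a top-left origin: top-to-bottom, then left-to-right.
    inside.sort(key=lambda w: (w.y0, w.x0))
    text = " ".join(w.text for w in inside)
    fonts = {(w.font, w.size) for w in inside}
    return text, fonts
```

Sorting by `(y0, x0)` is a simplification: words on the same line whose boxes differ slightly in height can come out of order, so a production extractor would typically group words into lines with a tolerance first.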

The basic features of the application are as follows. Quadient developers can extract text field data, including font information. For images and barcodes the developers identify, the system automatically extracts the image coordinates and the barcode information. The system then generates an Excel file from the marked PDF containing the extracted data and coordinate information. It can extract data from single-page and multi-page PDFs, and it also supports multi-layout PDFs.
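The layout of the spreadsheet PDF Morph generates is not documented here. The sketch below assumes one row per marked element with hypothetical column names, and uses Python's standard `csv` module as a stand-in for writing a real Excel workbook (producing an actual `.xlsx` file would need a third-party library such as openpyxl). `MarkedElement` and `sheet_to_csv` are illustrative names only.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields


@dataclass
class MarkedElement:
    """One row of the extraction sheet (hypothetical column layout)."""
    name: str
    kind: str            # "Text", "Image" or "Barcode"
    page: int
    x: float             # left edge, in points
    y: float             # top edge, in points
    width: float
    height: float
    content: str = ""    # extracted text or barcode value; empty for images
    font: str = ""
    font_size: float = 0.0


def sheet_to_csv(elements: list[MarkedElement]) -> str:
    """Serialise the marked elements as a header row plus one row per element."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=[f.name for f in fields(MarkedElement)])
    writer.writeheader()
    for element in elements:
        writer.writerow(asdict(element))
    return buffer.getvalue()
```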

This is where the automation comes in. When the developer clicks ‘Extract Data’, an Excel file is generated from the marked PDF. The automation tool then reads the Excel file row by row and renders the equivalent flow areas in Inspire Designer, applying the font information extracted from the PDF to the corresponding controls. For each set of image coordinates in the Excel file, the automation tool creates a container in Inspire Designer at those coordinates, into which a Quadient developer can then insert the image.
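The Inspire Designer automation interface that PDF Morph drives is not described in the article, so the sketch below stops short of it. It reads the sheet row by row and turns each row into a plain dictionary describing the layout object to create: a flow area carrying the captured font for text, an empty container for an image, and a barcode object for barcode rows. The column names, action names, and `plan_from_sheet` are hypothetical placeholders, not Inspire Designer API calls.

```python
import csv
import io


def plan_from_sheet(sheet_text: str) -> list[dict]:
    """Read the sheet row by row and map each row to a layout action.

    The action names are hypothetical placeholders, not Inspire Designer API calls.
    """
    actions = []
    for row in csv.DictReader(io.StringIO(sheet_text)):
        box = tuple(float(row[k]) for k in ("x", "y", "width", "height"))
        kind = row["kind"]
        if kind == "Text":
            # Text becomes a flow area that keeps the font captured from the PDF.
            actions.append({"action": "flow_area", "name": row["name"], "box": box,
                            "text": row["content"], "font": row["font"],
                            "size": float(row["font_size"])})
        elif kind == "Image":
            # Images get an empty container; the developer inserts the image later.
            actions.append({"action": "image_container", "name": row["name"], "box": box})
        elif kind == "Barcode":
            actions.append({"action": "barcode", "name": row["name"], "box": box,
                            "value": row["content"]})
        else:
            raise ValueError(f"unsupported element type: {kind!r}")
    return actions
```

Keeping the row-to-object mapping as data, separate from whatever code talks to Designer, makes these rules easy to test on their own.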

The automation tool can be run on any number of PDF files, creating one Excel file per PDF and rendering each equivalent in Inspire Designer. The system is thus fully scalable, allowing Quadient developers to convert a whole set of PDFs into Inspire Designer quickly. This is the step where the most significant savings in manual effort and time are achieved.
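As a minimal sketch of running the extraction over a whole set of PDFs, the function below processes each file in turn, names each output sheet after its PDF, and records failures in an error log instead of stopping the batch. The `extract` callable stands in for the real per-file extraction; `process_batch` and the `.xlsx` naming rule are assumptions for illustration.

```python
from pathlib import PurePosixPath
from typing import Callable


def process_batch(pdf_paths: list[str],
                  extract: Callable[[str], str]) -> tuple[dict[str, str], dict[str, str]]:
    """Run `extract` on every PDF, producing one output sheet per input file."""
    outputs: dict[str, str] = {}  # sheet file name -> extracted rows
    errors: dict[str, str] = {}   # PDF path -> error message for the log
    for path in pdf_paths:
        sheet_name = PurePosixPath(path).with_suffix(".xlsx").name
        try:
            outputs[sheet_name] = extract(path)
        except Exception as exc:  # one bad file should not halt the whole batch
            errors[path] = f"{type(exc).__name__}: {exc}"
    return outputs, errors
```

The loop is deliberately sequential; because each PDF is handled independently, it could be parallelised (for example with `concurrent.futures`) without changing the per-file logic.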

Process Flow

The flow chart below shows the basic steps for using the system. PDF Morph is cloud-based, and new users can register and use the application on their own. The system allows multiple Quadient developers to work concurrently.

As noted at the bottom of the flow chart, the system automatically logs any errors that occur while loading the template designs into Quadient Inspire. Macrosoft will respond to any errors encountered and provide options for resolving them. We have used the system widely in our own migration practice and expect few, if any, errors.

Technology Stack

The chart below provides details of the technology stack underlying PDF Morph.

We currently use a couple of third-party libraries to extract image coordinates, font information, barcodes, and text along with its associated paragraph information. In the next release of the product, in March 2021, we expect to use these libraries for other elements that may appear in a PDF template.

Have a look at our Roadmap for the second version of PDF Morph. It lists the additional features we are committed to delivering in version 2.0. We encourage readers to send us the features they would like to see in this product. While we may still add a few more features to version 2.0, we will consider any suggested capabilities for later versions.

Walkthrough of User Steps

Here we provide a short walk-through of the main steps in using PDF Morph.

Step 1

The Quadient developer starts by opening the PDF Morph tool and clicking Choose PDF File. A file-open dialog appears, and the developer selects the PDF file.

Step 2

Once the PDF file is loaded, enter a name for the element and choose its object type (Text, Image, Barcode, or other). Click Mark Element and mark the corresponding section of the PDF. The preview is then populated.

Step 3

Repeat this process to mark all the areas that need to be exported. Once marking is complete, click the Process Data button to push the request into the queue for generating the WFD file.

Step 4

Navigate to the Dashboard page and refresh it to view the progress of the request. Once the file is generated, you will receive a notification. The Dashboard provides an option to download the WFD file and a log file with details of the automation process. A sample of the output the Quadient developer gets in Quadient Inspire is shown in the chart below.
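The article does not describe how PDF Morph's queue and dashboard are built. The in-process Python model below is only a sketch of the request life cycle in Steps 3 and 4: Process Data submits a request to a first-in, first-out queue, a worker picks it up and runs the WFD generation, and a dashboard refresh reports each request's status and log. `Dashboard`, `Request`, `Status`, and the status names are hypothetical; a cloud service would normally keep this state in a database and run workers in separate processes.

```python
import queue
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable


class Status(Enum):
    QUEUED = "Queued"
    PROCESSING = "Processing"
    COMPLETED = "Completed"
    FAILED = "Failed"


@dataclass
class Request:
    request_id: int
    pdf_name: str
    status: Status = Status.QUEUED
    log: list[str] = field(default_factory=list)


class Dashboard:
    """Tracks requests pushed by 'Process Data' (hypothetical in-process model)."""

    def __init__(self) -> None:
        self._pending: "queue.Queue[Request]" = queue.Queue()
        self.requests: dict[int, Request] = {}
        self._next_id = 1

    def submit(self, pdf_name: str) -> int:
        """Queue a new request and return its id."""
        req = Request(self._next_id, pdf_name)
        self._next_id += 1
        self.requests[req.request_id] = req
        self._pending.put(req)
        return req.request_id

    def run_worker_once(self, generate: Callable[[str], object]) -> None:
        """Take the oldest queued request (blocking if none) and run WFD generation on it."""
        req = self._pending.get()
        req.status = Status.PROCESSING
        try:
            generate(req.pdf_name)
            req.status = Status.COMPLETED
            req.log.append("WFD generated")
        except Exception as exc:
            req.status = Status.FAILED
            req.log.append(f"error: {exc}")
        finally:
            self._pending.task_done()

    def refresh(self) -> dict[int, str]:
        """What a dashboard refresh would show: the status of each request."""
        return {rid: r.status.value for rid, r in self.requests.items()}
```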

Step 5


Once the template is available in Quadient Inspire, the Quadient developer makes any final adjustments. With v1.0 of the tool, the tweaking we encounter in our migration work takes less than 15% of the effort needed to build the entire template from scratch in Quadient Inspire.
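The 15% figure translates directly into a saving on template-building effort. Writing T_scratch for the effort to build a template from scratch and T_tweak for the post-conversion tweaking effort (symbols introduced here for illustration), the saving fraction S is:

```latex
S = 1 - \frac{T_{\text{tweak}}}{T_{\text{scratch}}}, \qquad
T_{\text{tweak}} < 0.15\,T_{\text{scratch}} \;\Longrightarrow\; S > 0.85
```

For an illustrative template that would take 20 hours to build from scratch (a made-up figure, not one from our projects), tweaking would take under 0.15 × 20 = 3 hours. This bound covers only the building and tweaking effort; it does not include the time spent marking elements in PDF Morph, which is not quantified here.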

We are finding that the tool saves a great deal of time and resources. As we move to v2.0 of the product in March, we expect the amount of tweaking to drop even further, as more elements and details of the original PDF are fully transcribed into Quadient Inspire Designer.

Demonstration


From here, the next step we recommend is a demonstration of the product, which can go one of two ways. We are happy to demo the product using sample PDF content we already have. We believe the better option, however, is to use one of your own PDF files, with one of our analysts converting it to Quadient Inspire in real time.

In either case, you will see first-hand how the main elements are selected from the PDF file and captured in the extraction file, and then how the data are loaded into Quadient Inspire. The demo also shows how quickly the process goes, and thus how much time and resources you can save with this product.

See the next paper in this series for estimates of the time and resource savings you can expect from PDF Morph. These estimates are based on our own Quadient migration work. While the details of that work are client-proprietary, we will be happy to run a sample or two of your own PDFs to show you the savings.

By G.N. Shah, Nellaiappan L, Ronald Mueller | February 16th, 2021 | Quadient Inspire

About the Authors

G.N. Shah, Chief Technology Officer of Macrosoft

Shah is a forward-thinking institutional leader with eighteen years of experience. Throughout his tenure, Shah has delivered top-notch customer solutions in large-scale and enterprise environments. His proven abilities as a technology visionary and driver of strategic business systems development allow Macrosoft to deliver best-in-class software solutions. Shah currently holds a compiler patent with the US Patent Office.

Shah holds an MBA (Computer Science), in addition to 20+ professional and technical certifications. While he is proficient in a variety of development languages, his preferred language is Python. His areas of expertise include enterprise-wide architecture, application migration, IT transformation, mobile, and offshore development management. Shah’s ultimate goal at Macrosoft is to create a larger offering of product-based services while adopting new technologies.

In those rare instances when he has time for leisure, Shah is an avid cricket and football fan, as well as a weekly racquetball player.

Nellaiappan L, Application Delivery Manager at Macrosoft

Nellaiappan is the Application Delivery Manager for Macrosoft's .NET migration team. In this role, Nellai leads the Migration Service, which migrates client legacy systems to current technologies such as .NET. As a PMP-certified professional, he has received accolades from clients for his efficient leadership. During his first migration project, Nellai accelerated the migration, which led to a new service offering from Macrosoft. Later, Nellai worked on an array of proprietary migration tools that form the backbone of the Migrations Practice at Macrosoft.

Nellaiappan holds a Master of Computer Applications (MCA) and has 17+ years of industry experience developing and leading Windows-based applications using Microsoft technologies.

Dr. Ronald Mueller, CEO of Macrosoft

Ron is CEO and Founder of Macrosoft, Inc. He heads all of the company's strategic activities and directs the day-to-day work of the Leadership Team at Macrosoft. As Macrosoft’s Chief Scientist, Ron defines and structures Macrosoft’s path forward. Ron focuses on new technologies and products, such as Cloud, Big Data, and AI/ML/WFP. Ron has a Ph.D. in Theoretical Physics from New York University and worked in physics for over a decade at Yale University, the Fusion Energy Institute in Princeton, New Jersey, and Argonne National Laboratory.

Ron also worked at Bell Laboratories in Murray Hill, New Jersey, where he managed a group on Big Data. Ron's work focused on early research into neural networks. Ron has a career-long passion for ultra-large-scale data processing and analysis, including predictive analytics, data mining, machine learning, and deep learning.
