DevLost

A developer lost in the mountains

Assembling PDF files from different sources - part 1

Let’s suppose you want to create a PDF by mixing data and information coming from different sources, and you want to automate the creation process.


I faced this challenge when I was asked to assemble a PDF from the following sources of information:

  • SharePoint lists
  • Existing PDF files
  • Existing images

Features to be provided included:

  • Easy look-and-feel upgrades (covers, header and footer images)
  • Text modifications of some parts without coding
  • Automatic generation of internal bookmarks
  • Automatic generation of the table of contents

What to do then?

After some investigation I found just the thing: the PDFsharp & MigraDoc library.
It is a two-sided C# library: it provides a low-level library (PDFsharp) to work directly with PDF objects, and a high-level library (MigraDoc) that adds an abstraction layer, so you can think in terms of sections, headers, chapters, paragraphs, etc.
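To make the split between the two layers concrete, here is a minimal sketch. It is not taken from the project: the class name, the method name and the text are made up for illustration. MigraDoc describes the document as sections and paragraphs, and its renderer hands back a PDFsharp PdfDocument that can then be handled page by page.

```csharp
using MigraDoc.DocumentObjectModel;
using MigraDoc.Rendering;
using PdfSharp.Pdf;

public static class TwoLayersSketch
{
    // Hypothetical helper: builds a one-paragraph document with MigraDoc
    // and returns the PDFsharp object produced by the renderer.
    public static PdfDocument BuildAndRender(string text)
    {
        // High level (MigraDoc): think in sections and paragraphs.
        var document = new Document();
        Section section = document.AddSection();
        section.AddParagraph(text);

        // The renderer turns the MigraDoc model into PDF pages.
        var renderer = new PdfDocumentRenderer();
        renderer.Document = document;
        renderer.RenderDocument();

        // Low level (PDFsharp): the result is a PdfDocument made of pages,
        // which can be saved or post-processed.
        return renderer.PdfDocument;
    }
}
```

Calling TwoLayersSketch.BuildAndRender("Hello").Save("hello.pdf") would then write the file to disk.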

The solution to the scenario described above involved the use of both:

  • PDFsharp to concatenate two or more PDF files
  • MigraDoc to create the various sections of the document
  • MigraDoc Document Description Language to modify the formatting of some of the included text
  • PDFsharp to add page numbers to the whole document
  • MigraDoc to add internal bookmarks
  • MigraDoc to create the table of contents
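As an illustration of the first item in the list, concatenating PDF files with PDFsharp can be sketched as follows. This is a simplified, hedged example rather than the code used in the project; the class and method names are hypothetical, and it simply appends every page of every input file in order.

```csharp
using System.Collections.Generic;
using PdfSharp.Pdf;
using PdfSharp.Pdf.IO;

public static class PdfConcatSketch
{
    // Hypothetical helper: returns a new document containing the pages
    // of all input files, in the order the paths are given.
    public static PdfDocument Concatenate(IEnumerable<string> inputPaths)
    {
        var output = new PdfDocument();
        foreach (string path in inputPaths)
        {
            // Pages can only be copied from a document opened in Import mode.
            PdfDocument input = PdfReader.Open(path, PdfDocumentOpenMode.Import);
            for (int i = 0; i < input.PageCount; i++)
            {
                output.AddPage(input.Pages[i]);
            }
        }
        return output;
    }
}
```

The caller decides where the result goes, for example PdfConcatSketch.Concatenate(new[] { "a.pdf", "b.pdf" }).Save("merged.pdf"), where the file names are placeholders.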

The strategy adopted was the following:

  1. Create all the sections needed for the document, including those which will be populated by external PDF files.
  2. Add headers and footers to the sections created above.
  3. Create all the bookmarks while creating the sections above. This step is important for the internal links to be resolved correctly and for the table of contents to be built accordingly: if you add content that references parts of the document which do not exist yet, you get an unresolved link error. It does not matter whether those parts are already filled with content or are just blank pages for now; what matters is that they exist and are bookmarked.
  4. Fill the remaining parts of the document by importing SharePoint data and MDDDL (MigraDoc DDL) files.
  5. Create the table of contents.
  6. Inject the external PDF files into the sections reserved for them above.
  7. Add page numbers. This is done at the very end of all the operations, to be sure no page is left out.
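The last step, stamping page numbers with PDFsharp once the document is complete, might look like the sketch below. It is only an illustration under some assumptions: the font name ("Arial"), the font size, the "n / total" format and the position near the bottom edge are my own choices, not the ones used in the project, and the class and method names are hypothetical.

```csharp
using PdfSharp.Drawing;
using PdfSharp.Pdf;

public static class PageNumberSketch
{
    // Hypothetical helper: draws "n / total" centered near the bottom of
    // every page. The document must be modifiable, i.e. created in memory
    // or opened with PdfDocumentOpenMode.Modify (not Import).
    public static void AddPageNumbers(PdfDocument pdf)
    {
        // Assumption: a font named "Arial" is available on the machine.
        var font = new XFont("Arial", 9);
        int total = pdf.PageCount;

        for (int i = 0; i < total; i++)
        {
            PdfPage page = pdf.Pages[i];
            using (XGraphics gfx = XGraphics.FromPdfPage(page))
            {
                // A 20pt-high strip spanning the page width, 30pt above the bottom edge.
                var box = new XRect(0, page.Height.Point - 30, page.Width.Point, 20);
                gfx.DrawString($"{i + 1} / {total}", font, XBrushes.Black, box, XStringFormats.Center);
            }
        }
    }
}
```

Because the numbers are drawn on the finished PDF pages, pages injected from external files are numbered like all the others.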

Implementation

Let me skip the “getting started” part of PDFsharp & MigraDoc (you can find exhaustive tutorials on the product web site) and jump to the interesting parts.

First of all, prepare the MigraDoc document:

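void PrepareDocument()
{
    // Create a new MigraDoc document and fill in its metadata.
    document = new Document();
    document.Info.Title = "GUIDELINES";
    document.Info.Subject = "Guidelines for working group";
    document.Info.Author = "Ab";
    // Work on a clone of the default page setup.
    // Note: this clone is a local copy and is not assigned back to the document here;
    // each section sets its own orientation in AddDefaultSection.
    PageSetup pageSetup = document.DefaultPageSetup.Clone();
    // set orientation
    pageSetup.Orientation = Orientation.Landscape;
    // Register the styles used throughout the document.
    Guide_Creator.Classes.Styles.DefineStyles(document);
}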


Then create all default sections you need:

        public void AddDefaultSection(string sectionName)
        {
            var section = document.AddSection();
            section.Tag = sectionName;
            section.PageSetup.Orientation = Orientation.Landscape;
            section.PageSetup.TopMargin = "25mm";
            section.PageSetup.BottomMargin = "25mm";
            section.PageSetup.LeftMargin = "25mm";
            section.PageSetup.RightMargin = "25mm";
            section.PageSetup.HeaderDistance = "10mm";
            section.PageSetup.FooterDistance = "15mm";
            section.PageSetup.MirrorMargins = true;
            section.PageSetup.OddAndEvenPagesHeaderFooter = true;
        }


For instance, you can add an "Introduction" section with a cover image and a bookmark in this way:

        public void AddIntroSection()
        {
            AddDefaultSection("RetroCoverSection");
            document.LastSection.AddParagraph();

            // add a page break
            document.LastSection.AddPageBreak();

            // Add the "intro" section to the document.
            AddDefaultSection("IntroSection");

            // add cover
            var img = document.LastSection.AddImage("covers\\intro.png");
            img.Height = "21cm";
            img.Width = "29.7cm";
            img.RelativeVertical = RelativeVertical.Page;
            img.RelativeHorizontal = RelativeHorizontal.Page;
            document.LastSection.AddPageBreak();

            // Add a bookmark
            string titleSection = "INTRODUCTION";
            var par = document.LastSection.AddParagraph(titleSection);
            par.Format.Font.Color = Colors.Transparent;
            par.Format.SpaceBefore = "1mm";

            bCreator.AddBookmarkMigraDoc(par, titleSection, titleSection, typeOfBookmark.part);
        }
        
        /// <summary>Adds a bookmark to a paragraph and records it for later use in internal links.</summary>
        /// <param name="par">paragraph to bookmark</param>
        /// <param name="bookmarkName">name (key) of the bookmark</param>
        /// <param name="bookmarkTOC">name of the paragraph as it appears in the table of contents</param>
        /// <param name="bkType">bookmark type</param>
        public void AddBookmarkMigraDoc(Paragraph par, string bookmarkName, string bookmarkTOC, typeOfBookmark bkType)
        {
            if (bkType == typeOfBookmark.part)
            {
                par.Format.OutlineLevel = OutlineLevel.Level1;
            }
            else if (bkType == typeOfBookmark.chapter)
            {
                par.Format.OutlineLevel = OutlineLevel.Level2;
            }
            else
            {
                par.Format.OutlineLevel = OutlineLevel.BodyText;
            }
            par.AddBookmark(bookmarkName);
            // update list of bookmarks; it is needed when adding internal links
            bookmarkList.Add(new Bookmark(bookmarkName, bookmarkTOC, bkType));
        }
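
The listing above relies on a typeOfBookmark enum, a Bookmark class and a bookmarkList field that are not shown in this part. Purely as an assumption about their shape, a minimal version compatible with the calls above could look like this. The property names and the third enum member are hypothetical; only part and chapter appear in the original code.

```csharp
using System.Collections.Generic;

// Only "part" and "chapter" are used in the listing; "paragraph" is a
// hypothetical name for everything that falls into the final else branch.
public enum typeOfBookmark
{
    part,
    chapter,
    paragraph
}

// Hypothetical record of one bookmark: its key, the title shown in the
// table of contents, and its level.
public class Bookmark
{
    public string Name { get; }
    public string TocTitle { get; }
    public typeOfBookmark Type { get; }

    public Bookmark(string name, string tocTitle, typeOfBookmark type)
    {
        Name = name;
        TocTitle = tocTitle;
        Type = type;
    }
}

public static class BookmarkListSketch
{
    // Hypothetical stand-in for the bookmarkList field used above.
    public static List<Bookmark> CreateList()
    {
        return new List<Bookmark>();
    }
}
```

Keeping the key and the table-of-contents title as separate values lets a bookmark be referenced by a stable name while the visible title changes.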

Continued in part 2

