Posted by: R Manimaran | August 3, 2010

Programmatically get the count of pages/slides/sheets in Microsoft Word, PowerPoint and Excel 2007

Office Open XML is a zipped, XML-based file format developed by Microsoft for representing spreadsheets, charts, presentations and word processing documents.

Files with the .docx, .xlsx and .pptx extensions are zip archives that contain XML data. To see this, do the following:

Take a .docx, .xlsx or .pptx file and rename its extension to .zip.

Now extract the zip file to a folder.

The extracted folder contains several subfolders, one of which is docProps.

Inside the docProps folder, we have two XML files (typically app.xml and core.xml).

Inside app.xml, we have document information such as the number of pages, paragraphs and so on.

We will have the same structure for the xlsx and pptx files.
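The structure described above can also be inspected from code, without renaming or extracting the file. The following is a minimal sketch, assuming the WindowsBase assembly is referenced; the class name PackageInspector and the method name ListParts are hypothetical names chosen for illustration. It opens the file as a package and collects the URI of every part, for example /docProps/app.xml.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.IO.Packaging;

// Hypothetical helper, not part of the original post.
public static class PackageInspector
{
    // Returns the URI of every part in the package, e.g. "/docProps/app.xml".
    public static List<string> ListParts(string fileName)
    {
        List<string> uris = new List<string>();

        // Open read-only; the using block closes the package when done.
        using (Package package = Package.Open(fileName, FileMode.Open, FileAccess.Read))
        {
            foreach (PackagePart part in package.GetParts())
            {
                uris.Add(part.Uri.ToString());
            }
        }

        return uris;
    }
}
```

A call such as PackageInspector.ListParts("sample.docx") (the file name is a placeholder) returns the part URIs as strings. Relationship files (.rels) show up as parts, while [Content_Types].xml does not, because the packaging API does not treat it as a part.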

We can read these properties using the Open XML packaging classes:

  1. Add a reference to System.IO.Packaging.
    • When I searched for this name I could not find it. The namespace is available in the WindowsBase assembly.
  2. Add a reference to WindowsBase.
  3. Add the using statements:
    • using System;
    • using System.IO;
    • using System.IO.Packaging;
    • using System.Xml;
    • using System.Xml.XPath;

Declare the following enum:

public enum Document
{
    DOCX,
    XLSX,
    PPTX
}

Create the following method

public static void CountPagesUsingOpenXML(Document fileType, string fileName)
{
    // FileInfo.Extension includes the leading dot, e.g. ".docx".
    string ext = new FileInfo(fileName).Extension.ToUpperInvariant();

    // Return only when the extension matches none of the supported types.
    if (ext != ".DOCX" && ext != ".XLSX" && ext != ".PPTX")
        return;

    XmlDocument doc = new XmlDocument();

    // Open the package read-only and dispose it once app.xml is loaded.
    using (Package package = Package.Open(fileName, FileMode.Open, FileAccess.Read))
    {
        Uri uriData = new Uri("/docProps/app.xml", UriKind.Relative);
        if (!package.PartExists(uriData))
            return;

        PackagePart part = package.GetPart(uriData);
        using (Stream stream = part.GetStream(FileMode.Open, FileAccess.Read))
        {
            doc.Load(stream);
        }
    }

    XmlNamespaceManager nsMgr = new XmlNamespaceManager(doc.NameTable);
    nsMgr.AddNamespace("vt", "http://schemas.openxmlformats.org/officeDocument/2006/docPropsVTypes");
    nsMgr.AddNamespace("def", "http://schemas.openxmlformats.org/officeDocument/2006/extended-properties");

    string count = string.Empty;
    XmlNode node;

    switch (fileType)
    {
        case Document.DOCX:
            node = doc.SelectSingleNode("/def:Properties/def:Pages", nsMgr);
            if (node != null)
                count = node.InnerText;
            Console.WriteLine("No. of pages in Word document: " + count);
            break;

        case Document.XLSX:
            // The size of the TitlesOfParts vector is used as the sheet count.
            // The vector may also list other titles (such as named ranges),
            // so the value can be higher than the number of sheets.
            node = doc.SelectSingleNode("/def:Properties/def:TitlesOfParts/vt:vector", nsMgr);
            if (node != null && node.Attributes["size"] != null)
                count = node.Attributes["size"].Value;
            Console.WriteLine("No. of sheets in Excel: " + count);
            break;

        case Document.PPTX:
            node = doc.SelectSingleNode("/def:Properties/def:Slides", nsMgr);
            if (node != null)
                count = node.InnerText;
            Console.WriteLine("No. of slides in PowerPoint: " + count);
            break;
    }
}
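If the count is needed as a value rather than as console output, the same lookup can return a number. The following is a minimal sketch of that variation and not the method above; the class name AppProperties and the method name ReadInt are hypothetical. It reads one simple element (such as Pages or Slides) from /docProps/app.xml and returns null when the part or the element is missing. Keep in mind that it only reports whatever value is stored in app.xml; it does not recompute anything from the document body.

```csharp
using System;
using System.IO;
using System.IO.Packaging;
using System.Xml;

// Hypothetical helper, not part of the original post.
public static class AppProperties
{
    private const string ExtendedPropertiesNs =
        "http://schemas.openxmlformats.org/officeDocument/2006/extended-properties";

    // Reads an integer element such as "Pages" or "Slides" from /docProps/app.xml.
    // Returns null when the part or element is missing, or the text is not an integer.
    public static int? ReadInt(string fileName, string elementName)
    {
        using (Package package = Package.Open(fileName, FileMode.Open, FileAccess.Read))
        {
            Uri uri = new Uri("/docProps/app.xml", UriKind.Relative);
            if (!package.PartExists(uri))
                return null;

            XmlDocument doc = new XmlDocument();
            using (Stream stream = package.GetPart(uri).GetStream(FileMode.Open, FileAccess.Read))
            {
                doc.Load(stream);
            }

            XmlNamespaceManager nsMgr = new XmlNamespaceManager(doc.NameTable);
            nsMgr.AddNamespace("def", ExtendedPropertiesNs);

            XmlNode node = doc.SelectSingleNode("/def:Properties/def:" + elementName, nsMgr);

            int value;
            if (node != null && int.TryParse(node.InnerText, out value))
                return value;

            return null;
        }
    }
}
```

For example, AppProperties.ReadInt("report.docx", "Pages") (the file name is a placeholder) yields the stored page count, and the caller can test the result with HasValue before using it.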


Responses

  1. Does this work if document.xml in the .docx has been changed? I don't think so. Is there another way to count the total number of pages in a document, without reading this property?

