Pandoc Converts All Your (Text) Documents

February 23, 2012, 11:00 am

For the past few months we ProfHackers have been running an occasional series about using the command line. I got us started with a couple of posts explaining why you might want to use the command line and how to get started using it. Konrad followed with posts about the uniq command and the sort command for working with text and data files. Amy added a post about how the command line let her hack the NOOK Color, and I wrote about using pdftk to manipulate PDFs.

Taking up the command line is easier if you have a specific problem you’re trying to solve. For me, the problem was that I wanted to do all of my writing in a plain text format, like Markdown or LaTeX. But I need to be able to share my writing in a variety of formats: HTML for the web, PDF for printed documents or academic writing, and occasionally RTF or Microsoft Word or OpenOffice.

The best way I’ve found to move between these formats is Pandoc. Pandoc is a command line tool written by a philosophy professor, John MacFarlane. Its general use is to take a document in one format and convert it to another. You can get an idea of the wide variety of formats Pandoc can translate by looking at an enlargement of the header diagram.

Here’s an example of how this works. Suppose that you have a Markdown document like the one we created for the post on Markdown. (View pandoc-example.markdown on GitHub.) You can convert this to a number of text formats with a simple terminal command:

Markdown to HTML (HTML output on GitHub):

pandoc pandoc-example.markdown -o pandoc-example.html

Markdown to LaTeX (LaTeX output on GitHub):

pandoc pandoc-example.markdown -o pandoc-example.tex

Markdown to DOCX:

pandoc pandoc-example.markdown -o pandoc-example.docx

Markdown to PDF (download PDF):

pandoc pandoc-example.markdown -o pandoc-example.pdf

That command calls pandoc, tells it which file to convert (pandoc-example.markdown), and tells it which file to export (e.g., pandoc-example.html). Pandoc figures out what types of files these are from their extensions, or you can pass it additional arguments to name the formats explicitly. For some of the formats, you can convert the other way. For example, you could convert LaTeX to Markdown or to a Word DOCX, or HTML to Markdown or LaTeX. To convert to PDF, though, you’ll need to have LaTeX installed on your system.
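When the file extensions are missing or ambiguous, the input and output formats can be named with the -f (from) and -t (to) options. Here is a minimal sketch of that, including a conversion in the reverse direction. The file names sample.markdown, sample.html, and roundtrip.markdown are made up for illustration, and the script skips the conversions if pandoc is not installed.

```shell
# Create a tiny Markdown file to convert (hypothetical sample content).
printf '# Title\n\nSome *emphasized* text.\n' > sample.markdown

if command -v pandoc >/dev/null 2>&1; then
  # Name the formats explicitly instead of relying on file extensions.
  pandoc -f markdown -t html sample.markdown -o sample.html

  # Convert the other way: HTML back to Markdown.
  pandoc -f html -t markdown sample.html -o roundtrip.markdown
else
  echo "pandoc not found; skipping conversions" >&2
fi
```

The round trip will not necessarily reproduce the original file character for character, since Pandoc writes its own preferred Markdown style.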

Another useful thing that Pandoc can do is take a URL and convert the webpage to another format. For example, this command turns a page on my website into Markdown.

pandoc -s -r html http://lincolnmullen.com/writing.html -o test.markdown
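In that command, -r is a synonym for -f (the input format) and -s asks for a standalone document. The same HTML reader also works on a page you have already saved, or on standard input when no input file is named. A small sketch, assuming a stand-in file called page.html (both the name and its contents are invented for illustration):

```shell
# Write a small stand-in HTML page (hypothetical content).
printf '<html><body><h1>Writing</h1><p>A <a href="http://example.com/">link</a>.</p></body></html>\n' > page.html

if command -v pandoc >/dev/null 2>&1; then
  # With no input file named, pandoc reads from standard input.
  pandoc -f html -t markdown -o page.markdown < page.html
else
  echo "pandoc not found; skipping conversion" >&2
fi
```

Because the input arrives on standard input, the same pattern should work at the end of a pipeline that fetches or filters the HTML first.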

You can see many more uses for Pandoc on its example page, and you can try some conversions with its online demo.

There are several pros to using Pandoc. It’s easy to install if you use the binaries for Windows or Mac. (It was a bit of a pain for me to compile from source, but there’s no reason you’d need to do that.) The tool is under active development, so bugs are being fixed and occasionally new formats are added. And there are quite a few advanced things that you can do, like create EPUB e-books and automatically generate citations using citeproc-hs and bibliography formats like BibTeX (which you can export from Zotero). There are also some conversions that it would be nice to have but that Pandoc can’t do. For example, Pandoc can turn Markdown into a Word DOCX, but it can’t turn a DOCX into Markdown, HTML, etc., because of the limitations of the DOCX format.
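The EPUB output mentioned above follows the same pattern as the earlier conversions: Pandoc picks the format from the .epub extension. A minimal sketch, assuming a made-up source file book.markdown whose first lines use Pandoc's "%" title block to supply a title and author:

```shell
# Write a short book source with a Pandoc title block
# ("% title" and "% author" lines); the content is hypothetical.
printf '%% A Sample Book\n%% A. Author\n\n# Chapter One\n\nSome text.\n' > book.markdown

if command -v pandoc >/dev/null 2>&1; then
  # The .epub extension tells pandoc to write an EPUB e-book.
  pandoc book.markdown -o book.epub
else
  echo "pandoc not found; skipping conversion" >&2
fi
```

Citation processing takes additional options that have changed between Pandoc versions, so it is best to check the documentation for your installed version before relying on it.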

If you do your writing in plain text or a markup format like LaTeX, Pandoc is an essential, everyday tool for moving between formats. And if you occasionally need to turn HTML into other formats, it’s handy to have Pandoc in your toolkit.

Have you tried Pandoc? What uses have you found for it?

This entry was posted in Reviews, Software.

  • reagle

    I’ve used it to write my book, documentation, course materials and slides etc.

    See: http://reagle.org/joseph/blog/career/teaching/fork-merge-share

  • http://twitter.com/wcaleb Caleb McDaniel

    Thanks for posting this! I’m a huge fan of Pandoc, like you, and wrote my entire book manuscript using it. When I wanted to give it to people to read while still working on it, I could produce a nicely typeset PDF or an EPUB for those with iPads. And now that the Press wants it in Word files, that’s no problem either. I also use pandoc and my own LaTeX templates to do my CV, recommendation letters, syllabi … such a great tool.

  • dolllar

    Having come late to the party, I pray there is some way (or plan to create a way) to review all these wonderful profhacker items easily, without backtracking through the whole list? Some web site or document that has them all organized or at least indexed by topic? It would make the best nook, if not book, where to find what we need to learn to do that you wizards can do.

  • http://lincolnmullen.com/ Lincoln Mullen

    @woodycarter: Probably the easiest way to find past materials is to search the archives for your specific need. If you look in the sidebar of any ProfHacker page, you can find a box to search just ProfHacker (and not the entire Chronicle). You can also look at the categories in the sidebar, like “Teaching” or “Software.”

    Another way to find things is to look at the tags at the end of posts you like. For example, this post is tagged “The ProfHacker Guide to the Command Line” right above the comments and below the post. You can click those links to find posts on similar topics.

    http://chronicle.com/blogs/profhacker/tag/the-profhacker-guide-to-the-command-line

    I hope this helps!

  • The Chronicle of Higher Education