Pandoc can convert between numerous markup and word processing formats, including, but not limited to, various flavors of Markdown, HTML, LaTeX and Word docx. For the full lists of input and output formats, see the --from and --to options below. Pandoc can also produce PDF output: see creating a PDF, below.
Pandoc has a modular design: it consists of a set of readers, which parse text in a given format and produce a native representation of the document (an abstract syntax tree or AST), and a set of writers, which convert this native representation into a target format. Thus, adding an input or output format requires only adding a reader or writer. Users can also run custom pandoc filters to modify the intermediate AST.
The format of the input and output can be specified explicitly using command-line options. The input format can be specified using the -f/--from option, the output format using the -t/--to option. Thus, to convert hello.txt from Markdown to LaTeX, you could type:
Supported input and output formats are listed below under Options (see -f for input formats and -t for output formats). You can also use pandoc --list-input-formats and pandoc --list-output-formats to print lists of supported formats.
Note that in some output formats (such as HTML, LaTeX, ConTeXt, RTF, OPML, DocBook, and Texinfo), information about the character encoding is included in the document header, which will only be included if you use the -s/--standalone option.
By default, pandoc will use LaTeX to create the PDF, which requires that a LaTeX engine be installed (see --pdf-engine below). Alternatively, pandoc can use ConTeXt, roff ms, or HTML as an intermediate format. To do this, specify an output file with a .pdf extension, as before, but add the --pdf-engine option or -t context, -t html, or -t ms to the command line. The tool used to generate the PDF from the intermediate format may be specified using --pdf-engine.
You can control the PDF style using variables, depending on the intermediate format used: see variables for LaTeX, variables for ConTeXt, variables for wkhtmltopdf, variables for ms. When HTML is used as an intermediate format, the output can be styled using --css.
To debug the PDF creation, it can be useful to look at the intermediate representation: instead of -o test.pdf, use for example -s -o test.tex to output the generated LaTeX. You can then test it with pdflatex test.tex.
When using LaTeX, the following packages need to be available (they are included with all recent versions of TeX Live): amsfonts, amsmath, lm, unicode-math, iftex, listings (if the --listings option is used), fancyvrb, longtable, booktabs, [multirow] (if the document contains a table with cells that cross multiple rows), graphicx (if the document contains images), bookmark, xcolor, soul, geometry (with the geometry variable set), setspace (with linestretch), and babel (with lang). If CJKmainfont is set, xeCJK is needed. framed is required if code is highlighted in a scheme that use a colored background. The use of xelatex or lualatex as the PDF engine requires fontspec. lualatex uses selnolig and lua-ul. xelatex uses bidi (with the dir variable set). If the mathspec variable is set, xelatex will use mathspec instead of unicode-math. The upquote and microtype packages are used if available, and csquotes will be used for typography if the csquotes variable or metadata field is set to a true value. The natbib, biblatex, bibtex, and biber packages can optionally be used for citation rendering. The following packages will be used to improve output quality if present, but pandoc does not require them to be present: upquote (for straight quotes in verbatim environments), microtype (for better spacing adjustments), parskip (for better inter-paragraph spaces), xurl (for better line breaks in URLs), and footnotehyper or footnote (to allow footnotes in tables).
Extensions can be individually enabled or disabled by appending +EXTENSION or -EXTENSION to the format name. See Extensions below, for a list of extensions and their names. See --list-input-formats and --list-extensions, below.
Extensions can be individually enabled or disabled by appending +EXTENSION or -EXTENSION to the format name. See Extensions below, for a list of extensions and their names. See --list-output-formats and --list-extensions, below.
Write output to FILE instead of stdout. If FILE is -, output will go to stdout, even if a non-textual format (docx, odt, epub2, epub3) is specified. If the output format is chunkedhtml and FILE has no extension, then instead of producing a .zip file pandoc will create a directory FILE and unpack the zip archive there (unless FILE already exists, in which case an error will be raised).
Specify a set of default option settings. FILE is a YAML file whose fields correspond to command-line option settings. All options for document conversion, including input and output files, can be set using a defaults file. The file will be searched for first in the working directory, and then in the defaults subdirectory of the user data directory (see --data-dir). The .yaml extension may be omitted. See the section Defaults files for more information on the file format. Settings from the defaults file may be overridden or extended by subsequent options on the command line.
Shift heading levels by a positive or negative integer. For example, with --shift-heading-level-by=-1, level 2 headings become level 1 headings, and level 3 headings become level 2 headings. Headings cannot have a level less than 1, so a heading that would be shifted below level 1 becomes a regular paragraph. Exception: with a shift of -N, a level-N heading at the beginning of the document replaces the metadata title. --shift-heading-level-by=-1 is a good choice when converting HTML or Markdown documents that use an initial level-1 heading for the document title and level-2+ headings for sections. --shift-heading-level-by=1 may be a good choice for converting Markdown documents that use level-1 headings for sections to HTML, since pandoc uses a level-1 heading to render the document title.
Specify a default extension to use when image paths/URLs have no extension. This allows you to use the same source for formats that require different kinds of images. Currently this option only affects the Markdown and LaTeX readers.
Parse each file individually before combining for multifile documents. This will allow footnotes in different files with the same identifiers to work as expected. If this option is set, footnotes and links will not work across files. Reading binary files (docx, odt, epub) implies --file-scope.
If two or more files are processed using --file-scope, prefixes based on the filenames will be added to identifiers in order to disambiguate them, and internal links will be adjusted accordingly. For example, a header with identifier foo in subdir/file1.txt will have its identifier changed to subdir__file1.txt__foo.
Filters may be written in any language. Text.Pandoc.JSON exports toJSONFilter to facilitate writing filters in Haskell. Those who would prefer to write filters in python can use the module pandocfilters, installable from PyPI. There are also pandoc filter libraries in PHP, perl, and JavaScript/node.js.
Set the metadata field KEY to the value VAL. A value specified on the command line overrides a value specified in the document using YAML metadata blocks. Values will be parsed as YAML boolean or string values. If no value is specified, the value will be treated as Boolean true. Like --variable, --metadata causes template variables to be set. But unlike --variable, --metadata affects the metadata of the underlying document (which is accessible from filters and may be printed in some output formats) and metadata values will be escaped when inserted into the template.
Preserve tabs instead of converting them to spaces. (By default, pandoc converts tabs to spaces before parsing its input.) Note that this will only affect tabs in literal code spans and code blocks. Tabs in regular text are always treated as spaces.
Extract images and other media contained in or linked from the source document to the path DIR, creating it if necessary, and adjust the images references in the document so they point to the extracted files. Media are downloaded, read from the file system, or extracted from a binary container (e.g. docx), as needed. The original file paths are used if they are relative paths not containing ... Otherwise filenames are constructed from the SHA1 hash of the contents.
Specifies a custom abbreviations file, with abbreviations one to a line. If this option is not specified, pandoc will read the data file abbreviations from the user data directory or fall back on a system default. To see the system default, use pandoc --print-default-data-file=abbreviations. The only use pandoc makes of this list is in the Markdown reader. Strings found in this list will be followed by a nonbreaking space, and the period will not produce sentence-ending space in formats like LaTeX. The strings may not contain spaces.
Produce output with an appropriate header and footer (e.g. a standalone HTML, LaTeX, TEI, or RTF file, not a fragment). This option is set automatically for pdf, epub, epub3, fb2, docx, and odt output. For native output, this option causes metadata to be included; otherwise, metadata is suppressed.
Use the specified file as a custom template for the generated document. Implies --standalone. See Templates, below, for a description of template syntax. If no extension is specified, an extension corresponding to the writer will be added, so that --template=special looks for special.html for HTML output. If the template is not found, pandoc will search for it in the templates subdirectory of the user data directory (see --data-dir). If this option is not used, a default template appropriate for the output format will be used (see -D/--print-default-template).
Run pandoc in a sandbox, limiting IO operations in readers and writers to reading the files specified on the command line. Note that this option does not limit IO operations by filters or in the production of PDF documents. But it does offer security against, for example, disclosure of files through the use of include directives. Anyone using pandoc on untrusted user input should use this option.
d3342ee215