I finally stumbled across an example matplotlibrc, and after some searching found two key settings: pdf.fonttype and ps.fonttype. You have to change these settings from the default of 3 to the alternative 42. You can do this in your matplotlibrc file by adding the lines pdf.fonttype : 42 and ps.fonttype : 42.
This causes matplotlib to use Type 42 (a.k.a. TrueType) fonts for PostScript and PDF files. This allows you to avoid Type 3 fonts without limiting yourself to the stone-age technology of Type 1 fonts.
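If editing matplotlibrc is inconvenient, the same two keys can also be set at runtime through matplotlib's rcParams. The sketch below is an illustration, not the author's own setup. It renders an arbitrary plot to an in-memory PDF using the non-interactive Agg backend, so it runs without a display.

```python
from io import BytesIO

import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt

# Runtime equivalent of these matplotlibrc lines:
#   pdf.fonttype : 42
#   ps.fonttype  : 42
plt.rcParams["pdf.fonttype"] = 42  # embed TrueType (Type 42) fonts in PDF output
plt.rcParams["ps.fonttype"] = 42   # same setting for PostScript/EPS output

fig, ax = plt.subplots()
ax.plot([1, 2, 3], [1, 4, 9])
ax.set_title("Text embedded as Type 42")

# Write the PDF to memory instead of disk
buf = BytesIO()
fig.savefig(buf, format="pdf")
pdf_bytes = buf.getvalue()
plt.close(fig)
```

Passing the same dictionary of settings to matplotlib.rc_context() would confine the change to a single with block instead of the whole session.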
to force matplotlib to produce Type 1 fonts. However, this caused some of the fonts to look quite different in the plots, and also garbled some of the text where my strings conflicted with TeX syntax.
The python visualization world can be a frustrating place for a new user. There are many different options and choosing the right one is a challenge. For example, even after 2 years, this article is one of the top posts that lead people to this site. In that article, I threw some shade at matplotlib and dismissed it during the analysis. However, after using tools such as pandas, scikit-learn, seaborn and the rest of the data science stack in python, I think I was a little premature in dismissing matplotlib. To be honest, I did not quite understand it and how to use it effectively in my workflow.
Now that I have taken the time to learn some of these tools and how to use them with matplotlib, I have started to see matplotlib as an indispensable tool. This post will show how I use matplotlib and provide some recommendations for users getting started or users who have not taken the time to learn matplotlib. I do firmly believe matplotlib is an essential part of the python data science stack and hope this article will help people understand how to use it for their own visualizations.
The reason two interfaces cause confusion is that in the world of Stack Overflow and tons of information available via Google searches, new users will stumble across multiple solutions to problems that look somewhat similar but are not the same. I can speak from experience. Looking back on some of my old code, I can tell that there is a mishmash of matplotlib code, which is confusing to me (even if I wrote it).
Another historic challenge with matplotlib is that some of the default style choices were rather unattractive. In a world where R could generate some really cool plots with ggplot, the matplotlib options tended to look a bit ugly in comparison. The good news is that matplotlib 2.0 has much nicer styling capabilities and the ability to theme your visualizations with minimal effort.
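Theming is mostly done through style sheets. Here is a minimal sketch, assuming a matplotlib release that ships the bundled "ggplot" style (current releases do). plt.style.context applies a theme only inside a with block, whereas plt.style.use would change it for the rest of the session.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt

# Names of the bundled style sheets; the exact list varies between versions
print(plt.style.available)

default_face = plt.rcParams["axes.facecolor"]

# Apply a ggplot-inspired theme only to figures created inside this block
with plt.style.context("ggplot"):
    themed_face = plt.rcParams["axes.facecolor"]
    fig, ax = plt.subplots()
    ax.plot([1, 2, 3], [2, 1, 3])
    plt.close(fig)

# Leaving the block restores the previous settings
restored_face = plt.rcParams["axes.facecolor"]
```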
Despite some of these issues, I have come to appreciate matplotlib because it is extremely powerful. The library allows you to create almost any visualization you could imagine. Additionally, there is a rich ecosystem of python tools built around it and many of the more advanced visualization tools use matplotlib as the base library. If you do any work in the python data science stack, you will need to develop some basic familiarity with how to use matplotlib. That is the focus of the rest of this post: developing a basic approach for effectively using matplotlib.
The other benefit of this knowledge is that you have a starting point when you see things on the web. If you take the time to understand this point, the rest of the matplotlib API will start to make sense. Also, many of the advanced python packages like seaborn and ggplot rely on matplotlib, so understanding the basics will make those more powerful frameworks much easier to learn.
The rest of this post will be a primer on how to do the basic visualization creation in pandas and customize the most common items using matplotlib. Once you understand the basic process, further customizations are relatively straightforward.
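The basic process can be sketched as follows. matplotlib creates the Figure and Axes, pandas draws onto that Axes, and the customization happens through the Axes object. The DataFrame, its column names and the axis limits are hypothetical sample values. The sketch also assumes pandas is installed.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical sample data, for illustration only
df = pd.DataFrame({"customer": ["A", "B", "C"], "sales": [120, 340, 210]})

# 1. matplotlib creates the Figure and the Axes
fig, ax = plt.subplots(figsize=(6, 4))

# 2. pandas does the quick plotting onto that Axes
df.plot(kind="barh", x="customer", y="sales", ax=ax, legend=False)

# 3. matplotlib handles the customization through the Axes object
ax.set(title="Sales by customer", xlabel="Sales", ylabel="Customer", xlim=(0, 400))
```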
Up until now, I have been relying on the jupyter notebook to display the figures by virtue of the %matplotlib inline directive. However, there are going to be plenty of times where you have the need to save a figure in a specific format and integrate it with some other presentation.
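Figures are saved with Figure.savefig, which infers the output format from the file extension. In the sketch below, the file names are hypothetical and the files go into a temporary directory. fig.canvas.get_supported_filetypes() reports which formats the installed matplotlib can write.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot([0, 1, 2], [0, 1, 4])

# Mapping of file extension -> description, e.g. "png", "pdf", "svg"
formats = fig.canvas.get_supported_filetypes()

out_dir = tempfile.mkdtemp()
png_path = os.path.join(out_dir, "figure.png")  # hypothetical file names
pdf_path = os.path.join(out_dir, "figure.pdf")

# Raster output: dpi controls resolution; "tight" trims surrounding whitespace
fig.savefig(png_path, dpi=200, bbox_inches="tight")
# Vector output: the format comes from the ".pdf" extension
fig.savefig(pdf_path)
plt.close(fig)
```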
Hopefully this process has helped you understand how to more effectively use matplotlib in your daily data analysis. If you get in the habit of using this approach when doing your analysis, you should be able to quickly find out how to do whatever you need to do to customize your plot.
I have been having fun with Python and generating tables and other text-based outputs from Splunk, but now I want to generate charts and graphs. I've been beating my head against the wall for two weeks trying to understand how to properly install Matplotlib and numpy. Everywhere I look I'm told to use pip, but I can't find a version of pip to install that doesn't appear to require pip itself. I'm not in a position where I can build and compile the source code for pip, numpy, matplotlib or any other product.
One major feature of the IPython kernel is the ability to display plots that are the output of running code cells. The IPython kernel is designed to work seamlessly with the matplotlib plotting library to provide this functionality.
To set this up, before any plotting or import of matplotlib is performed you must execute the %matplotlib magic command. This performs the necessary behind-the-scenes setup for IPython to work correctly hand in hand with matplotlib; it does not, however, actually execute any Python import commands, that is, no names are added to the namespace.
If the %matplotlib magic is called without an argument, the output of a plotting command is displayed using the default matplotlib backend in a separate window. Alternatively, the backend can be explicitly requested by name, for example %matplotlib qt.
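Magics only exist inside IPython. This sketch shows a rough equivalent for a plain Python script (an illustration, not taken from IPython's documentation). It picks the backend with matplotlib.use() before pyplot draws anything, then imports pyplot explicitly, because neither the magic nor matplotlib.use() adds names to the namespace. The non-interactive Agg backend is assumed so the sketch runs without a display.

```python
import matplotlib

# Inside IPython/Jupyter this step would be a magic such as `%matplotlib inline`;
# a plain script selects the backend with matplotlib.use() before any drawing.
matplotlib.use("Agg")  # non-interactive raster backend, no window needed

# Selecting a backend does not import pyplot for you
import matplotlib.pyplot as plt

backend = matplotlib.get_backend()
fig, ax = plt.subplots()
ax.plot([1, 2, 3], [3, 1, 2])
plt.close(fig)
```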
matplotlib is a Python-based plotting library with full support for 2D and limited support for 3D graphics, widely used in the Python scientific computing community. The library targets a broad range of use cases. It can embed graphics in the user interface toolkit of your choice, and currently supports interactive graphics on all major desktop operating systems using the GTK+, Qt, Tk, FLTK, wxWidgets and Cocoa toolkits. It can be called interactively from the interactive Python shell to produce graphics with simple, procedural commands, much like Mathematica, IDL or MATLAB. matplotlib can also be embedded in a headless webserver to provide hardcopy in both raster-based formats like Portable Network Graphics (PNG) and vector formats like PostScript, Portable Document Format (PDF) and Scalable Vector Graphics (SVG) that look great on paper.
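The headless-webserver case can be sketched by building a Figure directly, without pyplot, and writing it into an in-memory buffer. Those bytes could then serve as an HTTP response body. The function name render_plot is hypothetical, and no particular web framework is assumed.

```python
from io import BytesIO

from matplotlib.figure import Figure


def render_plot(xs, ys, fmt="png"):
    """Return a line plot encoded as bytes in the given format (png, pdf, svg, ...)."""
    # Figure is used directly instead of pyplot: no GUI window and no global
    # "current figure" state, which suits a long-running server process
    fig = Figure(figsize=(4, 3))
    ax = fig.subplots()
    ax.plot(xs, ys)
    buf = BytesIO()
    # savefig picks a matching non-interactive backend for the requested format
    fig.savefig(buf, format=fmt)
    return buf.getvalue()


png_bytes = render_plot([0, 1, 2], [0, 1, 4])             # raster hardcopy
svg_bytes = render_plot([0, 1, 2], [0, 1, 4], fmt="svg")  # vector hardcopy
```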
matplotlib's origin dates to an attempt by one of us (John Hunter) to free himself and his fellow epilepsy researchers from a proprietary software package for doing electrocorticography (ECoG) analysis. The laboratory in which he worked had only one license for the software, and the various graduate students, medical students, postdocs, interns, and investigators took turns sharing the hardware key dongle. MATLAB is widely used in the biomedical community for data analysis and visualization, so Hunter set out, with some success, to replace the proprietary software with a MATLAB-based version that could be utilized and extended by multiple investigators. MATLAB, however, naturally views the world as an array of floating point numbers, and the complexities of real-world hospital records for epilepsy surgery patients with multiple data modalities (CT, MRI, ECoG, EEG) warehoused on different servers pushed MATLAB to its limits as a data management system. Unsatisfied with the suitability of MATLAB for this task, Hunter began working on a new Python application built on top of the user interface toolkit GTK+, which was at the time the leading desktop windowing system for Linux.
matplotlib was thus originally developed as an EEG/ECoG visualization tool for this GTK+ application, and this use case directed its original architecture. matplotlib was originally designed to serve a second purpose as well: as a replacement for interactive command-driven graphics generation, something that MATLAB does very well. The MATLAB design makes the simple task of loading a data file and plotting very straightforward, where a full object-oriented API would be too syntactically heavy. So matplotlib also provides a stateful scripting interface for quick and easy generation of graphics similar to MATLAB's. Because matplotlib is a library, users have access to all of the rich built-in Python data structures such as lists, dictionaries, sets and more.
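A short sketch makes the contrast concrete. pyplot's stateful functions act on an implicit "current" figure and axes in the MATLAB style described here, while the object-oriented API calls methods on explicit Figure and Axes objects. The data are ordinary Python lists, and the titles are arbitrary.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch
import matplotlib.pyplot as plt

# Plain Python data structures work directly as plot input
xs = [0, 1, 2, 3]
ys = [x ** 2 for x in xs]

# Stateful, MATLAB-style interface: pyplot tracks the "current" figure and axes
plt.figure()
plt.plot(xs, ys)
plt.title("stateful pyplot")
stateful_ax = plt.gca()  # retrieve the implicit current Axes

# Object-oriented interface: every call targets an explicit object
fig, ax = plt.subplots()
ax.plot(xs, ys)
ax.set_title("object-oriented")

plt.close("all")
```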
The top-level matplotlib object that contains and manages all of the elements in a given graphic is called the Figure. One of the core architectural tasks matplotlib must solve is implementing a framework for representing and manipulating the Figure that is segregated from the act of rendering the Figure to a user interface window or hardcopy. This enables us to build increasingly sophisticated features and logic into the Figures, while keeping the "backends", or output devices, relatively simple. matplotlib encapsulates not just the drawing interfaces to allow rendering to multiple devices, but also the basic event handling and windowing of most popular user interface toolkits. Because of this, users can create fairly rich interactive graphics and toolkits incorporating mouse and keyboard input that can be plugged without modification into the six user interface toolkits we support.
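The separation between the Figure and its output devices can be shown by building one Figure and sending it to two different backends. In this minimal sketch, the Agg canvas rasterizes the Figure into an RGBA pixel buffer. savefig with format="svg" then temporarily swaps in the SVG backend to serialize the same Figure as vector output. The plotted data are arbitrary.

```python
from io import BytesIO

import numpy as np
from matplotlib.backends.backend_agg import FigureCanvasAgg
from matplotlib.figure import Figure

# Build the Figure once; nothing here refers to a window or a file format
fig = Figure(figsize=(4, 3), dpi=100)
ax = fig.add_subplot(111)
ax.plot([0, 1, 2], [2, 0, 1])

# Output device 1: the Agg canvas rasterizes the Figure into RGBA pixels
canvas = FigureCanvasAgg(fig)
canvas.draw()
pixels = np.asarray(canvas.buffer_rgba()).copy()  # shape (height, width, 4)

# Output device 2: the same Figure serialized as vector SVG
svg_buf = BytesIO()
fig.savefig(svg_buf, format="svg")
svg_bytes = svg_buf.getvalue()
```

Neither output step modified the Figure or its Axes; only the canvas used at render time differed.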
For a user interface toolkit such as Qt, the FigureCanvas has a concrete implementation which knows how to insert itself into a native Qt window (QtGui.QMainWindow), transfer the matplotlib Renderer commands onto the canvas (QtGui.QPainter), and translate native Qt events into the matplotlib Event framework, which signals the callback dispatcher to generate the events so upstream listeners can handle them. The abstract base classes reside in matplotlib.backend_bases and all of the derived classes live in dedicated modules like matplotlib.backends.backend_qt4agg. For a pure image backend dedicated to producing hardcopy output like PDF, PNG, SVG, or PS, the FigureCanvas implementation might simply set up a file-like object into which the default headers, fonts, and macro functions are defined, as well as the individual objects (lines, text, rectangles, etc.) that the Renderer creates.
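The event path can be exercised without any GUI toolkit. In this sketch, a listener is registered with mpl_connect on an Agg canvas, which is a concrete subclass of FigureCanvasBase from matplotlib.backend_bases. A MouseEvent is then built by hand and passed to the canvas's callback registry, standing in for the translation a toolkit backend such as Qt performs on a native click. Constructing the event manually and calling callbacks.process directly is an assumption made for illustration. Real backends do this internally.

```python
from matplotlib.backend_bases import FigureCanvasBase, MouseEvent
from matplotlib.backends.backend_agg import FigureCanvasAgg
from matplotlib.figure import Figure

fig = Figure(figsize=(4, 3), dpi=100)
canvas = FigureCanvasAgg(fig)
ax = fig.add_subplot(111)

clicks = []


def on_click(event):
    # Upstream listener: receives toolkit-independent MouseEvent objects
    clicks.append((event.x, event.y, event.button))


# Register the listener with the canvas's callback dispatcher
cid = canvas.mpl_connect("button_press_event", on_click)

# A GUI backend would build this from a native click (pixel coords, left button)
event = MouseEvent("button_press_event", canvas, 200, 150, button=1)
canvas.callbacks.process("button_press_event", event)

canvas.mpl_disconnect(cid)
```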