The readxl package makes it easy to import tabular data from Excel spreadsheets (.xls or .xlsx files) and includes several options for cleaning data during import. readxl has no external dependencies and functions on any operating system, making it an OS- and user-friendly package that simplifies getting your data from Excel into R.
The readxl package makes it easy to get tabular data out of Excel files and into R with code, not mouse clicks. It supports both the legacy .xls format and the modern XML-based .xlsx format. readxl is expressly designed to be easy to install and use on all operating systems. Therefore it has no external dependencies, such as Java or Perl, which have historically been a source of aggravation with some R packages that read Excel files.
readxl was last released almost exactly one year ago, in April 2017, at version 1.0.0. The associated blog post summarizes many nifty new features in version 1.0.0. In contrast, version 1.1.0 is considerably less exciting for most users, but includes two important improvements:
readxl embeds the library libxls in order to read xls files. Three security vulnerabilities have been identified in libxls and were shown to affect readxl (CVE = Common Vulnerabilities and Exposures):
The recent maintenance of libxls by Evan Miller (@evanmiller) is a very positive development and has allowed us to close many readxl issues related to crashes or other unsavory behaviour seen when reading specific xls files.
The next release will also have a breaking-ish change around name repair. readxl will switch to tibble::set_tidy_names(), which remediates missing and duplicate variable names. I will make this change soon in the dev version, so that interested users can begin to adjust.
I am very thankful to the maintainers of the embedded libraries, especially the recent work on libxls by David Hoerl (@dhoerl) and Evan Miller (@evanmiller). readxl includes a great deal of compiled code, from disparate sources, and Jim Hester (@jimhester) is a fantastic troubleshooter. David Hood (@thoughtfulbloke) delivered some delightful Clippy photos (featured here), in response to an absurd Twitter request.
A big thanks goes out to the 86 users who contributed issues and pull requests since the previous readxl release: @aaa34169, @afdta, @alexeyknorre, @alexhallam, @anjurad, @arnyeinstein, @batpigandme, @bbrewington, @bellafeng, @burchill, @chrisholbrook, @Courvoisier13, @danielsjf, @DavisVaughan, @dchiu911, @Deepu298, @dpprdan, @ea-guerette, @espinielli, @gergness, @gp2x, @heseber, @hlynurhallgrims, @hrecht, @huftis, @hughmarera, @iiLaurens, @ilpepe, @Ironholds, @jameshowison, @jamesLSI, @jcolomb, @jebyrnes, @jekriske-lilly, @jennybc, @jeroen, @jimhester, @jjcad, @Jmarks199, @KKPMW, @krlmlr, @kwstat, @KyleHaynes, @Lu2017, @lz1nwm, @m-macaskill, @mbeer, @mdbauer, @melikovk, @MichaelChirico, @MidhunT, @MikhailLagutin, @mplatzer, @msgoussi, @nacnudus, @nealrichardson, @nick-ulle, @nortonle, @oozdmr, @PMassicotte, @ramanan82, @reinderien, @reinierv4, @robbriers, @RobertMyles, @rsbivand, @rstub, @ruaridhw, @sebastianschweer, @shoebodh, @simonthelwall, @slfan2013, @smasuda, @sncr-github, @stephlocke, @steve4444, @sz-cgt, @t-kalinowski, @tarunparmar, @tbeu, @thothal, @tomsing1, @tres-pitt, @vkapartzianis, @willtudorevans, and @zauster.
readxl now embeds libxls v1.6.2 (the previous release embedded v1.5.0). The libxls project is maintained by Evan Miller and is hosted at , where you can read more in its release notes. These accumulated releases fix a number of edge cases, allowing readxl to read even more weird and wonderful .xls files.
Thanks to the 103 people who have contributed to readxl since we last blogged about it (upon the release of version 1.2.0 in December 2018) by reporting bugs and suggesting new features: @abcdef123ghi, @acvelozo, @ahbon123, @ajit555, @artinmg, @aswansyahputra, @averiperny, @batpigandme, @ben1787, @benmatthewsed, @benwatsoncpa, @benzipperer, @bhive01, @bjorn81, @boshek, @brkbrc, @Brunox13, @cderv, @DavisVaughan, @ddekadt, @dkgaraujo, @donnekgit, @druedin, @dxbhans, @elephann, @eringrand, @estern95, @fary90, @fermumen, @fndemarqui, @gaborcsardi, @gbganalyst, @ghost, @hadley, @hammao, @hannes101, @hddao, @hidekoji, @HughParsonage, @idontgetoutmuch, @j-sirgo, @jennybc, @jeromyanglim, @jimhester, @jmcurran, @josh-m-sharpe, @jwhendy, @jzadra, @kfhk, @kiernann, @ksetdekov, @kwebihaf-github, @llrs, @loureynolds, @lucasmation, @lucifersFall1n1, @luisvalenzuelar, @matthiasgomolka, @MeoWoo6, @MichaelChirico, @mine-cetinkaya-rundel, @misea, @mkoohafkan, @moodymudskipper, @msgoussi, @nacnudus, @narayanana, @nfultz, @nickschurch, @nlneas1, @nqkhanh2209, @ntsigilis, @pitakakariki, @pmallot, @qdread, @queleanalytics, @ramay, @ramiromagno, @Rindrics, @rsbivand, @rwbaer, @saanasum, @sbearrows, @Sbirch556, @seanchrismurphy, @Shicheng-Guo, @Sibojang9, @simowaves, @smsaladi, @songc-93, @SteveDeitz, @struckma, @sureshvigneshbe, @tfulge, @topepo, @ucb, @vchouraki, @wanttobenatural, @wgrundlingh, @WilDoane, @zerogetsamgow, @zhangbs92, and @zx8754.
The readr package contains the most common functions in the tidyverse for importing data. The readr package is loaded when you run library(tidyverse). The tidyverse also includes the following packages for importing specific types of data. These are not loaded with library(tidyverse). You must load them individually when you need them.
read_excel() is loaded with the readxl package, which must be loaded with library(readxl). Pass read_excel() a file path or URL that leads to the Excel file you wish to read. readxl_example("datasets.xlsx") returns a file path that leads to an example Excel spreadsheet that is conveniently installed with the readxl package.
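As a minimal sketch of the call described above, the following reads the example workbook that ships with readxl. It assumes readxl is installed; the object names path and iris_tbl are chosen here for illustration only.

```r
library(readxl)

# readxl_example() returns the full path to a workbook bundled with the package
path <- readxl_example("datasets.xlsx")

# With no other arguments, read_excel() reads the first worksheet
# and returns the result as a tibble
iris_tbl <- read_excel(path)

head(iris_tbl)
```

The first worksheet in this workbook holds the familiar iris data, so the result should have five columns.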
If you want to store data in some raw, non-R-specific form and make it available to the user, put it in inst/extdata/. For example, readr and readxl each use this mechanism to provide a collection of delimited files and Excel workbooks, respectively. See Section 7.3.
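To see this mechanism from the user's side, here is a short sketch that locates the files readxl installs from its inst/extdata/ directory. It assumes readxl is installed; extdata_dir is an illustrative variable name.

```r
library(readxl)

# Called without arguments, readxl_example() lists the bundled example files
readxl_example()

# The same files can be found with base R: the contents of inst/extdata/
# end up in the extdata/ directory of the installed package
extdata_dir <- system.file("extdata", package = "readxl")
list.files(extdata_dir)
```

Wrapping system.file() in a small helper such as readxl_example() spares users from having to know where the package is installed.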
To summarize, the key difference when loading data into R with readxl's read_excel() or readr's read_csv() is that none of the variables are coerced to the factor data type. Instead, many of the variables are loaded as character (string) data.
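The behavior can be checked on any text column. The sketch below uses the iris sheet of readxl's bundled example workbook rather than the data discussed above, so the column and object names are illustrative assumptions.

```r
library(readxl)

iris_tbl <- read_excel(readxl_example("datasets.xlsx"), sheet = "iris")

# Inspect the class of every column: text arrives as character, not factor
sapply(iris_tbl, class)

# If a factor is needed for modeling or plotting, convert it explicitly
species_fct <- factor(iris_tbl$Species)
levels(species_fct)
```

Keeping the conversion explicit means the analyst, not the import function, decides which columns are categorical.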
As you can see, tidyverse packages are very powerful tools for loading, cleaning, and inspecting data so that you can begin analyzing your data right away! And remember, you can load all of these packages at once with library(tidyverse).
The person who wrote the question wanted to know about functionality that was unique to a single package, such as the ability of readxl to obtain a list of worksheets in a spreadsheet. S/he was also interested in the relative performance of each of these packages.
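For reference, listing worksheets with readxl might look like the following sketch. It uses the example workbook bundled with the package; the sheet name "mtcars" is specific to that file.

```r
library(readxl)

path <- readxl_example("datasets.xlsx")

# excel_sheets() returns the worksheet names as a character vector
sheets <- excel_sheets(path)
sheets

# A name from that vector can then be passed to read_excel()
mtcars_tbl <- read_excel(path, sheet = "mtcars")
```

Because the names come back as an ordinary character vector, they can also be looped over to read every sheet in a workbook.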
All four of these packages support the capabilities listed above, except for the ability to specify column names as an argument to the read function, which is only supported by readxl. This is not a major deficiency, since columns can easily be renamed with the base R function colnames() once a spreadsheet has been loaded as an R data frame.
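Both approaches can be sketched with readxl and base R. The replacement names in new_names are hypothetical, and the example assumes the iris sheet of the bundled workbook, which has a header row and five columns.

```r
library(readxl)

path <- readxl_example("datasets.xlsx")

# Hypothetical replacement names, one per column of the iris sheet
new_names <- c("sepal_len", "sepal_wid", "petal_len", "petal_wid", "species")

# Option 1: supply names at read time. skip = 1 drops the existing header
# row, which would otherwise be read as the first row of data.
iris_a <- read_excel(path, sheet = "iris", col_names = new_names, skip = 1)

# Option 2: read with the original header, then rename with base R
iris_b <- read_excel(path, sheet = "iris")
colnames(iris_b) <- new_names
```

Either route ends with the same column names; the first simply avoids a second step.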
The readxl package not only provides a high degree of control for reading a variety of spreadsheet data, but also performs best on a load test with a large spreadsheet. However, if one is interested in using R to generate Excel spreadsheets, the packages need to be re-evaluated with an objective set of criteria suited to the types of operations used to write, as opposed to read, Excel files. Specifically, writexl, the tidyverse software for writing Excel files, is a very new package and therefore has significantly fewer features than xlsx, openxlsx, or XLConnect, which have been available for at least 5 years.