the column SALE.DATE is loaded as factor with wrong figures

22 views
Skip to first unread message

wud...@gmail.com

unread,
Aug 13, 2015, 8:54:51 PM8/13/15
to R-package-xlsx
Hi!
 I used read.xlsx2 to load a "*.xls" file into R. The result is shown below:

'data.frame': 27395 obs. of  21 variables:
 $ BOROUGH                       : Factor w/ 1 level "1": 1 1 1 1 1 1 1 1 1 1 ...
 $ NEIGHBORHOOD                  : Factor w/ 40 levels "                         ",..: 1 1 1 1 1 1 1 1 1 1 ...
 $ BUILDING.CLASS.CATEGORY       : Factor w/ 33 levels "                                            ",..: 12 12 12 12 12 12 12 12 12 12 ...
 $ TAX.CLASS.AT.PRESENT          : Factor w/ 9 levels "  ","1","1A",..: 1 1 1 1 1 1 1 1 1 1 ...
 $ BLOCK                         : Factor w/ 1421 levels "100","1000","1003",..: 1230 1230 1230 1230 1230 1230 1230 1230 1230 1230 ...
 $ LOT                           : Factor w/ 1791 levels "1","10","100",..: 341 342 343 344 346 347 348 350 353 354 ...
 $ EASE.MENT                     : Factor w/ 1 level " ": 1 1 1 1 1 1 1 1 1 1 ...
 $ BUILDING.CLASS.AT.PRESENT     : Factor w/ 121 levels "  ","A0","A1",..: 1 1 1 1 1 1 1 1 1 1 ...
 $ ADDRESS                       : Factor w/ 13860 levels "1 5TH AVENUE, 15E                        ",..: 8818 8818 8818 8818 8818 8818 8818 8818 8818 8818 ...
 $ APART.MENT.NUMBER             : Factor w/ 2974 levels "-           ",..: 2 2 2 2 2 2 2 2 2 2 ...
 $ ZIP.CODE                      : Factor w/ 48 levels "0","10001","10002",..: 14 14 14 14 14 14 14 14 14 14 ...
 $ RESIDENTIAL.UNITS             : Factor w/ 121 levels "0","1","10","102",..: 1 1 1 1 1 1 1 1 1 1 ...
 $ COMMERCIAL.UNITS              : Factor w/ 52 levels "0","1","10","11",..: 1 1 1 1 1 1 1 1 1 1 ...
 $ TOTAL.UNITS                   : Factor w/ 132 levels "0","1","10","105",..: 1 1 1 1 1 1 1 1 1 1 ...
 $ LAND.SQUARE.FEET              : Factor w/ 1284 levels "0","1000","10000",..: 1 1 1 1 1 1 1 1 1 1 ...
 $ GROSS.SQUARE.FEET             : Factor w/ 1883 levels "0","1000","10000",..: 1 1 1 1 1 1 1 1 1 1 ...
 $ YEAR.BUILT                    : Factor w/ 135 levels "0","1000","1800",..: 1 1 1 1 1 1 1 1 1 1 ...
 $ TAX.CLASS.AT.TIME.OF.SALE     : Factor w/ 3 levels "1","2","4": 2 2 2 2 2 2 2 2 2 2 ...
 $ BUILDING.CLASS.AT.TIME.OF.SALE: Factor w/ 120 levels "A0","A1","A4",..: 89 89 89 89 89 89 89 89 89 89 ...
 $ SALE.PRICE                    : Factor w/ 5293 levels "0","1","10","100",..: 1579 1011 133 657 1002 1068 1434 1579 1770 2035 ...
 $ SALE.DATE                     : Factor w/ 353 levels "41122","41123",..: 277 275 280 274 272 278 320 290 287 281 ...

All columns are loaded correctly except the last one "SALE.DATE". In my xls file, they look like "2013-05-20", "2013-05-16". In R they are transformed into "41414", "41410". Can you help me to find the reason?

I am running RStudio v.0.99.467, Windows 7 Enterprise. The file is attached.

Many thanks for helping!

Dan

Bok1.xlsx

Adrian Dragulescu

unread,
Aug 13, 2015, 9:05:27 PM8/13/15
to R-package-xlsx, wud...@gmail.com
See the argument colClasses in read.xlsx2 and readColumns.  Need to use col.classes=c("numeric", "Date", "character", etc.) to match your data type.   See also the example.

Adrian
Reply all
Reply to author
Forward
0 new messages