Trouble extracting UTF-8 non-English characters from Excel using Apache POI

Daniel Meredith

May 20, 2003, 3:21:31 PM
Just as the subject says: I am having an issue extracting non-English
characters from Excel using Apache POI. Does anyone have experience
using POI with other languages? Specifically, I need to get the text
from cells, convert it to UTF-8, and then store it as such.
Currently the non-English text is being returned as question marks.

Here is the code that currently extracts the text from a cell:

private String getString(HSSFCell cell) {
    String str = null;
    if(cell.getCellType() == HSSFCell.CELL_TYPE_STRING)
        try{
            str = new String(cell.getStringCellValue().getBytes(), "UTF-8");
        }catch(Exception e) {
            if(Debug.isOn) Debug.out("Encoding Error: " + e.toString());
        }
    else if(cell.getCellType() == HSSFCell.CELL_TYPE_NUMERIC)
        str = new Double(cell.getNumericCellValue()).toString();
    else
        str = new String();

    return str.trim();
}

Any ideas?
~dnm

Jon Skeet

May 21, 2003, 3:32:16 AM
Daniel Meredith <daniel_...@yahoo.com> wrote:
> Just as the subject says: I am having an issue extracting non-English
> characters from Excel using Apache POI. Does anyone have experience
> using POI with other languages? Specifically, I need to get the text
> from cells, convert it to UTF-8, and then store it as such.
> Currently the non-English text is being returned as question marks.
>
> Here is the code that currently extracts the text from a cell:

<snip>

> str = new String(cell.getStringCellValue().getBytes(), "UTF-8");

That looks pretty dodgy to me - it's fetching something as a string
already, encoding it in whatever the default platform encoding is, and
then assuming that that encoded form is the appropriate UTF-8 form. At
least, that's assuming getStringCellValue returns a string, which seems
likely.

Don't forget that strings in Java are already Unicode - they don't
*have* an encoding, as such.
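
To make that concrete, here is a minimal sketch of the round trip,
written against a current JDK rather than the 1.4-era one discussed in
this thread. It assumes getStringCellValue() hands back an ordinary Java
String. The class name EncodingRoundTrip and both helper methods are
made up for illustration, and US-ASCII stands in for "whatever the
default platform encoding is" so that the result is repeatable.

```java
import java.nio.charset.StandardCharsets;

class EncodingRoundTrip {

    // Mimics the pattern from the question: encode with a charset that
    // cannot represent the text, then decode the bytes as if they were UTF-8.
    // US_ASCII is an assumption standing in for the platform default.
    static String brokenConversion(String text) {
        byte[] lossy = text.getBytes(StandardCharsets.US_ASCII); // unmappable chars become '?'
        return new String(lossy, StandardCharsets.UTF_8);
    }

    // Leave the String alone and encode to UTF-8 only where bytes are needed,
    // for example just before writing to a file or a database column.
    static byte[] toUtf8(String text) {
        return text.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Stand-in for cell.getStringCellValue(); escapes keep the source ASCII-only.
        String cellText = "Gr\u00fc\u00dfe \u65e5\u672c\u8a9e";

        System.out.println(brokenConversion(cellText)); // prints "Gr??e ???"

        byte[] utf8 = toUtf8(cellText);
        String restored = new String(utf8, StandardCharsets.UTF_8);
        System.out.println(restored.equals(cellText));  // prints "true"
    }
}
```

If the platform default encoding cannot represent the characters in the
cell, the no-argument getBytes() call in the original code would behave
like brokenConversion here, which is consistent with the question marks
being reported.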

(Two other quick points - there's rarely a need to use new String() -
just use "" instead, unless you really want a different reference.
Similarly, use String.valueOf(double) instead of creating a new Double
and then calling toString.)

--
Jon Skeet - <sk...@pobox.com>
http://www.pobox.com/~skeet/
If replying to the group, please do not mail me too

Tomy

May 21, 2003, 5:29:56 AM
to comp.lang.java.programmer
Jon Skeet <sk...@pobox.com> wrote:
> (Two other quick points - there's rarely a need to use new String() -
> just use "" instead, unless you really want a different reference.
> Similarly, use String.valueOf(double) instead of creating a new Double
> and then calling toString.)

Actually, from what I can see in the source that comes with j2sdk 1.4,
the best way is Double.toString(double).

---
Tomy.
-----------------------
t.pet...@inet.hr


Jon Skeet

May 21, 2003, 5:58:55 AM
Tomy <t.pet...@inet.hr> wrote:
> Jon Skeet <sk...@pobox.com> wrote:
> > (Two other quick points - there's rarely a need to use new String() -
> > just use "" instead, unless you really want a different reference.
> > Similarly, use String.valueOf(double) instead of creating a new Double
> > and then calling toString.)
>
> Actually, from what I can see in the source that comes with j2sdk 1.4,
> the best way is Double.toString(double).

It depends on the exact class library - sometimes Double.toString(double)
will be faster than String.valueOf(double), other times it'll be the
other way round. The difference will be minimal. However, the
good thing about using String.valueOf is that it can be used for *all*
types - if you change the type of the variable, you don't need to
change that line of code.
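
A small sketch of that last point, using only the standard library. The
class name ValueOfDemo and its helper methods are made up for
illustration; the point is only that the call looks the same for every
argument type.

```java
class ValueOfDemo {

    // The same call shape works whatever the argument type is, so changing
    // the parameter type would not require touching the method body.
    static String fromDouble(double value) {
        return String.valueOf(value);
    }

    static String fromInt(int value) {
        return String.valueOf(value);
    }

    static String fromBoolean(boolean value) {
        return String.valueOf(value);
    }

    public static void main(String[] args) {
        double numeric = 3.0; // stand-in for cell.getNumericCellValue()

        System.out.println(fromDouble(numeric));      // prints "3.0"
        System.out.println(Double.toString(numeric)); // prints "3.0"
        System.out.println(fromInt(42));              // prints "42"
        System.out.println(fromBoolean(true));        // prints "true"
    }
}
```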

Tomy

May 21, 2003, 6:24:01 AM
to comp.lang.java.programmer
Jon Skeet <sk...@pobox.com> wrote:
> It depends on the exact class library - sometimes Double.toString(double)
> will be faster than String.valueOf(double), other times it'll be the
> other way round. The difference will be minimal.
Yes, I agree.
Since they do the same thing, one will probably call the other.
In the Sun source, String.valueOf() calls Double.toString().

> However, the
> good thing about using String.valueOf is that it can be used for *all*
> types - if you change the type of the variable, you don't need to
> change that line of code.

Agreed 100% on this one :)

--
Tomy.
-----------------------
t.pet...@inet.hr


jabbara...@gmail.com

Nov 23, 2014, 10:41:21 PM
Hi Guys,

I am stuck reading non-English characters (Chinese, Japanese) from an Excel sheet using the POI API. If you have a snippet or working code, please share it.

Appreciate your help.

Thanks

Regards
Jabbar

muthukumar...@gmail.com

Dec 3, 2014, 3:50:45 PM
hi jabbar,

May I know how you did it for German characters? I am facing the same issue even with German characters.