The current discussion has raised the question of how many ways there are to count language coverage for a bit of software.
I can think of:
1) Percentage of the Internet population covered in native language (this is what my spreadsheet attempts to do)
2) Percentage of the world population covered in native language (this doesn't take into account internet penetration, but you could say "hey, they have a browser waiting for them when they get online")
3) Percentage of the world/internet population covered by any language they speak (Not sure where you'd find the right figures to do this)
4) Number of languages covered (this is what counting packs does, although it runs into trouble with the definition of "language", and the fact that you give a language with 10,000 speakers the same weight as one with 100,000,000)
5) Percentage of countries covered, by official language (This might be a good proxy for method 3, because you would hope that everyone in a country speaks at least one of the official languages)
> The current discussion has raised the question of how many ways there > are there to count language coverage for a bit of software.
> I can think of:
> 1) Percentage of the Internet population covered in native language > (this is what my spreadsheet attempts to do)
> 2) Percentage of the world population covered in native language > (this doesn't take into account internet penetration, but you could > say "hey, that have a browser waiting for them when they get online")
> 3) Percentage of the world/internet population covered by any language > they speak > (Not sure where you'd find the right figures to do this)
Where do you get reliable figures for the internet and/or world population? What's the use of such figures? What use is it to know that the Sorbian languages cover 0.01% of the world population, if I assume 6 billion people? What's the advantage of knowing that? IMHO those figures will be meaningless; they will be too vague.
> 4) Number of languages covered > (this is what counting packs does, although it runs into trouble with > the definition of "language", and the fact that you give a language > with 10,000 speakers the same weight as one with 100,000,000)
> 5) Percentage of countries covered, by official language > (This might be a good proxy for method 3, because you would hope that > everyone in a country speaks at least one of the official languages)
You should combine the number of languages with the number of locales, to include varieties of a language.
Why a percentage? A percentage has statistical significance only, without real use. Better is the number of covered and uncovered countries, combined with locales, because in a lot of countries more than one language is spoken.
> What's the use of such figures? What does it use if you know that the > Sorbian languages cover 0,01% of the world population if I assume 6 > milliards people? What's advantage if you know that? IMHO those figures > will be meaningless, they will be too vague.
It is one way to get a rough estimate of which populations are underserved and represent possible opportunities, to help prioritize future community-building efforts. (The Mozilla Manifesto says Mozilla works for public benefit; the public includes all populations of the world.)
> Why percentage? A percentage has statistical significance only, without > real use.
Firefox gets free publicity in the news when it gains market share percentage points. One way these figures can be used is to find opportunities where it may gain most market share.
> Better is the number of covered countries and not covered
> countries combined with locales because in a lot of countries more than > 1 language are spoken.
Better for what purpose?
Maybe a marketing-oriented person likes comparing feature checklists. Locale checklists without populations seem simpler in that case. But they don't provide much guidance on how to focus resources to fill the empty check-boxes.
One possible fear: The Mozilla community has limited human resources to reach out and help support new localization communities, so for the purpose of allocating time to candidate communities, one consideration is population. This doesn't mean that the long tail of smaller communities is to be excluded, but smaller communities may get less attention.
(One way to allocate is by population: if, say, five 1% locales are waiting for reviews and two 2.5% locales don't have localizations yet, then maybe roughly half the person-hours could be spent on the reviews and half on helping a localization community get started in the underserved locales.)
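As a rough sketch of the allocation described above, assuming person-hours are split in proportion to the combined population share of each group of locales. The function name and the 40-hour budget are illustrative and not from the thread.

```python
def allocate_hours(total_hours, groups):
    """Split total_hours across groups in proportion to population share.

    groups maps a label to a list of per-locale population shares (in %).
    """
    group_share = {name: sum(shares) for name, shares in groups.items()}
    total_share = sum(group_share.values())
    return {name: total_hours * share / total_share
            for name, share in group_share.items()}


groups = {
    "reviews": [1.0] * 5,      # five 1% locales waiting for reviews
    "new_locales": [2.5] * 2,  # two 2.5% locales with no localization yet
}
hours = allocate_hours(40, groups)  # 40 hours is an assumed budget
print(hours)  # {'reviews': 20.0, 'new_locales': 20.0}
```

Both groups add up to 5% of the population, which is why the split comes out even.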
Gervase Markham wrote: > The current discussion has raised the question of how many ways there > are there to count language coverage for a bit of software.
> I can think of:
> 1) Percentage of the Internet population covered in native language > (this is what my spreadsheet attempts to do)
> 2) Percentage of the world population covered in native language > (this doesn't take into account internet penetration, but you could > say "hey, that have a browser waiting for them when they get online")
> 3) Percentage of the world/internet population covered by any language > they speak > (Not sure where you'd find the right figures to do this)
> 4) Number of languages covered > (this is what counting packs does, although it runs into trouble with > the definition of "language", and the fact that you give a language > with 10,000 speakers the same weight as one with 100,000,000)
> 5) Percentage of countries covered, by official language > (This might be a good proxy for method 3, because you would hope that > everyone in a country speaks at least one of the official languages)
> Can anyone think of any more?
I don't really think that getting more metrics is worthwhile. For status reports on how our localization coverage is growing, the metrics should yield more or less similar answers.
If you have more specific questions, it's probably a good idea to pick a well-suited metric to answer that question, but that will likely be a different metric for each.
Expect that the data we have might not help in getting answers to those questions, independent of the metric we use. The data we have might be just too vague, or changing.
Gervase Markham wrote: > 1) Percentage of the Internet population covered in native language
> 2) Percentage of the world population covered in native language
Both of those usually assume people have only one native language, which applies to the majority of people but the rest might be significant when looking at 2-10% of the (internet) population not being covered and trying to find out about differences there.
Of course, counting gets hard when you want or need to respect people with multiple native languages and still respect what their first choice is between them.
Robert Kaiser wrote: > Both of those usually assume people have only one native language, which > applies to the majority of people but the rest might be significant when > looking at 2-10% of the (internet) population not being covered and > trying to find out about differences there.
> Of course, counting gets hard when you want or need to respect people > with multiple native languages and still respect what their first choice > is between them.
So you agree that the people you say have multiple native languages still have a first choice? Then what's the problem? :-)
The last US census (I think) replaced the question about native language with one about "language spoken at home". Which seems like a better question, to which (for the vast majority of people) there is just one answer. Can we in future assume that this is what we mean when we say "native language"? :-)
Axel Hecht wrote: > I don't really think that getting more metrics is worthwhile. For status > reports on how our localization coverage is growing, the metrics should > yield more or less similar answers.
I think that's clearly not the case.
If we added support for Anfillo, Ainu, Amurdag, Alsatian, Abenaki and Amanaye, then by the "number of language packs" metric, it would seem that our coverage has grown by 10%. But by the "% population covered" metric, it would hardly have grown at all. (Source: http://en.wikipedia.org/wiki/List_of_endangered_languages)
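To make the divergence concrete, here is a small sketch. All inputs are assumptions chosen for illustration: 60 existing packs (so that six new ones amount to 10%), 95% of the population already covered, and a tiny hypothetical share for each new language. None of these figures come from the spreadsheet.

```python
existing_packs = 60      # assumed, so that six new packs equal +10%
new_packs = 6            # Anfillo, Ainu, Amurdag, Alsatian, Abenaki, Amanaye

covered_share = 95.0     # assumed % of the population already covered
new_share_each = 0.0001  # assumed % of the population per new language

# Growth as seen by the "number of language packs" metric.
pack_growth = 100.0 * new_packs / existing_packs
# Growth as seen by the "% population covered" metric.
population_growth = 100.0 * (new_packs * new_share_each) / covered_share

print(f"language packs: +{pack_growth:.1f}%")
print(f"population covered: +{population_growth:.4f}%")
```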
> If you have more specific questions, it's probably a good idea to pick a > well-suited metric to answer that question, but that will likely be a > different metric for each.
Right. So that would be an argument for having good figures for several different metrics.
> Expect that the data we have might not help in getting answers to those > questions, independent of the metric we use. The data we have might be > just to vague, or changing.
That's possible, but requires proof. If you have issues with the data I'm using, please raise them :-)
Gervase Markham wrote: > Axel Hecht wrote: >> I don't really think that getting more metrics is worthwhile. For status >> reports on how our localization coverage is growing, the metrics should >> yield more or less similar answers.
> I think that's clearly not the case.
> If we added support for Anfillo, Ainu, Amurdag, Alsatian, Abenaki and > Amanaye, then by the "number of language packs" metric, it would seem > that our coverage has grown by 10%. But by the "% population covered" > metric, it would hardly have grown at all. > (Source: http://en.wikipedia.org/wiki/List_of_endangered_languages)
So "less similar" in this case. Btw, "Population covered" is not going to change significantly anymore, "population not covered" might, though.
>> If you have more specific questions, it's probably a good idea to pick a >> well-suited metric to answer that question, but that will likely be a >> different metric for each.
> Right. So that would be an argument for having good figures for several > different metrics.
>> Expect that the data we have might not help in getting answers to those >> questions, independent of the metric we use. The data we have might be >> just to vague, or changing.
> That's possible, but requires proof. If you have issues with the data > I'm using, please raise them :-)
Do you have errors for your data? Time they were measured? Trends? And errors in trends? If we had all that, take your metric, and do some happy propagation of uncertainty, http://en.wikipedia.org/wiki/Propagation_of_uncertainty.
In general, I'd expect the errors to be larger for smaller languages than for big ones, bigger for poorer countries than for thoroughly industrialized (or post-industrialized) ones.
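For a coverage figure that is a plain sum of per-language shares, the propagation of uncertainty mentioned above reduces to adding the individual errors in quadrature, provided the errors are independent. That independence is an assumption, and the shares and error values below are made up for illustration.

```python
import math


def coverage_with_error(languages):
    """languages: list of (share_percent, absolute_error_percent) pairs.

    Returns the summed share and its uncertainty, assuming independent errors.
    """
    total = sum(share for share, _ in languages)
    error = math.sqrt(sum(err ** 2 for _, err in languages))
    return total, error


sample = [(30.0, 3.0), (20.0, 4.0)]  # hypothetical shares and errors
total, error = coverage_with_error(sample)
print(f"{total:.1f}% +/- {error:.1f}%")  # 50.0% +/- 5.0%
```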
I think it would be useful to look for the populations (either Internet-connected or not) for which no version of Firefox exists that caters to their needs in terms of language.
If we take for example the Thai language: there is no native language build for F3 at mozilla.com. There is http://www.firefoxthai.com/ for F2 (which uses the Firefox logos) but no indication for Firefox 3. What I would think is useful are statistics that show the percentage of the population of Thailand that speaks only Thai, and cannot find a suitable Firefox 3.
For the purposes of L10n, it would be good to see which populations (sorted by size) are affected by the lack of a version of F3 in any language they speak/read.
On Sat, Jun 14, 2008 at 9:33 AM, Gervase Markham <g...@mozilla.org> wrote: > The current discussion has raised the question of how many ways there > are there to count language coverage for a bit of software.
> I can think of:
> 1) Percentage of the Internet population covered in native language > (this is what my spreadsheet attempts to do)
> 2) Percentage of the world population covered in native language > (this doesn't take into account internet penetration, but you could > say "hey, that have a browser waiting for them when they get online")
> 3) Percentage of the world/internet population covered by any language > they speak > (Not sure where you'd find the right figures to do this)
> 4) Number of languages covered > (this is what counting packs does, although it runs into trouble with > the definition of "language", and the fact that you give a language > with 10,000 speakers the same weight as one with 100,000,000)
> 5) Percentage of countries covered, by official language > (This might be a good proxy for method 3, because you would hope that > everyone in a country speaks at least one of the official languages)
Simos Xenitellis wrote: > For the purposes of L10n, it would be good to see which populations > (sorted by size) are affected by the lack of a version of F3 that is > not available in any language they speak/read.
This is my method (3). The problem is that I don't know of a source of data which can provide the necessary figures.
For a given country, you would need to know data something like this:
UK: All Languages Spoken:
English only: 47.3%
English and Bengali: 4.3%
Bengali only: 0.14%
English and Polish: 1.6%
Polish, Latvian and Russian: 0.006%
...
Then, you could look at the languages and go through ticking off the groups for which you had at least one hit.
It would be a very long and very detailed list. I don't know if such data is available even for countries with very good censuses.
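If such data existed, the "ticking off" step could be sketched roughly as follows. The figures are the illustrative ones from the example above, not real census data, and the function name is hypothetical.

```python
def covered_share(groups, supported):
    """Sum the shares of all groups that speak at least one supported language."""
    return sum(share for langs, share in groups.items() if langs & supported)


# Illustrative figures (percent of UK population); the "..." rows are
# omitted, so the shares do not add up to 100%.
uk_groups = {
    frozenset({"English"}): 47.3,
    frozenset({"English", "Bengali"}): 4.3,
    frozenset({"Bengali"}): 0.14,
    frozenset({"English", "Polish"}): 1.6,
    frozenset({"Polish", "Latvian", "Russian"}): 0.006,
}

print(covered_share(uk_groups, {"English"}))
print(covered_share(uk_groups, {"English", "Polish"}))
```

Adding Polish to the supported set only picks up the one group that speaks no English, which shows why the per-combination breakdown matters.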
Axel Hecht wrote: > Gervase Markham wrote: >> Axel Hecht wrote: >>> I don't really think that getting more metrics is worthwhile. For status >>> reports on how our localization coverage is growing, the metrics should >>> yield more or less similar answers.
>> I think that's clearly not the case.
>> If we added support for Anfillo, Ainu, Amurdag, Alsatian, Abenaki and >> Amanaye, then by the "number of language packs" metric, it would seem >> that our coverage has grown by 10%. But by the "% population covered" >> metric, it would hardly have grown at all. >> (Source: http://en.wikipedia.org/wiki/List_of_endangered_languages)
> So "less similar" in this case.
:-) "More or less similar" is an idiom; it doesn't mean "either more similar or less similar", it means "quite similar".
> Btw, "Population covered" is not going > to change significantly anymore, "population not covered" might, though.
I don't understand what you mean by that. If "population covered" + "population not covered" = 100%, how can "population covered" not change if "population not covered" changes?
> Do you have errors for your data? Time they were measured? Trends? And > errors in trends? If we had all that, take your metric, and do some > happy propagation of uncertainty, > http://en.wikipedia.org/wiki/Propagation_of_uncertainty.
> In general, I'd expect the errors to be larger for smaller languages > than for big ones,
But such errors have less effect on the overall result, because the absolute numbers are smaller. If I say the population whose native language is Alsatian is 10,000, when in fact it's 20,000, that's not going to produce a noticeable error when the internet population is about 1 billion.
> bigger for poorer countries than for thoroughly > industrialized (or post-industrialized) ones.
You are probably right there.
But the question is: if your data is not perfect, do you just give up, or do you work with the data you have? When I said "If you have issues", I didn't mean "List all of the statistical problems it might have", I meant "provide better data if you have some, otherwise let's go with what we've got".
On Mon, Jun 16, 2008 at 11:43 AM, Gervase Markham <g...@mozilla.org> wrote: > Simos Xenitellis wrote: >> For the purposes of L10n, it would be good to see which populations >> (sorted by size) are affected by the lack of a version of F3 that is >> not available in any language they speak/read.
> This is my method (3). The problem is that I don't know of a source of > data which can provide the necessary figures.
> For a given country, you would need to know data something like this:
> UK: All Languages Spoken:
> English only: 47.3% > English and Bengali: 4.3% > Bengali only: 0.14% > English and Polish: 1.6% > Polish, Latvian and Russian: 0.006% > ...
> Then, you could look at the languages and go through ticking off the > groups for which you had at least one hit.
> It would be a very long and very detailed list. I don't know if such > data is available even for countries with very good censuses.
Gervase Markham wrote: > Axel Hecht wrote: >> Gervase Markham wrote: >>> Axel Hecht wrote: >>>> I don't really think that getting more metrics is worthwhile. For status >>>> reports on how our localization coverage is growing, the metrics should >>>> yield more or less similar answers. >>> I think that's clearly not the case.
>>> If we added support for Anfillo, Ainu, Amurdag, Alsatian, Abenaki and >>> Amanaye, then by the "number of language packs" metric, it would seem >>> that our coverage has grown by 10%. But by the "% population covered" >>> metric, it would hardly have grown at all. >>> (Source: http://en.wikipedia.org/wiki/List_of_endangered_languages) >> So "less similar" in this case.
> :-) "More or less similar" is an idiom; it doesn't mean "either more > similar or less similar", it means "quite similar".
To me, it means equivalent norms: when one grows, the other grows, and vice versa.
>> Btw, "Population covered" is not going >> to change significantly anymore, "population not covered" might, though.
> I don't understand what you mean by that. If "population covered" + > "population not covered" = 100%, how can "population covered" not change > if "population not covered" changes?
If you count in %, both have to change.
On the other hand, say we take 90% for one and 10% for the other: an additional language with 2% changes one by a mere 2.2%, while it changes the other by a whopping 20%.
Thus, relatively, covered population is hardly going to change by any language we get, uncovered population on the other hand might.
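The arithmetic behind those two figures, as a minimal sketch (the variable names are illustrative):

```python
def relative_change(before, delta):
    """Change of `delta` expressed as a percentage of the starting value."""
    return 100.0 * delta / before


covered, uncovered, new_language = 90.0, 10.0, 2.0

covered_change = relative_change(covered, new_language)
uncovered_change = relative_change(uncovered, new_language)

print(f"covered grows by {covered_change:.1f}%")       # 2.2%
print(f"uncovered shrinks by {uncovered_change:.1f}%")  # 20.0%
```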
>> Do you have errors for your data? Time they were measured? Trends? And >> errors in trends? If we had all that, take your metric, and do some >> happy propagation of uncertainty, >> http://en.wikipedia.org/wiki/Propagation_of_uncertainty.
>> In general, I'd expect the errors to be larger for smaller languages >> than for big ones,
> But such errors have less effect on the overall result, because the > absolute numbers are smaller. If I say the population whose native > language is Alsatian is 10,000, when in fact it's 20,000, that's not > going to produce a noticeable error when the internet population is > about 1 billion.
It does when you start comparing it to other small languages. Which you did, to some extent, in your blog post.
>> bigger for poorer countries than for thoroughly >> industrialized (or post-industrialized) ones.
> You are probably right there.
> But the question is: if your data is not perfect, do you just give up, > or do you work with the data you have? When I said "If you have issues", > I didn't mean "List all of the statistical problems it might have", I > meant "provide better data if you have some, otherwise let's go with > what we've got".
Neither. You have to ask appropriate questions for your data to answer, and if it's really noisy or uncertain data, you have to ask questions that work well with fuzzy answers.
For example "Which language should Microsoft do next?" is likely a bogus question, given that your answer was Balochi. That is probably affected by each and every disclaimer you gave on the assumptions you made, and likely has uncertain initial data, too. I'm running on the assumption that the next runner-up wasn't at just 0.10%.
Gervase Markham wrote: > So you agree that the people you say have multiple native languages > still have a first choice? Then what's the problem? :-)
The problem is that you can reach them equally well in all of those languages, but they still prefer one. So if you ask "how many people do we reach?" you only have to serve them any of those languages (e.g. Sorbs, who all grow up bilingually). That means, e.g., that you can wipe out minority languages whose native speakers are all bilingual from your statistics (like Sorbian, or I guess also Gaelic), and you can wipe out language variants, as Canadians, British, Irish, and South African people probably can all be reached with en-US. If you want to serve them in their preferred language, then you need to look into variants as well as minority languages whose speakers are all bilingual, and the picture gets both more difficult and more interesting.
So, the big problem is that you can't say "Microsoft doesn't reach Sorbs" just because they don't offer Sorbian, as they reach them pretty well with German (probably as well as one reaches Brits with en-US, actually). What you can say is that we serve Sorbs better by offering that language than by German, but we actually reach them with both.
> The last US census (I think) replaced the question about native language > with one about "language spoken at home". Which seems like a better > question, to which (for the vast majority of people) there is just one > answer. Can we in future assume that this is what we mean when we say > "native language"? :-)
So, you mean French _and_ German _and_ their Austrian dialect for a friend of mine, as she speaks all three of them at home? ;-)
Robert Kaiser wrote: > Gervase Markham wrote: >> So you agree that the people you say have multiple native languages >> still have a first choice? Then what's the problem? :-)
> The problem is that you can reach them equally well in all of those > languages, but they still prefer one. So if you ask "how many people do > we reach?" you only have to serve them any of those languages (e.g. > Sorbs, who all grow up bilingually). That e.g. means you can wipe out > minority languages whose native speakers are all bilingual from your > statistics (like Sorbian, or I guess also Gaelic),
I think Axel and Pascal would eat me alive if I tried that.
> and you can wipe out > language variants, as Canadians, British, Irish, and South African > people probably can all be reached with en-US.
I did make that optimisation.
> If you want to serve them in their preferred language, then you need to > look into variants as well as minority languages whose speakers are all > bilingual, and the picture gets both more difficult but also more > interesting.
I think that if you talk about preferred language, the picture gets less difficult. "What language do you speak at home?" is a question that anyone can answer, and many countries have stats for. "What are all the languages you speak?" is not a question I've found data for, for any country.
> So, the big problem is that you can't say "Microsoft doesn't reach > Sorbs" just because they don't offer Sorbian, as they reach them pretty > well with German (probably as well as one reaches Brits with en-US, > actually).
Quite so. Which is why I don't say that Microsoft doesn't reach Sorbs. :-)
> What you can say is that we serve Sorbs better by offering > that language than by German, but we actually reach them with both.
That is true. We are again back to my point that I am putting 90% and 100% solutions in the same basket and contrasting them with the 0% solution.
> So, you mean French _and_ German _and_ their Austrian dialect for a > friend of mine, as she speaks all three of them at home? ;-)
Simos Xenitellis wrote: > For example, for Thailand, the page is > http://www.ethnologue.com/show_country.asp?name=TH > which says that the vast majority of the population (>93%) speaks Thai > or dialects of Thai.
Right. But it doesn't tell us what you wanted to know, which was "statistics that show the percentage of the population of Thailand that speaks *only* Thai".
> So, you mean French _and_ German _and_ their Austrian dialect for a > friend of mine, as she speaks all three of them at home? ;-)
Yes, and I know a German who speaks Upper Sorbian, Lower Sorbian, Esperanto and Lithuanian. He is a professor of Sorabistics in Leipzig. AFAIK he speaks Lithuanian with his wife at home, though she is an Esperantist as well. :-)
Gervase Markham wrote: > Axel Hecht wrote: >> For example "Which language should Microsoft do next?" is likely a bogus >> question, given that you answer was Balochi.
> That is, I think, because I worked that out by eye and I can't count. > The right answer is Belarusian.
Huh? Anyway, looking at the snapshot, MS neither has bal nor be, which you attribute 0.511 and 0.452 % to, resp., which is a 13% difference. 10-ish % seems to be *very* low as an error bar, so I don't see how your data should make a call on whether it'd be bal or be.
Thus, "Which locale should Microsoft do next" is an ill-posed question for the data you have.
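One crude way to check this, assuming both figures carry the same relative error bar and treating "distinguishable" as "the error intervals do not overlap". Both the criterion and the function name are assumptions made for illustration.

```python
def distinguishable(a, b, relative_error):
    """True if the intervals a*(1 +/- e) and b*(1 +/- e) do not overlap."""
    high, low = max(a, b), min(a, b)
    return high * (1 - relative_error) > low * (1 + relative_error)


bal, be = 0.511, 0.452  # shares (in %) from the spreadsheet snapshot

print(f"relative difference: {100 * (bal / be - 1):.0f}%")  # 13%
print(distinguishable(bal, be, 0.05))  # True
print(distinguishable(bal, be, 0.10))  # False
```

With an error bar of 10% or more on each figure, the two intervals overlap and the ranking cannot be decided from the data.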
Axel Hecht wrote: > Huh? Anyway, looking at the snapshot, MS neither has bal nor be, which > you attribute 0.511 and 0.452 % to, resp.,
Try the new one. There was a bug: bal is now 0.047%, and be is 0.450% - a factor of 10 difference.
Next after Belarusian is Oriya: 5,478,000 vs. 2,085,318, a gap of more than half. With the data I have, Belarusian is the clear answer. Of course, if you want to improve the data, that could possibly change. :-)
On Mon, Jun 16, 2008 at 5:44 PM, Gervase Markham <g...@mozilla.org> wrote: > Simos Xenitellis wrote: >> For example, for Thailand, the page is >> http://www.ethnologue.com/show_country.asp?name=TH >> which says that the vast majority of the population (>93%) speaks Thai >> or dialects of Thai.
> Right. But it doesn't tell us what you wanted to know, which was > "statistics that show the percentage of the population of Thailand that > speaks *only* Thai".
Is the issue that we do not know how many people in Thailand speak English, apart from speaking Thai? Or is the issue how many people in Thailand speak Thai, but not any other minority languages?
In both questions above, my view is that if we are to go in those directions, we miss the point. The point is that in the example of Thailand, over 93% of the people speak Thai (and the others probably speak Thai + some other minority language). Considering that Thailand has a population of 65 million people, we have a chunk of over 60 million people with no Thai Firefox 3.
Simos Xenitellis wrote: > Is the issue that we do not know how many people in Thailand speak > English, apart from speaking Thai? Or, is the issue, how many people > in Thailand speak Thai, but no any other minority languages?
> In both questions above, my view is that if we are to go into those > directions, we miss the point. The point is that in the example of > Thailand, over 93% of the people speak Thai (and other speak probably > Thai + some other minority language). Considering that Thailand has a > population of 65 million people, we have a chunk of over 60 million > people with no Thai Firefox 3.
You are moving backwards and forwards between two positions :-)
To recap: there are two ways that it's possible to count. From the point of view of the user:
1) "Firefox is supported in at least one of the languages I speak."
This is the data you originally asked about. This would produce higher percentages because e.g. at the moment, we have to say that we don't serve any Thai people because we don't have an official localization in Thai, but if we used this method, we could say we served all the Thais who also speak English. But my point is that there is no data available that gives us a list of all the languages that each person in Thailand speaks. So we can't calculate this figure.
2) "Firefox is supported in my native (spoken-at-home) language."
It seems to me that this is what you have switched to talking about in your message above. This is what my spreadsheet shows us. And it does indeed tell us that there are 8.57 million people (net population, not total population) in Thailand with no Firefox 3 in their native Thai language.
Gervase Markham wrote: > Robert Kaiser wrote: >> Gervase Markham wrote: >>> So you agree that the people you say have multiple native languages >>> still have a first choice? Then what's the problem? :-) >> The problem is that you can reach them equally well in all of those >> languages, but they still prefer one. So if you ask "how many people do >> we reach?" you only have to serve them any of those languages (e.g. >> Sorbs, who all grow up bilingually). That e.g. means you can wipe out >> minority languages whose native speakers are all bilingual from your >> statistics (like Sorbian, or I guess also Gaelic),
> I think Axel and Pascal would eat me alive if I tried that.
>> and you can wipe out >> language variants, as Canadians, British, Irish, and South African >> people probably can all be reached with en-US.
> I did make that optimisation.
You just said that you are measuring the same thing differently for those, so your measurement is inconsistent. :p
> I think that if you talk about preferred language, the picture gets less > difficult. "What language do you speak at home?" is a question that > anyone can answer, and many countries have stats for. "What are all the > languages you speak?" is not a question I've found data for, for any > country.
What about ethnologue.com?
At least http://www.ethnologue.com/show_country.asp?name=AT clearly shows me that most of the 8,174,762 people in Austria natively speak both Bavarian (6,983,298) and Standard German (7,500,000) - of course it doesn't tell what part natively speaks only one of those languages that are listed on the page, or which two (or three) of them. (Note that in the case of those two languages, both are variants of German; we cover all those people with our Standard German L10n currently, as all speakers of Bavarian, Alemannisch and Walser at least learn Standard German as a child.)
And the statistics you mean are about "What language do you prefer to speak at home?", as the question "What language do you speak at home?" would come out with multiple languages for many people and lead to the more difficult picture.
Robert Kaiser wrote: > You just said that you are measuring the same thing differently for > those, so your measurement is inconsistent. :p
I don't think so. Saying that an en-US speaker and an en-GB speaker speak the same language is not the same thing as saying that someone whose first language is Hindi and whose second language is English can be served with an English build (for example).
>> I think that if you talk about preferred language, the picture gets less >> difficult. "What language do you speak at home?" is a question that >> anyone can answer, and many countries have stats for. "What are all the >> languages you speak?" is not a question I've found data for, for any >> country.
> What about ethnologue.com?
> At least http://www.ethnologue.com/show_country.asp?name=AT clearly > shows for me that most of the 8,174,762 people in Austria natively speak > both Bavarian (6,983,298) and Standard German (7,500,000) - of course it > doesn't tell what part natively speaks only one of those languages that > are listed on the page, or which two (or three) of them.
Exactly. And this is precisely the information we would need.
> And the statistics you mean are about "What language do you prefer to > speak at home?" as the question "What language do you speak at home?" > would come out with multiple languages for many people and leading to > the more difficult picture.
I don't agree. I think "language spoken at home" is a single language for 99.9% of people.
On Sat, 2008-06-14 at 09:33 +0100, Gervase Markham wrote: > The current discussion has raised the question of how many ways there > are there to count language coverage for a bit of software.
> I can think of:
> 1) Percentage of the Internet population covered in native language > (this is what my spreadsheet attempts to do)
What might be much harder, but which we've used a bit in South Africa, is to look at language groups. Xhosa, Zulu, Ndebele and Swati are all Nguni languages. There are 20 million Zulu speakers, but a Zulu Firefox could be used and understood by the roughly 15 million others who speak another Nguni language.
But getting data on this is hard.
> 2) Percentage of the world population covered in native language > (this doesn't take into account internet penetration, but you could > say "hey, that have a browser waiting for them when they get online")
Considering the cellphone revolution, certainly in Africa, this figure does make sense.
> 3) Percentage of the world/internet population covered by any language > they speak > (Not sure where you'd find the right figures to do this)
> 4) Number of languages covered > (this is what counting packs does, although it runs into trouble with > the definition of "language", and the fact that you give a language > with 10,000 speakers the same weight as one with 100,000,000)
I think this count still gives an indication of how our process can scale, the power of openness, etc. But I think it really only makes sense when compared to other browsers and when we can show change over time.
> 5) Percentage of countries covered, by official language > (This might be a good proxy for method 3, because you would hope that > everyone in a country speaks at least one of the official languages)
It is also a great influencer for adoption within government and education, so it is a good measure. It is also relatively easy to administer, as official languages are pretty static. This can be deceptive, though, in certain countries where the official languages do not reflect the languages actually spoken and may cover only a small minority of the population.