wikipedia dump 20230801

Aug 2, 2023, 5:11:27 AM
to aarddict
The Wikipedia dumps as of 20230801 are smaller than the previous ones. I know you guys are monitoring that stuff :)

The slob file generated from the 20230801 dump is 7.3GB, compared with 8.3GB for the 20230701 one. The content of dewiki-20230701 was
  blob count: 3023063
  ref count: 5039666
whilst the content of dewiki-20230801 is
  blob count: 2888856
  ref count: 4752407
Same compression and blob size of course.
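
To put the two sets of numbers side by side, here is a minimal Python sketch that computes the relative change in blob count and ref count between the two dewiki files. The helper name percent_change is made up for illustration; the figures are the ones quoted above.

```python
def percent_change(old: int, new: int) -> float:
    """Return the relative change from old to new, in percent."""
    return (new - old) / old * 100.0


# Counts quoted above for dewiki-20230701 and dewiki-20230801.
blob_change = percent_change(3023063, 2888856)
ref_change = percent_change(5039666, 4752407)

print(f"blob count: {blob_change:+.2f}%")
print(f"ref count:  {ref_change:+.2f}%")
```

Both values come out negative, i.e. the newer file holds fewer blobs and fewer refs.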

I do not know the reason for this change, but I guess Wikipedia cleaned up some articles. If anybody has a clue, please let me know.


Aug 13, 2023, 4:06:06 AM
to aarddict
Looking at the file sizes and blob counts across the Wikipedias does not give a clear picture:
alswiki20230601 75148kB blob count: 29920
alswiki20230701 77172kB blob count: 30691
alswiki20230801 77128kB blob count: 30308
dewiki20230601 8484892kB blob count: 2988784
dewiki20230701 8734508kB blob count: 3023063
dewiki20230801 7694308kB blob count: 2888856
enwiki20230601 21656976kB blob count: 6821959
enwiki20230701 22132884kB blob count: 6875121
enwiki20230801 22534440kB blob count: 6850989
arwiki20230601 3189792kB blob count: 1443508
arwiki20230701 3020992kB blob count: 1340679
arwiki20230801 2663632kB blob count: 1249926
eswiki20230601 5926568kB blob count: 1937275
eswiki20230701 5575640kB blob count: 1902646
eswiki20230801 6040796kB blob count: 1943529
elwiki20230601 703072kB blob count: 225960
elwiki20230701 1038652kB blob count: 266727
elwiki20230801 894704kB blob count: 253935
fawiki20230601 2132420kB blob count: 1127309
fawiki20230701 1938144kB blob count: 1104685
fawiki20230801 1351468kB blob count: 985221
fiwiki20230601 1165320kB blob count: 610406
fiwiki20230701 1097420kB blob count: 589479
fiwiki20230801 975780kB blob count: 560332
frwiki20230601 9136212kB blob count: 2671310
frwiki20230701 0kB
frwiki20230801 8558008kB blob count: 2601013
iswiki20230601 65884kB blob count: 58553
iswiki20230701 66516kB blob count: 59228
iswiki20230801 65688kB blob count: 57825
itwiki20230601 0kB
itwiki20230701 5617688kB blob count: 1950362
itwiki20230801 5655348kB blob count: 1955749
jawiki20230601 6375684kB blob count: 1492389
jawiki20230701 5914140kB blob count: 1450596
jawiki20230801 5451056kB blob count: 1415867
nlwiki20230601 3048684kB blob count: 2252014
nlwiki20230701 2435420kB blob count: 2132932
nlwiki20230801 2645444kB blob count: 2182398
ptwiki20230601 3080460kB blob count: 1173835
ptwiki20230701 3108420kB blob count: 1175021
ptwiki20230801 2933592kB blob count: 1148651
ruwiki20230601 7768016kB blob count: 2065184
ruwiki20230701 8138608kB blob count: 2124846
ruwiki20230801 7107588kB blob count: 1998060
tawiki20230601 292892kB blob count: 164004
tawiki20230701 288872kB blob count: 161987
tawiki20230801 314288kB blob count: 166988
ukwiki20230601 3411648kB blob count: 1302485
ukwiki20230701 3497236kB blob count: 1312845
ukwiki20230801 3802468kB blob count: 1337362
zhwiki20230601 2657820kB blob count: 1276168
zhwiki20230701 4246944kB blob count: 1469929
zhwiki20230801 3772228kB blob count: 1422380
simplewiki20230601 353528kB blob count: 270114
simplewiki20230701 284216kB blob count: 235767
simplewiki20230801 288380kB blob count: 241287

It looks fine to me...
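
For anyone who wants to check the month-to-month changes themselves, here is a rough Python sketch that parses lines in the format of the list above and prints the change in blob count between consecutive dumps of the same wiki. The regular expression and the names DumpStat and parse_line are my own assumptions based on the listing, not part of any aarddict or slob tool. Rows that only carry "0kB" and no blob count are kept but skipped in the comparison.

```python
import re
from typing import NamedTuple, Optional


class DumpStat(NamedTuple):
    wiki: str
    date: str
    size_kb: int
    blob_count: Optional[int]  # None when the listing has no blob count


# Assumed line format: "<wiki><YYYYMMDD> <size>kB [blob count: <n>]"
LINE_RE = re.compile(
    r"^(?P<wiki>[a-z]+)(?P<date>\d{8})\s+(?P<size>\d+)kB"
    r"(?:\s+blob count:\s*(?P<blobs>\d+))?$"
)


def parse_line(line: str) -> DumpStat:
    """Parse one listing line into a DumpStat record."""
    m = LINE_RE.match(line.strip())
    if m is None:
        raise ValueError(f"unrecognized line: {line!r}")
    blobs = m.group("blobs")
    return DumpStat(
        m.group("wiki"),
        m.group("date"),
        int(m.group("size")),
        int(blobs) if blobs is not None else None,
    )


# Three lines copied from the list above.
sample = """\
dewiki20230701 8734508kB blob count: 3023063
dewiki20230801 7694308kB blob count: 2888856
frwiki20230701 0kB
"""

stats = [parse_line(line) for line in sample.splitlines() if line.strip()]

# Compare each dump with the previous one of the same wiki.
changes = []
for prev, cur in zip(stats, stats[1:]):
    if prev.wiki == cur.wiki and prev.blob_count and cur.blob_count:
        pct = (cur.blob_count - prev.blob_count) / prev.blob_count * 100.0
        changes.append((cur.wiki, prev.date, cur.date, pct))

for wiki, old_date, new_date, pct in changes:
    print(f"{wiki} {old_date} -> {new_date}: {pct:+.2f}% blobs")
```

Feeding the whole list through the same loop gives one line per wiki and month, which makes the outliers easier to spot than reading raw sizes.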

Aug 13, 2023, 4:09:46 AM
to aarddict
The Wiktionaries, on the other hand, show a significant reduction in articles:

dewiktionary20230601 384220kB blob count: 1142749
dewiktionary20230701 380912kB blob count: 1139295
dewiktionary20230801 30588kB blob count: 61761
enwiktionary20230601 2710568kB blob count: 7749872
enwiktionary20230701 2675976kB blob count: 8528256
enwiktionary20230801 445856kB blob count: 426905
arwiktionary20230601 13628kB blob count: 80485
arwiktionary20230701 13760kB blob count: 80766
arwiktionary20230801 3124kB blob count: 9969
eswiktionary20230601 145524kB blob count: 928695
eswiktionary20230701 145448kB blob count: 930991
eswiktionary20230801 17612kB blob count: 11506
elwiktionary20230601 182572kB blob count: 825999
elwiktionary20230701 185280kB blob count: 828192
elwiktionary20230801 8540kB blob count: 10416
fiwiktionary20230601 98092kB blob count: 463419
fiwiktionary20230701 99892kB blob count: 470343
fiwiktionary20230801 19100kB blob count: 25651
frwiktionary20230601 1028104kB blob count: 4886356
frwiktionary20230701 1048248kB blob count: 4945711
frwiktionary20230801 453812kB blob count: 644626
itwiktionary20230701 96280kB blob count: 580228
itwiktionary20230801 8604kB blob count: 13892
tawiktionary20230601 65692kB blob count: 410253
tawiktionary20230701 66136kB blob count: 411695
tawiktionary20230801 3608kB blob count: 3476
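
To make the difference to the Wikipedias explicit, here is a small Python sketch that flags every dump whose blob count fell below half of the previous month's. The 50% threshold is an arbitrary assumption of mine, and the dewiki row is taken from my earlier message purely for contrast.

```python
# (name, blob count 20230701, blob count 20230801), copied from the lists above.
counts = [
    ("dewiktionary", 1139295, 61761),
    ("enwiktionary", 8528256, 426905),
    ("frwiktionary", 4945711, 644626),
    ("tawiktionary", 411695, 3476),
    ("dewiki", 3023063, 2888856),  # Wikipedia row, for contrast
]

# Assumed cut-off: flag a dump if fewer than half of the blobs remain.
THRESHOLD = 0.5


def remaining_fraction(old: int, new: int) -> float:
    """Fraction of the previous month's blobs still present."""
    return new / old


flagged = [
    name for name, old, new in counts
    if remaining_fraction(old, new) < THRESHOLD
]

for name, old, new in counts:
    mark = "DROP" if name in flagged else "ok"
    print(f"{name:14s} {remaining_fraction(old, new):6.1%} remaining  {mark}")
```

With these numbers all four Wiktionaries are flagged and dewiki is not.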
