Any way to save memory for a voldemort read-only store?


Xiao Zhou

Jun 3, 2014, 5:51:47 PM
to project-...@googlegroups.com
I have a read-only store that uses 78G of memory per node, mostly for index files, I think.
Is there an easy way to reduce the memory used?
If we increase the replication factor, will the index files and memory grow by the same factor?
For example, if we increase the replication factor from 2 to 4, will it require twice as much memory?
Thanks,

Brendan Harris (a.k.a. stotch on irc.oftc.net)

Jun 4, 2014, 12:15:47 AM
to project-...@googlegroups.com
Hi Xiao,

Can you please attach the JVM arguments that you use to launch voldemort, your server.properties and your cluster.xml? Be sure to anonymize/mask any sensitive information, such as host names, before attaching them.

Also, how many stores do you have on the cluster and about how many records are in those stores?

Thanks,

Brendan

Xiao Zhou

Jun 4, 2014, 1:29:25 PM
to project-...@googlegroups.com
We have only one store with around 3 billion records. Thanks,
java -Dlog4j.configuration=file:///usr/local/voldemort/src/java/log4j.properties -Xmx4G -server -Dcom.sun.management.jmxremote voldemort.server.VoldemortServer ./ ./config

<cluster>
   <name>productioncluster</name>
   <server>
    <id>0</id>
    <host>****</host>
    <http-port>8081</http-port>
    <socket-port>6666</socket-port>
    <admin-port>6667</admin-port>
    <partitions>0,4,8,12,16,20,24,28,32,36,40,44,48,52,56,60,64,68,72,76,80,84,88,92,96,100,104,108,112,116,120,124,128,132,136,140,144,148,152,156,160,164,168,172,176,180,184,188,192,196,200,204,208,212,216,220,224,228,232,236,240,244,248,252,256,260,264,268,272,276,280,284,288,292,296,300,304,308,312,316,320,324,328,332,336,340,344,348,352,356,360,364,368,372,376,380,384,388,392,396,400,404,408,412,416,420,424,428,432,436,440,444,448,452,456,460,464,468,472,476,480,484,488,492,496,500,504,508,512,516,520,524,528,532,536,540,544,548,552,556,560,564,568,572,576,580,584,588,592,596,600,604,608,612,616,620,624,628,632,636,640,644,648,652,656,660,664,668,672,676,680,684,688,692,696,700,704,708,712,716,720,724,728,732,736,740,744,748,752,756,760,764,768,772,776,780,784,788,792,796,800,804,808,812,816,820,824,828,832,836,840,844,848,852,856,860,864,868,872,876,880,884,888,892,896,900,904,908,912,916,920,924,928,932,936,940,944,948,952,956,960,964,968,972,976,980,984,988,992,996,1000,1004,1008,1012,1016,1020,1024,1028,1032,1036,1040,1044,1048,1052,1056,1060,1064,1068,1072,1076,1080,1084,1088,1092,1096,1100,1104,1108,1112,1116,1120,1124,1128,1132,1136,1140,1144,1148,1152,1156,1160,1164,1168,1172,1176,1180,1184,1188,1192,1196,1200,1204,1208,1212,1216,1220,1224,1228,1232,1236,1240,1244,1248,1252,1256,1260,1264,1268,1272,1276,1280,1284,1288,1292,1296,1300,1304,1308,1312,1316,1320,1324,1328,1332,1336,1340,1344,1348,1352,1356,1360,1364,1368,1372,1376,1380,1384,1388,1392,1396,1400,1404,1408,1412,1416,1420,1424,1428,1432,1436,1440,1444,1448,1452,1456,1460,1464,1468,1472,1476,1480,1484,1488,1492,1496,1500,1504,1508,1512,1516,1520,1524,1528,1532,1536,1540,1544,1548,1552,1556,1560,1564,1568,1572,1576,1580,1584,1588,1592,1596,1600,1604,1608,1612,1616,1620,1624,1628,1632,1636,1640,1644,1648,1652,1656,1660,1664,1668,1672,1676,1680,1684,1688,1692,1696,1700,1704,1708,1712,1716,1720,1724,1728,1732,1736,1740,1744,1748,1752,1756,1760,1764,1768,1772,1776,1780,1784,1788,1792,1796,1800,1804,18
08,1812,1816,1820,1824,1828,1832,1836,1840,1844,1848,1852,1856,1860,1864,1868,1872,1876,1880,1884,1888,1892,1896,1900,1904,1908,1912,1916,1920,1924,1928,1932,1936,1940,1944,1948,1952,1956,1960,1964,1968,1972,1976,1980,1984,1988,1992,1996</partitions>
  </server>
  <server>
    <id>1</id>
    <host>****</host>
    <http-port>8081</http-port>
    <socket-port>6666</socket-port>
    <admin-port>6667</admin-port>
    <partitions>1,5,9,13,17,21,25,29,33,37,41,45,49,53,57,61,65,69,73,77,81,85,89,93,97,101,105,109,113,117,121,125,129,133,137,141,145,149,153,157,161,165,169,173,177,181,185,189,193,197,201,205,209,213,217,221,225,229,233,237,241,245,249,253,257,261,265,269,273,277,281,285,289,293,297,301,305,309,313,317,321,325,329,333,337,341,345,349,353,357,361,365,369,373,377,381,385,389,393,397,401,405,409,413,417,421,425,429,433,437,441,445,449,453,457,461,465,469,473,477,481,485,489,493,497,501,505,509,513,517,521,525,529,533,537,541,545,549,553,557,561,565,569,573,577,581,585,589,593,597,601,605,609,613,617,621,625,629,633,637,641,645,649,653,657,661,665,669,673,677,681,685,689,693,697,701,705,709,713,717,721,725,729,733,737,741,745,749,753,757,761,765,769,773,777,781,785,789,793,797,801,805,809,813,817,821,825,829,833,837,841,845,849,853,857,861,865,869,873,877,881,885,889,893,897,901,905,909,913,917,921,925,929,933,937,941,945,949,953,957,961,965,969,973,977,981,985,989,993,997,1001,1005,1009,1013,1017,1021,1025,1029,1033,1037,1041,1045,1049,1053,1057,1061,1065,1069,1073,1077,1081,1085,1089,1093,1097,1101,1105,1109,1113,1117,1121,1125,1129,1133,1137,1141,1145,1149,1153,1157,1161,1165,1169,1173,1177,1181,1185,1189,1193,1197,1201,1205,1209,1213,1217,1221,1225,1229,1233,1237,1241,1245,1249,1253,1257,1261,1265,1269,1273,1277,1281,1285,1289,1293,1297,1301,1305,1309,1313,1317,1321,1325,1329,1333,1337,1341,1345,1349,1353,1357,1361,1365,1369,1373,1377,1381,1385,1389,1393,1397,1401,1405,1409,1413,1417,1421,1425,1429,1433,1437,1441,1445,1449,1453,1457,1461,1465,1469,1473,1477,1481,1485,1489,1493,1497,1501,1505,1509,1513,1517,1521,1525,1529,1533,1537,1541,1545,1549,1553,1557,1561,1565,1569,1573,1577,1581,1585,1589,1593,1597,1601,1605,1609,1613,1617,1621,1625,1629,1633,1637,1641,1645,1649,1653,1657,1661,1665,1669,1673,1677,1681,1685,1689,1693,1697,1701,1705,1709,1713,1717,1721,1725,1729,1733,1737,1741,1745,1749,1753,1757,1761,1765,1769,1773,1777,1781,1785,1789,1793,1797,1801,1805,18
09,1813,1817,1821,1825,1829,1833,1837,1841,1845,1849,1853,1857,1861,1865,1869,1873,1877,1881,1885,1889,1893,1897,1901,1905,1909,1913,1917,1921,1925,1929,1933,1937,1941,1945,1949,1953,1957,1961,1965,1969,1973,1977,1981,1985,1989,1993,1997</partitions>
  </server>
  <server>
    <id>2</id>
    <host>****</host>
    <http-port>8081</http-port>
    <socket-port>6666</socket-port>
    <admin-port>6667</admin-port>
    <partitions>2,6,10,14,18,22,26,30,34,38,42,46,50,54,58,62,66,70,74,78,82,86,90,94,98,102,106,110,114,118,122,126,130,134,138,142,146,150,154,158,162,166,170,174,178,182,186,190,194,198,202,206,210,214,218,222,226,230,234,238,242,246,250,254,258,262,266,270,274,278,282,286,290,294,298,302,306,310,314,318,322,326,330,334,338,342,346,350,354,358,362,366,370,374,378,382,386,390,394,398,402,406,410,414,418,422,426,430,434,438,442,446,450,454,458,462,466,470,474,478,482,486,490,494,498,502,506,510,514,518,522,526,530,534,538,542,546,550,554,558,562,566,570,574,578,582,586,590,594,598,602,606,610,614,618,622,626,630,634,638,642,646,650,654,658,662,666,670,674,678,682,686,690,694,698,702,706,710,714,718,722,726,730,734,738,742,746,750,754,758,762,766,770,774,778,782,786,790,794,798,802,806,810,814,818,822,826,830,834,838,842,846,850,854,858,862,866,870,874,878,882,886,890,894,898,902,906,910,914,918,922,926,930,934,938,942,946,950,954,958,962,966,970,974,978,982,986,990,994,998,1002,1006,1010,1014,1018,1022,1026,1030,1034,1038,1042,1046,1050,1054,1058,1062,1066,1070,1074,1078,1082,1086,1090,1094,1098,1102,1106,1110,1114,1118,1122,1126,1130,1134,1138,1142,1146,1150,1154,1158,1162,1166,1170,1174,1178,1182,1186,1190,1194,1198,1202,1206,1210,1214,1218,1222,1226,1230,1234,1238,1242,1246,1250,1254,1258,1262,1266,1270,1274,1278,1282,1286,1290,1294,1298,1302,1306,1310,1314,1318,1322,1326,1330,1334,1338,1342,1346,1350,1354,1358,1362,1366,1370,1374,1378,1382,1386,1390,1394,1398,1402,1406,1410,1414,1418,1422,1426,1430,1434,1438,1442,1446,1450,1454,1458,1462,1466,1470,1474,1478,1482,1486,1490,1494,1498,1502,1506,1510,1514,1518,1522,1526,1530,1534,1538,1542,1546,1550,1554,1558,1562,1566,1570,1574,1578,1582,1586,1590,1594,1598,1602,1606,1610,1614,1618,1622,1626,1630,1634,1638,1642,1646,1650,1654,1658,1662,1666,1670,1674,1678,1682,1686,1690,1694,1698,1702,1706,1710,1714,1718,1722,1726,1730,1734,1738,1742,1746,1750,1754,1758,1762,1766,1770,1774,1778,1782,1786,1790,1794,1798,1802,1806,1
810,1814,1818,1822,1826,1830,1834,1838,1842,1846,1850,1854,1858,1862,1866,1870,1874,1878,1882,1886,1890,1894,1898,1902,1906,1910,1914,1918,1922,1926,1930,1934,1938,1942,1946,1950,1954,1958,1962,1966,1970,1974,1978,1982,1986,1990,1994,1998</partitions>
  </server>
  <server>
    <id>3</id>
    <host>10.1.40.?</host>
    <http-port>8081</http-port>
    <socket-port>6666</socket-port>
    <admin-port>6667</admin-port>
    <partitions>3,7,11,15,19,23,27,31,35,39,43,47,51,55,59,63,67,71,75,79,83,87,91,95,99,103,107,111,115,119,123,127,131,135,139,143,147,151,155,159,163,167,171,175,179,183,187,191,195,199,203,207,211,215,219,223,227,231,235,239,243,247,251,255,259,263,267,271,275,279,283,287,291,295,299,303,307,311,315,319,323,327,331,335,339,343,347,351,355,359,363,367,371,375,379,383,387,391,395,399,403,407,411,415,419,423,427,431,435,439,443,447,451,455,459,463,467,471,475,479,483,487,491,495,499,503,507,511,515,519,523,527,531,535,539,543,547,551,555,559,563,567,571,575,579,583,587,591,595,599,603,607,611,615,619,623,627,631,635,639,643,647,651,655,659,663,667,671,675,679,683,687,691,695,699,703,707,711,715,719,723,727,731,735,739,743,747,751,755,759,763,767,771,775,779,783,787,791,795,799,803,807,811,815,819,823,827,831,835,839,843,847,851,855,859,863,867,871,875,879,883,887,891,895,899,903,907,911,915,919,923,927,931,935,939,943,947,951,955,959,963,967,971,975,979,983,987,991,995,999,1003,1007,1011,1015,1019,1023,1027,1031,1035,1039,1043,1047,1051,1055,1059,1063,1067,1071,1075,1079,1083,1087,1091,1095,1099,1103,1107,1111,1115,1119,1123,1127,1131,1135,1139,1143,1147,1151,1155,1159,1163,1167,1171,1175,1179,1183,1187,1191,1195,1199,1203,1207,1211,1215,1219,1223,1227,1231,1235,1239,1243,1247,1251,1255,1259,1263,1267,1271,1275,1279,1283,1287,1291,1295,1299,1303,1307,1311,1315,1319,1323,1327,1331,1335,1339,1343,1347,1351,1355,1359,1363,1367,1371,1375,1379,1383,1387,1391,1395,1399,1403,1407,1411,1415,1419,1423,1427,1431,1435,1439,1443,1447,1451,1455,1459,1463,1467,1471,1475,1479,1483,1487,1491,1495,1499,1503,1507,1511,1515,1519,1523,1527,1531,1535,1539,1543,1547,1551,1555,1559,1563,1567,1571,1575,1579,1583,1587,1591,1595,1599,1603,1607,1611,1615,1619,1623,1627,1631,1635,1639,1643,1647,1651,1655,1659,1663,1667,1671,1675,1679,1683,1687,1691,1695,1699,1703,1707,1711,1715,1719,1723,1727,1731,1735,1739,1743,1747,1751,1755,1759,1763,1767,1771,1775,1779,1783,1787,1791,1795,1799,1803,1807,1
811,1815,1819,1823,1827,1831,1835,1839,1843,1847,1851,1855,1859,1863,1867,1871,1875,1879,1883,1887,1891,1895,1899,1903,1907,1911,1915,1919,1923,1927,1931,1935,1939,1943,1947,1951,1955,1959,1963,1967,1971,1975,1979,1983,1987,1991,1995</partitions>
  </server>
</cluster>

<stores>
  <store>
    <name>readonlyusers</name>
    <persistence>read-only</persistence>
    <description>Readonly store</description>
    <routing>client</routing>
    <replication-factor>2</replication-factor>
    <required-reads>1</required-reads>
    <required-writes>1</required-writes>
    <key-serializer>
      <type>string</type>
    </key-serializer>
    <value-serializer>
      <type>protobuf</type>
      <schema-info>java=com.audiencescience.data.protobuf.UserProfile$Profile</schema-info>
    </value-serializer>
    <retention-days>2</retention-days>
  </store>
</stores>
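
On the replication question, a rough back-of-the-envelope sketch may help. It assumes (this is not confirmed anywhere in the thread) that records are spread evenly across the nodes and that index size grows in proportion to the number of entries a node stores, with each replica being one more full copy of an entry somewhere in the cluster. The function name `entries_per_node` is made up for illustration; the record count, node count, and replication factors are the ones posted above.

```python
def entries_per_node(total_records: int, replication_factor: int, num_nodes: int) -> float:
    """Average number of index entries one node holds, assuming an even spread."""
    if not 1 <= replication_factor <= num_nodes:
        raise ValueError("replication factor must be between 1 and the node count")
    return total_records * replication_factor / num_nodes


TOTAL_RECORDS = 3_000_000_000  # "around 3 billion records" from this thread
NUM_NODES = 4                  # node ids 0..3 in the cluster.xml above

current = entries_per_node(TOTAL_RECORDS, 2, NUM_NODES)
proposed = entries_per_node(TOTAL_RECORDS, 4, NUM_NODES)
growth = proposed / current

print(f"replication 2: {current:.2e} entries per node")
print(f"replication 4: {proposed:.2e} entries per node ({growth:.1f}x)")
```

Under those assumptions, going from replication factor 2 to 4 on four nodes would roughly double the index footprint per node.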

Brendan Harris (a.k.a. stotch on irc.oftc.net)

Jun 9, 2014, 3:02:38 PM
to project-...@googlegroups.com
Hi Xiao,

Sorry for the delay, but I was out of town on vacation. Could you please also include your server.properties?

Thanks,

Brendan

Xiao Zhou

Jun 10, 2014, 1:07:57 PM
to project-...@googlegroups.com
Thanks,
# The ID of *this* particular cluster node
node.id={{ nodeid }}
max.threads=100
http.enable=true
socket.enable=true
jmx.enable=true
enable.readonly.engine=true
file.fetcher.class=voldemort.store.readonly.fetcher.HdfsFetcher
data.directory=/data/voldemort
hdfs.fetcher.tmp.dir=/data/tmp
readonly.hadoop.config.path=
readonly.keytab.path=

Brendan Harris (a.k.a. stotch on irc.oftc.net)

Jun 11, 2014, 12:49:28 PM
to project-...@googlegroups.com
Hi Xiao,

This is our basic config for our read-only clusters, which support hundreds of stores and billions of keys:
admin.streams.buffer.size=1024
bdb.enable=false
data.directory=${VOLD_DATA_HOME}
enable.grandfather=false 
enable.nio.connector=true  
enable.readonly.engine=true
enable.repair=false
enable.server.routing=false 
enable.verbose.logging=false
file.fetcher.class=voldemort.store.readonly.fetcher.HdfsFetcher
hdfs.fetcher.buffer.size=16MB
hdfs.fetcher.tmp.dir=${VOLD_HOME}/voldemort
http.enable=false
jmx.enable=true
nio.connector.selectors=50
slop.enable=false
socket.buffer.size=65000
socket.enable=true
storage.configs=voldemort.store.readonly.ReadOnlyStorageConfiguration
voldemort.home=${VOLD_HOME}

It restricts the nodes to running _only_ the read-only storage engine, turns off the http request handler and the slop pusher, and shrinks the admin socket buffer significantly. It also increases the hdfs fetcher buffer for fetches over a WAN (you might want to remove that setting if you're not fetching over a WAN). In short, it gets rid of a lot of features that read-only clusters cannot use.

Additionally, the data retention job does not work on read-only stores, so you should remove "<retention-days>2</retention-days>" from your store config (and restart your servers) to remove the DataRetentionJob from the voldemort scheduler.
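
One way to make that change across a larger stores.xml is sketched below. It is only an illustration, not a voldemort tool: the function name `strip_retention_days` is hypothetical, and it assumes the file has the same shape as the stores.xml posted earlier in the thread. It drops `<retention-days>` only from stores whose `<persistence>` is `read-only`.

```python
import xml.etree.ElementTree as ET


def strip_retention_days(stores_xml: str) -> str:
    """Return stores.xml text with <retention-days> removed from read-only stores."""
    root = ET.fromstring(stores_xml)
    # Materialize the store list first so the tree is not modified mid-iteration.
    for store in list(root.iter("store")):
        if store.findtext("persistence", default="").strip() != "read-only":
            continue
        for node in store.findall("retention-days"):
            store.remove(node)
    return ET.tostring(root, encoding="unicode")


SAMPLE = """<stores>
  <store>
    <name>readonlyusers</name>
    <persistence>read-only</persistence>
    <retention-days>2</retention-days>
  </store>
</stores>"""

cleaned = strip_retention_days(SAMPLE)
print(cleaned)
```

The rewritten text would still need to be written back to the config file and the servers restarted, as described above.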

But all of these changes will only free up on-heap memory (within your 4G JVM heap), not the off-heap memory. The off-heap memory is consumed by voldemort for the index files. You can estimate the amount of memory needed by adding up the on-disk size of all of your *.index files in all of your stores.
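
The estimate described above can be sketched in a few lines. The helper name `total_index_bytes` is made up for illustration, and `/data/voldemort` is simply the `data.directory` value from the server.properties posted earlier. Note that this walks the whole tree, so index files belonging to a retained historical copy are counted as well, which may overstate what is actually held in memory.

```python
import os


def total_index_bytes(data_dir: str) -> int:
    """Sum the on-disk size of every *.index file below data_dir."""
    total = 0
    # os.walk does not follow directory symlinks by default and yields
    # nothing if data_dir does not exist.
    for dirpath, _dirnames, filenames in os.walk(data_dir):
        for name in filenames:
            if name.endswith(".index"):
                total += os.path.getsize(os.path.join(dirpath, name))
    return total


DATA_DIR = "/data/voldemort"  # data.directory from the server.properties in this thread
size = total_index_bytes(DATA_DIR)
print(f"{size / 2**30:.1f} GiB of index files under {DATA_DIR}")
```

Comparing that figure with the resident memory of the voldemort process should show whether the indexes account for most of it.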

Other off-heap memory use by voldemort will be in the OS page cache for the data files, but that should not reflect in the memory stats as memory allocated to your voldemort process. That should show in vmstat as cached memory.

Our voldemort hosts that serve read-only stores consume about 20 - 50 gigs of virtual memory, depending on the cluster. And each of our clusters hosts multiple (sometimes hundreds of) stores with billions of keys. So, when you say that voldemort is taking up 78G of memory, can you paste the output from the command you're using to see that?

Thanks,

Brendan

Xiao Zhou

Jun 11, 2014, 8:08:07 PM
to project-...@googlegroups.com
Yes, the memory we are using is about the size of all the index files. Is there any way to limit the indexes we are loading?
  
If we do not set the retention days in the store config, what is going to do the cleanup? The swap job? How can I tell it how many days of data to keep?

Thanks,

Brendan Harris (a.k.a. stotch on irc.oftc.net)

Jun 11, 2014, 9:23:09 PM
to project-...@googlegroups.com


On Wednesday, June 11, 2014 5:08:07 PM UTC-7, Xiao Zhou wrote:
Yes, the memory we are using is about the size of all the index files. Is there any way to limit the indexes we are loading?

Your key is probably a very large string. If you could compress it in your code in some way, that would probably shrink the indexes a lot. Otherwise there is no other way to shrink the index sizes that I am aware of.
 
  
If we do not set the retention days in the store config, what is going to do the cleanup? The swap job? How can I tell it how many days of data to keep?

Your swap job will write a whole new data set and (by default) keep one historical copy and delete all of the older copies.

Brendan

Xiao Zhou

Jun 12, 2014, 5:56:21 PM
to project-...@googlegroups.com
I remember from some document that voldemort takes the MD5 of the original key to create the index file. Is this still true? If it uses MD5, the index file won't shrink even if we use a simpler key.
Is it OK to remove the one historical copy after the swap because of disk space limitations?

Brendan Harris (a.k.a. stotch on irc.oftc.net)

Jun 12, 2014, 7:43:08 PM
to project-...@googlegroups.com

On Thursday, June 12, 2014 2:56:21 PM UTC-7, Xiao Zhou wrote:
I remember from some document that voldemort takes the MD5 of the original key to create the index file. Is this still true? If it uses MD5, the index file won't shrink even if we use a simpler key.

Yes, you're right about that. And the keys are stored in the data files, anyway, which are not memlocked, so this would not buy you anything regardless.
 
Is it OK to remove the one historical copy after the swap because of disk space limitations?

It is if you think you can push a new data set fast enough if the previously pushed data set is corrupt or has some bad data in it. But this will only free up disk space for you, not RAM. When you said "memory," I was thinking RAM. Were you talking about saving on used disk space or used RAM?

One thing I just remembered is that you can turn off memlocking of the indexes. Set "readonly.mlock.index" to "false" in your properties and then the memory that your indexes consume can be paged out in favor of other processes.
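
For anyone scripting that change, here is a minimal sketch that sets the key in the text of a properties file. The helper `set_property` is hypothetical and only handles simple `key=value` lines; the property name `readonly.mlock.index` is the one given above, and the sample lines are taken from the server.properties posted earlier.

```python
def set_property(properties_text: str, key: str, value: str) -> str:
    """Return properties text with key set to value, replacing an entry or appending one."""
    lines = properties_text.splitlines()
    replaced = False
    for i, line in enumerate(lines):
        stripped = line.strip()
        # Skip comments and anything that is not a key=value pair.
        if stripped.startswith("#") or "=" not in stripped:
            continue
        if stripped.split("=", 1)[0].strip() == key:
            lines[i] = f"{key}={value}"
            replaced = True
    if not replaced:
        lines.append(f"{key}={value}")
    return "\n".join(lines) + "\n"


SAMPLE = "enable.readonly.engine=true\ndata.directory=/data/voldemort\n"
updated = set_property(SAMPLE, "readonly.mlock.index", "false")
print(updated, end="")
```

Running it a second time with the same key rewrites the existing line rather than adding a duplicate.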

Brendan