A question about TrackerServer

D.Y Feng

Feb 26, 2014, 2:50:00 AM

to dpark...@googlegroups.com

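If we used redis instead of TrackerServer, performance should be higher, right? redis keys can also be given an expiration time, so keys older than a certain age get cleaned up. Right now, with TrackerServer, memory keeps growing. It is not much, but it still grows by more than 100 MB a day, which is hard to accept if I run it 24/7.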
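
To make the expiry idea concrete, here is a minimal sketch of entries that disappear after a fixed lifetime. It is a plain in-process dictionary, not the redis API and not DPark's TrackerServer; the class name `ExpiringStore`, the `ttl` parameter, and the example key are all made up for illustration.

```python
import time


class ExpiringStore:
    """Hypothetical in-process key/value store whose entries expire after `ttl` seconds."""

    def __init__(self, ttl, clock=time.monotonic):
        self.ttl = ttl
        self.clock = clock      # injectable so expiry can be exercised without sleeping
        self._data = {}         # key -> (value, expiry time)

    def set(self, key, value):
        self._data[key] = (value, self.clock() + self.ttl)

    def get(self, key, default=None):
        item = self._data.get(key)
        if item is None:
            return default
        value, expires_at = item
        if self.clock() >= expires_at:
            del self._data[key]  # lazily drop the stale entry on access
            return default
        return value

    def purge(self):
        """Remove every expired entry and return how many were dropped."""
        now = self.clock()
        stale = [k for k, (_, exp) in self._data.items() if now >= exp]
        for k in stale:
            del self._data[k]
        return len(stale)

    def __len__(self):
        return len(self._data)
```

Calling `purge()` from a timer would bound memory, but only if `ttl` is longer than any job that still needs the key, which is the hard part to get right.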

--

DY.Feng(叶毅锋)
yyfeng88625@twitter
Department of Applied Mathematics
Guangzhou University,China
dyf...@stu.gzhu.edu.cn

Davies Liu

Feb 26, 2014, 12:24:27 PM

to dpark...@googlegroups.com

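Using redis would bring more problems, such as a third-party dependency and the question of how to manage redis, and it would still leak (it is hard to decide what expiration time to set).

For now, the simple workaround would be to restart it on a schedule once a day?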

> --
> You received this message because you are subscribed to the Google Groups
> "DPark Users" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to dpark-users...@googlegroups.com.
> For more options, visit https://groups.google.com/groups/opt_out.

--
- Davies

D.Y Feng

Mar 4, 2014, 7:54:07 PM

to dpark...@googlegroups.com

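I recently made a small change: I turned the rdd held in dependency into a weakref, so that the rdd is referenced only by DStream.generatedRDDs (ReducedWindowedDStream needs a small change for this), and I periodically clean up DAGScheduler.shuffleToMapStage. With that, memory is basically stable and the unit tests pass. That is the rough idea; it is purely a hack. I am actually still very fuzzy on the two concepts of Stage and Shuffle, and even Spark does not explain them clearly. I don't know what material I could learn from.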
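
Roughly, the change looks like the sketch below. This is not the actual patch: `RDD`, `Dependency`, `generated_rdds`, and `prune_shuffle_stages` are simplified stand-ins for the DPark classes and attributes named above, and how the set of live shuffle ids is determined is an assumption left to the caller.

```python
import gc
import weakref


class RDD:
    """Stand-in for a real RDD; it only carries an id."""

    def __init__(self, rdd_id):
        self.id = rdd_id


class Dependency:
    """Holds the parent RDD through a weak reference instead of a strong one."""

    def __init__(self, rdd):
        self._rdd_ref = weakref.ref(rdd)

    @property
    def rdd(self):
        # Returns None once the parent RDD has been garbage collected.
        return self._rdd_ref()


def prune_shuffle_stages(shuffle_to_map_stage, live_shuffle_ids):
    """Drop stage entries whose shuffle id is no longer in use (hypothetical helper)."""
    for shuffle_id in list(shuffle_to_map_stage):
        if shuffle_id not in live_shuffle_ids:
            del shuffle_to_map_stage[shuffle_id]


generated_rdds = {1: RDD(1)}        # the only strong reference to the RDD
dep = Dependency(generated_rdds[1])
assert dep.rdd is generated_rdds[1]

del generated_rdds[1]               # the stream forgets the old batch
gc.collect()                        # make collection explicit on any interpreter
assert dep.rdd is None              # the dependency no longer keeps the RDD alive
```

Any code that reads `dep.rdd` then has to cope with `None`, which is why the windowed stream needed its own adjustment.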

Windreamer

Mar 4, 2014, 8:04:02 PM

to dpark...@googlegroups.com

You can refer to this Spark paper:

http://www.cs.berkeley.edu/~matei/papers/2012/nsdi_spark.pdf

---
Windreamer
