Heap Sort in FP

liuxinyu

unread,

Sep 6, 2010, 5:27:22 AM9/6/10

to TopLanguage

Hi,

刚才有朋友在B-tree那个thread里问Heap Sort的FP实现能否达到O(lgN)。

我略微查了一下，Haskell的mail list里，在1997年的时候有人讨论过这个：
http://www.mail-archive.com/has...@haskell.org/msg01788.html

有人还给出了对照的merge sort:
http://www.mail-archive.com/has...@haskell.org/msg01816.html

但是Chris Okasaki指出，实际上Jon用的不是Binary Heap，而是FP中使用的multi-pass binary
heap。
在Chris Okasaki的博士论文和他的书中，有一系列对Heap的FP实现，包括Leftist Heap, Binomial Heap,
Pairing Heap等等。

在标准的Haskell库中Hackage给出的heapsort正是采用了pairing heap：
http://hackage.haskell.org/packages/archive/fgl/5.4.2.2/doc/html/src/Data-Graph-Inductive-Internal-Heap.html#heapsort

我先占个位置，然后有时间的话，会给出详细的介绍。

--
刘
https://sites.google.com/site/algoxy/home

liuxinyu

unread,

Sep 6, 2010, 5:34:24 AM9/6/10

to TopLanguage

Hi,

1986年时，前辈们的论文:
http://www.cs.cmu.edu/~sleator/papers/pairing-heaps.pdf

--
liu

On Sep 6, 5:27 pm, liuxinyu <liuxiny...@gmail.com> wrote:
> Hi,
>
> 刚才有朋友在B-tree那个thread里问Heap Sort的FP实现能否达到O(lgN)。
>
> 我略微查了一下，Haskell的mail list里，在1997年的时候有人讨论过这个：

> http://www.mail-archive.com/hask...@haskell.org/msg01788.html
>
> 有人还给出了对照的merge sort:
> http://www.mail-archive.com/hask...@haskell.org/msg01816.html

>
> 但是Chris Okasaki指出，实际上Jon用的不是Binary Heap，而是FP中使用的multi-pass binary
> heap。
> 在Chris Okasaki的博士论文和他的书中，有一系列对Heap的FP实现，包括Leftist Heap, Binomial Heap,
> Pairing Heap等等。
>

> 在标准的Haskell库中Hackage给出的heapsort正是采用了pairing heap：http://hackage.haskell.org/packages/archive/fgl/5.4.2.2/doc/html/src/...
>
> 我先占个位置，然后有时间的话，会给出详细的介绍。
>
> --
> 刘https://sites.google.com/site/algoxy/home

Doyle

unread,

Sep 6, 2010, 10:59:44 PM9/6/10

to pon...@googlegroups.com

看看代码,等详细介绍

2010/9/6 liuxinyu <liuxi...@gmail.com>

jigsaw

unread,

Sep 17, 2010, 7:03:01 AM9/17/10

to TopLanguage

谢谢楼主推荐。
这算法漂亮的地方有2点：1 为纯FP而生的，用纯fp写起来非常自然 2 除了delete min，其它操作都是O(1)
但我用c写了一遍，发现效率比最简单的数组实现的binary heap低。
排除了malloc的干扰之后，gprof了一下，瓶颈在于：
1 每次insert时，node里2或3个指针赋值很费时间。而数组实现，尽管是logn的插入复杂度，但是每次swap的代价很小
2 每次delete min时，剪切节点时对每个节点有2或3个指针的赋值。

看起来是代价很小的操作，改变几个指针而已，却成为了瓶颈。看来数组真的是很猛，可以填平logn和1的差距。

我的实现里，每个node用了3个指针，而论文提出的可以用2个。论文已经说了，用2个指针是时间换空间，会给复杂度带来一个constant
factor。但不知道在实际运行中，会不会因为减少了给指针赋值的次数而反而带来好处。

更进一步的想法是，在paring步骤时，可以并发的操作，而paring正是gprof中显示最重的一个操作。如果能利用这个特性，也许可以带来显著
的加速度。但如果没有编译器/语言/甚至硬件的帮助，实现起来。。。坦率的说我不知道具体应该怎么实现。

不知道楼主有没有兴趣继续这个话题

jigsaw

unread,

Sep 17, 2010, 5:24:51 PM9/17/10

to TopLanguage

微调了下pair和merge部分，速度超过了binary heap；跟nginx的rbtree比较了一下，插入100万/删除100万随机数据，比rbtree快~50%。

gporf显示paring heap的瓶颈还是pair部分；而rbtree则是insert。这个跟纸上谈兵的预测是一致的。这个插入O(1)的本事的确是很少见。

haproxy搞了个很畸形的elastic tree（记得是这个名字），不仅算法畸形，而且wiki上的词条更畸形，无爱。

如果要打并发的主意，就是每次循环做N对pair，开N个县城。用pthread很不方便，openmp也许简单些，没有细想，因为这些都是user space的东西，对我来说没用。

如果直接用汇编写的话，可能可以偷到一些性能，但是编程的代价大了点，目前看不出有那个必要。

这类不依赖数组的树，有个共同的问题，就是依赖于allocator或者gc的性能。

allocator的问题是对多核不友好。目前kernel的slab对多核不友好。solaris这方面是有成熟例子的，但不知道为什么这么多年了都没有影响到linux kernel。这方面奥利弗-杨可能有话要说？

我发现kernel的实现都很保守，稍微先进点的东西在kernel里是找不到例子的。不说别的，就2.6的tcp/ip层就灰常的不被待见。高性能的tcp/ip层要么是基于2.4，要么是基于bsd。

最后，paring heap有些升级版，但是不打算研究了。

2010/9/17 jigsaw <jig...@gmail.com>

oliver yang

unread,

Sep 18, 2010, 9:28:36 AM9/18/10

to pon...@googlegroups.com

在 2010年9月18日上午5:24，jigsaw <jig...@gmail.com> 写道：
> 微调了下pair和merge部分，速度超过了binary
> heap；跟nginx的rbtree比较了一下，插入100万/删除100万随机数据，比rbtree快~50%。
> gporf显示paring
> heap的瓶颈还是pair部分；而rbtree则是insert。这个跟纸上谈兵的预测是一致的。这个插入O(1)的本事的确是很少见。
> haproxy搞了个很畸形的elastic tree（记得是这个名字），不仅算法畸形，而且wiki上的词条更畸形，无爱。
> 如果要打并发的主意，就是每次循环做N对pair，开N个县城。用pthread很不方便，openmp也许简单些，没有细想，因为这些都是user
> space的东西，对我来说没用。
> 如果直接用汇编写的话，可能可以偷到一些性能，但是编程的代价大了点，目前看不出有那个必要。
> 这类不依赖数组的树，有个共同的问题，就是依赖于allocator或者gc的性能。
> allocator的问题是对多核不友好。目前kernel的slab对多核不友好。solaris这方面是有成熟例子的，但不知道为什么这么多年了都没有影响到linux
> kernel。这方面奥利弗-杨可能有话要说？

Linux内核已经很久没有研究，不清楚内存管理方面现在的动向。

Solaris为了防止多核的在内存分配上的竞争，实现了一个per-CPU的cache, 叫做magazine layer.

有一个paper里面讲在实现slab以后如何去解决多CPU的全局锁竞争问题的，作者之一就是ZFS和Slab的创造者：Jeff Bonwick
and Jonathan Adams - Magazines and vmem: Extending the Slab Allocator
to Many CPUs and Arbitrary Resources

--
Cheers,

Oliver Yang

Twitter: http://twitter.com/yangoliver
Blog: http://blog.csdn.net/yayong
--------------------------------------------------------------------
An OpenSolaris Developer

jigsaw

unread,

Sep 18, 2010, 9:42:36 AM9/18/10

to pon...@googlegroups.com

嗯，我说的就是这个magazine。

不过刚看了下，linux kernel里也是这个设计。

2010/9/18 oliver yang <yango...@gmail.com>

oliver yang

unread,

Sep 18, 2010, 9:44:55 AM9/18/10

to pon...@googlegroups.com

在 2010年9月18日下午9:42，jigsaw <jig...@gmail.com> 写道：
> 嗯，我说的就是这个magazine。
> 不过刚看了下，linux kernel里也是这个设计。

呵呵，这个想法很简单，也不难实现。

所以说Solaris对Linux的贡献很大的。

jigsaw

unread,

Sep 18, 2010, 9:50:54 AM9/18/10

to pon...@googlegroups.com

solaris跟bsd的代码风格很像，有学院风，注释很漂亮，合我的口味。linux的山寨气比较重，代码的目录结构也比较乱，不是很喜欢。但linux是主流，没办法。

厄。。。跑题了

liuxinyu

unread,

Sep 19, 2010, 2:06:25 AM9/19/10

to TopLanguage

Hi,

强调一下，pairing heap的deleteMin的复杂度只是一个猜测，至今没有得到严格证明。
其实大多数叫客户对heap的定义容易让读者认为，只有``算法导论''那个以数组形式表现的数据结构才是heap。
STL中的heap, python中的heapq都是这样实现的。

然而广义的heap定义不是这样的，这种数据结构严格来讲叫做 implicit heap by array。
Knuth在TAOCP中，给出了leftist heap，我觉得把这几种heap结合起来看，就会有更全面的理解：
1. implicit heap by array (CLRS中介绍的heap)
2. binomial heap (CLRS中介绍的另一种heap)
3. leftist heap (TAOCP中介绍的heap，可以用FP实现)
4. Fibonacci heap (CLRS中有介绍，理论意义大于实际意义）
5. Pairing heap (理论证明尚不完善，绝大多数FP实现使用这种heap)

我现在越来越忙，感觉实在没法向过去那样不断post东西出来。只能尽力而已了。

--
刘

jigsaw

unread,

Sep 23, 2010, 6:41:43 PM9/23/10

to pon...@googlegroups.com

md好久没写scheme，手生了。。。

Scheme:

(define (make-root x)

(cons x '()))

(define (find-min root)

(car root))

(define (meld n m)

(cond ((null? n) m)

((null? m) n)

((if (> (find-min n) (find-min m))

(append (make-root (find-min m))

(list n)

(cdr m))

(append (make-root (find-min n))

(list m)

(cdr n))))))

(define (insert root x)

(meld root (make-root x)))

(define (delete-min root)

(define (delete-min-iter result root)

(cond ((null? root) result)

((null? (cdr root)) (meld result (car root)))

((let ((m (car root))

(n (cadr root)))

(delete-min-iter (meld m n) (cddr root))))))

(delete-min-iter '() (cdr root)))

;(define xxx (insert (insert (insert (insert (make-root 10) 9) 12) 7) 19))

;(delete-min xxx)

;(delete-min (delete-min (delete-min (delete-min (delete-min xxx)))))

2010/9/19 liuxinyu <liuxi...@gmail.com>:

liuxinyu

unread,

Sep 25, 2010, 5:22:53 AM9/25/10

to TopLanguage

Hi,

今天抽出点时间。

我们慢慢来，先复习下算法导论（CLRS）中的binary heap。
导论课本中给出的binary heap实际是recursive实现的。如果大家翻看STL的heap或者python的heapq，就会发现实现上
略有差别。尤其是heap sort的实现。

正如在前面post中提到的，我们打算把各种heap统一对比来看。所以这里给出一个区别于CLRS的heap定义（广义定义）：
一个heap（堆）是这样一个数据结构，
* 它有一个顶，保存着最小（大）值；
* 它可以支持弹出堆顶元素的操作，弹出后的堆仍然保持的堆的特性，也就是堆顶元素最小（大）；
* 它支持插入操作，插入一个新值后，仍然保持堆的特性，也就是堆顶元素最小（大）；

还有其他一些可选操作，如：
* 可以把两个堆合并(merge)成一个堆，合并后，仍然保持堆的特性，也就是堆顶元素最小（大）

根据这个定义，我们发现可以用多种更基本的数据结构来实现堆，只要它满足上述的定义即可。其中最简单直接的一种解法是树，只要保持树根永远最小（大）就
可以了。
注意，这里我没有说一定是二叉树，凡是用二叉树实现的heap，称之为binary heap。

CLRS中第6章给出的binary heap，实际上严格讲应该叫做implicit binary heap by array，也就是用
array实现的隐式heap。
其本质思想是把array的index对应到二叉树的节点上的一个map，这个map用c++和python这样以0作为数组起始index的语言来
说，可以定义如下（这里用python)
def parent(i):
return (i+1)//2-1

def left(i):
return 2*i+1

def right(i):
return 2*(i+1)
注意，这里的定义和CLRS的有区别，原因是CLRS的数组从1开始。//表示取整除法。当然，在实际代码中，为了增加速度，可以使用移位运算符>>或
<<，如C++/STL和python
的heapq都是如此。

这一map定义了如何从第i个数组元素，找到其对应的二叉树上的父节点，左子节点，和右子节点。
举个例子，数组[16, 4, 10, 14, 7, 9, 3, 2, 8, 1]对应了二叉树：
((((2 14 8)) 4 ((1 7 .))) 16 (9 10 3)) 这里的二叉树使用in-order来表示。

为了generic，我抽象了两个比较运算：
MIN_HEAP = lambda a, b: a < b
MAX_HEAP = lambda a, b: a > b
也就是小于运算用于min-heap(小顶堆)，大于运算用于max-heap（大顶堆）

有了这个隐式数组表示，就可以实现最核心的heapify函数了，这个算法用递归的表达非常直观：
如果数组中第i个元素对应到某个子树的树根节点，那么这个子树如果满足heap的性质，则树根节点必须子树中所有元素中最小（大）的。
为此：
平凡情况：若此节点是一个叶子节点，退出；
一般情况：检查此节点的左右子节点，找出最小（大）的一个，和此节点交换，然后递归heapify交换后的子树。

CLRS上给出的MAX-HEAPIFY算法就是这个递归算法的 pseudo code 的描述，可以把它改为纯imperative实现，消除这个
递归，如下：

def heapify(x, i):
n = len(x)
while True:
l = left(i)
r = right(i)
largest = i
if l < n and x[l] < x[i]:
largest = l
if r < n and x[r] < x[largest]:
largest = r
if largest != i:
(x[i], x[largest])=(x[largest], x[i])
i = largest
else:
break

注意，这里用的是小顶堆。为了通用，我们可以引入less_p，就是把用来判断大小的关系运算抽象出来：
# min-heapify by default
def heapify(x, i, less_p = MIN_HEAP):
n = len(x)
while True:
l = left(i)
r = right(i)
largest = i
if l < n and less_p(x[l], x[i]):
largest = l
if r < n and less_p(x[r], x[largest]):
largest = r
if largest != i:
(x[i], x[largest])=(x[largest], x[i])
i = largest
else:
break

为了验证其正确性，我们可以使用CLRS的figure 6.2来测试下：
class TestHeap:
def __init__(self):
print "Implicit binary heap by array testing"

def run(self):
self.test_heapify()

def __assert(self, p):
if p:
print "OK"
else:
print "Fail!"

def test_heapify(self):
# CLRS Figure 6.2
l = [16, 4, 10, 14, 7, 9, 3, 2, 8, 1]
heapify(l, 1, MAX_HEAP)
print l
self.__assert(l == [16, 14, 10, 8, 7, 9, 3, 2, 4, 1])

if __name__ == "__main__":
TestHeap().run()

运行的话，此程序输出：
Implicit binary heap by array testing
[16, 14, 10, 8, 7, 9, 3, 2, 4, 1]
OK

未完，待续
--
刘
https://sites.google.com/site/algoxy/home

liuxinyu

unread,

Sep 25, 2010, 5:39:40 AM9/25/10

to TopLanguage

Hi,

继续，有了最核心的heapify，把一个无序的array，转变成implicit heap by array就非常容易了。
我们注意，由于是binary heap，所以树的每层的节点数目是：
1, 2, 4, 8, ... 2^(i-1)

对任何叶子节点，做heapify不会有任何作用个，所以我们可以忽略全部叶子节点。
为此，针对长度为n的数组，我们只要从第2/n个元素，到第1个元素，依次做一遍heapify就可以了：

# build min heap by default
def build_heap(x, less_p = MIN_HEAP):
n = len(x)
for i in reversed(range(n//2)):
heapify(x, i, less_p)

现在我们来verify一下，case选用CLRS的figure 6.3：
def test_build_heap(self):
# CLRS Figure 6.3
l = [4, 1, 3, 2, 16, 9, 10, 14, 8, 7]
build_heap(l, MAX_HEAP)

print l
self.__assert(l == [16, 14, 10, 8, 7, 9, 3, 2, 4, 1])

运行输出：

Implicit binary heap by array testing
[16, 14, 10, 8, 7, 9, 3, 2, 4, 1]
OK
[16, 14, 10, 8, 7, 9, 3, 2, 4, 1]
OK

下面我们实现heap的另外一个重要操作delete min。也就是弹出堆顶元素。
这个操作看似简单，只要返回，并移除数组的首元素即可，但是为了弹出后保持堆的性质，必须对剩余元素做一次heapify
# default apply to min-heap
def heap_pop(x, less_p = MIN_HEAP):
top = x.pop(0)
if x!=[]:
heapify(x, 0, less_p)
return top

有了这些铺垫，就可以利用heap来实现排序了。
CLRS上的思路是，建立大顶堆，然后每次把堆顶元素交换到数组末尾。这一思路非常好。
但是，为了统一覆盖所有不同底层数据结构的各种堆，我们把堆排序策略变一下（当然效率没有CLRS上的那个快）。
1, 建立空小顶堆，将待排序元素全部放到（例如使用insert）堆上；
2，弹出堆顶元素（也就是执行一次delete min），放到结果中；
3，重复步骤2直到堆为空。

所以，结果序列中依次被放入了，最小，次小，次次小....的全部元素，从而实现了排序。
python中的heapq的测试程序中采用了类似的方法。
# default heap sort less to greater
def heap_sort(x, less_p = MIN_HEAP):
res = []
build_heap(x, less_p)
while x!=[]:
res.append(heap_pop(x, less_p))
return res

我们看一下测试代码和结果：
def test_heap_sort(self):
# CLRS Figure 6.4
l = [16, 14, 10, 8, 7, 9, 3, 2, 4, 1]
res = heap_sort(l)
print res
self.__assert(res == [1, 2, 3, 4, 7, 8, 9, 10, 14, 16])

运行结果为：
[1, 2, 3, 4, 7, 8, 9, 10, 14, 16]
OK

未完待续

--
刘
https://sites.google.com/site/algoxy/home

liuxinyu

unread,

Sep 25, 2010, 5:52:29 AM9/25/10

to TopLanguage

Hi,

针对排序这种情况，我们当然可以利用build_heap，一次性把堆建好，可是针对实际中的更多情况。
一个堆通常允许添加新的元素，或者改变其中某些元素的值。

我们知道堆是实现priority queue的一个手段直接，一个带优先级的队列的应用场景包括作业调度。
我们可以把一个优先级很高的作业插入队列中，或者提高一个作业的优先级，是的下次schedule的时候，
某个作业有限得到执行的机会。

为此我们需要实现堆的插入操作：
# insert a key to min-heap by default
def heap_insert(x, key, less_p = MIN_HEAP):
i = len(x)
x.append(key)
heap_fix(x, i, less_p)

def heap_fix(x, i, less_p = MIN_HEAP):
while i>0 and less_p(x[i],x[parent(i)]):
(x[parent(i)], x[i]) = (x[i], x[parent(i)])
i = parent(i)

采用array实现堆时，这个算法的思路是，把新的元素加到最尾，于是其变为一个叶子节点。
然后我们自底向上（从叶子向根）回溯，发现其不满足小顶堆的性质，就交换其父节点和当前节点。
这一过程很形象的像是一个物体从水底浮上来的过程。

注意，这个思路和CLRS的略有不同（本质是一致的），CLRS插入一个负无穷节点，然后调用increase key来做。
其实increase key,对于小顶堆来说是decrease key，它可以用类似的向上浮动的算法实现如下：
def heap_decrease_key(x, i, key, less_p = MIN_HEAP):
if less_p(key, x[i]):
x[i] = key
heap_fix(x, i, less_p)

最后，我给一个利用堆求top k最大元素的算法：
def top_k(x, k, less_p = MIN_HEAP):
build_heap(x, less_p)
return [heap_pop(x, less_p) for i in range(min(k, len(x)))]

imperative的先到这里，我后面会给出纯FP的heap，包括
binomial heap, leftist heap, 和pairing heap。
不过时间不保证，各位可能要多等一阵。

--
刘
https://sites.google.com/site/algoxy/home

On Sep 25, 5:39 pm, liuxinyu <liuxiny...@gmail.com> wrote:
> Hi,
>

liuxinyu

unread,

Sep 26, 2010, 4:20:35 AM9/26/10

to TopLanguage

Hi,

今天我们进入Leftist Heap的世界。
翻看一下Rossetta code中heap sort的各种语言实现：
http://rosettacode.org/wiki/Sorting_algorithms/Heapsort
除了Haskell外，全部其他语言都是用了Binary Heap，可见其影响之大。
R.W. Floyd给出的这个binary heap的sort方法，将性能提高了25%。
http://en.wikipedia.org/wiki/Heapsort

正因如此，在实现一个真正的FP的heap时，在列表的随机访问的复杂度退化成O(N)时，就不得不使得我们思考，是否存在其他数据结构？
基于我昨天给出的广义heap定义，不使用implicit heap by array这个思路，树就是我们非常自然想到的一个底层数据结构。

Leftist Heap是使用Leftist tree来实现的heap. Leftist tree由C. A. Crane在1972年给出，
Knuth在其TAOCP中有所介绍。
我这里先给出一些网络上能找到的资料：
WIKI:
http://en.wikipedia.org/wiki/Leftist_tree
on line book:
http://www.brpreiss.com/books/opus4/html/page362.html#SECTION0012300000000000000000

liuxinyu

unread,

Sep 28, 2010, 5:10:33 AM9/28/10

to TopLanguage

Hi,

今天继续Leftist Heap

既然我们可以用array来implicit表示二叉树，从而实现Heap，那么可不可以直接用显示的（explicit）的二叉树来实现Heap
呢？
答案是可以的，但是有几个难题必须得解决。

第一个难题是heap pop，也就是delete Min操作。考虑下面的显示二叉树：

(left value right)

其中value小于所有left中节点上的值，且value小于所有right中节点上的值。

那么heap pop后，value被弹出去了，剩下了两个子树left和right，我们需要merge left和right从而得到一个新的二叉
树，且其根节点的值为最小值，以满足heap的性质。

如何merge left和right呢？我们知道left和right也是用显示二叉树表示的heap，有两个平凡解可以立刻的出来：
merge NIL right = right
merge left NIL = left
难点是，如果left和right都不为空怎么merge。
由于left和right都满足heap的性质，故而left和right的顶部的值分别是各自的最小值。我们只要比较这两个值，找一个更小的作为新的
顶部，然后把较大值所在的树的merge到一个子树中即可。举个例子，假设left = (a x b); right = (a' y b')，其中
a, a', b, b'分别为子树，如果x<y我们可以有两个备选方案：
方案一：
(merge(a, right), x b)
方案二：
(a, x (merge b, right))
两个方案都可以，为了简单，我们一律在右侧子树上做进一步的merge。但是这个简单方案有个缺点：就是树可能变得极不平衡。如果树非常容易退化成瘸腿
的链表，
那么复杂度就从O(lg N)退化成O(N)了。

为了防止这种情况，必须想出一个保护树平衡的办法，Leftist tree提出了这样的一个思路。首先我们需要为每个节点定义一个rank，rank
的含义是，这个节点到最近的一个叶子节点的距离。例如树：
((6, 5, .), 4, 8)
其中根节点4到最近的叶子节点8的的距离是2（我们把NIL定义为叶子节点）。节点6和8都是叶子节点，所以其rank是1，节点5虽然有左侧子树，但
是其右侧为NIL，所以其道最近的叶子节点的距离仍然是1，故而rank为1.

有了rank，就可以定义这样的一个策略:
1: 每次我们都merge到右侧;得到一个新的右侧子树，其rank为RR
2：比较左右子树的rank, 若左侧子树的rank为RL，比较RL, RR，若RL<RR，则交换左右子树。

为什么这个策略能够使得树逐渐平衡呢？这是因为我们一般把新的树merge到右侧，于是右侧的rank逐渐增长，当期超过左侧的rank时，再向右侧增
加就不平衡了，于是我们交换左右。这样长期下来，就能够保证左右平衡。

现在我们来看看Haskell下这个merge是怎么实现的。

首先我们给出Leftist Heap的定义：
data LHeap a = E -- Empty
| Node Int a (LHeap a) (LHeap a) -- rank, element, left,
right
deriving (Eq, Show)

然后我们定义rank函数如下：
rank::LHeap a -> Int
rank E = 0
rank (Node r _ _ _) = r

接下来就可以定义merge了：
merge::(Ord a)=>LHeap a -> LHeap a -> LHeap a
merge E h = h
merge h E = h
merge h1@(Node _ x l r) h2@(Node _ y l' r') =
if x < y then makeNode x l (merge r h2)
else makeNode y l' (merge h1 r')

其中makeNode比较左右的rank,并可能交换左右子树：
makeNode::a -> LHeap a -> LHeap a -> LHeap a
makeNode x a b = if rank a < rank b then Node (rank a + 1) x b a
else Node (rank b + 1) x a b

略微解释下为何rank要+1，
若左右子树left, right中较小的rank为R，则新树(left, x, right)的rank，由于在顶部增加了一个x，故而其rank
必然为R+1。

有了merge，deleteMin就可以轻易实现了：
deleteMin :: (Ord a) => LHeap a -> LHeap a
deleteMin (Node _ _ l r) = merge l r

接下来的难题由于有了merge迎刃而解。
难题二，在implicity heap by array的insert中，我们把新节点加入到叶子，然后逐渐向上比较，浮动。但是显示二叉树不遍历
的话
就无法找出最后一个叶子的位置。解决方法是用merge：

insert::(Ord a)=> LHeap a -> a -> LHeap a
insert h x = merge (Node 1 x E E) h

当向heap中增加一个新值时，我们首先利用这个新值，建立一个只有一个节点的树，然后把它merge到heap中。

为了方便，我定义了一个fromList函数，可以把一个list变成heap:
fromList :: (Ord a) => [a] -> LHeap a
fromList xs = foldl insert E xs

原理就是一次把list中的每个元素加入到heap中。

接下来就可以实现heap sort了。
heapSort :: (Ord a) => [a] -> [a]
heapSort = hsort . fromList where
hsort E = []
hsort h = (findMin h):(hsort $ deleteMin h)

我们把带排序序列变成heap，然后每次弹出最小的到结果即可。

测试一下：
testHeapSort = heapSort [16, 14, 10, 8, 7, 9, 3, 2, 4, 1]

输出：
*LeftistHeap> testHeapSort
[1,2,3,4,7,8,9,10,14,16]

我估计我是头一个以显示二叉树实现Heap的角度介绍Leftist Heap的。有些猜测（例如平衡策略）可能想当然了。欢迎大家指正。

先到这里，后面我介绍binomial heap。
--
刘
https://sites.google.com/site/algoxy/home

> on line book:http://www.brpreiss.com/books/opus4/html/page362.html#SECTION00123000...
>
> 未完待续
>
> --
> 刘https://sites.google.com/site/algoxy/home

机械唯物主义 : linjunhalida

unread,

Sep 28, 2010, 5:12:54 AM9/28/10

to pon...@googlegroups.com

有没有haskell-cn群组呢? 记得有erlang群组.

感觉有很多有意思的问题可以以haskell的方式来交流.

liuxinyu

unread,

Sep 29, 2010, 6:06:35 AM9/29/10

to TopLanguage

刚才google搜了下：
http://groups.google.com/group/haskellcn

我没用过，不知道其质量如何。

On Sep 28, 5:12 pm, 机械唯物主义 : linjunhalida <linjunhal...@gmail.com>
wrote:
> 有没有haskell-cn群组呢? 记得有erlang群组.
> 感觉有很多有意思的问题可以以haskell的方式来交流.

liuxinyu

unread,

Sep 29, 2010, 9:41:44 PM9/29/10

to TopLanguage

Hi,

今天继续介绍其他Heap。

一个自然的问题是，既然已经可以用数组实现隐式的binary heap，也可以用显示的特殊二叉树Leftist tree实现binary
heap。并且这两种heap sort的算法复杂度都达到了O(N*lg N)，为啥还要搞其他heap？

这是因为heap的用途，不仅仅在于heap sort，heap sort以及寻找N-largest(smallest)仅仅以heap的应用之
一。heap可以用来实现priority queue，更大的应用在于graph中。所以我们希望能够尽量提高一些heap常见操作的性能

我们来观察一下binary heap基本操作中一些算法复杂度：

merge:
implicit array: O(N)
leftist heap: O(lg N)

insert:
implicit array: O(lg N)
leftist heap: O(lg N)

min:
implicit array: O(1)
leftist heap: O(1)

delete min(pop)
implicit array: O(lg N)
leftist heap: O(lg N)

为了，很多人进行了多年的努力，目前最好的结果是，除了delete min是O(lg N)外，其余操作都可以达到O(1)。这是一个巨大的成就。
因此了解这个成就背后的故事，是个很有意义的事情。

先到这里，回头再写。

--
刘
https://sites.google.com/site/algoxy/home

On Sep 28, 5:10 pm, liuxinyu <liuxiny...@gmail.com> wrote:
> Hi,
>

liuxinyu

unread,

Sep 29, 2010, 10:36:38 PM9/29/10

to TopLanguage

Hi,

刚才仔细寻找了一下Leftist Tree的软肋。

几乎和Quick Sort一样，面对有序序列，Leftist Tree迅速退化成了瘸腿链表。例如：

*LeftistHeap> fromList [16, 10, 14, 8, 9, 7, 2, 3, 1, 4]
Node 2 1 (Node 2 2 (Node 1 7 (Node 2 8 (Node 2 10 (Node 1 16 E E)
(Node 1 14 E E)) (Node 1 9 E E)) E) (Node 1 3 E E)) (Node 1 4 E E)

看起来很平衡

*LeftistHeap> fromList [16, 14, 10, 8, 7, 9, 3, 2, 4, 1]
输出：
Node 1 1 (Node 2 2 (Node 1 3 (Node 2 7 (Node 1 8 (Node 1 10 (Node 1 14
(Node 1 16 E E) E) E) E) (Node 1 9 E E)) E) (Node 1 4 E E)) E) E)
(Node 1 3 E E)) (Node 1 4 E E)
结果已经非常接近瘸腿链表了

再看极端情况：

*LeftistHeap> fromList [1..10]
Node 3 1 (Node 2 3 (Node 1 4 E E) (Node 1 5 E E)) (Node 2 2 (Node 2 7
(Node 1 8 E E) (Node 1 9 E E)) (Node 1 6 (Node 1 10 E E) E))

完全退化成链表了。

--
刘
https://sites.google.com/site/algoxy/home

liuxinyu

unread,

Sep 30, 2010, 3:40:39 AM9/30/10

to TopLanguage

Hi,

开始在Algoxy上写了。
https://sites.google.com/site/algoxy/bheap

源代码在GPL下开放。文章在FDL下开放。

liuxinyu

unread,

Oct 9, 2010, 2:13:26 AM10/9/10

to TopLanguage

Hi,

今天介绍Binomial Heap，翻译成中文是"二项式堆"。
在《算法导论》的第19章，有对Binomial Heap的介绍。这一章的最难点就是把两个Binomial Heap中的Binomail
Tree按照类似 merge sort的算法，
链接成一个link list，然后进行union操作，里面涉及4个复杂的case。

在Chris Okasaki的《Purely Functional data structures》中给出了用纯FP实现Binomial
heap的思路，对比CLRS中的imperative思路，这个
方法特别的简洁。

首先介绍Binomial Heap的概念：

Binomial Heap看起来比Leftist Heap和Implicit Binary heap by array都复杂，然而他却在
merge操作上非常占有优势，binary heap为了维持
Heap property, 通常merge的复杂度达到O(N)，而Binomial Heap可以有O(lg N)。

之所以叫二项式堆，是因为他和二项式定理的紧密联系，我们回忆一下著名的杨辉三角：
1
1 1
1 2 1
1 3 3 1
1 4 6 4 1
...

Binomial Heap是由一些列Binomail Tree组成的，Binomial Tree的定义是递归的，非常有趣：

一个Rank为N的Binomail Tree，是由两个Rank为N-1的Binomail Tree组成的，组成方法是把root值较小的一个作为
另一个tree的第一个子节点；
Rank为0的Binomail tree只有一个节点。

有了这个递归定义，就可以列出rank为0,1,2,...的binomail tree的形状了：

x, (x, x), (x, (x, x), x), (x, (x, (x, x), x), (x, x), x), ...

如果看这些x和括号不爽，可以参考CLRS的图19.2(b)或者http://en.wikipedia.org/wiki/
Binomial_heap中的图。

我们看binomail tree中任何一层的节点数恰好符合杨辉三角形。例如rank 为3的tree，root只有1个节点，第二层有3个节点，第
3层有3个节点，第4层有1个节点。
再隐身一步，任何rank为r的binomail tree，由于符合二项式定理的展开项，必然有2^r个节点。

一个Binomail Heap，是由若干Binomail tree，组成的列表，其中任何一个rank的tree最多只有一个。
这个有趣的定义，会引出一个有趣的结论：含有N个节点的Binomail Heap，如果把N表示成二进制形式a0,a1,a2, ... am，
则若ai为0，则没有rank为2^i的tree，若为1，则有。

例如含有5个元素的Binomial Heap，N=5,写成二进制是（LSB)101（MSB）则其含有2个Binomial tree，一个
rank为0，另一个rank为2

先到这里，待续
--
刘
https://sites.google.com/site/algoxy

liuxinyu

unread,

Oct 9, 2010, 3:06:54 AM10/9/10

to TopLanguage

Hi,

现在进入代码时间：

针对Binomail Tree的定义，我们给出其Haskell的定义代码：
data BiTree a = Node { rank :: Int
, root :: a
, children :: [BiTree a]} deriving (Eq, Show)

一个Binomail Tree包含3个部分，一个是rank，一个是root上的值（可比较），另一个是其子树列表。
并且规定，一个Binomail tree的root上的值，是所有元素中的最小（大）值。

而Binomail Heap则仅仅是Binomail Tree的一个列表：
type BiHeap a = [BiTree a]

这里有2个隐式限制条件：
1，这个列表中的tree的rank单调增
2, 任何rank只最多出现一次

如果标记rank为k的Binomial Tree为B_{k}，则递归定义中将两个B_{k}组成一个B_{k+1}的描述可用下面代码来表示：
-- Link 2 trees with SAME rank R to a new tree of rank R+1
link :: (Ord a) => BiTree a -> BiTree a -> BiTree a
link t1@(Node r x c1) t2@(Node _ y c2) =
if x<y then Node (r+1) x (t2:c1)
else Node (r+1) y (t1:c2)

我们比较两个树的根节点元素的大小，将较小的一个作为新的根，然后将较大的一个作为第一个儿子，并且将rank增加1.

有了link，并且由于Binmial Heap是rank单调增的一系列tree的列表组成的这个事实，
就可以实现把一个rank不小于Binomail Heap中第一个tree的新tree插入到其中的函数：
-- Insert a Binomial tree into a Binomial heap
-- Implicit condition: the rank of tree is either lower or equal to
the first
-- element tree in the heap.
insertTree :: (Ord a) => BiHeap a -> BiTree a -> BiHeap a
insertTree [] t = [t]
insertTree ts@(t':ts') t = if rank t < rank t' then t:ts
else insertTree ts' (link t t')

方法很简单，先判断，待插入的树的rank是否小于heap中的第一个tree，如果小，则直接添加到列表开头。
否则说明两个rank相等，我们用link生成一个rank增加1的新tree，然后递归插入到剩余列表中。
注意这个link出的新tree，依然满足rank < or =的隐式限制条件。这是由于link只能将rank增1，而列表中tree的rank都
是非负单调增的整数。

有了这个辅助函数insert就非常方便实现了：
insert :: (Ord a) => BiHeap a -> a -> BiHeap a
insert h x = insertTree h (Node 0 x [])

当试图把一个新值插入到heap中，我们只需要用这个新值建立一个rank为0的只有一个节点的树，然后将其插入到heap中即可。

现在来测试一下：

首先定义个辅助函数，将一个列表变成一个binomial heap。
fromList :: (Ord a) => [a] -> BiHeap a
fromList xs = foldl insert [] xs

fromList [16, 14, 10, 8, 7, 9, 3, 2, 4, 1]将输出：
[Node 1 1 [Node 0 4 []],Node 3 2 [Node 2 8 [Node 1 14 [Node 0 16
[]],Node 0 10 []],Node 1 7 [Node 0 9 []],Node 0 3 []]]

把它画在纸上，大致如下：

(1, 4) --> (2, (8, (14, 6), 10), (7, 9), 3)

其由两个tree组成，一个rank为1，另外一个rank为3.

merge的定义非常像merge sort:
merge:: (Ord a) => BiHeap a -> BiHeap a -> BiHeap a
merge ts1 [] = ts1
merge [] ts2 = ts2
merge ts1@(t1:ts1') ts2@(t2:ts2')
| rank t1 < rank t2 = t1:(merge ts1' ts2)
| rank t1 > rank t2 = t2:(merge ts1 ts2')
| otherwise = insertTree (merge ts1' ts2') (link t1 t2)

现在思考deleteMin，由于heap由一系列binomail tree组成，每个tree的根节点都是最小（大）值，但是各个tree之间的根
节点的大小并不确定。
所以我们只要把heap列表中所有tree的根节点比较一下，找出最小的一个就可以了。

接下来的问题就棘手了。假设某个heap是由如下的binomail tree组成： b0, b1, ..., bm，并且bp的根节点最小。那么
delete min后，
bp的根节点就被拿走了，bp剩下了一堆儿子，共有p个。他们恰好也都是Binomial tree，rank分别是p-1, p-2, ...
2, 1， 0。
由于我们有O(lg N)的merge，所以一个可行的解决方案就是，把剩下的这p个儿子颠倒变成另外一个heap，然后和其余的tree merge
到一起。

根据这个思路，我们首先实现一个找到含有最小根节点的binomial tree的函数：
removeMinTree :: (Ord a) => BiHeap a -> (BiTree a, BiHeap a)
removeMinTree [t] = (t, [])
removeMinTree (t:ts) = if root t < root t' then (t, ts)
else (t', t:ts')
where
(t', ts') = removeMinTree ts

这个函数把含有最小根节点的树和剩余的其他树的列表，以一个pair返回。

有了这个辅助函数，findMin就非常容易了：
findMin :: (Ord a) => BiHeap a -> a
findMin = root . fst. removeMinTree

delete的实现就是摘除最小根节点，然后儿子倒序merge回去：
deleteMin :: (Ord a) => BiHeap a -> BiHeap a
deleteMin h = merge (reverse $ children t) ts where
(t, ts) = removeMinTree h

这样一个binomial heap就基本完成了，作为副产品，我们测试一下heap sort，测试代码和Leftist Heap的一样：

heapSort :: (Ord a) => [a] -> [a]
heapSort = hsort . fromList where

hsort [] = []

hsort h = (findMin h):(hsort $ deleteMin h)

heapSort [16, 14, 10, 8, 7, 9, 3, 2, 4, 1]

输出：
[1,2,3,4,7,8,9,10,14,16]

今天先告一段落。就到这里吧
--
刘
https://sites.google.com/site/algoxy

liuxinyu

unread,

Oct 11, 2010, 2:26:06 AM10/11/10

to TopLanguage

Hi,

多说一句，Binary Heap还应该加上Splay Heap， Splay Heap被发现是最快的Heap之一。
过一阵我抽空介绍。

另外Binomial Heap的insert和merge看似都是O(lg N)的复杂度，但是注意这是worst-case的复杂度，而
amortized的复杂度是O(1)。

--
刘
https://sites.google.com/site/algoxy/home

liuxinyu

unread,

Oct 19, 2010, 6:11:22 AM10/19/10

to TopLanguage

Hi,

快速说一下Splay Tree，Splay Tree的数据结构实际上就是一个排序二叉树BST。

但是BST的最大问题是平衡性，也就是说，如果插入一个基本有序的序列，生成的BST实际上是一个瘸腿树，几乎退化成了单链表。
所以O(Lg N)的很多树操作，退化成了O(N)

为了解决平衡性问题，人们给出了很多解法，其中著名的是红黑树，AVL树等等，
红黑树的FP实现我前阵给出过：
https://sites.google.com/site/algoxy/rbtree

Splay Tree提供了另外一个思路来解决平衡性问题。在所有access树节点时，我们会进行traverse，如果我们需要访问x，
但是从根节点起连续向左侧（右侧）访问很深的深度才访问到x，我们认为这棵树很可能不平衡，于是我们通过splay操作对
树进行旋转，使得x更加靠近根节点。（甚或干脆把x设置为根节点）。经过一系列splay操作后，这棵树就逐渐趋于平衡。

WIKI上有一个C的实现：
http://en.wikipedia.org/wiki/Splay_tree

在FP实现中，Okasaki给出了一个解决方案，当我们连续向左侧(右侧)访问了两个节点后，我们就做一次旋转。
首先是树的定义，和普通二叉树没有区别：
data STree a = E -- Empty
| Node (STree a) a (STree a) -- left, element, right
deriving (Eq, Show)

然后我们看核心函数partition，它接受2个参数，一个tree，和一个值y。该函数把树分成两部分，
一部分包含所有比y小的值全在一部分，所有大的在另一部分。在partition过程中，会进行splay操作。
-- partition the tree in two parts based on a pivot value,
-- less part contains all elements < pivot
-- bigger part contains all elements >= pivot
partition :: (Ord a) => STree a -> a -> (STree a, STree a)
partition E _ = (E, E)
partition t@(Node l x r) y
| x < y =
case r of
E -> (t, E)
Node l' x' r' ->
if x' < y then
let (small, big) = partition r' y in
(Node (Node l x l') x' small, big)
else
let (small, big) = partition l' y in
(Node l x small, Node big x' r')
| otherwise =
case l of
E -> (E, t)
Node l' x' r' ->
if y < x' then
let (small, big) = partition l' y in
(small, Node l' x' (Node r' x r))
else
let (small, big) = partition r' y in
(Node l' x' small, Node big x r)

有了partition就可以方便实现insert了：
insert :: (Ord a) => STree a -> a -> STree a
insert t x = Node small x big where (small, big) = partition t x

现在如果用splay作为底层数据结构实现Heap，我们必须解决findMin和deleteMin
由于是BST，所以我们只要不断寻找左子树就得到了min:

findMin :: STree a -> a
findMin (Node E x _) = x
findMin (Node l x _) = findMin l

delete时，我们不妨进行一下splay操作：

deleteMin :: STree a -> STree a
deleteMin (Node E x r) = r
deleteMin (Node (Node E x' r') x r) = Node r' x r
deleteMin (Node (Node l' x' r') x r) = Node (deleteMin l') x' (Node r'
x r)

现在就可以按照一般的heap来实现heap sort了：

fromList :: (Ord a) => [a] -> STree a
fromList xs = foldl insert E xs

heapSort :: (Ord a) => [a] -> [a]
heapSort = hsort . fromList where

hsort E = []

hsort h = (findMin h):(hsort $ deleteMin h)

我们测试下：
testHeapSort = heapSort [16, 14, 10, 8, 7, 9, 3, 2, 4, 1]

程序输出：
*SplayHeap> testHeapSort
[1,2,3,4,7,8,9,10,14,16]

我个人觉得Okasaki的方法存在pattern matching的更加简洁的解法，目前还在思考中，晚些时候我可能给出其他解法。

--
刘
https://sites.google.com/site/algoxy/home

liuxinyu

unread,

Oct 21, 2010, 10:11:13 PM10/21/10

to TopLanguage

Hi,

受到Okasaki在红黑树中的思路启发，早上draft了一个pattern matching的解法。

一共有3中类型6个pattern, 每个类型都有两个左右对称的case。
3种类型分别为
zig-zig;
zig-zag;
zig
样子请参考WIKI:
http://en.wikipedia.org/wiki/Splay_tree

代码如下：
-- splay by pattern matching
splay :: (Eq a) => STree a -> a ->STree a
-- zig-zig
splay t@(Node (Node (Node a x b) p c) g d) y =
if x == y then Node a x (Node b p (Node c g d)) else t
splay t@(Node a g (Node b p (Node c x d))) y =
if x == y then Node (Node (Node a g b) p c) x d else t
-- zig-zag
splay t@(Node (Node a p (Node b x c)) g d) y =
if x == y then Node (Node a p b) x (Node c g d) else t
splay t@(Node a g (Node (Node b x c) p d)) y =
if x == y then Node (Node a g b) x (Node c p d) else t
-- zig
splay t@(Node (Node a x b) p c) y = if x == y then Node a x (Node b p
c) else t
splay t@(Node a p (Node b x c)) y = if x == y then Node (Node a p b) x
c else t
-- otherwise
splay t _ = t

有了splay后，就可以实现insert了：

-- insert by using pattern matching

insert' :: (Ord a) => STree a -> a -> STree a

insert' E y = Node E y E
insert' (Node l x r) y
| x < y = splay (Node (insert l y) x r) y
| otherwise = splay (Node l x (insert r y)) y

我大致smoke test了一下。能run，但是没有仔细验证其正确性。

--
刘
http://sites.google.com/site/algoxy/home

On Oct 19, 6:11 pm, liuxinyu <liuxiny...@gmail.com> wrote:
> Hi,
>

> 刘https://sites.google.com/site/algoxy/home

liuxinyu

unread,

Oct 21, 2010, 10:19:29 PM10/21/10

to TopLanguage

Hi，

刚才翻看了一下Haskell标准库hackage

http://hackage.haskell.org/packages/archive/TreeStructures/0.0.1/doc/html/src/Data-Tree-Splay.html#SplayTree

发现其解法也是Okasaki给出的策略：
连续两次向左（右）则进行旋转。

似乎还没有利用pattern matching的解法和正确性测试

--
刘
http://sites.google.com/site/algoxy/home

> 刘http://sites.google.com/site/algoxy/home

liuxinyu

unread,

Oct 24, 2010, 4:21:55 AM10/24/10

to TopLanguage

Hi,

快速写了一个smoke test case:

import System.Random -- only for testing

lookup' :: (Ord a) => STree a -> a -> STree a
lookup' E _ = E
lookup' t@(Node l x r) y
| x == y = t
| x > y = splay (Node (lookup' l y) x r) y
| otherwise = splay (Node l x (lookup' r y)) y

testSplay = do
xs <- sequence (replicate 100 (randomRIO(1, 10)))
putStrLn $ show (foldl lookup' t xs)
where
t = foldl insert' (E::STree Int) [1..10]

其中insert中要把x<y 改成x>y (sorry for the mistake)
GHCI中跑出的结果得到了一个非常平衡的BST，我对程序的正确性现在比较有信心了：）

*SplayHeap> testSplay
Node (Node E 1 (Node (Node E 2 E) 3 (Node (Node E 4 (Node E 5 E)) 6
(Node E 7 E)))) 8 (Node (Node E 9 E) 10 E)

有空再分析效率问题。

--
刘
http://sites.google.com/site/algoxy/home

> 刘http://sites.google.com/site/algoxy/home

liuxinyu

unread,

Nov 9, 2010, 5:20:52 AM11/9/10

to TopLanguage

Hi,

给一个C++版本的implicit binary heap by array.

STL中带有一个heap,所有的函数都是基于iterator的。这样的好处是用iterator解耦了container，STL的
container可以用random access iterator，
C-array可以用pointer。缺点是算法的可读性下降。

这里给出两种方案，代码都酷似教科书中的伪代码，可读性有一定增强。

方案一，用reference + size取代iterator，由于binary heap by array的底层数据结构要么是STL
container，要么是c array。故而简化用
reference和size来代表

方案二，用range，boost提供了一个range, 为了方便不了解boost的读者，我给一个化简的range。

首先抽象min-heap和max-heap如下：
template<class T> struct MinHeap: public std::less<T>{};
template<class T> struct MaxHeap: public std::greater<T>{};

然后使用template meta programming来获取容器或者数组的值的类型：
template<class T> struct ValueType{
typedef typename T::value_type Result;
};

template<class T> struct ValueType<T*>{
typedef T Result; // c-pointer type
};

template<class T, unsigned int n> struct ValueType<T[n]>{
typedef T Result; // c-array type
};

下面是Range的简化定义：

template<class RIter> // random access iterator
struct Range{
typedef typename std::iterator_traits<RIter>::value_type value_type;
typedef typename std::iterator_traits<RIter>::difference_type
size_t;
typedef typename std::iterator_traits<RIter>::reference reference;
typedef RIter iterator;

Range(RIter left, RIter right):first(left), last(right){}

reference operator[](size_t i){ return *(first+i); }
size_t size() const { return last-first; }

RIter first;
RIter last;
};

为了方便构造range，提供了一些auxiliary function:

template<class Iter>
Range<Iter> range(Iter left, Iter right){ return Range<Iter>(left,
right); }

template<class Iter>
Range<Iter> range(Iter left, typename Range<Iter>::size_t n){
return Range<Iter>(left, left+n);
}

heap和binary tree的mapping定义如下：
template<class T>
T parent(T i){ return ((i+1)>>1)-1; }

template<class T>
T left(T i){ return (i<<1)+1; }

template<class T>
T right(T i){ return (i+1)<<1; }

下面是reference + size版本的heapify:
template<class Array, class LessOp>
void heapify(Array& a, unsigned int i, unsigned int n, LessOp lt){
while(true){
unsigned int l=left(i);
unsigned int r=right(i);
unsigned int smallest=i;
if(l < n && lt(a[l], a[i]))
smallest = l;
if(r < n && lt(a[r], a[smallest]))
smallest = r;
if(smallest != i){
std::swap(a[i], a[smallest]);
i = smallest;
}
else
break;
}
}

下面是range版本的heapify
template<class R, class LessOp>
void heapify(R a, typename R::size_t i, LessOp lt){
typename R::size_t l, r, smallest;
while(true){
l = left(i);
r = right(i);
smallest = i;
if( l < a.size() && lt(a[l], a[i]))
smallest = l;
if( r < a.size() && lt(a[r], a[smallest]))
smallest = r;
if( smallest != i){
std::swap(a[i], a[smallest]);
i = smallest;
}
else
break;
}
}

对应的build heap, 首先是ref+size版本的：
template<class Array, class LessOp>
void build_heap(Array& a, unsigned int n, LessOp lt){
unsigned int i = (n-1)>>1;
while(true){
heapify(a, i, n, lt);
if(i==0) break; // this is a trick: unsigned int always >=0
--i;
}
}

然后是range版本的：
template<class RangeType, class LessOp>
void build_heap(RangeType a, LessOp lt){
typename RangeType::size_t i = (a.size()-1)>>1;
while(true){
heapify(a, i, lt);
if(i==0) break;
--i;
}
}

两个版本返回堆顶元素的函数一样：
template<class T>
typename ValueType<T>::Result heap_top(T a){ return a[0]; }

Pop略有区别，首先是ref+size的：
template<class T, class LessOp>
typename ValueType<T>::Result heap_pop(T& a, unsigned int& n, LessOp
lt){
typename ValueType<T>::Result top = heap_top(a);
a[0] = a[n-1];
heapify(a, 0, --n, lt);
return top;
}

然后是range的：

// range is modified after pop, popped element is put to array[last]
template<class R, class LessOp>
typename R::value_type heap_pop(R& a, LessOp lt){
typename R::value_type top = heap_top(a);
std::swap(a[0], a[a.size()-1]);
--a.last;
heapify_(a, 0, lt);
return top;
}

注意后者是in place的

利用pop实现heap sort如下。首先是ref+size版本的：

template<class Iter, class Array, class LessOp>
void heap_sort_slow(Iter res, Array& a, unsigned int n, LessOp lt){
build_heap(a, n, lt);
while(n>0)
*res++=heap_pop(a, n, lt);
}

然后是range版本的：
// In-place sort by n-heapify with Range
template<class R, class LessOp>
void heap_sort_slow(R a, LessOp lt){
build_heap(a, lt);
do{
++a.first;
heapify(a, 0, lt);
}while(a.size()>0);
}

R.W. Floyd的heap sort实现如下：
template<class Array, class GreaterOp>
void heap_sort(Array& a, unsigned int n, GreaterOp gt){
for(build_heap(a, n, gt); n>1; --n){
std::swap(a[0], a[n-1]);
heapify(a, 0, n-1, gt);
}
}

对应Range 版本的：
template<class R, class GreaterOp>
void heap_sort(R a, GreaterOp gt){
build_heap(a, gt);
while(a.size()>1){
std::swap(a[0], a[a.size()-1]);
--a.last;
heapify(a, 0, gt);
}
}

decrease key实现如下：
template<class Array, class LessOp>
void heap_fix(Array& a, unsigned int i, LessOp lt){
while(i>0 && lt(a[i], a[parent(i)])){
std::swap(a[i], a[parent(i)]);
i = parent(i);
}
}

template<class Array, class LessOp>
void heap_decrease_key(Array& a,
unsigned int i,
typename ValueType<Array>::Result key,
LessOp lt){
if(lt(key, a[i])){
a[i] = key;
heap_fix(a, i, lt);
}
}

然后是push

template<class Array, class LessOp>
void heap_push(Array& a,
unsigned int& n,
typename ValueType<Array>::Result key,
LessOp lt){
a[n] = key;
heap_fix(a, n, lt);
++n;
}

template<class R, class LessOp>
void heap_push(R a, typename R::value_type key, LessOp lt){
*a.last++ = key;
heap_fix(a, a.size()-1, lt);
}

注意我们不保证分配内存，用户须自己保证内存有效。

返回top K大元素的程序：

template<class Iter, class Array, class LessOp>
void heap_top_k(Iter res, unsigned int k,
Array& a, unsigned int& n, LessOp lt){
build_heap(a, n, lt);
unsigned int count = std::min(k, n);
for(unsigned int i=0; i<count; ++i)
*res++=heap_pop(a, n, lt);
}

//in-place put the top-k elements in front of the range
template<class R, class LessOp>
void heap_top_k(R a, typename R::size_t k, LessOp lt){
typename R::size_t count = std::min(k, a.size());
build_heap(a, lt);
while(count--){
++a.first;
heapify(a, 0, lt);
}
}

测试略

--
LIU
http://sites.google.com/site/algoxy/

On Oct 24, 4:21 pm, liuxinyu <liuxiny...@gmail.com> wrote:
> Hi,
>

> 快速写了一个smoke test case:
>
> import System.Random -- only for testing
>
> lookup' :: (Ord a) => STree a -> a -> STree a
> lookup' E _ = E
> lookup' t@(Node l x r) y
> | x == y = t
> | x > y = splay (Node (lookup' l y) x r) y
> | otherwise = splay (Node l x (lookup' r y)) y
>
> testSplay = do
> xs <- sequence (replicate 100 (randomRIO(1, 10)))
> putStrLn $ show (foldl lookup' t xs)
> where
> t = foldl insert' (E::STree Int) [1..10]
>
> 其中insert中要把x<y 改成x>y (sorry for the mistake)
> GHCI中跑出的结果得到了一个非常平衡的BST，我对程序的正确性现在比较有信心了：）
>
> *SplayHeap> testSplay
> Node (Node E 1 (Node (Node E 2 E) 3 (Node (Node E 4 (Node E 5 E)) 6
> (Node E 7 E)))) 8 (Node (Node E 9 E) 10 E)
>
> 有空再分析效率问题。
>
> --

> 刘http://sites.google.com/site/algoxy/home

liuxinyu

unread,

Nov 22, 2010, 3:06:02 AM11/22/10

to TopLanguage

Hi,

经过仔细思考和查阅资料，我认为我这篇post中关于balance问题的阐述完全错误。
Leftist的设计，主要出于提高Merge的效率。Merge的的复杂度为O(lg N)。而implicit binary heap by
array的merge复杂度为O(N)。

--
LIU

On Sep 28, 5:10 pm, liuxinyu <liuxiny...@gmail.com> wrote:
> Hi,
>

> 刘https://sites.google.com/site/algoxy/home

liuxinyu

unread,

Nov 22, 2010, 4:21:49 AM11/22/10

to TopLanguage

Hi,

PS: Donald Knuth, TAOCP, Volume III, P150页有比较好的分析。

liuxinyu

unread,

Nov 23, 2010, 2:03:22 AM11/23/10

to TopLanguage

Hi,

把Leftist Heap简化一下，可以得到Skew Heap[1]。粗看起来似乎Skew Heap存在效率问题，但是参考文献给出了
Amortize复杂度分析，全部操作都为O(lg N)。
下面是Haskell的一个实现，附带Heap Sort。

Skew Heap的解法看起来暴力统一，merge时，取较小的元素为新的根，然后将较大的一个所在的tree merge到右侧子树中，然后强行交
换左右子树。每次均如是操作。

其定义和普通二叉树无异：

data SHeap a = E -- Empty
| Node a (SHeap a) (SHeap a) -- element, left, right
deriving (Eq, Show)

merge看起来更加简单：
merge::(Ord a)=>SHeap a -> SHeap a -> SHeap a

merge E h = h
merge h E = h

merge h1@(Node x l r) h2@(Node y l' r') =
if x < y then Node x (merge r h2) l
else Node y (merge h1 r') l'

insert时，将待插入元素做成一个只有一个叶子节点的树，然后merge之：

insert::(Ord a)=> SHeap a -> a -> SHeap a
insert h x = merge (Node x E E) h

find Min操作返回根节点的值：
findMin :: SHeap a -> a
findMin (Node x _ _) = x

delete Min操作（也就是pop）操作，直接抛弃根节点的值，然后merge左右子树。
deleteMin :: (Ord a) => SHeap a -> SHeap a
deleteMin (Node _ l r) = merge l r

建立堆操作以及heap sort和Leftist Heap无二[2]：
fromList :: (Ord a) => [a] -> SHeap a
fromList = foldl insert E

heapSort :: (Ord a) => [a] -> [a]
heapSort = hsort . fromList where
hsort E = []
hsort h = (findMin h):(hsort $ deleteMin h)

测试下：
>fromList [16, 14, 10, 8, 7, 9, 3, 2, 4, 1]
Node 1 (Node 2 (Node 4 E E) (Node 3 (Node 7 (Node 9 E E) (Node 8 (Node
10 (Node 14 (Node 16 E E) E) E) E)) E)) E

>heapSort [16, 14, 10, 8, 7, 9, 3, 2, 4, 1]

[1,2,3,4,7,8,9,10,14,16]
--
LIU

Reference:
[1]. D. D. Sleator, R. E. Tarjan, Self-Adjusting Heaps, SIAM J.
COMPUT., Vol. 15, No. 1, 1986. http://www.cs.cmu.edu/~sleator/papers/Adjusting-Heaps.htm
[2]. https://sites.google.com/site/algoxy/bheap

liuxinyu

unread,

Nov 24, 2010, 12:28:36 AM11/24/10

to TopLanguage

Hi,

今天仔细思考了一下，Skew Heap正好可以解决Leftist Heap的软肋，在前面的一个Post中，我曾经给出一个极端情况：

>>>
*LeftistHeap> fromList [1..10]
Node 3 1 (Node 2 3 (Node 1 4 E E) (Node 1 5 E E)) (Node 2 2 (Node 2 7
(Node 1 8 E E) (Node 1 9 E E)) (Node 1 6 (Node 1 10 E E) E))

完全退化成链表了。
<<<

我们再看看Skew Heap的结果：
>>>
*SkewHeap> fromList [1..10]
Node 1 (Node 2 (Node 6 (Node 10 E E) E) (Node 4 (Node 8 E E) E)) (Node
3 (Node 5 (Node 9 E E) E) (Node 7 E E))
<<<

这是一棵非常平衡的二叉树。因此算法复杂度基本维持在O(lg N)。

--
LIU
https://sites.google.com/site/algoxy/bheap

On Nov 23, 3:03 pm, liuxinyu <liuxiny...@gmail.com> wrote:
> Hi,
>

> [2].https://sites.google.com/site/algoxy/bheap

fenghuang

unread,

Nov 27, 2010, 5:12:20 AM11/27/10

to pon...@googlegroups.com

今天看到一和heap sort原理相通的题目：
http://lyle.smu.edu/~saad/courses/cse3358/ps5/problemset5sol.pdf

2010/11/24 liuxinyu <liuxi...@gmail.com>

liuxinyu

unread,

Nov 28, 2010, 9:49:46 PM11/28/10

to TopLanguage

Hi,

这是``Introduction to Algorithm''也就是俗称CLRS在Heap Sort一章的作业题。
正巧我前阵时间再TL上给出过：

http://groups.google.com/group/pongba/browse_thread/thread/f8601e34aba78eb1/981d3a829cc9d666?q=young&lnk=nl&

原题是：
>>>
[问题] 给定两个有序的正整数数组A[n]和B[n], 我们定义如下集合: S = { (a, b} | a in A[n] and b
in
B[n]}, 很明显S总共有n^2个元素对. 元素对的值定义如下: value(a, b) = a + b.
求集合S中值最大的n个元素对.
<<<

On Nov 27, 6:12 pm, fenghuang <fenghaungyuyi...@gmail.com> wrote:
> 今天看到一和heap sort原理相通的题目：http://lyle.smu.edu/~saad/courses/cse3358/ps5/problemset5sol.pdf
>

> 2010/11/24 liuxinyu <liuxiny...@gmail.com>

> ...
>
> read more >>

liuxinyu

unread,

Dec 7, 2010, 4:26:26 AM12/7/10

to TopLanguage

;; 为了完整，今天给出Leftist Heap的Scheme/Lisp实现。

;; 本质上底层的data structure是list，为了方便打印输出，我们把一个节点定义为这样的list：
;; (left rank element right)

;; 然后就可以定义一些helper function来访问一个节点的左右子树，值和rank了：

(define (left t)
(if (null? t) '() (car t)))

(define (rank t)
(if (null? t) 0 (cadr t)))

(define (elem t)
(if (null? t) '() (caddr t)))

(define (right t)
(if (null? t) '() (cadddr t)))

;; 为了方便构造节点，提供了一个构造函数：

(define (make-tree l s x r) ;; l: left, s: rank, x: elem, r: right
(list l s x r))

;; 最核心的merge函数定义如下：

(define (merge t1 t2)
(define (make-node x a b)
(if (< (rank a) (rank b))
(make-tree b (+ (rank a) 1) x a)
(make-tree a (+ (rank b) 1) x b)))
(cond ((null? t1) t2)
((null? t2) t1)
((< (elem t1) (elem t2)) (make-node (elem t1) (left t1) (merge (right
t1) t2)))
(else (make-node (elem t2) (left t2) (merge t1 (right t2))))))

;; 有了merge后，插入，删除，寻找堆顶元素都可以方便定义

(define (insert t x)
(merge (make-tree '() 1 x '()) t))

(define (find-min t)
(elem t))

(define (delete-min t)
(merge (left t) (right t)))

;; 副产品是一个generic的堆排序函数

(define (from-list lst)
(fold-left insert '() lst))

(define (heap-sort lst)
(define (hsort t)
(if (null? t) '() (cons (find-min t) (hsort (delete-min t)))))
(hsort (from-list lst)))

;; 下面是测试代码：

(define (from-list lst)
(fold-left insert '() lst))

(define (heap-sort lst)
(define (hsort t)
(if (null? t) '() (cons (find-min t) (hsort (delete-min t)))))
(hsort (from-list lst)))

;; 测试结果如下：

(test-from-list)
;Value 14: ((((((((() 1 16 ()) 1 14 ()) 1 10 ()) 1 8 ()) 2 7 (() 1 9
())) 1 3 ()) 2 2 (() 1 4 ())) 1 1 ())

(test-sort)
;Value 15: (1 2 3 4 7 8 9 10 14 16)

--
LIU
https://sites.google.com/site/algoxy/bheap

On Nov 29, 10:49 am, liuxinyu <liuxiny...@gmail.com> wrote:
> Hi,
>
> 这是``Introduction to Algorithm''也就是俗称CLRS在Heap Sort一章的作业题。
> 正巧我前阵时间再TL上给出过：
>

> http://groups.google.com/group/pongba/browse_thread/thread/f8601e34ab...

> ...
>
> read more >>

liuxinyu

unread,

Dec 9, 2010, 12:22:44 AM12/9/10

to TopLanguage

Hi,

;; Skew Heap的Scheme/Lisp实现

;; 本质上，Skew Heap较Leftist Heap更为简单，就是普通的二叉树（注意，非BST，而是BT）
;; 为了方便打印输出，我们定义底层的list为中序格式：
;; (left element right)

;; 首先是一组accessor函数，本质上合二叉树的一致：

(define (left t)
(if (null? t) '() (car t)))

(define (elem t)
(if (null? t) '() (cadr t)))

(define (right t)
(if (null? t) '() (caddr t)))

;; 然后是一个建议的构造函数：
(define (make-tree l x r) ;; l: left, x: element, r: right
(list l x r))

;; 核心的merge函数非常简单，非平凡解的情况下，每次强行交换左右子树。
(define (merge t1 t2)

(cond ((null? t1) t2)
((null? t2) t1)
((< (elem t1) (elem t2))

(make-tree (merge (right t1) t2)
(elem t1)
(left t1)))
(else
(make-tree (merge t1 (right t2))
(elem t2)
(left t2)))))

;; 插入就是合并数和一个叶子的特殊情况：
(define (insert t x)
(merge (make-tree '() x '()) t))

;; 返回堆顶，弹出堆顶，以及堆排序可以复用Leftist tree的，我们可以把这些函数抽象到一个generic merge-able
heap的模块中直接load
;; 测试结果如下：

(load "skewheap.scm")
;Loading "skewheap.scm"... done
;Value: insert

(test-from-list)
;Value 13: (((() 4 ()) 2 (((() 9 ()) 7 ((((() 16 ()) 14 ()) 10 ()) 8
())) 3 ())) 1 ())

(test-sort)
;Value 14: (1 2 3 4 7 8 9 10 14 16)

--
Liu
https://sites.google.com/site/algoxy/bheap

> LIUhttps://sites.google.com/site/algoxy/bheap

> ...
>
> read more >>

Larry, LIU Xinyu

unread,

Dec 9, 2010, 12:43:57 AM12/9/10

to pon...@googlegroups.com

Hi,

顺便说一句，Google Groups升级了。界面越发花哨。
可惜墙内的同学们看不到了。

--
LIU
https://sites.google.com/site/algoxy/bheap

Larry, LIU Xinyu

unread,

Dec 16, 2010, 4:33:54 AM12/16/10

to TopLanguage

Hi,

最近特别忙，刚刚折腾完Sikuli，把它改造成一个支持remote verification的工具。
今天有空把Scheme/Lisp版本的Splay Heap实现好。

;;Splay tree的底层结构和排序二叉树(BST)无异：
;; 为了方便输出，底层定义为中序形式：
;; (left element right)

;; 定义一些accessors

(define (left t)
(if (null? t) '() (car t)))

(define (elem t)
(if (null? t) '() (cadr t)))

(define (right t)
(if (null? t) '() (caddr t)))

;; 然后是构造函数：

(define (make-tree l x r) ;; l: left, x: element, r: right
(list l x r))

;; 接下来是一个很罗嗦，但很简单的partition函数，它接受一个pivot
;; 然后把BST中所有小于pivot和所有大于pivot的元素分别放在两个BST中。
;; 如果连续向左两次traverse或者连续向右做两次traverse，就进行一次splay操作。
(define (partition t pivot)
(if (null? t)
(cons '() '())
(let ((l (left t))
(x (elem t))
(r (right t)))
(if (< x pivot)
(if (null? r)
(cons t '())
(let ((l1 (left r))
(x1 (elem r))
(r1 (right r)))
(if (< x1 pivot)
(let* ((p (partition r1 pivot))
(small (car p))
(big (cdr p)))
(cons (make-tree (make-tree l x l1) x1 small) big))
(let* ((p (parition l1 pivot))
(small (car p))
(big (cdr p)))
(cons (make-tree l x small) (make-tree big x1 r1))))))
(if (null? l)
(cons '() t)
(let ((l1 (left l))
(x1 (elem l))
(r1 (right l)))
(if (> x1 pivot)
(let* ((p (partition l1 pivot))
(small (car p))
(big (cdr p)))
(cons small (make-tree l1 x1 (make-tree r1 x r))))
(let* ((p (partition r1 pivot))
(small (car p))
(big (cdr p)))
(cons (make-tree l1 x1 small) (make-tree big x r))))))))))

;; insert和merge都可以方便的利用partition来实现：

(define (insert t x)
(let* ((p (partition t x))
(small (car p))
(big (cdr p)))
(make-tree small x big)))

(define (merge t1 t2)
(if (null? t1)
t2
(let* ((p (partition t2))
(small (car p))
(big (cdr p)))
(make-tree (merge (left t1) small) (elem t1) (merge (right t1)
big)))))

;;由于是BST，所以最小的元素在最左侧
(define (find-min t)
(if (null? (left t))
(elem t)
(find-min (left t))))

;;删除最小元素时，如果连续两次向左遍历，则进行一次splay操作
(define (delete-min t)
(cond ((null? (left t)) (right t))
((null? (left (left t))) (make-tree (right (left t)) (elem t) (right
t)))
(else (make-tree (delete-min (left (left t)))
(elem (left t))
(make-tree (right (left t)) (elem t) (right t))))))

;; 以下是另外一种实现splay的解法，它本质是做pattern matching
(define (splay l x r y)
(cond ((eq? y (elem (left l))) ;; zig-zig
(make-tree (left (left l))
(elem (left l))
(make-tree (right (left l))
(elem l)
(make-tree (right l) x r))))
((eq? y (elem (right r))) ;; zig-zig
(make-tree (make-tree (make-tree l x (left r))
(elem r)
(left (right r)))
(elem (right r))
(right (right r))))
((eq? y (elem (right l))) ;; zig-zag
(make-tree (make-tree (left l) (elem l) (left (right l)))
(elem (right l))
(make-tree (right (right l)) x r)))
((eq? y (elem (left r))) ;; zig-zag
(make-tree (make-tree l x (left (left r)))
(elem (left r))
(make-tree (right (left r)) (elem r) (right r))))
((eq? y (elem l)) ;; zig
(make-tree (left l) (elem l) (make-tree (right l) x r)))
((eq? y (elem r)) ;; zig
(make-tree (make-tree l x (left r)) (elem r) (right r)))
(else (make-tree l x r))))

;; 使用pattern matching后，insert就和红黑树的实现方式异曲同工了。
(define (insert-splay t x)
(cond ((null? t) (make-tree '() x '()))
((> (elem t) x)
(splay (insert-splay (left t) x) (elem t) (right t) x))
(else
(splay (left t) (elem t) (insert-splay (right t) x) x))))

;; 同理，lookup也可以实现：
(define (insert-splay t x)
(cond ((null? t) (make-tree '() x '()))
((> (elem t) x)
(splay (insert-splay (left t) x) (elem t) (right t) x))
(else
(splay (left t) (elem t) (insert-splay (right t) x) x))))

;;给几个个测试函数：

(define (from-list lst)
(fold-left insert '() lst))

(define (heap-sort lst)
(define (hsort t)
(if (null? t) '() (cons (find-min t) (hsort (delete-min t)))))
(hsort (from-list lst)))

;; test
(define (test-from-list)
(from-list '(16 14 10 8 7 9 3 2 4 1)))

(define (test-sort)
(heap-sort '(16 14 10 8 7 9 3 2 4 1)))
(define (from-list-splay lst)
(fold-left insert-splay '() lst))

(define (test-from-list-splay)
(from-list-splay '(16 14 10 8 7 9 3 2 4 1)))

我们看看测试结果：
(load "splayheap.scm")
;Loading "splayheap.scm"...
; Loading "genheap.scm"... done
;... done
;Value: test-from-list-splay

(test-from-list)
;Value 13: (() 1 ((() 2 ()) 3 (() 4 (() 7 (() 8 (() 9 (() 10 (() 14
(() 16 ())))))))))

(test-sort)
;Value 14: (1 2 3 4 7 8 9 10 14 16)

(test-from-list-splay)
;Value 15: (() 1 ((() 2 (() 3 ())) 4 ((() 7 (() 8 ())) 9 (() 10 (() 14
(() 16 ()))))))

--
LIU
https://sites.google.com/site/algoxy/bheap

On Dec 9, 1:22 pm, liuxinyu <liuxiny...@gmail.com> wrote:
> Hi,
>

> ;; SkewHeap的Scheme/Lisp实现
>
> ;; 本质上，SkewHeap较LeftistHeap更为简单，就是普通的二叉树（注意，非BST，而是BT）

> Liuhttps://sites.google.com/site/algoxy/bheap

> > > > 今天看到一和heapsort原理相通的题目：http://lyle.smu.edu/~saad/courses/cse3358/ps5/problemset5sol.pdf
>
> > > > 2010/11/24 liuxinyu <liuxiny...@gmail.com>
>
> > > > > Hi,
>
> > > > > 今天仔细思考了一下，SkewHeap正好可以解决LeftistHeap的软肋，在前面的一个Post中，我曾经给出一个极端情况：

>
> > > > > *LeftistHeap> fromList [1..10]
> > > > > Node 3 1 (Node 2 3 (Node 1 4 E E) (Node 1 5 E E)) (Node 2 2 (Node 2 7
> > > > > (Node 1 8 E E) (Node 1 9 E E)) (Node 1 6 (Node 1 10 E E) E))
>
> > > > > 完全退化成链表了。
> > > > > <<<
>
> > > > > 我们再看看SkewHeap的结果：
>
> > > > > *SkewHeap> fromList [1..10]
> > > > > Node 1 (Node 2 (Node 6 (Node 10 E E) E) (Node 4 (Node 8 E E) E)) (Node
> > > > > 3 (Node 5 (Node 9 E E) E) (Node 7 E E))
> > > > > <<<
>
> > > > > 这是一棵非常平衡的二叉树。因此算法复杂度基本维持在O(lg N)。
>
> > > > > --
> > > > > LIU
> > > > >https://sites.google.com/site/algoxy/bheap
>
> > > > > On Nov 23, 3:03 pm, liuxinyu <liuxiny...@gmail.com> wrote:
> > > > > > Hi,
>

> > > > > > 把LeftistHeap简化一下，可以得到SkewHeap[1]。粗看起来似乎SkewHeap存在效率问题，但是参考文献给出了

> > > > > > Amortize复杂度分析，全部操作都为O(lg N)。
> > > > > > 下面是Haskell的一个实现，附带HeapSort。
>

> > > > > > SkewHeap的解法看起来暴力统一，merge时，取较小的元素为新的根，然后将较大的一个所在的tree merge到右侧子树中，然后强行交

> > > > > > 换左右子树。每次均如是操作。
>
> > > > > > 其定义和普通二叉树无异：
>
> > > > > > data SHeap a = E -- Empty
> > > > > > | Node a (SHeap a) (SHeap a) -- element, left, right
> > > > > > deriving (Eq, Show)
>
> > > > > > merge看起来更加简单：
> > > > > > merge::(Ord a)=>SHeap a -> SHeap a -> SHeap a
> > > > > > merge E h = h
> > > > > > merge h E = h
> > > > > > merge h1@(Node x l r) h2@(Node y l' r') =
> > > > > > if x < y then Node x (merge r h2) l
> > > > > > else Node y (merge h1 r') l'
>
> > > > > > insert时，将待插入元素做成一个只有一个叶子节点的树，然后merge之：
>
> > > > > > insert::(Ord a)=> SHeap a -> a -> SHeap a
> > > > > > insert h x = merge (Node x E E) h
>
> > > > > > find Min操作返回根节点的值：
> > > > > > findMin :: SHeap a -> a
> > > > > > findMin (Node x _ _) = x
>
> > > > > > delete Min操作（也就是pop）操作，直接抛弃根节点的值，然后merge左右子树。
> > > > > > deleteMin :: (Ord a) => SHeap a -> SHeap a
> > > > > > deleteMin (Node _ l r) = merge l r
>

> > > > > > 建立堆操作以及heapsort和LeftistHeap无二[2]：

> > > > > > > > > 第一个难题是heappop，也就是delete Min操作。考虑下面的显示二叉树：

>
> > > > > > > > > (left value right)
>
> > > > > > > > > 其中value小于所有left中节点上的值，且value小于所有right中节点上的值。
>

> > > > > > > > > 那么heappop后，value被弹出去了，剩下了两个子树left和right，我们需要merge

> > > > > left和right从而得到一个新的二叉
> > > > > > > > > 树，且其根节点的值为最小值，以满足heap的性质。
>
> > > > > > > > > 如何merge left和right呢？我们知道left和right也是用显示二叉树表示的heap，有两个平凡解可以立刻的出来：
> > > > > > > > > merge NIL right = right
> > > > > > > > > merge left NIL = left
> > > > > > > > > 难点是，如果left和right都不为空怎么merge。
>
> > > > > 由于left和right都满足heap的性质，故而left和right的顶部的值分别是各自的最小值。我们只要比较这两个值，找一个更小的作为新的
> > > > > > > > > 顶部，然后把较大值所在的树的merge到一个子树中即可。举个例子，假设left = (a x b);
>

> ...
>
> read more >>

Larry, LIU Xinyu

unread,

Dec 20, 2010, 2:36:00 AM12/20/10

to TopLanguage

Binary heap脱稿。

最后做了一遍spell check, Emacs下没有找到好的grammar check tool，放弃了。
全文53页，覆盖:
1, implicit binary heap by array;
2, leftist heap;
3, skew heap;
4, splay heap;

其中1提供了Python/ISO C++的实现，2,3,4提供了Haskell和Scheme/Lisp的实现。
本质上2,3,4可以提供C++和Python的实现，不过没有时间弄了。

全文下载：
https://sites.google.com/site/algoxy/bheap/bheap-en.pdf
在线全文：
https://sites.google.com/site/algoxy/bheap

AlgoXY更新了一下：
https://sites.google.com/site/algoxy/home/main-en.pdf

我争取后面把K-ary heap写一写。春节后再说吧。

--
LIU

Larry, LIU Xinyu

unread,

Dec 22, 2010, 12:35:54 AM12/22/10

to TopLanguage

Hi,

今天给Fibonacci heap开个头。

首先是为什么叫Fibonacci heap？其实Fibonacci heap的设计和实现与Fibonacci数列一点都没有关系。
只有在证明其算法复杂度的时候才会用到Fibonacci数列的性质。于是作者（ Michael L. Fredman and Robert
E. Tarjan）
用Fibonacci来给这个数据结构命名。

基础算法(elementary algorithms and data structures）中真正与Fibonacci有关的不多，还有一个著
名的是Fibonacci search。
理论上他的性能比binary search(二分查找)要好。Knuth在TAOCP的第3卷417-419页有所介绍。

Fibonacci heap在CLRS第20章有所介绍，算法非常优美。但是略显复杂。本质上Fibonacci heap可以认为是一个lazy版
的binomial heap。
所以amortized复杂度分析出了delete min之外，都是O(1)。

由于lazy这一策略，Fibonacci heap用支持lazy evaluation的方式存在一个特别优雅的实现：
http://hackage.haskell.org/packages/archive/pqueue-mtl/1.0.7/doc/html/src/Data-Queue-FibQueue.html#FQueue
感兴趣的可以和CLRS上的实现对比一下。

我下面将逐步给出解释。

--
LIU
http://sites.google.com/site/algoxy/home

On Dec 20, 3:36 pm, "Larry, LIU Xinyu" <liuxiny...@gmail.com> wrote:
> Binaryheap脱稿。

Larry, LIU Xinyu

unread,

Dec 29, 2010, 3:00:50 AM12/29/10

to TopLanguage

Hi,

Chris, Okasaki 在1995年对此有一个特别详细的解释。非常清楚：
http://darcs.haskell.org/nofib/gc/fibheaps/orig

整个解中，最困难的部分在于，不同于Binomial heap，各个子tree是按照rank排序的，Fibonacci的子tree可能是任意
rank order的。
为此引入了array来管理这些子tree。

--
LIU
https://sites.google.com/site/algoxy

On Dec 22, 1:35 pm, "Larry, LIU Xinyu" <liuxiny...@gmail.com> wrote:
> Hi,
>
> 今天给Fibonacciheap开个头。
>
> 首先是为什么叫Fibonacciheap？其实Fibonacciheap的设计和实现与Fibonacci数列一点都没有关系。

> 只有在证明其算法复杂度的时候才会用到Fibonacci数列的性质。于是作者（ Michael L. Fredman and Robert
> E. Tarjan）
> 用Fibonacci来给这个数据结构命名。
>
> 基础算法(elementary algorithms and data structures）中真正与Fibonacci有关的不多，还有一个著
> 名的是Fibonacci search。
> 理论上他的性能比binary search(二分查找)要好。Knuth在TAOCP的第3卷417-419页有所介绍。
>
> Fibonacciheap在CLRS第20章有所介绍，算法非常优美。但是略显复杂。本质上Fibonacciheap可以认为是一个lazy版
> 的binomialheap。
> 所以amortized复杂度分析出了delete min之外，都是O(1)。
>

> 由于lazy这一策略，Fibonacciheap用支持lazy evaluation的方式存在一个特别优雅的实现：http://hackage.haskell.org/packages/archive/pqueue-mtl/1.0.7/doc/html...
> 感兴趣的可以和CLRS上的实现对比一下。
>
> 我下面将逐步给出解释。
>
> --
> LIUhttp://sites.google.com/site/algoxy/home

Larry, LIU Xinyu

unread,

Dec 30, 2010, 1:53:30 AM12/30/10

to TopLanguage

Hi,

今天给出Fibonacci Heap的详细解释。
如果读过CLRS（《算法导论》）第20章，恐怕读者都会觉得Fibonacci Heap挺麻烦的。

其实，如果从Functional Programming的角度，就会发现Fibonacci Heap其实也是非常朴素的。理解起来一点也不困
难。

首先原理上说，Fibonacci Heap本质上是Binomial Heap的一个lazy实现。我们回顾一下Binomial Heap的性
能：
插入：O(lg N)
merge: O(lg N)
删除：O(lg N)

我们的目标是：
插入: O(1)
merge: O(1)
删除：O(lg N) amortized

现在我们分析一下究竟Binomial插入时的瓶颈在哪里，Binomial heap在插入元素x时，把x作为一个只有一个叶子节点的
binomail tree，merge进去。
merge时，我们实际上是进行一次根据rank的有序序列插入，并且一旦发现有rank同样的，还要进行递归的合并。

所以作为lazy的策略，我们在插入时，把这个有序序列插入以及合并的复杂计算推迟到以后，而直接把这个叶子节点扔进去。
但是这样做的问题就是，我们在寻找堆顶的最小（大）元素时，复杂度就下来了，为了保持O(1)的复杂度，我们必须记录最小值元素所在的子树。

基于这个策略，我们定义如下。

首先是具体的树的定义，我们复用Binomial的定义：

data BiTree a = Node { rank :: Int
, root :: a
, children :: [BiTree a]} deriving (Eq, Show)

然后是Fibonacci Heap的定义。要么是空，要么包含3个部分，heap的size, 包含最小元素的树，和剩余的子树：
data FibHeap a = E | FH { size :: Int
, minTree :: BiTree a
, trees :: [BiTree a]} deriving (Eq, Show)

先到这，一会接着发。

--
LIU
https://sites.google.com/site/algoxy/home

On Dec 29, 4:00 pm, "Larry, LIU Xinyu" <liuxiny...@gmail.com> wrote:
> Hi,
>

> Chris, Okasaki 在1995年对此有一个特别详细的解释。非常清楚：http://darcs.haskell.org/nofib/gc/fibheaps/orig
>

> 整个解中，最困难的部分在于，不同于Binomialheap，各个子tree是按照rank排序的，Fibonacci的子tree可能是任意
> rank order的。
> 为此引入了array来管理这些子tree。
>
> --
> LIUhttps://sites.google.com/site/algoxy

Larry, LIU Xinyu

unread,

Dec 30, 2010, 2:36:48 AM12/30/10

to TopLanguage

Hi,

继续，为了方便。我提供一个singleton函数，用于从一个值，建立一个只有一个叶子节点的树，并放在heap里：
singleton :: a -> FibHeap a
singleton x = FH 1 (Node 1 x []) []

把两个rank一样的树link到一起的函数可以复用Binomial heap的。

link :: (Ord a) => BiTree a -> BiTree a -> BiTree a
link t1@(Node r x c1) t2@(Node _ y c2)

| x<y = Node (r+1) x (t2:c1)
| otherwise = Node (r+1) y (t1:c2)

注意，它可以保证树种最小的元素在root。

insert几乎和Binamial heap一样，它只不过调用merge:

insert :: (Ord a) => FibHeap a -> a -> FibHeap a
insert h x = merge h (singleton x)

下面看最关键的merge，它是实现lazy策略的最终地方：

merge:: (Ord a) => FibHeap a -> FibHeap a -> FibHeap a

merge h E = h
merge E h = h

merge h1@(FH sz1 minTr1 ts1) h2@(FH sz2 minTr2 ts2)
| root minTr1 < root minTr2 = FH (sz1+sz2) minTr1 (minTr2:ts2+
+ts1)
| otherwise = FH (sz1+sz2) minTr2 (minTr1:ts1++ts2)

这样，就实现了O(1)的插入，合并。顺便给出找堆顶的元素的函数:

findMin :: (Ord a) => FibHeap a -> a
findMin = root . minTree

先到这，下面给最困难的deleteMin
--
LIU
https://sites.google.com/site/algoxy/home

On Dec 30, 2:53 pm, "Larry, LIU Xinyu" <liuxiny...@gmail.com> wrote:
> Hi,
>

> 今天给出FibonacciHeap的详细解释。
> 如果读过CLRS（《算法导论》）第20章，恐怕读者都会觉得FibonacciHeap挺麻烦的。
>
> 其实，如果从Functional Programming的角度，就会发现FibonacciHeap其实也是非常朴素的。理解起来一点也不困
> 难。
>
> 首先原理上说，FibonacciHeap本质上是BinomialHeap的一个lazy实现。我们回顾一下BinomialHeap的性

> 能：
> 插入：O(lg N)
> merge: O(lg N)
> 删除：O(lg N)
>
> 我们的目标是：
> 插入: O(1)
> merge: O(1)
> 删除：O(lg N) amortized
>

> 现在我们分析一下究竟Binomial插入时的瓶颈在哪里，Binomialheap在插入元素x时，把x作为一个只有一个叶子节点的

> binomail tree，merge进去。
> merge时，我们实际上是进行一次根据rank的有序序列插入，并且一旦发现有rank同样的，还要进行递归的合并。
>
> 所以作为lazy的策略，我们在插入时，把这个有序序列插入以及合并的复杂计算推迟到以后，而直接把这个叶子节点扔进去。
> 但是这样做的问题就是，我们在寻找堆顶的最小（大）元素时，复杂度就下来了，为了保持O(1)的复杂度，我们必须记录最小值元素所在的子树。
>
> 基于这个策略，我们定义如下。
>
> 首先是具体的树的定义，我们复用Binomial的定义：
> data BiTree a = Node { rank :: Int
> , root :: a
> , children :: [BiTree a]} deriving (Eq, Show)
>

> 然后是FibonacciHeap的定义。要么是空，要么包含3个部分，heap的size, 包含最小元素的树，和剩余的子树：

> data FibHeap a = E | FH { size :: Int
> , minTree :: BiTree a
> , trees :: [BiTree a]} deriving (Eq, Show)
>
> 先到这，一会接着发。
>
> --

> LIUhttps://sites.google.com/site/algoxy/home

Larry, LIU Xinyu

unread,

Dec 30, 2010, 3:01:21 AM12/30/10

to TopLanguage

Hi,

deleteMin的实现思路为，把minTree的root扔掉。minTree的子tree和Heap里其他的tree，统一进行一次
consolidate。
以前lazy的工作，在这里统一要偿还。---出来混，一定要还的哈。

CLRS上给了一个特别优美的算法，只用遍历一次就OK了，结果是所有的tree里，没有重复rank的。

这个算法用FP实现需要twist一下mind。我终于想出一个，为了解释，我们先看一个简单的问题：

下面这个列表：[2, 1, 1, 4, 8, 1, 1, 2, 4]
如何把所有相同的值相加，并且相加后，如果有相同的值，还要继续相加。这个题目的答案是[8, 16]

解法为：
consolidate:: (Num a)=>[a] -> [a]
consolidate xs = foldl meld [] xs where
meld :: (Num a)=>[a] -> a -> [a]
meld [] x = [x]
meld (x':xs) x = if x == x' then meld xs (x+x')
else x:x':xs

测试下：
consolidate [2, 1, 1, 4, 8, 1, 1, 2, 4]
输出：
[8,16]

我们的tree的consolidate也是这个原理。
consolidate :: (Ord a) => [BiTree a] -> [BiTree a]
consolidate ts = foldl meld [] ts where
meld [] t = [t]
meld (t':ts) t = if rank t' == rank t then meld ts (link t t')
else t:t':ts

注意他是O(lg N)的

然后从merge好的tree中还要找出root最小的。函数如下：

extractMin :: (Ord a) => [BiTree a] -> (BiTree a, [BiTree a])
extractMin [t] = (t, [])
extractMin (t:ts) = if root t < root t' then (t, ts)

else (t', t:ts')
where

(t', ts') = extractMin ts

最后是千呼万唤的deleteMin

deleteMin :: (Ord a) => FibHeap a -> FibHeap a
deleteMin (FH _ (Node _ x []) []) = E
deleteMin h@(FH sz minTr ts) = FH (sz-1) minTr' ts' where
(minTr', ts') = extractMin $ consolidate (children minTr ++ ts)

现在可以测试。测试代码和其他的Heap都一样。

fromList :: (Ord a) => [a] -> FibHeap a

fromList xs = foldl insert E xs

heapSort :: (Ord a) => [a] -> [a]

heapSort = hsort . fromList where
hsort E = []
hsort h = (findMin h):(hsort $ deleteMin h)

-- test

testFromList = fromList [16, 14, 10, 8, 7, 9, 3, 2, 4, 1]

testHeapSort = heapSort [16, 14, 10, 8, 7, 9, 3, 2, 4, 1]

*FibonacciHeap> testHeapSort
[1,2,3,4,7,8,9,10,14,16]

完
--
LIU
https://sites.google.com/site/algoxy/home

On Dec 30, 3:36 pm, "Larry, LIU Xinyu" <liuxiny...@gmail.com> wrote:
> Hi,
>
> 继续，为了方便。我提供一个singleton函数，用于从一个值，建立一个只有一个叶子节点的树，并放在heap里：
> singleton :: a -> FibHeap a
> singleton x = FH 1 (Node 1 x []) []
>
> 把两个rank一样的树link到一起的函数可以复用Binomialheap的。
> link :: (Ord a) => BiTree a -> BiTree a -> BiTree a
> link t1@(Node r x c1) t2@(Node _ y c2)
> | x<y = Node (r+1) x (t2:c1)
> | otherwise = Node (r+1) y (t1:c2)
>
> 注意，它可以保证树种最小的元素在root。
>

> insert几乎和Binamialheap一样，它只不过调用merge:

>
> insert :: (Ord a) => FibHeap a -> a -> FibHeap a
> insert h x = merge h (singleton x)
>
> 下面看最关键的merge，它是实现lazy策略的最终地方：
>
> merge:: (Ord a) => FibHeap a -> FibHeap a -> FibHeap a
> merge h E = h
> merge E h = h
> merge h1@(FH sz1 minTr1 ts1) h2@(FH sz2 minTr2 ts2)
> | root minTr1 < root minTr2 = FH (sz1+sz2) minTr1 (minTr2:ts2+
> +ts1)
> | otherwise = FH (sz1+sz2) minTr2 (minTr1:ts1++ts2)
>
> 这样，就实现了O(1)的插入，合并。顺便给出找堆顶的元素的函数:
>
> findMin :: (Ord a) => FibHeap a -> a
> findMin = root . minTree
>
> 先到这，下面给最困难的deleteMin
> --

> LIUhttps://sites.google.com/site/algoxy/home

Larry, LIU Xinyu

unread,

Dec 30, 2010, 4:06:35 AM12/30/10

to TopLanguage

Hi,

今年写完K-ary Heap是不大可能了。我先预告一下，下面我会在TL的maillist上贴paring heap。
这是一种非常神奇的heap。其算法复杂度除了deleteMin之外都达到了O(1)。deleteMin的复杂度至今无人能够证明，而是猜想是
O(lg N)。
事实证明，它是迄今为止性能最好的heap之一。并且非常实用。

--
LIU
https://sites.google.com/site/algoxy/home

On Dec 30, 4:01 pm, "Larry, LIU Xinyu" <liuxiny...@gmail.com> wrote:
> Hi,
>

Kula

unread,

Dec 30, 2010, 4:54:53 AM12/30/10

to pon...@googlegroups.com

楼主用心了. 学到了不少东西..

2010/12/30 Larry, LIU Xinyu <liuxi...@gmail.com>

Larry, LIU Xinyu

unread,

Dec 30, 2010, 8:35:27 PM12/30/10

to pon...@googlegroups.com

Hi,

抱歉。昨天代码里有个bug。

首先，用于示例的consolidate函数应该改动如下：

consolidate xs = foldl meld [] xs where

    meld [] x = [x]
    meld (x':xs) x | x == x' = meld xs (x+x')
                   | x < x' = x:x':xs
                   | otherwise = x': meld xs x

这个bug的现象为：

--------before fixing--------
>consolidate [2, 1, 1, 32, 4, 8, 1, 1, 2, 4]
>[16,4,32,4]

--------after fixing----------
>consolidate [2, 1, 1, 32, 4, 8, 1, 1, 2, 4]
>[8, 16, 32]

因此，用于consolidate子树的代码应该相应修改为：

consolidate :: (Ord a) => [BiTree a] -> [BiTree a]
consolidate ts = foldl meld [] ts where
meld [] t = [t]

    meld (t':ts) t | rank t == rank t' = meld ts (link t t')
                   | rank t < rank t' = t:t':ts
                   | otherwise = t' : meld ts t

这个bug是昨天夜里睡醒时突然意识到的。早上半梦半醒之间的时候想到了答案。

更新的代码可以在这里下载：
https://sites.google.com/site/algoxy/otherheaps/otherheaps.zip

Happy new year.

--
Larry, LIU

Larry, LIU Xinyu

unread,

Dec 30, 2010, 10:32:19 PM12/30/10

to pon...@googlegroups.com

Hi,

今天看到一个文章。豁然开朗，作者认为FP的merge sort本质上是个利用implicit heap by array的heap sort。

http://apfelmus.nfshost.com/articles/implicit-heaps.html

然后，新的思路出来了，看这个素数筛法：
http://permalink.gmane.org/gmane.comp.lang.haskell.cafe/84666

只能用“有趣”来形容了
--
LIU

Larry, LIU Xinyu

unread,

Jan 5, 2011, 5:53:59 AM1/5/11

to TopLanguage

Hi,

开个pairing heap的头吧。

pairing heap是目前已知的性能最好的heap之一，并且他的实现非常简洁，同时适用于imperative和functional。
对于所有的操作如insert, find min, merge它的性能都是O(1),只有delete min(heap pop)的性能"可
能"是O(lg N)。
注意我用了引号，因为这仅仅是个假说，15年过去了，没有人能从数学上证明它，虽然大量数据显示它是O(lg N)的。

不同于Binomial heap和Fibonacci heap，它们本质上是用森林来实现的，而pairing heap是实实在在的K-
ary。
也就是有多个分支的树。

pairing heap的设计思路很简单：树根放置最小(大)值。其他东西统一扔到子树里。所以其定义为：

data PHeap a = E | Node a [PHeap a] deriving (Eq, Show)

也就是说，一个pairing heap，要么是空，要么是一个K-ary树，由一个一个值（最小值），和若干子树组成；

merge的思路非常直白：
一个空树和任何树h进行merge结果是h。
否则选择根值较大的树当做另外一颗树的新的子树

merge :: (Ord a) => PHeap a -> PHeap a -> PHeap a

merge h E = h
merge E h = h

merge h1@(Node x hs1) h2@(Node y hs2) =
if x < y then Node x (h2:hs1) else Node y (h1:hs2)

插入的话，就是把一个只有一个值的叶子树merge进去：
insert :: (Ord a) => PHeap a -> a -> PHeap a
insert h x = merge (Node x []) h

返回最小值的时候，只要返回根就可以了。
findMin :: PHeap a -> a
findMin (Node x _) = x

注意，上面都是O(1)的。下面进入真格的地方了：

deleteMin的时候，根扔掉后，剩下了孤零零的一组子树，怎么处理呢？方法是把这些子树从左到右，两两通过merge配对。然后再从右到左全部
merge起来。
写成伪代码是：

list = empty
for each x, y in children
list.append(merge(x, y))

t = empty
for each i in reverse(list)
t = merge(t, i)

return t

但是，如果使用递归，会有一个特别优雅的算法，它和bottom up merge sort很像：
deleteMin :: (Ord a) => PHeap a -> PHeap a
deleteMin (Node _ hs) = mergePairs hs where
mergePairs [] = E
mergePairs [h] = h
mergePairs (h1:h2:hs) = merge (merge h1 h2) (mergePairs hs)

好了，基本大功告成了，现在测试：

fromList :: (Ord a) => [a] -> PHeap a

fromList xs = foldl insert E xs

heapSort :: (Ord a) => [a] -> [a]
heapSort = hsort . fromList where
hsort E = []
hsort h = (findMin h):(hsort $ deleteMin h)

-- test

testFromList = fromList [16, 14, 10, 8, 7, 9, 3, 2, 4, 1]

testHeapSort = heapSort [16, 14, 10, 8, 7, 9, 3, 2, 4, 1]

--
LIU
https://sites.google.com/site/algoxy/otherheaps

Larry, LIU Xinyu

unread,

Jan 21, 2011, 4:42:51 AM1/21/11

to pon...@googlegroups.com

Hi,

今天给出Binomial Heap的imperative实现. python版本。
CLRS上给的merge算法有些复杂，分了3个case，我把它化简了。

另外，使用left-child, right sibling的策略来记录K-ary tree需要非常的小心。不然特别容易产生bug。
我后面会给一个使用python的list或者STL的std::list的K-ary实现，估计会进一步简化。

今天比较懒，直接贴代码吧。

#!/usr/bin/python

import random # for testing only

# Assume the heap is min-heap

# Use left child, right sibling approach
class BinomialTree:
    def __init__(self, x = None):
        self.rank = 0
        self.key = x
        self.parent = None
        self.child = None
        self.sibling = None

# Auxiliary function to extract the first tree
def extract_first(h):
    t = None
    if h is not None:
        t = h
        h = h.sibling
        t.sibling = None
    return (t, h)

# Implicit condition that the rank of the two trees are same
def link(t1, t2):
    if t2.key < t1.key:
        (t1, t2) = (t2, t1)
    t2.sibling = t1.child
    t1.child = t2
    t2.parent = t1
    t1.rank = t1.rank + 1
    #release t2
    return t1

# Insert a tree to the proper position in the heap
# So that the trees are in monotonically increase order by rank
# Implicit condition: the rank of tree is lower or equal to the
# first tree in the heap
def insert_tree(h, t):
    while h is not None and t.rank == h.rank:
        (t1, h) = extract_first(h)
        t = link(t, t1)
    if h is not None and t.rank < h.rank:
        t.sibling = h
    return t

# Insertion
def insert(h, x):
    return insert_tree(h, BinomialTree(x))

# Append a tree to the heap, so that the trees are in
# monotonically increase order by rank
# Implicit condition: the rank of tree is equal to or bigger by 1 than
# the last tree in the heap.
# Because the tail tree in the heap may be changed, we need the 2nd last
# tree as the argument
def append_tree(head, prev, tail, x):
    if head is None:
        return (x, None, x)
    if tail.rank == x.rank:
        tail = link(tail, x)
        if prev is None:
            return (tail, None, tail)
        prev.sibling = tail
    else:
        tail.sibling = x
        prev = tail
        tail = x
    return (head, prev, tail)

# Helper function to append a heap to another one by repeatedly calling
# append_tree()
def append_trees(h, p, t, xs):
    while xs is not None:
        (x, xs) = extract_first(xs)
        (h, p, t) = append_tree(h, p, t, x)
    return (h, p, t)

# Merge 2 heaps together. Use a merge sort like approach
def merge(h1, h2):
    if h1 is None:
        return h2
    if h2 is None:
        return h1
    (h, p, t) = (None, None, None)
    while h1 is not None and h2 is not None:
        x = None
        if h1.rank < h2.rank:
            (x, h1) = extract_first(h1)
        elif h2.rank < h1.rank:
            (x, h2) = extract_first(h2)
        else:
            (x1, h1) = extract_first(h1)
            (x2, h2) = extract_first(h2)
            x = link(x1, x2)
        (h, p, t) = append_trees(h, p, t, x)
    if h1 is not None:
        (h, p, t) = append_trees(h, p, t, h1)
    if h2 is not None:
        (h, p, t) = append_trees(h, p, t, h2)
    return h

# Reverse the linked list
def reverse(h):
    prev = None
    while h is not None:
        x = h
        h = h.sibling
        x.sibling = prev
        prev = x
    return prev

# Extract the minimum binomial tree from the heap
# returns (min tree, rest trees)
def remove_min_tree(h):
    head = h
    (prev_min, min_t) = (None, None)
    prev = None
    while h is not None:
        if min_t is None or h.key < min_t.key:
            min_t = h
            prev_min = prev
        prev = h
        h = h.sibling
    if prev_min is not None:
        prev_min.sibling = min_t.sibling
    else:
        head = min_t.sibling
    min_t.sibling = None
    return (min_t, head)

# Assume h is not empty
def find_min(h):
    min_t = None
    while h is not None:
        if min_t is None or h.key < min_t.key:
            min_t = h
        h = h.sibling
    return min_t.key

# Extract the min element, returns the (min, heap')
def extract_min(h):
    (min_t, h) = remove_min_tree(h)
    h = merge(h, reverse(min_t.child))
    min_t.child = None
    return (min_t.key, h)

# helper function
def from_list(lst):
    h = None
    for x in lst:
        h = insert(h, x)
    return h

def heap_sort(lst):
    h = from_list(lst)
    res = []
    while h is not None:
        (x, h) = extract_min(h)
        res.append(x)
    return res

def to_string(h):
    s = ""
    while h is not None:
        s = s+ "(" + str(h.key)+", "+to_string(h.child)+"), "
        h = h.sibling
    return s

class TestHeap:
    def __init__(self):
        print "Binomial heap testing"

    def run(self):
        self.test_heap_sort()

    def test_heap_sort(self):
        n = 1000
        for i in range(100):
            lst = random.sample(range(n), random.randint(0, n))
            assert(heap_sort(lst) == sorted(lst))
        print "OK"

if __name__ == "__main__":
    TestHeap().run()

--
LIU
https://sites.google.com/site/algoxy/home

Larry, LIU Xinyu

unread,

Jan 24, 2011, 2:27:32 AM1/24/11

to pon...@googlegroups.com

Hi,

略微给上周的加些解释：

首先看定义：

# Use left child, right sibling approach

class BinomialTree:
    def __init__(self, x = None):
        self.rank = 0
        self.key = x
        self.parent = None
        self.child = None
        self.sibling = None

由于使用纯指针的left child, right sibling策略，这样就允许使用类似二叉树的形式实现K-ary树，如果一个节点有子树，则left指针指向第一个子树。
所有的子树以一个单向链表来代表。每个子树通过right指针指向下一个兄弟节点。在实际定义中，为了避免left, right的混淆，我特意将他们命名为child和sibling。rank是binomial tree独有的。其定义为一个整数，key为该节点的值，parent指向父节点。目前的代码中暂时未用，在后续的decrease-key中，我们会利用它向上回溯到根节点。
link是将两个rank相同的binomial tree连接成一个更大的树。它比较待链接的两个树的根，选取较小的座位新树的根，然后将较大的树作为第一个子树插入到单向链表的表头。最后将rank增加1.

# Implicit condition that the rank of the two trees are same
def link(t1, t2):
    if t2.key < t1.key:
        (t1, t2) = (t2, t1)
    t2.sibling = t1.child
    t1.child = t2
    t2.parent = t1
    t1.rank = t1.rank + 1
    #release t2
    return t1

inert tree并非最终的insert函数，它也是一个辅助函数，它用来将一个binomial tree按照rank单调递增的顺序，插入到heap中，由于heap本身是一个binomial tree的单调递增森林，并且以单链表表示。所以这个函数本身是一个针对单链表的插入函数。但是这里有一个隐含条件，就是待插入的树的rank要么小于表头的rank，要么相等。而针对相等的情况，需要做link，并且递归进行插入。我在这里消除了递归，变成了纯imperative实现。

def insert_tree(h, t):
    while h is not None and t.rank == h.rank:
        (t1, h) = extract_first(h)
        t = link(t, t1)
    if h is not None and t.rank < h.rank:
        t.sibling = h
    return t

有了上述的辅助函数，heap的插入就可以方便实现了，因为插入任何值，都可以认为是插入一个rank为0的叶子树。而任何heap的表头的rank一定大于等于0。

# Insertion
def insert(h, x):
return insert_tree(h, BinomialTree(x))

下面主要实现merge，我们需要一个辅助函数，它把一个binomial tree添加到一个heap的末尾，这里也有一个隐含条件，就是待插入的树的rank，一定大于或等于链表末尾的节点rank。针对相等的情况，我们还需要link，由于表达为heap 的单链表的tree的rank是单调增的，所以link后的rank一定是最大的，因此我们不需要进行递归操作。
另外，由于末尾发生link后，我们需要更新链表的指针，所以传入的参数包含表头，表尾，和指向表尾前一个的指针。

def append_tree(head, prev, tail, x):
    if head is None:
        return (x, None, x)
    if tail.rank == x.rank:
        tail = link(tail, x)
        if prev is None:
            return (tail, None, tail)
        prev.sibling = tail
    else:
        tail.sibling = x
        prev = tail
        tail = x
    return (head, prev, tail)

如果需要连续调用上述的append-tree会比较麻烦，为此我定义了一个帮助函数，用以连续调用：

def append_trees(h, p, t, xs):
    while xs is not None:
        (x, xs) = extract_first(xs)
        (h, p, t) = append_tree(h, p, t, x)
    return (h, p, t)

在单向链表的操作中，我们经常要取出表头节点，然后返回剩余的链表，并且还需要切断表头和剩余部分的联系。为此我定义辅助函数：

# Auxiliary function to extract the first tree
def extract_first(h):
    t = None
    if h is not None:
        t = h
        h = h.sibling
        t.sibling = None
    return (t, h)

有了这些辅助函数，merge就可以方便实现了。merge的策略非常像merge sort。由于连个被merge的heap，本质上是两个单项链表，并且是rank单调增的。
所以我们每次从2个待合并heap中选取一个rank较小的，放到结果中，如果两个待合并heap的当前tree的rank相同，我们把他们link成一个rank增加1的较大的tree，然后append到结果中，注意这时可能会再次引发link，但append_tree会帮助我们处理好。当某个待合并的heap变成空时，我们连续调用append-tree将剩余的所有tree都放到结果中。

# Merge 2 heaps together. Use a merge sort like approach
def merge(h1, h2):
    if h1 is None:
        return h2
    if h2 is None:
        return h1
    (h, p, t) = (None, None, None)
    while h1 is not None and h2 is not None:
        x = None
        if h1.rank < h2.rank:
            (x, h1) = extract_first(h1)
        elif h2.rank < h1.rank:
            (x, h2) = extract_first(h2)
        else:
            (x1, h1) = extract_first(h1)
            (x2, h2) = extract_first(h2)
            x = link(x1, x2)
        (h, p, t) = append_trees(h, p, t, x)
    if h1 is not None:
        (h, p, t) = append_trees(h, p, t, h1)
    if h2 is not None:
        (h, p, t) = append_trees(h, p, t, h2)
    return h

下面要集中实现delete Min了，策略是在heap的单项链表中，找到root值最小的那个数，将其root去除，剩余的子树是一个rank单调减的链表。所以我们
将它逆序，然后和剩余的其他森林merge到一起。

为此我们需要实现一个单向链表逆序的函数，非常考验基本功：

def reverse(h):
    prev = None
    while h is not None:
        x = h
        h = h.sibling
        x.sibling = prev
        prev = x
    return prev

之后是将森林中最小树找出并剔除的函数

def remove_min_tree(h):
    head = h
    (prev_min, min_t) = (None, None)
    prev = None
    while h is not None:
        if min_t is None or h.key < min_t.key:
            min_t = h
            prev_min = prev
        prev = h
        h = h.sibling
    if prev_min is not None:
        prev_min.sibling = min_t.sibling
    else:
        head = min_t.sibling
    min_t.sibling = None
    return (min_t, head)

返回堆顶的函数就是找出最小树，并返回其根的值：

def find_min(h):
    min_t = None
    while h is not None:
        if min_t is None or h.key < min_t.key:
            min_t = h
        h = h.sibling
    return min_t.key

最后定义deleteMin, 由于我们返回最小值和更新的heap，故而我将名字改为extract-min。

def extract_min(h):
    (min_t, h) = remove_min_tree(h)
    h = merge(h, reverse(min_t.child))
    min_t.child = None
    return (min_t.key, h)

为了方便从列表构造堆，我定义了一个from-list函数，它重复调用insert函数，把列表中的所有元素插入到堆中。

def from_list(lst):
    h = None
    for x in lst:
        h = insert(h, x)
    return h

heap sort可以先构造堆，然后依次弹出堆顶元素来实现。

def heap_sort(lst):
    h = from_list(lst)
    res = []
    while h is not None:
        (x, h) = extract_min(h)
        res.append(x)
    return res

为了调试，我还定义了一个前序打印函数。

def to_string(h):
    s = ""
    while h is not None:
        s = s+ "(" + str(h.key)+", "+to_string(h.child)+"), "
        h = h.sibling
    return s

最后是测试：

Larry, LIU Xinyu

unread,

Jan 24, 2011, 4:13:47 AM1/24/11

to pon...@googlegroups.com

Hi,

果然如我所料，不使用left-child, right-sibling，而直接使用lib中的list或者array，可以获得一个比较简单的实现。库帮助避免了单向链表操作中容易出现的很多错误。

看代码：
#!/usr/bin/python

# binomialheap2.py, binomial heap, implemented with list
# Copyright (C) 2011, Liu Xinyu (liuxi...@gmail.com)
#
# This program is free software: you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program. If not, see <http://www.gnu.org/licenses/>.

import random # for testing only

# Assume the heap is min-heap

# binomialheap.py uses ``left child, right sibling'' way as mentioned
# in CLRS[1]. I think by replacing with list, it can be simplified
# 改用list后，children用python的list代替。

class BinomialTree:
    def __init__(self, x = None):
        self.rank = 0
        self.key = x

self.children = []

# Heap is list of [BinomialTree]
# Heap的定义也就是一个Binomial tree的list

# Implicit condition that the rank of the two trees are same

# 将2个rank一样的tree link到一起。把根较大的tree作为另一个tree的第一个child

def link(t1, t2):
if t2.key < t1.key:
(t1, t2) = (t2, t1)

t1.children.insert(0, t2)

t1.rank = t1.rank + 1

# release t2

return t1

# Insert a tree to the proper position in the heap
# So that the trees are in monotonically increase order by rank
# Implicit condition: the rank of tree is lower or equal to the
# first tree in the heap

# heap中的tree是rank单调增的，隐含条件是待增加的tree的rank小于或等于第一个tree。
# 若rank相等，则需要link，并且递归继续把新link出的tree插入进去。注意这里消除了递归。
def insert_tree(ts, t):
    while ts !=[] and t.rank == ts[0].rank:
        t = link(t, ts.pop(0))
    ts.insert(0, t)
    return ts

# Insertion
# 由于insert实际是插入一个叶子节点，其rank永远小于或等于heap中的第一个tree。故而可直接利用上述函数

def insert(h, x):
return insert_tree(h, BinomialTree(x))

# Append a tree to the heap, so that the trees are in
# monotonically increase order by rank
# Implicit condition: the rank of tree is equal to or bigger by 1 than
# the last tree in the heap.

# 将一个tree插入到heap尾部，隐含条件是待插入tree的rank大于或等于最后一个tree，
# 相等时需要进行link
def append_tree(ts, t):
    if ts != [] and ts[-1].rank == t.rank:
        ts[-1] = link(ts[-1], t)
    else:
        ts.append(t)
    return ts

# Helper function to append a heap to another one by repeatedly calling
# append_tree()

# 重复调用上述的append-tree可以通过fold left实现，python中的fold left未reduce
def append_trees(ts1, ts2):
return reduce(append_tree, ts2, ts1)

# Merge 2 heaps together. Use a merge sort like approach

# 合并两个heap可以通过类似merge sort的算法来实现。
def merge(ts1, ts2):
    if ts1 == []:
        return ts2
    if ts2 == []:
        return ts1
    ts = []
    while ts1 != [] and ts2 != []:
        t = None
        if ts1[0].rank < ts2[0].rank:
            t = ts1.pop(0)
        elif ts2[0].rank < ts1[0].rank:
            t = ts2.pop(0)
        else:
            t = link(ts1.pop(0), ts2.pop(0))
        ts = append_tree(ts, t)
    ts = append_trees(ts, ts1)
    ts = append_trees(ts, ts2)
    return ts

# Extract the minimum binomial tree from the heap
# returns (min tree, rest trees)

# 利用了python的min函数来获取根最小的树，并将其剔除heap。
def remove_min_tree(ts):
    min_t = min(ts, key=lambda t: t.key)
    ts.remove(min_t)
    return (min_t, ts)

# Assume ts is not empty
# 返回堆顶
def find_min(ts):
    min_t = min(ts, key=lambda t: t.key)

return min_t.key

# Extract the min element, returns the (min, heap')

# 移除堆顶，首先找到根最小的tree，将其剔除，然后将其children逆序后合并入剩余的heap。
def extract_min(ts):
    (min_t, ts) = remove_min_tree(ts)
    min_t.children.reverse()
    ts = merge(ts, min_t.children)
    min_t.children = []
    return (min_t.key, ts)

# helper function
# 辅助函数，用以从list产生一个heap。
def from_list(lst):
    return reduce(insert, lst, [])

# 堆排序

def heap_sort(lst):
h = from_list(lst)
res = []

while h != []:

        (x, h) = extract_min(h)
        res.append(x)
    return res

class TestHeap:
    def __init__(self):
        print "Binomial heap testing"

    def run(self):

self.test_random_sort()

def test_random_sort(self):

        n = 1000
        for i in range(100):
            lst = random.sample(range(n), random.randint(0, n))
            assert(heap_sort(lst) == sorted(lst))
        print "OK"

if __name__ == "__main__":
    TestHeap().run()

# Reference
# [1]. CLRS. Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest and Clifford Stein. ``Introduction to Algorithms, Second Edition''. The MIT Press, 2001. ISBN: 0262032937.

--
LIU
https://sites.google.com/site/algoxy/home

Wang Fei

unread,

Jan 24, 2011, 8:27:24 PM1/24/11

to pon...@googlegroups.com

能整理一份pdf吗？

以便后人学习

----
Wang Fei
M.Eng. in Telecommunications Engineering

Larry, LIU Xinyu

unread,

Jan 25, 2011, 9:54:18 PM1/25/11

to pon...@googlegroups.com

Hi,

binary heap的已经有PDF了：
https://sites.google.com/site/algoxy/bheap

K-ary heap我估计要五一前才能全部完成吧.我会不断在这里更新：
https://sites.google.com/site/algoxy/otherheaps

Reply all

Reply to author

Forward