为什么要学习算法和数据结构

Larry, LIU Xinyu

unread,

Mar 14, 2011, 5:42:14 AM3/14/11

to TopLanguage

刚刚脱稿：
http://sites.google.com/site/algoxy/introduction
(在墙外)

--
LIU

lzprgmr

unread,

Mar 14, 2011, 5:46:58 AM3/14/11

to pon...@googlegroups.com

你干脆直接贴出来算了

Blog: http://www.cnblogs.com/baiyanhuang/
Douban:http://www.douban.com/people/baiyanhuang/

2011/3/14 Larry, LIU Xinyu <liuxi...@gmail.com>

Chinainvent

unread,

Mar 14, 2011, 5:48:14 AM3/14/11

to pon...@googlegroups.com

很神奇，此刻居然可以轻松地访问google site而不用翻墙，解禁了？

2011/3/14 Larry, LIU Xinyu <liuxi...@gmail.com>

刚刚脱稿：

--

Chinainvent

Blog: http://blog.csdn.net/chinainvent

Work at Alibaba

Devymex

unread,

Mar 14, 2011, 11:29:35 PM3/14/11

to pon...@googlegroups.com, Chinainvent

任意一个长度为数组，可能有正有负，32位范围为界。

求O(n)算法，将数组整理为前n/2部分都小于后n/2部分，但各部分不要求有序。

暂时没有思路。

=========程序控=========

2011/3/14 Chinainvent <chinain...@gmail.com>

机械唯物主义 : linjunhalida

unread,

Mar 14, 2011, 11:47:36 PM3/14/11

to pon...@googlegroups.com

2遍循环, 第一遍求中点,第2遍移位.
每遍都是O(n)的. 具体算法没有想清楚, 但是感觉一定能用O(n)做到.

2011/3/15 Devymex <dev...@gmail.com>:

Devymex

unread,

Mar 14, 2011, 11:54:08 PM3/14/11

to pon...@googlegroups.com, Chinainvent

嗯，应该是没有，这个问题等价于查找数组中第n/2小的元素，不用哈希的话是没有O(n)的解法的。

=========程序控=========

2011/3/15 Devymex <dev...@gmail.com>

Yi Chao

unread,

Mar 14, 2011, 11:56:38 PM3/14/11

to pon...@googlegroups.com

有O(N)的算法的，请查看算法导论，O(N)找第K大数，常数大点就是了。

fairywell

unread,

Mar 15, 2011, 1:05:09 AM3/15/11

to pon...@googlegroups.com

嗯，很土的算法了

Wu Yin

unread,

Mar 15, 2011, 1:41:01 AM3/15/11

to pon...@googlegroups.com

求用hash的O(n)算法

2011/3/15 Devymex <dev...@gmail.com>

--
滚滚长江东逝水，浪花淘尽英雄。
是非成败转头空。
青山依旧在，几度夕阳红。
白发渔樵江渚上，惯看秋月春风。
一壶浊酒喜相逢。
古今多少事，都付笑谈中。
----------

Xiamen University Cognitive Science Department, Brain-like Intelligent System Labs

Blog (all about mathematics & algorithms): http://wywcgs.wordpress.com.cn
Email: wyw...@gmail.com
Gtalk: wyw...@gmail.com
MSN: wyw...@hotmail.com

QQ: 346693733

TopCoder Handle: wywcgs

fairywell

unread,

Mar 15, 2011, 1:48:44 AM3/15/11

to pon...@googlegroups.com

吴教，你也来了，呵呵

Shuo Chen

unread,

Mar 15, 2011, 2:36:09 AM3/15/11

to TopLanguage

std::nth_element()

On Mar 15, 11:29 am, Devymex <devy...@gmail.com> wrote:
> 任意一个长度为数组，可能有正有负，32位范围为界。
>
> 求O(n)算法，将数组整理为前n/2部分都小于后n/2部分，但各部分不要求有序。
>
> 暂时没有思路。
>

> =========程序控 <http://www.cnblogs.com/devymex>=========
>
> 2011/3/14 Chinainvent <chinainvent....@gmail.com>
>
> > 很神奇，此刻居然可以轻松地访问google site而不用翻墙，解禁了？
>
> > 2011/3/14 Larry, LIU Xinyu <liuxiny...@gmail.com>

>
> >> 刚刚脱稿：
>
> >>http://sites.google.com/site/algoxy/introduction
> >> (在墙外)
>
> >> --
> >> LIU
>
> > --
> > Chinainvent
> > Blog:http://blog.csdn.net/chinainvent

> > Work at Alibaba <http://www.alibaba.com>

ma martin

unread,

Mar 15, 2011, 4:18:44 AM3/15/11

to pon...@googlegroups.com

算法很神奇，我很敬佩那些研究算法的牛人

2011/3/15 Shuo Chen <gian...@gmail.com>

332.gif

Larry, LIU Xinyu

unread,

Mar 15, 2011, 6:03:16 AM3/15/11

to TopLanguage

Hi,

说一下这个算法的描述：

记序列为xs, length xs = n, 记k=n/2
1. 取xs的第一个元素为pivot
2. partition，将所有小于pivot的元素放在前，记为as，大于的放在后，记为bs，记此时pivot的位置为i;
3. 比较i和k, 若i < k，则令xs=bs， k = k - i重复1
若i>k则，令xs=as，重复1

复杂度:该算法为线性算法，其中2的实现和quick sort相同。

--
LIU

On Mar 15, 11:56 am, Yi Chao <yichao...@gmail.com> wrote:
> 有O(N)的算法的，请查看算法导论，O(N)找第K大数，常数大点就是了。
>
> 在 2011-3-15，上午11:54， Devymex 写道：
>
> > 嗯，应该是没有，这个问题等价于查找数组中第n/2小的元素，不用哈希的话是没有O(n)的解法的。
>
> > =========程序控=========
>

> > 2011/3/15 Devymex <devy...@gmail.com>

> > 任意一个长度为数组，可能有正有负，32位范围为界。
>
> > 求O(n)算法，将数组整理为前n/2部分都小于后n/2部分，但各部分不要求有序。
>
> > 暂时没有思路。
>
> > =========程序控=========
>

> > 2011/3/14 Chinainvent <chinainvent....@gmail.com>
> > 很神奇，此刻居然可以轻松地访问google site而不用翻墙，解禁了？
>
> > 2011/3/14 Larry, LIU Xinyu <liuxiny...@gmail.com>

free.wang

unread,

Mar 15, 2011, 6:41:41 AM3/15/11

to pon...@googlegroups.com

问题是没法第一遍就求出中点

这是整个算法难度最瓶颈的地方

2011/3/15 机械唯物主义 : linjunhalida <linjun...@gmail.com>

--
真正的杰出，不是妙用规则的错层，而是极致的偏执于信念.
The Crankiness of Belief achieves Great , not the Trick of Regulation.

Devymex

unread,

Mar 15, 2011, 8:45:07 AM3/15/11

to pon...@googlegroups.com

你说的就是计数排序吗？那是类Hash算法！对于带负域的数组是无效的。

=========程序控=========

2011/3/15 Yi Chao <yich...@gmail.com>

Yi Chao

unread,

Mar 15, 2011, 8:49:15 AM3/15/11

to pon...@googlegroups.com

不是基数排序⋯⋯说了，看算导⋯⋯

Devymex

unread,

Mar 15, 2011, 8:53:45 AM3/15/11

to pon...@googlegroups.com

建Hash表，长度等于数组元素可能的取值范围。

第一遍，将Hash表置0
第二遍，将数组中存在的元素对应的Hash位置1
第三遍，将Hash表中第i个元素+=第i-1个元素。从1到结束。

这样，Hash表中不为0的元素就是数组中比对应的那个值小的数的个数。

=========程序控=========

2011/3/15 Wu Yin <wyw...@gmail.com>

Wu Yin

unread,

Mar 15, 2011, 10:27:11 AM3/15/11

to pon...@googlegroups.com

自己去看CLRS那个T(n) = T(5/n) + T(3n/5) + O(n)的东西，选出在中间n/5的某个数来做partition。和什么正负域没有任何关系，算法全程只基于比较。

还有，那个所谓的"hash"明显只是一个counting sort而已。

2011/3/15 Devymex <dev...@gmail.com>

Wu Yin

unread,

Mar 15, 2011, 10:29:31 AM3/15/11

to pon...@googlegroups.com

这种简单的partition只是expected O(n)而已，worst case是O(nk)的。

那个d&c求pivot的方法才是worst case O(n)的

2011/3/15 Larry, LIU Xinyu <liuxi...@gmail.com>

fairywell

unread,

Mar 15, 2011, 9:25:15 PM3/15/11

to pon...@googlegroups.com

吴教V5 :D

Wang Feng

unread,

Mar 15, 2011, 9:32:10 PM3/15/11

to pon...@googlegroups.com, fairywell

不要这样灌水好么？

2011/3/16 fairywell <sid...@gmail.com>

吴教V5 :D

在 2011-3-15，下午10:29， Wu Yin 写道：

这种简单的partition只是expected O(n)而已，worst case是O(nk)的。
那个d&c求pivot的方法才是worst case O(n)的

另外，nth_element 一种最简单的实现如下：

1 void swap( int& a, int& b )
2 {
3
4     a ^= b;
5     b ^= a;
6     a ^= b;
7 }
8
9 int partation( int* arr, int low, int high )
10 {
11     int ans = low - 1;
12     for ( int i = low; i < high; ++i )
13     {
14         if ( arr[i] < arr[high] )
15         {
16             swap( arr[i], arr[++ans] );
17         }
18     }
19     swap( arr[++ans], arr[high] );
20     return ans;
21 }
22
23 int nth( int* arr, int low, int high, int index )
24 {
25     int mid = partation(arr, low, high);
26     if ( mid == index ) return arr[mid];
27     return
28         ( mid < index )  ?
29         nth(arr, mid+1, high, index) :
30         nth(arr, low, mid-1, index);
31 }

Larry, LIU Xinyu

unread,

Mar 15, 2011, 9:40:00 PM3/15/11

to TopLanguage

Hi,

worst case 是O(n^2)的，此时序列为有序序列，每次pivot都是smallest或者biggest。
average情况下，此算法复杂度期望为：O(n)+O(n/2)+O(n/4)+...=O(n).

一个不那么精确的改良：每次处理xs时，取pivot=median(xs[1], xs[n], xs[n/2])。
这在大多数情况下已经足够健壮了。

--
LIU

On Mar 15, 10:29 pm, Wu Yin <wyw...@gmail.com> wrote:
> 这种简单的partition只是expected O(n)而已，worst case是O(nk)的。
> 那个d&c求pivot的方法才是worst case O(n)的
>

> 2011/3/15 Larry, LIU Xinyu <liuxiny...@gmail.com>

> wywcgs<http://www.topcoder.com/tc?module=MemberProfile&cr=22629546>

fairywell

unread,

Mar 15, 2011, 9:47:01 PM3/15/11

to pon...@googlegroups.com

感觉大家在努力讨论 60－90 年代已经研究得很好了的东西：）

Wu Yin

unread,

Mar 15, 2011, 9:48:34 PM3/15/11

to pon...@googlegroups.com

1. 确实是O(n^2)不是O(nk),我的问题

2. 期望虽然是O(n), 可是不是你这么个算法. 去找找randomized quicksort期望O(nlgn)的分析法, 用那种方法可以分析出O(n)的复杂度.

3. 最后的改良理论上无意义.

2011/3/16 Larry, LIU Xinyu <liuxi...@gmail.com>

Wu Yin

unread,

Mar 15, 2011, 9:49:37 PM3/15/11

to pon...@googlegroups.com

又不是做算法的, 而且大多数人知识储备没到那个层次, 根本没办法讨论前沿知识吧.

2011/3/16 fairywell <sid...@gmail.com>

fairywell

unread,

Mar 15, 2011, 9:58:44 PM3/15/11

to pon...@googlegroups.com

嗯，您说的是，讨论也是学习的过程，也能碰撞出思想的火花

Larry, LIU Xinyu

unread,

Mar 15, 2011, 10:43:46 PM3/15/11

to TopLanguage

Hi,

Worst case analysis比较简单，就不细说了。Expected running time的话，比较复杂，需要一个引理和数学期望的
累加计算。这里不大方便写。

我说说最后的改良，如果大家细看STL quick sort的SGI版本，就会发现实际上就是用这个方法选取的pivot。
可以说理论上无意义，但是它可以在很多情况下的避免前面提到的worst case。保证每次partition后，不会形成一侧为空，一侧为满的情
况。
类似的等价方法为pivot = xs[randint(0, length xs)]。除非精心设计一个序列进行exploit，否则对于很多序列，
这类策略足够强壮了。

--
LIU

On Mar 16, 9:48 am, Wu Yin <wyw...@gmail.com> wrote:
> 1. 确实是O(n^2)不是O(nk),我的问题
> 2. 期望虽然是O(n), 可是不是你这么个算法. 去找找randomized quicksort期望O(nlgn)的分析法,
> 用那种方法可以分析出O(n)的复杂度.
> 3. 最后的改良理论上无意义.
>

> 2011/3/16 Larry, LIU Xinyu <liuxiny...@gmail.com>

> wywcgs<http://www.topcoder.com/tc?module=MemberProfile&cr=22629546>

Wu Yin

unread,

Mar 15, 2011, 11:01:24 PM3/15/11

to pon...@googlegroups.com

我见过sgi stl的sort.

只是既然是在讨论算法, 那就去讨论算法. 和实用没什么太大的关系. 就像那个d&c的O(n)常数很大, 但是他依然是个好算法; fibonacci heap实际应用效率不高, 但是它仍然是非常游泳的数据结构一样.

至少在算法本身层面上, 你举出一个方法, 就得分析一下它的效果, 单纯说实际效果不错没什么意义. 因为你也不知道实际数据的分布到底是什么样子, 这个算法到底是不是真的就是正好对这个分布有很好的performance.

2011/3/16 Larry, LIU Xinyu <liuxi...@gmail.com>

Hi,

Larry, LIU Xinyu

unread,

Mar 15, 2011, 11:33:45 PM3/15/11

to TopLanguage

Hi,

我觉得这里说的事情有偏差。我也举一个例子，就是费马检验，我们知道费马检验是存在漏杀的，但是它给出了一种方法。
也就是说通过费马检验的数字很可能是素数了。如果否定这类方法，那么很多问题无法给出目前水平可接受的解。或者成本可以接受的解。

On Mar 16, 11:01 am, Wu Yin <wyw...@gmail.com> wrote:
> 我见过sgi stl的sort.
> 只是既然是在讨论算法, 那就去讨论算法. 和实用没什么太大的关系. 就像那个d&c的O(n)常数很大, 但是他依然是个好算法; fibonacci
> heap实际应用效率不高, 但是它仍然是非常游泳的数据结构一样.
> 至少在算法本身层面上, 你举出一个方法, 就得分析一下它的效果, 单纯说实际效果不错没什么意义. 因为你也不知道实际数据的分布到底是什么样子,
> 这个算法到底是不是真的就是正好对这个分布有很好的performance.
>

> 2011/3/16 Larry, LIU Xinyu <liuxiny...@gmail.com>

> wywcgs<http://www.topcoder.com/tc?module=MemberProfile&cr=22629546>

Wu Yin

unread,

Mar 15, 2011, 11:41:45 PM3/15/11

to pon...@googlegroups.com

但是你要知道rabin miller给出了一个每次检测的确定下界,所以可以用possibility amplify的方法去得到一个正确率无限接近100%的方法.

重要的不在idea本身,在于对idea的分析.

2011/3/16 Larry, LIU Xinyu <liuxi...@gmail.com>

Hi,

Larry, LIU Xinyu

unread,

Mar 16, 2011, 12:54:52 AM3/16/11

to TopLanguage

Hi,

并不矛盾啊，Miller-Rabin primary test也是probabilistic search method. 这个方法也是存在漏
杀的
Miller-Rabin(n, s) 的错误概率为1/(2^s)，如果取s足够大，通常我们可以得到一个非常好的结果。

回到前面说的三点中值法，和pivot=xs[randint(0, length xs)]法。也可以计算出出现worst case的可能性很
低。
对于长度为n的序列pivot为max/min的概率为2/n，所以每次pivot都恰好把序列分成一半空，一半满的概率为：
(2^(n-1))/(n!)。

当n=32时，这一概率为：8.16127700206094e-27已经非常小了。

--
LIU

On Mar 16, 11:41 am, Wu Yin <wyw...@gmail.com> wrote:
> 但是你要知道rabin miller给出了一个每次检测的确定下界,所以可以用possibility
> amplify的方法去得到一个正确率无限接近100%的方法.
> 重要的不在idea本身,在于对idea的分析.
>

> 2011/3/16 Larry, LIU Xinyu <liuxiny...@gmail.com>

> wywcgs<http://www.topcoder.com/tc?module=MemberProfile&cr=22629546>

Wu Yin

unread,

Mar 16, 2011, 1:54:16 AM3/16/11

to pon...@googlegroups.com

不想再回了,意义不大了.

你这个分析显然是错误的. 又不是只有分成一边是空的才会达到worst. 一边是一个一边是n-1个行不行? 一边是2个一边是n-2个行不行? 有一半总是常数行不行? 有一半个数不断变化但是上界是o(n)的行不行? 太多了.

2011/3/16 Larry, LIU Xinyu <liuxi...@gmail.com>

Hi,

Wang Feng

unread,

Mar 16, 2011, 2:09:38 AM3/16/11

to pon...@googlegroups.com, Wu Yin

我记得对于 nth_element 有一些实验得出的结论可以参考：
1） random_shuffle 之后再挑选
2）如果长度 N 大于 100000 则随机挑选 64 个元素排序后将剩余数据塞入 65 个空隙后再处理
3）如果要挑选的元素序数 m << N 的时候，采用 heap sort

忘记了具体出处了，方才翻看 TAOCP 没有找到

2011/3/16 Wu Yin <wyw...@gmail.com>

Wu Yin

unread,

Mar 16, 2011, 2:15:56 AM3/16/11

to pon...@googlegroups.com

你倘若真的想好好去分析一下这个算法的期望运行时间, 我可以提供一下我的思路.

1. 证明每次partition之后的序列是均匀分布的. 我不知道这个是不是对的, 但是直觉没问题. 可以考虑对partition的次数使用数学归纳法.

2. 计算出这种选取方法选出第k大的数的概率. 如果之前是均匀分布, 那么就相当于是选出一个均匀分布的三元组算中位数, 概率肯定可以写出公式.

3. 求运行时间的期望. 方法类似分析randomized quicksort, 对每两个数计算它们发生比较的概率, 然后求总和. 可能需要分成3种情况讨论: (1) 两个数的位置都<k; (2) 两个数的位置都>k; (3) 两个数的位置一个<k一个>k.

基本上三段分别求和, 最后再求总和就行.

2011/3/16 Larry, LIU Xinyu <liuxi...@gmail.com>

Hi,

Wu Yin

unread,

Mar 16, 2011, 3:17:35 AM3/16/11

to pon...@googlegroups.com

你再好好想想吧. 这个确实是worst case. 但是这难道是唯一的worst case么? 你只考虑了这种情况发生的概率, 你如何证明只要避免了这种情况, 算法的运行时间就不是O(n^2), 嗯?

2011/3/16 Larry, LIU Xinyu <liuxi...@gmail.com>

Hi,

我也感觉这么Re来Re去下去不大好了。我贴一段CLRS 7.2.1的一段：

The worst-case behavior for quicksort occurs when the partitioning
routine produces one subproblem with n - 1 elements and one with 0
elements. (This claim is proved in Section 7.4.1.) Let us assume that
this unbalanced partitioning arises in each recursive call. The
partitioning costs Θ(n) time. Since the recursive call on an array of
size 0 just returns, T(0) = Θ(1), and the recurrence for the running
time is

T(n)
=
T(n - 1) + T(0) + Θ(n)

=
T(n - 1) + Θ(n).

On Mar 16, 1:54 pm, Wu Yin <wyw...@gmail.com> wrote:
> 不想再回了,意义不大了.
> 你这个分析显然是错误的. 又不是只有分成一边是空的才会达到worst. 一边是一个一边是n-1个行不行? 一边是2个一边是n-2个行不行?
> 有一半总是常数行不行? 有一半个数不断变化但是上界是o(n)的行不行? 太多了.
>

> 2011/3/16 Larry, LIU Xinyu <liuxiny...@gmail.com>

> TopCoder Handle:...
>
> read more >>

Larry, LIU Xinyu

unread,

Mar 16, 2011, 3:42:56 AM3/16/11

to TopLanguage

Hi,

的确不那么严格的给个例子，假设算法每次pivot的分割都很不均匀，例如1:99吧，也就是比pivot大的元素个数是比pivot小的元素个数的
99倍。
令K=100

第一次分割后，xs=as++bs 有length as/length xs = K; 此次运行时间T(n), 是令xs = bs，继续分割；
第二次分割后，仍然as只有1/K的xs长，此次运行时间T(n(K-1)/K), 继续分割；
...

总时间是：T(n)+T(n(K-1)/K)+T(n(K-1)^2/(K^2))+...
注意到K-1/K<1，这个等比数列求和收敛到K。

故复杂度是O(nK)=O(100n)=O(n)

事实上，真正悲剧的情形是K=O(n)。每次左右不对称的比例达到n的量级就退化到O(n^2)了，包括：
左侧0，右侧n-1
左侧1，右侧n-2
....
左侧O(1)个，右侧O(n)个

--
LIU

On Mar 16, 3:17 pm, Wu Yin <wyw...@gmail.com> wrote:
> 你再好好想想吧. 这个确实是worst case. 但是这难道是唯一的worst case么? 你只考虑了这种情况发生的概率, 你如何证明
> 只要避免了这种情况, 算法的运行时间就不是O(n^2), 嗯?
>

> 2011/3/16 Larry, LIU Xinyu <liuxiny...@gmail.com>

> ...
>
> read more >>

Wu Yin

unread,

Mar 16, 2011, 3:47:44 AM3/16/11

to pon...@googlegroups.com

你这个例子确实不严格. 不仅不严格, 而且说明不了任何问题. worst case是每次长度减少固定数值, 不是减少固定比例。也就是说，是每次从n变成n-k，不是每次从n变成n-n/k。

其实我在上面已经说过了。如果每次partition之后总有一个序列的长度是o(n)的，那会如何？注意是o(n)不是O(n)。你这个什么1:XXXX就已经是O(n)量级了，当然能分析出总体O(n)的复杂度。

2011/3/16 Larry, LIU Xinyu <liuxi...@gmail.com>

Hi,

Wu Yin

unread,

Mar 16, 2011, 3:49:35 AM3/16/11

to pon...@googlegroups.com

表示本主题我不再回了。真的没有什么讨论的意义了。

2011/3/16 Wu Yin <wyw...@gmail.com>

Larry, LIU Xinyu

unread,

Mar 16, 2011, 4:28:51 AM3/16/11

to TopLanguage

Hi,

其实说明一个问题：如果算法分为best case, worst case, 和average case。
partition的average case更加接近best case而远离worst case。（又要被批评不严格了）
这也是为什么quick sort在大多数的场合表现很好，而人们存在相对简单的手段避免被某些特殊情况（属于worst case）
exploit。

另：对于99:1的情况，我可以利用master methods直接从T(n)=T(n/K)+T(n(K-1)/K) + cn 直接给出
O(n)，
但是我仍然一步一步列出来，然后用序列求和计算，因为这样比较省力，可以少记忆一些东西。

--
LIU

On Mar 16, 3:47 pm, Wu Yin <wyw...@gmail.com> wrote:
> 你这个例子确实不严格. 不仅不严格, 而且说明不了任何问题. worst case是每次长度减少固定数值,
> 不是减少固定比例。也就是说，是每次从n变成n-k，不是每次从n变成n-n/k。
> 其实我在上面已经说过了。如果每次partition之后总有一个序列的长度是o(n)的，那会如何？注意是o(n)不是O(n)。你这个什么1:XXXX就已经是O(n)量级了，
> 当然能分析出总体O(n)的复杂度。
>

> 2011/3/16 Larry, LIU Xinyu <liuxiny...@gmail.com>

> ...
>
> read more >>

Shuo Chen

unread,

Mar 16, 2011, 3:09:01 AM3/16/11

to TopLanguage

赞！

On Mar 16, 1:54 pm, Wu Yin <wyw...@gmail.com> wrote:
> 不想再回了,意义不大了.
> 你这个分析显然是错误的. 又不是只有分成一边是空的才会达到worst. 一边是一个一边是n-1个行不行? 一边是2个一边是n-2个行不行?
> 有一半总是常数行不行? 有一半个数不断变化但是上界是o(n)的行不行? 太多了.
>

> 2011/3/16 Larry, LIU Xinyu <liuxiny...@gmail.com>

> TopCoder Handle: ...
>
> read more >>

Larry, LIU Xinyu

unread,

Mar 16, 2011, 3:03:34 AM3/16/11

to TopLanguage

Hi,

我也感觉这么Re来Re去下去不大好了。我贴一段CLRS 7.2.1的一段：

The worst-case behavior for quicksort occurs when the partitioning
routine produces one subproblem with n - 1 elements and one with 0
elements. (This claim is proved in Section 7.4.1.) Let us assume that
this unbalanced partitioning arises in each recursive call. The
partitioning costs Θ(n) time. Since the recursive call on an array of
size 0 just returns, T(0) = Θ(1), and the recurrence for the running
time is

T(n)
=
T(n - 1) + T(0) + Θ(n)

=
T(n - 1) + Θ(n).

On Mar 16, 1:54 pm, Wu Yin <wyw...@gmail.com> wrote:
> 不想再回了,意义不大了.
> 你这个分析显然是错误的. 又不是只有分成一边是空的才会达到worst. 一边是一个一边是n-1个行不行? 一边是2个一边是n-2个行不行?
> 有一半总是常数行不行? 有一半个数不断变化但是上界是o(n)的行不行? 太多了.
>

> 2011/3/16 Larry, LIU Xinyu <liuxiny...@gmail.com>

> TopCoder Handle:...
>
> read more >>

Shuo Chen

unread,

Mar 16, 2011, 5:10:10 AM3/16/11

to TopLanguage

"median-of-3 killer" sequence, David Musser.

Wu Yin 已经说了"worst case是每次长度减少固定数值, 不是减少固定比例"，不是分成 0 vs. n-1 才叫 worst
case。

> ...
>
> read more >>

songyy

unread,

Mar 16, 2011, 10:26:48 PM3/16/11

to TopLanguage

哦，这个问题等价于从n个元素中找到第k个元素，用O（n）时间。
我们算法课刚好讲过这个，看这个Lecture Notes的71页以后：

http://www.eusoff.nus.edu.sg/~flyfy1/learn/alg/Lecture7.pdf

On Mar 15, 11:29 am, Devymex <devy...@gmail.com> wrote:
> 任意一个长度为数组，可能有正有负，32位范围为界。
>
> 求O(n)算法，将数组整理为前n/2部分都小于后n/2部分，但各部分不要求有序。
>
> 暂时没有思路。
>

> =========程序控 <http://www.cnblogs.com/devymex>=========
>
> 2011/3/14 Chinainvent <chinainvent....@gmail.com>
>
>
>

> > 很神奇，此刻居然可以轻松地访问google site而不用翻墙，解禁了？
>
> > 2011/3/14 Larry, LIU Xinyu <liuxiny...@gmail.com>
>
> >> 刚刚脱稿：
>
> >>http://sites.google.com/site/algoxy/introduction
> >> (在墙外)
>
> >> --
> >> LIU
>
> > --
> > Chinainvent
> > Blog:http://blog.csdn.net/chinainvent

> > Work at Alibaba <http://www.alibaba.com>- Hide quoted text -
>
> - Show quoted text -

Larry, LIU Xinyu

unread,

Mar 17, 2011, 1:03:01 AM3/17/11

to TopLanguage

Hi,

A self contained Python program with test cases as below.

I considered another method, which is essentially equivalent to nth-
element method:

function nth(xs, k)
l <- 1
u <- length xs
loop
x <- max(xs[1..k])
y <- min(xs[k+1..u])
if x < y then
return
else
swap(x, y)
l <- partition xs[1..k] with x
u <- partition xs[k_1..n] with y

Here are the source code:

----------------8--------------------
#!/usr/bin/python

import random # for test only

# partition xs[l..u] with value x
def partition_with(xs, l, u, x):
left = l
for right in range(l, u+1):
if xs[right] <= x:
(xs[left], xs[right]) = (xs[right], xs[left])
left = left + 1
return left

# Randomized partition with
def partition(xs, l, u):
return partition_with(xs, l, u, xs[random.randint(l, u)])

# method 1, randomized nth-element
def partition_at(xs, k):
(l, u)=(0, len(xs) - 1)
while True:
m = partition(xs, l, u)
if m < k:
l = m
elif m > k:
u = m - 1
else:
return

# method 2, min-max
def partition_at2(xs, k):
(l, u) = (0, len(xs) - 1)
while True:
a = max_at(xs, l, k)
b = min_at(xs, k, u)
if xs[a] > xs[b]:
(xs[a], xs[b]) = (xs[b], xs[a])
(l, u) = (partition_with(xs, l, k-1, xs[a]),
partition_with(xs, k, u, xs[b]) - 1)
else:
return

# find the index of min/max elements from x[l..u]
def max_at(xs, l, u):
i = l
for j in xrange(l, u+1):
if xs[j] > xs[i]:
i = j
return i

def min_at(xs, l, u):
i = l
for j in xrange(l, u+1):
if xs[j] < xs[i]:
i = j
return i

# a brute force verification
def verify(xs, k):
assert(sorted(xs[:k]) == sorted(xs)[:k]):

def gen_xs(n):
xs = random.sample(range(n), random.randint(2, n))
k = random.randint(1, len(xs)-1)
return (xs, k)

def test():
n = 10000
for i in range(100):
(xs, k)=gen_xs(n)
partition_at(xs, k)
verify(xs, k)
(xs, k)=gen_xs(n)
partition_at2(xs, k)
verify(xs, k)

if __name__ == "__main__":
test()

Larry, LIU Xinyu

unread,

Mar 17, 2011, 2:02:51 AM3/17/11

to TopLanguage

Hi,

And a Haskell nth-elem program for reference:

import Data.List -- for partition function
import Test.QuickCheck

-- returns the top n elements
top::Int->[Int]->[Int]
top _ [] = []
top 0 _ = []
top n (x:xs) | len ==n = as
| len < n = as++[x]++(top (n-len-1) bs)
| otherwise = top n as
where
(as, bs) = partition (<= x) xs
len = length as

prop_top :: Int ->[Int] ->Bool
prop_top k xs = if k>=0 then sort (top k xs) == take k (sort xs)
else True

Here is the quick verification:
*> test prop_top
OK, passed 100 tests.

Alternatively, the `partition` can be replaced as:
as = [ a | a<-xs, a<=x]
bs = [ b | b<-xs, b>x]

And here is a median method for change the pivot:

median x3 = sum x3 - maximum x3 - minimum x3

example: median [3, 1, 2] ==> 2

About the min-max method I mentioned in previous post.
The worst case happens when the sequence is in decrease order, and the
algorithm downgrades to O(n^2)

--
LIU

Reply all

Reply to author

Forward