Python实现一个带权无回置随机抽选函数的方法-侯体宗的博客

Python实现一个带权无回置随机抽选函数的方法
Python / 管理员发布于 7年前 259

需求

有一个抽奖应用，从所有参与的用户抽出K位中奖用户(K=奖品数量)，且要根据每位用户拥有的抽奖码数量作为权重。

如假设有三个用户及他们的权重是: A(1), B(1), C(2)。希望抽到A的概率为25%，抽到B的概率为25%, 抽到C的概率为50%。

分析

比较直观的做法是把两个C放到列表中抽选，如[A, B, C, C]，使用Python内置的函数random.choice[A, B, C, C], 这样C抽到的概率即为50%。

这个办法的问题是权重比较大的时候，浪费内存空间。

更一般的方法是，将所有权重加和4，然后从[0, 4)区间里随机挑选一个值，将A, B, C占用不同大小的区间。[0,1)是A, [1,2)是B, [2,4)是C。

使用Python的函数random.ranint(0, 3)或者int(random.random()*4)均可产生0-3的随机整数R。判断R在哪个区间即选择哪个用户。

接下来是寻找随机数在哪个区间的方法，

一种方法是按顺序遍历列表并保存已遍历的元素权重综合S，一旦S大于R，就返回当前元素。

from operator import itemgetterusers = [('A', 1), ('B', 1), ('C', 2)]total = sum(map(itemgetter(1), users))rnd = int(random.random()*total) # 0~3s = 0for u, w in users:  s += w  if s > rnd:   return u

不过这种方法的复杂度是O(N)，因为要遍历所有的users。

可以想到另外一种方法，先按顺序把累积加的权重排成列表，然后对它使用二分法搜索，二分法复杂度降到O(logN)(除去其他的处理)

users = [('A', 1), ('B', 1), ('C', 2)]cum_weights = list(itertools.accumulate(map(itemgetter(1), users))) # [1, 2, 4]total = cum_weights[-1]rnd = int(random.random()*total) # 0~3hi = len(cum_weights) - 1index = bisect.bisect(cum_weights, rnd, 0, hi)return users(index)[0]

Python内置库random的choices函数(3.6版本后有)即是如此实现，random.choices函数签名为 random.choices(population, weights=None, *, cum_weights=None, k=1) population是待选列表， weights是各自的权重，cum_weights是可选的计算好的累加权重（两者选一），k是抽选数量（有回置抽选）。源码如下:

def choices(self, population, weights=None, *, cum_weights=None, k=1):  """Return a k sized list of population elements chosen with replacement.  If the relative weights or cumulative weights are not specified,  the selections are made with equal probability.  """  random = self.random  if cum_weights is None:    if weights is None:      _int = int      total = len(population)      return [population[_int(random() * total)] for i in range(k)]    cum_weights = list(_itertools.accumulate(weights))  elif weights is not None:    raise TypeError('Cannot specify both weights and cumulative weights')  if len(cum_weights) != len(population):    raise ValueError('The number of weights does not match the population')  bisect = _bisect.bisect  total = cum_weights[-1]  hi = len(cum_weights) - 1  return [population[bisect(cum_weights, random() * total, 0, hi)]      for i in range(k)]

更进一步

因为Python内置的random.choices是有回置抽选，无回置抽选函数是random.sample，但该函数不能根据权重抽选（random.sample(population, k)）。

原生的random.sample可以抽选个多个元素但不影响原有的列表，其使用了两种算法实现, 保证了各种情况均有良好的性能。 (源码地址：random.sample)

第一种是部分shuffle，得到K个元素就返回。时间复杂度是O(N)，不过需要复制原有的序列，增加内存使用。

result = [None] * kn = len(population)pool = list(population) # 不改变原有的序列for i in range(k):  j = int(random.random()*(n-i))  result[k] = pool[j]  pool[j] = pool[n-i-1] # 已选中的元素移走，后面未选中元素填上return result

而第二种是设置一个已选择的set，多次随机抽选，如果抽中的元素在set内，就重新再抽，无需复制新的序列。当k相对n较小时，random.sample使用该算法，重复选择元素的概率较小。

selected = set()selected_add = selected.add # 加速方法访问for i in range(k):  j = int(random.random()*n)  while j in selected:    j = int(random.random()*n)  selected_add(j)  result[j] = population[j]return result

抽奖应用需要的是带权无回置抽选算法，结合random.choices和random.sample的实现写一个函数weighted_sample。

一般抽奖的人数都比奖品数量大得多，可选用random.sample的第二种方法作为无回置抽选，当然可以继续优化。

代码如下：

def weighted_sample(population, weights, k=1):  """Like random.sample, but add weights.  """  n = len(population)  if n == 0:    return []  if not 0 <= k <= n:    raise ValueError("Sample larger than population or is negative")  if len(weights) != n:    raise ValueError('The number of weights does not match the population')  cum_weights = list(itertools.accumulate(weights))  total = cum_weights[-1]  if total <= 0: # 预防一些错误的权重    return random.sample(population, k=k)  hi = len(cum_weights) - 1  selected = set()  _bisect = bisect.bisect  _random = random.random  selected_add = selected.add  result = [None] * k  for i in range(k):    j = _bisect(cum_weights, _random()*total, 0, hi)    while j in selected:      j = _bisect(cum_weights, _random()*total, 0, hi)    selected_add(j)    result[i] = population[j]  return result

以上就是本文的全部内容，希望对大家的学习有所帮助，也希望大家多多支持。

上一条：
Python定时任务工具之APScheduler使用方式
下一条：
python3字符串操作总结

0条评论 (评论内容有缓存机制,请悉知!)

最新最热

近期文章
在go语言中实现字符串可逆性压缩及解压缩功能(0个评论)
使用go + gin + jwt + qrcode实现网站生成登录二维码在app中扫码登录功能(0个评论)
在windows10中升级go版本至1.24后LiteIDE的Ctrl+左击无法跳转问题解决方案(0个评论)
智能合约Solidity学习CryptoZombie第四课:僵尸作战系统(0个评论)
智能合约Solidity学习CryptoZombie第三课:组建僵尸军队(高级Solidity理论)(0个评论)
智能合约Solidity学习CryptoZombie第二课:让你的僵尸猎食(0个评论)
智能合约Solidity学习CryptoZombie第一课:生成一只你的僵尸(0个评论)
在go中实现一个常用的先进先出的缓存淘汰算法示例代码(0个评论)
在go+gin中使用"github.com/skip2/go-qrcode"实现url转二维码功能(0个评论)
在go语言中使用api.geonames.org接口实现根据国际邮政编码获取地址信息功能(1个评论)

近期评论
122 在
学历：一种延缓就业设计，生活需求下的权衡之选中评论工作几年后，报名考研了，到现在还没认真学习备考，迷茫中。作为一名北漂互联网打工人..
123 在
Clash for Windows作者删库跑路了，github已404中评论按理说只要你在国内，所有的流量进出都在监控范围内，不管你怎么隐藏也没用，想搞你分..
原梓番博客在
在Laravel框架中使用模型Model分表最简单的方法中评论好久好久都没看友情链接申请了，今天刚看，已经添加。..
博主在
佛跳墙vpn软件不会用?上不了网?佛跳墙vpn常见问题以及解决办法中评论 @1111老铁这个不行了，可以看看近期评论的其他文章..
1111 在
佛跳墙vpn软件不会用?上不了网?佛跳墙vpn常见问题以及解决办法中评论网站不能打开，博主百忙中能否发个APP下载链接，佛跳墙或极光..

Top