sampling Package

This module implements an incremental sampler used to approximate the task by randomly selecting a portion of the triplets.

sampler Module

The sampler class implements incremental sampling without replacement. Incremental means that you do not have to draw the whole sample at once: at any given time you can request a piece of the sample of a size you specify. This is useful for very large sample sizes.

class ABXpy.sampling.sampler.IncrementalSampler(N, K, step=None, relative_indexing=True, dtype=<Mock id='140696957940240'>)

Bases: object

next()

sample(n, dtype=<Mock id='140696957942224'>)

Fast implementation of the sampling function.

Gets all samples from the next n items while avoiding rejection sampling on samples that are too large, namely samples whose expected number of sampled items is larger than 10**5.

Parameters:

n : int

the size of the chunk

Returns:

sample : numpy.array

the indices to keep, given either relative to the current position in the sample or as absolute indices, depending on the value of relative_indexing specified when initialising the sampler (default value is True)
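To illustrate the idea of drawing a sample chunk by chunk, here is a minimal sketch. It is not ABXpy's implementation: the class name ChunkedSampler, its seed argument, and its internals are assumptions made for illustration, as is the reading of N as the population size and K as the total sample size. For each chunk it draws how many of the remaining picks fall in the chunk from a hypergeometric distribution, then chooses that many distinct offsets inside the chunk.

```python
import numpy as np


class ChunkedSampler:
    """Hypothetical stand-in for illustration; not ABXpy's IncrementalSampler."""

    def __init__(self, N, K, relative_indexing=True, seed=None):
        self.remaining_items = N   # items not yet visited
        self.remaining_draws = K   # sampled items still to be placed
        self.position = 0          # absolute index of the next unvisited item
        self.relative_indexing = relative_indexing
        self.rng = np.random.default_rng(seed)

    def sample(self, n):
        # Never look past the end of the population.
        n = min(n, self.remaining_items)
        if n == 0:
            return np.empty(0, dtype=np.int64)
        # How many of the remaining draws land in the next n items.
        k = int(self.rng.hypergeometric(
            self.remaining_draws,
            self.remaining_items - self.remaining_draws,
            n))
        # Choose k distinct offsets inside the chunk, in increasing order.
        offsets = np.sort(self.rng.choice(n, size=k, replace=False))
        start = self.position
        self.position += n
        self.remaining_items -= n
        self.remaining_draws -= k
        return offsets if self.relative_indexing else offsets + start
```

Calling sample(n) repeatedly until the population is exhausted yields exactly K indices in total, because the last chunk must contain every draw that is still outstanding.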

simple_sample(n)

Gets all samples from the next n items in a naive fashion.

Parameters:

n : int

the size of the chunk

Returns:

sample : numpy.array

the indices to be kept relative to the current position in the sample

ABXpy.sampling.sampler.Knuth_sampling(n, N, dtype=<Mock id='140696957940496'>)

This is the usual sampling function, used when n is comparable to N.
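For readers unfamiliar with the technique, the following sketch shows Knuth's selection sampling (Algorithm S), which is presumably what the function name refers to. The function selection_sampling and its seed argument are hypothetical, and the actual ABXpy code may be organised differently. The algorithm visits each of the N items once and keeps item t with probability (n - m) / (N - t), where m is the number of items kept so far.

```python
import numpy as np


def selection_sampling(n, N, seed=None):
    """Return n distinct indices from range(N), in increasing order.

    Illustrative sketch of Knuth's Algorithm S; not ABXpy's Knuth_sampling.
    """
    if not 0 <= n <= N:
        raise ValueError("need 0 <= n <= N")
    rng = np.random.default_rng(seed)
    chosen = np.empty(n, dtype=np.int64)
    m = 0  # number of items selected so far
    for t in range(N):
        if m == n:
            break
        # Keep item t with probability (n - m) / (N - t).
        if (N - t) * rng.random() < n - m:
            chosen[m] = t
            m += 1
    return chosen
```

The cost is proportional to N regardless of n, which is why this approach is a reasonable fit when n is comparable to N.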

ABXpy.sampling.sampler.hypergeometric_sample(N, K, n)

This function returns the number of elements to sample from the next n items.
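As a rough illustration, the count can be drawn with NumPy's hypergeometric generator. This sketch assumes that N is the number of items left, K the number of elements still to be sampled among them, and n the chunk size; the helper name draws_in_next_chunk and its rng argument are hypothetical, and ABXpy's own function may compute the value differently.

```python
import numpy as np


def draws_in_next_chunk(N, K, n, rng=None):
    """Number of the K sampled elements that fall among the next n of N items.

    Illustrative sketch; the argument meanings are assumptions.
    """
    rng = np.random.default_rng() if rng is None else rng
    # K "marked" items, N - K unmarked items, n items inspected.
    return int(rng.hypergeometric(K, N - K, n))
```

On average the result is n * K / N, and it can never exceed min(n, K).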

ABXpy.sampling.sampler.rejection_sampling(n, N, dtype=<Mock id='140696957940688'>)

Uses rejection sampling to keep good performance when n << N.
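The general technique can be sketched as follows: draw candidates uniformly with replacement, discard duplicates, and redraw only the shortfall until n distinct values remain. The function rejection_sample below is a hypothetical illustration under that assumption, not the ABXpy code.

```python
import numpy as np


def rejection_sample(n, N, seed=None):
    """Return n distinct integers from range(N), sorted.

    Illustrative sketch of rejection sampling; efficient only when n << N.
    """
    if not 0 <= n <= N:
        raise ValueError("need 0 <= n <= N")
    rng = np.random.default_rng(seed)
    kept = np.empty(0, dtype=np.int64)
    while kept.size < n:
        # Draw only as many candidates as are still missing.
        candidates = rng.integers(0, N, size=n - kept.size)
        # np.unique drops duplicates (the rejection step) and sorts.
        kept = np.unique(np.concatenate([kept, candidates]))
    return kept
```

When n is much smaller than N, collisions are rare and the loop usually finishes in one or two passes, so the cost depends on n rather than N. As n approaches N, most candidates are rejected and a sequential method becomes the better choice.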

ABXpy.sampling.sampler.sample_without_replacement(n, N, dtype=<Mock id='140696957940368'>)

Returns uniform samples in [0, N-1] without replacement. It will use Knuth sampling or rejection sampling depending on the parameters n and N.

Note

The values 0.6 and 100 are based on empirical tests of the functions and would need to be changed if the functions are changed.