Strategies Code Reference

Hashing Strategies

hash_strategy.py

Implements two basic hashing strategies - a universal hashing function as described by Cormen et al., and a hash-pair strategy as discussed by Kirsch and Mitzenmacher (2008).

class minsketch.hash_strategy.DoubleHashingStrategy(depth, width, hash_gen=None)[source]

A single hash-pair count-min hashing scheme, based on Kirsch & Mitzenmacher (2008): https://www.eecs.harvard.edu/~michaelm/postscripts/rsa2008.pdf

Each item (x) is hashed as: \(h_1(x) + j * h_2(x) \ \forall \ j \in \{0, 1, ..., d - 1\}\)

class minsketch.hash_strategy.NaiveHashingStrategy(depth, width, hash_gen=None)[source]

An implementation of the basic hashing strategy, using a different member of a universal function for each row

class minsketch.hash_strategy.UniversalHashFunctionGenerator(m)[source]

Implementation of a universal hash function family generator, as described in Cormen et al.’s Introduction to Algorithms.

The sets are used to guarantee independence between the hash functions in this family.

For a given prime number p, and a width of table m, functions are constructed as \(f(x)=((ax + b)\mod p)\mod m, \ a \in \{1, 2, ..., p - 1\}, \ b \in \{0, 1, ..., p - 1\}\)

Updating Strategies

update_strategy.py

An implementation of naive and conservative update strategies. Each update strategy also implements an error-estimate, used by count-min-mean.

class minsketch.update_strategy.ConservativeUpdateStrategy[source]

A conservative update strategy, as introduced by Cormode (2009) and Goyal (2010): http://www.cs.utah.edu/~amitg/papers/goyal10GEMS.pdf At every point, only update the minimal number of items.

As for why this is a class, see the class-comment for NaiveUpdateStrategy

count_mean_error_estimates(table, values)[source]

Error estimates for the count-min-mean, but in this case, we used the row-wise sum as an estimator for the entry in that row, rather than the entire total. :param table: The table being updated :param values: Values of the item we return error estimates for :return: Error estimates for each of these values

class minsketch.update_strategy.NaiveUpdateStrategy[source]

This could clearly be a function, rather than a class. Why make it a class, then? Two reasons: 1) Compatibility with other strategies, all of whom are classes 2) In the event I’ll want to implement stateful updating strategies in the future 3) I already ended up using its class-ness, to couple error estimation strategies here.

count_mean_error_estimates(table, values)[source]

Provide an error estimate, using the total insertions into the table as a baseline :param table: The table being updated :param values: Values of the item we return error estimates for :return: Error estimates for each of these values

Lossy-Counting Window Strategies

lossy_strategy.py

Implementation of lossy tracking strategies.

class minsketch.lossy_strategy.LossyUpdateStrategy(gamma, threshold_func)[source]

A simple class implementing lossy update strategies

class minsketch.lossy_strategy.NoLossyUpdateStrategy[source]

This looks silly, but it’s implemented as a default over always doing a lossy update strategy check.

minsketch.lossy_strategy.no_threshold_func(window_count=None)[source]

The threshold function for LCU-ALL - accepts a parameter to conform to the ‘interface’ :param window_count: How many windows have appeared so far :return: positive infinity, always

minsketch.lossy_strategy.one_threshold_func(window_count=None)[source]

The threshold function for LCU-1 - accepts a parameter to conform to the ‘interface’ :param window_count: How many windows have appeared so far :return: 1, always

minsketch.lossy_strategy.sqrt_window_size_threshold_func(window_count)[source]

The threshold function for LCU-SWS :param window_count: How many windows have appeared so far :return: the square root of the windows seen so far as a threshold

minsketch.lossy_strategy.window_size_threshold_func(window_count)[source]

The threshold function for LCU-WS :param window_count: How many windows have appeared so far :return: the count of the windows seen so far as a threshold