Strategies Code Reference¶
Hashing Strategies¶
hash_strategy.py
Implements two basic hashing strategies - a universal hashing function as described by Cormen et al., and a hash-pair strategy as discussed by Kirsch and Mitzenmacher (2008).
-
class
minsketch.hash_strategy.
DoubleHashingStrategy
(depth, width, hash_gen=None)[source]¶ A single hash-pair count-min hashing scheme, based on Kirsch & Mitzenmacher (2008): https://www.eecs.harvard.edu/~michaelm/postscripts/rsa2008.pdf
Each item (x) is hashed as: \(h_1(x) + j * h_2(x) \ \forall \ j \in \{0, 1, ..., d - 1\}\)
-
class
minsketch.hash_strategy.
NaiveHashingStrategy
(depth, width, hash_gen=None)[source]¶ An implementation of the basic hashing strategy, using a different member of a universal function for each row
-
class
minsketch.hash_strategy.
UniversalHashFunctionGenerator
(m)[source]¶ Implementation of a universal hash function family generator, as described in Cormen et al.’s Introduction to Algorithms.
The sets are used to guarantee independence between the hash functions in this family.
For a given prime number p, and a width of table m, functions are constructed as \(f(x)=((ax + b)\mod p)\mod m, \ a \in \{1, 2, ..., p - 1\}, \ b \in \{0, 1, ..., p - 1\}\)
Updating Strategies¶
update_strategy.py
An implementation of naive and conservative update strategies. Each update strategy also implements an error-estimate, used by count-min-mean.
-
class
minsketch.update_strategy.
ConservativeUpdateStrategy
[source]¶ A conservative update strategy, as introduced by Cormode (2009) and Goyal (2010): http://www.cs.utah.edu/~amitg/papers/goyal10GEMS.pdf At every point, only update the minimal number of items.
As for why this is a class, see the class-comment for NaiveUpdateStrategy
-
count_mean_error_estimates
(table, values)[source]¶ Error estimates for the count-min-mean, but in this case, we used the row-wise sum as an estimator for the entry in that row, rather than the entire total. :param table: The table being updated :param values: Values of the item we return error estimates for :return: Error estimates for each of these values
-
-
class
minsketch.update_strategy.
NaiveUpdateStrategy
[source]¶ This could clearly be a function, rather than a class. Why make it a class, then? Two reasons: 1) Compatibility with other strategies, all of whom are classes 2) In the event I’ll want to implement stateful updating strategies in the future 3) I already ended up using its class-ness, to couple error estimation strategies here.
Lossy-Counting Window Strategies¶
lossy_strategy.py
Implementation of lossy tracking strategies.
-
class
minsketch.lossy_strategy.
LossyUpdateStrategy
(gamma, threshold_func)[source]¶ A simple class implementing lossy update strategies
-
class
minsketch.lossy_strategy.
NoLossyUpdateStrategy
[source]¶ This looks silly, but it’s implemented as a default over always doing a lossy update strategy check.
-
minsketch.lossy_strategy.
no_threshold_func
(window_count=None)[source]¶ The threshold function for LCU-ALL - accepts a parameter to conform to the ‘interface’ :param window_count: How many windows have appeared so far :return: positive infinity, always
-
minsketch.lossy_strategy.
one_threshold_func
(window_count=None)[source]¶ The threshold function for LCU-1 - accepts a parameter to conform to the ‘interface’ :param window_count: How many windows have appeared so far :return: 1, always