C++: Suggestions about a hash function for a sequence of strings where the order of the strings is irrelevant -
let's have these 2 sequences of strings
abc cba bc
bc abc cba
i'm trying create mapping such sequences(the sequence string) above 2 sequences mapped same bucket.
my initial thought add results of hashing function applied each string separately. in way order won't matter. if applied hashing function sequence string whole, of course hash result different.
however i'm new world of string hashing functions , have no idea whether approach efficient.
in website http://www.partow.net/programming/hashfunctions/index.html
i found many different implementations string hashing, i'm not sure 1 "best" needs.
some technical details each string in sequence each of them won't have more 25 characters. each sequence won't have more 3 strings.
questions
1.
approach of adding results of string hashing function each string of sequence work?
2.
if yes string hashing function should use give low amount of collisions , time efficient?
thank in advance
just idea demonstration (very inefficient string copying), complexity o(nlogn) n size of key (=== o(1) if keys have constant length known @ compile time), don't think can better complexity:
#include <boost/functional/hash.hpp> #include <set> #include <algorithm> std::size_t make_hash( std::string const& a, std::string const& b, std::string const& c) { std::string input[] = {a,b,c}; std::sort(input, input + (sizeof(input)/sizeof(*input))); return boost::hash_range(input, input + (sizeof(input)/sizeof(*input))); } #include <iostream> // g++ -i.../boost_1_47_0 string_set_hash.cpp int main() { std::cout << make_hash("abc", "bcd", "def") << std::endl; // 46247451276990640 std::cout << make_hash("bcd", "def", "abc") << std::endl; // 46247451276990640 }
a fragment of boost/functional/hash.hpp reference:
template <class t> inline void hash_combine(std::size_t& seed, t const& v) { boost::hash<t> hasher; seed ^= hasher(v) + 0x9e3779b9 + (seed<<6) + (seed>>2); } template <class it> inline std::size_t hash_range(it first, last) { std::size_t seed = 0; for(; first != last; ++first) { hash_combine(seed, *first); } return seed; }
Comments
Post a Comment