Given an array of strings, group anagrams together.
Anagrams are words that are made of the same letters but in different order.
Brute force
Sort each word in the input lexicographically, then put the same word in the same dict that uses the sorted word as key.
def groupAnagrams(self, strs: 'List[str]') -> 'List[List[str]]':
"""
vf
"""
anagram = {}
res = []
for s in strs:
sorted_s = sorted(s)
anagram_key = tuple(sorted_s)
if anagram_key not in anagram:
anagram[anagram_key] = [s]
else:
anagram[anagram_key].append(s)
for key in anagram:
res.append(anagram[key])
return res
Only immutable types can be used as dict key.
Can we do away with sorting lexigraphically?
I was thinking computing a hash based on the letter frequency and use the hash.
The key is costlier than previous.
anagram = {}
for s in strs:
freq = [0] * 26
for c in s:
ascii = ord(c) - ord('a')
freq[ascii] += 1
anagram_key = tuple(freq)
if anagram_key not in anagram:
anagram[anagram_key] = [s]
else:
anagram[anagram_key].append(s)
res = []
for key in anagram:
res.append(anagram[key])
return res
We probably can’t do away with dict
Using prime
Before we code, we need to know the fundamental theorem of arithmetic
Fundamental theorem of arithmetic
Every number larger than 2 has prime factorization
This factorization is unique
No matter how we factorize a number, the frequency of each prime factor won’t change.
We first init a list of the first 26 prime numbers.
We convert each char in our string into 0-25. Each letter is associated now with a prime number. For example, a-2, b-3, c-5 and so on. Then create the key for anagram dict by multiplying prime number associated with each letter.
Since all words that share the same letters will end up with the same key and no other can(since unique prime factorization) we can use the key as a dictionary.
def groupAnagrams(self, strs: 'List[str]') -> 'List[List[str]]':
"""
vf
"""
prime = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 41, 43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97, 101, 103]
anagram = {}
for s in strs:
key = 1
for c in s:
ascii = ord(c) - ord('a')
key *= prime[ascii]
if key not in anagram:
anagram[key] = [s]
else:
anagram[key].append(s)
res = []
for key in anagram:
res.append(anagram[key])
return res
Drawback
Key might become super large. But still smaller than 26 * sizeof(int) in the previous case