LC49. Group Anagrams

Given an array of strings, group anagrams together.

Anagrams are words that are made of the same letters but in different order.

Brute force

Sort each word in the input lexicographically, then put the same word in the same dict that uses the sorted word as key.

    def groupAnagrams(self, strs: 'List[str]') -> 'List[List[str]]':
        """
        vf
        """

        anagram = {}
        res = []
        for s in strs:
            sorted_s = sorted(s)
            anagram_key = tuple(sorted_s)
            if anagram_key not in anagram:
                anagram[anagram_key] = [s]
            else:
                anagram[anagram_key].append(s)

        for key in anagram:
            res.append(anagram[key])
        return res

Only immutable types can be used as dict key.

Can we do away with sorting lexigraphically?

I was thinking computing a hash based on the letter frequency and use the hash.
The key is costlier than previous.

        anagram = {}
        for s in strs:
            freq = [0] * 26
            for c in s:
                ascii = ord(c) - ord('a')
                freq[ascii] += 1

            anagram_key = tuple(freq)
            if anagram_key not in anagram:
                anagram[anagram_key] = [s]
            else:
                anagram[anagram_key].append(s)

        res = []
        for key in anagram:
            res.append(anagram[key])
        return res

We probably can’t do away with dict

Using prime

Before we code, we need to know the fundamental theorem of arithmetic

Fundamental theorem of arithmetic

Every number larger than 2 has prime factorization

This factorization is unique

No matter how we factorize a number, the frequency of each prime factor won’t change.

We first init a list of the first 26 prime numbers.
We convert each char in our string into 0-25. Each letter is associated now with a prime number. For example, a-2, b-3, c-5 and so on. Then create the key for anagram dict by multiplying prime number associated with each letter.
Since all words that share the same letters will end up with the same key and no other can(since unique prime factorization) we can use the key as a dictionary.

    def groupAnagrams(self, strs: 'List[str]') -> 'List[List[str]]':
        """
        vf
        """
        prime = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 41, 43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97, 101, 103]
        anagram = {}
        for s in strs:
            key = 1
            for c in s:
                ascii = ord(c) - ord('a')
                key *= prime[ascii]

            if key not in anagram:
                anagram[key] = [s]
            else:
                anagram[key].append(s)


        res = []
        for key in anagram:
            res.append(anagram[key])

        return res

Drawback

Key might become super large. But still smaller than 26 * sizeof(int) in the previous case

Leave a Reply

Your email address will not be published. Required fields are marked *

To create code blocks or other preformatted text, indent by four spaces:

    This will be displayed in a monospaced font. The first four 
    spaces will be stripped off, but all other whitespace
    will be preserved.
    
    Markdown is turned off in code blocks:
     [This is not a link](http://example.com)

To create not a block, but an inline code span, use backticks:

Here is some inline `code`.

For more help see http://daringfireball.net/projects/markdown/syntax