Given a set of words in a dictionary, write a program to group all words which are anagrams of each other in to sets.
input: “bat”, “tar”, “xyz” , “tab”, “tar”
output:[[“bat”, tab”], [“tar”, rat”],”xyz” ]
Anagrams
Two words are anagrams if one of them has exactly same characters as that of the another word.
Example : Anagram & Nagaram are anagrams (case-insensitive).
Approach
The simpler logic would be to :
- Use some sort of a hash structure - key being string, and value may be list of words
- Sort all words alphabetically and use the sorted word as the key, the value being the non sorted word. So bat will become abt, and it is the key in the dictionary and values will be bat and tab.
- If the key is found, keep on adding the original word to the ‘value’ list of strings
Something like this :
Solution 1 : Using hash table
def compare\_sort(x):
sorted\_key\_list = list(x)
sorted\_key\_list.sort()
sorted\_str = ','.join(str(n) for n in sorted\_key\_list)
sorted\_str = sorted\_str.replace(',','')
return sorted\_str
source = \['tab','bat','atb','tac','cat','xyz'\]
target\_hash = {}
for x in source:
sorted\_key = compare\_sort(x)
print sorted\_key
if target\_hash.has\_key(sorted\_key):
existing\_list = target\_hash\[sorted\_key\]
existing\_list.append(x)
target\_hash\[sorted\_key\] = existing\_list
else:
target\_list =\[x\]
target\_hash\[sorted\_key\] = target\_list
print target\_hash
Complexity: number of words in dictionary
space complexity : hash table entries
Solution 2 - Assigning characters with prime numbers
The obvious solution is to map each character to a prime number and multiply the prime numbers. So if ‘a’’ -> 2 and ‘b’ -> 3, then
- ‘ab’ -> 6
- ‘ba’ -> 6
- ‘bab’ -> 18
- ‘abba’ -> 36
- ‘baba’ -> 36
To minimise the chance of overflow, the smallest primes could be assigned to the more frequent letters (e,t,i,a,n). Note: The 26th prime is 101.So, for all the words where the integer comes out to be same, you have an anagram. Here is more you can read on this.