Problem
Compare and contrast a hash table vs. an STL map. How is a hash table implemented? If the number of inputs is small, what data structure options can be used instead of a hash table?
Solution
Hash Table
In a hash table, a value is stored by applying a hash function to its key. Values are therefore not stored in a hash table in sorted order. Additionally, since the hash of the key directly determines the index where the value is stored, an insert/lookup can be done in amortized O(1) time (assuming only a few collisions in the hash table).
[Read More]
Print the last K lines of a file using C++
Problem
Write a method to print the last K lines of an input file using C++.
Solution
Method 1 - Slow and fast file pointers
Using the same idea from this article, we place two pointers to the file, the fast pointer and the slow pointer. The fast pointer is K lines ahead of the slow pointer. When the fast pointer reaches the EOF, we can start to print everything from the slow pointer, all the way to the end.
[Read More]
How to detect duplicate documents in billions of urls
Problem
You have a billion urls, where each is a huge page. How do you detect the duplicate documents?
Solution
Characteristics of a page
Pages are huge, so bringing all of them into memory is a costly affair. We need a shorter representation of each page in memory, and a hash is an obvious choice. The hash can depend on the links inside the page, the title, the creation date, and so on. Billions of urls exist, so we don't want to compare every page with every other page: that would be O(n^2).
[Read More]
Design a database with terabytes of data that supports efficient range queries
Problem
You have to design a database that can store terabytes of data. It should support efficient range queries. How would you do it?
Solution
This brings a couple of points to mind. I believe the interviewer was expecting a data structure like a hash table, a B-tree, or a B+ tree. If we switch to a distributed-computing-based solution, especially when "100 billion records" are involved, I would suggest looking into Distributed Hash Tables and map-reduce (for parallel query processing).
[Read More]
Fizz Buzz
Problem
Write a program that prints the numbers from 1 to 100. But for multiples of three print “Fizz” instead of the number and for the multiples of five print “Buzz”. For numbers which are multiples of both three and five print “FizzBuzz”
Example of the multiples-of-three sequence - 3, 6, 9, 12, 15, 18, 21, 24, 27.
Solution
Method 1- Brute force
Here we notice that after 15, there is 18. So, we should increment the iterator by 2, and then by 1, i.
[Read More]
Finding the celebrity group
Problem
Given a group of people at a party, find the group of celebrities.
Suppose we have N people, and there might be a group of celebrities among them.
Every person knows every celebrity, and every celebrity knows only the other celebrities.
Given a function Knows(x, y) that returns true if x knows y and false otherwise, identify the group of celebrities.
This problem is to identify a group of celebrities; it is not the problem of identifying the single celebrity among the people, as in http://k2code.
[Read More]
The Celebrity Problem
Problem
In a party of N people, only one person is known to everyone. Such a person may be present at the party; if so, (s)he doesn't know anyone else at the party. We can only ask questions like "does A know B?". Find the stranger (celebrity) in the minimum number of questions.
Solution
We can describe the problem input as an array of numbers/characters representing the persons at the party.
[Read More]
How to avoid getting into infinite loops when designing a web crawler
Problem
If you were designing a web crawler, how would you avoid getting into infinite loops?
Solution
First, how does the crawler get into a loop? The answer is very simple: when we re-parse an already parsed page.
As answered by Lirik (per Fr0zenFyr's comment): if one uses the AOPIC algorithm for selecting pages, it is fairly easy to avoid bot traps of the infinite-loop kind. Here is a summary of how AOPIC works:

1. Get a set of N seed pages.
2. Allocate X amount of credit to each page, so that each page has X/N credit (i.e. an equal amount) before crawling has started.
3. Select the page P with the highest amount of credit (or, if all pages have the same amount of credit, crawl a random page).
4. Crawl page P (let's say that P had 100 credits when it was crawled).
5. Extract all the links from page P (let's say there are 10 of them).
6. Set the credits of P to 0.
7. Take a 10% "tax" and allocate it to a Lambda page.
8. Allocate an equal amount of credits to each link found on page P from P's original credit minus the tax: (100 P credits - 10 tax) / 10 links = 9 credits per link.
9. Repeat from step 3.

Since the Lambda page continuously collects tax, it will eventually be the page with the largest amount of credit and we'll have to "crawl" it. "Crawl" is in quotes because we don't actually make an HTTP request for the Lambda page; we just take its credits and distribute them equally to all of the pages in our database. Since bot traps only give credits to their internal links and rarely get credit from the outside, they continually leak credits (through taxation) to the Lambda page. The Lambda page distributes those credits evenly to all pages in the database, so with each cycle the bot-trap page loses more and more credits, until it has so few that it almost never gets crawled again. This does not happen to good pages, because they often receive credits from back-links on other pages. This also results in a dynamic page rank: any time you take a snapshot of your database and order the pages by the amount of credits they have, they will most likely be ordered roughly according to their true page rank. This only avoids bot traps of the infinite-loop kind; there are many other bot traps to watch out for, and ways to get around them too. If you want a detailed answer, take a look at section 3.8 of this paper, which describes the URL-seen test of a modern scraper:
[Read More]
Find duplicate elements in the array with 4KB memory only
Problem
You have an array with all the numbers from 1 to N, where N is at most 32,000. The array may have duplicate entries and you do not know what N is. With only 4KB of memory available, how would you print all duplicate elements in the array?
Solution
4KB = 32768 bits.
We can use a bit vector to record whether we have seen number i: when we encounter i, we set bit i to 1.
[Read More]