Problem
Write a method that takes a pointer to a Node structure as a parameter and returns a complete copy of the passed-in data structure. The Node structure contains two pointers to other Node structures.
Solution
A shallow copy duplicates only the top-level structure and keeps pointers (or copied addresses) to the original objects.
A deep copy clones the pointed-to objects as well, so the copy bears no relation to the original.
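A deep copy of the structure described in the problem might be sketched as follows. The `Node` layout and field names are assumptions from the problem statement; a visited map preserves shared sub-structure (and tolerates cycles):

```cpp
#include <unordered_map>

// Hypothetical Node layout assumed from the problem statement.
struct Node {
    int data;
    Node* left;
    Node* right;
};

// Deep copy: clone every reachable Node exactly once, preserving shared
// sub-structure (and even cycles) via a visited map.
Node* deepCopy(Node* node, std::unordered_map<Node*, Node*>& cloned) {
    if (node == nullptr) return nullptr;
    auto it = cloned.find(node);
    if (it != cloned.end()) return it->second;   // already copied
    Node* copy = new Node{node->data, nullptr, nullptr};
    cloned[node] = copy;                         // record before recursing
    copy->left  = deepCopy(node->left,  cloned);
    copy->right = deepCopy(node->right, cloned);
    return copy;
}

Node* deepCopy(Node* root) {
    std::unordered_map<Node*, Node*> cloned;
    return deepCopy(root, cloned);
}
```

Recording the clone in the map *before* recursing is what keeps the function from looping forever on a cyclic structure.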
[Read More]
Virtual functions in C++
Problem How do virtual functions work in C++?
Solution Virtual functions are declared in a base (super) class. A pure virtual function requires no implementation in the base class, but every derived class must implement it. A derived class can thus override the behavior of a virtual function.
Example
For example, a Shape class can have a virtual draw function. Its derived classes, e.g., Circle, must implement their own detailed draw functions.
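The Shape/Circle example can be sketched as below; the class and method names are illustrative, not from the original post:

```cpp
#include <string>

// Shape declares a pure virtual draw(): no base implementation,
// every derived class must provide one.
class Shape {
public:
    virtual ~Shape() = default;
    virtual std::string draw() const = 0;
};

class Circle : public Shape {
public:
    std::string draw() const override { return "drawing a circle"; }
};

class Square : public Shape {
public:
    std::string draw() const override { return "drawing a square"; }
};

// The call is dispatched at run time (via the vtable), so the derived
// override runs even when we only hold a Shape reference.
std::string render(const Shape& s) { return s.draw(); }
```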
[Read More]
Compare and contrast a hash table vs. an STL map
Problem
Compare and contrast a hash table vs. an STL map. How is a hash table implemented? If the number of inputs is small, what data structure options can be used instead of a hash table?
Solution
Hash Table
In a hash table, a value is stored by applying a hash function to a key. Thus, values are not stored in a hash table in sorted order. Additionally, since hash tables use the key to find the index that will store the value, an insert or lookup can be done in amortized O(1) time (assuming only a few collisions in the hash table).
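In C++ terms the contrast is `std::unordered_map` (hash table: amortized O(1) lookup, no key ordering) versus `std::map` (balanced tree: O(log n) lookup, keys kept sorted). A small sketch:

```cpp
#include <map>
#include <string>
#include <unordered_map>
#include <vector>

// std::map is tree-based, so iterating it yields keys in sorted order --
// something a hash table does not guarantee.
std::vector<std::string> sortedKeys(const std::map<std::string, int>& m) {
    std::vector<std::string> keys;
    for (const auto& kv : m) keys.push_back(kv.first);  // in-order traversal
    return keys;
}
```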
[Read More]
Print the last K lines of a file using C++
Problem
Write a method to print the last K lines of an input file using C++.
Solution
Method 1 - Slow and fast file pointers
Using the same idea from this article, we place two pointers into the file: a fast pointer and a slow pointer, with the fast pointer K lines ahead of the slow one. When the fast pointer reaches EOF, we print everything from the slow pointer all the way to the end.
[Read More]
How to detect duplicate documents among billions of URLs
Problem
You have a billion URLs, where each points to a huge page. How do you detect the duplicate documents?
Solution
Characteristics of a page
Pages are huge, so bringing all of them into memory is costly. We need a shorter representation of each page in memory, and a hash is an obvious choice; the hash can depend on the links inside the page, its title, its creation date, and so on. Since billions of URLs exist, we also don't want to compare every page with every other page, which would be O(n^2).
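The hash-based deduplication can be sketched as a single pass with a hash set, turning the O(n^2) pairwise comparison into O(n). Here `std::hash` stands in for a real content fingerprint (a production system would use a stronger hash and shard it across machines):

```cpp
#include <functional>
#include <string>
#include <unordered_set>
#include <vector>

// Reduce each (potentially huge) page to a fixed-size hash and flag any
// page whose hash has already been seen.
std::vector<std::size_t> duplicateIndices(const std::vector<std::string>& pages) {
    std::unordered_set<std::size_t> seen;
    std::vector<std::size_t> dups;
    for (std::size_t i = 0; i < pages.size(); ++i) {
        std::size_t h = std::hash<std::string>{}(pages[i]);
        if (!seen.insert(h).second)        // insert failed: hash already present
            dups.push_back(i);
    }
    return dups;
}
```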
[Read More]
Design a database with terabytes of data that supports efficient range queries
Problem
You have to design a database that can store terabytes of data. It should support efficient range queries. How would you do it?
Solution
This question brings a couple of points to mind. I believe the interviewer was expecting a data structure such as a hash table, a B-tree, or a B+ tree. If we switch to a distributed-computing solution, especially when "100 billion records" are involved, I would suggest looking into distributed hash tables and MapReduce (for parallel query processing).
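The reason an ordered index (B-tree/B+ tree) beats a hash table here is the range query itself: sorted keys make a range scan a seek plus sequential iteration. `std::map` (a balanced tree) illustrates the access pattern in miniature; a terabyte-scale database would use an on-disk B+ tree, but the idea is the same:

```cpp
#include <map>
#include <string>
#include <vector>

// Range query on an ordered index: seek to the lower bound, then scan
// forward until the key exceeds the upper bound.
std::vector<std::string> rangeQuery(const std::map<int, std::string>& index,
                                    int lo, int hi) {
    std::vector<std::string> out;
    for (auto it = index.lower_bound(lo);
         it != index.end() && it->first <= hi; ++it)
        out.push_back(it->second);
    return out;
}
```

A hash table scatters keys, so the same query would have to probe every possible key in [lo, hi] or scan the whole table.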
[Read More]
How to avoid getting into infinite loops when designing a web crawler
Problem
If you were designing a web crawler, how would you avoid getting into infinite loops?
Solution
First, how does the crawler get into a loop? The answer is very simple: when we re-parse an already parsed page.
As answered by Lirik here (per Fr0zenFyr's comment): if one uses the AOPIC algorithm for selecting pages, then it's fairly easy to avoid bot traps of the infinite-loop kind. Here is a summary of how AOPIC works:

1. Get a set of N seed pages.
2. Allocate X amount of credit to each page, such that each page has X/N credit (i.e. an equal amount of credit) before crawling has started.
3. Select a page P, where P has the highest amount of credit (or if all pages have the same amount of credit, crawl a random page).
4. Crawl page P (let's say that P had 100 credits when it was crawled).
5. Extract all the links from page P (let's say there are 10 of them).
6. Set the credits of P to 0.
7. Take a 10% "tax" and allocate it to a Lambda page.
8. Allocate an equal amount of credits to each link found on page P from P's original credit minus the tax: so (100 (P credits) - 10 (10% tax)) / 10 (links) = 9 credits per link.
9. Repeat from step 3.

Since the Lambda page continuously collects tax, eventually it will be the page with the largest amount of credit and we'll have to "crawl" it. I say "crawl" in quotes, because we don't actually make an HTTP request for the Lambda page; we just take its credits and distribute them equally to all of the pages in our database.

Since bot traps only give internal links credits and they rarely get credit from the outside, they will continually leak credits (from taxation) to the Lambda page. The Lambda page will distribute those credits out to all of the pages in the database evenly, and upon each cycle the bot-trap page will lose more and more credits, until it has so few that it almost never gets crawled again. This will not happen with good pages, because they often get credits from back-links found on other pages. This also results in a dynamic page rank, and what you will notice is that any time you take a snapshot of your database and order the pages by the amount of credits they have, they will most likely be ordered roughly according to their true page rank.

This only avoids bot traps of the infinite-loop kind, but there are many other bot traps which you should watch out for, and there are ways to get around them too. If you want a detailed answer, take a look at section 3.8 of this paper, which describes the URL-seen test of a modern crawler:
[Read More]
Find duplicate elements in the array with 4KB memory only
Problem
You have an array with all the numbers from 1 to N, where N is at most 32,000. The array may have duplicate entries and you do not know what N is. With only 4KB of memory available, how would you print all duplicate elements in the array?
Solution
4KB = 32768 bits, which is enough for one bit per value up to 32,000 (32,000 bits = 4,000 bytes).
We can use a bit vector in which bit i records whether we have seen the number i; when a number's bit is already set, the number is a duplicate.
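A sketch of the bit-vector approach, packing the 32,000 flags into 4,000 bytes so the whole structure fits in the 4KB budget:

```cpp
#include <cstdint>
#include <vector>

// Bit i marks whether the value i+1 has been seen. 32,000 bits = 4,000
// bytes, well inside 4 KB. A repeated value is reported on each repeat.
std::vector<int> findDuplicates(const std::vector<int>& numbers) {
    std::vector<std::uint8_t> bits(32000 / 8, 0);   // 4,000 bytes of flags
    std::vector<int> dups;
    for (int n : numbers) {
        int i = n - 1;                          // values are 1..N, bits 0-based
        std::uint8_t mask = static_cast<std::uint8_t>(1u << (i % 8));
        if (bits[i / 8] & mask)
            dups.push_back(n);                  // bit already set: duplicate
        else
            bits[i / 8] |= mask;
    }
    return dups;
}
```

Note that a plain byte array (one byte per value) would need 32,000 bytes, which is roughly eight times the 4KB budget; packing eight flags per byte is what makes the limit achievable.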
[Read More]
Generate an integer which is not in a file with 4 billion integers, limited memory
Problem
Given an input file with four billion integers, provide an algorithm to generate an integer which is not contained in the file. Assume you have 1 GB of memory.
FOLLOW UP
What if you have only 10 MB of memory?
Solution
Case 1 - If very large integers are allowed, then one can generate a number that is statistically likely to be unique in O(1) time. For example, a pseudo-random 128-bit integer like a GUID will collide with one of the existing four billion integers in the set in less than one out of every 64 billion billion billion cases.
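A minimal sketch of this case, building a GUID-like 128-bit value from two 64-bit draws (the representation as a pair of words is an implementation choice, not part of the original answer):

```cpp
#include <cstdint>
#include <random>
#include <utility>

// Produce a pseudo-random 128-bit value as two 64-bit halves. With only
// four billion 32-bit integers in the file, the probability that this
// value collides with any of them is about 4e9 / 2^128 -- negligible.
std::pair<std::uint64_t, std::uint64_t> random128() {
    static std::mt19937_64 gen{std::random_device{}()};
    return {gen(), gen()};
}
```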
[Read More]
How to test an ATM in a distributed banking system
Problem
How would you test an ATM in a distributed banking system?
Solution
The first thing to do on this question is to clarify assumptions. Ask the following questions:
Who is going to use the ATM? Answers might be "anyone", or "blind people", or any number of other answers.
What are they going to use it for? Answers might be "withdrawing money", "transferring money", "checking their balance", or many other answers.
What tools do we have to test?
[Read More]