Virtual functions in C++

Problem How do virtual functions work in C++? Solution Virtual functions are defined in a super class. It does not require implementation for the super class. But all the extended class have to implement the function. So, the derived class can override the behavior of the virtual function. Example For example, a Shape class can have a virtual draw function. Its derived class, e.g., Circle, has to implement its detailed draw function. [Read More]

Compare and contrast a hash table vs. an STL map

Problem Compare and contrast a hash table vs. an STL map. How is a hash table implemented? If the number of inputs is small, what data structure options can be used instead of a hash table? Solution Hash Table In a hash table, a value is stored by applying hash function on a key. Thus, values are not stored in a hashtable in sorted order. Additionally, since hash tables use the key to find the index that will store the value, an insert/lookup can be done in amortized O(1) time (assuming only a few collisions in the hashtable). [Read More]

Print the last K lines of a file using C++

Problem Write a method to print the last K lines of an input file using C++. Solution Method 1 - Slow and fast file pointer pointer Using the same idea from this article, we place two pointers to the file, the fast pointer and the slow pointer. The fast pointer is K lines ahead of the slow pointer. When the fast pointer reaches the EOF, we can start to print everything from the slow pointer, all the way to the end. [Read More]

How to detect duplicate documents in billions of urls

Problem You have a billion urls, where each is a huge page. How do you detect the duplicate documents? Solution Characterstic of page Pages are huge, so bringing all of them in memory is a costly affair. We need a shorter representation of pages in memory. A hash is an obvious choice for this. A hash can depend on links inside page, title, creation Billions of urls exist so we don’t want to compare every page with every other page that would be O(n^2). [Read More]

Design a database with terabytes of data that supports efficient range queries

Problem You have to design a database that can store terabytes of data. It should support efficient range queries. How would you do it? Solution This rings couple of points in the head. I believe the interviewer was expecting a data structure like hashtable ,B-tree or B+ trees. If we switch to distributed computing based solution, esp when “100 billion records” are involved, I would suggest you to look into Distributed Hash Table and map-reduce (for parallel query processing) [Read More]

Fizz Buzz

Problem Write a program that prints the numbers from 1 to 100. But for multiples of three print “Fizz” instead of the number and for the multiples of five print “Buzz”. For numbers which are multiples of both three and five print “FizzBuzz” Example of sequence - 3,6,9,12,15,18,1,24,27. Solution Method 1- Brute force Here we notice that that after 15, there is 18. So, we should increment the iterator by 2, and then 1 i. [Read More]

Fizz Buzz

Problem Write a program that prints the numbers from 1 to 100. But for multiples of three print “Fizz” instead of the number and for the multiples of five print “Buzz”. For numbers which are multiples of both three and five print “FizzBuzz” Example of sequence - 3,6,9,12,15,18,1,24,27. Solution Method 1- Brute force Here we notice that that after 15, there is 18. So, we should increment the iterator by 2, and then 1 i. [Read More]

Finding the celebrity group

Problem Given the group of people in party find the group of celebrities Suppose we have N people and there might be a group of celebrities inside. Every person knows every celebrity and every celebrity knows only every other celebrity. If you are given the function of Knows(x,y) which returns true if x knows y, false otherwise. identify the group of celebrities. This problem is to identify a group of celebrities, and it is not identifying the only celebrity among the people, such as http://k2code. [Read More]

The Celebrity Problem

Problem In a party of N people, only one person is known to everyone. Such a person may be present in the party, if yes, (s)he doesn’t know anyone in the party. We can only ask questions like “does A know B? “. Find the stranger (celebrity) in minimum number of questions. Solution We can describe the problem input as an array of numbers/characters representing persons in the party. [Read More]

How to avoid getting into infinite loops when designing a web crawler

Problem If you were designing a web crawler, how would you avoid getting into infinite loops? Solution First, how does the crawler get into a loop? The answer is very simple: when we re-parse an already parsed page. As answered by Lirik%20%20Per%20Fr0zenFyr%27s%20comment:%20if%20one%20uses%20the%20AOPIC%20algorithm%20for%20selecting%20pages,%20then%20it%27s%20fairly%20easy%20to%20avoid%20bot-traps%20of%20the%20infinite%20loop%20kind.%20Here%20is%20a%20summary%20of%20how%20AOPIC%20works:%20%20%20%20%20%20Get%20a%20set%20of%20N%20seed%20pages.%20%20%20%20%20Allocate%20X%20amount%20of%20credit%20to%20each%20page,%20such%20that%20each%20page%20has%20X/N%20credit%20(i.e.%20equal%20amount%20of%20credit)%20before%20crawling%20has%20started.%20%20%20%20%20Select%20a%20page%20P,%20where%20the%20P%20has%20the%20highest%20amount%20of%20credit%20(or%20if%20all%20pages%20have%20the%20same%20amount%20of%20credit,%20then%20crawl%20a%20random%20page).%20%20%20%20%20Crawl%20page%20P%20(let%27s%20say%20that%20P%20had%20100%20credits%20when%20it%20was%20crawled).%20%20%20%20%20Extract%20all%20the%20links%20from%20page%20P%20(let%27s%20say%20there%20are%2010%20of%20them).%20%20%20%20%20Set%20the%20credits%20of%20P%20to%200.%20%20%20%20%20Take%20a%2010%%20%22tax%22%20and%20allocate%20it%20to%20a%20Lambda%20page.%20%20%20%20%20Allocate%20an%20equal%20amount%20of%20credits%20each%20link%20found%20on%20page%20P%20from%20P%27s%20original%20credit%20-%20the%20tax:%20so%20(100%20(P%20credits)%20-%2010%20(10%%20tax))/10%20(links)%20=%209%20credits%20per%20each%20link.%20%20%20%20%20Repeat%20from%20step%203.%20%20Since%20the%20Lambda%20page%20continuously%20collects%20tax,%20eventually%20it%20will%20be%20the%20page%20with%20the%20largest%20amount%20of%20credit%20and%20we%27ll%20have%20to%20%22crawl%22%20it.%20I%20say%20%22crawl%22%20in%20quotes,%20because%20we%20don%27t%20actually%20make%20an%20HTTP%20request%20for%20the%20Lambda%20page,%20we%20just%20take%20its%20credits%20and%20distribute%20them%20equally%20to%20all%20of%20the%20pages%20in%20our%20database.%20%20Since%20bot%20traps%20only%20give%20internal%20links%20credits%20and%20they%20rarely%20get%20credit%20from%20the%20outside,%20they%20will%20continually%20leak%20credits%20(from%20taxation)%20to%20the%20Lambda%20page.%20The%20Lambda%20page%20will%20distribute%20that%20credits%20out%20to%20all%20of%20the%20pages%20in%20the%20database%20evenly%20and%20upon%20each%20cycle%20the%20bot%20trap%20page%20will%20lose%20more%20and%20more%20credits,%20until%20it%20has%20so%20little%20credits%20that%20it%20almost%20never%20gets%20crawled%20again.%20This%20will%20not%20happen%20with%20good%20pages,%20because%20they%20often%20get%20credits%20from%20back-links%20found%20on%20other%20pages.%20This%20also%20results%20in%20a%20dynamic%20page%20rank%20and%20what%20you%20will%20notice%20is%20that%20any%20time%20you%20take%20a%20snapshot%20of%20your%20database,%20order%20the%20pages%20by%20the%20amount%20of%20credits%20they%20have,%20then%20they%20will%20most%20likely%20be%20ordered%20roughly%20according%20to%20their%20true%20page%20rank.%20%20This%20only%20avoid%20bot%20traps%20of%20the%20infinite-loop%20kind,%20but%20there%20are%20many%20other%20bot%20traps%20which%20you%20should%20watch%20out%20for%20and%20there%20are%20ways%20to%20get%20around%20them%20too.) here : If you want to get a detailed answer take a look at section 3.8 this paper, which describes the URL-seen test of a modern scraper: [Read More]