Deep copy and shallow copy in C++

Problem Write a method that takes a pointer to a Node structure as a parameter and returns a complete copy of the passed-in data structure. The Node structure contains two pointers to other Node structures. Solution A shallow copy duplicates only the top-level object and keeps the original pointer values, so both copies refer to the same underlying nodes. A deep copy also clones the pointed-to objects, so the copy bears no relation to the original. [Read More]
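A minimal sketch of the deep copy, assuming a hypothetical Node with a value and two pointers (the field names are illustrative, not from the problem statement). A map from originals to clones ensures each node is copied exactly once, which also handles shared nodes and cycles:

```cpp
#include <cassert>
#include <unordered_map>

// Hypothetical Node with two pointers, as described in the problem.
struct Node {
    int value;
    Node* left;
    Node* right;
};

// Deep copy: clone every reachable node exactly once, using a map from
// original nodes to their clones so shared/cyclic structures are handled.
Node* deepCopy(Node* node, std::unordered_map<Node*, Node*>& seen) {
    if (node == nullptr) return nullptr;
    auto it = seen.find(node);
    if (it != seen.end()) return it->second;    // already cloned
    Node* copy = new Node{node->value, nullptr, nullptr};
    seen[node] = copy;                          // record before recursing
    copy->left  = deepCopy(node->left, seen);
    copy->right = deepCopy(node->right, seen);
    return copy;
}

Node* deepCopy(Node* root) {
    std::unordered_map<Node*, Node*> seen;
    return deepCopy(root, seen);
}
```

Note that if both pointers of a node refer to the same original, both pointers of the copy refer to the same clone, preserving the shape of the structure.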

Virtual functions in C++

Problem How do virtual functions work in C++? Solution A virtual function is declared in a base class and may be overridden in derived classes; a call through a base-class pointer or reference is dispatched at run time to the most-derived override. If the function is declared pure virtual, the base class provides no implementation and every concrete derived class must implement it. Example For example, a Shape class can have a virtual draw function. Its derived classes, e.g., Circle, implement their own draw behavior. [Read More]
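A small sketch of the Shape example from the text (Square and the string return type are added here for illustration; the original only names Shape and Circle):

```cpp
#include <cassert>
#include <string>

// Hypothetical Shape hierarchy matching the example in the text.
struct Shape {
    virtual ~Shape() = default;
    // Pure virtual: every concrete derived class must provide its own draw().
    virtual std::string draw() const = 0;
};

struct Circle : Shape {
    std::string draw() const override { return "circle"; }
};

struct Square : Shape {
    std::string draw() const override { return "square"; }
};

// The call is dispatched at run time through the vtable, so render()
// does not need to know which concrete Shape it was given.
std::string render(const Shape& s) { return s.draw(); }
```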

Compare and contrast a hash table vs. an STL map

Problem Compare and contrast a hash table vs. an STL map. How is a hash table implemented? If the number of inputs is small, what data structure options can be used instead of a hash table? Solution Hash Table In a hash table, a value is stored by applying a hash function to its key. Thus, values are not stored in a hash table in sorted order. Additionally, since hash tables use the key to find the index that will store the value, an insert/lookup can be done in amortized O(1) time (assuming few collisions in the hash table). [Read More]
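The contrast in standard C++ terms, as a sketch: std::unordered_map is the hash table (amortized O(1), unordered), while std::map is typically a red-black tree (O(log n), keys kept sorted). The helper below is illustrative:

```cpp
#include <cassert>
#include <map>
#include <string>
#include <unordered_map>

// std::unordered_map hashes keys into buckets: amortized O(1) insert/lookup,
// iteration order unspecified. std::map is a balanced BST (typically a
// red-black tree): O(log n) operations, but iteration visits keys in sorted
// order, something a hash table cannot offer.
std::string smallestKey(const std::map<std::string, int>& m) {
    return m.begin()->first;   // ordered container: begin() is the minimum key
}
```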

Print the last K lines of a file using C++

Problem Write a method to print the last K lines of an input file using C++. Solution Method 1 - Slow and fast file pointers Using the same idea from this article, we place two pointers into the file, a fast pointer and a slow pointer. The fast pointer is kept K lines ahead of the slow pointer. When the fast pointer reaches EOF, we print everything from the slow pointer all the way to the end. [Read More]
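An equivalent single-pass sketch of the same idea (my own variant, not necessarily the article's code): instead of two explicit file pointers, keep a sliding window of the most recent K lines; whatever remains at EOF is exactly what the slow pointer would print.

```cpp
#include <cassert>
#include <deque>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Keep at most k lines in a deque; each new line evicts the oldest once the
// window is full. At EOF the deque holds the last k lines of the stream.
std::vector<std::string> lastKLines(std::istream& in, std::size_t k) {
    std::deque<std::string> window;
    std::string line;
    while (std::getline(in, line)) {
        window.push_back(line);
        if (window.size() > k) window.pop_front();  // drop the oldest line
    }
    return {window.begin(), window.end()};
}
```

Taking an std::istream lets the same function work on a std::ifstream or, for testing, an std::istringstream.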

How to detect duplicate documents in billions of urls

Problem You have a billion urls, where each is a huge page. How do you detect the duplicate documents? Solution Characteristics of a page Pages are huge, so bringing all of them into memory is a costly affair. We need a shorter representation of each page in memory, and a hash is an obvious choice: it can be computed from the links inside the page, the title, the creation date, etc. Billions of urls exist, so we do not want to compare every page with every other page; that would be O(n^2). [Read More]
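A toy sketch of the hash-based idea: fingerprint each page and keep a set of fingerprints seen so far. Here std::hash stands in for a real content fingerprint (e.g. a 64-bit simhash); in practice a collision would trigger a follow-up byte-for-byte comparison.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <unordered_set>

// Represent each page by a hash of its content and compare hashes instead of
// whole pages. std::hash is a stand-in for a proper document fingerprint.
class DuplicateDetector {
    std::unordered_set<std::size_t> seen_;
public:
    // Returns true if a page with the same fingerprint was already seen.
    bool isDuplicate(const std::string& pageContent) {
        std::size_t h = std::hash<std::string>{}(pageContent);
        return !seen_.insert(h).second;   // insert fails => already present
    }
};
```

Each lookup is amortized O(1), so the whole pass is O(n) instead of the O(n^2) all-pairs comparison.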

Design a database with terabytes of data that supports efficient range queries

Problem You have to design a database that can store terabytes of data. It should support efficient range queries. How would you do it? Solution This brings a couple of points to mind. I believe the interviewer was expecting a data structure like a hash table, a B-tree, or a B+ tree. If we switch to a distributed-computing solution, especially when "100 billion records" are involved, I would suggest you look into Distributed Hash Tables and map-reduce (for parallel query processing). [Read More]
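The ordered-index idea behind B/B+ trees can be sketched in miniature with std::map (a balanced BST): lower_bound/upper_bound give an O(log n + k) range query. This is only an illustration of the access pattern; a terabyte-scale system would shard and persist such an index.

```cpp
#include <cassert>
#include <map>
#include <vector>

// Collect all values whose keys fall in the inclusive range [lo, hi].
// lower_bound finds the first key >= lo; upper_bound stops just past hi.
std::vector<int> rangeQuery(const std::map<int, int>& index, int lo, int hi) {
    std::vector<int> values;
    for (auto it = index.lower_bound(lo); it != index.upper_bound(hi); ++it)
        values.push_back(it->second);
    return values;
}
```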

How to avoid getting into infinite loops when designing a web crawler

Problem If you were designing a web crawler, how would you avoid getting into infinite loops? Solution First, how does the crawler get into a loop? The answer is very simple: when we re-parse an already parsed page. As answered by Lirik (per Fr0zenFyr's comment): if one uses the AOPIC algorithm for selecting pages, it is fairly easy to avoid bot traps of the infinite-loop kind. Here is a summary of how AOPIC works:

1. Get a set of N seed pages.
2. Allocate X amount of credit to each page, so that each page has X/N credit (i.e., an equal amount) before crawling starts.
3. Select the page P with the highest amount of credit (if all pages have the same amount, crawl a random page).
4. Crawl page P (say P had 100 credits when it was crawled).
5. Extract all the links from page P (say there are 10 of them).
6. Set the credits of P to 0.
7. Take a 10% "tax" and allocate it to a Lambda page.
8. Allocate an equal amount of credit to each link found on page P from P's original credit minus the tax: (100 P credits - 10 tax) / 10 links = 9 credits per link.
9. Repeat from step 3.

Since the Lambda page continuously collects tax, it will eventually be the page with the largest amount of credit and we will have to "crawl" it. "Crawl" is in quotes because we don't actually make an HTTP request for the Lambda page; we just take its credits and distribute them equally to all of the pages in our database.

Since bot traps only give credit to their internal links and rarely receive credit from outside, they continually leak credits (through taxation) to the Lambda page. The Lambda page distributes those credits evenly to all pages in the database, so on each cycle the bot-trap page loses more and more credit, until it has so little that it almost never gets crawled again. This does not happen to good pages, because they often receive credits from back-links found on other pages. This also results in a dynamic page rank: any time you take a snapshot of your database and order the pages by the amount of credit they have, they will most likely be ordered roughly according to their true page rank.

This only avoids bot traps of the infinite-loop kind; there are many other bot traps to watch out for, and there are ways around them too. If you want a detailed answer, take a look at section 3.8 of this paper, which describes the URL-seen test of a modern scraper. [Read More]
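The numbered steps above can be sketched as a single AOPIC crawl step. This is a simplified illustration of the credit bookkeeping only (the Aopic struct and its names are my own), not a full crawler:

```cpp
#include <cassert>
#include <cmath>
#include <string>
#include <unordered_map>
#include <vector>

// One AOPIC crawl step, following the summary above: crawl the page, send a
// 10% "tax" to the Lambda page, split the remainder among the page's
// out-links, and zero out the crawled page's credit.
struct Aopic {
    std::unordered_map<std::string, double> credit;
    double lambdaCredit = 0.0;

    void crawl(const std::string& page,
               const std::vector<std::string>& links) {
        double c = credit[page];
        credit[page] = 0.0;            // step 6: set credits of P to 0
        double tax = 0.10 * c;         // step 7: 10% tax to the Lambda page
        lambdaCredit += tax;
        if (!links.empty()) {
            double share = (c - tax) / links.size();   // step 8: equal split
            for (const auto& l : links) credit[l] += share;
        } else {
            lambdaCredit += c - tax;   // dead end: everything goes to Lambda
        }
    }
};
```

With 100 credits and 10 links this reproduces the worked numbers in the summary: 10 credits of tax and 9 credits per link.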

Find duplicate elements in the array with 4KB memory only

Problem You have an array with all the numbers from 1 to N, where N is at most 32,000. The array may have duplicate entries and you do not know what N is. With only 4KB of memory available, how would you print all duplicate elements in the array? Solution 4KB = 32,768 bits, which is enough for one bit per possible value. We can use a bit vector where bit i records whether we have seen the number i: on the first sighting we set the bit, and if the bit is already set, i is a duplicate. [Read More]
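A sketch of the bit-vector approach, with std::bitset standing in for the 4 KB buffer (the function name is illustrative):

```cpp
#include <bitset>
#include <cassert>
#include <vector>

// 4 KB = 32,768 bits, enough for one bit per value in 1..32,000.
// Bit i is set once value i has been seen; a second sighting is a duplicate.
std::vector<int> findDuplicates(const std::vector<int>& numbers) {
    std::bitset<32001> seen;          // zero-initialized, ~4 KB
    std::vector<int> dups;
    for (int n : numbers) {
        if (seen.test(n)) dups.push_back(n);  // already seen: duplicate
        else seen.set(n);
    }
    return dups;
}
```

Duplicates are reported in the order their second occurrence appears; each value costs one bit rather than one byte, which is what makes the 4 KB budget work.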

Generate an integer which is not in a file with 4 billion integers, limited memory

Problem Given an input file with four billion integers, provide an algorithm to generate an integer which is not contained in the file. Assume you have 1 GB of memory. FOLLOW UP What if you have only 10 MB of memory? Solution Case 1 - If very large integers are allowed, then one can generate a number that is statistically likely to be unique in O(1) time. For example, a pseudorandom 128-bit integer like a GUID will collide with one of the existing four billion integers in the set in less than one out of every 64 billion billion billion cases. [Read More]
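For the 1 GB case, the standard deterministic approach is a bit vector over the whole 32-bit range: 2^32 bits = 512 MB, which fits in 1 GB. A scaled-down sketch (the function and its parameters are illustrative):

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Mark every integer that appears, then scan for the first unset bit.
// For the full problem, range would be 2^32 (512 MB of bits in 1 GB of RAM);
// here it is a parameter so the idea can be demonstrated on a small range.
int64_t findMissing(const std::vector<uint32_t>& input, uint32_t range) {
    std::vector<bool> present(range, false);   // one bit per candidate value
    for (uint32_t v : input)
        if (v < range) present[v] = true;
    for (uint32_t i = 0; i < range; ++i)
        if (!present[i]) return i;             // first integer not in the file
    return -1;                                  // every value was present
}
```

std::vector<bool> is bit-packed, so the memory cost really is one bit per candidate value.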

How to test an ATM in a distributed banking system

Problem How would you test an ATM in a distributed banking system? Solution The first thing to do on this question is to clarify assumptions. Ask the following questions: Who is going to use the ATM? Answers might be "anyone", "blind people", or any number of others. What are they going to use it for? Answers might be "withdrawing money", "transferring money", "checking their balance", or many others. What tools do we have to test? [Read More]