Problem
Compare and contrast a hash table vs. an STL map. How is a hash table implemented? If the number of inputs is small, what data structure options can be used instead of a hash table?
Solution
Hash Table
In a hash table, a value is stored by applying a hash function to its key. Values are therefore not stored in a hash table in sorted order. Additionally, since the hash of the key directly determines the index where the value is stored, an insert/lookup can be done in amortized O(1) time (assuming only a few collisions in the hash table).
[Read More]
Print the last K lines of a file using C++
Problem
Write a method to print the last K lines of an input file using C++.
Solution
Method 1 - Slow and fast file pointers
Using the same idea from this article, we place two pointers to the file, the fast pointer and the slow pointer. The fast pointer is K lines ahead of the slow pointer. When the fast pointer reaches the EOF, we can start to print everything from the slow pointer, all the way to the end.
[Read More]
How to detect duplicate documents in billions of urls
Problem
You have a billion urls, where each is a huge page. How do you detect the duplicate documents?
Solution
Characteristics of a page
Pages are huge, so bringing all of them into memory is a costly affair. We need a shorter representation of each page in memory, and a hash is an obvious choice. The hash can depend on the links inside the page, the title, the creation date, and so on. Billions of urls exist, so we don't want to compare every page with every other page: that would be O(n^2).
[Read More]
Design a database with terabytes of data that supports efficient range queries
Problem
You have to design a database that can store terabytes of data. It should support efficient range queries. How would you do it?
Solution
This brings a couple of points to mind. I believe the interviewer was expecting a data structure like a hash table, a B-tree, or a B+ tree. If we switch to a distributed-computing-based solution, especially when "100 billion records" are involved, I would suggest looking into Distributed Hash Tables and map-reduce (for parallel query processing).
[Read More]
Fizz Buzz
Problem
Write a program that prints the numbers from 1 to 100. But for multiples of three print “Fizz” instead of the number and for the multiples of five print “Buzz”. For numbers which are multiples of both three and five print “FizzBuzz”
Example of the multiples-of-three sequence - 3, 6, 9, 12, 15, 18, 21, 24, 27.
Solution
Method 1- Brute force
Here we notice that after 15, there is 18. So, we should increment the iterator by 2, and then by 1, i.
[Read More]
Finding the celebrity group
Problem
Given a group of people at a party, find the group of celebrities.
Suppose we have N people, and there might be a group of celebrities among them.
Every person knows every celebrity, and every celebrity knows only the other celebrities.
Given a function Knows(x, y) that returns true if x knows y and false otherwise, identify the group of celebrities.
This problem is to identify a group of celebrities; it is not the problem of identifying the single celebrity among the people, as in http://k2code.
[Read More]
The Celebrity Problem
Problem
In a party of N people, only one person is known to everyone. Such a person may be present at the party; if so, (s)he doesn't know anyone else at the party. We can only ask questions like "does A know B?". Find the stranger (celebrity) in the minimum number of questions.
Solution
We can describe the problem input as an array of numbers/characters representing the persons at the party.
[Read More]
How to avoid getting into infinite loops when designing a web crawler
Problem
If you were designing a web crawler, how would you avoid getting into infinite loops?
Solution
First, how does the crawler get into a loop? The answer is very simple: when we re-parse an already parsed page.
As answered by Lirik (per Fr0zenFyr's comment): if one uses the AOPIC algorithm for selecting pages, it is fairly easy to avoid bot traps of the infinite-loop kind. Here is a summary of how AOPIC works:

1. Get a set of N seed pages.
2. Allocate X amount of credit to each page, so that each page has X/N credit (i.e. an equal amount) before crawling has started.
3. Select the page P with the highest amount of credit (or, if all pages have the same amount of credit, crawl a random page).
4. Crawl page P (let's say that P had 100 credits when it was crawled).
5. Extract all the links from page P (let's say there are 10 of them).
6. Set the credits of P to 0.
7. Take a 10% "tax" and allocate it to a Lambda page.
8. Allocate an equal amount of credits to each link found on page P from P's original credit minus the tax: (100 P credits - 10 tax) / 10 links = 9 credits per link.
9. Repeat from step 3.

Since the Lambda page continuously collects tax, it will eventually be the page with the largest amount of credit and we'll have to "crawl" it. "Crawl" is in quotes because we don't actually make an HTTP request for the Lambda page; we just take its credits and distribute them equally to all of the pages in our database. Since bot traps only give credits to their internal links and rarely get credit from the outside, they continually leak credits (through taxation) to the Lambda page. The Lambda page distributes those credits evenly to all pages in the database, so with each cycle the bot-trap page loses more and more credits, until it has so few that it almost never gets crawled again. This does not happen to good pages, because they often receive credits from back-links on other pages. This also results in a dynamic page rank: any time you take a snapshot of your database and order the pages by the amount of credits they have, they will most likely be ordered roughly according to their true page rank. This only avoids bot traps of the infinite-loop kind; there are many other bot traps to watch out for, and ways to get around them too. If you want a detailed answer, take a look at section 3.8 of this paper, which describes the URL-seen test of a modern scraper:
[Read More]
Find duplicate elements in the array with 4KB memory only
Problem
You have an array with all the numbers from 1 to N, where N is at most 32,000. The array may have duplicate entries and you do not know what N is. With only 4KB of memory available, how would you print all duplicate elements in the array?
Solution
4KB = 32768 bits.
We can use a bit vector to record whether we have seen number i: when we encounter i, we set bit i to 1.
[Read More]