The Big Programming Thread - Page 808
Thread Rules
1. This is not a "do my homework for me" thread. If you have specific questions, ask, but don't post an assignment or homework problem and expect an exact solution.
2. No recruiting for your cockamamie projects (you won't replace facebook with 3 dudes you found on the internet and $20).
3. If you can't articulate why a language is bad, don't start slinging shit about it. Just remember that nothing is worse than making CSS IE6 compatible.
4. Use [code] tags to format code blocks.
phar
United States, 1080 Posts
Travis - read this: http://steve-yegge.blogspot.com/2008/03/get-that-job-at-google.html
Deleted User 3420
24492 Posts
I did miss one question though and I want your opinions on it. The question was about bucket sort. It specified "given a range greater than N and uniform distribution", and asked for the big-O complexity for the average case and the worst case. For the average case I put O(n), which is correct. The worst case, I know, is typically O(n^2) for bucket sort.

However, we never really went over this term "uniform distribution". At least not in detail, and it was never stressed. I assumed it meant that the elements were distributed equally among the buckets. But apparently it's a statistics term and that's not what it means; it just means that every value in the range is equally likely. So, during the test, I went back to this question and stared at it for like 5 minutes. I was like "uniform distribution... uniform distribution...". And finally I put O(n) based on my assumption.

Today in class the professor pretty much threw out a disclaimer about this question, so I assume lots of people got it wrong. He explained what uniform distribution means, but he isn't going to give points to people who put O(n). It seems unfair. Should I say something to him about it?
Acrofales
Spain, 17832 Posts
On December 03 2016 01:34 travis wrote:
I scored a 64 / 70 on my last exam. yay! I did miss one question though and I want your opinions on it. The question was about bucket sort. It specified "given a range greater than N and uniform distribution", and asked for the big-O complexity for the average case and the worst case. For the average case I put O(n), which is correct. The worst case, I know, is typically O(n^2) for bucket sort. However, we never really went over this term "uniform distribution". At least not in detail, and it was never stressed. I assumed it meant that the elements were distributed equally among the buckets. But apparently it's a statistics term and that's not what it means; it just means that every value in the range is equally likely. So, during the test, I went back to this question and stared at it for like 5 minutes. I was like "uniform distribution... uniform distribution...". And finally I put O(n) based on my assumption. Today in class the professor pretty much threw out a disclaimer about this question, so I assume lots of people got it wrong. He explained what uniform distribution means, but he isn't going to give points to people who put O(n). It seems unfair. Should I say something to him about it?

Not sure how old you are and what the requirements are for your course, but where I went to school, uniform distribution was taught in high school, so uni professors were allowed to assume you knew that stuff.

However, I'm confused. The O(n^2) worst case for bucket sort is, insofar as I remember, when your data is NOT uniformly distributed but has clusters. If your data is uniformly distributed, then bucket sort works in O(n) time (technically I think O(n + m), with m the number of buckets), because it can neatly distribute the elements into buckets.

EDIT: wait, what was the exact wording? Because if the wording was "distributed according to a uniform distribution", then you have a *chance* of clustering, and the worst-case scenario has to take that chance into account. However, if it says "uniformly distributed", then you already know the elements are uniformly spread over the range. It's the difference between "I am going to flip a coin; what is the chance of heads?" and "I have flipped a coin and it came up heads; what is the chance that it came up heads?"
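To make the two cases concrete, here is a minimal textbook-style bucket sort in Java. This is just an illustrative sketch, not anything from the course or the thread; the class name, bucket count, and value range are arbitrary choices.

[code]
import java.util.ArrayList;
import java.util.List;

public class BucketSortSketch {

    // Textbook bucket sort for values in [0, max): scatter into m buckets,
    // insertion-sort each bucket, then concatenate. If the input is spread
    // evenly, each bucket holds O(1) elements on average, so the total work
    // is O(n + m). If the input clusters into a single bucket, the insertion
    // sort of that bucket dominates and the whole thing degrades to O(n^2).
    static void bucketSort(double[] a, double max, int m) {
        List<List<Double>> buckets = new ArrayList<>();
        for (int i = 0; i < m; i++) buckets.add(new ArrayList<>());

        for (double x : a) {                         // scatter: O(n)
            int idx = (int) Math.min(m - 1, x * m / max);
            buckets.get(idx).add(x);
        }

        int pos = 0;
        for (List<Double> b : buckets) {             // sort buckets and gather
            insertionSort(b);
            for (double x : b) a[pos++] = x;
        }
    }

    static void insertionSort(List<Double> b) {      // O(k^2) on a bucket of size k
        for (int i = 1; i < b.size(); i++) {
            double key = b.get(i);
            int j = i - 1;
            while (j >= 0 && b.get(j) > key) {
                b.set(j + 1, b.get(j));
                j--;
            }
            b.set(j + 1, key);
        }
    }

    public static void main(String[] args) {
        double[] data = {3, 4, 1, 2, 5, 7, 10, 8, 9, 6};   // values in [0, 11)
        bucketSort(data, 11, 10);
        System.out.println(java.util.Arrays.toString(data));
    }
}
[/code]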
phar
United States, 1080 Posts
On December 02 2016 22:39 tofucake wrote:
that guy doesn't work at google though :|

He's still working out of the Kirkland office, at least the last time I checked.
Acrofales
Spain, 17832 Posts
Bucket sort runs in approximately its worst-case scenario if 10% (or less) of the buckets hold 90% (or more) of the data. Given that you know your data is distributed according to a random variable with a uniform distribution over range N, what is the probability of this occurring?

+ Show Spoiler +
I don't know the answer. I know how to compute it if the size of the data is given, but it's a shitty sum of cases in a multinomial distribution. I have no idea how to derive a general formula.
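One way to put a rough number on that without a closed form is to simulate it: draw n values from the uniform distribution, bucket them, and count how often the "at most 10% of buckets hold at least 90% of the data" condition fires. The sketch below is my own illustration; the 10%/90% thresholds come from the post above, while n, m, and the trial count are arbitrary picks. For any reasonable n and m the estimate will just come out as 0, which is exactly the point: the event is astronomically unlikely.

[code]
import java.util.Arrays;
import java.util.Random;

public class ClusterProbabilitySketch {
    // Monte Carlo estimate: how often does throwing n uniform values into m
    // buckets leave at least 90% of the data in the fullest 10% of buckets?
    public static void main(String[] args) {
        int n = 100, m = 100, trials = 100_000;
        Random rng = new Random();
        int hits = 0;

        for (int t = 0; t < trials; t++) {
            int[] counts = new int[m];
            for (int i = 0; i < n; i++) counts[rng.nextInt(m)]++;

            Arrays.sort(counts);                       // ascending
            int topBuckets = Math.max(1, m / 10);      // fullest 10% of buckets
            int inTop = 0;
            for (int i = m - topBuckets; i < m; i++) inTop += counts[i];

            if (inTop >= 0.9 * n) hits++;
        }
        System.out.printf("estimated probability: %.2e%n", (double) hits / trials);
    }
}
[/code]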
Deleted User 3420
24492 Posts
On December 03 2016 02:12 Acrofales wrote:
Not sure how old you are and what the requirements are for your course, but where I went to school, uniform distribution was taught in high school, so uni professors were allowed to assume you knew that stuff. However, I'm confused. The O(n^2) worst case for bucket sort is, insofar as I remember, when your data is NOT uniformly distributed but has clusters. If your data is uniformly distributed, then bucket sort works in O(n) time (technically I think O(n + m), with m the number of buckets), because it can neatly distribute the elements into buckets. EDIT: wait, what was the exact wording? Because if the wording was "distributed according to a uniform distribution", then you have a *chance* of clustering, and the worst-case scenario has to take that chance into account. However, if it says "uniformly distributed", then you already know the elements are uniformly spread over the range.

See, now I know what it means for something to be uniformly distributed... But no, I certainly didn't know what "uniform distribution" is. Like, if there were 5 buckets and 5 elements, then it's uniformly distributed if each bucket has an element.

But it didn't say it was uniformly distributed. I can't actually remember the exact wording... Something like "assume a range greater than n, and uniform distribution". See, now this is the problem... lol. Why would he even include the terminology?

My opinion is that it isn't fair for us to be tested on a question where the answer depends on us understanding terminology that we weren't taught.
BluzMan
Russian Federation, 4235 Posts
On December 03 2016 08:14 Nesserev wrote:
For the worst case, the uniform probability distribution of the key values doesn't actually matter. Regardless of the probability distribution, the worst case is always the same, and it's always O(n²). It does make the worst case more improbable, but the worst case is almost always very improbable. That's not the point. However, the uniform probability distribution of the key values is very important for the average case. If it wasn't uniformly distributed, the average case would not be O(n), but would probably be O(m²), where m is the size of the bucket with the most elements.

That looks a lot like nitpicking, kind of like "SHA256 is unsafe because in the worst case you easily find a hash collision". Yeah, you do, but you'd better have a few dozen universes to look in. If you imagine this is about a practical application and ask yourself "I know my 1,000,000 items are uniformly distributed, what is the worst I have to prepare for?", you definitely won't be saying "O(n²)". I'd say the question is very poorly worded, at least in the form presented to us.
ZigguratOfUr
Iraq, 16955 Posts
On December 03 2016 08:43 Nesserev wrote:
No, the question is fine. That's the thing though. Talking about worst case O is very often nitpicking. Quicksort has worst case O(n²)... that doesn't stop anyone from using it.

You still have to keep it in mind. A naive implementation of quicksort hits its worst case on a sorted list, which is less than ideal. And the fact that quicksort has a worst case of O(n²) does in fact keep some people from using it. The C++ standard library's generic sort uses introsort instead of plain quicksort to avoid that worst case.
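To see the sorted-list trap concretely, here is a toy first-element-pivot quicksort in Java. It's just my own sketch for illustration, not anyone's library code: on an already sorted array the pivot is always the minimum, every partition splits into sizes 0 and n-1, and both the comparison count and the recursion depth grow with n, giving the classic O(n²) behaviour (introsort detects that kind of degeneration and switches to heapsort).

[code]
public class NaiveQuicksortSketch {
    // Toy quicksort with the first element as pivot (Lomuto-style partition).
    static void sort(int[] a, int lo, int hi) {
        if (lo >= hi) return;
        int pivot = a[lo];
        int lt = lo;                                  // boundary of the "< pivot" region
        for (int i = lo + 1; i <= hi; i++) {
            if (a[i] < pivot) {
                lt++;
                int tmp = a[lt]; a[lt] = a[i]; a[i] = tmp;
            }
        }
        int tmp = a[lo]; a[lo] = a[lt]; a[lt] = tmp;  // move pivot into place
        sort(a, lo, lt - 1);
        sort(a, lt + 1, hi);
    }

    public static void main(String[] args) {
        // Keep n modest: on sorted input the recursion depth equals n.
        int[] alreadySorted = new int[2000];
        for (int i = 0; i < alreadySorted.length; i++) alreadySorted[i] = i;
        sort(alreadySorted, 0, alreadySorted.length - 1);   // roughly n^2 / 2 comparisons
        System.out.println("done, first/last: " + alreadySorted[0] + ", " + alreadySorted[1999]);
    }
}
[/code]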
Acrofales
Spain, 17832 Posts
On December 03 2016 08:43 Nesserev wrote:
No, the question is fine. That's the thing though. Talking about worst case O is very often nitpicking. Quicksort has worst case O(n²)... that doesn't stop anyone from using it.

But if your data is uniformly distributed, you are, by definition, not in the worst case of bucket sort, because you are talking, literally, about the shape of your data and asserting there are no clusters. If, however, your data is pulled from some population with a uniform distribution and you don't know the distribution of your actual data, THEN you are right.

Consider the following. Please sort, using bucket sort, the following set:
[3, 4, 1, 2, 5, 7, 10, 8, 9, 6]
This is pretty much a best case for bucket sort. It is uniformly distributed data.

On the other hand, if all we know is that the data comes from a population with a discrete uniform distribution over the integers between 1 and 10, then it is possible (not likely, but possible) that, sampling the population 10 times, we get:
[1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
Clearly this dataset is not uniformly distributed, despite coming from a uniformly distributed population.

Now of course, the latter is what we are almost always talking about, because we are almost always sorting things of the latter type. But there is a lot of ambiguity in how the question, as it was passed on to us, is worded.
BluzMan
Russian Federation, 4235 Posts
On December 03 2016 08:43 Nesserev wrote:
No, the question is fine. That's the thing though. Talking about worst case O is very often nitpicking. Quicksort has worst case O(n²)... that doesn't stop anyone from using it.

Anything that allows ambiguity is not fine, and arguing against ambiguity here is very hard since many people made the same "mistake". Big-O notation was invented to be useful and nothing else. Applying it in a way that doesn't produce useful answers is cargo-cult, bad practice, and teaching it that way is even worse. Worst-case analysis needs to answer the question "what pathological, but realistic, data do I have to be aware of?" In practice "realistic pathological" often equals "malicious", and worst-case analysis is mainly a form of DoS protection. There's no way to produce malicious but uniformly distributed data that would drag bucket sort down to quadratic, because then it wouldn't, by definition, be uniformly distributed. If your distribution guarantee is actually coming from somewhere (a statistics test, for instance), there's no way to DoS your code.
Neshapotamus
United States, 163 Posts
On December 01 2016 02:09 Prillan wrote: Maybe I'm stupid, but how would you implement a weighted graph using the above definition of an adjacency list? I was thinking you can do this: (Java)
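Roughly along these lines (just a sketch, not tested; the class and field names are placeholders): each node maps to a list of (neighbor, weight) edges, so the weight rides along in the adjacency list entry itself.

[code]
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class WeightedGraphSketch {
    // Adjacency list where each entry carries the edge weight alongside the neighbor.
    static class Edge {
        final int to;
        final double weight;
        Edge(int to, double weight) { this.to = to; this.weight = weight; }
    }

    private final Map<Integer, List<Edge>> adjacency = new HashMap<>();

    void addEdge(int from, int to, double weight) {
        adjacency.computeIfAbsent(from, k -> new ArrayList<>()).add(new Edge(to, weight));
        adjacency.computeIfAbsent(to, k -> new ArrayList<>()).add(new Edge(from, weight)); // undirected
    }

    List<Edge> neighbors(int node) {
        return adjacency.getOrDefault(node, new ArrayList<>());
    }

    public static void main(String[] args) {
        WeightedGraphSketch g = new WeightedGraphSketch();
        g.addEdge(1, 2, 0.5);
        g.addEdge(2, 3, 2.0);
        for (Edge e : g.neighbors(2)) {
            System.out.println("2 -> " + e.to + " (weight " + e.weight + ")");
        }
    }
}
[/code]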
BTW, nice job travis. Seems like you did very well on the exam.
Acrofales
Spain, 17832 Posts
On December 03 2016 09:16 BluzMan wrote:
Anything that allows ambiguity is not fine, and arguing against ambiguity here is very hard since many people made the same "mistake". Big-O notation was invented to be useful and nothing else. Applying it in a way that doesn't produce useful answers is cargo-cult, bad practice, and teaching it that way is even worse. Worst-case analysis needs to answer the question "what pathological, but realistic, data do I have to be aware of?" In practice "realistic pathological" often equals "malicious", and worst-case analysis is mainly a form of DoS protection. There's no way to produce malicious but uniformly distributed data that would drag bucket sort down to quadratic, because then it wouldn't, by definition, be uniformly distributed. If your distribution guarantee is actually coming from somewhere (a statistics test, for instance), there's no way to DoS your code.

To be fair, most problems with sort speed are caused by wrong assumptions about the underlying data. If you are sorting a list of names lexicographically, your first thought is that they're approximately uniformly distributed, right? Minus a few uncommon letters like Q or X, but it'd generally work? Actually, just going by first letter, there's going to be some big clumping. Just look here: http://home.uchicago.edu/~jsfalk/misc/baby_names/

Names are between 3 and 4 times more likely to start with a J or an A than with most other letters, and there's significant variance in the other probabilities too. Moreover, the frequency of starting letters changes over time, with gender, and of course there are regional differences too.

So, if you make a bucket sort for names and expect it to run in O(n) time, you'd (probably) be wrong for any real dataset of names. It probably won't be as slow as O(n^2) either, but rather somewhere in between. And that might trip you up if you make claims about runtime based on such estimates.