|
Thread Rules 1. This is not a "do my homework for me" thread. If you have specific questions, ask, but don't post an assignment or homework problem and expect an exact solution. 2. No recruiting for your cockamamie projects (you won't replace facebook with 3 dudes you found on the internet and $20) 3. If you can't articulate why a language is bad, don't start slinging shit about it. Just remember that nothing is worse than making CSS IE6 compatible. 4. Use [code] tags to format code blocks. |
On April 28 2017 02:20 enigmaticcam wrote: If anyone here is familiar with Microsoft SQL Server, I have an optimization question too:
Why does this script only take 1 second to run... *snip* ...and yet this script takes about 10 minutes. *snip* The table size is about 22 million rows. So I know the issue is that one is doing a distinct on the entire table, and one is doing a distinct only on the temp table. But I would think the second one would first filter for the smaller subset before performing the distinct, but clearly it's not. Is there a reason for that? 1. Set an index on BrandlabelCode include BrandLabelName. Also properly index VistaarExtractStagingArchive 2. distinct is slow. try to avoid it. 3. perhaps create a 'brands'-table, so you don't have to distinct on masterproduct 4. analyzing the query plan will show you the exact problems 5. you can also use the 'with'-statement that may speed up execution, for example:
with temp as( select PSID , Product , [Geography] , EffectiveMonth from vistaar.VistaarExtractStagingArchive where JobRequestId = @jobRequestId ), brands as( select distinct BrandLabelName, BrandLabelCode from Vistaar.MasterProduct ) select distinct PSID , Product as BL , b.BrandLabelName as BLName , [Geography] as Market , EffectiveMonth from temp a left join brands b on b.BrandLabelCode = a.Product
|
On April 27 2017 16:31 Manit0u wrote: Back to SQL again... With the problem I posted some time back.
We have: A has many B, C has many B.
Now, we do filtering on C matching A where C has no B or CB are a subset of AB.
The thing is, it's pretty slow as soon as you hit about a million records in the db (1.5s query), which is no good for us.
Any ideas how can you optimize it in postgres?
Right now we have join tables that are being aggregated into views (stale data is unacceptable since it's used for live time pooling and assigning C to A with race conditions and all that jazz).
Edit: I'm seriously considering dropping the join tables and simply dumping all the related ids into an uuid[] column in respective tables. It would help if you post a sample query. I'm sure postgres has an query or execution plan analyzer. Some databases can't utilize indexes properly if your conditions are in a different order than columns in your index.Also if you have a 'in()' condition, usually you should not reference tables outside the subselect.
|
On April 28 2017 05:33 supereddie wrote:1. Set an index on BrandlabelCode include BrandLabelName. Also properly index VistaarExtractStagingArchive 2. distinct is slow. try to avoid it. 3. perhaps create a 'brands'-table, so you don't have to distinct on masterproduct 4. analyzing the query plan will show you the exact problems 5. you can also use the 'with'-statement that may speed up execution, for example: with temp as( select PSID , Product , [Geography] , EffectiveMonth from vistaar.VistaarExtractStagingArchive where JobRequestId = @jobRequestId ), brands as( select distinct BrandLabelName, BrandLabelCode from Vistaar.MasterProduct ) select distinct PSID , Product as BL , b.BrandLabelName as BLName , [Geography] as Market , EffectiveMonth from temp a left join brands b on b.BrandLabelCode = a.Product
Thanks for the help! These I would definitely implement if I hadn't already got a query that gives me a return in 1 second I was asking mostly because I found it odd that the sub-query approach took so long, because I use sub-queries a lot and usually they work just fine in doing filtering like this. I'm thinking it was the distinct; somehow that made it decide to do a full table scan before filtering.
|
On April 28 2017 07:53 enigmaticcam wrote:Show nested quote +On April 28 2017 05:33 supereddie wrote:1. Set an index on BrandlabelCode include BrandLabelName. Also properly index VistaarExtractStagingArchive 2. distinct is slow. try to avoid it. 3. perhaps create a 'brands'-table, so you don't have to distinct on masterproduct 4. analyzing the query plan will show you the exact problems 5. you can also use the 'with'-statement that may speed up execution, for example: with temp as( select PSID , Product , [Geography] , EffectiveMonth from vistaar.VistaarExtractStagingArchive where JobRequestId = @jobRequestId ), brands as( select distinct BrandLabelName, BrandLabelCode from Vistaar.MasterProduct ) select distinct PSID , Product as BL , b.BrandLabelName as BLName , [Geography] as Market , EffectiveMonth from temp a left join brands b on b.BrandLabelCode = a.Product
Thanks for the help! These I would definitely implement if I hadn't already got a query that gives me a return in 1 second  I was asking mostly because I found it odd that the sub-query approach took so long, because I use sub-queries a lot and usually they work just fine in doing filtering like this. I'm thinking it was the distinct; somehow that made it decide to do a full table scan before filtering.
His point #4 about query plans is the most important. If you can learn to read them it will take all of the guess work out of trying to figure out what happened.
|
On April 28 2017 07:53 enigmaticcam wrote:Show nested quote +On April 28 2017 05:33 supereddie wrote:1. Set an index on BrandlabelCode include BrandLabelName. Also properly index VistaarExtractStagingArchive 2. distinct is slow. try to avoid it. 3. perhaps create a 'brands'-table, so you don't have to distinct on masterproduct 4. analyzing the query plan will show you the exact problems 5. you can also use the 'with'-statement that may speed up execution, for example: with temp as( select PSID , Product , [Geography] , EffectiveMonth from vistaar.VistaarExtractStagingArchive where JobRequestId = @jobRequestId ), brands as( select distinct BrandLabelName, BrandLabelCode from Vistaar.MasterProduct ) select distinct PSID , Product as BL , b.BrandLabelName as BLName , [Geography] as Market , EffectiveMonth from temp a left join brands b on b.BrandLabelCode = a.Product
Thanks for the help! These I would definitely implement if I hadn't already got a query that gives me a return in 1 second  I was asking mostly because I found it odd that the sub-query approach took so long, because I use sub-queries a lot and usually they work just fine in doing filtering like this. I'm thinking it was the distinct; somehow that made it decide to do a full table scan before filtering. If you don't have an index on the columns you use in the statement, then it must do a full table scan to determine all the different values. However, don't blindly create indexes - maybe there are already indexes on the table but the query optimizer finds them not usable (because of the fields used or something). When having performance issues in any RDBMS always try to check the execution plan. It will tell you exactly how the query is executed.
|
Thank you for the suggestions guys!
|
On April 28 2017 16:26 supereddie wrote:If you don't have an index on the columns you use in the statement, then it must do a full table scan to determine all the different values. However, don't blindly create indexes - maybe there are already indexes on the table but the query optimizer finds them not usable (because of the fields used or something). When having performance issues in any RDBMS always try to check the execution plan. It will tell you exactly how the query is executed. I forgot to mention that there is an index on the JobRequestId. My bad, that probably would've made it more clear as to why I was a bit confused as to how long that report was taking.
I know the execution plan is a big help in times like this, it's just that I've honestly never spent the time to learn how to read it. Guess I should probably do that :D
|
anyone around for a little bit that I could message a few questions about dup2 and pipes to?
edit: eh ill just post it
I am making a shell I have a parent process that does this every time a command is executed that is supposed to pipe to another command:
if (dup2(STDOUT_FILENO, pipe_fd[1]) < 0) { err(EX_OSERR, "dup2 error"); } [code]
This sets my pipe's write side to stdout
the child process does this:
[code]
if (dup2(pipe_fd[0], STDIN_FILENO) < 0) { err(EX_OSERR, "dup2 error"); }
if (dup2(pipe_fd[1], STDOUT_FILENO) < 0) { err(EX_OSERR, "dup2 error"); }
if a child process is part of the piping, it takes stdin from other commands (from the pipe) and sends stdout into the write end
now of course there is a lot more to my code than this, I have to check certain situations, but this is the gist
here is what is going on that I don't understand
If I run a command series that has 1 pipe, this works fine (which I think it shouldn't....)
echo hello | grep e
it works great. my code sees a pipe conjunction, opens a pipe, process the first command forks, child redirects to and from pipe and the first command goes into the pipe
second command processes, parent sets write end of pipe to stdout child (which actually executes the command) sets the read end and write end to both point to pipe
child does read from pipe, but apparently writes to stdout. why does it write to stdout? shouldn't it write to the pipe? the child code comes after the parent code, the child should be redirecting the write into the pipe, no? what is going on
and if I do a command like
ls | more | grep e
it doesn't work. it processes the first 2 commands correctly, and then the "more" writes to stdout and grep e reads an empty buffer.
There is something about dup2 and pipes that I am missing. For some reason if I do
if (dup2(pipe_fd[1], STDOUT_FILENO) < 0) { err(EX_OSERR, "dup2 error"); }
if (dup2(STDOUT_FILENO, pipe_fd[1]) < 0) { err(EX_OSERR, "dup2 error"); }
I can never point stdout back to my pipe. it's like that side of my pipe is gone. how am I using this incorrectly
edit: I get the impression no one here is into this stuff, lol don't blame ya
|
ok here is a question you guys will probably like more (though I suspect it's super easy)
the question is
x=0 For i=1 to n For j=i to i+1 x=x+i+j; (This is just one step)
Find a function f(n) such that the runtime is Θ(f(n)).
Big theta means both Big O and Big omega?
So the function that it wants is f(n) = n ? is this right? seems a little easy so I am probably screwing it up
|
|
On April 30 2017 09:04 travis wrote:ok here is a question you guys will probably like more (though I suspect it's super easy) the question is x=0 For i=1 to n For j=i to i+1 x=x+i+j; (This is just one step)
Find a function f(n) such that the runtime is Θ(f(n)). Big theta means both Big O and Big? So the function that it wants is f(n) = n ? is this right? seems a little easy so I am probably screwing it up I'm not that good in math, but I just put n=1 ... n=4 into a spreadsheet and fiddled with it to see a pattern. I noticed that this is the same:
x=0 xi=0 xj=0 For i=1 to n For j=i to i+1 xi++; xj++; endfor endfor x = xi + xj;
Since all that happens for j is '1+2+3+4', times n, and for i is '1+2+3' times n+1, all I need is a formula for 1+2+3+4. Looking up 1+2+3+4 lead me to Wikipedia and the formula x = n*(n+1)/2. Since you have two 1+2+3+4..., one until n and one until n+1, the resulting function would be:
nj = n+1 x = nj * (n * (n + 1) / 2) + n * (nj * (nj + 1) / 2)
There is probably a better or nicer function but this seems to work.
|
On April 30 2017 09:04 travis wrote:ok here is a question you guys will probably like more (though I suspect it's super easy) the question is x=0 For i=1 to n For j=i to i+1 x=x+i+j; (This is just one step)
Find a function f(n) such that the runtime is Θ(f(n)). Big theta means both Big O and Big? So the function that it wants is f(n) = n ? is this right? seems a little easy so I am probably screwing it up Are you sure the second loop looks like that? Because that's essentially the exact same thing as:
x=0 For i=1 to n x=x+2*i;
So f(n) = n seems spot on. But I agree that that seems rather trivial.
Edit: I'm assuming your for-loop pseudocode runs without including the end condition. If it does include the end condition the above code does not do the same, but that just means it runs twice for every i, so big-theta complexity is still n.
|
eddie are you and manitou misreading the code? I couldn't really understand your solution, eddie.
But I think you guys are misinterpreting the 2nd line
For j=i to i+1
as something like
For j = 1 to i+1
acrofales, what do you mean by "you're assuming my for-loop runs without including the end condition"? What do you mean by the end condition?
To be honest I have a suspicion that it *was* supposed to be something like j = 1 to i+1 in the 2nd loop, but I am going to go with what's on the paper, lol. If it's the professors mistake I will take advantage of it
|
On April 30 2017 20:58 travis wrote:eddie are you and manitou misreading the code? I couldn't really understand your solution, eddie. But I think you guys are misinterpreting the 2nd line For j=i to i+1
as something like For j = 1 to i+1
acrofales, what do you mean by "you're assuming my for-loop runs without including the end condition"? What do you mean by the end condition? To be honest I have a suspicion that it *was* supposed to be something like j = 1 to i+1 in the 2nd loop, but I am going to go with what's on the paper, lol. If it's the professors mistake I will take advantage of it
Does "to" mean "up to and including", or "up to, but excluding": in other words, "<=" or "<". For the time complexity it doesn't matter (it's a constant factor), but in case of the former, my refactoring of the code would be wrong.
|
On April 30 2017 20:58 travis wrote:eddie are you and manitou misreading the code? I couldn't really understand your solution, eddie. But I think you guys are misinterpreting the 2nd line For j=i to i+1
as something like For j = 1 to i+1
acrofales, what do you mean by "you're assuming my for-loop runs without including the end condition"? What do you mean by the end condition? To be honest I have a suspicion that it *was* supposed to be something like j = 1 to i+1 in the 2nd loop, but I am going to go with what's on the paper, lol. If it's the professors mistake I will take advantage of it Ah, i read it as
For j = 1 to i+1
I don't like one-letter variablenames :p
Still, just write out the first few n-values (n=1, n=2, n=3, n=4) and try to find a formula. I created a simple spreadsheet with columns for x, i and j and just filled in the values for different n.
The sum of i can be represented with: n + n^2 Sum of j = i + n = (n + n^2) + n = 2n + n^2 x = i + j = (n + n^2) + (2n + n^2) = 3n + 2n^2
|
On April 29 2017 23:54 travis wrote: anyone around for a little bit that I could message a few questions about dup2 and pipes to?
edit: eh ill just post it
...
I couldn't understand much of what you said but here's some code that should help you understand pipe and dup2.
+ Show Spoiler + #include <stdio.h> #include <stdlib.h> #include <sys/types.h> #include <unistd.h>
void close_fds(int* fd) { close(fd[0]); close(fd[1]); }
void run_pipe(char** cmds, int n) { printf("CMD: %s\n", cmds[0]); if (n == 1) { execl("/bin/sh", "sh", "-c", cmds[0], NULL); } else { int fd[2];
pipe(fd); pid_t pid = fork(); if (pid == 0) { dup2(fd[1], 1); close_fds(fd); execl("/bin/sh", "sh", "-c", cmds[0], NULL); } else { dup2(fd[0], 0); close_fds(fd); run_pipe(cmds + 1, n - 1); } } }
int main() { char *cmds[] = { "ls -l ..", "grep core", "wc"};
run_pipe(cmds, 3); }
|
this is pretty close to the implementation I ended up using for my program
though I am having a ton of trouble with soemthing like
(ls | more) | more
because I fork and then send the contents of my subshell back to my tree traversal, and when it sees the new pipe it doesn't know how to handle the difference between being in a subshell vs being out of one. When I get on the right side of the pipe in the subshell I end up having closed both ends of the pipe, and then when the "more" tries to read from the pipe there is nothing in it (or some other sort of problem), and my program breaks
this project is really hard :/
|
If some calculation can be done element-wise on whole matrices/vectors, should I still define it as a function or just put it in the calculations block? Is defining it as a function considered redundant? If it weren't possible to do it element-wise, I would need to use a for-loop and defining a function would be as obvious solution, but I'm not sure what to do in this case.
def assign_location(i, j):
L_el = np.array([2*i-1, 2*i, 2*j-1, 2*j]) return L_el
|
On May 01 2017 00:25 travis wrote:this is pretty close to the implementation I ended up using for my program though I am having a ton of trouble with soemthing like (ls | more) | more
because I fork and then send the contents of my subshell back to my tree traversal, and when it sees the new pipe it doesn't know how to handle the difference between being in a subshell vs being out of one. When I get on the right side of the pipe in the subshell I end up having closed both ends of the pipe, and then when the "more" tries to read from the pipe there is nothing in it (or some other sort of problem), and my program breaks this project is really hard :/ What's the syntax that it needs to support?
|
I don't understand the question
the syntax of the commands themselves are the same as in any shell
the syntax in the tree is a tree of structures that have various fields, the 2 most important fields being an enum that can be "pipe" "subshell" or "and", or an array of strings that represents a commmand with it's arguments
so I traverse the tree and if it is a conjunction I take the proper action (fork off a subshell, open a pipe, or do nothing if its and). if I am forking off a subshell the child sends the stuff that is under on the subshell back to the traversal function while the parent waits for the traversal/child to finish.
if it's a command I fork and the parent waits for the child to exec the command.
so I have 4 primary functions in play here
function 1 traverses the tree
function 2 reads what type of node it is and takes an appropriate action
function 3 reads commands that are found by function 2, forks and execs in the child. also handles where read and write go to if my "piping" global int is greater than 0
function 4 opens a subshell if found by function one. it forks, in the child sets write to the pipe if a pipe is open, and closes read if a pipe is open, then sets "piping" to 0. then the child sends the node below it to the tree for traversal.
This works for all cases except for where we have to open a new pipe in a subshell. that's because in the subshell I had set "piping" to 0. So when we open a new pipe, we act like there is only one pipe open, so it opens the pipe, passes it to the second command in the subshell, closes the pipe). But that sends the read to standard out rather than to the pipe that is outside of our subshell.
Hopefully you can read this stuff and stay sane.
|
|
|
|