The Big Programming Thread - Page 878

Prev 1 876 877 878 879 880 1032 Next

Thread Rules
1. This is not a "do my homework for me" thread. If you have specific questions, ask, but don't post an assignment or homework problem and expect an exact solution.
2. No recruiting for your cockamamie projects (you won't replace facebook with 3 dudes you found on the internet and $20)
3. If you can't articulate why a language is bad, don't start slinging shit about it. Just remember that nothing is worse than making CSS IE6 compatible.
4. Use [code] tags to format code blocks.

supereddie

Netherlands151 Posts

April 27 2017 20:33 GMT

#17541

On April 28 2017 02:20 enigmaticcam wrote:
If anyone here is familiar with Microsoft SQL Server, I have an optimization question too:

Why does this script only take 1 second to run...
*snip*
...and yet this script takes about 10 minutes.
*snip*
The table size is about 22 million rows. So I know the issue is that one is doing a distinct on the entire table, and one is doing a distinct only on the temp table. But I would think the second one would first filter for the smaller subset before performing the distinct, but clearly it's not. Is there a reason for that?

1. Set an index on BrandlabelCode include BrandLabelName. Also properly index VistaarExtractStagingArchive
2. distinct is slow. try to avoid it.
3. perhaps create a 'brands'-table, so you don't have to distinct on masterproduct
4. analyzing the query plan will show you the exact problems
5. you can also use the 'with'-statement that may speed up execution, for example:

with temp as(
  select
    PSID
    , Product
    , [Geography]
    , EffectiveMonth
  from vistaar.VistaarExtractStagingArchive
  where JobRequestId = @jobRequestId
), brands as(
  select distinct BrandLabelName, BrandLabelCode
  from Vistaar.MasterProduct
)
select distinct
  PSID
  , Product as BL
  , b.BrandLabelName as BLName
  , [Geography] as Market
  , EffectiveMonth
from temp a
left join brands b on b.BrandLabelCode = a.Product

supereddie

Netherlands151 Posts

April 27 2017 20:42 GMT

#17542

On April 27 2017 16:31 Manit0u wrote:
Back to SQL again... With the problem I posted some time back.

We have: A has many B, C has many B.

Now, we do filtering on C matching A where C has no B or CB are a subset of AB.

The thing is, it's pretty slow as soon as you hit about a million records in the db (1.5s query), which is no good for us.

Any ideas how can you optimize it in postgres?

Right now we have join tables that are being aggregated into views (stale data is unacceptable since it's used for live time pooling and assigning C to A with race conditions and all that jazz).

Edit: I'm seriously considering dropping the join tables and simply dumping all the related ids into an uuid[] column in respective tables.

It would help if you post a sample query. I'm sure postgres has an query or execution plan analyzer.
Some databases can't utilize indexes properly if your conditions are in a different order than columns in your index.Also if you have a 'in()' condition, usually you should not reference tables outside the subselect.

enigmaticcam

United States280 Posts

April 27 2017 22:53 GMT

#17543

On April 28 2017 05:33 supereddie wrote:1. Set an index on BrandlabelCode include BrandLabelName. Also properly index VistaarExtractStagingArchive
2. distinct is slow. try to avoid it.
3. perhaps create a 'brands'-table, so you don't have to distinct on masterproduct
4. analyzing the query plan will show you the exact problems
5. you can also use the 'with'-statement that may speed up execution, for example:

with temp as(
  select
    PSID
    , Product
    , [Geography]
    , EffectiveMonth
  from vistaar.VistaarExtractStagingArchive
  where JobRequestId = @jobRequestId
), brands as(
  select distinct BrandLabelName, BrandLabelCode
  from Vistaar.MasterProduct
)
select distinct
  PSID
  , Product as BL
  , b.BrandLabelName as BLName
  , [Geography] as Market
  , EffectiveMonth
from temp a
left join brands b on b.BrandLabelCode = a.Product

Thanks for the help! These I would definitely implement if I hadn't already got a query that gives me a return in 1 second

I was asking mostly because I found it odd that the sub-query approach took so long, because I use sub-queries a lot and usually they work just fine in doing filtering like this. I'm thinking it was the distinct; somehow that made it decide to do a full table scan before filtering.

berated-

United States1134 Posts

April 28 2017 02:18 GMT

#17544

On April 28 2017 07:53 enigmaticcam wrote:

Show nested quote +

with temp as(
  select
    PSID
    , Product
    , [Geography]
    , EffectiveMonth
  from vistaar.VistaarExtractStagingArchive
  where JobRequestId = @jobRequestId
), brands as(
  select distinct BrandLabelName, BrandLabelCode
  from Vistaar.MasterProduct
)
select distinct
  PSID
  , Product as BL
  , b.BrandLabelName as BLName
  , [Geography] as Market
  , EffectiveMonth
from temp a
left join brands b on b.BrandLabelCode = a.Product

Thanks for the help! These I would definitely implement if I hadn't already got a query that gives me a return in 1 second

His point #4 about query plans is the most important. If you can learn to read them it will take all of the guess work out of trying to figure out what happened.

supereddie

Netherlands151 Posts

April 28 2017 07:26 GMT

#17545

On April 28 2017 07:53 enigmaticcam wrote:

Show nested quote +

with temp as(
  select
    PSID
    , Product
    , [Geography]
    , EffectiveMonth
  from vistaar.VistaarExtractStagingArchive
  where JobRequestId = @jobRequestId
), brands as(
  select distinct BrandLabelName, BrandLabelCode
  from Vistaar.MasterProduct
)
select distinct
  PSID
  , Product as BL
  , b.BrandLabelName as BLName
  , [Geography] as Market
  , EffectiveMonth
from temp a
left join brands b on b.BrandLabelCode = a.Product

Thanks for the help! These I would definitely implement if I hadn't already got a query that gives me a return in 1 second

If you don't have an index on the columns you use in the statement, then it must do a full table scan to determine all the different values. However, don't blindly create indexes - maybe there are already indexes on the table but the query optimizer finds them not usable (because of the fields used or something).
When having performance issues in any RDBMS always try to check the execution plan. It will tell you exactly how the query is executed.

solidbebe

Netherlands4921 Posts

April 28 2017 13:20 GMT

#17546

Thank you for the suggestions guys!

enigmaticcam

United States280 Posts

April 28 2017 16:21 GMT

#17547

On April 28 2017 16:26 supereddie wrote:If you don't have an index on the columns you use in the statement, then it must do a full table scan to determine all the different values. However, don't blindly create indexes - maybe there are already indexes on the table but the query optimizer finds them not usable (because of the fields used or something).
When having performance issues in any RDBMS always try to check the execution plan. It will tell you exactly how the query is executed.

I forgot to mention that there is an index on the JobRequestId. My bad, that probably would've made it more clear as to why I was a bit confused as to how long that report was taking.

I know the execution plan is a big help in times like this, it's just that I've honestly never spent the time to learn how to read it. Guess I should probably do that :D

Deleted User 3420

24492 Posts

April 29 2017 14:54 GMT

#17548

anyone around for a little bit that I could message a few questions about dup2 and pipes to?

edit: eh ill just post it

I am making a shell I have a parent process that does this every time a command is executed that is supposed to pipe to another command:


      if (dup2(STDOUT_FILENO, pipe_fd[1]) < 0) {
         err(EX_OSERR, "dup2 error");
      }
[code]

This sets my pipe's write side to stdout

the child process does this:

[code]

      if (dup2(pipe_fd[0], STDIN_FILENO) < 0) {
         err(EX_OSERR, "dup2 error");
      }

      if (dup2(pipe_fd[1], STDOUT_FILENO) < 0) {
         err(EX_OSERR, "dup2 error");
      }

if a child process is part of the piping, it takes stdin from other commands (from the pipe) and sends stdout into the write end

now of course there is a lot more to my code than this, I have to check certain situations, but this is the gist

here is what is going on that I don't understand

If I run a command series that has 1 pipe, this works fine (which I think it shouldn't....)

echo hello | grep e

it works great. my code sees a pipe conjunction, opens a pipe, process the first command
forks, child redirects to and from pipe and the first command goes into the pipe

second command processes, parent sets write end of pipe to stdout
child (which actually executes the command) sets the read end and write end to both point to pipe

child does read from pipe, but apparently writes to stdout. why does it write to stdout? shouldn't it write to the pipe? the child code comes after the parent code, the child should be redirecting the write into the pipe, no? what is going on

and if I do a command like

ls | more | grep e

it doesn't work. it processes the first 2 commands correctly, and then the "more" writes to stdout and grep e reads an empty buffer.

There is something about dup2 and pipes that I am missing. For some reason if I do


      if (dup2(pipe_fd[1], STDOUT_FILENO) < 0) {
         err(EX_OSERR, "dup2 error");
      }


      if (dup2(STDOUT_FILENO, pipe_fd[1]) < 0) {
         err(EX_OSERR, "dup2 error");
      }

I can never point stdout back to my pipe. it's like that side of my pipe is gone. how am I using this incorrectly

edit: I get the impression no one here is into this stuff, lol
don't blame ya

Deleted User 3420

24492 Posts

April 30 2017 00:04 GMT

#17549

ok here is a question you guys will probably like more (though I suspect it's super easy)

the question is


x=0 
For i=1 to n 
  For j=i to i+1
    x=x+i+j; (This is just one step)

Find a function f(n) such that the runtime is Θ(f(n)).

Big theta means both Big O and Big omega?

So the function that it wants is f(n) = n ?
is this right?
seems a little easy so I am probably screwing it up

Manit0u

Poland17800 Posts

April 30 2017 07:01 GMT

#17550

Won't that be n^2 + 1?

supereddie

Netherlands151 Posts

April 30 2017 07:55 GMT

#17551

On April 30 2017 09:04 travis wrote:
ok here is a question you guys will probably like more (though I suspect it's super easy)

the question is


x=0 
For i=1 to n 
  For j=i to i+1
    x=x+i+j; (This is just one step)

Find a function f(n) such that the runtime is Θ(f(n)).

Big theta means both Big O and Big?

So the function that it wants is f(n) = n ?
is this right?
seems a little easy so I am probably screwing it up

I'm not that good in math, but I just put n=1 ... n=4 into a spreadsheet and fiddled with it to see a pattern.
I noticed that this is the same:


x=0 
xi=0
xj=0
For i=1 to n 
  For j=i to i+1
    xi++;
    xj++;
  endfor
endfor
x = xi + xj;

Since all that happens for j is '1+2+3+4', times n, and for i is '1+2+3' times n+1, all I need is a formula for 1+2+3+4. Looking up 1+2+3+4 lead me to Wikipedia and the formula x = n*(n+1)/2.
Since you have two 1+2+3+4..., one until n and one until n+1, the resulting function would be:


nj = n+1
x = nj * (n * (n + 1) / 2) + n * (nj * (nj + 1) / 2)

There is probably a better or nicer function but this seems to work.

Acrofales

Spain18394 Posts

April 30 2017 08:55 GMT

#17552

On April 30 2017 09:04 travis wrote:
ok here is a question you guys will probably like more (though I suspect it's super easy)

the question is


x=0 
For i=1 to n 
  For j=i to i+1
    x=x+i+j; (This is just one step)

Are you sure the second loop looks like that? Because that's essentially the exact same thing as:


x=0 
For i=1 to n 
    x=x+2*i;

So f(n) = n seems spot on. But I agree that that seems rather trivial.

Edit: I'm assuming your for-loop pseudocode runs without including the end condition. If it does include the end condition the above code does not do the same, but that just means it runs twice for every i, so big-theta complexity is still n.

Deleted User 3420

24492 Posts

April 30 2017 11:58 GMT

#17553

eddie are you and manitou misreading the code?
I couldn't really understand your solution, eddie.

But I think you guys are misinterpreting the 2nd line


For j=i to i+1

as something like


For j = 1 to i+1

acrofales, what do you mean by "you're assuming my for-loop runs without including the end condition"? What do you mean by the end condition?

To be honest I have a suspicion that it *was* supposed to be something like j = 1 to i+1 in the 2nd loop, but I am going to go with what's on the paper, lol. If it's the professors mistake I will take advantage of it

Acrofales

Spain18394 Posts

April 30 2017 12:10 GMT

#17554

On April 30 2017 20:58 travis wrote:
eddie are you and manitou misreading the code?
I couldn't really understand your solution, eddie.

But I think you guys are misinterpreting the 2nd line


For j=i to i+1

as something like


For j = 1 to i+1

Does "to" mean "up to and including", or "up to, but excluding": in other words, "<=" or "<". For the time complexity it doesn't matter (it's a constant factor), but in case of the former, my refactoring of the code would be wrong.

supereddie

Netherlands151 Posts

April 30 2017 14:25 GMT

#17555

On April 30 2017 20:58 travis wrote:
eddie are you and manitou misreading the code?
I couldn't really understand your solution, eddie.

But I think you guys are misinterpreting the 2nd line


For j=i to i+1

as something like


For j = 1 to i+1

Ah, i read it as

For j = 1 to i+1

I don't like one-letter variablenames :p

Still, just write out the first few n-values (n=1, n=2, n=3, n=4) and try to find a formula. I created a simple spreadsheet with columns for x, i and j and just filled in the values for different n.

The sum of i can be represented with: n + n^2
Sum of j = i + n = (n + n^2) + n = 2n + n^2
x = i + j = (n + n^2) + (2n + n^2) = 3n + 2n^2

Hanh

146 Posts

April 30 2017 14:53 GMT

#17556

On April 29 2017 23:54 travis wrote:
anyone around for a little bit that I could message a few questions about dup2 and pipes to?

edit: eh ill just post it

...

I couldn't understand much of what you said but here's some code that should help you understand pipe and dup2.

+ Show Spoiler +


#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

void close_fds(int* fd) {
	close(fd[0]);
	close(fd[1]);
}

void run_pipe(char** cmds, int n) {
	printf("CMD: %s\n", cmds[0]);
	if (n == 1) {
		execl("/bin/sh", "sh", "-c", cmds[0], NULL);
	}
	else {
		int fd[2];

		pipe(fd);
		pid_t pid = fork();
		if (pid == 0) {
			dup2(fd[1], 1);
			close_fds(fd);
			execl("/bin/sh", "sh", "-c", cmds[0], NULL);
		}
		else {
			dup2(fd[0], 0);
			close_fds(fd);
			run_pipe(cmds + 1, n - 1);
		}
	}
}

int main() {
	char *cmds[] = { "ls -l ..", "grep core", "wc"};

	run_pipe(cmds, 3);
}

Deleted User 3420

24492 Posts

April 30 2017 15:25 GMT

#17557

this is pretty close to the implementation I ended up using for my program

though I am having a ton of trouble with soemthing like


(ls | more) | more

because I fork and then send the contents of my subshell back to my tree traversal, and when it sees the new pipe it doesn't know how to handle the difference between being in a subshell vs being out of one. When I get on the right side of the pipe in the subshell I end up having closed both ends of the pipe, and then when the "more" tries to read from the pipe there is nothing in it (or some other sort of problem), and my program breaks

this project is really hard :/

maybenexttime

Poland5847 Posts

April 30 2017 15:27 GMT

#17558

If some calculation can be done element-wise on whole matrices/vectors, should I still define it as a function or just put it in the calculations block? Is defining it as a function considered redundant? If it weren't possible to do it element-wise, I would need to use a for-loop and defining a function would be as obvious solution, but I'm not sure what to do in this case.

def assign_location(i, j):

    L_el = np.array([2*i-1, 2*i, 2*j-1, 2*j])
    
    return L_el

Hanh

146 Posts

April 30 2017 15:47 GMT

#17559

On May 01 2017 00:25 travis wrote:
this is pretty close to the implementation I ended up using for my program

though I am having a ton of trouble with soemthing like


(ls | more) | more

What's the syntax that it needs to support?

Deleted User 3420

24492 Posts

April 30 2017 16:05 GMT

#17560

I don't understand the question

the syntax of the commands themselves are the same as in any shell

the syntax in the tree is a tree of structures that have various fields, the 2 most important fields being an enum that can be "pipe" "subshell" or "and", or an array of strings that represents a commmand with it's arguments

so I traverse the tree and if it is a conjunction I take the proper action (fork off a subshell, open a pipe, or do nothing if its and). if I am forking off a subshell the child sends the stuff that is under on the subshell back to the traversal function while the parent waits for the traversal/child to finish.

if it's a command I fork and the parent waits for the child to exec the command.

so I have 4 primary functions in play here

function 1 traverses the tree

function 2 reads what type of node it is and takes an appropriate action

function 3 reads commands that are found by function 2, forks and execs in the child. also handles where read and write go to if my "piping" global int is greater than 0

function 4 opens a subshell if found by function one. it forks, in the child sets write to the pipe if a pipe is open, and closes read if a pipe is open, then sets "piping" to 0. then the child sends the node below it to the tree for traversal.

This works for all cases except for where we have to open a new pipe in a subshell. that's because in the subshell I had set "piping" to 0. So when we open a new pipe, we act like there is only one pipe open, so it opens the pipe, passes it to the second command in the subshell, closes the pipe). But that sends the read to standard out rather than to the pipe that is outside of our subshell.

Hopefully you can read this stuff and stay sane.

Prev 1 876 877 878 879 880 1032 Next

Please or register to reply.

The Big Programming Thread - Page 878

Completed

Ongoing

Upcoming