The Big Programming Thread - Page 877


Thread Rules
1. This is not a "do my homework for me" thread. If you have specific questions, ask, but don't post an assignment or homework problem and expect an exact solution.
2. No recruiting for your cockamamie projects (you won't replace facebook with 3 dudes you found on the internet and $20)
3. If you can't articulate why a language is bad, don't start slinging shit about it. Just remember that nothing is worse than making CSS IE6 compatible.
4. Use [code] tags to format code blocks.

Manit0u

Poland, 17243 Posts

April 26 2017 03:55 GMT

#17521

This definitely gets me closer. I'll still need to figure out some more stuff (obviously, the system is a bit more complex than that, with different queues, pipelines, engines, multiple statuses like rejected, blocked etc. and I need to scope it out properly).

Acrofales

Spain, 17970 Posts

April 26 2017 10:34 GMT

#17522

On April 26 2017 12:55 Manit0u wrote:
This definitely gets me closer. I'll still need to figure out some more stuff (obviously, the system is a bit more complex than that, with different queues, pipelines, engines, multiple statuses like rejected, blocked etc. and I need to scope it out properly).

If that's the case, is it really worth doing in SQL?

solidbebe

Netherlands, 4921 Posts

April 26 2017 12:59 GMT

#17523

So I'm doing a project where we are making a system that documents and visualizes system architecture and information flow at a company (which uses 3000+ different IT systems). I'm looking for inspiration but I'm finding it very difficult to find examples of projects similar to this online. I just get flooded in UML/sequence diagrams which is obviously not helpful. Does anyone know of anything?

The idea for now is to use javascript and the D3 library to make the visualization. D3 seems very flexible and running it in the browser means we don't have to worry about different versions for OS and updating is easy. I have pretty much no experience with this kind of project so any suggestions are welcome.

Manit0u

Poland, 17243 Posts

April 26 2017 14:26 GMT

#17524

On April 26 2017 21:59 solidbebe wrote:
So I'm doing a project where we are making a system that documents and visualizes system architecture and information flow at a company (which uses 3000+ different IT systems). I'm looking for inspiration but I'm finding it very difficult to find examples of projects similar to this online. I just get flooded in UML/sequence diagrams which is obviously not helpful. Does anyone know of anything?

The idea for now is to use javascript and the D3 library to make the visualization. D3 seems very flexible and running it in the browser means we don't have to worry about different versions for OS and updating is easy. I have pretty much no experience with this kind of project so any suggestions are welcome.

I'd suggest using a neo4j database for that (awesome for graphs), and then you could check the facebook graph api for how they generate their output.

Djagulingu

Germany, 3605 Posts

April 26 2017 14:28 GMT

#17525

3000+? Is that real? That's 50 times as many IT systems as one of the telecommunications giants in my country uses.

About the project: If you are going to draw anything on the web, you can't ever go wrong with D3. We have done something similar by having JSON objects as blueprints and drawing with D3 according to the blueprint. In such a structure, SVG is much more efficient than Canvas, so D3 is the way to go. Also, don't ever think about server side rendering with this shit.

spinesheath

Germany, 8679 Posts

April 26 2017 16:35 GMT

#17526

Maybe gephi.org is of any help?

Deleted User 3420

24492 Posts

April 26 2017 17:39 GMT

#17527

I had a combinatoric question on an exam that blindsided me.

A class has 5 TAs and 2 teachers.
A team of 4 of them is assigned (at random) to grade papers.
What are the odds that the 2 teachers are assigned to grade papers?

The question came with a hint that for this one they wanted a ratio of natural numbers (a fraction), and that the answer should be found by using fractions and canceling most of them out. The hint confused the shit out of me.

The only way I see to solve the problem, I THINK, is to do all the combinations where you hit the 2 teachers in 4 tries

which reasonably seems like I should use the number of permutations with the 2 teachers in 4 letters out of the total number of permutations. but calculating this uses factorials which they said not to do

I don't really know how to do it otherwise.

something like multiplying out every actual chance of hitting the teacher each try for every combination of 4 tries, which seems fucking ridiculous

so what am I not getting about that, it makes me mad. we never did anything like that in class

WolfintheSheep

Canada, 14127 Posts

April 26 2017 18:01 GMT

#17528

Isn't that just like...basic combinatorics?

Total combinations with 2 Teachers, divided by total combinations. So like (5C2 + 2C2) / (7C4) or something.

Deleted User 3420

24492 Posts

April 26 2017 18:27 GMT

#17529

denominator would be 7c4, yes

and I think that 5c2 makes sense for a numerator (you wouldn't add 2c2, you would multiply it.. I think)

but the stupid hint told us to use natural number fractions (meaning don't use exponents or factorials or n choose r... it literally said that). so i shouldn't be working in this way at all. and more specifically it told us to cancel fractions out. I think he really fucked me over with his "hint" is what happened

when I did the test I did solve towards 5c2/7c4 at first until i re-read it and saw his "hint"

WolfintheSheep

Canada, 14127 Posts

April 26 2017 18:39 GMT

#17530

Yes, so...

((5*4)/2) / ((7*6*5*4)/(4*3*2))

From there you can separate things out and cancel.
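If you want to check the cancelling mechanically, here is a minimal Python sketch. It assumes the numerator is 5c2 * 2c2 (which is just 5c2, since 2c2 = 1); the variable names are made up for illustration.

```python
from fractions import Fraction

# Numerator: ways to pick the 2 remaining graders from the 5 TAs (5c2),
# times the single way to pick both teachers (2c2 = 1).
numerator = Fraction(5 * 4, 2 * 1) * 1

# Denominator: ways to pick any 4 of the 7 people (7c4).
denominator = Fraction(7 * 6 * 5 * 4, 4 * 3 * 2 * 1)

# Fraction reduces to lowest terms automatically, which is the "cancelling" step.
probability = numerator / denominator
print(numerator, denominator, probability)  # 10 35 2/7
```

So the whole thing collapses to 10/35 = 2/7 without ever writing a factorial sign.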

Acrofales

Spain, 17970 Posts

April 26 2017 18:46 GMT

#17531

On April 27 2017 03:27 travis wrote:
denominator would be 7c4, yes

and I think that 5c2 makes sense for a numerator (you wouldn't add 2c2, you would multiply it.. I think)

but the stupid hint told us to use natural number fractions (meaning don't use exponents or factorials or n choose r... it literally said that). so i shouldn't be working in this way at all. and more specifically it told us to cancel fractions out. I think he really fucked me over with his "hint" is what happened

when I did the test I did solve towards 5c2/7c4 at first until i re-read it and saw his "hint"

So write out your factorials and do the math, lol. Lots will cancel out and you're left with a normal fraction. You know, write out 5c2 as 5! / (2! * 3!), etc.

Deleted User 3420

24492 Posts

April 26 2017 18:50 GMT

#17532

denom should be over 3*2*1 not 4*3*2

but yeah I see where you are going

I could have gotten all the way here just fine (well I would have left it in the form of n-choose r's). But the hint pretty much told us not to use n-choose r at all

so yeah I get 0 points even though I could have solved this fine and I am going to be pretty mad

On April 27 2017 03:46 Acrofales wrote:
So write out your factorials and do the math, lol. Lots will cancel out and you're left with a normal fraction. You know, write out 5c2 as 5! / (2! * 3!), etc.

I'll quote the "hint" for you when I get my exam back and you will understand my complaint

Acrofales

Spain, 17970 Posts

April 26 2017 18:58 GMT

#17533

Well, you can go from first principles. It's where combinatorics comes from. Will just take longer. The probability of picking both teachers is 1 - chance of picking fewer than two teachers = 1 - chance of picking 0 teachers - chance of picking 1 teacher = 1 - 5/7*4/6*3/5*2/4 - (chance of picking 1 teacher: there are 4 ways of doing this and I don't feel like writing it out).

Then math the whole thing out.
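For anyone who wants the written-out version, here is a rough Python sketch of that first-principles route, with a brute-force enumeration as a sanity check. The labels T1, T2, A1..A5 are just placeholders for the 2 teachers and 5 TAs.

```python
from fractions import Fraction
from itertools import combinations

# Chance of drawing 0 teachers: four TAs in a row, without replacement.
p_zero = Fraction(5, 7) * Fraction(4, 6) * Fraction(3, 5) * Fraction(2, 4)

# Chance of drawing exactly 1 teacher: the teacher can land in any of the
# 4 draws, and every ordering has the same product 2*5*4*3 / (7*6*5*4).
p_one = 4 * Fraction(2, 7) * Fraction(5, 6) * Fraction(4, 5) * Fraction(3, 4)

# Whatever is left over is the chance of drawing both teachers.
p_both = 1 - p_zero - p_one

# Brute-force check: enumerate every 4-person team out of the 7 people.
people = ["T1", "T2", "A1", "A2", "A3", "A4", "A5"]
teams = list(combinations(people, 4))
hits = sum(1 for team in teams if "T1" in team and "T2" in team)

print(p_zero, p_one, p_both)       # 1/7 4/7 2/7
print(Fraction(hits, len(teams)))  # 2/7
```

Both routes land on 2/7, the same value as 5c2 / 7c4.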

Deleted User 3420

24492 Posts

April 26 2017 19:01 GMT

#17534

yeah that's pretty much what I tried and failed to do, because I was looking for a way to do it in which terms would start canceling out and they weren't.

I mean.. it's my failure though either way in that I didn't just have more confidence in that I still was getting the correct result regardless. But it sucks to lose points when you know material.

Zocat

Germany, 2229 Posts

April 26 2017 20:13 GMT

#17535

On April 27 2017 03:27 travis wrote:
denominator would be 7c4, yes

and I think that 5c2 makes sense for a numerator (you wouldn't add 2c2, you would multiply it.. I think)

You are correct, it would be *2c2. You have a Hypergeometric distribution.

You have learned the urn model, right? If not, do that; it will make all your combinatorics way easier. You break down your problem and find the correct urn model. Then look up what kind of distribution is associated with that model.
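As a sketch of what that looks like in practice, here is the hypergeometric probability as a small Python function (needs Python 3.8+ for `math.comb`). The function and parameter names are my own, not from any course material.

```python
from fractions import Fraction
from math import comb

def hypergeom_pmf(successes_in_pop, pop_size, draws, wanted):
    """P(exactly `wanted` successes in `draws` draws without replacement)."""
    failures_in_pop = pop_size - successes_in_pop
    return Fraction(
        comb(successes_in_pop, wanted) * comb(failures_in_pop, draws - wanted),
        comb(pop_size, draws),
    )

# Urn with 7 balls: 2 "teacher" balls and 5 "TA" balls; draw 4 of them.
for k in range(3):
    print(k, hypergeom_pmf(2, 7, 4, k))
# 0 1/7
# 1 4/7
# 2 2/7
```

The k = 2 row is the exam question: (2c2 * 5c2) / 7c4 = 2/7.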

Manit0u

Poland, 17243 Posts

April 27 2017 07:31 GMT

#17536

Back to SQL again... With the problem I posted some time back.

We have: A has many B, C has many B.

Now, we do filtering on C matching A where C has no B or CB are a subset of AB.

The thing is, it's pretty slow as soon as you hit about a million records in the db (1.5s query), which is no good for us.

Any ideas how can you optimize it in postgres?

Right now we have join tables that are being aggregated into views (stale data is unacceptable since it's used for live time pooling and assigning C to A with race conditions and all that jazz).

Edit: I'm seriously considering dropping the join tables and simply dumping all the related ids into an uuid[] column in respective tables.

enigmaticcam

United States, 280 Posts

April 27 2017 16:59 GMT

#17537

On April 27 2017 16:31 Manit0u wrote:
Back to SQL again... With the problem I posted some time back.

We have: A has many B, C has many B.

Now, we do filtering on C matching A where C has no B or CB are a subset of AB.

The thing is, it's pretty slow as soon as you hit about a million records in the db (1.5s query), which is no good for us.

Any ideas how can you optimize it in postgres?

Right now we have join tables that are being aggregated into views (stale data is unacceptable since it's used for live time pooling and assigning C to A with race conditions and all that jazz).

Edit: I'm seriously considering dropping the join tables and simply dumping all the related ids into an uuid[] column in respective tables.

If I remember correctly, SQL is a bit slow on OR statements. Maybe do the "Has no B" and the "CB are a subset of AB" in separate queries and union them.

Also, in my experience I've found that when I have to do complex joins or filters between large datasets, sometimes it helps to do as much filtering as you can first and insert into temp tables with indexes, and then do joins from there. Not sure if that will help you here.
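To make the OR-versus-UNION idea concrete, here is a small sketch using Python's built-in sqlite3 as a stand-in for postgres. The table and column names (a_b, c, c_b) are invented, and it only shows that the two forms return the same rows; it says nothing about which one is faster on the real data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a_b (a_id INTEGER, b_id INTEGER);
    CREATE TABLE c   (id INTEGER PRIMARY KEY);
    CREATE TABLE c_b (c_id INTEGER, b_id INTEGER);
    INSERT INTO a_b VALUES (1, 10), (1, 20), (1, 30);
    INSERT INTO c   VALUES (1), (2), (3);
    -- c=1 has no B, c=2 is a subset of a=1, c=3 is not (40 is missing)
    INSERT INTO c_b VALUES (2, 10), (2, 20), (3, 10), (3, 40);
""")

# "CB is a subset of AB": no B of this C falls outside the B set of A.
SUBSET = """
    NOT EXISTS (
        SELECT 1 FROM c_b
        WHERE c_b.c_id = c.id
          AND c_b.b_id NOT IN (SELECT b_id FROM a_b WHERE a_id = :a)
    )
"""
HAS_NO_B = "NOT EXISTS (SELECT 1 FROM c_b WHERE c_b.c_id = c.id)"

# Single query with OR versus two queries glued together with UNION.
or_query = f"SELECT id FROM c WHERE {HAS_NO_B} OR {SUBSET} ORDER BY id"
union_query = f"""
    SELECT id FROM c WHERE {HAS_NO_B}
    UNION
    SELECT id FROM c WHERE {SUBSET}
    ORDER BY id
"""

or_rows = [row[0] for row in conn.execute(or_query, {"a": 1})]
union_rows = [row[0] for row in conn.execute(union_query, {"a": 1})]
print(or_rows, union_rows)  # [1, 2] [1, 2]
```

One thing the sketch makes visible: written as a NOT EXISTS, the subset check already matches a C with no B (there is nothing that could fall outside AB), so depending on how the real query is phrased the OR branch may be redundant.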

enigmaticcam

United States, 280 Posts

April 27 2017 17:20 GMT

#17538

If anyone here is familiar with Microsoft SQL Server, I have an optimization question too:

Why does this script only take 1 second to run...


select
  PSID
  , Product
  , [Geography]
  , EffectiveMonth
into #temp
from vistaar.VistaarExtractStagingArchive
where JobRequestId = @jobRequestId

select distinct
  PSID
  , Product as BL
  , b.BrandLabelName as BLName
  , [Geography] as Market
  , EffectiveMonth
from #temp a
left join (
  select distinct BrandLabelName, BrandLabelCode
  from Vistaar.MasterProduct
) b on b.BrandLabelCode = a.Product

drop table #temp

...and yet this script takes about 10 minutes.


select distinct
  PSID
  , Product as BL
  , b.BrandLabelName as BLName
  , [Geography] as Market
  , EffectiveMonth
from (
  select 
    PSID
    , Product
    , [Geography]
    , EffectiveMonth
  from vistaar.VistaarExtractStagingArchive
  where JobRequestId = @jobRequestId
) a
left join (
  select distinct BrandLabelName, BrandLabelCode
  from Vistaar.MasterProduct
) b on b.BrandLabelCode = a.Product

The table size is about 22 million rows. So I know the issue is that one is doing a distinct on the entire table, and one is doing a distinct only on the temp table. But I would think the second one would first filter for the smaller subset before performing the distinct, but clearly it's not. Is there a reason for that?
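For reference, a toy reproduction of the two shapes, using Python's built-in sqlite3 rather than SQL Server. The tables and columns are simplified, invented stand-ins, and a regular temp table replaces `select ... into #temp`. It only demonstrates that the two scripts are logically equivalent, so any difference in run time would have to come from the execution plan the engine picks, not from the result being computed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE staging (job_id INTEGER, psid INTEGER, product TEXT,
                          geography TEXT, effective_month TEXT);
    CREATE TABLE master_product (brand_label_code TEXT, brand_label_name TEXT);
    INSERT INTO staging VALUES
        (1, 100, 'P1', 'US', '2017-04'),
        (1, 100, 'P1', 'US', '2017-04'),
        (1, 101, 'P2', 'CA', '2017-04'),
        (2, 102, 'P1', 'US', '2017-05');
    INSERT INTO master_product VALUES
        ('P1', 'Brand One'), ('P1', 'Brand One'), ('P2', 'Brand Two');
    CREATE TEMP TABLE filtered (psid, product, geography, effective_month);
""")

COLUMNS = "SELECT DISTINCT psid, product, b.brand_label_name, geography, effective_month "
JOIN_TAIL = """
    LEFT JOIN (
        SELECT DISTINCT brand_label_name, brand_label_code FROM master_product
    ) b ON b.brand_label_code = a.product
    ORDER BY psid
"""

# Variant 1: materialize the filtered rows first, then join from the temp table.
conn.execute(
    "INSERT INTO filtered "
    "SELECT psid, product, geography, effective_month FROM staging WHERE job_id = ?",
    (1,),
)
via_temp = conn.execute(COLUMNS + "FROM filtered a" + JOIN_TAIL).fetchall()

# Variant 2: the same filter as a derived table inside a single statement.
via_subquery = conn.execute(
    COLUMNS
    + "FROM (SELECT psid, product, geography, effective_month "
    + "FROM staging WHERE job_id = ?) a"
    + JOIN_TAIL,
    (1,),
).fetchall()

print(via_temp == via_subquery)
print(via_temp)
```

On this toy data both variants return the same two rows. Comparing the actual execution plans of the two real scripts would be the way to see where the 10 minutes go.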

Thaniri

1264 Posts

April 27 2017 17:44 GMT

#17539

It looks to me as though the second query is a correlated subquery. The inner query is re-executed for every product found.

If my understanding is right, you can have a worst case of 22m * 22m results.

The first script doesn't have a subquery at all, so as a whole it never has to be re-executed.

Mind you, I'm not a professional. This is me just regurgitating what I learned from one class where we happened to learn about these optimizations.

edit: You could make it even faster without using a temp table, but perhaps for safety or concurrency reasons you might want to still use it.

Manit0u

Poland, 17243 Posts

April 27 2017 19:18 GMT

#17540

On April 28 2017 01:59 enigmaticcam wrote:

We've removed the join columns entirely now. Switched to just storing ids in an array as a table column. It's super effective but required us to implement some back-end mechanisms for syncing stuff (not that hard since we're only interested in a scenario when a record whose id should go into the array is deleted which should be a rare case).
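A rough illustration of the matching rule once the ids live in an array column, using Python sets in place of uuid[] values. The row names and ids are invented. In postgres the analogous array operator would be `<@` ("is contained by"); as far as I know an empty array counts as contained by any array, but that is worth verifying against the docs before relying on it.

```python
# Hypothetical rows: each record carries the ids of its related B rows,
# mirroring a uuid[] column (short strings stand in for real UUIDs).
a_b_ids = {"b1", "b2", "b3"}
c_rows = {
    "c1": set(),         # C with no B at all
    "c2": {"b1", "b3"},  # all of C's B are also A's B
    "c3": {"b1", "b9"},  # b9 is not one of A's B
}

# set() <= anything is True, so "C has no B" needs no separate OR branch.
matching = sorted(c_id for c_id, b_ids in c_rows.items() if b_ids <= a_b_ids)
print(matching)  # ['c1', 'c2']
```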
