The Big Programming Thread - Page 985
Thread Rules
1. This is not a "do my homework for me" thread. If you have specific questions, ask, but don't post an assignment or homework problem and expect an exact solution.
2. No recruiting for your cockamamie projects (you won't replace Facebook with 3 dudes you found on the internet and $20).
3. If you can't articulate why a language is bad, don't start slinging shit about it. Just remember that nothing is worse than making CSS IE6 compatible.
4. Use [code] tags to format code blocks.
Deleted User 3420
24492 Posts
For those working in ML, here's a question out of personal interest: for larger employers, is gathering/cleaning/organizing data to be used for ML generally done by a different person than whoever creates the ML models themselves? Is there a typical sort of organization regarding who does what?
Excludos
Norway, 8196 Posts
On December 27 2018 23:38 travis wrote: for those working in ML, here's a question out of personal interest: For larger employers, is gathering/cleaning/organizing data to be used for ML generally done by a different person than whoever creates the ML models themselves? Is there a typical sort of organization in regards to who does what?
From my experience, cleaning data to be used for ML is often a full-time job for a team of people who have little to no knowledge of what the people working on the ML models do. But this is anecdotal and could vary from company to company, I guess. Generally, though, I would assume that for ML to be useful you need such vast amounts of data that taking care of it is a full-time job.
mantequilla
Turkey, 779 Posts
On December 27 2018 18:34 Manit0u wrote: I guess I need to refresh my knowledge of the Java world. Got a small project for my friend which could be a good way to get back in touch with Spring. Is Hibernate still a thing?
Yes, Hibernate is still around, with JPA annotations on entities instead of Hibernate ones. You can also use MongoDB and annotate entities with Spring Data annotations instead. Check out Spring Boot (hassle-free Spring) and Liquibase (DB migrations). Also look at JHipster; I like that project, they are always on the bleeding edge.
Manit0u
Poland, 17450 Posts
On December 28 2018 02:47 mantequilla wrote: Yes, Hibernate is still around, with JPA annotations on entities instead of Hibernate ones. You can also use MongoDB and annotate entities with Spring Data annotations instead. Check out Spring Boot (hassle-free Spring) and Liquibase (DB migrations). Also look at JHipster; I like that project, they are always on the bleeding edge.
I created a project with Spring Boot, Hibernate, PostgreSQL (I'll need a relational DB for this) and Flyway for migrations. We'll see how it goes. Thankfully I've got a senior Java dev next to me at the office who's worked for big companies like 2 Sigma, so I should get it to work.
mantequilla
Turkey, 779 Posts
There are N peers, p1, p2, ..., pn. One peer wants to send a message to all other peers. If every peer receives the message, everyone accepts the message. If someone can't get the message, no one accepts it. I don't care about being optimal or fast; just delivering the message to everyone is fine. For simplicity, let's assume there are only two peers and we are delivering only a single packet, so there's no packet-ordering problem. So there are two peers, p1 and p2, and p1 wants to send a message to everyone (well, just p2 in this example). I build a dictionary called "global state". It holds everyone's point of view of who received the message. At the very start it looks like this, since no one has heard the message:
p1 wants to send a message. Since p1 is the initiator, it knows that it has heard the message (duh), so it marks this knowledge in the global state object:
p1 then sends the above state info to p2. It just keeps trying to send in a while loop, since it runs on UDP (course requirement). When p2 finally receives the message:
- p2 knows that p1 has heard the message, since p1 is the sender.
- p2 has now heard the message too.
So p2's global state info becomes:
Then p2 asks: who doesn't know that I heard the message? Obviously the answer is only p1. If there were more than 2 peers, there would be more peers who didn't know that p2 has heard the message. So p2 sends the above global state info to those peers, in this case p1. There is a loop that continually sends the message to everyone who hasn't heard it (*).
p1 receives the above message. It's coming from p2, so p2 must have heard the message. p1 records this in its state info (it just ORs its internal state with the incoming message).
Now p1 knows that everyone has heard the message, but p2 still doesn't know that p1 knows this. So p1 should somehow tell p2 "yes, I know that everyone heard the message, stop bugging me!" This is where my algorithm fails: I can't terminate it, because p2 can't be sure p1 got the message. If p1 sends an ack that says "I know everyone has heard the message", it can't be sure p2 has received the ack... It goes on and on without terminating...
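The steps above can be modelled in a few lines. This is a minimal sketch, assuming the "global state" is an N-by-N table where `state[viewer][peer]` is True when `viewer` knows that `peer` has heard the message, and that merging an incoming copy is the element-wise OR described in the post. The function names are hypothetical, and the UDP resend loop is left out: delivery is simulated by handing dictionaries from one peer to the other.

```python
# state[viewer][peer] is True when `viewer` knows that `peer` has heard the message.
State = dict[str, dict[str, bool]]


def empty_state(peers: list[str]) -> State:
    """At the very start nobody has heard the message."""
    return {viewer: {peer: False for peer in peers} for viewer in peers}


def merge(local: State, incoming: State) -> State:
    """Element-wise OR of two copies of the global state (returns a new table)."""
    return {
        viewer: {peer: local[viewer][peer] or incoming[viewer][peer] for peer in row}
        for viewer, row in local.items()
    }


def on_receive(me: str, sender: str, local: State, incoming: State) -> State:
    """Handle a copy of the global state arriving from `sender`."""
    state = merge(local, incoming)
    state[me][me] = True      # I have heard the message now.
    state[me][sender] = True  # The sender obviously heard it too.
    return state


def who_doesnt_know_i_heard(me: str, state: State) -> list[str]:
    """Peers that, as far as `me` can tell, don't yet know that `me` heard the message."""
    return [viewer for viewer in state if viewer != me and not state[viewer][me]]


if __name__ == "__main__":
    peers = ["p1", "p2"]
    s1 = empty_state(peers)
    s1["p1"]["p1"] = True  # p1 is the initiator, so it knows it heard the message

    s2 = on_receive("p2", "p1", empty_state(peers), s1)
    print(who_doesnt_know_i_heard("p2", s2))  # ['p1']

    s1 = on_receive("p1", "p2", s1, s2)
    print(all(s1["p1"].values()))  # True: p1 knows everyone heard
    print(s2["p1"]["p2"])          # False: in p2's copy, p1 doesn't know p2 heard
```

After the last step p1's copy is all True while p2's copy still has `state["p1"]["p2"] == False`, which is exactly the asymmetry the post runs into.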
WarSame
Canada, 1950 Posts
You would only need to go 1 layer deep of acks, right? If p1 sends to p2, p2 acks to p1 (so p1 knows p2 received it, but p2 doesn't know that p1 knows), then p1 acks the ack to p2 (so p2 knows that p1 knows), and you're good. You only need to receive the ack back for both of them to know that p2 received the message. In regular networking, the header tells the receiver how many frames there are, and the frames are numbered as they're sent so that they can be acked separately. If any are missed, they are resent. If any are not acked, they are resent. If every frame is acked, the message has been delivered. Similarly, you could give the message an ID and then, every time p1 receives an ack, send updates containing that ID and who has acked the message. When all receivers have acked, you can add p1 to the ackers to signify that all acks have been received. This does seem inefficient, but maybe that's the price we pay for reliability.
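A minimal sketch of the bookkeeping in the last part of that post: per message ID, the sender keeps the set of peers that have acked, resends to whoever hasn't, pushes an update with the ID and the current ackers on every ack, and adds itself once every receiver has acked. The class names, packet layout and `send` callback are assumptions standing in for the real UDP code, and resending is triggered by calling `resend_pending` (in practice you'd call it from a timer).

```python
from dataclasses import dataclass, field
from typing import Callable

Packet = dict  # hypothetical wire format, e.g. serialised to JSON before going out over UDP


@dataclass
class OutgoingMessage:
    msg_id: int
    payload: str
    receivers: set[str]
    ackers: set[str] = field(default_factory=set)

    def all_acked(self) -> bool:
        return self.receivers <= self.ackers


class Sender:
    def __init__(self, name: str, send: Callable[[str, Packet], None]):
        self.name = name
        self.send = send  # stand-in for the real (unreliable) transport
        self.outbox: dict[int, OutgoingMessage] = {}

    def broadcast(self, msg_id: int, payload: str, receivers: set[str]) -> None:
        self.outbox[msg_id] = OutgoingMessage(msg_id, payload, set(receivers))
        self.resend_pending(msg_id)

    def resend_pending(self, msg_id: int) -> None:
        """(Re)send the message to every receiver that hasn't acked yet."""
        msg = self.outbox[msg_id]
        for peer in sorted(msg.receivers - msg.ackers):
            self.send(peer, {"id": msg_id, "payload": msg.payload})

    def on_ack(self, msg_id: int, peer: str) -> None:
        msg = self.outbox[msg_id]
        msg.ackers.add(peer)
        if msg.all_acked():
            # Every receiver has acked: add ourselves to signal "all acks received".
            msg.ackers.add(self.name)
        # Tell every receiver who has acked so far.
        for receiver in sorted(msg.receivers):
            self.send(receiver, {"id": msg_id, "ackers": sorted(msg.ackers)})


if __name__ == "__main__":
    sent = []
    p1 = Sender("p1", lambda peer, packet: sent.append((peer, packet)))
    p1.broadcast(7, "hello", {"p2", "p3"})
    p1.on_ack(7, "p2")
    p1.resend_pending(7)  # only p3 is retried
    p1.on_ack(7, "p3")
    print(sent[-1])  # ('p3', {'id': 7, 'ackers': ['p1', 'p2', 'p3']})
```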
mantequilla
Turkey, 779 Posts
On January 01 2019 05:57 WarSame wrote: You would only need to go 1 layer deep of acks, right? If p1 sends to p2, p2 acks to p1 (so p1 knows p2 received it, but p2 doesn't know that p1 knows), p1 acks the ack to p2 (p2 knows that p1 knows), then you're good. (...)
How can p1 be sure p2 got the ack to its ack? My brain doesn't seem to get this kind of stuff, sorry.
In a TCP-like scenario: A sends a message to B, and A wants to be sure B got the message.
My scenario is: the above, plus B also wants to be sure that A knows that B got the message.
WolfintheSheep
Canada, 14127 Posts
On January 01 2019 06:07 mantequilla wrote: How can p1 be sure p2 got the ack to its ack? My brain doesn't seem to get this kind of stuff, sorry. In a TCP-like scenario: A sends a message to B, and A wants to be sure B got the message. My scenario is: the above, plus B also wants to be sure that A knows that B got the message.
When P2 sends the ACK to P1, it will expect a response (an ACK for the ACK). If it doesn't receive one, it will resend its original ACK. If P1 only receives 1 ACK, then it can assume both its message and its ACK were received.
You could run into scenarios where P2 is actually completely shut down and thus won't resend any messages, but then you should also be tracking which peers are actually still alive.
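A small simulation of that exchange from P2's side. It's a sketch under assumptions: the lossy UDP path is replaced by a hypothetical `ScriptedLink` whose deliveries and drops are fixed in advance (so runs are repeatable), P1 is assumed to answer every ACK it sees with an ACK-for-ACK, and `max_tries` stands in for deciding that P1 is no longer alive. It also counts how many ACKs reached P1, which is the number the post says P1 looks at.

```python
from collections import deque
from typing import Iterable, Optional


class ScriptedLink:
    """Stand-in for an unreliable UDP path: each packet is delivered or dropped per a fixed script."""

    def __init__(self, script: Iterable[bool]):
        self.script = deque(script)

    def deliver(self) -> bool:
        # Once the script runs out, the link is assumed to behave.
        return self.script.popleft() if self.script else True


def ack_exchange(p2_to_p1: ScriptedLink, p1_to_p2: ScriptedLink,
                 max_tries: int = 10) -> tuple[Optional[int], int]:
    """P2 resends its ACK until P1's ACK-for-ACK arrives.

    Returns (attempt on which P2 got the ACK-for-ACK, or None if it gave up,
             number of ACKs that reached P1).
    """
    acks_seen_by_p1 = 0
    for attempt in range(1, max_tries + 1):
        if not p2_to_p1.deliver():
            continue  # ACK lost: P2 times out and resends
        acks_seen_by_p1 += 1
        if p1_to_p2.deliver():  # P1 answers every ACK it sees with an ACK-for-ACK
            return attempt, acks_seen_by_p1
    return None, acks_seen_by_p1  # give up: P1 (or the link) may be dead


if __name__ == "__main__":
    # First ACK lost; second ACK arrives but its ACK-for-ACK is lost; third round succeeds.
    print(ack_exchange(ScriptedLink([False, True, True]), ScriptedLink([False, True])))  # (3, 2)
    print(ack_exchange(ScriptedLink([False] * 10), ScriptedLink([])))  # (None, 0)
```

In the first run P1 sees two ACKs, so it can tell its first ACK-for-ACK went missing; that the last one arrived is something P1 still only infers from P2 going quiet.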
Manit0u
Poland, 17450 Posts
Why make it overly complex? What you need is a control object, and all the peers should communicate through it. This way you only have one place in your system where you have to track the information, and you don't have to share state between peers (they don't have to know about other peers, how many there are, etc.).
Let's assume you have n peers p and one control service c. p1 sends a message to c, and c broadcasts it to all the other peers. Each peer that got the message acknowledges it to c. This makes c the single place where you can check the state of each peer for each message. It can retry etc. If your peers need to know whether everyone received their message, they can simply ask c about it.
The best way to introduce it would be something like this: p1 sends to c with a set timeout. c rebroadcasts to the other peers and waits for their acks. When all peers have sent their acks to c, it sends an ack to p1. If everything happened within the timeout limit, it is a success; if not, you mark it as a failure. This also gives you more flexibility, since you can put retry logic etc. either in c or in each peer (you can even put it everywhere and use c's by default; if p provides its own, it overrides c's). You can also introduce different logic: if you don't want it to be timeout-based, you can make peers periodically ask c whether their message was delivered to everyone.
This way you avoid this circle of hell where all the peers know about each other and have to constantly check each other (this gets really inefficient and stupid when you get to higher numbers of peers; also, whenever you introduce a new peer, you'd need to update all of them with this information).
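A minimal in-process sketch of the control-service idea: `c` rebroadcasts, collects acks, acks the originator only if everything came back before the deadline, can retry missing peers itself, and answers status queries. Assumptions: the class and method names and the packet tuples are made up, network sends are modelled as an `outbox` list, and the clock is injectable so the example is deterministic.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Delivery:
    origin: str
    payload: str
    recipients: set[str]
    deadline: float
    acked: set[str] = field(default_factory=set)
    completed_at: Optional[float] = None


class Coordinator:
    """Hypothetical control service `c`: fans messages out and tracks acks per peer."""

    def __init__(self, peers: list[str], clock: Callable[[], float] = time.monotonic):
        self.peers = set(peers)
        self.clock = clock
        self.deliveries: dict[int, Delivery] = {}
        self.outbox: list[tuple[str, tuple]] = []  # (peer, packet) pairs waiting to be sent

    def submit(self, msg_id: int, origin: str, payload: str, timeout: float) -> None:
        """p1 hands a message to c with a timeout; c rebroadcasts it to everyone else."""
        recipients = self.peers - {origin}
        self.deliveries[msg_id] = Delivery(origin, payload, recipients, self.clock() + timeout)
        for peer in sorted(recipients):
            self.outbox.append((peer, ("MSG", msg_id, payload)))

    def resend_missing(self, msg_id: int) -> None:
        """Retry logic living in c: resend to peers that haven't acked yet."""
        d = self.deliveries[msg_id]
        for peer in sorted(d.recipients - d.acked):
            self.outbox.append((peer, ("MSG", msg_id, d.payload)))

    def on_ack(self, msg_id: int, peer: str) -> None:
        d = self.deliveries[msg_id]
        d.acked.add(peer)
        if d.completed_at is None and d.acked >= d.recipients:
            d.completed_at = self.clock()
            if d.completed_at <= d.deadline:
                self.outbox.append((d.origin, ("ACK", msg_id)))  # tell p1 it's done

    def status(self, msg_id: int) -> str:
        """What a peer gets back when it asks c whether its message reached everyone."""
        d = self.deliveries[msg_id]
        if d.completed_at is not None and d.completed_at <= d.deadline:
            return "delivered"
        return "failed" if self.clock() > d.deadline else "pending"


if __name__ == "__main__":
    now = [0.0]  # fake clock so the example is deterministic
    c = Coordinator(["p1", "p2", "p3"], clock=lambda: now[0])
    c.submit(1, "p1", "hello", timeout=5.0)
    c.on_ack(1, "p2")
    print(c.status(1))   # pending
    c.on_ack(1, "p3")
    print(c.status(1))   # delivered
    print(c.outbox[-1])  # ('p1', ('ACK', 1))
```

The polling variant from the post is just peers calling `status` periodically instead of waiting for the final `ACK`.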
WolfintheSheep
Canada, 14127 Posts
I'm kind of guessing by the use of the word "peer" that the intent is a decentralized system. But if not, then w/e.
Manit0u
Poland, 17450 Posts
On January 01 2019 07:45 WolfintheSheep wrote: I'm kind of guessing by the use of the word "peer" that the intent is a decentralized system. But if not, then w/e.
Well, even in the P2P world you still have trackers and whatnot. The problem at hand was multicast, so I assumed it should work more like message queues with a fan-out approach.
mantequilla
Turkey, 779 Posts
On January 01 2019 07:42 Manit0u wrote: Why making it overly complex? What you need is a control object and all the peers should communicate through it. (...)
It's a distributed systems course project; it must be a P2P architecture where all peers are equal, not a centralized server :/ I don't know if I can fit this into the project description, though. Maybe there's a hole in the definition that would allow a centralized server.
Lmui
Canada, 6216 Posts
Why do you need to know the state from every other peer's point of view before accepting the message? You should only need to know that all peers have received your message, from each individual peer's point of view, before the message can be accepted. That reduces the global state on each peer from N^2 entries to just N.
I'm assuming a message is sent to a random peer that hasn't been messaged yet every S seconds (where S is a random number, 1 < S < 10), since performance doesn't seem to be a requirement.
The initial states if you have 3 peers, with peer 1 receiving the initial message:
global_state_p1 = { p1: True, p2: False, p3: False }
global_state_p2 = { p1: False, p2: False, p3: False }
global_state_p3 = { p1: False, p2: False, p3: False }
It first sends a message to P2 (how you do reliability is up to you) with the message and global_state_p1:
global_state_p1 = { p1: True, p2: True, p3: False }
global_state_p2 = { p1: True, p2: True, p3: False }
global_state_p3 = { p1: False, p2: False, p3: False }
P1 (or P2) messages P3 after knowing that P2 got the message, and the states are now:
global_state_p1 = { p1: True, p2: True, p3: True }
global_state_p2 = { p1: True, p2: True, p3: False }
global_state_p3 = { p1: True, p2: True, p3: True }
P1 and P3 accept the message because they know all recipients have received it. At some point, P2 will reach out to P3, since from its standpoint P3 has not yet received the message. P3 will return its global state to P2, at which point P2 will accept the message.
There's one primary limitation to this, but I'll leave it as an exercise for the reader.
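Lmui's N-entry version is easy to simulate. A minimal sketch, assuming each contact is a request/reply in which the target merges the sender's state (element-wise OR), marks itself as having heard, and sends its state back. The "every S seconds" timer is replaced by discrete rounds and reliability is ignored; the class and function names are hypothetical.

```python
import random


class GossipPeer:
    def __init__(self, name: str, peers: list[str]):
        self.name = name
        self.state = {p: False for p in peers}  # N entries per peer instead of N^2
        self.accepted = False

    def check_accept(self) -> None:
        # Accept once this peer knows that every peer has received the message.
        if all(self.state.values()):
            self.accepted = True


def merge_into(dst: dict[str, bool], src: dict[str, bool]) -> None:
    for peer, heard in src.items():
        dst[peer] = dst[peer] or heard


def exchange(sender: GossipPeer, target: GossipPeer) -> None:
    """`sender` delivers the message plus its state; `target` replies with its own state."""
    merge_into(target.state, sender.state)
    target.state[target.name] = True
    merge_into(sender.state, target.state)
    sender.check_accept()
    target.check_accept()


def gossip_round(peers: dict[str, GossipPeer], rng: random.Random) -> None:
    """Each peer holding the message contacts one random peer it still thinks is missing it."""
    for peer in list(peers.values()):
        peer.check_accept()
        if peer.accepted or not peer.state[peer.name]:
            continue
        missing = [name for name, heard in peer.state.items() if not heard]
        exchange(peer, peers[rng.choice(missing)])


if __name__ == "__main__":
    names = ["p1", "p2", "p3"]
    peers = {n: GossipPeer(n, names) for n in names}
    peers["p1"].state["p1"] = True  # p1 starts with the message
    exchange(peers["p1"], peers["p2"])
    exchange(peers["p1"], peers["p3"])
    print({n: p.accepted for n, p in peers.items()})  # {'p1': True, 'p2': False, 'p3': True}
    exchange(peers["p2"], peers["p3"])
    print(peers["p2"].accepted)  # True
```

The scripted calls reproduce the three snapshots in the post; `gossip_round` is the randomised version, called repeatedly until every peer has `accepted == True`.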
Manit0u
Poland, 17450 Posts
This way you pass the message "around the table", and you know where it came from, so you can later send it back to the originator as a final ack that everyone got it.
It still feels really inefficient, though. Personally, I'd send the message from the originator to all peers at the same time and wait for their acks. If only the originator needs to know it's been delivered to everyone, then that's it (peers don't broadcast if they're not the originator). If all peers need to know the status (whether all the others also got it), then there's a next step, which is the originator sending a final ack to all peers once it has all the acks it needs. You're sending more messages (because of the back-and-forth communication), but overall it is much more efficient, since it all happens at once, in parallel (which is what you really want from a distributed system).
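A sketch of the parallel, originator-driven approach, assuming a blocking, hypothetical `send(peer, text)` that returns True once that peer's ack arrives (any retry/timeout policy lives inside it). Threads stand in for "at the same time", and the `ALL-RECEIVED` prefix is a made-up marker for the final ack.

```python
import threading
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def broadcast_then_confirm(peers: list[str], send: Callable[[str, str], bool],
                           message: str) -> bool:
    """Fan out in parallel, wait for every ack, then send a final ack round to all peers."""
    if not peers:
        return True
    with ThreadPoolExecutor(max_workers=len(peers)) as pool:
        # Round 1: send to everyone at the same time and wait for all acks.
        acks = list(pool.map(lambda peer: send(peer, message), peers))
        if not all(acks):
            return False
        # Round 2 (only needed if every peer must know the outcome): final ack to all.
        list(pool.map(lambda peer: send(peer, "ALL-RECEIVED " + message), peers))
    return True


if __name__ == "__main__":
    lock = threading.Lock()
    log = []

    def fake_send(peer: str, text: str) -> bool:
        with lock:
            log.append((peer, text))
        return True

    print(broadcast_then_confirm(["p2", "p3", "p4"], fake_send, "hello"))  # True
    print(len(log))  # 6
```

If only the originator needs to know, round 2 can simply be dropped.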
mantequilla
Turkey, 779 Posts
Luckily, the project deadline has been extended by a few days. If I can work out an algorithm that works, I'll then need to plot some graphs, write some reports about it, etc.
Manit0u
Poland, 17450 Posts
Is this anything like what you are after?
Manit0u
Poland, 17450 Posts
http://250bpm.com/blog:17
http://250bpm.com/blog:5
http://250bpm.com/blog:20
And an interesting note on the (supposed) superiority of C over C++, from the perspective of maintaining software for 5 years:
http://250bpm.com/blog:4
http://250bpm.com/blog:8
Deleted User 3420
24492 Posts
Alright, how hard is the following to do (in Python)?
My HTTP knowledge is pretty low... and my knowledge of security/authentication is even lower. I have an ML model to predict the outcomes of events (online betting). I want to automate it. I use a betting website that is fairly complicated GUI-wise. I know when bets come up, though. I know which fields to fill in my browser; I imagine I can examine the HTTP traffic in Chrome or something?
So, at specific times, I want to run my model, see the results, and, if I like the results, place the bet automatically with a Python script rather than having to do it manually in my browser.
How hard is this? Can I learn how to do this and get it done in 1 day (tomorrow)? Keep in mind that it's with real money, so I also need to learn whatever is required to open a secure session with authentication and whatever.
Acrofales
Spain, 18132 Posts
On January 09 2019 10:20 travis wrote: alright, how hard is the following to do? (in python) My HTTP knowledge is pretty low... and my knowledge of security/authentication is even lower. (...)
Use Selenium and it should be fairly easy (maybe a bit more than a day if you're completely clueless about HTML and JS, but not very long). You can also check whether the website you're interested in has a REST API, in which case you can use that and skip the whole GUI part.
As for SSL, Python makes that extremely simple. With Selenium, the browser will take care of it. If you're connecting directly to an API, just use an HTTPS connection (http.client.HTTPSConnection) instead of an HTTP one.
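For the REST API route, the standard library is enough. This is a sketch only: the `/bets` endpoint path, the JSON fields and the bearer-token header are assumptions (a real betting API will document its own), and nothing goes over the network until `place_bet` is called. For `https://` URLs, `urllib.request.urlopen` runs on top of `http.client.HTTPSConnection` and verifies the server certificate by default.

```python
import json
import urllib.request


def build_bet_request(base_url: str, api_token: str, market_id: str,
                      selection: str, stake: float) -> urllib.request.Request:
    """Build (but don't send) a POST for a hypothetical betting REST API."""
    if not base_url.startswith("https://"):
        raise ValueError("refusing to send credentials over plain HTTP")
    body = json.dumps(
        {"market_id": market_id, "selection": selection, "stake": stake}
    ).encode("utf-8")
    return urllib.request.Request(
        base_url.rstrip("/") + "/bets",  # hypothetical endpoint path
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_token}",  # assumed auth scheme
        },
    )


def place_bet(request: urllib.request.Request, timeout: float = 10.0) -> dict:
    """Send the request; urlopen checks the server's TLS certificate for https URLs."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

In travis's flow, the script would run the model at the scheduled time and only call `place_bet(build_bet_request(...))` when the prediction looks good; keeping the token out of the source (for example in an environment variable) is the usual practice.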
Silvanel
Poland, 4733 Posts