Thread Rules
1. This is not a "do my homework for me" thread. If you have specific questions, ask, but don't post an assignment or homework problem and expect an exact solution.
2. No recruiting for your cockamamie projects (you won't replace Facebook with 3 dudes you found on the internet and $20).
3. If you can't articulate why a language is bad, don't start slinging shit about it. Just remember that nothing is worse than making CSS IE6 compatible.
4. Use [code] tags to format code blocks.
A more general question, but perhaps one others have run into.
Retraining as a 30-year-old; undergrad number 2 will hopefully lead to more gainful employment!
Anyway, there's an unfathomable amount of information at one's fingertips. How do you stay focused and on task instead of getting overloaded by sheer information?
If you intend to work in a very small team that produces working software used by non-techies, I recommend reading the rest of this post.
Narrow your focus by concentrating on building a piece of software that functions properly for a non-technical end user. If your software works properly and can be used effectively by the employees, you've cleared a big hurdle. Build some basic proof-of-concept lego blocks and go from there. When I work with a new programming language, the first thing I do is learn how to pump out very basic reports that I can prove are accurate: reports that other employees understand, using the terminology and "view of the database" they have. I then create very basic user interfaces that facilitate proper changes to the database. As primitive as this might be... it's a start you can build upon.
1/4 of my income comes from maintaining FoxPro apps built in the 1990s by a high-profile published author and software design trainer who suffered a traumatic health event a few years ago. As long as the employees can use the software effectively, I've done my job. How well my modifications adhere to the esoteric object-oriented design philosophies this famous author espouses doesn't mean much. Maybe at the time this famous author guy was building the most beautiful, elegant, amazing piece of coding wizardry. Who knows? Who cares? Of course, he has labelled me a "heretic" for saying such things. Who cares, man?
I'm trying something really odd and it's not working for me so I'd like some advice. First, I'll explain the approach since it might just be stupid from the start. Then, I'll detail the issues I'm facing in case it's not stupid and the issues can be avoided.
Approach:
I am implementing automated testing for Scala Spark jobs. It seems like most people don't do this. Overall, this is working well for me. I have taken CSV exports of our environment and am using that to count rows, etc.
Issues:
It's taking a long time to load all DFs into the environment as tables for a test that might need only ~3 DFs to work. I am therefore implementing an approach that lets each test load the DFs it needs, and loads all DFs if there are no tables when I go to read one.
My conceptual approach was to simply override the SparkSession .table and .sql functions to add a step that checks the catalog for existing tables and loads all DFs if there are none. SparkSession has a private constructor, so I can't extend it.
I then figured maybe I could create a shared Trait with the functions I need, but I can't make SparkSession extend the Trait. How can I get the two sessions to share their interface and most of their logic, but hijack those few functions I need, if the thing I'm extending has a private constructor?
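One way around the private constructor (a sketch, not tested against real Spark) is composition instead of inheritance: wrap the session and delegate the calls you care about, intercepting only .table and .sql. The `Session` class below is a hypothetical stand-in for the slice of SparkSession involved; in real code the wrapper would hold an actual SparkSession.

```scala
// Hypothetical stand-in for the slice of SparkSession we care about;
// in real code this would be an actual SparkSession instance.
class Session {
  def table(name: String): String = s"rows of $name"
  def sql(query: String): String  = s"result of $query"
}

// Composition instead of inheritance: delegate to the real session,
// but lazily load the source DFs the first time .table or .sql is hit.
class LazySession(underlying: Session, loadAll: () => Unit) {
  private var loaded = false
  private def ensureLoaded(): Unit =
    if (!loaded) { loadAll(); loaded = true }

  def table(name: String): String = { ensureLoaded(); underlying.table(name) }
  def sql(query: String): String  = { ensureLoaded(); underlying.sql(query) }
}
```

Tests would call the wrapper instead of the session directly; any other methods they need can be forwarded the same way, or reached through an exposed `underlying` accessor.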
i find it fascinating that object oriented design has been around for 35 years and the biggest payoff from proper OOD remains a struggle. I recall Mike Morhaime talking about his first forays into OOD in the early 1990s.
I maintain some 20+ year old applications and they are very much a mixed bag. Some of the stuff contains brilliant pieces of OOD and others.... are just garbage.
On November 12 2020 01:17 WarSame wrote: I'm trying something really odd and it's not working for me so I'd like some advice. First, I'll explain the approach since it might just be stupid from the start. Then, I'll detail the issues I'm facing in case it's not stupid and the issues can be avoided.
Approach:
I am implementing automated testing for Scala Spark jobs. It seems like most people don't do this. Overall, this is working well for me. I have taken CSV exports of our environment and am using that to count rows, etc.
Issues:
It's taking a long time to load all DFs into the environment as tables for a test that might need only ~3 DFs to work. I am therefore implementing an approach that lets each test load the DFs it needs, and loads all DFs if there are no tables when I go to read one.
My conceptual approach was to simply override the SparkSession .table and .sql functions to add a step that checks the catalog for existing tables and loads all DFs if there are none. SparkSession has a private constructor, so I can't extend it.
I then figured maybe I could create a shared Trait with the functions I need, but I can't make SparkSession extend the Trait. How can I get the two sessions to share their interface and most of their logic, but hijack those few functions I need, if the thing I'm extending has a private constructor?
This is in Scala.
I'm not super familiar with Scala (I've only made some quick changes to small microservices a few times in the past), but I'd like to challenge your approach on a higher, language-independent level.
Are you trying to read live data in the tests? That's what I gathered from your post. In my experience you want your tests completely abstracted from any real stuff, with a controlled environment. Usually you achieve that by mocking things and having sample data sets that never change. That way, when testing functionality that can be covered by unit tests alone (I think integration tests should only be done for mission-critical parts of the app), you don't have to bootstrap much or access external resources at all (I've mocked Redis, JSON responses, etc. so my tests stay isolated). This speeds things up a lot, but has the downside that your tests won't tell you about breaking changes in the mocked objects, so you have to judge how critical that is. I've run into tests where so much was mocked that the test was green but completely out of touch with how the rest of the system actually worked (other parts had changed a lot over the years, but this set of tests never picked up on it because everything was mocked).
Proper testing is an art that I'm still trying to git gud at.
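One concrete way to apply that advice to Spark jobs (a sketch with made-up names, not anyone's actual code) is to pull the row-level logic out into pure functions that work on plain collections. Those can be unit-tested in milliseconds with fixed sample data; only the thin Spark wiring then needs slow integration tests.

```scala
// Hypothetical record type standing in for a DataFrame row.
final case class Order(customerId: Int, amount: Double)

// Pure business logic: no SparkSession needed, so tests run instantly
// on small, hand-written sample data that never changes.
def totalsByCustomer(orders: Seq[Order]): Map[Int, Double] =
  orders
    .groupBy(_.customerId)
    .map { case (customer, os) => customer -> os.map(_.amount).sum }
```

The Spark job itself then just maps the DataFrame into these records, calls the pure function, and writes the result; that thin shell is what the few slow integration tests cover.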
The inmates are running the asylum... that is why I love working in this industry.
POODR is actually a really good book. In any case, most of the problems with OOD/OOP stem from the fact that it was implemented wrongly along the way (basically, people used classes when they wanted modules). It doesn't help that most self-proclaimed OOP languages are not really OOP (Java, I'm looking at you).
Now, if you want to really grasp OOP you should pick up a true OOP language like Ruby, where everything is an object and there are no primitives (nil is an object in Ruby, as are true and false, which aren't even instances of a shared Boolean class). Ruby also uses modules heavily and to great effect, allowing it to perform some true "magic" that you simply cannot achieve in other languages (at least not the popular ones; I think you can do anything in Lisp, which from what I've found is the most feature-rich language in existence).
On November 12 2020 01:17 WarSame wrote: I'm trying something really odd and it's not working for me so I'd like some advice. First, I'll explain the approach since it might just be stupid from the start. Then, I'll detail the issues I'm facing in case it's not stupid and the issues can be avoided.
Approach:
I am implementing automated testing for Scala Spark jobs. It seems like most people don't do this. Overall, this is working well for me. I have taken CSV exports of our environment and am using that to count rows, etc.
Issues:
It's taking a long time to load all DFs into the environment as tables for a test that might need only ~3 DFs to work. I am therefore implementing an approach that lets each test load the DFs it needs, and loads all DFs if there are no tables when I go to read one.
My conceptual approach was to simply override the SparkSession .table and .sql functions to add a step that checks the catalog for existing tables and loads all DFs if there are none. SparkSession has a private constructor, so I can't extend it.
I then figured maybe I could create a shared Trait with the functions I need, but I can't make SparkSession extend the Trait. How can I get the two sessions to share their interface and most of their logic, but hijack those few functions I need, if the thing I'm extending has a private constructor?
This is in Scala.
I'm not super familiar with Scala (I've only made some quick changes to small microservices a few times in the past), but I'd like to challenge your approach on a higher, language-independent level.
Are you trying to read live data in the tests? That's what I gathered from your post. In my experience you want your tests completely abstracted from any real stuff, with a controlled environment. Usually you achieve that by mocking things and having sample data sets that never change. That way, when testing functionality that can be covered by unit tests alone (I think integration tests should only be done for mission-critical parts of the app), you don't have to bootstrap much or access external resources at all (I've mocked Redis, JSON responses, etc. so my tests stay isolated). This speeds things up a lot, but has the downside that your tests won't tell you about breaking changes in the mocked objects, so you have to judge how critical that is. I've run into tests where so much was mocked that the test was green but completely out of touch with how the rest of the system actually worked (other parts had changed a lot over the years, but this set of tests never picked up on it because everything was mocked).
Proper testing is an art that I'm still trying to git gud at.
No, I'm using CSV extracts to supply my data, then reading that data into the Spark environment at runtime. I haven't mocked anything because, unfortunately, to use the Spark API you need the whole Spark object.
Since I couldn't mock anything, and it takes ~20 seconds to start Spark up, I made the executive decision to have my "unit tests" be the tests that validated the Spark plan (i.e. made sure the job ran and schemas were correct), whereas the "integration tests" would .count the output DataFrames so that I could ensure data was being returned, and the same amount of data every time.
So unfortunately I'm pretty much breaking new ground here, as far as I'm aware, which means the resources available to me are limited. I've made it work though! I have a Medium article that is going to be published soon to hopefully help other people avoid the issues I've run into.
At this point I'm mostly just running into very specific issues, but I figured while I was asking about them I could get a verification of my approach.
As a follow-up on the issue I was asking about before, I'm trying to use a Scala Object to store the environment's state (i.e. whether I have loaded all tables yet, and a Set of which tables have been loaded). This is so that an individual test can simply add the names of its source files to the Object to have them loaded into the environment, and so that a test which does specify its source files doesn't end up loading EVERY source file into the environment.
Logically, this is the setup:
1. If running a test that names the source files it needs: load those into the environment.
2. If running a test that does not name the source files it needs: load every source file into the environment.
3. If running multiple tests that all specify their source files: load only those source files into the environment.
4. If running multiple tests that don't all specify their source files: load every source file into the environment.
Getting this to work has been... difficult. It seems like it should be easy but I keep running into issues.
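For what it's worth, the four cases above collapse into one idempotent "ensure loaded" helper on a singleton object. This is a sketch with hypothetical names; in the real setup the `load` callback would register a CSV as a Spark table.

```scala
// Singleton tracking which source tables have been registered in the
// test environment, so repeated requests are no-ops.
object TestEnv {
  private val loaded = scala.collection.mutable.Set.empty[String]
  private var allLoaded = false

  // Cases 1 and 3: a test names its sources; load only the missing ones.
  def ensure(tables: Seq[String])(load: String => Unit): Unit = synchronized {
    tables.filterNot(loaded.contains).foreach { t => load(t); loaded += t }
  }

  // Cases 2 and 4: a test names nothing; fall back to loading everything,
  // but only the first time any test asks for it.
  def ensureAll(all: Seq[String])(load: String => Unit): Unit = synchronized {
    if (!allLoaded) { ensure(all)(load); allLoaded = true }
  }
}
```

Each test then calls `TestEnv.ensure(...)` with its own file names, and tests that don't specify anything call `ensureAll`; the Set guarantees nothing is loaded twice regardless of test ordering.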
Wait, why use the Scala Object? If you really need some form of global state I think you should use a singleton. Maybe even an abstract class where you store the data you need in the class variables so you don't even need to instantiate it and every other object can access those variables through static methods.
I don't remember if Scala offers such things. In Ruby you can even redefine class constants at runtime. That's not a great idea, because then they're not really constants any more, but it's incredibly useful for some things that change extremely rarely. I usually use it when I need to access system settings that are stored in the db: the first time they're accessed, they are assigned to constants, and from then on I can access them throughout the system by just calling Obj::const without ever instantiating an object or hitting the db. I then only need a trigger on the db settings table so that the constants are redefined right after a change. The idea behind using abstract classes (or modules) for this is that, since you can't have an instance of one, you reduce the risk of stale data lingering in memory until the object is garbage collected.
TL;DR: I like the concept of using abstract classes to hold state that needs to be shared across the system.
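Since the Spark discussion above is in Scala, the closest Scala analogue to that Ruby constant trick is probably a lazy val on an object: computed once on first access, shared everywhere, no instantiation needed. The db fetch here is a made-up stand-in.

```scala
object SystemSettings {
  // Hypothetical stand-in for a one-time read of the settings table.
  private def fetchFromDb(): Map[String, String] =
    Map("timeout_ms" -> "100")

  // Computed on first access, then cached for the process lifetime;
  // this plays the in-code caching role the Ruby constants play above.
  lazy val current: Map[String, String] = fetchFromDb()
}
```

Unlike the Ruby version there's no invalidation hook here: if the settings can change while the process runs, you'd need your own refresh mechanism on top.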
Why would you label those values as constants if you were planning on changing them? I would assume just labelling them as vars and keeping in mind they don't change often should be enough. I'm not familiar with Ruby so maybe there's something weird going on.
Well, in this case I'm using the constants construct to leverage the capabilities of the language, rather than using constants as actual constants (I'm using them as a form of in-code caching). I know it's weird, but once you start working with big systems where you handle millions of requests, with a db of thousands of tables where each has billions of rows and hundreds of relations, you can pretty much kiss most good practices and nice patterns goodbye and have to hack your way through everything. In the end it doesn't matter if your code is the cleanest and purest stuff; all that matters to the business side is that you can deliver what is expected in under 100ms (you could still deliver it with nice, clean code, but it would take minutes).
Gotta start thinking big. Every request costs money in one way or another, and even if it's just pennies, if you have tens of millions of them each day the sums start getting pretty big pretty fast.
Edit: To further illustrate how far some companies will go in micro-optimization: I know a guy who specializes in such things, and he gets bonuses that pay for a month's vacation in the tropics for shaving 2ms off a request.
That's true, but at the same time, speeding up development through clean code and reducing defects can be a counter-balancing effect. Generally I'm okay with hacky logic as long as it's wrapped in an abstraction so others don't have to run into it, similar to how Rust has unsafe code blocks that can be wrapped in a safe interface.
Honestly, I've generally found that using patterns can speed up processing as well, since there are fewer duplicated code paths, less duplication of resources, etc. There are some instances where you can't afford them because the overhead itself will bite you, but that must be rare.
Of course, using patterns and clean code whenever possible is of utmost importance. The situations I mentioned usually come up late in the software's lifecycle. They're also quite rare and really only affect some huge systems where response time is paramount (banking, stock, ads, etc.).
Hmm, I'm honestly having a hard time picturing the scenarios you're talking about, since I've never worked in any super response-time-heavy industries. I imagine you mean they cannot afford any abstractions at all and are compiled, written very close to the metal. So I'd imagine you're talking about some C++ or Rust or C system where that low level of control is required.
I actually did work in banking for a bit using C, but the response time requirements were not there. I don't even think they tested for it, as long as manual testing wasn't ridiculously slow.
On November 13 2020 06:59 WarSame wrote: Hmm, I'm honestly having a hard time picturing the scenarios you're talking about, since I've never worked in any super response-time-heavy industries. I imagine you mean they cannot afford any abstractions at all and are compiled, written very close to the metal. So I'd imagine you're talking about some C++ or Rust or C system where that low level of control is required.
I actually did work in banking for a bit using C, but the response time requirements were not there. I don't even think they tested for it, as long as manual testing wasn't ridiculously slow.
medical is very response time sensitive. the flipside of the sometimes onerous requirements ... they pay BIG.
every second a patient bleeding to death must wait for cross-matched, whole blood to hit their veins increases the probability of brain damage and/or major organ damage and/or death.
my grandma ran the blood bank and lab at mississauga hospital for 30 years... i learned a lot from her about time sensitivity requirements of medical applications.
However, I wonder how much that is true in this particular instance. I'm going to take a guess and say their system is ~40 years old (from 1980 or so), and is one of those old Matrix-green style interfaces. When they need something changed they probably go in and hack around to make the small bandage they need.
In that case I would guess they are actually not running close to optimally. I would also guess they are built for a specific, relatively ancient infrastructure, and that they could theoretically move to something with a much faster processor and simply get faster results, even with abstractions et al. factored in.
I imagine in these instances you wouldn't want to use cloud, since you probably aren't guaranteed fibre (not sure if this runs in ambulances), but with a stronger modern local processor you could certainly improve response times.
Hi folks, I'm quite early in my programming journey (yay undergrad number 2). Things are still at a rudimentary level currently; I'm looking forward to clearing this semester and pushing on. I'm really quite into it thus far and have some personal projects I'd like to do.
One thing I've found thus far is that I learn way, way better from textbooks than from online tutorials and the like. I hadn't quite expected this, but whether it's how the information is organised, the medium, or keeping me away from distractions, it's definitely proving more effective for me so far.
I was wondering if you guys had any weighty textbooks you'd recommend. I think I'll only be using Python, Java and C in the foreseeable future, but I'm open to other languages if there's a good reason (I've heard Ruby is quite rigid in its OOP, so it's good for refining your OOP chops, for example), or just more general books that you all swear by.
The bible, I think, is CLRS's Introduction to Algorithms. It's so well known it has its own Wikipedia page, and is commonly referred to just as CLRS. It's one of very few textbooks (literally one of two, the other being economics) I have pulled back out of my closet since graduating, so I would strongly recommend getting a physical hardcover copy. This should teach you data structures and algorithms to a very good level.
I would also highly recommend Cracking the Coding Interview, which is ostensibly about preparing you for job interviews, but I found it made me understand the utility of data structures and algorithms much more, and gave a more "pragmatic" grounding for their usage.
Design Patterns by the Gang of Four is a highly recommended one. I've tried it a bit myself and didn't particularly like it, but I know a lot of people swear by it, and some of my coworkers literally keep it by their desks for frequent reference.
Clean Code by Uncle Bob is an amazing one that I wish everyone had to read. It teaches you to be a lot more considerate about your code: making sure the abstractions are good, the maintainability is good, etc. If you read it while working on a longer-term solo project of 3-4 months (enough time to come back and forget what you were originally doing), you will see the truth, because you will wonder what the hell you had been doing previously. You will then realize that if you had made your code cleaner, you wouldn't have to spend so long piecing it all together again. I cannot recommend this one enough; it is my absolute favourite.
The Phoenix Project is a very good book to understand DevOps, and a lot of the business approaches, including Agile. It can also help explicate why Agile is an effective approach when used well, since a lot of places just use Cargo Cult Agile, which gives you all of the headache without any of the benefit. It will let you understand the WHY of a lot of the approaches, which allows you to obtain their benefits. It's especially good for understanding the business benefits of these approaches.
Finally, I've heard Learn Python the Hard Way is recommended for Python learning, though I haven't tried it myself. It's supposed to build a very thorough, deep understanding of the language.
CLRS and Design Patterns are mostly intended as reference material, and I wouldn't recommend reading them cover-to-cover; that's why I'd get the hardcover editions of those two. All of the rest can and should be read cover-to-cover, but aren't used as reference material as much, so digital editions are fine.
I think if you were to read all of these books and really understand and practice them, I would consider you a very good beginner programmer. In my area you would be prepared for a roughly ~70k CAD/year job (though obviously difficulty and pay scale independently).
you may not need the coding interview book at all. I've never had an interview where I had to code that was for an actually good job (one I did have was for a job crafting email campaigns, yuck). The "coding" in interviews is never anything practical that you'll do on the job, they are just the second line of defense against people who bullshitted through the phone screen. If you can program, you generally don't need to worry about it.
Things are different for SV but SV sucks and all their "coding interviews" just throw leetcode problems at you.
On November 13 2020 06:59 WarSame wrote: Hmm, I'm honestly having a hard time picturing the scenarios you're talking about, since I've never worked in any super response-time-heavy industries. I imagine you mean they cannot afford any abstractions at all and are compiled, written very close to the metal. So I'd imagine you're talking about some C++ or Rust or C system where that low level of control is required.
I actually did work in banking for a bit using C, but the response time requirements were not there. I don't even think they tested for it, as long as manual testing wasn't ridiculously slow.
Well, I was working in such a place last year and they were not using C that much. There was some C++, but not a lot (mostly for some server-side calculations), since it all ran over the internet with plenty of microservices and third-party integrations (there was everything there: C++, Python, Ruby, Scala, Java, Swift, what have you). Sure, you could write it all in assembly if you really wanted, but the thing is that some pieces of this software have to be worked on by different people over the course of many years, constantly being upgraded to newer standards, language versions, etc. (technical debt is a company killer, after all), so there's still a pretty high level of abstraction. You just strip it away in some bottleneck parts (you either strip away the abstractions there, or you strip out that entire functionality and move it to a different tech stack altogether).
On November 13 2020 10:06 tofucake wrote: you may not need the coding interview book at all. I've never had an interview where I had to code that was for an actually good job (one I did have was for a job crafting email campaigns, yuck). The "coding" in interviews is never anything practical that you'll do on the job, they are just the second line of defense against people who bullshitted through the phone screen. If you can program, you generally don't need to worry about it.
Things are different for SV but SV sucks and all their "coding interviews" just throw leetcode problems at you.
Why not just learn something that will be useful on the job instead? Such things also help with general interviews as they make you more comfortable and more knowledgeable about some real life applications of the code.
On November 13 2020 08:44 WombaT wrote: Hi folks, I'm quite early in my programming journey (yay undergrad number 2). Things are still at a rudimentary level currently; I'm looking forward to clearing this semester and pushing on. I'm really quite into it thus far and have some personal projects I'd like to do.
One thing I've found thus far is that I learn way, way better from textbooks than from online tutorials and the like. I hadn't quite expected this, but whether it's how the information is organised, the medium, or keeping me away from distractions, it's definitely proving more effective for me so far.
I was wondering if you guys had any weighty textbooks you'd recommend. I think I'll only be using Python, Java and C in the foreseeable future, but I'm open to other languages if there's a good reason (I've heard Ruby is quite rigid in its OOP, so it's good for refining your OOP chops, for example), or just more general books that you all swear by.
Thanks in advance!
Here are some of the textbooks I found most useful from undergrad to where I am now. I work in systems programming, doing a fair bit of low-level driver development and network programming, in an application domain where you also need strong knowledge of distributed systems.
A second vote for CLRS. For learning algorithms it's the go-to.
Operating System Concepts by Silberschatz. I've found that having a strong knowledge of systems, and of how the OS interacts with programs, is very useful. I read it cover-to-cover as part of my operating systems course, which I consider one of the most useful courses I took. I've also heard good things about the Tanenbaum OS book.
Computer Architecture: A Quantitative Approach by Hennessy and Patterson. I find the way this book is presented very good, in that design decisions are contrasted and compared empirically. So in addition to giving strong fundamentals in architecture, it also shows a good approach for how to evaluate and design systems.
If you have an interest in reinforcement learning, I'd recommend Reinforcement Learning: An Introduction by Sutton and Barto. This is a very good introduction, and is complementary to other statistical machine learning books if you're interested in the area.
If you're interested in learning more about theory, I'd recommend Introduction to the Theory of Computation by Sipser. This is a pretty standard textbook in the area and should be complementary to the CLRS treatment of this topic.
For interview-style questions, or just for fun, I really like Competitive Programming by Steven and Felix Halim. It covers a range of algorithm types and also presents full code examples designed to be time-efficient to implement. I did 'competitive programming' for fun during university, i.e. programming solutions to algorithms questions under time constraints. I found the skills I developed doing that had a fair amount of crossover with what I consider to be a common (but poor) interview style of throwing algorithms questions at candidates.
Otherwise for books that I use more as a reference now and then but haven't read in full I like:
Advanced Programming in the Unix Environment by Stevens
TCP/IP Illustrated, also by Stevens
The Linux Programming Interface by Kerrisk