Thread Rules

1. This is not a "do my homework for me" thread. If you have specific questions, ask, but don't post an assignment or homework problem and expect an exact solution.
2. No recruiting for your cockamamie projects (you won't replace Facebook with 3 dudes you found on the internet and $20).
3. If you can't articulate why a language is bad, don't start slinging shit about it. Just remember that nothing is worse than making CSS IE6 compatible.
4. Use [code] tags to format code blocks.
English is the default communication standard nowadays, and localization to and from it is a huge pain, takes time away from engineers, and is very dry job material for most people. If you want to avoid being hit by random Unicode/language issues, use English only, because it looks like that's going to be the main medium of communication and programming for the next 100 years.

Saying that these moments drive all those promising newcomers away from coding is a bit pretentious of you...

Every half-decent programmer runs into day-long struggles with things that seem trivial in hindsight. If the process or chain is broken, hey, you can try your hand at fixing it; there's probably a reason it was broken in the first place.
On August 09 2014 09:07 Blisse wrote: English is the default communication standard nowadays, and localization to and from it is a huge pain, takes time away from engineers, and is very dry job material for most people. If you want to avoid being hit by random Unicode/language issues, use English only, because it looks like that's going to be the main medium of communication and programming for the next 100 years.

Saying that these moments drive all those promising newcomers away from coding is a bit pretentious of you...

Every half-decent programmer runs into day-long struggles with things that seem trivial in hindsight. If the process or chain is broken, hey, you can try your hand at fixing it; there's probably a reason it was broken in the first place.
Is using a German version of Windows also pretentious? And by "English-only" you really mean "ASCII-only", which implies no math symbols, ugly typography, etc. The fact that UTF-8 is still not the default format for text interchange in many applications/programming languages is absurd. And struggling with a task as easy as reading a text file definitely can scare newcomers away; why would you deny that?
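(A quick, hedged illustration of the "ASCII-only" point; the sample string is arbitrary:)
[code]
# ASCII has no code points above 127, so math symbols simply cannot be encoded (Python 3).
try:
    "∑ x²".encode("ascii")
except UnicodeEncodeError as e:
    print(e)  # 'ascii' codec can't encode character '\u2211' in position 0: ordinal not in range(128)
[/code]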
On August 09 2014 09:07 Blisse wrote: English is the default communication standard nowadays, and localization to and from it is a huge pain, takes time away from engineers, and is very dry job material for most people. If you want to avoid being hit by random Unicode/language issues, use English only, because it looks like that's going to be the main medium of communication and programming for the next 100 years.

Saying that these moments drive all those promising newcomers away from coding is a bit pretentious of you...

Every half-decent programmer runs into day-long struggles with things that seem trivial in hindsight. If the process or chain is broken, hey, you can try your hand at fixing it; there's probably a reason it was broken in the first place.
I know, but others don't. You would be surprised how many students drop out in the first semester over stupid reasons that could easily be avoided. Sure, they are minor and could be solved by the students themselves, but especially in coding, people get pushed away by the old "you have to figure everything out yourself" mentality, which originates from a time when it actually was true.
And to be more specific: localization might cost time, but using a proper codec would fix the problem with weird symbols from other languages and wouldn't really take time away from the engineers, beyond the one time it has to be built (if there just were some magic codec that includes all symbols, or at least the symbols of the relevant countries). The problem with these things is that every attempt at something unifying only results in yet another codec/format/standard competing with the others. Maybe this particular problem won't drive people away, since it is a very niche one, but there are tons of problems that could be resolved in a much better way. Also, I don't know if you have to work with Python, but if you are new to the language, you get overwhelmed by outdated information, which definitely pushes newcomers away.

I don't mean to offend you, but I read your post as a manifestation of one of the big misunderstandings in the minds of programmers: just because everyone has to deal with something doesn't mean it is mandatory to do so, and learning by doing / figuring stuff out by yourself is often far from time-efficient. I did not intend to attack your native language; if German were the language of programming and therefore had no problems with umlauts and ß, I would still be annoyed if characters from other languages weren't supported.

I was once a guest at a talk that was basically about this topic: simple things in programming/IT in general that get complicated for no good reason and could be avoided with very little effort by the right people. They even had some amazing stats for a few examples, where they managed to estimate how much money/time could be saved by investing in a few minor changes. Sadly, the reason why something is broken often boils down to stupidity/cockiness/greed/tradition/something else.
But back to the topic: I decided to put my problem on hold and just use the script on the Linux partition of my laptop. So for now I don't have to deal with this nonsense and can focus on getting my ISP to fulfill its contract properly... a problem that might have affected my rant, so keep that in mind before calling someone pretentious. Good internet is more important than anything else.
EDIT: also... you're Canadian... I don't know about your laws, but as a German you get quite fed up with having to do things in a special way/order just because of some weird basic condition that could easily be resolved if it weren't for power/money/interests/tradition/whatever. Maybe I am a tad sensitive about that.
waffelz, my native language isn't English either; it's written in Cyrillic, with letters such as "абв" (abc). I also have a problem with that in a Java program which has to support several languages: the default encoding for ResourceBundle doesn't support Cyrillic. Long story short, I use Java's native2ascii, and it's a good enough fix once you know what to do.
The reason I tell you this is that I think you haven't looked hard enough for a solution. I think you should find one first, and then complain to the language developers about it via their communication channels. For example, have you read the following solutions for Python?
If you think that umlauts and Cyrillic are hard to deal with, wait until you enjoy the wonders of UTF-8/16-incompatible East Asian formats, or until you have to handle a gazillion different formats in the same code. Also relevant:
Sorry if I wasn't clear enough; at the moment I only read or speak English and barely write it, so my skills in that regard suffer. I thought I made it clear that I am not ranting about umlauts only. It wasn't even just about codecs, it was about things that could be avoided if there were some consensus. I was pointing my finger at exactly the situation pictured in the xkcd comic, and if you think that this isn't a problem / shouldn't be solved, you are part of the problem. But well... why do I even respond; there will be another one who picks out one small part of my post and tries to drag the discussion on, instead of just admitting: "Yes, these small stumbling blocks hurt and shouldn't be there; they are most likely there for some stupid reason, and for that same reason they won't go away anytime soon / every attempt to solve it might only make it worse by adding another piece." For those who are unable to fully admit when someone is right, you can add a "You are right, they shouldn't be there, but they are and nothing will change, so why bother?".
Just as a last note: yes, I found darkness's solution. It lets Python read the file, but the äöü's are still missing. It doesn't crash anymore, but it's still far from perfect.
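(For reference, a minimal Python sketch of the usual culprit here: the file being decoded with the wrong codec, rather than the umlauts being unrepresentable. The candidate encodings are guesses for a German Windows setup and the filename is made up.)
[code]
# -*- coding: utf-8 -*-
import io

CANDIDATES = ("utf-8-sig", "cp1252", "latin-1")  # guesses; latin-1 accepts any byte sequence

def read_text(path):
    """Try the likely encodings in order and report which one worked."""
    for enc in CANDIDATES:
        try:
            with io.open(path, encoding=enc) as f:  # io.open works on Python 2 and 3
                return f.read(), enc
        except UnicodeDecodeError:
            continue  # try the next candidate
    raise AssertionError("unreachable: latin-1 never raises UnicodeDecodeError")

text, used = read_text("beispiel.txt")  # made-up filename
print(used)
print(text)  # ä/ö/ü/ß should now survive instead of being silently dropped
[/code]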
No, the point is that the different codecs should be there. The reason that, for instance, you run into character encodings that don't work with each other is simply that each one is better or worse for different situations.

ASCII and UTF-8 work great for text where the majority of the characters fit in a single byte. They're absolutely atrocious for anything using CJK, especially if it has to accurately render older stuff: you end up wasting like 2x/3x the space, so people use different codecs.

Yes, it is a problem, but it is also one you have to deal with (at least in the short term, when your code is conceivably going to go over the wire with shit bandwidth & latency).

If your code does not have to be transmitted anywhere or interact with anyone else's code, then this is a solved problem: just use UTF-32 or UTF-16 everywhere and be done with it.
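(A quick way to see the trade-off being described; the sample strings are arbitrary and this is only a sketch, not a benchmark:)
[code]
# -*- coding: utf-8 -*-
# Byte cost of the same text under a few Unicode encodings (Python 3).
samples = {
    "english":  "The quick brown fox jumps over the lazy dog",
    "german":   "Größenwahn und Übermut",
    "japanese": "吾輩は猫である。名前はまだ無い。",
}

for name, text in samples.items():
    sizes = {enc: len(text.encode(enc)) for enc in ("utf-8", "utf-16-le", "utf-32-le")}
    print(name, len(text), "chars:", sizes)

# Legacy encodings are denser for the script they were designed around:
jp = samples["japanese"]
print("shift_jis:", len(jp.encode("shift_jis")), "bytes vs utf-8:", len(jp.encode("utf-8")))
[/code]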
On August 10 2014 04:41 phar wrote: No, the point is that the different codecs should be there. The reason that, for instance, you run into character encodings that don't work with each other is simply that each one is better or worse for different situations.

ASCII and UTF-8 work great for text where the majority of the characters fit in a single byte. They're absolutely atrocious for anything using CJK, especially if it has to accurately render older stuff: you end up wasting like 2x/3x the space, so people use different codecs.

Yes, it is a problem, but it is also one you have to deal with (at least in the short term, when your code is conceivably going to go over the wire with shit bandwidth & latency).

If your code does not have to be transmitted anywhere or interact with anyone else's code, then this is a solved problem: just use UTF-32 or UTF-16 everywhere and be done with it.
Oh, a text file in UTF-8 uses 1.5 KB instead of 1 KB (assuming it's not an HTML/XML file with ASCII tags, which would give the advantage back to UTF-8)? Which ends up being exactly the same after compression? Sounds dramatic, especially in the era of multiple-TB HDDs. The real advice is: just use UTF-8 for sharing text files. And UTF-16 is the most retarded encoding ever: very few people even realize that a single character can take up two bytes, which results in broken UTF-16 encoders and decoders. Also, UTF-16 can be encoded in two different byte orders, that's twice the fun, right?
And if you're not sharing text with any other persons/applications, then obviously use whatever makes the most sense. Which is also UTF-8 99% of the time.
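(A rough way to check the compression point; the sample is deliberately repetitive, so treat it as an illustration rather than a benchmark:)
[code]
# -*- coding: utf-8 -*-
# Does UTF-16's size advantage for CJK text survive compression? (Python 3, stdlib zlib)
import zlib

text = "吾輩は猫である。名前はまだ無い。" * 200  # arbitrary, repetitive sample

for enc in ("utf-8", "utf-16-le"):
    raw = text.encode(enc)
    print(enc, "raw:", len(raw), "bytes, compressed:", len(zlib.compress(raw, 9)), "bytes")
[/code]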
I mean in the real world, you're going to run into ascii, utf-8, koi8, shift-jis, and who knows wtf else. Why? Because a) legacy, and b) some people have really shit internet, so cutting down filesize still helps. Shit in a lot of places people will just turn off data on their phone when they aren't explicitly using it because it's exorbitantly expensive.
UTF-16 is by definition always at least 2 bytes per character (sometimes 4). I don't know what you mean by "can take up to 2 bytes".
Also, UTF-16 can be encoded in two different byte orders, that's twice the fun, right?
This is generally true of every single thing on a computer ever:
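(To make that concrete for UTF-16 specifically, a small Python 3 sketch with an arbitrary sample string:)
[code]
# -*- coding: utf-8 -*-
s = "äöü"

print(s.encode("utf-16-be").hex())  # 00e400f600fc
print(s.encode("utf-16-le").hex())  # e400f600fc00  (same code units, bytes swapped)

with_bom = s.encode("utf-16")       # native byte order with a BOM (U+FEFF) prepended
print(with_bom.hex())
print(with_bom.decode("utf-16"))    # round-trips because the BOM resolves the ambiguity
[/code]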
Meh, encoding/decoding within the same app is hell, but not as big a hell as when you try to make it work with other apps. Did you know that if you make your app generate a file that's compatible with MS Office, it might not be compatible with MS Office set up on a machine with a different locale? Man, I hate Microsoft for that. Even if you get a one-language version of MS Office, it won't be compatible with a pick-another-language version running on the same machine...
How do they even manage to do that?
And don't get me started on the nightmares of creating an app that has to encode/decode files between two different formats (one of which is closed) and preserve all of the special characters for any given language. Not the cool part of programming.
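(A hedged sketch of the underlying pain, with arbitrary sample text; the closed-format part obviously can't be shown here, but the character-loss part can:)
[code]
# -*- coding: utf-8 -*-
# Converting between locale-specific codepages loses anything the target can't represent.
src = "Zażółć gęślą jaźń / Übermut"  # arbitrary sample mixing Polish and German

for enc in ("cp1250", "cp1252", "utf-8"):
    round_tripped = src.encode(enc, errors="replace").decode(enc)
    print(enc, ":", round_tripped)  # cp1252 turns the Polish-specific letters into '?'
[/code]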
On August 10 2014 12:08 phar wrote: I mean in the real world, you're going to run into ascii, utf-8, koi8, shift-jis, and who knows wtf else. Why? Because a) legacy, and b) some people have really shit internet, so cutting down filesize still helps. Shit in a lot of places people will just turn off data on their phone when they aren't explicitly using it because it's exorbitantly expensive.
That's definitely true (except for the fact that you can save much more bandwidth by enabling compression than by switching to an obsolete encoding). All I'm saying is that everything and everyone should use/default to UTF-8, unless they have a very good reason not to.
UTF-16 is by definition always at least 2 bytes per character (sometimes 4). I don't know what you mean by "can take up to 2 bytes".
There is, or at least used to be, a reason why most processors work in the counter-intuitive little-endian mode. However, before you send those integers over the network, they're converted to network (big-endian) order (well, some people are evil and define "network order" as "little-endian" in their protocols, anyway, there's some kind of standard). But why would a character encoding explicitly allow for different endiannesses? What's the benefit?
Sorry, I can respond to my previous statement some other time; there was a bit of miscommunication.
@del, I'm sure you learned this in CS already, but:
Why Are There Endian Issues at All? Can't We Just Get Along? Ah, what a philosophical question.
Each byte-order system has its advantages. Little-endian machines let you read the lowest-byte first, without reading the others. You can check whether a number is odd or even (last bit is 0) very easily, which is cool if you're into that kind of thing. Big-endian systems store data in memory the same way we humans think about data (left-to-right), which makes low-level debugging easier.
But why didn't everyone just agree to one system? Why do certain computers have to try and be different?
Let me answer a question with a question: Why doesn't everyone speak the same language? Why are some languages written left-to-right, and others right-to-left?
Sometimes communication systems develop independently, and later need to interact.
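(To tie that back to the network-order point above, a tiny Python sketch:)
[code]
# The same 32-bit integer packed three ways (Python 3).
import struct

n = 0x12345678
print(struct.pack("<I", n).hex())  # little-endian: 78563412
print(struct.pack(">I", n).hex())  # big-endian:    12345678
print(struct.pack("!I", n).hex())  # "!" means network byte order, i.e. big-endian
[/code]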
On August 10 2014 12:08 phar wrote: I mean in the real world, you're going to run into ascii, utf-8, koi8, shift-jis, and who knows wtf else. Why? Because a) legacy, and b) some people have really shit internet, so cutting down filesize still helps. Shit in a lot of places people will just turn off data on their phone when they aren't explicitly using it because it's exorbitantly expensive.
That's definitely true (except for the fact that you can save much more bandwidth by enabling compression than by switching to an obsolete encoding). All I'm saying is that everything and everyone should use/default to UTF-8, unless they have a very good reason not to.
There is, or at least used to be, a reason why most processors work in the counter-intuitive little-endian mode. However, before you send those integers over the network, they're converted to network (big-endian) order (well, some people are evil and define "network order" as "little-endian" in their protocols, anyway, there's some kind of standard). But why would a character encoding explicitly allow for different endiannesses? What's the benefit?
Ha ok I think we're saying the same thing then. Yea, in an ideal world we'd all be standardized, but
The Egyptian government broke one of my unit tests by changing their timezone information. @#$^(*#@&^)(#&^
Hi, any Android developers here? Can somebody point me to a guide or something that discusses how to put a guide or template overlay on the camera preview? Like a guide for taking a picture that must align with the grid/template?
On August 10 2014 12:08 phar wrote: I mean in the real world, you're going to run into ascii, utf-8, koi8, shift-jis, and who knows wtf else. Why? Because a) legacy, and b) some people have really shit internet, so cutting down filesize still helps. Shit in a lot of places people will just turn off data on their phone when they aren't explicitly using it because it's exorbitantly expensive.
That's definitely true (except for the fact that you can save much more bandwidth by enabling compression than by switching to an obsolete encoding). All I'm saying is that everything and everyone should use/default to UTF-8, unless they have a very good reason not to.
UTF-16 is by definition always at least 2 bytes per character (sometimes 4). I don't know what you mean by "can take up to 2 bytes".
OK, I meant "2 code units", not bytes.
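(For anyone following along, a minimal Python 3 sketch of the 2-code-unit case; the character is an arbitrary non-BMP example:)
[code]
# Characters outside the Basic Multilingual Plane need a surrogate pair in UTF-16.
ch = "\U0001D11E"                        # U+1D11E MUSICAL SYMBOL G CLEF
print(len(ch))                           # 1 code point
print(len(ch.encode("utf-16-le")) // 2)  # 2 UTF-16 code units
print(ch.encode("utf-16-le").hex())      # 34d81edd -> surrogate pair D834 DD1E
[/code]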
Also, UTF-16 can be encoded in two different byte orders, that's twice the fun, right?
This is generally true of every single thing on a computer ever:
There is, or at least used to be, a reason why most processors work in the counter-intuitive little-endian mode. However, before you send those integers over the network, they're converted to network (big-endian) order (well, some people are evil and define "network order" as "little-endian" in their protocols, anyway, there's some kind of standard). But why would a character encoding explicitly allow for different endiannesses? What's the benefit?
Ha ok I think we're saying the same thing then. Yea, in an ideal world we'd all be standardized, but
The Egyptian government broke one of my unit tests by changing their timezone information. @#$^(*#@&^)(#&^