|
Anyone have ideas on floating point? I missed class today and found out I had homework. I googled floating point and read through it, but I don't quite understand it. Even though the homework gives the answer, I want to know how the answer is derived. Take 3.14, for example: would the floating point be 3? I know how to convert that to binary, and that to hex, but I have no idea how floating point works. Can someone give me a clue? Something like this, which is one of the sites the homework gave: http://babbage.cs.qc.edu/IEEE-754/Decimal.html But I want to know how it works. The problem is: "convert 3.14159265 to 32-bit FP"
Help?!
|
Umm, I was under the impression that floating point numbers are numbers with decimal places. The float of 3.14 is 3.14; if you convert that to an integer, then you'd get 3. I'm not sure how to convert floating point numbers into binary/hex, though.
|
Floating point numbers are stored in a special way in binary. The implementation is dependent on the CPU, but the most common is IEEE 754. Your class is probably only gonna make you handle single-precision floats, which are 32 bits in length.
Oooh, I found this for ya. Really good: http://en.wikipedia.org/wiki/IEEE_754
Skip to the "Single Precision" section and start reading. Do some sample problems using the algorithm detailed there. Use this website to cross-check your answers: http://www.h-schmidt.net/FloatApplet/IEEE754.html
Once you get the bit representation, you convert to hex/octal/whatever in the usual way. Also, most real numbers can't be represented exactly in just 32 bits, so the mantissa gets rounded to fit (this is why you get precision-loss warnings from your compiler when you're not careful with numerical types/casts).
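If you have Python handy, here's a rough way to cross-check a hand conversion without the applet. It's only a sketch: `float32_bits` is a name I made up, and it leans on the standard `struct` module to do the rounding to single precision for you.

```python
import struct

def float32_bits(value: float) -> int:
    """Return the single-precision bit pattern of value as an integer (hypothetical helper)."""
    # ">f" packs the number as a big-endian 32-bit float (rounding it to fit);
    # ">I" reads the same 4 bytes back as an unsigned 32-bit integer.
    return struct.unpack(">I", struct.pack(">f", value))[0]

bits = float32_bits(3.14159265)
print(f"{bits:032b}")  # the 32 bits: sign, exponent, mantissa
print(f"{bits:08X}")   # same bits in hex

# Reading the 4 bytes back as a float shows the rounding to 32 bits.
stored = struct.unpack(">f", struct.pack(">f", 3.14159265))[0]
print(stored)
```

The last print should come out slightly different from 3.14159265, which is the rounding of the mantissa in action.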
|
Oh man, this is intense, INTENSE, alrighty. >_<
|
Per PM request:
OK, so there are 3 sections:
The (s)ign = 1 bit
The (e)xponent = 8 bits
The (m)antissa = 23 bits
total = 32 bits
v = 3.14159265 where: v = (-1)^s * 2^(e - 127) * 1.m (e is stored with a bias of 127, and the mantissa has an implied leading "1.")
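To see how the three fields turn back into a value, here's a small sketch going the other direction (bits to number). It assumes the standard IEEE 754 reading of the formula: sign applied as (-1)^s, exponent stored with a bias of 127, and an implied leading 1 on the mantissa. `decode_float32` is a made-up name, and it only handles normal numbers (no zeros, denormals, infinities, or NaNs).

```python
def decode_float32(bits: int) -> float:
    """Decode a 32-bit pattern into its value (hypothetical helper, normal numbers only)."""
    s = bits >> 31            # 1 sign bit
    e = (bits >> 23) & 0xFF   # 8 exponent bits, stored with a bias of 127
    m = bits & 0x7FFFFF       # 23 mantissa bits, the part after the implied "1."
    return (-1) ** s * 2.0 ** (e - 127) * (1 + m / 2 ** 23)

print(decode_float32(0x3F800000))  # s=0, e=127, m=0 -> 1.0
print(decode_float32(0xC0000000))  # s=1, e=128, m=0 -> -2.0
```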
OK, let's rock:
-The sign bit is easy: 0 (because positive number)
-Now write out the number in plain UNSIGNED binary (NOT two's complement): 11.001001xxxx
The "11." part is easy; it's just '3' in binary, with the binary point tagged on the end. For the right-hand part, I use the result in each step for 2 things: a) the integer part of the result (the digit in front of the point) is the next bit you write down b) the fraction of the result (behind the point) is used in the next step.
Start with original number's fractional part:
0.14159265 x 2 = 0.2831853 (fraction-bit #1 = 0)
0.2831853 x 2 = 0.5663706 (fraction-bit #2 = 0)
0.5663706 x 2 = 1.1327412 ("" = 1)
0.1327412 x 2 = 0.2654824 ("" = 0)
0.2654824 x 2 = 0.5309648 ("" = 0)
0.5309648 x 2 = 1.0619296 ("" = 1)
(keep going until the fraction is all zeros, or until you have enough bits to fill the 23-bit mantissa, plus one extra so you know which way to round).
And that's how I get: .001001xxxx (tediously long, you finish it) and together, we have 11.001001xxxx
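If you'd rather not grind through all the doublings by hand, the same procedure can be sketched in a few lines. `fraction_bits` is just a name I picked; the loop is the multiply-by-2 trick from the steps above, nothing more.

```python
def fraction_bits(fraction: float, count: int) -> str:
    """Binary digits of a fraction in [0, 1) by repeated doubling (hypothetical helper)."""
    bits = []
    for _ in range(count):
        fraction *= 2
        bit = int(fraction)    # the digit in front of the point is the next bit
        bits.append(str(bit))
        fraction -= bit        # carry only the fractional part into the next step
    return "".join(bits)

print(fraction_bits(0.14159265, 6))  # 001001, same as the six hand steps
```

Ask for more bits (say `fraction_bits(0.14159265, 24)`) to check the rest of your hand work.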
Now you "normalize" your result by moving the binary point such that only a single "1" is to its left: 11.001001xxxx = 1.1001001xxxx * 2^1 (shift amount = 1)
OK, now add the bias to the shift-amount: 1 + 127 = 128 = 10000000
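Here's a sketch of the normalize-then-bias step, in case it helps to see it move. `normalize` is a made-up helper that assumes a positive, nonzero input; it halves or doubles the number until it sits in the 1.xxx range and counts how far the point moved.

```python
def normalize(value: float) -> tuple[float, int]:
    """Return (significand in [1, 2), shift amount); hypothetical helper, value must be > 0."""
    shift = 0
    while value >= 2:   # point moves left: halve the value, bump the exponent
        value /= 2
        shift += 1
    while value < 1:    # point moves right: double the value, drop the exponent
        value *= 2
        shift -= 1
    return value, shift

significand, shift = normalize(3.14159265)
biased = shift + 127                 # add the single-precision bias
print(shift, format(biased, "08b"))  # 1 10000000
```

For numbers smaller than 1 the shift comes out negative, e.g. 0.15625 = 1.25 * 2^-3, so the stored exponent would be -3 + 127 = 124.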
Now mash everything together, dropping the leading "1." (it's implied, so it isn't stored): [s][exponent][mantissa] = 0 10000000 1001001xxxx
ba-zing!
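For checking your finished answer, here's a sketch of the whole recipe end to end, including the final rounding of the mantissa. `encode_float32` is my own name for it, and it assumes a normal, finite, nonzero input (no zeros, denormals, infinities, or NaNs). It uses `math.frexp` from the standard library for the normalization.

```python
import math

def encode_float32(value: float) -> int:
    """Build the 32-bit pattern field by field (hypothetical helper, normal numbers only)."""
    sign = 1 if value < 0 else 0
    # frexp gives abs(value) == m * 2**e with 0.5 <= m < 1,
    # so the exponent for the 1.xxx form is e - 1.
    m, e = math.frexp(abs(value))
    exponent = (e - 1) + 127                # add the bias
    significand = round(m * 2 ** 24)        # 24 bits: implied 1 plus 23 mantissa bits, rounded
    if significand == 2 ** 24:              # rounding carried all the way up
        significand >>= 1
        exponent += 1
    mantissa = significand - 2 ** 23        # drop the implied leading 1
    return (sign << 31) | (exponent << 23) | mantissa

bits = encode_float32(3.14159265)
print(f"{bits >> 31:01b} {(bits >> 23) & 0xFF:08b} {bits & 0x7FFFFF:023b}")
print(f"{bits:08X}")
```

The first print shows the three fields separated by spaces, so you can compare them directly against your [s][exponent][mantissa] answer; the second is the same thing in hex.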
|
Haha, I remember FP conversions from back when I did microcontrollers. Learning the conversion is the kind of thing you do once before the exam, then forget for the rest of your life.
|
Yeah! We're using them for microcontrollers/MPs.
|