Ah! Floating point help!!

Raithed

China, 7078 Posts

November 14 2007 03:42 GMT

Anyone have ideas on floating point? I missed class today and found out I had homework. I googled floating point and read through it, but I don't quite understand it. The homework gives the answer, but I want to know how the answer is derived.

3.14 for example.

Would the floating point be 3? I know how to convert that to binary, and binary to hex, but I have no idea how floating point works. Can someone give me a clue?

Something like this; that's one of the sites the homework gave: http://babbage.cs.qc.edu/IEEE-754/Decimal.html But I want to know how it works.

"convert 3.14159265 to 32bit FP"

Help?!

Meta

United States, 6225 Posts

November 14 2007 04:51 GMT

Umm, I was under the impression that floating-point numbers are just numbers with decimal places. The float of 3.14 is 3.14; if you convert that to an integer, you'd get 3. I'm not sure how to convert floating-point numbers into binary/hex, though.

HeadBangaa

United States, 6512 Posts

November 14 2007 05:07 GMT

Floating-point numbers are stored in a special way in binary. The exact format depends on the hardware, but by far the most common is IEEE 754. Your class is probably only gonna make you handle single-precision floats, which are 32 bits long.

Oooh, I found this for ya. Really good:
http://en.wikipedia.org/wiki/IEEE_754

Skip to the "Single Precision" section and start reading.
Do some sample problems using the algorithm detailed there. Use this website to cross-check your answers:
http://www.h-schmidt.net/FloatApplet/IEEE754.html

Once you get the bit representation, you convert to hex/octal/whatever in the usual way.
Also, most real numbers (3.14159265 included) can't be represented exactly in just 32 bits, so the mantissa gets rounded; IEEE 754's default is round-to-nearest, ties-to-even. (This is also why your compiler can warn about possible loss of precision when you're not careful with numerical types/casts, e.g. assigning a double to a float.)
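For cross-checking by machine instead of the applet, here's a minimal Python 3 sketch using the standard-library struct module. `float32_bits` is just a name I made up, and the sketch assumes struct's 'f' format (which narrows Python's double to the nearest single-precision value) is an acceptable stand-in for the hand conversion.

```python
import struct

def float32_bits(x: float) -> int:
    """Hypothetical helper: return the IEEE 754 single-precision bit pattern of x.

    struct.pack('>f', x) rounds the double x to the nearest 32-bit float;
    unpacking those 4 bytes as '>I' reads them back as an unsigned integer.
    """
    return struct.unpack('>I', struct.pack('>f', x))[0]

bits = float32_bits(3.14159265)
print(f"{bits:032b}")   # the raw 32-bit pattern
print(f"0x{bits:08X}")  # → 0x40490FDB
```

Once you have the bits, the hex is just the usual grouping into nibbles, exactly as in the post.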

Raithed

China, 7078 Posts

November 14 2007 05:36 GMT

Oh man, this is intense, INTENSE, alrighty. >_<

HeadBangaa

United States, 6512 Posts

November 14 2007 07:22 GMT

Per PM request:

OK, so there are 3 fields:

The (s)ign = 1 bit
The (e)xponent = 8 bits
The (m)antissa = 23 bits

total = 32 bits
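To see the three fields inside an actual pattern, here's a small Python sketch that pulls them apart with shifts and masks. `split_fields` is a hypothetical helper, and 0x40490FDB is assumed as the input because it is the single-precision pattern for 3.14159265 (the h-schmidt converter linked above will confirm it).

```python
def split_fields(bits: int) -> tuple[int, int, int]:
    """Hypothetical helper: split a 32-bit pattern into (sign, exponent, mantissa)."""
    sign = (bits >> 31) & 0x1        # top bit
    exponent = (bits >> 23) & 0xFF   # next 8 bits (stored with a bias of 127)
    mantissa = bits & 0x7FFFFF       # low 23 bits (the fraction after the implied "1.")
    return sign, exponent, mantissa

# 0x40490FDB is assumed here as the single-precision pattern for 3.14159265
s, e, m = split_fields(0x40490FDB)
print(s, f"{e:08b}", f"{m:023b}")  # → 0 10000000 10010010000111111011011
```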

v = 3.14159265
where:
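v = (-1)^s * 2^(e - 127) * 1.m
(e is the stored exponent, which carries a bias of 127, and 1.m is the mantissa with an implied leading 1 that isn't actually stored)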
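The formula can be checked by evaluating it exactly with Python's fractions.Fraction (my choice, so no float rounding gets in the way). This sketch, with a hypothetical `decode` helper, only covers normal numbers, i.e. stored exponents 1 to 254; zero, subnormals, infinities and NaN follow other rules.

```python
from fractions import Fraction

def decode(sign: int, exponent: int, mantissa: int) -> Fraction:
    """Hypothetical helper: evaluate v = (-1)^s * 2^(e - 127) * 1.m exactly.

    Normal numbers only (1 <= exponent <= 254).
    """
    if not 1 <= exponent <= 254:
        raise ValueError("only normal numbers are handled in this sketch")
    significand = 1 + Fraction(mantissa, 2**23)       # 1.m: the leading 1 is implied
    value = significand * Fraction(2) ** (exponent - 127)
    return -value if sign else value

v = decode(0, 0b10000000, 0b10010010000111111011011)
print(float(v))  # → 3.1415927410125732 (the nearest float to 3.14159265)
```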

OK, let's rock:

-The sign bit is easy: 0 (because positive number)
-Now write out the magnitude in plain UNSIGNED binary (NOT two's complement), using a binary point:
11.001001xxxx

The "11." part is easy; it's just 3 in binary, with the binary point tagged on the end.
For the right-hand (fractional) part, keep multiplying by 2 and use the result of each step for 2 things:
a) the integer part of the result (0 or 1, left of the point) is the next bit you write down
b) the fractional part of the result (right of the point) is what you multiply in the next step.

Start with original number's fractional part:

0.14159265 x 2 = 0.2831853 (fraction-bit #1 = 0)
0.2831853 x 2 = 0.5663706 (fraction-bit #2 = 0)
0.5663706 x 2 = 1.1327412 ("" = 1)
0.1327412 x 2 = 0.2654824 ("" = 0)
0.2654824 x 2 = 0.5309648 ("" = 0)
0.5309648 x 2 = 1.0619296 ("" = 1)
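(keep going until the fractional part becomes exactly 0, or until you have enough bits: single precision keeps 24 significant bits counting the leading 1, so here that's 22 fraction bits after the "11", plus the next bit(s) to decide which way to round).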
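The multiply-by-2 loop, sketched in Python. Using fractions.Fraction (my choice) keeps 0.14159265 exact, so no decimal-to-binary rounding sneaks into the bits the way it can with ordinary floats. `fraction_bits` is a hypothetical name.

```python
from fractions import Fraction

def fraction_bits(frac: Fraction, n: int) -> str:
    """Hypothetical helper: first n binary digits of 0 <= frac < 1 by repeated doubling."""
    bits = []
    for _ in range(n):
        frac *= 2
        bit = 1 if frac >= 1 else 0  # integer part of the result is the next bit
        bits.append(str(bit))
        frac -= bit                  # carry only the fractional part forward
        if frac == 0:                # the expansion terminated early
            break
    return "".join(bits)

print(fraction_bits(Fraction("0.14159265"), 6))  # → 001001
```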

And that's how I get .001001xxxx (tedious to take out to 22+ bits, so you finish it); put together, we have 11.001001xxxx

Now you "normalize" your result by moving the binary point so that exactly one "1" is to its left:
11.001001xxxx = 1.1001001xxxx * 2^1 (shift amount = 1)
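Normalizing just means finding the power of 2 that puts the value between 1 and 2. A minimal sketch, assuming a positive input; `normalize` is a made-up helper name.

```python
from fractions import Fraction

def normalize(x: Fraction) -> tuple[Fraction, int]:
    """Hypothetical helper: return (sig, shift) with x == sig * 2**shift and 1 <= sig < 2."""
    if x <= 0:
        raise ValueError("this sketch assumes x > 0")
    shift = 0
    while x >= 2:   # more than one digit left of the point: move the point left
        x /= 2
        shift += 1
    while x < 1:    # no 1 left of the point yet: move the point right
        x *= 2
        shift -= 1
    return x, shift

sig, shift = normalize(Fraction("3.14159265"))
print(float(sig), shift)  # → 1.570796325 1
```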

OK, now add the bias (127 for single precision) to the shift amount:
1 + 127 = 128 = 10000000
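The bias step as code: a sketch assuming the normal-number range (stored exponent 1 to 254), with `biased_exponent` as a hypothetical helper.

```python
BIAS = 127  # single-precision exponent bias

def biased_exponent(shift: int) -> str:
    """Hypothetical helper: add the bias and return the 8-bit exponent field."""
    field = shift + BIAS
    if not 1 <= field <= 254:
        raise ValueError("shift out of range for a normal single-precision number")
    return f"{field:08b}"

print(biased_exponent(1))  # → 10000000
```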

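Now mash everything together. The leading "1." is implied, so it isn't stored; the mantissa field is just the bits after the point, rounded to 23 bits:
[s][exponent][mantissa] = 0 10000000 1001001xxxx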
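Putting all the steps together, here's a rough end-to-end Python sketch that does the conversion "by hand" (normalize, round the mantissa to 23 bits, add the bias, pack the fields) and then cross-checks against the struct module. Assumptions: round-to-nearest, ties-to-even (IEEE 754's default), normal numbers only, no handling of zero, subnormals or overflow; `encode_float32` is a hypothetical name.

```python
import struct
from fractions import Fraction

def encode_float32(text: str) -> int:
    """Hypothetical helper: hand-encode a decimal string as IEEE 754 single precision.

    Sketch only: normal numbers, round to nearest (ties to even).
    """
    x = Fraction(text)
    if x == 0:
        raise ValueError("zero is not handled in this sketch")
    sign = 1 if x < 0 else 0
    x = abs(x)
    # Normalize so that 1 <= x < 2, counting how far the point moved.
    shift = 0
    while x >= 2:
        x /= 2
        shift += 1
    while x < 1:
        x *= 2
        shift -= 1
    # Scale the bits after the point up to 23 bits; round() on a Fraction rounds half to even.
    mantissa = round((x - 1) * 2**23)
    if mantissa == 2**23:  # rounding carried all the way up: 10.000... so renormalize
        mantissa = 0
        shift += 1
    exponent = shift + 127  # add the bias
    if not 1 <= exponent <= 254:
        raise ValueError("not a normal single-precision number")
    return (sign << 31) | (exponent << 23) | mantissa

bits = encode_float32("3.14159265")
print(f"{bits >> 31} {(bits >> 23) & 0xFF:08b} {bits & 0x7FFFFF:023b}")  # → 0 10000000 10010010000111111011011
print(f"0x{bits:08X}")                                                  # → 0x40490FDB
# Cross-check against the machine's own double-to-float conversion.
assert bits == struct.unpack(">I", struct.pack(">f", 3.14159265))[0]
```

Swap in other decimal strings to check more practice problems.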

ba-zing!

liosama

Australia, 843 Posts

November 21 2007 00:11 GMT

Haha, I remember FP conversions back when I did microcontrollers. The conversion is the kind of thing you learn once before the exam, then forget for the rest of your life.

Raithed

China, 7078 Posts

November 21 2007 05:32 GMT

Yeah! We're using them for microcontrollers/MPs.
