When using this resource, please read the opening post. The Tech Support forum regulars have helped create countless desktop systems without any compensation. The least you can do is provide all of the information required for them to help you properly.
If he can only play BF4 on Medium with a 770 his CPU is bottlenecking.
CPU bottleneck = high CPU usage holding back performance, FPS being equally bad on medium vs high settings, etc. He's probably just running into a VRAM problem. BF4 is the first game I've seen where you have to be cautious with 2GB of VRAM at 1920x1080/1920x1200 (with the exception of some moddable games) - I ran out of VRAM on 64 player quite quickly until I lowered textures and maybe another setting or two a little, even without using AA.
-----------------------
Benching shadowplay ATM, got a few interesting observations. I came to realize though that the smallish stutter in the SC2 engine happens exactly on every 10 second mark on the game timer, at least with lots of units around. It seems maybe bigger too with high supplies, though I'm not sure. I don't think I could grab it with fraps, but I might be able to with shadowplay. It might be pretty hard to show on youtube too, I'll try. Actually, I could probably just run a shadowplay video at half speed, fraps it, re-encode and upload the video with all frames intact at half speed - slow-mo would probably help to see it if it's shown.
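(If re-capturing with fraps turns out to be a pain, something like ffmpeg should be able to do the slowdown directly from the recorded file - assuming it reads the shadowplay output - e.g. ffmpeg -i shadowplay_clip.mp4 -filter:v "setpts=2.0*PTS" -r 30 -an halfspeed.mp4 doubles every frame's timestamp so a 60fps clip plays back at half speed with every frame kept. The filenames are just placeholders and audio gets dropped.)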
On November 10 2013 07:45 Incognoto wrote: Well, yeah we hypothetically assume that our case can keep everything cool enough and that the card's cooler is also capable. The voltage is unlocked. Such a card is going to be more expensive than a reference PCB with an OK-but-not-great cooler.
I'm just wondering out loud how much extra it's worth paying to get such a card.
All the R9 280x cards are using custom PCBs afaik. ASUS, Gigabyte, and the Sapphire Toxic use an 8+2 power phase design while HIS and XFX use 6+2. Only MSI uses the reference 5+2 design, but its PCB is still custom. ASUS is apparently already running at a higher voltage, so it being voltage locked doesn't necessarily matter? There are a lot of mixed reports on whether a card is voltage locked or unlocked lol, but I think the general consensus is that the HIS, XFX, and ASUS Matrix are unlocked.
In theory the Gigabyte card can clock the highest. Gigabyte uses the 60A IR3553B power phases on all its GPUs (well, I don't know about entry level cards). They are slightly more efficient than what the other boards use. So even though all the cards can pull the same power - say a GPU like the 780 Ti with its 265w board limit (106% limit) - the Gigabyte card has slightly higher usable power. In practice though it doesn't really help clock-wise, just slightly lower power consumption.
Then there are the Classified cards from EVGA. They are completely unlocked once you hit a switch on the PCB.
I'm pretty sure you're mixing up IR components: IR3550 = 60A, IR3553 = 40A, IR3563B(?) = a PWM controller.
What components do the other cards use? TI NexFETs (for example) aren't inferior to IR powerstages.
If he can only play BF4 on Medium with a 770 his CPU is bottlenecking.
CPU bottleneck = high CPU usage holding back performance, FPS being equally bad on med vs high settings etc - he's just running into VRAM problem probably. Bf4 is the first game i've seen where you have to be cautious with 2gb of VRAM on 1920x1080/1920x1200, with the exception of some moddable games, but i ran out of VRAM on 64 player quite quickly
That's why it runs no problem with a 1GB HD6870...
He says he can run bf4 on med settings fine, but when he puts it up to ultra it starts lagging like hell.
You said he can run it at medium fine, you can run it at medium fine. I don't see what the problem is?
He can likely hold much of his performance going to max settings without AA (or with fxaa) with textures and/or a few other settings turned down a few notches (the right ones to take vram usage down - if he watches his vram and experiments)
You're not running max settings with AA "no problem with a 1gb HD6870" and he doesn't get issues til he tries, from what you've said.
Learn to read what? What's your problem? On the subject of power btw, just to be sure: my 770 at 1293mhz pulls ~185w in BF4 or games in general (Unigine Heaven is very close to most games) according to the GPU Boost sensor.
If a 780 can pull ~350w with nothing but the pci-e slot and 6+8 connectors, I wouldn't imagine any major problems on a stock 770 (0.66x as many cores, at lower voltage) with dual 6-pins and the pci-e slot.
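(For reference on connector capacity: by spec that's 75w from the pci-e slot plus 75w per 6-pin connector = 225w available to a dual 6-pin card, so ~185w measured leaves a fair bit of headroom. The 75w figures are the PCI-E spec ratings, not something measured here.)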
Currently, tools like OpenGL and DirectX are "abstraction layers", meaning they expose their own API built around features that all graphics cards have in common. Abstraction layers are never 100% efficient for a given piece of hardware, because the basis of an abstraction layer is to make assumptions so that it's easier to develop with. Something like Mantle is lower level, meaning you have to have a better understanding of both how the hardware works and exactly what you need from the card, which lets you write faster code at the expense of simplicity. While I haven't looked at the specifics of the Mantle API, I imagine a similar comparison would be C versus assembly language: a C program that is 10 lines and very simple to code would be, for example, 50 lines and very complicated if written in assembly. However, the assembly version would run faster because you can make optimizations and assumptions that a compiler can't!
This isn't really a simple explanation, but I tried to write it so that it can be read without being an experienced programmer!
TL;DR abstraction layers like OpenGL/DirectX are farther from the hardware and make assumptions so that code is easier to write. Mantle is really close to the hardware, so it's harder to write but runs much faster.
We'll know more once it's released to the public. AMD is claiming 9x more draw calls/second, but it won't be an order of magnitude faster for most applications.
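To make the overhead argument a bit more concrete, here's a toy model in C. None of these names are real DirectX/OpenGL/Mantle calls and the numbers are invented - it only shows the shape of the argument, i.e. where per-draw-call CPU cost comes from:

/* Toy model: an "abstraction layer" pays a fixed validation cost on every
 * draw call because it can't assume anything about your state or hardware;
 * a lower-level path validates once and then only records commands.
 * All names and costs here are made up for illustration. */
#include <stdio.h>

#define VALIDATE_COST 50   /* pretend units of CPU work to re-check state */
#define RECORD_COST    2   /* pretend units to append one draw command    */

static long abstraction_layer_frame(int draws)
{
    long work = 0;
    for (int i = 0; i < draws; ++i)
        work += VALIDATE_COST + RECORD_COST;   /* driver re-validates each call */
    return work;
}

static long low_level_frame(int draws)
{
    long work = VALIDATE_COST;                 /* application validates once */
    for (int i = 0; i < draws; ++i)
        work += RECORD_COST;                   /* then just records commands */
    return work;
}

int main(void)
{
    int draws = 10000;                         /* draw calls in one frame */
    printf("abstraction layer: %ld units of CPU work\n", abstraction_layer_frame(draws));
    printf("low level:         %ld units of CPU work\n", low_level_frame(draws));
    return 0;
}

In this toy the abstraction-layer path ends up doing roughly 26x the CPU work per frame. The real ratio depends entirely on the actual driver and API - this is only the idea behind claims like "more draw calls per second".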
Uploading a couple of videos now, will edit the post to include pictures and info (shadowplay). It will take a while because of my internet - takes me ~78 minutes at this bitrate and upload rate to show 1 minute of content. That's the price to pay with beta software when the only button is "go!"
^This works out to ~17.7% higher 99th percentile FPS (and ~11.25% higher average) if you disable shadowplay - a notable hit, but much less than other methods. Some of the performance is taken from the higher-end FPS though, which pulls at least the average down without perceptibly changing the experience, because the low end was consistently well below that level of performance anyway.
*game crashed when i started third and there's slight variance in results so i didn't re-open
^There's not a chance in hell that i could tell you if it was running or not in a blind test. Invisible on the FPS meter and i don't seem able to feel anything differently (at least so far).
All recording done with 1080p60 manual recording - there's no way to change resolution, framerate or bitrate/quality, while shadow mode does not work for me right now at least in sc2. Yea, performance apparently went up.
The >BIGGEST< thing i got curious about again and pinned down though, is this:
The stutter in sc2 engine coincides exactly with every 10 seconds on the game timer. It doesn't always happen unless there's quite a lot of supply on the map i think - but it's like clockwork. You can increase or decrease the game speed, but boom, it'll happen at 20:10, 20:20, 20:30, 20:40.
^Above you see two different shots benchmarked at the same point of the game: one on a 4770k @4.6ghz + 770, one on a 4770k @4.5ghz + 7950, HT off on both I think. The lower-FPS one has physics and reflections on and effects maxed, while the other one does not (physics + reflections off, effects medium); both are otherwise maxed @1920x1080. The CPU-side settings weren't enough to change performance as much as they maybe could have (because not much stuff was dying at the time of recording), but that's what explains the performance difference.
^On two completely different systems. It's even visible, very much so, when recording with Shadowplay. I'm uploading a video to show that too, at half speed so that you can see every frame of a 60fps video. It's extremely jarring and visible.**
Crashed and lost upload. Damn, first time in weeks and not sure why
**Should be able to see it on the original vid. I can't record a video without the recording method itself stuttering unless I use a lower resolution than 1080, and it seems difficult to make it look at all decent without the video taking half an hour to upload. If I see an easier way to get decent results I might still do that, depending on how the original video turns out after youtube re-encodes it and cuts half of the FPS.
4770k @4.6 core, 4.0 uncore, RAM @2400 10-12-13-31-2t, GPU = 770 @~1241, 7600 (forgot to put core offset back on) though first test is partially GPU bound i think, the lower FPS one is nowhere close. All on Cyro vision/cam with follow on and main base building selected (hatchery then lair/hive)
Wow, murdered quality. It's pretty damn close to lossless on the desktop, though the only option for quality uses ~55mbits for such a scene @1080p60 (highest i've ever seen.. tons of zerglings, health bars etc)
You can't really see the stutter there; maybe I'll upload a vid at half speed after all. Look out for it at the 10 second marks on the timer.
^Seems harder to see on youtube than ingame, or even on the recorded video, but it's still visible on both of those vids and all of the graphs above
Please post thoughts on this! Takes a surprisingly long amount of time to benchmark stuff properly
Things like that I would usually blame on how the program is written. It looks like what you would expect if a program uses a language that provides garbage collection for memory management. The garbage collector might run more often when a lot of things change, as in a lot of units running around the map and getting killed. I don't think the programmers would make such a mistake in a game.
I think I see the same you see with fraps and that viewer program. Here's about two minutes around the twenty minute mark of your replay with the exact same settings: http://i.imgur.com/E5oGBhb.png
If I zoom in to a time window of the same size as in your screenshots, it seems to be pretty similar, but I don't think it's fixed 10 seconds for that one frame stutter.
The system might get tripped up keeping various things in sync regarding the game engine and the graphics driver and graphics hardware and the current point in time.
You might want to go into the BIOS and disable something called "high precision event timer" or "HPET".
Alternatively... do the reverse. Force Windows to use HPET all the time. You do this through the command:
bcdedit /set useplatformclock yes
You will then see it under the Windows boot loader entry in "bcdedit". You can remove the setting through this:
bcdedit /deletevalue useplatformclock
Here's how it looks on Windows 8 without touching anything: + Show Spoiler +
The interesting thing is the frequency at the very top. Those 3 MHz are not HPET. But HPET is definitely enabled and running on the PC - the device shows up in the device manager. Windows 8 seems to straight up ignore it for normal operations.
The other interesting thing is the performance of "QueryPerformanceCounter()". This is so fast because it's not using some external hardware on the board. It's instead simply using an internal CPU register that counts with the clock cycle. This is what might be different on Windows 7.
The frequency will be at about 14MHz if HPET is being used, instead of 3MHz. Those 3MHz are some old clock (probably the ACPI PM timer, which runs at about 3.58MHz).
A call to QueryPerformanceCounter() will take over 1 microsecond instead of just those 6 nanoseconds if HPET is being used for it. This is what you check to see if that "useplatformclock" boot option is in effect.
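If you'd rather check programmatically than eyeball a tool, a minimal Win32 C program like this prints both numbers. QueryPerformanceFrequency() and QueryPerformanceCounter() are the real API calls; the ~3MHz vs ~14MHz and ~6ns vs >1 microsecond thresholds are just the values quoted above, so treat the output as a rough indicator:

/* Prints the QueryPerformanceCounter frequency and roughly how long one call
 * takes, to see whether the TSC-backed source or HPET ("useplatformclock")
 * is in use. Build with any Windows C compiler. */
#include <windows.h>
#include <stdio.h>

int main(void)
{
    LARGE_INTEGER freq, start, end, dummy;
    QueryPerformanceFrequency(&freq);
    printf("QPC frequency: %lld Hz\n", (long long)freq.QuadPart);

    const int calls = 1000000;
    QueryPerformanceCounter(&start);
    for (int i = 0; i < calls; ++i)
        QueryPerformanceCounter(&dummy);       /* the call being timed */
    QueryPerformanceCounter(&end);

    double seconds = (double)(end.QuadPart - start.QuadPart) / (double)freq.QuadPart;
    printf("~%.0f ns per QueryPerformanceCounter() call\n", seconds * 1e9 / calls);
    return 0;
}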
There's a post by a Linux kernel programmer somewhere answering someone about HPET and TSC. The current Linux kernel also ignores HPET for timing questions if TSC works correctly on the machine. This is the same behavior Microsoft introduced with Windows 8. TSC is that internal CPU register; it counts clock cycle ticks since the CPU was turned on. This timing method got broken in the past when multi-core CPUs and power saving features were introduced, but it's fixed on current CPUs.
If you search around, there's a lot of discussions about HPET and useplatformclock and whatnot. There are suggestions for both, people saying there's latency with one or the other, there's stutter, how to get SLI to run smooth, etc.
If I zoom in to a time window of the same size as in your screenshots, it seems to be pretty similar, but I don't think it's fixed 10 seconds for that one frame stutter.
I get them only exactly on every 10 second mark on the ingame timer (20:00, 20:10, 20:20, 20:30, 20:40), consistently - I counted like 20 stutters in a row hitting such marks on the timer, and got them on faster x2 and also on normal speed, coinciding with exactly the same times on the timer. I also grabbed a shot from gumbi on a different system, same replay, same times, and it lined up exactly too, so I think that was pretty conclusive.
If I zoom in to a time window of the same size as in your screenshots, it seems to be pretty similar, but I don't think it's fixed 10 seconds for that one frame stutter.
There's other stuttering, like slight resource stalls, that shows up if you ctrl+alt+f. My shots were taken after x8ing through the replay on no cam, then going back to the start, seeking to the end, then going to the cam/vision I was going to use, playing through the part I wanted, then skipping back to a bit before it to start the benchmark. You've gotta be pretty thorough to make SC2 load stuff properly, because the engine seems to load stuff in a really minimalistic way - probably for memory footprint, as SC2 can't use more than ~2047mb of RAM and hits that if you keep the game open (unloading stuff afaik), but it doesn't like to load things in the first place.
I wonder how much people would pay Blizzard for a revised engine with some stuff fixed. Better scaling across multiple CPU cores (with perfect scaling you're talking like 2.5x performance on a fast quad core, or over 5x on an 8-core FX - perfect is unrealistic, but anything would be nice for basically everyone), the ability to address >2gb of RAM so they can add an option to fully load the game and remove the annoying stuttering associated with that*, and a dx11 option like they did with WoW so that we can capture the game with less performance hit for streaming - those would be nice announcements :D but I'm not sure if they're even aware of anything being "wrong" with the engine.

Oh yea, frametimes closer together, that's a big one, because the FPS meter is so inflated. 1/3 of my frames take twice as long as the other 2/3 of them, which means that if the faster frames are @100fps and the slower ones at 50fps, the FPS meter would read in the 80's while the actual experience would be much, much closer to 50fps. The SC2 engine is one where you can feel the difference between 100fps and 150fps (on the meters), and that's pretty scary considering you're dropping sub-60 with an overclocked Haswell in 1v1.
*They talked a few times about being unable to add new skins/animations etc because of being backed into a RAM wall
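To put numbers on the frametime point above, here's the arithmetic as a small C program. The 10ms/10ms/20ms pattern is just an illustration of "1/3 of frames take twice as long as the other 2/3", not measured data, and the "meter-style" average assumes a meter that averages instantaneous 1/frametime:

/* Compares a meter-style average (mean of 1/frametime) with frames divided
 * by total elapsed time, for a hypothetical 10ms/10ms/20ms frame pattern. */
#include <stdio.h>

int main(void)
{
    const double frame_ms[] = { 10.0, 10.0, 20.0 };   /* hypothetical pattern */
    const int n = 3;

    double sum_ms = 0.0, sum_inst_fps = 0.0;
    for (int i = 0; i < n; ++i) {
        sum_ms += frame_ms[i];
        sum_inst_fps += 1000.0 / frame_ms[i];
    }

    printf("meter-style average: %.1f fps\n", sum_inst_fps / n);      /* ~83.3 */
    printf("frames / total time: %.1f fps\n", n * 1000.0 / sum_ms);   /* 75.0  */
    return 0;
}

Either way you count it, the number on the meter sits well above the 50fps pace of the slow frames, which is why it feels worse than it reads.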
Since you benchmarked the same; what were your CPU/RAM settings? Here's my benchmark on the same display type and scale vs yours:
The times where they start etc. are different, but you can see the frametimes getting slower with the baneling hits etc. on both.
It's a GTX 560 Ti 1GB somewhat overclocked on v331.65 drivers, 3570k at 4.8GHz with offset voltage and everything about power saving enabled, RAM on its XMP profile which is 1866MHz with not-so-good timings. The screen is 1920x1200, AA is off, VSync off, game is fullscreen, "max pre-rendered frames" at 1 in the nvidia settings.
From what I remember, I think I ran through the replay at x8, but not to the very end. I then backtracked a little to about 19:50 in the replay, and that's when I had fraps start record its data. I don't know what all those extra spikes are. I think that part still had everything already loaded. I had the view following "Cyro" and there were some lings scratching cannons and banelings rolling over marines and towards some PF.
I don't think high fps is that unrealistic. I bet it could be done by intentionally making everything a little broken. If programmers were allowed to let the rules be slightly crappy when there are large groups and a lot of units collide with each other, that should open up options for optimizing algorithms.
Parallel programming would risk being a waste of time. Basically, I instead want the base to run fine on a 233MHz Pentium. Like in old games where units existed on a grid because the CPU was too slow to do it better.
I don't know what's up with SC2 regarding that. If you seek in a replay, the screen is black but it still goes pretty slow. Perhaps the game still prepares a lot of things that are needed for displaying the units even if the screen stays black, and the base of the engine might already be fast.
If things are only held back by preparing graphics, using more cores might be possible. But if the game engine without graphics is already slow, I don't think that's easy.
I don't know what's up with SC2 regarding that. If you seek in a replay, the screen is black but it still goes pretty slow.
The entire game is recreated using only the commands in the replay file - pathing, collision, combat and all. The game doesn't even know which units have died or their health values etc. without doing that. It'll pin a CPU thread @100% when seeking IIRC.
Interesting to see my performance stronger than yours across the board though. A bench at min settings would be better to take the GPU out of the equation, because my GPU load was actually considerable at times (~40% peaks in the second half of the game).
Same driver, i set max pre-rendered frames to program controlled for sc2.exe a while ago, 1920x1200 though.. same here for screen, but using in sc2, ew
Apparently there are game developers who are going to start using it. It's going to be very exciting to see a game with Mantle, this is very good for AMD. I feel very happy with my choice of buying the 7970 now.
Such late. Haha I'm like the slowbro meme. well regardless i'm still happy. :p
I've been playing around with very light overclocks on my 7970. Got it from 950 MHz to 1050 MHz no problem, temperatures seem all right, I'm still at stock volts, I haven't touched memory clock. I've been using Unigine Heaven to bench the card.
Isn't it a bit silly that the card would get a much higher score with Tesselation off than on? It's as if the score only reflects FPS.
I benched it with extreme settings @ 1600x900 with tess on and then did the exact same test with tess off (1050 MHz, i'll go higher when I have more time). Well, scores are 1123 (tess on) vs 1798 (tess off).
Tessellation vastly increases the work the GPU has to do to render one frame, so you'll have way lower FPS and scores with it on. People bench Heaven with the 1600x900 extreme preset, or manually set to max at 1920x1080 with 8x AA (I think that's OCN rules).
I have ~53fps with stable settings on the 1600x900 preset (which is extreme) - what are you looking at @1050mhz with stock memory?
On November 11 2013 21:29 Gumbi wrote: What model 7970 you have, do you have unlocked voltage, and what are your maximum temps on core and VRMs after a run of Valley/Heaven?
SC2 engine FTW.
Sapphire 11197-03-40G, with voltage unlocked. I haven't touched voltages yet however. Afterburner doesn't give VRM temps (I'm sure another program does, though). Core temp after a run at 1050 MHz gave me 63°C, max. Ambient air temp is probably around 20°C and I have a Design Core 1000 with a single case fan.
I get 44.6 FPS @1050 MHz and stock memory clock / voltage (extreme preset). I'm not really looking for big numbers, just want to get a feel for OC. ~53 fps is a lot compared to mine, but then my card isn't being pushed. Though I've heard Tahiti gets unstable at temperatures over 65-70°C. Doesn't matter, at least I have a reference point, so thanks for that. ^^
My point in my last post was that I found it curious that the score doesn't reflect whether tessellation is on or off. A GPU will work way harder with tess on than off, as you said; the thing is that the score doesn't account for that at all. Just thinking out loud really. (my fps/score hits 198/4987 with everything @ lowest settings xD)
"Score" in unigine is just another way of saying FPS that accounts for minimum and max in some weird way so nobody pays attention to it because it's more variable than the average FPS reading, AFAIK. Getting a 1k score on 1920x1080 is waay better than getting a 1.5k score on 960x540, same as 10fps on 1920x1080 is way better than 15fps on 960x540