AMD A10 Kaveri APU Details Leaked

This post has 22 Replies | 3 Followers

Top 10 Contributor
Posts 26,699
Points 1,207,610
Joined: Sep 2007
ForumsAdministrator
News Posted: Mon, Dec 2 2013 1:05 PM
There's a great deal riding on the launch of AMD's next-generation Kaveri APU. The new chip will be the first CPU to incorporate significant architectural changes to the Bulldozer CPU AMD launched two years ago and the first chip to use a graphics core derived from AMD's GCN (Graphics Core Next) architecture, which debuted on the desktop two years ago as well. A strong Kaveri launch could give AMD back some momentum in the enthusiast business -- and that's something the company could use.

Now, leaked slides point to a Kaveri APU that's coming in hot -- possibly a little hotter than AMD anticipated.



Kaveri's Steamroller CPU core separates some of the core functions that Bulldozer unified and should substantially improve the chip's front-end execution. Unlike Piledriver, which could only decode four instructions per module per cycle (and topped out at eight instructions for a quad-core APU), Steamroller can decode four instructions per core per cycle, or 16 instructions for a quad-core APU. That's not going to double performance in and of itself, but it should offer a significant uplift.

Unfortunately, AMD's new CPU comes with a significant clock loss. The A10-7850K reportedly has a base clock of 3.7GHz and a maximum Turbo Mode of 4GHz compared to a 4.1GHz base / 4.4GHz Turbo for the A10-6800K. Cutting the clock speed by 10% may bring the chip into the desired power envelope, but it would also explain why we've heard reports that Kaveri's CPU improvement isn't very good. If Kaveri outperforms Piledriver by, say, 15-20% clock-for-clock, but then drops its clock speed by 10%, most of the gain evaporates.
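
To put rough numbers on that trade-off, here is a minimal Python sketch. It assumes performance scales linearly with clock speed and uses the leaked base clocks quoted above. The function name is only illustrative, and the 15-20% clock-for-clock figure is the hypothetical from this paragraph, not a measured result.

```python
def net_speedup(per_clock_gain: float, old_clock_ghz: float, new_clock_ghz: float) -> float:
    """Overall speedup when a per-clock gain is combined with a clock change.

    Assumes performance is proportional to (work per clock) x (clock speed).
    """
    return (1.0 + per_clock_gain) * (new_clock_ghz / old_clock_ghz)


# Leaked base clocks: A10-6800K at 4.1GHz, A10-7850K at 3.7GHz.
OLD_BASE_GHZ = 4.1
NEW_BASE_GHZ = 3.7

for per_clock_gain in (0.15, 0.20):  # hypothetical clock-for-clock improvements
    net = net_speedup(per_clock_gain, OLD_BASE_GHZ, NEW_BASE_GHZ)
    print(f"{per_clock_gain:.0%} per clock -> {(net - 1.0) * 100:+.1f}% net")
```

Under those assumptions, a 15-20% clock-for-clock improvement shrinks to a net gain of roughly 4-8% at base clocks.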

The A10-7850K will offer a 512-core GPU, while the A10-7700K will be a 384-core part. Again, GPU clock speeds have come down, from 844MHz on the A10-6800K to 720MHz on the new A10-7850K, a clock hit of about 15%. This should be more than offset by the move from the VLIW4 architecture used in previous parts to GCN and by a vastly wider core: the top-end part should improve graphics performance over Trinity/Richland by 15-20% depending on the game, while the A10-7700K should be modestly faster than the previous hardware. The new GPUs will support Mantle and AMD's TrueAudio standard.
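
As a back-of-the-envelope check on the raw GPU numbers, here is a short Python sketch. It assumes the A10-6800K's GPU has 384 shader cores (that figure is not in the leak) and counts two floating-point operations per core per clock, i.e. one multiply-add. Peak throughput ignores the architectural differences between VLIW4 and GCN, so treat the result as a rough bound rather than a performance prediction.

```python
def peak_gflops(shader_cores: int, clock_mhz: float) -> float:
    """Theoretical single-precision peak, assuming 2 FLOPs per core per clock."""
    return 2 * shader_cores * clock_mhz / 1000.0


# Core counts and clocks: the 7850K figures come from the leak;
# the 384-core count for the 6800K is an assumption added here.
a10_7850k = peak_gflops(shader_cores=512, clock_mhz=720)
a10_6800k = peak_gflops(shader_cores=384, clock_mhz=844)

print(f"A10-7850K: {a10_7850k:.0f} GFLOPS peak")
print(f"A10-6800K: {a10_6800k:.0f} GFLOPS peak")
print(f"ratio:     {a10_7850k / a10_6800k:.2f}x")
```

On paper the wider, slower GPU comes out roughly 14% ahead, which is in the same neighborhood as the 15-20% estimate above.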

If the chip overclocks well, the A10-7850K could give AMD a serious improvement over previous parts, though it's not going to close the gap dramatically with Haswell. Current indications are that the A10-7850K is a 95W TDP. That implies there's going to be a little headroom for TDP, but not a huge amount. 

One thing we don't know yet is whether Kaveri will support the new Crossfire DMA engine that improves the R9 290X and R9 290's performance in multi-GPU mode. Kaveri could be an ideal dual graphics solution if frame pacing is improved across the asymmetric GPU configuration.
  • | Post Points: 80
Not Ranked
Posts 41
Points 355
Joined: Sep 2012

"Unfortunately, AMD's new CPU comes with a significant clock loss."

You have to remember that it's quite a different architecture (both CPU and GPU) than that of the A10-6800K. The A10-6800K was just an overclocked Trinity. Much like during the Trinity launch, clock speeds start low. As time goes on, yields get a bit better, and they can push out slightly higher clocked chips.

  • | Post Points: 35
Not Ranked
Posts 7
Points 95
Joined: Dec 2013

I agree. This one article actually flies in the face of many more out there which "show" a 20% CPU performance increase and a 30% GPU performance increase. I honestly think AMD is being conservative here.

What he also failed to directly mention is that with Kaveri AMD have developed their own version of Hyper Threading. You'll notice on some WCCFTech slides that Kaveri is a 4-core/8-thread chip. So you'll get a very decent iGPU, actually the best to date, with a quad core that could handle 8 threads at near FX-8350 performance levels. I believe, based on the many articles I've read, AMD's slides, and the performance demonstrations at APU13, that Kaveri will be on par with an Ivy Bridge i7 depending on HSA/OpenCL support.

Yes, that's still a generation behind in terms of raw CPU performance, but keep in mind this thing will have THE BEST iGPU combined with Mantle and True Audio. Mantle/HSA will leverage the iGPU, and an R7-260X if you'd like, to offload not just the more parallel workloads, but also the floating point math from the CPU cores to the GPU cores, which helps isolate the APU's strengths. Not to mention each part will be an equal citizen when it comes to RAM utilization so we'll actually be able to start utilizing RAM efficiently for the first time this decade! I'm thoroughly excited for Kaveri because it'll be a benchmark in processing history if HSA takes off.

What I really appreciate is AMD has nearly completely embraced the Open Source community. They don't have half the resources Intel or nVidia have, yet they're right there in terms of commitment and support of the Linux Kernel. The FX-8350 was a monster under Linux, actually it matched the i7-3770, and Kaveri is slated to be even better. I just hope the Wintel..err I mean Windows at least tries to fully utilize Kaveri. It's sad when a single partnership can stranglehold an entire industry.

 

Chris

Top 100 Contributor
Posts 1,081
Points 11,700
Joined: Jul 2009
Joel H replied on Mon, Dec 2 2013 9:08 PM

"This one article actually flies in the face of many more out there which "show" a 20% CPU performance increase and a 30% GPU performance increase."

Kaveri is clocked 10% below Richland. That means that if Kaveri is 20% faster than Richland clock for clock, you wind up with a net performance gain of only about 8-10% (1.20 x 0.90 = 1.08). The idea that Kaveri would pick up more than 20% on Richland is not backed up by the changes AMD has made to the architecture or an evaluation of the Bulldozer core's biggest problem.

Kaveri does not increase the L1 data cache size. It does not increase the set associativity of the L1 cache. The L1 is still write-through, which means the L1's perf is still going to be tied to the L2's performance. It *does* increase the size of the L1 instruction cache and improve the chip's branch prediction, but Bulldozer's branch prediction was weak to start with. 

2). "with Kaveri AMD have developed their own version of Hyper Threading. "

Incorrect. AMD has never used Hyper-Threading. Bulldozer and Piledriver/Richland use CMT, or Clustered Multi-Threading. This is nothing like Hyper-Threading.

Hyper-Threading interleaves instructions from two separate threads into the same execution cycle. It does not add execution units. These diagrams are old, but explain the basic difference: First, here's CMT, which is what AMD uses. Ignore the CMP_CMT diagram on the right, it's not accurate for this purpose.

http://farm3.staticflickr.com/2496/3908957858_b3766c3538.jpg

Now, here's Hyper-Threading: http://archive.arstechnica.com/paedia/images/figure-4.html

Note that execution units are duplicated in the first diagram, but not in the second.

"that Kaveri is a 4-core/8-thread chip."

Kaveri is a two-module / four-core chip. Bulldozer/Piledriver parts lose about 20% of their possible performance when running two threads per module, as opposed to running with one thread per module. There are no plans to launch a quad-module / eight-core Kaveri.

Kaveri will be marketed as a quad-core processor. It is not an eight-thread chip.

See here. Four cores: http://images.bit-tech.net/news_images/2013/11/amd-kaveri-apu-details-and-release-date-ann/amd-kaveri-2-1920x1080.jpg

"Kaveri will be on par with an Ivy Bridge i7. "

No, it won't. AMD has already stated that they cannot go head to head with Intel on this one. Kaveri would need to improve on Richland's single-threaded performance by 50-75% in order to do that. The best-case for AMD is that Kaveri offers Shanghai / Deneb levels of single-threaded performance. This would be 20% higher than their current rate.

  • | Post Points: 5
Not Ranked
Posts 80
Points 815
Joined: Jun 2013
basroil replied on Mon, Dec 2 2013 9:50 PM

"The FX-8350 was a monster under Linux, actually it matched the i7-3770, and Kaveri is slated to be even better"

hwbot.org/benchmark/multi_core_linpack/

Chris, the world would like to have a word with that nonsense post. Clock for clock the 3770 is almost 50% faster, and even the fastest 8350 is ~14s while a 3770 at only 70% of the clock speed gets ~10.5s. Saying the 8350 matched the 3770 is like saying your turbocharged Ford Focus matched a stock Ferrari: not even a close match when both are tuned to optimal levels.

  • | Post Points: 65
Top 100 Contributor
Posts 1,081
Points 11,700
Joined: Jul 2009
Joel H replied on Mon, Dec 2 2013 10:13 PM

He's just subscribing to the same tired theory that somehow, some combination of Linux and magic pixie dust makes the FX family fly. I reviewed the FX-9590 for a different website in a comprehensive gaming match-up between it and the Core i7-4960X. In high-end gaming, the two chips are actually fairly competitive -- Intel does better equipped with the 7990 than the AMD does; the gap is closer when tested with the single-GPU 290X. 

But that's gaming. And of course, it doesn't compare power consumption. In workstation applications or high-end content creation, the FX family struggles, even at 5GHz. 

  • | Post Points: 20
Top 100 Contributor
Posts 1,081
Points 11,700
Joined: Jul 2009
Joel H replied on Mon, Dec 2 2013 10:18 PM

To illustrate how great the gap is: 

In the latest, just-released version of Cinebench 15, the FX-9590 scores 115cb single-thread and 727 in the multi-threaded test. The Core i7-4770K scores 165cb single-threaded and 822 multi-threaded. 

That's a single-thread performance gap of 43%, and that's before we touch the fact that the Intel chip is running at 4GHz while the AMD core is running at 5GHz for the single-thread test. Adjust for clock speed, and the clock-for-clock gap between Intel and AMD is 1.79x. An Intel core is literally nearly twice the speed of an AMD core. 

The multi-threaded gap is much smaller at 13% unadjusted, 30% adjusted. But even if Kaveri improved single-core performance by 30% and simultaneously boosted multi-core scaling (which it should), it cannot close the gap with Ivy Bridge or Haswell. The gap is too large. The very top-end, best-case scenario was that Kaveri would be able to match Sandy Bridge performance. I no longer expect this, and am hoping that it can manage to match Thuban. 
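
The single-thread arithmetic above can be reproduced with a few lines of Python. This is only a sketch of the clock-normalization step, using the Cinebench 15 scores and test clocks quoted in this post; the helper name is made up for illustration.

```python
def gap(fast: float, slow: float) -> float:
    """How many times faster `fast` is than `slow`."""
    return fast / slow


# Cinebench 15 single-thread scores and the clocks each chip ran at in that test.
INTEL_SCORE, INTEL_GHZ = 165.0, 4.0  # Core i7-4770K
AMD_SCORE, AMD_GHZ = 115.0, 5.0      # FX-9590

raw_gap = gap(INTEL_SCORE, AMD_SCORE)
# Normalize each score to points per GHz before comparing.
per_clock_gap = gap(INTEL_SCORE / INTEL_GHZ, AMD_SCORE / AMD_GHZ)

print(f"raw single-thread gap: {raw_gap:.2f}x")
print(f"clock-for-clock gap:   {per_clock_gap:.2f}x")
```

That gives about 1.43x unadjusted and 1.79x once both scores are divided by clock speed.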

  • | Post Points: 5
Top 100 Contributor
Posts 1,081
Points 11,700
Joined: Jul 2009
Joel H replied on Mon, Dec 2 2013 10:21 PM

It's not "quite a different architecture." Architectural details here: 

http://cdn3.wccftech.com/wp-content/uploads/2013/07/AMD-Steamroller-vs-Bulldozer.jpg

It is a modest tweak to Bulldozer that hopefully addresses some of Bulldozer's greatest deficiencies. 

  • | Post Points: 5
Top 10 Contributor
Posts 8,756
Points 104,935
Joined: Apr 2009
Location: Shenandoah Valley, Virginia
MembershipAdministrator
Moderator

I'm willing to see how they perform upon release. I can wait to see as I'm in no rush to buy.

The A10-6800 is pretty good for what it costs and I could make do with one of them if these are a bust. :)

If these new A series APUs do well, then maybe I'll go for one of them instead.

Dogs are great judges of character, and if your dog doesn't like somebody being around, you shouldn't trust them.

  • | Post Points: 20
Not Ranked
Posts 7
Points 95
Joined: Dec 2013

-I’m going to try and respond to everyone in one post –

http://www.phoronix.com/scan.php?pag..._vishera&num=1

To Quote Michael Larabel at Phoronix:

"From the initial testing of the brand new AMD FX-8350 "Vishera", the performance was admirable, especially compared to last year's bit of a troubled start with the AMD FX Bulldozer processors.

For many of the Linux computational benchmarks carried out in this article, the AMD FX-8350 proved to be competitive with the Intel Core i7 3770K "Ivy Bridge" processor. Seeing the FX-8350 compete with the i7-3770K at stock speeds in so many benchmarks was rather a surprise since the Windows-focused AMD marketing crew was more expecting this new high-end processor to be head-to-head competition for the Intel Core i5 3570K on Microsoft's operating system. "

So yes, according to Phoronix under Linux the FX-8350 was competitive with the i7-3770(K) since Linux was further optimized to use AMD's architecture. Unfortunately Windows has always favored Intel which is where the “Wintel” pet name comes from.

"Kaveri + Hyper Threading" When I said Hyper Threading I was using it as a general term since most people do not understand the different architectures and think of core multi-threading as "Hyper Threading" since Intel has been far more successful with it's endeavors in this regard, so more people are familiar with that term. I should have explained it differently. Yes I know about CMT but I appreciate you sharing that info rather than simply blasting me. I never said it'd be an 8 core chip however. Based on a slide within an article I read it's supposed to be 2 modules, 4 cores, and handle 8 threads. AMD made many changes to its Steamroller core so it's now "SteamrollerB". I haven't found much information on the exact differences, but see the following article which illustrates that it is a 4-core/8-threaded chip. ->http://wccftech.com/amd-announces-a107850k-kaveri-apu-specifications-architectural-details-launching-14th-january-512-gcn-cores-28nm-steamroller/   [correction - this article was updated redacting the "4/8" on the Kaveri slide. Kaveri is NOT able to handle 8 threads]

And where did you get the idea that it’d be on par with Deneb as far as single threaded performance goes? 

-->http://cpuboss.com/cpus/AMD-FX-4350-vs-AMD-A10-6800K The A10 Piledriver matches the quad core FX-Piledriver for the mysterious reason of it being the exact same CPU architecture.

-->http://cpuboss.com/cpus/AMD-Phenom-II-X4-965-vs-AMD-FX-4350 The quad core FX chip beats the Deneb

-->http://cpuboss.com/cpus/AMD-Phenom-II-X4-965-vs-AMD-A10-6800K The A10 beats the Deneb quad core in single threading and matches it in multi-threading performances.

So if we all accept that Kaveri will be “some measure” better than Richland, then Kaveri will also be “some measure” better than the Piledriver quad core and the Deneb Phenom quad core according to the benchmarks cited on CPUBoss. Truly I’m not a huge CPUBoss fan, but it is flashier than CPU-World.

What you’re missing from my statements is that with the advent of HSA, OpenCL, and Mantle AMD will be able to leverage its better assets, namely its Radeon cores, to help out its weaker x86 cores. It’s the exact inverse of Intel, which has a weaker iGPU and stronger CPU core, which is why they invented Crystal Well with its eDRAM L4 cache. So overall, granting that HSA, OpenCL, and Mantle are utilized properly for games and possibly content creation software, Kaveri could give the stock Ivy Bridge i7 a run for its money under the right conditions. Intel, NVidia, Adobe, and Apple also have a stake in OpenCL, and HSA has AMD, ARM, Samsung, TI, and Qualcomm. Plus both have various other Open Source supporters, along with built-in support under Linux & Mac OS. So it’s safe to say that both Open Source standards will be utilized in the future.

-->http://hsafoundation.com/

-->http://www.khronos.org/opencl/

Also don’t forget that Mantle is supported by all 3 next gen consoles, and R-X GPUs, so it will also be adopted very well within the near future. Since it has to be for anyone to enjoy new games on the next gen consoles. To give you a taste of it check out this video from APU13 showing Kaveri out-do an i7-4770K paired with a GeForce 630 while playing BF4

-->http://www.youtube.com/watch?v=HjAM2zYNqko

Truthfully, we all know if you’re using GeForce 630 you would use an i5 and not i7, but the point they’re making about threading is valid.

Here's the bottom line - I love my 3930K and in the past I only used Intel CPUs because I wasn't much of a gamer and kind of a Mac fan to begin with. After someone at U of M finally talked me into buying an FX-8350 for VMs under Linux for some labs we set up, I found myself questioning a lot of the synthetic benchmarks out there after I played around with it.

Check out these pages:

-->http://www.tomshardware.com/reviews/fx-8350-vishera-review,3328-8.html

-->http://www.phoronix.com/scan.php?page=article&item=amd_fx8350_visherabdver2&num=4

-->http://techreport.com/review/23750/amd-fx-8350-processor-reviewed/10

-->https://teksyndicate.com/videos/crysis-3-benchmarks-amd-fx-8350-vs-intel-i7-3770k-both-overclocked

Then I discovered that not only was Windows specifically optimized for Intel's architecture, but many of the compilers under Windows were as well. The GCC compiler, which Intel and AMD support quite a bit, is a good example of what can be done with AMD's module approach when fully utilized. The FX-8350 under those benchmarks above is nearly on par with the i7-3770K under the right conditions. Also, as Michael Larabel shows on Phoronix, the FX-8350 performs great under Linux and on par with the i7-3770(K). These are facts that many try to suppress out of nothing other than bias. I personally use both Intel and AMD processors now and both have their place depending on how they’re used. Intel definitely has its advantages, and so does AMD. AMD’s price to performance ratio is pretty legendary and compelling with many of its products, although not all. 

I’ve been working with computers since my first PowerPC back in the early 90s before IBM’s architecture took over. I’ve seen both of these companies go at each other time and again, and the different approaches they’ve taken to do so. AMD has been in a pretty dismal position the last 2-3 years, but just as they did in the 90s and early 2000s they’ll bounce back and bring the competition to Intel which is good for all of us. In some ways AMD is a tad more innovative than Intel, but they have to be since Intel has a lot more resources to leverage. AMD’s chip-sets tend to be more resourceful and last a long time while still being relevant. The 990FX chipset, in my opinion, was more compelling than what Intel used for Sandy Bridge. Plus, how happy have we been with the dismal increases from Sandy Bridge to Ivy and Haswell? I personally feel as though AMD has been working on its "module" pretty aggressively since Bulldozer flopped on the scene. I was very surprised with the A10-6800K’s performance and floored by Kaveri’s potential. In the mobile arena AMD without a doubt has given Intel a real run for their money, and both have almost made low to mid-grade mobile Nvidia dGPUs obsolete. 

The undeniable truth is this: we all need AMD and others (ARM) to be successful, no matter what our bias may be, because without real competition Intel has shown that it’s a tyrant and cares little for the consumer or a free market economy. Without competition Intel can charge whatever ridiculous price they wish by their own terms. They’d simply rather “out-buy” their competition and place a stranglehold on the entire industry as they did in the 90s and early 2000s with Andy Grove's “always be paranoid” motto. Until the FCC, the European Trade Commission, Japanese FTC, and even the trade commissions in China and some in South America found Intel guilty of violating fair market trade agreements and restricting consumer choice by forcing retailers to only sell Intel products, or else Intel wouldn’t sell to them, and in exchange Intel would give them hefty lump sum bonuses which is illegal across the globe. Now Intel HAS something to be “paranoid” about.

For those of you in denial about that last fact, here’s a pretty unbiased article lightly chronicling Intel’s many run-ins with the law.

-->http://www.pcworld.com/article/184882/A_History_of_Intels_Antitrust_Woes.html

You don’t have to agree with any of my opinions, but facts are simply facts. Don’t let your bias cloud your judgment. Both companies have great products to offer across multiple markets.

Chris

Not Ranked
Posts 7
Points 95
Joined: Dec 2013

I typed that up at work, so I know it reads rough.

  • | Post Points: 5
Not Ranked
Posts 3
Points 30
Joined: Dec 2013
Louis94 replied on Tue, Dec 3 2013 6:06 PM

Clock for clock you can't compare them since those architectures are radically different. I think the world would like to have a word with your nonsense! :) LOL!

  • | Post Points: 20
Not Ranked
Posts 3
Points 30
Joined: Dec 2013
Louis94 replied on Tue, Dec 3 2013 6:08 PM

So the author got a little ticked and threw on a skirt to defend his lousy article? Good job "Joel H." LOL!

  • | Post Points: 5
Not Ranked
Posts 3
Points 30
Joined: Dec 2013
Louis94 replied on Tue, Dec 3 2013 6:18 PM

I like the APUs. I don't care about benchmarks too much, but I think they're a great value for what you get. I don't think single core performance is all that important anymore since software seems to be a bottleneck these days. Either way, I don't care. I think the APUs represent the best bang for the buck and if this new chip is even better than my A10-6800K then I'll upgrade since I can afford to do so with how low cost they are. Have fun guys! LOL ; )

AMD v Intel.....haven't heard this one before!

  • | Post Points: 5
Top 100 Contributor
Posts 1,081
Points 11,700
Joined: Jul 2009
Joel H replied on Tue, Dec 3 2013 6:29 PM

Chris,

I've seen Kaveri running in task manager. Two modules, four cores, four threads.

  • | Post Points: 5
Top 100 Contributor
Posts 1,081
Points 11,700
Joined: Jul 2009
Joel H replied on Tue, Dec 3 2013 6:35 PM

1). I promise you, Steamroller (Steamroller B would be most likely to refer to a stepping, not a new architecture) is directly derived from Bulldozer and Piledriver. Clock-for-clock, Bulldozer is substantially less efficient than the old K10 Thuban architecture in most workloads. There are a handful of exceptions. For AMD to regain Thuban-level single-thread performance and scaling would be a significant improvement. 

2). It does not contain eight threads. I have seen the chip. It doesn't. No Hyper-Threading, no octal-threading.

3). http://techreport.com/review/23750/amd-fx-8350-processor-reviewed/14 Is a better comparison for value. An AMD eight-core performs like a quad-core from Intel.

4). Expect evolutionary performance gains, nothing more.

I'm sorry you continue to believe the situation is something other than what it is. Expect Kaveri to be an evolution of Piledriver.

And 5). I agree with you regarding the importance of AMD and general misconduct of Intel. Unfortunately, all the misconduct in the world does not equate to a performance advantage for AMD.

  • | Post Points: 20
Not Ranked
Posts 1
Points 20
Joined: Dec 2013

When you measure total processing power in FLOPs, AMD's HSA-enabled APUs will destroy Intel's best processors.

The author and virtually every current benchmark is CPU biased.

When graphics-intensive software is coded to take advantage of HSA innovations:

1) shared memory

2) hQ (heterogeneous queuing)

Kaveri tests well against Intel's top of the line chips.

Kaveri will be a $150 or less chip compared to a $650 Iris Pro... 

There is NO question which is the better value.

Even more important is that AMD's APU roadmap results in greatly increasing power through GPGPU processing, using HSA innovations.

MSFT is supporting HSA!

AMD has quietly informed some journalists, and will be more publicly elaborating on MSFT support soon....

Did anyone watch this video from a top engineer at Amazon Web Services:

Amazon's James Hamilton: Why Innovation Wins

http://www.youtube.com/watch?v=BOYdKht1YwE

 

Cloud providers are working closely with AMD to replace Intel's overpriced solutions.

Cloud providers do NOT need Windows servers.

WHY would a top engineer at AWS be publicly supporting AMD?

It's because AWS has been working closely with AMD in the creation of semi-custom 64-bit ARM chips that will be used in SeaMicro server installations.

Google, Facebook, and MSFT are in the same boat.

 

The public  Cloud is the biggest change in IT since the Internet  became a public assess communication medium  20 years ago. 

WHY do you suppose SAMSUNG, QUALCOMM, TI, and ARM have joined AMD as founding members of the HSA?

It's because HSA innovations will be the architecture of the future!

People who blindly focus on OLD benchmarks will be left eating dust!

  • | Post Points: 20
Not Ranked
Posts 7
Points 95
Joined: Dec 2013

I'm typing this up on my Nexus so I apologize if it reads sloppy. How did you get a review/marketing sample? I thought AMD wasn't sending those out until after CES and as far as I know one wasn't on display at APU13? They just showed the video. Can you post a screenshot of the task manager or CPU-Z?

I agree with you about being the underdog not automatically granting you some level of artificial success. You disputed my claims about Linux and I posted the real world benchmarks from Phoronix backing them up. I also included several other articles that did the same. Those are legitimate real world benchmarks however and not theorized synthetic ones that you and I know are useless outside of trying to convince the uninitiated to buy this over that.

I don't agree about the single core performance between Piledriver and Deneb being in favor of the latter. That's an old claim from the hysteria that went viral when Bulldozer came out. Bulldozer sucked, let's not argue that point. In almost every real world benchmark I see or test the Vishera chip wins hands down on multi and single threading...finally.

For non-gaming workloads I also disagree with the idea that the 8 core FX is on par with an i5. Did you check out the links I provided? For gaming performance right now 2 threads is almost all you need and Intel leads in single threaded performance. Except Tek Syndicate did show the FX chip beating the Ivy i5 and checking in just behind the i7-3770; I posted that link above (Logan is an Intel Fanboy who openly admits it). But for work oriented tasks like those benchmarks I shared, the FX chip's 8 physical integer cores are a force to be reckoned with. Under professional grade software like Blender, Sony Vegas, or even Adobe those 8 physical integer cores give Intel's (Ivy) 8 virtual cores a run for their money since those programs are optimized to use many threads and Hyper Threading isn't that efficient. Some developers purposely disable HT in the BIOS for performance reasons. For virtual machines those 8 physical cores also shine out against the i7's (Ivy) virtual ones. You can actually pass those FX cores on to the VM. Now yes, my 3930K handily beats the FX everywhere but I paid $169 for the FX and $569 for the 3930K.

At the end of the day, going back to the APU and the article, AMD is saying that even with the 10% clock reduction SteamrollerB is still 20% more efficient than Piledriver and the same goes for the iGPU with a 30% increase, after a clock reduction, over the A10's 8670 "Devastator" which was pre-GCN anyways. Kaveri will not be an answer to Haswell. Some sites are claiming it'll compete with a Haswell i5, but I'm very skeptical. I do think it will beat an Ivy Bridge i5 and perhaps give Ivy's i7 a run based on those increases over Piledriver and the inclusion of, again, OpenCL, HSA, & Mantle. You can't keep throwing up the "single core speed" banner when you have variables like those in play.

A large part of the reason why Intel is faster is because compilers, Windows, and even some benchmarking software are purposefully optimized to favor Intel's architecture. The lesson that AMD has needed to learn is that it's not necessarily the hardware, but the software that makes the chip great. If AMD could get the same software optimization advantage that Intel has, then the performance differences between the two would shrink. This is what we see under Linux since it's a community driven neutral OS, and behold Intel's i7-3770(K) has barely any lead on the FX Piledriver if any at all. This is what AMD is doing in partnering with the HSA Foundation and Khronos with their OpenCL standard for computation. For gaming Mantle makes the CPU cores almost irrelevant. True Audio is also interesting as is the ARM co-processor.

Opinions aside I appreciate the forum we have going on here. I will confess though that you're right about the threads it seems. WCCF updated the article and redacted the "4/8" they had on the slide chart. If I could find a way to upload an image to this thread I'll happily post a screenshot of the original article they posted showing the 8 threads just to back up my sanity. I think WCCF is about to be dropped from my feed...So for that point I stand corrected.

Not Ranked
Posts 7
Points 95
Joined: Dec 2013

Whoa Ken...where'd you come from?! Nice addition to the discussion. I had no idea about AWS jumping ship. I deal with them quite a bit and they're a pretty big player in the arena. Many developers from what I'm seeing are hyped about HSA and the advantages it holds for future processing. Kaveri's computational power with HSA enabled is impressive. I think you meant public "access" though. It read pretty funny the first time through regarding their "public assess". ;)

Not Ranked
Posts 7
Points 95
Joined: Dec 2013

Yeah, you have a point Neil, perhaps we take things too far 8) and it is simply a price/performance issue. Especially since users can't tell the difference for most normal everyday use anyways. It is what it is, but according to another's post: I like taking my super-charged economy car with an Edelbrock manifold and street racing it against Ferraris on the weekend. So simple and normal may have already flown out the Windows......at least since "8" anyways ;)

Oh and I apparently like pixie dust and magical Linux-Sutra to justify my hardware habits! How magical... :D

  • | Post Points: 5
Top 100 Contributor
Posts 1,081
Points 11,700
Joined: Jul 2009
Joel H replied on Tue, Dec 3 2013 10:54 PM

Chris,

I was at APU13 and had time enough with some of the test beds to check their basic stats. No screenshots or CPU-Z data, but I got a look at clock speeds and core counts. The second-gen engineering samples were running slightly slower than the 3.7GHz / 4GHz model that's been forecast as the top-end part, but they were quad-core, quad-threaded chips.

If the FX-8350 competes against Intel chips in Linux, I'm not familiar enough with the Linux environment to challenge that. I'll leave that to the Phoronix people, who do it very well.

AMD has priced the FX-8350 at $169 at NewEgg. The cheapest Intel quad-core is $179. That's a reasonably good comparison for multi-threaded workloads, by which I mean I'd expect the eight-core AMD chip to perform approximately like the four-core + HT Intel CPU. As for why I compare against Shanghai, let's use Cinebench 11.5 as a good example. I choose it because it scales well, it's readily available, and the figures are widespread.

Scores drawn from http://anandtech.com/bench/product/203?vs=697

Cinebench 11.5 Single-Thread:

X6 1100T (3.6GHz): 1.10

FX-8350 (4.2GHz): 1.11

Now, divide the CB score by the clock speed in GHz to get the efficiency of the processor in this particular test.

X6 1100T =0.305.

FX-8350 = 0.264

Let's check multi-threading:

X6 1100T: 5.90

FX-8350: 6.89

We can perform the same calculation using the total GHz speed of all the cores. For the Thuban, that's 6x3.3, for Piledriver it's 8x4.0

X6 1100T efficiency: 0.2979

FX-8350 efficiency: 0.215
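
For anyone who wants to redo the division, here is a small Python sketch of the same normalization. It uses only the Cinebench 11.5 scores, core counts, and clocks listed above; the function and dictionary names are illustrative.

```python
def per_ghz(score: float, cores: int, clock_ghz: float) -> float:
    """Benchmark score divided by the total GHz across all active cores."""
    return score / (cores * clock_ghz)


# (score, active cores, clock in GHz) for each Cinebench 11.5 run quoted above.
runs = {
    "X6 1100T": {"single": (1.10, 1, 3.6), "multi": (5.90, 6, 3.3)},
    "FX-8350": {"single": (1.11, 1, 4.2), "multi": (6.89, 8, 4.0)},
}

efficiency = {
    chip: {test: per_ghz(*args) for test, args in tests.items()}
    for chip, tests in runs.items()
}

for chip, tests in efficiency.items():
    print(f"{chip}: single {tests['single']:.3f}, multi {tests['multi']:.3f}")
```

The results line up with the figures above to within rounding: about 0.306 and 0.298 for Thuban, and 0.264 and 0.215 for Piledriver.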

We can perform this calculation with other tests. Compare the x264 encode tests, which the FX-8350 wins. Divide the frame rates by the clock speed of the chip, and the results are 3.87 for the X6 1100T and 2.8 for Piledriver.

Check the 7zip benchmark, which Piledriver also wins. Thuban's 18,416 divided by 19,800 (six cores at 3,300MHz) = 0.930. Piledriver's figure works out to 0.731.

Once we normalize for clock speed and core count, Thuban is more efficient than Piledriver in the vast majority of tests. The situation could be considered analogous to the P3 / P4 days, when the P3 was far faster than the P4 clock-for-clock, but the P4 eventually pulled ahead thanks to clock frequency and Hyper-Threading. Nonetheless, if the P4 had regained P3 *efficiency* at any point, the result would have been a far faster chip. 

Piledriver is crippled by two things:

1). Poor scaling. This was an *inevitable* consequence of sharing resources. If you run four threads across four modules (1 thread per module) and then run 4 threads on two modules (two threads per module), Piledriver is 15-20% faster in the first configuration than the second. That means an eight-core Piledriver is more like a six-core Thuban, *period*, in almost every workload.

2). It's not as efficient. This is borne out amply in single-threaded tests, where a 4.2GHz Piledriver matches a 3.6GHz Thuban.

Therefore: If Kaveri is as efficient as Thuban in single-threaded tests and scales like K10 in multi-threaded tests, the result will be a substantially faster processor. Even when the FX-8350 is faster than the X6 1100T, it's *not* as fast as an eight-core, 4.2GHz K10 would have been.

To sum it allllll up:

If Kaveri increases single-threaded performance and multi-threaded scaling by 15-20%, it will match Thuban on both counts clock-for-clock.
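A quick way to sanity-check that summary against the Cinebench 11.5 numbers quoted earlier in this post is sketched below. Reading the claim as two separate 15-20% gains that compound in the multi-threaded case is an assumption on my part, and `uplift_needed` is just an illustrative helper name.

```python
# Per-clock figures derived from the Cinebench 11.5 numbers quoted in this post.
thuban_st, piledriver_st = 1.10 / 3.6, 1.11 / 4.2
thuban_mt, piledriver_mt = 5.90 / (6 * 3.3), 6.89 / (8 * 4.0)

def uplift_needed(target, current):
    """Fractional per-clock gain required for `current` to reach `target`."""
    return target / current - 1.0

st_gap = uplift_needed(thuban_st, piledriver_st)
mt_gap = uplift_needed(thuban_mt, piledriver_mt)

# Assumption: a 15-20% single-thread gain and a 15-20% scaling gain multiply
# together when all cores are loaded.
compound_low = 1.15 ** 2 - 1.0
compound_high = 1.20 ** 2 - 1.0

print(f"single-thread gap to Thuban: {st_gap:.1%}")
print(f"multi-thread gap to Thuban:  {mt_gap:.1%}")
print(f"compounded 15-20% gains:     {compound_low:.1%} to {compound_high:.1%}")
```

On these numbers the single-threaded gap falls inside the 15-20% band, and the multi-threaded gap falls inside the range you get when the two gains are compounded.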

  • | Post Points: 20
Top 100 Contributor
Posts 1,081
Points 11,700
Joined: Jul 2009
Joel H replied on Wed, Dec 4 2013 2:19 AM

Louise,

Don't be ridiculous. The point of clock-normalized comparisons is to compare the efficiency of any two chips in the same workload. I can compare Intel against AMD or the original Pentium against Haswell. Deriving numbers in this fashion does not tell me why performance looks as it does, but it's a standard method of gauging the efficiency of two processors.

If I took the derived efficiency and divided it by power consumption multiplied by time-to-execute, I could calculate CPU efficiency per watt.

  • | Post Points: 5
Not Ranked
Posts 7
Points 95
Joined: Dec 2013

Your point regarding efficiency is well made. Maybe I misunderstood you, however; I thought you were talking about the overall performance of Deneb versus Piledriver. Your clock-for-clock comparison only illustrates one aspect of that, namely efficiency. At the end of the day, as your data even shows, Piledriver is still more powerful "overall" than Deneb, but yes, it has its own quirks since it's a different architecture. I completely agree with your P3/P4 comparison to BD and K10, though: it's simply not as efficient clock for clock. Deneb also had better FPUs from what I can tell. Even assuming the per-clock efficiency only ends up on par with K10, the inclusion of HSA, OpenCL, and Mantle for gaming should theoretically address those serial/parallel processing inefficiencies. What I think many don't understand is that AMD isn't competing with Intel solely on raw x86 core performance, which is essentially what Rory Read said. They believe the innovation they need to break through Murphy's ceiling, and keep up with a much more resource-laden Intel, is to find a way to integrate and leverage multiple system resources to work more closely in unison. Hence the much-talked-about buzzword "HSA". Intel is doing something similar with Crystal Well and its 128MB eDRAM L4 cache. Personally I find Intel's solution very interesting, and I'd like to see it compared against Kaveri with HT disabled.

Many people, even within the tech industry, don't use Linux, which is too bad; I think they're missing out. I natively boot Ubuntu and Arch, and I run Windows in virtualization just for development and maybe a game here and there. Linux, in my opinion, is a much more optimized OS and by far the most advanced. I admire Mac OS and the Aqua GUI layer, but at the end of the day, while it has many novelties, it just doesn't offer what Linux does... and I just can't justify the cost of a Mac anymore. I am happy that it's allowed BSD-UNIX to return to the spotlight, however. I'm not much of a Windows fan though.

Off subject - I didn't realize you were Joel Hruska from ExtremeTech and Ars. I read your article regarding the 9590 vs. Ivy-E, which was well written. The 9590 is a bit ridiculous in my opinion though. I'd still go for the Ivy-E at their price points ($500 for the AMD and $550 for the Intel?). What are your thoughts on the speculation surrounding the decline of x86 and the possible ARM saturation in the low-power server market? The latter seems reasonable, but I'd imagine there would need to be many changes made that would cost quite a bit of money.

Thanks for the information Joel, especially the explanation regarding efficiencies which I had never considered.  

Again, I wrote this at work so I apologize if it’s choppy. 

