CPU Startup Combines CPU+DRAM


Top 10 Contributor
Posts 26,110
Points 1,183,840
Joined: Sep 2007
ForumsAdministrator
News Posted: Sat, Jan 21 2012 8:45 PM
The CPU design firm Venray Technology announced a new product design this week that it claims can deliver enormous performance benefits by combining CPU and DRAM onto a single piece of silicon. We spent some time earlier this fall discussing the new TOMI (Thread Optimized Multiprocessor) with company CTO Russell Fish, but while the idea is interesting, its presentation is marred by questionable conceptualizing and suspect analytics.

The Multicore Problem:

There are three factors, or walls, that limit the scaling of modern microprocessors. First, there's the memory wall, defined as the gap between CPU and DRAM clock speeds. Second, there's the ILP (Instruction Level Parallelism) wall, which refers to the difficulty of decoding enough instructions per clock cycle to keep a core completely busy. Finally, there's the power wall--the faster a CPU is and the more cores it has, the more power it consumes.

Attempting to compensate for one wall often risks running afoul of the other two. Adding more cache to decrease the impact of the CPU/DRAM speed discrepancy adds die complexity and draws more power, as does raising CPU clock speed. Combined, the three walls are a set of fundamental constraints--improving architectural efficiency and moving to a smaller process technology may make the room a bit bigger, but they don't remove the walls themselves.

TOMI attempts to redefine the problem by building a very different type of microprocessor. The TOMI Borealis is built using the same transistor structures as conventional DRAM; the chip trades clock speed and performance for ultra-low leakage. Its design is, by necessity, extremely simple. Not counting the cache, TOMI is a 22,000-transistor design, as compared to 30,000 transistors for the original ARM2. The company's early prototypes, built on legacy DRAM technology, ran at 500MHz on a 110nm process.



Instead of surrounding a CPU core with a substantial amount of L2 and L3 cache, Venray inserted a CPU core directly into a DRAM design. A TOMI Borealis IC connects eight TOMI cores to a 1Gbit DRAM, with a total of 16 ICs per 2GB DIMM. This works out to a total of 128 processor cores per DIMM. Because they're built using ultra-low-leakage processes and are so small, such cores cost very little to build and consume small amounts of power (Venray claims power consumption is as low as 23mW per core at 500MHz).
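
As a quick sanity check, the per-DIMM totals can be worked out from the figures quoted above. This is only back-of-the-envelope arithmetic in Python: the constant names are ours, and the 23mW figure is Venray's claim rather than a measurement.

```python
# Back-of-the-envelope totals for one TOMI Borealis DIMM.
# All inputs are figures quoted in the article; the power number is
# Venray's claimed best case, not a measured value.
CORES_PER_IC = 8
ICS_PER_DIMM = 16
DRAM_PER_IC_GBIT = 1
CLAIMED_MW_PER_CORE = 23  # at 500MHz, per Venray

cores_per_dimm = CORES_PER_IC * ICS_PER_DIMM
dimm_capacity_gb = ICS_PER_DIMM * DRAM_PER_IC_GBIT / 8  # 8 bits per byte
core_power_w = cores_per_dimm * CLAIMED_MW_PER_CORE / 1000  # mW -> W

print(f"{cores_per_dimm} cores per DIMM")
print(f"{dimm_capacity_gb:g}GB per DIMM")
print(f"{core_power_w:.2f}W for all cores (DRAM power not included)")
```

If the claimed per-core figure holds, all 128 cores on a DIMM together would draw just under 3W, before counting the DRAM itself.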

It's an interesting idea.

The Bad:

When your CPU has fewer transistors than an architecture that debuted in 1986, there's a good chance that you left a few things out--like an FPU, branch prediction, pipelining, or any form of speculative execution. Venray may have created a chip with power consumption an order of magnitude lower than anything ARM builds and more memory bandwidth than Intel's highest-end Xeons, but it's an ultra-specialized, ultra-lightweight core that trades 25 years of flexibility and performance for scads of memory bandwidth.



The last few years have seen a dramatic surge in the number of low-power, many-core architectures being floated as the potential future of computing, but Venray's approach relies on the manufacturing expertise of companies that have no experience in building microprocessors and don't normally serve as foundries. This imposes fundamental restrictions on the CPU's ability to scale; DRAM is manufactured using a three-layer mask rather than the 10-12 layers Intel and AMD use for their CPUs. Venray already acknowledges that these conditions imposed substantial limitations on the original TOMI design.

Of course, there's still a chance that the TOMI uarch could be effective in certain bandwidth-hungry scenarios--but that's where the Venray Questionable Train goes flying off the track.

The Disingenuous and Questionable:



Let's start here. In a graph like this, you expect the two bars to represent the same systems being compared across three different characteristics. That's not the case. When we spoke to Russell Fish in late November, he pointed us to this publicly available document and claimed that the results came from a customer with 384 2.1GHz Xeons. There's no such thing as an S5620 Xeon, and even if we grant that he meant the E5620 CPU, that's a 2.4GHz chip.

The "Power consumption" graphs show Oracle's maximum power consumption for a system with 10x Xeon E7-8870s, 168 dedicated SQL processors, 5.3TB (yes, TB) of Flash and 15x 10,000 RPM hard drives. It's not only a worst-case figure; it's a figure utterly unrelated to the workload shown in the Performance comparison. Furthermore, given that each Xeon E7-8870 has a 130W TDP, ten of them only come out to 1.3kW--Oracle's 17.7kW figure means that the overwhelming majority of the cabinet's power consumption is driven by components other than its CPUs.
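
To put a number on "overwhelming majority," the CPU share of that cabinet figure can be estimated from the values quoted above. This is a rough sketch: it treats TDP as an upper bound on CPU draw, which is an approximation, and the variable names are ours.

```python
# Share of the quoted cabinet power that the CPUs could account for.
# TDP is a thermal design rating, used here only as a rough upper bound.
XEON_COUNT = 10
XEON_TDP_W = 130        # Xeon E7-8870 TDP, as quoted in the article
CABINET_MAX_KW = 17.7   # Oracle's quoted maximum for the whole system

cpu_kw = XEON_COUNT * XEON_TDP_W / 1000
cpu_share = cpu_kw / CABINET_MAX_KW
other_kw = CABINET_MAX_KW - cpu_kw

print(f"CPUs: {cpu_kw:.1f}kW ({cpu_share:.1%} of the cabinet maximum)")
print(f"Everything else: {other_kw:.1f}kW")
```

On these numbers the ten Xeons account for well under a tenth of the quoted maximum, so the graph says very little about CPU power efficiency.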

The only existing TOMI chips are prototypes built on a 110nm process. Venray's power figures are for a 42nm part -- which means that neither side of the comparison is anything more than a made-up number.

In his literature, Fish makes his points about power walls by referring to unverified claims that prototype 90nm Tejas chips drew 150W at 2.8GHz back in 2004. That's like arguing that Ford can't build a decent car because the Edsel stunk.

After reading about the technology, you might think Venray was planning to market a small chip to high-end HPC niche markets... and you'd be wrong. The company expects the following to occur as a result of this revolutionary architecture (organized by least-to-most creepy):

  • Computer speech will be so common that devices will talk to other devices in the presence of their users.
  • Your cell phone camera will recognize the face of anyone it sees and scan the computer cloud for background red flags as well as six degrees of separation.
  • Common commands will be reduced to short verbal cues like clicking your tongue or sucking your lips.
  • Your personal history will be displayed for one and all to see...women will create search engines to find eligible, prosperous men. Men will create search engines to qualify women. Criminals will find their jobs much more difficult because their history will be immediately known to anyone who encounters them.
  • TOMI Technology will be built on flash memories creating the elemental unit of a learning machine... the machines will be able to self organize, build robust communicating structures, and collaborate to perform tasks.
  • A disposable diaper company will give away TOMI-enabled teddy bears that teach reading and arithmetic. It will be able to identify specific children... and from time to time remind Mom to buy a product. The bear will also diagnose a raspy throat, a cough, or runny nose.

Conclusion:

Fish has spent decades in the microprocessor industry--he invented the first CPU to use a clock multiplier in conjunction with Chuck H. Moore--but his vision of the future is, in our opinion, distorted enough to scare mad dogs and Englishmen.

His idea for a CPU architecture is interesting, even underneath the obfuscation and questionable representation, but too practically limited to ever take off. Google, an enthusiastic and dedicated proponent of energy-efficient, multi-core research, said it best in a paper titled "Brawny cores still beat wimpy cores, most of the time."

"Once a chip’s single-core performance lags by more than a factor of two or so behind the higher end of current-generation commodity processors, making a business case for switching to the wimpy system becomes increasingly difficult... So go forth and multiply your cores, but do it in moderation, or the sea of wimpy cores will stick to your programmers’ boots like clay."
Top 100 Contributor
Posts 1,016
Points 10,925
Joined: Dec 2010
Location: Mcallen, Texas
OSunday replied on Sun, Jan 22 2012 9:09 PM

So much potential in the opening lines, which slowly flowed down the drain as the article progressed...

Good in theory, but microprocessor technology is best left in the hands of the people who've been doing it successfully and satisfying Moore's Law (that the number of transistors that can be placed inexpensively on an integrated circuit doubles approximately every two years, or loosely, that processing power doubles every two years): AMD and Intel.

Top 50 Contributor
Posts 3,236
Points 37,910
Joined: Mar 2010
AKwyn replied on Sun, Jan 22 2012 9:42 PM

Good idea but it's sort of super optimistic, which kind of dilutes what he was writing about. I have to disagree about the whole thing mainly because if it goes through then it'll limit the upgradability of RAM; I mean if they implemented that method then basically there'd be no way to upgrade the RAM without there being a "wall" and even with the new method, it's still possible that there'd be a wall due to the lack of ways they can connect the installed RAM to the CPU. It's a good idea but I think that it needs a bit more work.

 

"The future starts with you; now start posting more!"

Top 10 Contributor
Posts 8,622
Points 103,905
Joined: Apr 2009
Location: Shenandoah Valley, Virginia
MembershipAdministrator
Moderator
realneil replied on Sun, Jan 22 2012 11:16 PM

I don't want it if it can't run Crysis on "full".

Dogs are great judges of character, and if your dog doesn't like somebody being around, you shouldn't trust them.

Top 500 Contributor
Posts 272
Points 2,170
Joined: Jan 2012
Location: Mississauga, Ontario
karanm replied on Sun, Jan 22 2012 11:31 PM

"TOMI Technology will be built on flash memories creating the elemental unit of a learning machine... the machines will be able to self organize, build robust communicating structures, and collaborate to perform tasks." --> Thus Skynet will be born, lol.

Top 150 Contributor
Posts 495
Points 4,825
Joined: Jan 2012
Location: Brighton, MA

Well, I have to say it: this is crazy and impressive. It never crossed my mind to put DRAM and a processor on the same die O_o. Also, RAM is limited: what if you want more RAM? Do you have to buy another processor? Not cool. And guys, remember this is new, which means of course it will have its limits, and we never know what this can bring in the future. Remember, before it was one core O_o. Now? You answer it.

Top 100 Contributor
Posts 1,072
Points 11,625
Joined: Jul 2009
Joel H replied on Mon, Jan 23 2012 10:15 AM

I don't think DIMM size is the real problem here. In high-end systems, you'd solve that by having normal DIMMs and TOMI slots. I suppose it'd be possible to build a slot that could take either/or, but you'd probably just separate them.

Also, keep in mind that server DIMMs are up to 16GB these days. Certainly large enough to allow for a test system that utilized this sort of concept without running into a RAM wall.

Not Ranked
Posts 1
Points 5
Joined: Jan 2012
ALuca replied on Mon, Jan 23 2012 10:57 AM

This is a nice concept. The problem is that people try to make a link with actual running systems. We still struggle to write decent software for multicore CPUs with programming languages and libraries designed for a single core. This is why we still have under 10 cores per processor on the market.

I believe a LEGO approach will do just fine with the TOMI concept. All designed from scratch. We have the tools to build other tools.

For new hardware, a new programming language and new goals. Why can't TOMI be designed to work in a team, just like server clouds today, but on the same mainboard?

Top 500 Contributor
Posts 123
Points 910
Joined: Oct 2011
Location: Canada

The idea may not be too great upon close examination, but I like that people are considering other methods. x86 is becoming limited so perhaps it's time to consider alternate tech.

Top 150 Contributor
Posts 495
Points 4,825
Joined: Jan 2012
Location: Brighton, MA

For those who don't know how TOMI (Thread Optimized Multiprocessor) works, here is a video from their site that is pretty detailed and easy to understand: https://www.venraytechnology.com/index.htm

Not Ranked
Posts 1
Points 20
Joined: Feb 2013

The article jumps too quickly to the Google "brawny cores beat wimpy cores" paper. These are wimpy cores in one dimension: roughly instructions per cycle. They are brawny cores in another dimension: memory bus bandwidth. Google regularly writes single core algorithms that saturate the memory bus. Stick 24 of those processes on a hyperthreaded multi-core Intel processor, and you might as well be using 24 wimpy cores while twiddling your thumbs waiting for memory.

The TOMI processor does not reduce power consumption in the traditional way by scaling back the clock frequency; it reduces power consumption by eliminating the overheads of transferring data between a memory chip and a cpu chip.

We do not struggle to write software for multi-core chips. Running multiple daemon processes is dead easy. Chrome forks off separate processes for each tab. We don't need your text editor to run faster. We might need your movie editor to run faster. We might want to run more daemons on your desktop, or, more likely, phone so that your computer can keep you better connected to more information. And we might want to run much more advanced applications on your computer than you've used before.

Yes, moving back to a 32-bit world and a sub-4GByte address space would be a pain. But, geeze, we buy the memory and get the cpu for free. Surely we can find a way to take advantage of all that cpu.
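
The memory-bus argument in this reply can be illustrated with a toy model in which streaming processes share one bus and total throughput is capped by its bandwidth. This is only a sketch: the function name and every number in it are hypothetical placeholders chosen for illustration, not measurements of any Intel or TOMI part.

```python
# Toy model: aggregate throughput is capped by the shared memory bus.
# Every number below is a hypothetical placeholder, not a measurement.

def aggregate_throughput(n_procs, per_proc_gbps, bus_gbps):
    """Total data rate achieved by n_procs streaming processes on one bus."""
    demanded = n_procs * per_proc_gbps
    return min(demanded, bus_gbps)

BUS_GBPS = 20.0        # hypothetical shared memory-bus bandwidth
PER_PROC_GBPS = 10.0   # hypothetical rate one process could stream alone

for n in (1, 2, 4, 24):
    total = aggregate_throughput(n, PER_PROC_GBPS, BUS_GBPS)
    print(f"{n:2d} processes: {total:.0f} GB/s total, {total / n:.2f} GB/s each")
```

In this model, once the bus is saturated, adding more processes only divides the same bandwidth into smaller slices, which is the sense in which fast cores end up behaving like slow ones.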
