Notiusweb wrote:
linvanchene wrote » Wed Apr 06, 2016 5:35 pm:
I got the impression that they wanted to focus on developers only at GTC.
Yeah, I wonder which business segment fuels Nvidia more: the consumer-level products, or the big enterprise/academia compute products. I would have been interested to see them do something new with the AI UI, as opposed to hearing specs and an update on self-driving cars.
When he said of the race car, "That's a great-looking GPU," I was waiting for him to say next, "and here's an even better one for you... introducing the new PTX-150: 32 GB VRAM, 6,144 CUDA cores, core clock 2,064..."
The big question in my mind was, is Pascal 'ready' or is it 'delayed', because it had been touted as this amazing successor to the previous generation consumer level GPU and now it only got mentioned as part of the enterprise products. I wonder why they didn't even make mention of an upcoming consumer level GPU. I say this as someone who will buy the next GPU...I'm all for AI and the coming singularity, but hurry up...too slow!
Those business segments likely pay more per unit than we'd ever imagine. Consider how many cars and other units could use this technology. June-ish is the timeframe for them.
http://wccftech.com/nvidia-tesla-p100-gp100-june-2016/ . According to Wccftech, Computex 2016 [late May -
http://www.businesswire.com/news/home/2 ... e-COMPUTEX ] may hold the announcement for GTX users. So, it's likely to poke its head up this summer. But Titans aren't first-release cards. In the meantime, here are some things to consider:
1)
http://cdn.wccftech.com/wp-content/uplo ... -GPU_1.jpg
2) GPU COMPARISON: Tesla Maxwell GM200* VS. Tesla Pascal GP100*
SMs = 24 VS. 56
TPCs = 24 VS. 28
FP32 CUDA Cores / SM = 128 VS. 64
FP32 CUDA Cores / GPU = 3072 VS. 3584
FP64 CUDA Cores / SM = 4 VS. 32
FP64 CUDA Cores / GPU = 96 VS. 1792
Base Clock = 948 MHz VS. 1328 MHz
GPU Boost Clock = 1114 MHz VS. 1480 MHz
FP64 GFLOPs = 213 VS. 5304
Texture Units = 192 VS. 224
Memory Interface = 384-bit GDDR5 VS. 4096-bit HBM2
Memory Size = Up to 24 GB VS. 16 GB
L2 Cache Size = 3072 KB VS. 4096 KB
Register File Size / SM = 256 KB VS. 256 KB
Register File Size / GPU = 6144 KB VS. 14336 KB
TDP = 250 Watts VS. 300 Watts
Transistors = 8 billion VS. 15.3 billion
GPU Die Size = 601 mm² VS. 610 mm²
Manufacturing Process = 28 nm VS. 16 nm
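As a quick sanity check on the FP64 figures in that table, the peak-throughput arithmetic works out from the core counts and boost clocks alone (a sketch in Python, using only numbers from the table; peak FLOPs = cores × 2 ops per cycle for FMA × clock):

```python
# Peak FP64 GFLOPs = FP64 cores x 2 (fused multiply-add counts as 2 ops) x boost clock in GHz
def fp64_peak_gflops(fp64_cores: int, boost_ghz: float) -> float:
    return fp64_cores * 2 * boost_ghz

gm200 = fp64_peak_gflops(96, 1.114)    # ~213.9 GFLOPs
gp100 = fp64_peak_gflops(1792, 1.480)  # ~5304.3 GFLOPs
print(int(gm200), int(gp100))          # 213 5304, matching the table
```

The ~25× FP64 jump comes almost entirely from the 96 → 1792 FP64 core count, with the higher clock contributing the rest.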
GPU COMPARISON: Maxwell GM200 VS. Pascal GP100
Compute Capability = 5.2 VS. 6.0
Threads/Warp = 32 VS. 32
Max Warps/Multiprocessor = 64 VS. 64
Max Threads/Multiprocessor = 2048 VS. 2048
Max Thread Blocks/Multiprocessor = 32 VS. 32
Max 32-bit Registers/SM = 65536 VS. 65536
Max Registers/Block = 32768 VS. 65536
Max Registers/Thread = 255 VS. 255
Max Thread Block Size = 1024 VS. 1024
FP32 CUDA Cores/SM = 128 VS. 64
Shared Memory Size/SM = 96 KB VS. 64 KB
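One way to read that second table: the per-SM register and thread limits together bound how many registers each thread can use at full occupancy, and those limits are identical on both chips. A small sketch of that arithmetic (my own derivation from the table's numbers, not from any source):

```python
# Occupancy arithmetic implied by the table above (same for GM200 and GP100)
regs_per_sm      = 65536  # max 32-bit registers per SM
max_threads_sm   = 2048   # max resident threads per SM
threads_per_warp = 32

max_warps_sm = max_threads_sm // threads_per_warp  # 64, matches the table

# At 100% occupancy, each thread's register budget:
regs_per_thread_full = regs_per_sm // max_threads_sm  # 32 registers
print(max_warps_sm, regs_per_thread_full)             # 64 32
```

So a kernel that needs more than 32 registers per thread can't fill an SM on either architecture; Pascal's advantage here is the larger per-GPU register file (more SMs), not looser per-SM limits.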
Info source for the Tesla P100 comparison: [ http://wccftech.com/nvidia-gp100-pascal ... uda-cores/ ]. I added "*" to highlight factor changes that interested me; where the "*" appears in a header, that means every change below that header was of interest to me. I also bolded the ones that were most interesting.
3) We know that, historically, the GTX card whose component specs best approximate a Tesla's is the Titan. We also know that, historically, the GTXs that compare most closely to the Teslas clock considerably faster. So GTX Titan Pascal owners will likely see the performance of a card that has, among other things, about 17% more FP32 CUDA cores (3584 vs. 3072) {with half the cores per SM, but 2.33× the SMs}, clocked much higher {about 1.4×} (likely around 1500 MHz base for Titans, and even a bit higher for Tis), with about 1.33× the memory of a Titan X (16 GB vs. 12 GB) [and a 1.33× larger L2 cache] that is much faster, all on a GTX card with, at a minimum, two eight-pin PCIe power connectors
*/. I still believe that all of these changes will result in a GTX Titan Pascal that is twice as fast at rendering as a Titan X, especially if our 3D rendering software authors can find a way to take full advantage of what mixed-precision computing offers, above all FP16 **/ (twice as fast as FP32), to the maximum extent possible.
*/ Just like my GTX 590s, 690s and Titan Zs.
**/ "Deep learning workloads represent a perfect scenario where mixed precision can be leveraged to pretty much double the performance. These workloads inherently require less precision and using FP16 instructions would result in very significant reductions in memory usage that will allow deep learning to occur in considerably larger networks. Essentially allowing machines to learn much more effectively.
Because each Pascal CUDA core can run two FP16 operations at once and each 32-bit register can store two FP16 values at once, the GP100 GPU can effectively do FP16 compute work at twice the speed of FP32, and this is where that doubling in performance comes from." [
http://wccftech.com/nvidia-gp100-pascal ... uda-cores/ ]
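The "two FP16 values in one 32-bit register" idea from that quote can be illustrated in plain Python with the stdlib struct module, whose 'e' format character is IEEE 754 half precision (this is just an illustration of the storage layout, not GPU code):

```python
import struct

# Pack two FP16 values into 4 bytes -- the same footprint as one
# 32-bit register, which is how GP100 stores a half2 vector.
def pack_half2(a: float, b: float) -> bytes:
    return struct.pack('<2e', a, b)  # 'e' = IEEE 754 half precision

def unpack_half2(word: bytes) -> tuple:
    return struct.unpack('<2e', word)

word = pack_half2(1.5, -2.25)        # both exactly representable in FP16
print(len(word))                     # 4 bytes, one register's worth
print(unpack_half2(word))            # (1.5, -2.25)
```

Because one instruction can then operate on both packed halves at once, the FP16 rate doubles relative to FP32, which is exactly the doubling the quote describes.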
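The "twice as fast" belief above can be sanity-checked with back-of-envelope arithmetic from the spec table (my own rough projection, not an official figure, ignoring memory bandwidth and architectural efficiency):

```python
# Peak FP32 GFLOPs = cores x 2 (FMA) x boost clock in GHz, from the table above
gm200_fp32 = 3072 * 2 * 1.114   # ~6844 GFLOPs (Maxwell GM200)
gp100_fp32 = 3584 * 2 * 1.480   # ~10609 GFLOPs (Pascal GP100)

fp32_speedup = gp100_fp32 / gm200_fp32   # ~1.55x from cores + clocks alone
fp16_speedup = fp32_speedup * 2          # ~3.1x if FP16 were fully exploited
print(round(fp32_speedup, 2), round(fp16_speedup, 1))  # 1.55 3.1
```

So even on raw FP32 peak alone the 2× rendering estimate needs some help from memory bandwidth or FP16; with full FP16 exploitation there would be headroom well beyond 2×.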
P.S. Interesting Iray Development [
http://www.nvidia.com/object/nvidia-iray.html ] - 90 day free trial of Iray:
http://www.nvidia.com/object/iray-plugin-trial.html .
I have 180+ GPU processors in 16 tweaked/multi-OS systems; the forum's character limit prevents detailed stats.