
In case you're wondering about pricing: the V100 costs $1260/GPU/month and this A100 will have about 2.5x its performance [1]. A n2d-highmem-96 instance is $4377 per month. So for the maxed out a2-megagpu-16g I would expect around $54k per month before usage discounts etc.

1. https://developer.nvidia.com/blog/nvidia-ampere-architecture...

Disclosure: I work on Google Cloud.
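
Back-of-envelope for that $54k figure, as a sketch only (the assumption that the A100's per-GPU price scales with the ~2.5x performance figure is mine, not a published rate):

    # Rough estimate for a maxed-out a2-megagpu-16g, using the figures above.
    # Assumes A100s are priced at ~2.5x the V100's $1260/GPU/month and the
    # host is priced like an n2d-highmem-96; actual list prices may differ.
    v100_per_gpu_month = 1260
    a100_per_gpu_month = 2.5 * v100_per_gpu_month      # ~$3150
    host_per_month = 4377                              # n2d-highmem-96
    total = 16 * a100_per_gpu_month + host_per_month
    print(f"~${total:,.0f}/month")                     # ~$54,777 -> "around $54k"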

I think the right way to think about the economics here is either “I would pay $X/hr for this short-lived job” or “I want to compare with buying it” (3-yr committed use discount in our case, RIs / Instance Savings Plan for AWS). Unless you are an ML research lab (Google Brain, FAIR, OpenAI, etc.) or an HPC style site sharing these, you won’t get 100% utilization out of your “I just bought it” purchase. Worse, in ML land, accounting math about N-year depreciation is pretty bogus: if the A100 is 2.5x faster, you’d have been better off with a 1-yr CUD on GCP and refreshing, rather than buying Voltas last year.

One amusing thing that’s not clear about “just buy a DGX” is that many people can’t even rack one of these. At 400 watts per A100, our 16x variant is 6.4 kW of GPUs. That’s before the rest of the system, etc. but there are (sadly) a lot of racks in the world that just can’t handle that.

Many excellent points, and the one of interest is power usage: it's not just the cost of buying the kit, but also powering and maintaining (admin) it. So the bottom line of ownership is far bigger, and as you say, you really need to get 100% utilisation to capitalise on that level of outlay. Hence the on-demand cloud option becomes cost effective on many levels, even if it doesn't seem cheap. Sure, it is not cheap, but overall it can and does work out way cheaper for many use cases.

Makes you wonder what kind of power costs you would incur running one at 100% utilisation. Even with the best prices you'd certainly be looking at several thousand a year, I would have thought, and that's not even factoring in provisioning, which would mean 3-phase power for that type of load, and then you have to balance out the phases. So many little details become more of an issue when you start getting to datacenter-level power usage: UPS load/capacity costs and planning, networking. So whilst the cost of these units is high, the other costs sure do add up fast.

Oh, I didn’t mean to suggest that the powering of the system was unreasonable. For example, Hetzner will charge you about $250 per rack per month and between 15 and 30 cents (USD roughly) per kWh [1]. I assume they include the cooling and so on, but that means you’re basically looking at a ~4 kW draw for the usual 8xA100 system with some cores and RAM. So budget just under $1/hr for the power and $1000/month all in (depending on location). Most colo power in the US is closer to the Finland calculation than the Germany one.

I’d say keeping a spare system in case of failure is probably a bigger deal than the $10k/yr to house it :).

[1] https://www.hetzner.com/colocation
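
A rough version of that arithmetic, assuming a ~4 kW average draw, 730 hours/month, and the Hetzner rack and energy figures cited above (real bills vary by location and cooling overhead):

    # Rough colo power arithmetic for an 8x A100 box at ~4 kW draw.
    draw_kw = 4.0
    hours_per_month = 730
    rack_per_month = 250
    for price_per_kwh in (0.15, 0.30):
        power = draw_kw * hours_per_month * price_per_kwh
        total = power + rack_per_month
        print(f"${price_per_kwh:.2f}/kWh: ${power:.0f} power + ${rack_per_month} rack"
              f" = ${total:.0f}/month (~${total / hours_per_month:.2f}/hr)")
    # -> roughly $690-1130/month depending on energy price,
    #    i.e. on the order of "$1000/month all in"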

Eh, the Titan V I bought last year broke even vs AWS inside of a month ($2000 vs $2234).

Obviously there are factors going into that -- I could live without paying the Tesla tax (didn't need to virtualize, didn't need the VRAM, did need the FP64), I bought used, I didn't have a problem keeping it fed, I didn't need to burst, etc. -- but my point is that for some GPU workloads the cloud GPUs are really expensive and the break-even utilization is far south of 100%, more like 5%.
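
For context, a rough version of that break-even, assuming a ~$3/hr single-V100 on-demand rate (roughly a p3.2xlarge; the exact instance and rate are my assumption, the $2,000 card price is from the comment above):

    # Break-even for a ~$2,000 used Titan V vs. renting a single-GPU instance.
    card_price = 2000
    cloud_per_hour = 3.06          # assumed on-demand rate, ~p3.2xlarge
    hours_per_month = 730

    hours_to_break_even = card_price / cloud_per_hour
    print(f"{hours_to_break_even:.0f} h of GPU time")   # ~654 h, i.e. under a month flat out

    # Amortized over e.g. 3 years of ownership, the utilization needed to
    # beat on-demand is tiny -- consistent with the "more like 5%" ballpark.
    months = 36
    utilization = card_price / (cloud_per_hour * hours_per_month * months)
    print(f"{utilization:.1%} average utilization to break even")   # ~2.5%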

This comes up in nearly every thread, but NVIDIA’s EULA doesn’t permit using their drivers on “consumer” parts in “datacenters”. So you didn’t include the main reason for the Tesla tax: compliance :).

The “consumer” parts are certainly popular ML workstations, and rightly so.

I'm currently renting LambdaLabs V100 instances at 100% utilization for training https://vo.codes

It's really expensive, and I think I should lean into buying hardware at this point.

I want to build a high end GPU rig, but was wondering how easy the setup was. I've only built "consumer" systems before (2x 1080Ti). Is there any appreciable difference?

Do you have a single card? Multiple? What motherboard do you use?

Do you have any takeaways or resources you can share?

I only have one card. I swapped my single 1080 out for the Titan V in my old, cheap motherboard and it just worked. I had to hook up the water cooling, but I did that because it came with a block, not because I optimized the thermal design. To verify motherboard compatibility, I looked up in the NVIDIA specs how many PCIe lanes, and at what speed, I should expect, and confirmed in HWiNFO that they were active in that configuration -- much like I'd look at a network interface to make sure my 10/5/2.5GbE hadn't turned into 1GbE on account of gremlins in the wires.
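
If anyone wants a scripted version of that "did my PCIe link actually come up at full width and speed?" check, here's a sketch using the NVML Python bindings (pynvml); HWiNFO or nvidia-smi will tell you the same thing:

    import pynvml  # e.g. `pip install pynvml`

    # Print current vs. maximum PCIe link generation and width per GPU.
    # Note: an idle GPU may down-train its link to save power, so check
    # the "current" numbers while the card is under load.
    pynvml.nvmlInit()
    for i in range(pynvml.nvmlDeviceGetCount()):
        h = pynvml.nvmlDeviceGetHandleByIndex(i)
        print(pynvml.nvmlDeviceGetName(h))
        print("  PCIe gen:   %d (max %d)" % (
            pynvml.nvmlDeviceGetCurrPcieLinkGeneration(h),
            pynvml.nvmlDeviceGetMaxPcieLinkGeneration(h)))
        print("  PCIe width: x%d (max x%d)" % (
            pynvml.nvmlDeviceGetCurrPcieLinkWidth(h),
            pynvml.nvmlDeviceGetMaxPcieLinkWidth(h)))
    pynvml.nvmlShutdown()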

I'm not using this for machine learning, so you might want to talk to someone who is before pulling the trigger. In particular, my need for fp64 made the choice of Titan V completely trivial, whereas in ML you might have to engage brain cells to pick a card or make a wait/buy determination.

That has more to do with Nvidia policy than with the cloud, but I completely agree. For someone who can just plop 4 GeForce GPUs into their PC, it's a much better deal than AWS.

You're definitely not going to be able to shove it into just any datacenter, but there's enough demand that specialty datacenters are popping up.

We're in Colovore, which has fantastic power density and is running roughly 1,000 DGXs in their datacenter. It really wasn't all that difficult to get up and running. For us it made total sense, but we utilize physical hardware completely and have to scale into the cloud fairly regularly.

Right. I’m not saying you can’t, but I am saying that when someone goes to their IT team and says “Hey, we want to buy a few of these” they’ll get met with a groan :).

Did you move all your other gear into Colovore? (That’s one of the challenges, you often need to be close to the rest of your systems / data)

We have a similar situation to yours and have had a very positive experience with Colovore.

We also run at 100% utilization 24/7 and frequently use the cloud for burst before we go out and buy more cards/servers.

Colovore was super easy to get up and running and we will save at a minimum hundreds of thousands of dollars with this setup over exclusively using cloud instances.

> Worse, in ML land, accounting math about N-year depreciation is pretty bogus: if the A100 is 2.5x faster, you’d have been better off with a 1-yr CUD on GCP and refreshing, rather than buying Voltas last year.

Oh, so cloud providers are just giving away free resources? How generous!

Seriously, let's do the math:

If a V100 cost $9,000 new and you bought it a year ago, you could still sell it today for over $3,000. On AWS, an instance on a 1-yr CUD costs more than $1,400/month, for a total of over $16,000 a year. You don't even need 50% utilization to break even. It doesn't matter how fast the A100 is.
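
Spelled out (the figures are the ones above; the resale value and committed-use rate are the poster's estimates):

    # Buy-vs-rent break-even for a V100 over one year.
    buy_price = 9000
    resale_after_1yr = 3000
    cloud_per_month = 1400                 # ~1-yr committed/reserved rate

    net_ownership_cost = buy_price - resale_after_1yr      # $6,000
    cloud_cost_full_year = cloud_per_month * 12             # $16,800
    break_even = net_ownership_cost / cloud_cost_full_year
    print(f"{break_even:.0%} utilization to break even")    # ~36%, well under 50%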

In essence, this is a supercomputer.

Kind of. I think a giant pile of them hooked together with good networking would be :).

But also, you can rent a single slice of one for tens of seconds and then walk away. I personally probably do about 20-ish GPU hours a month while dabbling (and only because I’m lazy and have my GPUs attached while I’m working, even if it’s just debugging the Python bits). For a V100 that’s $50/month, which is in the noise compared to dealing with owning infrastructure (and would be even less if I’d get by with a T4).

I feel like if you’re maxing one of these clusters for an entire month at a time, it might still be cheaper to just buy one.

Ah, but when you use a large cloud, you're not just paying for the cost of the server. You're paying for access to a large, high-performance network and storage system that is extremely close to your server. Replicating those properties will cost you far more than the cloud provider does (well, it depends on your specific needs).

That's true for any cloud server.

untrue

Intel Xeon Gold 5120, 14-core @ £1200 used

renting 1 core @ £1.20/mo.

would take 6 years to fully pay for that single core alone, and that's excluding the 512MB RAM, 10GB SSD and unlimited 400Mb/s bandwidth
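
That's roughly this arithmetic, using the figures quoted above (whether the rented core is actually dedicated is disputed below):

    # Pay-off time for one core of a used Xeon Gold 5120 vs. renting a core.
    used_cpu_price_gbp = 1200
    cores = 14
    rent_per_core_month_gbp = 1.20

    price_per_core = used_cpu_price_gbp / cores                # ~GBP 85.70
    months = price_per_core / rent_per_core_month_gbp
    print(f"{months:.0f} months (~{months / 12:.0f} years)")   # ~71 months, ~6 years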

Where are you getting $1.20/mo for 1 core/512MB RAM/10GB SSD and half a gigabit of bandwidth? Amazon, for example, was something like $50/mo for bandwidth alone.

it's not AWS-grade, just a budget VPS, but has been just as reliable and performant so far. it's out there should you care to seek. the correct currency of £ will help

You are renting an overprovisioned timeslice, not a dedicated core. Even Hetzner's timeslice instances are twice the price. Their dedicated-CPU instances are almost 10x the price.

possibly, but nothing to suggest that 6 months in. they suit the use cases perfectly

It's definitely a timeslice, and that throws off your comparison wildly. Otherwise you'd be able to rent a 32-core dedicated machine at the same place for $48 a month. Can you?

it mentions "dedicated resources", but it's unclear whether this applies to CPU

it might throw the comparison off if there was variation in performance at different times, but there isn't

cores scale with RAM, you cannot rent 32 with 512MB

They're probably talking about dedicated RAM, although even then I wouldn't trust it not to be overprovisioned up to 2:1 on NVME swap.

Hetzner dedicated cores come with 4gb of memory per vCPU for reference.

1.2 GBP/mo for a dedicated core is wildly off market price. There's no way they'd be able to make a profit, so either they're going out of business tomorrow or it's not dedicated.

hourly benchmarks confirm it's definitely DDR4 RAM, standard SATA SSD with acceptable IOPS, and decent enough network. it's the budget branch of the biggest hosting provider so maybe they can afford to. how long it lasts only time will tell

Compare it to Gatling guns like the GAU-8. They shoot at 3,900 rpm, but only for a few seconds, because a firing solution may only be correct for a split second. It's all about peak performance: you only shoot when you really need to.

The same thing happens with these large instances: because they are so much bigger, you won't use them for an entire month. You'll use them to get results within hours or days.

I understand it's exciting to see introductions of new machine types and new GPUs, but for it to mean anything Google should instead get its house in order on the GPUs they already offer. Getting an n1 instance with a Tesla T4 GPU in any datacenter I've tried has a <50% success rate on any given day ("resource unavailable" more often than not, they just don't seem to have enough of them), which is _hugely_ damaging to our ability to rely on the cloud for our workload. Worse, there's no way for me to work around it: I'd be willing to switch zones, or machine type, or GPU type, but there is no dashboard or support guidance that'll tell me if there's any such configuration that'll be reliably available.

Because of that, seeing this A100 announcement is just a bummer, as I fear it'll be just another "resource unavailable" GPU...
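
(For what it's worth, the brute-force version of "try another zone" can at least be scripted, even though it doesn't tell you which zones will stay available. A sketch using the standard gcloud CLI; the zone list and machine shape are just illustrative:)

    import subprocess

    # Try a list of candidate zones and keep the first one where the
    # create call succeeds. Assumes gcloud is installed and authenticated.
    ZONES = ["us-central1-a", "us-central1-b", "us-west1-b", "europe-west4-b"]

    def create_t4_instance(name):
        for zone in ZONES:
            result = subprocess.run(
                ["gcloud", "compute", "instances", "create", name,
                 "--zone", zone,
                 "--machine-type", "n1-standard-8",
                 "--accelerator", "type=nvidia-tesla-t4,count=1",
                 "--maintenance-policy", "TERMINATE"],  # required for GPU VMs
                capture_output=True, text=True)
            if result.returncode == 0:
                return zone                # got capacity here
            print(f"{zone}: create failed (likely resource unavailable)")
        return None                        # every zone was out of T4s

    print(create_t4_instance("t4-worker-1") or "no capacity anywhere")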

Disclaimer: I work at Google Cloud.

Sorry to hear you have experienced this.

Customers can sometimes experience stockouts based on a variety of factors, but we do have T4 GPU capacity available and, like you said, we may direct you to a different zone or region. Open a support case for the issue and we can help you out.

I think this is the most in-depth article on Ampere: https://developer.nvidia.com/blog/nvidia-ampere-architecture...

Lots of architectural changes like MIG, new floating point formats, etc. Great to see GCP getting VMs out pretty soon after launch so people can start kicking the tires.

Very cool! Does anyone know how software support is for all these features? It seems that TF doesn't support the TF32/BF16 types as of yet. Is this something only CUDA engineers can use right now?

It does seem a little fishy to me that NVIDIA often boasts figures like a 10x performance upgrade, whilst in practice those are only possible if you use one of their non-default float types, which are hardly supported in most deep learning libraries :(

Both the PyTorch and TensorFlow teams have announced they'll support TF32. By design, it interoperates well with existing code, since calling code can just treat it as a regular FP32 value.
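
For what it's worth, the PyTorch side ended up exposing it as simple opt-in flags (added after this thread, in 1.7+); calling code still sees ordinary float32 tensors:

    import torch

    # TF32 only changes how matmuls/convolutions execute on Ampere tensor
    # cores; inputs and outputs remain regular FP32 tensors.
    torch.backends.cuda.matmul.allow_tf32 = True
    torch.backends.cudnn.allow_tf32 = True

    a = torch.randn(1024, 1024, device="cuda")
    b = torch.randn(1024, 1024, device="cuda")
    c = a @ b                    # runs in TF32 on an A100
    print(c.dtype)               # torch.float32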

I wish Google / AWS would avoid the overlapping names where possible.

the "A" series on AWS = AMD instances

The "A" series on GCP = Nvidia instances.

I know - probably on no one's radar at all :)

Disclosure: I work on Google Cloud.

Even worse is that for GCE, A was for AMD originally (and N was for iNtel). In any case, this A is for Accelerator.

Are there any papers or blogs about how these GPUs are attached to the host? I find it interesting that you can get a VM with 96 vCPUs, which I assume amounts to a whole box (2x24-core hyperthreaded Xeon CPUs?) but either 8 or 16 GPUs. How does that keep from stranding 8 GPUs? Is there some kind of rack-wide PCIe switch that can attach GPUs to various hosts or ??

We sadly don’t talk about how we rack these at all, but the folks at Facebook have made their OCP designs public for vaguely similar systems.

However, I’ll note that the 16 A100s here are way more expensive than the cpu cores (and we can just run vanilla VMs on those left over cores if really needed).

Worse than that, "A"/"a" on AWS can actually mean AMD or ARM. An "a1.large" instance is an instance with a first-gen Graviton ARM CPU (whereas the second-gen Graviton2 ARM CPUs are something like "c6g.large"). A "c5a.large" is an instance with a x86 AMD CPU.

I've only dipped a toe into the GPGPU world in the past 5 years, but it is obvious that Nvidia have the lion's share of the market when it comes to hosted solutions.

Still not sure why competitors don't have their GPUs offered in cloud services. At the time, the alternatives seemed more economical. I was building a GPU version of Hashcash at the time, fwiw.

I think the biggest problem for ML practitioners is that AMD doesn't take the middle layer seriously. Last we tried, the AMD equivalent of CUDA/cuDNN (ROCm) was riddled with stability bugs, so it superficially works for quick tasks, but if you need to do long runs it's an open question whether it will work.

This is really frustrating because in theory AMD GPGPUs have a better architecture for ML.

My guess is that it's also tough to get deals with AMD for their workstation-class products, which are competitive with Nvidia's RTX line, and the price/performance delta (both in terms of unit cost and power cost) is not huge.

There are some consumer-class AMD GPUs with attractive price/performance, but they are being sunsetted and future driver support is questionable. I looked at the Radeon VII, but its bezel is a few millimeters out of PCI spec, so it literally doesn't fit in an 8U GPU chassis, and it's an architecture with an uncertain future.

Nvidia has by far the best API/software support. CUDA is much nicer to use than OpenCL. Therefore, most software is written for Nvidia GPUs and it makes way more sense to host Nvidia.

> CUDA is much nicer to use than OpenCL

I think this is also part of the problem. AMD has a nearly function-for-function compatible GPGPU stack with CUDA in ROCm/HIP that fixes many of the pitfalls in OpenCL, and even provides automated translation tools for CUDA developers switching over.

However, this stack is so poorly documented and marketed that the vast majority of developers still associate AMD == OpenCL. Add to that the lack of Windows, macOS and Navi GPU support, and you've removed a ton of opportunities/incentives for folks to experiment with the platform.

> Still not sure why competitors don't have their GPUs offered in cloud services.

AMD has many GPGPUs, but while they have kind of ok-ish raw numbers, their software stack makes quite poor use of them.

Users care about perf/$, and nvidia is king there. For deep-learning and other apps, hardware without software is useless.

And if the population is not uniform (and it never is, as it's a growing market and new resources have different feature sets) it can be hard to deploy some hardware-sensitive applications. We needed hardware support for encryption in our teleconferencing mix node, and had to contract specially so we could deploy only on hardware that supported it (i.e. was modern enough).
There really aren't any serious competitors. AMD and Intel have GPU offerings, but they are too busy fighting for laptop, server, desktop market. Intel's graphics IP is just a value-add for chipsets to keep competitors out of the chipset market. Any new investment for startups is going toward ML acceleration, which is like GPGPU but with different topology for NN optimization. So to answer your Q: no one can compete with NVIDIA, so no one wants to invest there. IMHO, of course.

Given the replies, it seems it's overwhelmingly about the software interface rather than the hardware.

I'd got the impression that Vulkan would bridge that gap, but it's quite a low-level API and verbose. Would be interesting to hear comments on that, perhaps just for the case of the hosted GPU market being more competitive.

Not really. Although somewhat off topic, one of the sessions at GTC 2020 was about how Otoy gave up on Vulkan for their Octane Render and decided to use CUDA instead, once OptiX 7 added CUDA-based rendering support.

Vulkan also suffers from being too late to the game.

NVidia moved CUDA into a polyglot runtime early in the game (around version 3, I think), and took a 10-year effort to redesign the hardware for optimal mapping to the C++ memory model.

Vulkan might eventually get it via SYCL, now that SYCL has turned into a backend-agnostic mapping for C++ heterogeneous computing.

But even if SYCL turns out to be adopted across the industry, Vulkan will just be another backend alongside CUDA, ROCm, FPGAs, ...

As an ML beginner seeing these new offerings, does a local setup (2x NVIDIA 2080 8GB) make sense, or is it better to learn using the cloud and let hardware get even cheaper?

The cloud is a bit scary from a learner's perspective, because it's not exactly clear how much power one would need to practice the core concepts and see actual results.

On the other hand, a PC build is an upfront investment: less scary because it's a fixed cost, but it also feels risky in case things progress quickly and the hardware gets outdated soon.

Disclosure: I work on Google Cloud.

Honestly, I’d start with Colab until you decide you need “dedicated hardware”. It’s better to focus on learning before you decide “Okay, I’m serious about this now”.

I would completely avoid things like this A2 instance (and other accelerated instances on other cloud providers) if you’re a self-described beginner. The GPU-accelerated instances in public clouds are terrifyingly expensive and way overkill for any kind of beginner/practice work.

Only approach these types of instances once you actually have a handle on how much power you need.
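
Either way, a quick way to see what you actually have to work with (the same snippet runs in a Colab notebook or on a local 2x 2080 box), assuming PyTorch:

    import torch

    # List visible CUDA devices and their memory.
    if torch.cuda.is_available():
        for i in range(torch.cuda.device_count()):
            props = torch.cuda.get_device_properties(i)
            print(f"GPU {i}: {props.name}, {props.total_memory / 1024**3:.1f} GB")
    else:
        print("No CUDA GPU visible; CPU only.")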

If you already have those GPUs, use them. If you are looking into buying them, check out Paperspace or Colab first. Keep in mind that you will probably not use your own cards the whole time; with Paperspace you only pay for your usage (and some fixed amount for disk usage, I think), and Colab is even cheaper in that regard.