Semi-private LLM Hosting?
Has anyone seen or heard of a "semi-private LLM" hosting provider? I'm hoping for a middle ground between renting a GPU-enabled server for $200/mo to run a fully private LLM and sending all my data to OpenAI (or similar). I assume one of the myriad open-source LLMs could be run in an environment like this, provided its license permits commercial use. Is anyone doing something like this?
I know Salad exists, and it seems like a good option for content generation since it bills hourly, but to respond to an email or ticket, the instance would need to be running 24/7.
Comments
Hello @rockinmusicgv, could you tell me whether you'd be interested in a dedicated server with an Nvidia Tesla M10 GPU?
How often do you think you'll use the service? And how good do you need the GPU to be? I have some providers in mind for hourly billing.
Salad's really cheap indeed.
datalix's aff #1 fan
This might be sort of what Groq (not Grok) does. I'm not sure about the 'sending data' part, though.
Otherwise, I expect we'll see providers start securing good deals on bulk GPU rentals and offering "shared" services like this that are always on.
NVMe VPS | Ryzen 7950X VDS | Dedicated Servers -- Crunchbits.com
As I see it, Salad's use case is training and content/image generation. Their prices are reasonable, but billing is hourly, so for an always-on service it makes more sense after about two years to buy a $7000 Mac than to keep paying for cloud GPU hosting. Not that I'm against that; I just don't have $7000 for a Mac with 200GB of RAM.
There are a lot of API providers like Groq; OpenAI, Mistral, and Anthropic all offer their own API access. I want to avoid API services because they require sending prompts, replies, and so on to their servers.
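The rent-vs-buy point can be checked with a quick break-even sketch. It assumes a flat monthly rent and ignores power, resale value, and depreciation. At the $200/mo figure from the opening post, the crossover for a $7000 machine lands closer to three years; a higher monthly rent (e.g. hourly billing left on 24/7) moves it earlier.

```python
import math


def break_even_months(upfront_cost: float, monthly_rent: float) -> int:
    """Months of renting after which buying the hardware outright would have been cheaper.

    Simplifying assumption: ignores power, resale value and depreciation.
    """
    if monthly_rent <= 0:
        raise ValueError("monthly_rent must be positive")
    return math.ceil(upfront_cost / monthly_rent)


# Figures from the thread: a ~$7000 Mac vs. a ~$200/mo GPU server.
months = break_even_months(7000, 200)
print(f"Break-even after {months} months (~{months / 12:.1f} years)")
```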
You could set up a pool where, say, 5 people pay for one GPU server and share it.
That should bring the cost down to about $20-30 per month each.
Get one from @crunchbits and you're set.
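A sketch of the pool arithmetic, assuming the bill is split evenly: $20-30 each for five people implies a server at roughly $100-150/mo, while the $200/mo server from the opening post works out to $40 each. One design caveat: everyone in the pool contends for the same GPU, so concurrent requests have to share it.

```python
def per_person_cost(server_monthly: float, members: int) -> float:
    """Even split of one shared GPU server's monthly bill (assumes equal shares)."""
    if members < 1:
        raise ValueError("need at least one member")
    return server_monthly / members


# Server prices below are illustrative; 5 members as suggested in the thread.
for price in (100, 150, 200):
    print(f"${price}/mo split 5 ways: ${per_person_cost(price, 5):.2f} each")
```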
Free NAT KVM | Free NAT LXC | Bobr
I checked their site and it says the 3090 is out of stock. Renting from a provider here (meaning they have ties to this community) would be a plus. Hopefully they'll be back in stock soon.
Yeah, the 3090 is the sweet spot right now; AI stuff is pretty fast on it, even though it's already over 3 years old.
It's mostly out of stock, though. It's cheap compared to other deals, so it might be worth buying a GPU yourself, just paying for power, and sharing it.
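To put a rough number on "just pay for power", here is a sketch of monthly electricity cost. Assumptions: 350 W (the RTX 3090's rated board power), a hypothetical $0.15/kWh tariff, and a duty cycle standing in for average load. It counts the card only, not the rest of the system, and a card serving occasional requests spends most of its time far below full power.

```python
def monthly_power_cost(watts: float, price_per_kwh: float, duty_cycle: float = 1.0) -> float:
    """Electricity cost for one average month (730 h) of a device drawing `watts` at full load.

    duty_cycle scales the draw to approximate average load (1.0 = flat out 24/7).
    """
    hours = 24 * 365 / 12  # average hours per month = 730
    kwh = watts / 1000 * hours * duty_cycle
    return kwh * price_per_kwh


# Hypothetical tariff of $0.15/kWh; substitute your own rate.
print(f"Flat out 24/7: ${monthly_power_cost(350, 0.15):.2f}/mo")
print(f"25% duty cycle: ${monthly_power_cost(350, 0.15, 0.25):.2f}/mo")
```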
We've got them, with a ton of 3090s inbound and being racked daily at the moment. Open a ticket/DM/e-mail/Discord and I'll get you taken care of. They're out of stock because the plan is improving (based on customer feedback) for the same price: more cores and double the RAM (don't ban me @bikegremlin).
I agree. Lots of requests for 4090s, but they're a lot more difficult to cool properly at full TDP (in bulk), and their pricing is getting a bit out of hand. Anyone doing AV1 encoding will be better served by our other Ada lineup.
I am going to do it unless I get naked pics of your dog in my inbox within 24 hrs!!!
Free Hosting at YetiNode | Cryptid Security | URL Shortener | LaunchVPS | ExtraVM | Host-C | In the Node, or Out of the Loop?
No, there isn't really an in-between offering like that. All APIs require sending data, and any hosted solution that's available 24/7 means either paying for the full capacity (i.e. the $200) or self-hosting. The closest hybrid I can think of is Cloudflare's GPU workers, but that's still basically an API.
I'd recommend finding an API provider you're comfortable with. Mistral is based in France, so it should be subject to fairly strict EU privacy rules.
Self-hosting LLMs is super interesting, but frankly not very competitive on quality or cost.
Do check whether your use case allows AMD, though. Some LLMs now run on AMD, and the 7900 XTX is a much newer/stronger card that also has 24GB. Obviously it depends on how CUDA-dependent you are.
We (as in, everyone interested in using this) need this to happen. Opening up AMD's newer lineups and/or their MI series to real productive use (without an extremely narrow, customized software stack) would be huge. It keeps inching closer...
Well I bought 10k worth of AMD shares this morning so they had better make this stick...
Inference is already pretty point-and-shoot on the 79** cards, I think. Anything outside that, whether a different use case (e.g. training) or a different card, less so.
Low End VPS, High End Portfolio.
Yeah, inference is a beast, and they were making good gains on 1:1 CUDA compatibility (even if a little behind, the cost was in line). The next steps are either getting all the way there or pushing a CUDA alternative that's just as easy, then making it a viable customer-facing option and helping with the messaging.
A cloud worker would be more in line with what I'm looking for, assuming the data can be encrypted in transit. One thing I don't like about the AI providers is that they have a vested interest in monitoring the output of their services. They don't want people using them for fraud, and there have already been voice-AI spear-phishing attacks, so they do have a good reason to keep tabs on users. The downside is that customer data (even if it's just a name) gets transmitted back and forth and viewed by the provider.
The LES mod team and me when the word "double" is posted on the forum:
Relja of House Novović, the First of His Name, King of the Plains, the Breaker of Chains, WirMach Wolves pack member
BikeGremlin's web-hosting reviews