That model was trained in part using their unreleased R1 "reasoning" model. Today they've released R1 itself, along with a whole family of new models derived from that base.
There's a lot of stuff in the new release.
DeepSeek-R1-Zero appears to be the base model. It's over 650GB in size and, like most of their other releases, is under a clean MIT license. DeepSeek warn that "DeepSeek-R1-Zero encounters challenges such as endless repetition, poor readability, and language mixing." ... so they also released:
DeepSeek-R1, which "incorporates cold-start data before RL" and "achieves performance comparable to OpenAI-o1 across math, code, and reasoning tasks". That one is also MIT licensed, and is a similar size.
I don't have the ability to run models larger than about 50GB (I have an M2 with 64GB of RAM), so neither of these two models are something I can easily play with myself. That's where the new distilled models come in.
To support the research community, we have open-sourced DeepSeek-R1-Zero, DeepSeek-R1, and six dense models distilled from DeepSeek-R1 based on Llama and Qwen.
This is a fascinating flex! They have models based on Qwen 2.5 (14B, 32B, Math 1.5B and Math 7B) and Llama 3 (Llama-3.1 8B and Llama 3.3 70B Instruct).
Weirdly those Llama models have an MIT license attached, which I'm not sure is compatible with the underlying Llama license. Qwen models are Apache licensed so maybe MIT is OK?
(I also just noticed the MIT license files say "Copyright (c) 2023 DeepSeek" so they may need to pay a bit more attention to how they copied those in.)
Licensing aside, these distilled models are fascinating beasts.
Running DeepSeek-R1-Distill-Llama-8B-GGUF
Quantized versions are already starting to appear. So far I've tried just one of those: unsloth/DeepSeek-R1-Distill-Llama-8B-GGUF released by Unsloth AI, and it's really fun to play with.
I'm running it using the combination of Ollama, LLM and the llm-ollama plugin.
First I fetched and ran the model using Ollama itself:
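The command itself didn't survive in this copy. Ollama can pull a GGUF straight from Hugging Face, so the invocation would have looked something like this (the `hf.co/` path and the `:Q8_0` quantization tag are assumptions; use whichever quantization of the Unsloth GGUF you prefer):

```shell
# Fetch the GGUF from Hugging Face and drop into an interactive chat.
# The :Q8_0 quantization tag is an assumption — adjust to taste.
ollama run hf.co/unsloth/DeepSeek-R1-Distill-Llama-8B-GGUF:Q8_0
```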
This downloads around 8.5GB of model data and starts an interactive chat interface.
Once the model has been fetched, LLM can talk to it too. I prefer using LLM for experiments because it logs everything to SQLite for later exploration.
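Those logged prompts and responses can be pulled back out of the SQLite database with LLM's logs command:

```shell
# Show the most recently logged prompt/response pair
llm logs -n 1

# Print the location of the underlying SQLite database
llm logs path
```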
I installed the plugin and ran it like this:
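The commands themselves are missing here; a sketch of what they would have been (the model ID is an assumption; whatever name `ollama list` reports is the one to use):

```shell
# Install the plugin that exposes Ollama models to LLM
llm install llm-ollama

# Run a prompt through the local model
llm -m 'hf.co/unsloth/DeepSeek-R1-Distill-Llama-8B-GGUF:Q8_0' \
  'a joke about a pelican and a walrus who run a tea room together'
```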
Each response starts with a ... pseudo-XML tag containing the chain of thought used to help generate the response. For "a joke about a pelican and a walrus who run a tea room together" that started like this:
Okay, so I need to come up with a joke about a pelican and a walrus running a tea room together. Hmm, that's an interesting combination. Let me think about how these two characters might interact in a funny situation.
First, let's consider their characteristics. Pelicans are known for their long beaks and webbed feet, often seen near the beach or water. Walruses have big teeth, thick fur, and they're typically found in colder climates, like icebergs or snowy areas. So, combining these two into a tea room setting is already a funny image.
[...]
It then thought for 20 paragraphs before outputting the joke! Here's the full transcript and here's the output at the end, after the tag:
The Joke:
A pelican and a walrus decide to open a quaint little tea room together. The walrus, with its big size, struggles to find comfortable chairs, so it sits on the table by accident, knocking over the teapot. Meanwhile, the pelican, trying to help, uses its beak to place saucers on the table, causing a few spills.
After a series of comical mishaps, the walrus looks up and says with a grin, "This isn't so fishy anymore." The pelican smirks and says, "Maybe not, but we do have a lot of krill in our tea!"
Explanation:
- Setup: Pelican and Walrus run an unlikely tea room. Their sizes lead to comedic chaos.
- Punchline: "This isn't so fishy anymore" is a pun on the situation becoming less absurd. The follow-up joke introduces "krill," a play on words that ties into both marine life ("fish") and tea, making it humorous while fitting the theme.
So yeah, the joke is awful. But the process of getting there was such a fascinating insight into how these new models work.
This was a relatively small 8B model. I'm looking forward to trying out the Llama 70B version, which isn't yet available in a GGUF I can run with Ollama. Given the strength of Llama 3.3 70B (currently my favourite GPT-4 class model that I've run on my own machine) I have high expectations.
Update 21st January 2025: I got this quantized version of that Llama 3.3 70B R1 distilled model working like this, a 34GB download:
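The command itself is missing from this copy. Assuming the same Ollama-from-Hugging-Face pattern as the 8B model, it would have been something along these lines (the repository name and quantization tag are assumptions; a ~34GB download suggests roughly a 3-bit quantization of a 70B model):

```shell
# ~34GB download — the :Q3_K_M tag is an assumption based on the stated size
ollama run hf.co/unsloth/DeepSeek-R1-Distill-Llama-70B-GGUF:Q3_K_M
```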
Can it draw a pelican?
I tried my classic Generate an SVG of a pelican riding a bicycle prompt too. It did not do very well:
It looked to me like it got the order of the elements wrong, so I followed up with:
the background ended up covering the rest of the image
It thought some more and gave me this:
As with the earlier joke, the chain of thought in the transcript was far more interesting than the end result.
Other ways to try DeepSeek-R1
If you want to try the model out without installing anything you can do so using chat.deepseek.com: you'll need to create an account (sign in with Google, use an email address or provide a Chinese +86 phone number) and then pick the "DeepThink" option below the prompt input box.
DeepSeek offer the model via their API, using an OpenAI-compatible endpoint. You can access that via LLM by dropping this into your extra-openai-models.yaml configuration file:
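The YAML itself didn't survive in this copy. Based on the shape of the extra-openai-models.yaml entries that LLM expects, it would have been something like this (the api_base value and key name are assumptions; check DeepSeek's API documentation):

```yaml
- model_id: deepseek-reasoner
  model_name: deepseek-reasoner
  api_base: "https://api.deepseek.com"
  api_key_name: deepseek
```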
Then run `llm keys set deepseek` and paste in your API key, then use `llm -m deepseek-reasoner 'prompt'` to run prompts.
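Spelled out as commands:

```shell
# Store the DeepSeek API key once
llm keys set deepseek

# Run a prompt against the hosted reasoning model
llm -m deepseek-reasoner 'a joke about a pelican and a walrus who run a tea room together'
```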
This won't show you the reasoning tokens, sadly. Those are served up by the API (example here) but LLM doesn't yet have a way to display them.