WEBVTT
Kind: captions
Language: en

00:00:00.000 --> 00:00:10.280 align:start position:0%
If you grew up watching Code Bullet just like me, then at one point you most likely thought that the AI of the future would be trained with some sort of genetic algorithm, or even evolution strategies.

00:00:10.280 --> 00:00:22.280 align:start position:0%
That's because most game-related AIs back in the day were built on this simple yet powerful idea, and it felt like anything could be trained with evolution. Not to mention that we humans became this intelligent thanks to the same natural phenomenon.

00:00:22.280 --> 00:00:34.280 align:start position:0%
So I guess betting on evolution strategies as the thing that would bring us to AGI wasn't too crazy 10 years ago. However, as you can see now, hardly any of today's mainstream AI methods use evolution strategies at all.

00:00:34.280 --> 00:00:47.320 align:start position:0%
It might as well be a dead optimization method that we should frame and hang in a museum. Or not, because evolution strategies have recently re-emerged in the LLM literature. But how is this abandoned method suddenly making a comeback?

00:00:47.320 --> 00:01:01.600 align:start position:0%
Well, its main bottleneck has apparently been solved, and the fix came with even more upsides than initially expected. But before we dive into it: with the AI competition constantly changing, bouncing around five different chatbots with five different subscriptions is just not it.

00:01:01.600 --> 00:01:10.990 align:start position:0%
Why even bother paying 100 bucks in subscription fees without using their full value when you can just pay 10 bucks on Map Muse? And you don't just get five models, but even more.

00:01:11.000 --> 00:01:32.120 align:start position:0%
They range from Claude, GPT, Gemini, Llama, Mistral, Grok, DeepSeek, and Perplexity to Flux, Nano Banana, and Recraft. And instead of betting everything on one subscription, this platform also lets you re-prompt, compare answers, and have models challenge each other, so you get higher-quality outputs without vendor lock-in and without switching between 10 tabs to manually check which one is best.

00:01:32.120 --> 00:01:42.160 align:start position:0%
This multi-model loadout can also analyze documents and images, run deep research through Perplexity, and even do voice chat and dictation when you want to move faster.

00:01:42.160 --> 00:01:55.000 align:start position:0%
And if your task is repetitive, you can use the project function to create custom Map Muses with your own instructions, set a default model that matches your habits, and keep everything organized instead of starting from scratch every time.

00:01:55.000 --> 00:02:04.560 align:start position:0%
So, if you want one clean place to use the best models without spending so much on subscriptions, check them out using the link down in the description. And thank you, Map Muse, for sponsoring this video.

00:02:04.560 --> 00:02:16.000 align:start position:0%
Anyways, the main idea behind evolution strategies is actually very simple. You start with one version of your model, then you create several slightly different versions of it by adding small random changes. Let's take a genetic algorithm as an example.

00:02:16.000 --> 00:02:25.320 align:start position:0%
You basically make a group of copies of the same DNA and slightly mutate each one. Then you test each of these copies and measure how well it performs. This performance score is called its fitness.

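The copy, mutate, and score steps can be sketched in a few lines of Python. This is only a toy illustration, not the setup from any particular paper: the "DNA" is a short list of numbers, the mutation is Gaussian noise, and `mutate`, `fitness`, and the `target` vector are all hypothetical names and values made up for the example.

```python
import random

def mutate(parent, sigma, rng):
    """Return a copy of `parent` with small Gaussian noise added to every gene."""
    return [gene + rng.gauss(0.0, sigma) for gene in parent]

def fitness(candidate, target):
    """Toy fitness (assumption): negative squared distance to a made-up target.
    Higher is better, and a perfect match scores 0."""
    return -sum((c - t) ** 2 for c, t in zip(candidate, target))

rng = random.Random(0)           # fixed seed so the sketch is repeatable
parent = [0.0, 0.0, 0.0]         # the starting "DNA"
target = [1.0, -2.0, 0.5]        # hypothetical optimum, for illustration only

# Copy the parent several times, mutating each copy slightly.
population = [mutate(parent, sigma=0.1, rng=rng) for _ in range(8)]

# Test every copy and record its fitness score.
scores = [fitness(child, target) for child in population]

# The best-scoring copy is the one a selection step would favor.
best_score, best_child = max(zip(scores, population))
```

In a real setting the fitness function would be something expensive, such as the score an agent earns when playing a game, rather than a distance to a known target.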
00:02:25.320 --> 00:02:36.870 align:start position:0%
Some copies will do better and some will do worse. Once you see which copies perform better, you use that information to guide the next step. The copies with high fitness scores influence the next version a lot more strongly.

00:02:36.880 --> 00:02:47.480 align:start position:0%
The ones that did poorly have less influence or are discarded completely. Then you repeat the whole process: create new variations, test them, and move toward the variations that worked best.

00:02:47.480 --> 00:02:59.040 align:start position:0%
Over time, the model improves because it keeps shifting toward changes that increase its fitness, and it can keep running indefinitely until you decide to stop. So this seemingly bulletproof idea sounds like it should work everywhere.

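Put together, the whole loop described above might look roughly like the sketch below. It makes several assumptions that are not from the video: the fitness function is a toy distance to a made-up target, only the top `keep` copies survive, and survivors are blended with simple rank-based weights so that better copies pull the next version harder. The names `evolve`, `keep`, and `sigma` are hypothetical.

```python
import random

def fitness(x, target):
    """Toy fitness (assumption): negative squared distance to a made-up target."""
    return -sum((a - b) ** 2 for a, b in zip(x, target))

def evolve(parent, target, generations=200, pop_size=20, keep=5, sigma=0.1, seed=0):
    """Repeat: mutate copies, score them, blend the best into the next parent."""
    rng = random.Random(seed)
    for _ in range(generations):
        # 1. Create slightly different versions of the current parent.
        children = [[g + rng.gauss(0.0, sigma) for g in parent]
                    for _ in range(pop_size)]
        # 2. Test them and order them from best to worst fitness.
        children.sort(key=lambda c: fitness(c, target), reverse=True)
        # 3. Discard the poor performers; keep only the top few.
        elite = children[:keep]
        # 4. Better-ranked survivors get larger weights: keep, keep-1, ..., 1.
        weights = [keep - rank for rank in range(keep)]
        total = sum(weights)
        # 5. The next parent is the weighted average of the survivors.
        parent = [sum(w * child[i] for w, child in zip(weights, elite)) / total
                  for i in range(len(parent))]
    return parent

start = [0.0, 0.0, 0.0]
target = [1.0, -2.0, 0.5]        # hypothetical optimum, for illustration only
final = evolve(start, target)
```

Note that nothing here uses gradients: the only feedback is the fitness score of each copy, which is why the same loop can be pointed at almost any problem that can be scored.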
00:02:59.040 --> 00:03:10.920 align:start position:0%
But in practice, it's actually not as straightforward as it seems. In the early days of deep learning, researchers trained neural networks to do things like play Atari games. Those networks usually had around 2 million parameters.

00:03:10.920 --> 00:03:36.152 align:start position:0%
That may not sound huge today, but at the time, especially for evolutionary methods, it was already extremely large. Optimizing such a network with evolution strategies is like randomly tweaking 2 million knobs at the same time and hoping for an improvement, which seems worse than gambling. Most random changes will also completely scramble the model's behavior, so the model might go from playing somewhat reasonably to acting almost randomly.

And when nearly all mutations destroy performance, it becomes very hard to find the rare ones that actually improve it, so the good signal gets buried under noise. On top of that, deep neural network parameters are not independent knobs like the genes in a genetic algorithm, which are treated as simple, unrelated entries in a DNA-like string. The parameters in a neural network are highly interconnected, so changing one weight slightly can change how many other weights behave downstream. A random change will therefore usually bring more destruction than improvement.
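As a rough illustration of why nearly all mutations hurt once there are many knobs, here is a minimal sketch. It is not a neural network: the toy loss (squared distance to an optimum at the origin), the noise scale `sigma`, and the function name `fraction_improving` are all assumptions made up for this example. It simply counts how often a random Gaussian tweak of every knob at once lowers the loss as the number of knobs grows.

```python
import random

def fraction_improving(dim, sigma=0.1, trials=500, seed=0):
    """Fraction of random Gaussian mutations that lower a toy loss.

    Toy loss (an assumption for illustration): squared distance to an
    optimum at the origin. The start point sits at distance 1 from it.
    """
    rng = random.Random(seed)
    start = [1.0] + [0.0] * (dim - 1)   # loss(start) == 1.0
    improved = 0
    for _ in range(trials):
        # Tweak every knob at once with small Gaussian noise.
        mutated = [x + sigma * rng.gauss(0.0, 1.0) for x in start]
        loss = sum(x * x for x in mutated)
        if loss < 1.0:
            improved += 1
    return improved / trials

for dim in (2, 20, 200):
    print(dim, fraction_improving(dim))
```

With only a couple of knobs, a fair share of random tweaks still help. With a few hundred knobs and the same noise scale, the accumulated damage from all the other knobs outweighs any lucky step toward the optimum, so helpful mutations become extremely rare.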
Some methods, such as CMA-ES, did try to model how parameters interact with each other by learning a large covariance matrix. But for a network with 2 million parameters, that matrix would contain trillions of entries (2 million × 2 million is 4 trillion), and something that large is pretty much impossible to store and update, let alone train with. So older evolution strategies simply could not scale to deep neural networks.
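To put a number on the covariance matrix for a 2-million-parameter network, here is a small back-of-the-envelope calculation. It assumes a dense matrix stored as 4-byte floats, which is my assumption and not something stated in the video. Exploiting symmetry would roughly halve the figure, which does not change the conclusion.

```python
def covariance_cost(num_params, bytes_per_entry=4):
    """Entries and raw storage for a dense num_params x num_params matrix."""
    entries = num_params * num_params
    terabytes = entries * bytes_per_entry / 1e12
    return entries, terabytes

entries, terabytes = covariance_cost(2_000_000)
print(f"{entries:,} entries, about {terabytes:.0f} TB as float32")
```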
But in OpenAI's 2017 paper, "Evolution Strategies as a Scalable Alternative to Reinforcement Learning," they changed how the method is implemented for neural networks. Instead of trying to learn a huge and complicated structure that models how all the parameters interact with each other, such as the covariance matrix, they used basic Gaussian noise.
This slightly nudges all the parameters in random directions and then measures how performance changes. For example, say you have a population of nine models in one iteration. Each model is a copy of the current parameters with a full-parameter nudge in its own random direction, so there are nine random directions in total. You then evaluate all nine perturbed models on the task, and each one gets a scalar reward that indicates how well its random direction performed. After every direction has been evaluated, the directions are weighted according to their rewards, and their weighted average becomes a single update that is shared across all nine models. Then you basically repeat this process.
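The loop described here (perturb, evaluate, weight by reward, share one averaged update) can be sketched in a few lines. This is a simplified sketch under stated assumptions, not the paper's implementation: the names `es_step` and `toy_reward`, the five-parameter toy task, the hyperparameters, and the choice to subtract the mean reward as a baseline are all mine.

```python
import random

def es_step(theta, reward_fn, rng, population=9, sigma=0.1, lr=0.05):
    """One simplified evolution-strategies update on a list of floats."""
    dim = len(theta)
    # One random Gaussian direction per population member.
    directions = [[rng.gauss(0.0, 1.0) for _ in range(dim)]
                  for _ in range(population)]
    # Evaluate each perturbed copy; each one gets a single scalar reward.
    rewards = [reward_fn([t + sigma * e for t, e in zip(theta, eps)])
               for eps in directions]
    # Assumption: subtract the mean reward so that better-than-average
    # directions get positive weight and worse ones get negative weight.
    baseline = sum(rewards) / population
    weights = [r - baseline for r in rewards]
    # The reward-weighted average of the directions is the shared update.
    scale = lr / (population * sigma)
    return [t + scale * sum(w * eps[i] for w, eps in zip(weights, directions))
            for i, t in enumerate(theta)]

def toy_reward(params):
    """Hypothetical task: reward is higher the closer params are to all ones."""
    return -sum((p - 1.0) ** 2 for p in params)

rng = random.Random(0)
theta = [0.0] * 5
for _ in range(300):
    theta = es_step(theta, toy_reward, rng)
print(toy_reward([0.0] * 5), toy_reward(theta))
```

Note that `reward_fn` is only ever called, never differentiated, which is the whole point: the update needs nothing but one number per perturbed model.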
If you scale the population by running hundreds or even more than a thousand perturbations at once, you can average over many random perturbations. Eventually the noise starts to cancel out and the useful direction naturally emerges.
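Here is a small sketch of the noise cancelling out as the population grows. It assumes a reward that is locally linear, so the gain from nudging along a random direction is just the dot product with a hidden "useful direction". The 100-dimensional setup and the names `es_direction` and `cosine` are assumptions for illustration. The cosine similarity tells us how well the averaged estimate lines up with the true direction (1.0 would be perfect).

```python
import math
import random

def es_direction(grad, population, rng):
    """Average of random directions, each weighted by its reward gain."""
    dim = len(grad)
    estimate = [0.0] * dim
    for _ in range(population):
        eps = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        # For a locally linear reward, nudging along eps gains grad . eps.
        gain = sum(g * e for g, e in zip(grad, eps))
        for i in range(dim):
            estimate[i] += gain * eps[i] / population
    return estimate

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

rng = random.Random(0)
true_grad = [rng.gauss(0.0, 1.0) for _ in range(100)]  # hidden useful direction
for population in (10, 100, 1000):
    estimate = es_direction(true_grad, population, rng)
    print(population, round(cosine(estimate, true_grad), 3))
```

With a handful of perturbations the estimate points mostly in a random direction. With around a thousand it should line up closely with the true one, which is why running a large population in parallel matters so much for this method.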
That's why evolution strategies were revived: not because the core idea changed, but because the engineering made the method viable for deep neural networks, especially in deep reinforcement learning for Atari games. This OpenAI research was one of the first times evolution strategies worked on deep neural networks at this scale, which makes it a pivotal paper. So, now that we know evolution strategies can be a good optimizer, what if we use them to optimize LLMs?
Well, the practical truth is that, because of how the learning is set up, next-token prediction is easy for gradients but hard for evolution strategies. With next-token prediction, you have a clear teacher signal at every token, so the correct next word always provides a loss that is both smooth and differentiable. With evolution strategies, we are basically throwing away most of that information and replacing it with a single scalar reward, which is kind of like an average loss over everything. That is a lot less informative than what next-token prediction can provide. On top of that, evolution strategies take a huge amount of compute while giving a very fuzzy signal.
But one gradient step in next-token prediction tells you exactly how wrong you are.

However, all hope is not lost. Reinforcement learning in LLM fine-tuning is the opposite situation compared to the next-token prediction used during pre-training. In LLM RL fine-tuning, you often only get a single score for the whole generated answer. You do not get a clean signal that tells you, for example, which token made the most difference in the sentence.
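The difference between the two kinds of feedback can be made concrete with a tiny sketch. Everything here is hypothetical: the three-token vocabulary, the probabilities, and the helper names `per_token_losses` and `sequence_reward` are invented for illustration. The first function mimics the pre-training signal, one cross-entropy value per position. The second mimics a verifier-style reward, one number for the whole answer.

```python
import math

def per_token_losses(token_probs, targets):
    """Cross-entropy at every position: one loss value per token."""
    return [-math.log(probs[t]) for probs, t in zip(token_probs, targets)]

def sequence_reward(generated, reference):
    """Verifier-style feedback: a single 0/1 score for the whole answer."""
    return 1.0 if generated == reference else 0.0

# Hypothetical 3-token vocabulary and a 4-token target sequence.
targets = [0, 2, 1, 1]
token_probs = [
    [0.7, 0.2, 0.1],
    [0.1, 0.1, 0.8],
    [0.3, 0.6, 0.1],
    [0.5, 0.2, 0.3],  # only 0.2 on the correct token at this position
]
# Greedy decoding: pick the most likely token at each position.
greedy = [probs.index(max(probs)) for probs in token_probs]

print(per_token_losses(token_probs, targets))  # four values, one per token
print(sequence_reward(greedy, targets))        # one value for everything
```

The per-token losses point straight at the last position as the problem, while the sequence-level reward only says that something, somewhere, went wrong.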
So, we often<00:06:56.520> see<00:06:56.720> in<00:06:56.920> RLVR<00:06:57.600> research<00:06:58.160> that<00:06:58.440> the 00:06:58.550 --> 00:06:58.560 align:start position:0% often see in RLVR research that the 00:06:58.560 --> 00:07:00.510 align:start position:0% often see in RLVR research that the learning<00:06:58.920> signal<00:06:59.320> is<00:06:59.520> way<00:06:59.640> too<00:07:00.000> sparse<00:07:00.240> due<00:07:00.360> to 00:07:00.510 --> 00:07:00.520 align:start position:0% learning signal is way too sparse due to 00:07:00.520 --> 00:07:02.390 align:start position:0% learning signal is way too sparse due to how<00:07:00.760> for<00:07:00.960> a<00:07:01.040> piece<00:07:01.320> of<00:07:01.440> training<00:07:01.800> data,<00:07:02.240> you 00:07:02.390 --> 00:07:02.400 align:start position:0% how for a piece of training data, you 00:07:02.400 --> 00:07:04.830 align:start position:0% how for a piece of training data, you sometimes<00:07:02.960> only<00:07:03.280> get<00:07:03.600> a<00:07:03.680> binary<00:07:04.360> feedback 00:07:04.830 --> 00:07:04.840 align:start position:0% sometimes only get a binary feedback 00:07:04.840 --> 00:07:06.710 align:start position:0% sometimes only get a binary feedback that<00:07:05.040> is<00:07:05.240> then<00:07:05.520> used<00:07:05.720> to<00:07:05.800> update<00:07:06.080> a<00:07:06.120> billion<00:07:06.600> or 00:07:06.710 --> 00:07:06.720 align:start position:0% that is then used to update a billion or 00:07:06.720 --> 00:07:08.470 align:start position:0% that is then used to update a billion or even<00:07:06.960> a<00:07:07.040> trillion<00:07:07.480> parameter<00:07:07.880> model.<00:07:08.360> So, 00:07:08.470 --> 00:07:08.480 align:start position:0% even a trillion parameter model. So, 00:07:08.480 --> 00:07:09.670 align:start position:0% even a trillion parameter model. 
So, there is a lot of new research trying to figure out how to provide more learning signal in the RLVR process, for instance token-level credit assignment, which I talked about before.
This situation, however, is exactly the kind of setting where evolution strategies actually make sense, since evolution strategies only need a reward for the whole outcome and don't need to backpropagate through a long sequence or decide which token deserves the credit. Treating the model as a black box consequently provides larger parameter updates, which in theory should be able to give stronger feedback than RLVR methods like GRPO.
And this is exactly what the paper Evolution Strategies at Scale, published back in September 2025, found. In their setup, evolution strategies do not need token-level rewards and only need a response-level reward for each perturbation in the batch, which kind of makes them a perfect match for long-horizon, outcome-only tasks where credit is a lot harder to attribute. On top of that, this is the first paper to test evolution strategies on a model with billions of parameters.
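Here is a rough sketch of what one evolution strategies update looks like when all you have is a whole-outcome reward. This is a textbook-style version written in plain Python, not the paper's implementation: the names `es_step` and `toy_reward`, the z-score normalization, the hyperparameter values, and folding the division by sigma into the learning rate are all assumptions for illustration. The toy reward stands in for "run the perturbed model, score the full response".

```python
import random


def es_step(theta, reward_fn, rng, population=30, sigma=0.1, lr=0.05):
    """One black-box update: perturb, score each perturbation once, average."""
    noises, rewards = [], []
    for _ in range(population):
        eps = [rng.gauss(0.0, 1.0) for _ in theta]
        candidate = [t + sigma * e for t, e in zip(theta, eps)]
        noises.append(eps)
        rewards.append(reward_fn(candidate))  # one scalar per perturbation

    # Z-score the rewards so the update does not depend on the reward scale.
    mean_r = sum(rewards) / population
    std_r = (sum((r - mean_r) ** 2 for r in rewards) / population) ** 0.5
    if std_r == 0.0:
        return list(theta)  # every perturbation scored the same: no signal
    scores = [(r - mean_r) / std_r for r in rewards]

    # Move along the reward-weighted average of the noise directions.
    return [
        t + lr / population * sum(s * eps[i] for s, eps in zip(scores, noises))
        for i, t in enumerate(theta)
    ]


TARGET = [1.0, -2.0, 0.5]  # hidden optimum, unknown to the optimizer


def toy_reward(params):
    """Stand-in for an outcome-only reward: higher is better, no gradients exposed."""
    return -sum((p - t) ** 2 for p, t in zip(params, TARGET))


rng = random.Random(0)
theta = [0.0, 0.0, 0.0]
initial = toy_reward(theta)
for _ in range(200):
    theta = es_step(theta, toy_reward, rng)
final = toy_reward(theta)
print(initial, final)
```

Notice that `es_step` never looks inside `reward_fn`. There is no backpropagation and no per-token credit, only one number per perturbed copy of the parameters.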
It replaces action-space exploration with parameter-space exploration. In action-space exploration, each sampled sequence is a small variation of what the same model would normally say. The model's internal reasoning structure is unchanged.
You're basically just sampling from what the model already knows. But in parameter-space exploration, each perturbation slightly changes the model's reasoning behavior itself.
One perturbation might make the model more concise, another might make it more verbose, and another might even discover a new reasoning approach, because what evolution strategies provide is structural behavior changes, not just token-level randomness.
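The difference between the two kinds of exploration is easier to see on a tiny toy "model". In this sketch the model is just three logits over three possible outputs. The function names, the three-logit setup, and the greedy decoding in the parameter-space version are all assumptions for illustration, not the paper's code.

```python
import math
import random


def softmax(logits):
    """Turn logits into a probability distribution."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def explore_action_space(logits, n, rng):
    """Same parameters every time; the randomness is in sampling the output."""
    probs = softmax(logits)
    return [rng.choices(range(len(probs)), weights=probs)[0] for _ in range(n)]


def explore_parameter_space(logits, n, sigma, rng):
    """Perturb the parameters, then decode greedily from each perturbed model."""
    picks = []
    for _ in range(n):
        perturbed = [x + sigma * rng.gauss(0.0, 1.0) for x in logits]
        picks.append(max(range(len(perturbed)), key=lambda i: perturbed[i]))
    return picks


rng = random.Random(0)
logits = [2.0, 0.5, 0.1]
print(explore_action_space(logits, 10, rng))
print(explore_parameter_space(logits, 10, 1.0, rng))
```

In the first function every sample comes from the same distribution. In the second, each sample comes from a slightly different model, so what varies is the behavior itself and not just the dice roll at decoding time.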
This matters especially when action-space sampling is restricted to the model's own knowledge base and just reinforces a pre-existing sampling distribution.
This blew away all previous expectations of evolution strategies, especially the assumption that they cannot scale beyond millions of parameters.
And the reason why it's so surprising is that ever since that OpenAI paper, it was widely assumed that evolution strategies would not be able to scale up to LLM-sized models.
This is simply because exploring in parameter space gets harder as the number of parameters grows, and modern LLMs have billions of them, with relationships between them in much higher dimensions that are far harder to map out.
So, doing evolution strategy optimization directly looked computationally infeasible, and most prior work tried to avoid the problem by shrinking the search space or reducing the dimensions.
What's even more jaw-dropping is that prior work perturbed populations of tens of thousands of models, but this paper used a population of just 30 models to achieve competitive performance.
This is like a 300 times compute reduction. But the reason why this works with a population of only 30 is that even though the model has billions of parameters, the useful directions for improvement lie in a much lower-dimensional space. Think of it like this.
Imagine you are standing on a huge mountain with billions of possible directions you could step in. In reality, only a small number of directions actually lead uphill, as most directions are either flat or clearly downhill. So, if you randomly try 30 small steps in different directions, a few of them will likely tilt slightly uphill. And when you average those steps, weighted by how well each one did, the downhill noise cancels out and the uphill signal is reinforced.
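You can try the mountain picture yourself with a small sketch. Everything here is a made-up toy: the 1,000-dimensional parameter vector, the reward that only depends on the first two coordinates, and the names `estimate_uphill` and `reward` are assumptions for illustration, and the reward is deliberately linear so the effect is easy to reason about.

```python
import random


def estimate_uphill(theta, reward_fn, population, sigma, rng):
    """Reward-weighted average of random Gaussian directions around theta."""
    trials = []
    for _ in range(population):
        eps = [rng.gauss(0.0, 1.0) for _ in theta]
        r = reward_fn([t + sigma * e for t, e in zip(theta, eps)])
        trials.append((r, eps))
    mean_r = sum(r for r, _ in trials) / population  # baseline to cancel noise
    return [
        sum((r - mean_r) * eps[i] for r, eps in trials) / (population * sigma)
        for i in range(len(theta))
    ]


def reward(x):
    """Only 2 of the 1000 coordinates matter; every other direction is flat."""
    return x[0] + x[1]


rng = random.Random(0)
theta = [0.0] * 1000
direction = estimate_uphill(theta, reward, population=30, sigma=0.01, rng=rng)

# Take a small step along the estimated direction and see if reward went up.
stepped = [t + 0.1 * d for t, d in zip(theta, direction)]
gain = reward(stepped) - reward(theta)
print(gain)
```

The estimated direction is still noisy in the 998 flat coordinates, but those coordinates don't change the reward, so the step still goes uphill with only 30 samples.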
And this is thanks to two special attributes of extremely large neural networks. One, they behave more smoothly than people expect.
So, when you are only adding very small Gaussian noise, you're not actually jumping around but are basically sampling in a local region defined by the Gaussian noise, which maps out the surroundings. Therefore, you can find the uphill directions very easily.
And second, the reward signal in RL-style fine-tuning is very coarse. You are not trying to fine-tune every token perfectly. What you are doing instead is trying to move the model in a direction that increases overall outcome quality.
So, that global signal is often aligned across many parameters, which means when a perturbation improves performance, it tends to do so in a coordinated way. So, the signal shows up clearly even with a small population.
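To make the local-sampling argument concrete, here is a minimal sketch of one evolution strategies update. It is an illustration under assumptions, not the method from any of the papers discussed: the names `es_step` and `toy_reward`, the 10-parameter quadratic reward, and the values of `sigma` and `lr` are all made up for the example. The only part taken from the discussion is the shape of the idea: a population of 30 small Gaussian perturbations, scored by reward alone.

```python
import numpy as np

def es_step(theta, reward_fn, rng, population=30, sigma=0.1, lr=0.1):
    """One evolution-strategies update that only needs reward evaluations."""
    # Sample a small cloud of Gaussian directions around the current parameters.
    eps = rng.standard_normal((population, theta.size))
    # Score every perturbed copy; this is inference only, no backpropagation.
    rewards = np.array([reward_fn(theta + sigma * e) for e in eps])
    # Standardize rewards so the update does not depend on the reward scale.
    adv = (rewards - rewards.mean()) / (rewards.std() + 1e-8)
    # The reward-weighted average of the directions points roughly uphill.
    direction = adv @ eps / population
    return theta + lr * direction

def toy_reward(theta):
    """Hypothetical reward: highest when every parameter equals 1."""
    return -np.sum((theta - 1.0) ** 2)

rng = np.random.default_rng(0)
theta = np.zeros(10)
start = toy_reward(theta)
for _ in range(200):
    theta = es_step(theta, toy_reward, rng)
print(f"reward before: {start:.3f}, after: {toy_reward(theta):.3f}")
```

Each call to `reward_fn` is independent of the others, which is why the population can be spread across GPUs in a real setup.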
To sum that up, the key idea is that you don't need to explore the entire billion-dimensional space. You only need enough random directions to estimate the local uphill direction. That makes evolution strategies a lot more feasible: they are memory-efficient, can be parallelized across GPUs, and only require inference, since they do not need backpropagation. Crazy, right? But even though this makes evolution strategies statistically feasible with a population of 30, there is still a practical problem.
The method so far still requires you to run 30 full forward passes of a billion-parameter model for every update. And not just once: you need to do this over and over again for however many iterations you set.
At LLM scale, doing this many forward passes is extremely expensive. Compared to standard gradient training, which does one forward and one backward pass per update, this can be slower or more costly depending on the setup.
So, this is where the next paper comes in: EGGROLL, from "Evolution Strategies at the Hyperscale", published in November 2025. EGGROLL addresses the systems bottleneck of evolution strategies. The core idea is simple.
Instead of perturbing the entire weight matrix in a fully random way, they structure the perturbations as LoRA-style updates. By making the perturbations low-rank, you can batch them like LoRA adapters.
You basically reuse most of the original computation and only swap the LoRA to evaluate each perturbation. That means instead of paying the full cost of 30 completely separate forward passes, you can compute just one forward pass and swap in different LoRAs.
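The batching trick can be sketched for a single linear layer. This is a simplified illustration, not the paper's implementation: the function names, the factor shapes `A` and `B`, and the toy sizes are assumptions. It relies on the identity x(W + σABᵀ) = xW + σ(xA)Bᵀ, so the expensive dense product with `W` is shared across the population and only the thin rank-r factors differ per member.

```python
import numpy as np

def batched_lowrank_forward(X, W, A, B, sigma):
    """Apply member i's weights W + sigma * A[i] @ B[i].T to row X[i].

    X: (n, d_in)      one activation row per population member
    W: (d_in, d_out)  shared base weights
    A: (n, d_in, r)   per-member low-rank factor
    B: (n, d_out, r)  per-member low-rank factor
    """
    base = X @ W                             # one shared dense matmul
    xa = np.einsum("ni,nir->nr", X, A)       # project each row onto its rank-r factor
    delta = np.einsum("nr,nor->no", xa, B)   # expand back to the output size
    return base + sigma * delta

def naive_forward(X, W, A, B, sigma):
    """Reference: build every perturbed weight matrix explicitly."""
    return np.stack([x @ (W + sigma * a @ b.T) for x, a, b in zip(X, A, B)])

rng = np.random.default_rng(0)
n, d_in, d_out, r = 30, 64, 48, 2
X = rng.standard_normal((n, d_in))
W = rng.standard_normal((d_in, d_out))
A = rng.standard_normal((n, d_in, r))
B = rng.standard_normal((n, d_out, r))

fast = batched_lowrank_forward(X, W, A, B, sigma=0.01)
slow = naive_forward(X, W, A, B, sigma=0.01)
print(np.allclose(fast, slow))
```

The naive version materializes 30 full weight matrices, while the batched version never forms one, which is where the memory and throughput savings would come from.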
So, EGGROLL makes evolution strategies a lot more hardware-friendly. Another important thing to note is that even though each perturbation is low-rank, when you average many of them together, the final update is not actually restricted to low rank.
So, you still get a rich, high-dimensional update, but you compute it in a much cheaper way. And the performance is broadly similar to standard evolution strategies, but the compute cost is much lower.
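The earlier point that an average of many low-rank perturbations is not itself low-rank can be checked numerically. The sketch below is a toy check with assumed sizes (a 16 by 16 matrix, rank-1 perturbations, a population of 30), and the `weights` array is a random stand-in for standardized rewards rather than anything measured.

```python
import numpy as np

rng = np.random.default_rng(0)
d, rank, population = 16, 1, 30

# Each perturbation is a rank-`rank` matrix built from two thin factors.
A = rng.standard_normal((population, d, rank))
B = rng.standard_normal((population, d, rank))
perturbations = A @ np.transpose(B, (0, 2, 1))   # shape (population, d, d)

# Stand-in for standardized rewards; real values would come from evaluation.
weights = rng.standard_normal(population)
update = np.tensordot(weights, perturbations, axes=1) / population

single_rank = np.linalg.matrix_rank(perturbations[0])
update_rank = np.linalg.matrix_rank(update)
print(single_rank, update_rank)
```

A sum of `population` matrices of rank `rank` can reach rank up to `min(population * rank, d)`, so the combined update can be far richer than any single perturbation.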
So, EGGROLL only needs to run models in inference mode while keeping performance roughly on par with the best evolution strategies baselines. When you compare them on raw training speed, EGGROLL is at around 91, PPO is at 34, and OpenES is at 0.41. It's not that PPO is slow in general; it's that evolution strategy training can be extremely fast once you structure the perturbations to match GPU matmul hardware.
In some LLM settings, it also beats popular reinforcement learning fine-tuning methods. For instance, in LLM reasoning fine-tuning comparisons against GRPO, they fine-tuned RWKV-7 models on Countdown and GSM8K and report that, under the same hardware and wall-clock time, EGGROLL reaches 35% validation accuracy versus 23% for GRPO on the Countdown benchmark. For GSM8K with RWKV-7 7B on eight GPUs, they show that EGGROLL can run 8,192 parallel generations while GRPO runs only 256.
So, EGGROLL can run far more parallel generations than GRPO on the same hardware, which means EGGROLL is more efficient in wall-clock throughput and memory. Another example is RWKV-7 14B trained for 12 hours with EGGROLL on 32 GPUs.
They were able to get improvements such as +17% on AIME24 and +26 on AIME25. They also reported that EGGROLL outperforms GRPO on GSM8K fine-tuning.
While I know I just bombarded you with a lot of great performance reports, it does not necessarily mean EGGROLL is just better than GRPO.
It's just that EGGROLL's advantage so far comes from being much faster per unit of wall-clock time and lighter on memory, so it can afford more exploration than GRPO.
I personally think more experiments are needed to draw a better comparison to GRPO or other existing RL methods, but I do think evolution strategies are really promising from reading these few papers.
So, I'm really excited to see how it develops over time. What do you think? Let me know down in the comments. So, yeah, that's it for this video.
And if you like how I explained the AI concepts today, you should definitely check out my latest project, intuitiveai.academy, which contains an intuitive explanation of all modern LLMs from the ground up, ranging from LLM architectures and LoRA to how MoEs work.
A total of 24 chapters are currently available, and it will be updated monthly. This is the start of a series where I'll break down AI topics intuitively, because I genuinely think anyone could understand them no matter how difficult they may seem.
So, for those who want to get into AI or LLMs, this should be the perfect place for you to dive into the technical parts without being afraid of crazy-looking maths.
And right now, I am also putting out a new launch discount for 2026, so you can use the code "early" for 40% off a yearly plan. And thank you guys for watching.
A big shout-out to Spam Maj, Chris Ladue, Deegan, Robert Zaviassa, Marcelo Ferreria, Proof and Inu, DX Research Group, Alex, Midwest Maker, and many others that support me through Patreon or YouTube. Follow me on Twitter if you haven't, and I'll see you in the next one.