How to boost LLM performance during pre-training: A preview

AI_Distilled #97: What's New in AI This Week

Build Your AI Chatbot with the Free LLM Zoomcamp
Join LLM Zoomcamp, a free online course starting on June 2, and build an end-to-end AI chatbot tailored to your use case. In 10 weeks, you'll learn key skills like working with LLMs and RAG, vector search for indexing and retrieval, how to evaluate and monitor performance, and best practices for building robust, real-world applications.
REGISTER NOW FOR FREE

It's time for the final issue of May 2025. In this edition, we bring you the top five news highlights of the week, upcoming events shaping the AI and LLM landscape, and a sneak peek into techniques for optimizing LLM performance.

LLM Expert Insights, Packt

In today's issue:
🧠 Expert Deep Dive: This week, we explore pre-training optimization techniques, from quantization to flash attention, for building faster, smarter LLMs.
📅 Webinar Watchlist: June's top AI/LLM webinars cover automation, cybersecurity, healthcare, legal AI, and multimodal fine-tuning.
🔌 Build AI Agents This Weekend: Join Packt's Accelerated Agentic AI Bootcamp: hands-on, fast-paced, and 35% off.
📚 Optimize Your LLM Stack: Learn more from Generative AI with Python and PyTorch, a guide to efficient training and deployment.
🚀 DeepSeek V3 Debuts: China's latest open-source model steps up with better reasoning and dev capabilities.
📰 Publishers vs. AI Search: Google CEO Sundar Pichai defends AI-powered results amid growing backlash from content creators.
📱 Apple Rebrands for 2026: WWDC will unveil iOS 26 and align all platforms under a unified OS naming strategy.
🎨 Sam Altman x Jony Ive: OpenAI teams up with the design legend to build magical, AI-first consumer products.
🧠 Anthropic Traces Thoughts: Claude's internal reasoning gets visualized through groundbreaking interpretability research.

📈UPCOMING EVENTS

JUNE'S MUST-ATTEND AI/LLM WEBINARS

In June 2025, a number of exciting AI webinars are already generating buzz. Here are the top five not-to-miss events in the coming month (for more information and registration details, please visit the links):

1. AI-Enhanced Motion Control: Innovations Driving Automation Forward
Date: June 5, 2025
Time: 12:00 PM – 1:00 PM ET
Location: Online
Cost: Free
Hosted by the Association for Advancing Automation, this webinar explores how AI is revolutionizing motion control systems, enhancing precision, efficiency, and adaptability across various industries.

2. AI Security Webinar – Practical Measures to Mitigate AI and Cybersecurity Risks
Date: June 11, 2025
Time: 11:00 AM – 12:30 PM BST
Location: Online
Cost: Free
Presented by The Alan Turing Institute, this interactive webinar brings together industry experts and SMEs to share practical, cost-efficient, and high-impact security measures that deliver maximum AI and cybersecurity protection for businesses.

3. Clinical Large Language Models in Healthcare – Applications, Challenges, and Opportunities
Date: June 12, 2025
Time: 10:00 AM – 11:00 AM CEST
Location: Online
Cost: Free
Organized by the Helmholtz Information & Data Science Academy in collaboration with NORA, this webinar features Anne Torill Nordsletta discussing the role of large language models in healthcare, exploring applications, challenges, and future opportunities in the clinical setting.
4. Inside the TBI Playbook: How I Use AI to Win the Hardest Cases
Date: June 17, 2025
Time: 1:00 PM – 2:30 PM EST
Location: Online
Cost: Free
Hosted by Anytime AI™, this CLE-accredited webinar features attorney Taylor Ernst sharing insights on leveraging AI in traumatic brain injury litigation. Attendees will learn about practical applications of AI tools in complex legal cases.

5. Multi-Modal LLM Fine-Tuning of Unstructured Data with Dataloop & SingleStore
Date: June 18, 2025
Time: 10:00 AM – 11:00 AM PST
Location: Online
Cost: Free
Presented by SingleStore, this webinar explores techniques for fine-tuning multi-modal large language models on unstructured data, covering integration strategies with the Dataloop and SingleStore platforms.

Machine Learning Summit 2025
JULY 16–18 | LIVE (VIRTUAL)
20+ ML Experts | 20+ Sessions | 3 Days of Practical Machine Learning
BOOK NOW AND SAVE 35%
Use code EMAIL35 at checkout when purchasing the 3-day ticket. Limited to the first 50 customers.

EXPERT INSIGHTS

PRE-TRAINING OPTIMIZATION TECHNIQUES FOR LLMs

The scale of data and computation required for large language models (LLMs), along with the significant capital investment needed to train and deploy them, necessitates the exploration of optimization techniques throughout the LLM lifecycle. In this issue, we focus on potential improvements during the pre-training phase, as this is the most resource-intensive step, involving vast amounts of data and a strong sensitivity to architectural design. Here are some techniques you can employ to improve LLM performance and efficiency:

1. Quantization: Quantization reduces the number of bits needed to store model weights by binning floating-point values into lower-precision buckets. This reduces memory usage with minimal impact on performance; small precision losses are acceptable as long as the model's performance stays within the required levels. For instance, a weight value like 3.1457898 could be quantized to 3.1458 using a scheme that retains four decimal places. Such a scheme introduces slight errors when computing the loss or updating weights (for example, a somewhat higher margin of error during the backward pass of a training step). Take 4-bit quantization: the 4-bit float representation employs an intelligent binning scheme based on the distribution of model weights. Most weights cluster near zero, where small differences require higher precision, while fewer weights take on larger values. To accommodate this, asymmetric binning is used: many small bins are allocated for values near the mean to maintain precision, while a few larger bins handle outliers further from the mean.

2. Mixed precision: This is another technique to reduce memory and computational demands without sacrificing significant accuracy. It combines different numerical formats, such as float16 and int8 alongside full-precision float32, to optimize efficiency and performance during training or inference. (A short PyTorch sketch illustrating mixed precision and a simple quantizer appears at the end of this section.)

3. Data efficiency: Large datasets are costly to process, and redundant or noisy data can negatively impact model performance. Data efficiency techniques therefore aim to achieve high model accuracy and generalization with a reduced or optimized dataset. This includes filtering data for quality, reducing redundancy, and applying sampling techniques that emphasize high-value samples.
4. Sparse attention: Instead of computing attention weights for every pair of tokens in the input sequence, sparse attention focuses only on a subset of tokens, exploiting patterns in the data or task-specific properties. To put things into perspective, think about decoder-only architectures like GPT trained with an auto-regressive language-modeling objective. Such an objective constrains the attention layer to be causal, so only the lower-triangular part of the attention matrix is useful, yet the computation is still performed for the whole matrix. Different architectures leverage specific patterns, such as local or strided attention, to reduce computation time. (A sketch of a causal, local-window attention mask appears at the end of this section.)

5. Flash attention: Flash attention takes the route of hardware-aware improvements to compute attention scores efficiently. It relies on two techniques: kernel fusion and tiling. Kernel fusion reduces the number of I/O operations by combining the individual steps (elementwise operations, matrix multiplication, softmax, and so on) so that intermediate results are not repeatedly written to and read from memory; this is particularly effective during inference. Tiling, on the other hand, breaks the overall attention calculation into smaller, manageable groups of operations that fit into fast, low-latency GPU memory. For instance, instead of computing softmax across the entire attention matrix at once, FlashAttention computes it over smaller chunks in a numerically stable, tiled fashion, making use of faster memory without the need to store a large matrix.

6. Mixture of Experts (MoE) architecture: MoE is an architecture designed to activate only a subset of its components (or experts) rather than the whole network, thereby achieving higher scalability and efficiency. The experts are independent modules or blocks of the network, each of which can be trained to specialize in a specific task, while the router is a module that learns which experts to activate for a given input based on different criteria. The router itself can be a neural network. (A sketch of top-k expert routing appears at the end of this section.)

7. Efficient architectures: A number of patterns and techniques have been developed and leveraged by different architectural improvements over the years. Some of the popular architectures are Linformer, Reformer, and Big Bird.

Apart from pre-training optimizations, there are other techniques as well, such as fine-tuning and inference-time improvements. More recently, the availability and popularity of small language models, along with specialized hardware and frameworks, have also contributed to significant improvements in the overall efficiency of LLMs in resource-constrained environments.
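To make techniques 1 and 2 above concrete, here is a minimal PyTorch sketch, not taken from the article, of a mixed-precision training step followed by a simple 4-bit weight quantizer. The toy model, data, and learning rate are illustrative assumptions, and the quantizer uses plain symmetric per-tensor binning rather than the distribution-aware asymmetric scheme described under technique 1.

```python
# Minimal sketch: mixed-precision training step + toy 4-bit quantization (illustrative only).
import torch
import torch.nn as nn

# --- Mixed precision: run the forward/backward pass in float16 where it is safe ---
device = "cuda" if torch.cuda.is_available() else "cpu"
model = nn.Sequential(nn.Linear(512, 512), nn.ReLU(), nn.Linear(512, 10)).to(device)  # toy model
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
scaler = torch.cuda.amp.GradScaler(enabled=(device == "cuda"))

x = torch.randn(32, 512, device=device)          # toy batch
y = torch.randint(0, 10, (32,), device=device)   # toy labels

with torch.autocast(device_type=device, dtype=torch.float16, enabled=(device == "cuda")):
    loss = nn.functional.cross_entropy(model(x), y)   # forward pass in reduced precision

scaler.scale(loss).backward()   # scale the loss so small float16 gradients do not underflow
scaler.step(optimizer)
scaler.update()
optimizer.zero_grad(set_to_none=True)

# --- Quantization: bin float32 weights into 4-bit integers (stored in an int8 container) ---
def quantize_4bit(w: torch.Tensor):
    scale = w.abs().max() / 7                               # signed 4-bit range is [-8, 7]
    q = torch.clamp(torch.round(w / scale), -8, 7).to(torch.int8)
    return q, scale

def dequantize(q: torch.Tensor, scale: torch.Tensor):
    return q.float() * scale

w = model[0].weight.detach().cpu()
q, scale = quantize_4bit(w)
print("max abs quantization error:", (dequantize(q, scale) - w).abs().max().item())
```

In practice you would reach for established tooling (PyTorch's AMP utilities used above, or a dedicated quantization library) rather than a hand-rolled quantizer; the sketch only shows where precision is traded away.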
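Technique 4 in code: a minimal sketch of a causal, sliding-window attention mask. The window size and tensor shapes are illustrative assumptions, and note that this naive version still computes the full score matrix; real sparse-attention implementations skip the masked positions entirely.

```python
# Minimal sketch: causal + local-window ("sparse") attention mask (illustrative only).
import math
import torch

def sliding_window_causal_mask(seq_len: int, window: int) -> torch.Tensor:
    """True where a query position is allowed to attend to a key position."""
    i = torch.arange(seq_len).unsqueeze(1)   # query positions
    j = torch.arange(seq_len).unsqueeze(0)   # key positions
    causal = j <= i                          # no attending to future tokens (lower triangle)
    local = (i - j) < window                 # only the most recent `window` tokens
    return causal & local

def masked_attention(q, k, v, mask):
    scores = q @ k.transpose(-2, -1) / math.sqrt(q.size(-1))
    scores = scores.masked_fill(~mask, float("-inf"))   # disallowed pairs get zero weight
    return torch.softmax(scores, dim=-1) @ v

seq_len, d = 16, 32
q = k = v = torch.randn(1, seq_len, d)                  # toy single-head projections
out = masked_attention(q, k, v, sliding_window_causal_mask(seq_len, window=4))
print(out.shape)  # torch.Size([1, 16, 32])
```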
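Technique 6 in code: a minimal sketch of a learned router activating only the top-2 experts per token. The layer sizes, expert count, and top-k value are illustrative assumptions; production MoE layers add load-balancing losses and batched expert dispatch.

```python
# Minimal sketch: top-2 expert routing in a Mixture-of-Experts layer (illustrative only).
import torch
import torch.nn as nn

class TinyMoE(nn.Module):
    def __init__(self, d_model=64, num_experts=4, top_k=2):
        super().__init__()
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(), nn.Linear(4 * d_model, d_model))
            for _ in range(num_experts)
        )
        self.router = nn.Linear(d_model, num_experts)  # the router is itself a small network
        self.top_k = top_k

    def forward(self, x):                                   # x: (tokens, d_model)
        gate = torch.softmax(self.router(x), dim=-1)        # routing probabilities per token
        weights, chosen = gate.topk(self.top_k, dim=-1)     # keep only the top-k experts per token
        weights = weights / weights.sum(dim=-1, keepdim=True)
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                hit = chosen[:, slot] == e                  # tokens routed to expert e in this slot
                if hit.any():                               # only the selected experts do any work
                    out[hit] += weights[hit, slot].unsqueeze(-1) * expert(x[hit])
        return out

tokens = torch.randn(8, 64)       # 8 toy token embeddings
print(TinyMoE()(tokens).shape)    # torch.Size([8, 64])
```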
Liked the Insights? Want to dig in deeper?

If you wish to learn more about these techniques or dive deeper into the foundational aspects of the LLM ecosystem, check out the book Generative AI with Python and PyTorch, Second Edition, by Joseph Babcock and Raghav Bali.
BUY NOW

📈LATEST DEVELOPMENT

Let's kick things off with the top stories of the week.

China is aiming for the top spot in the AI race with DeepSeek V3's latest release
DeepSeek just released DeepSeek-V3-0324, claiming a major boost in reasoning, front-end development capabilities, and smarter tool use. The release positions DeepSeek as a serious contender to models like Code Llama and Codex. You can try out the open-source weights from this HuggingFace card.

Publishers claim AI search is an internet takeover; Pichai defends it as innovation
In a podcast with Nilay Patel (Editor-in-Chief of The Verge), Google CEO Sundar Pichai shared candid thoughts on AI's impact on the internet. He defended AI-generated search results amid backlash, insisting they won't kill the open web. As Google walks a tightrope between innovation and publisher outrage, Pichai expressed confidence that AI will ultimately "enhance," not erase, human content. He dodged revenue concerns but acknowledged the risks of unchecked AI growth. Catch the full conversation here.

Apple's branding power move with iOS 26
A Bloomberg report says that Apple is set to revamp its OS branding at WWDC 2025. The rebranding will sync all platforms with the upcoming 2026 launch year, setting the stage for a unified, modernized software identity with iOS 26, macOS 26, and watchOS 26.

Sam Altman and Jony Ive team up for AI-first products
OpenAI is collaborating with design icon Jony Ive and his firm LoveFrom to craft AI-powered products. The io team, led by Jony Ive, Scott Cannon, Evans Hankey, and Tang Tan, will collaborate closely with OpenAI's research and engineering teams, with LoveFrom leading design and creative responsibilities. Their goal: to recapture the magic, creativity, and wonder of early Apple-era technology. Hear more about their vision in this video.

Anthropic inching towards interpretable AI?
Anthropic just cracked open the black box of AI thinking with its latest research, Tracing Thoughts. Using a method called dictionary learning, researchers mapped how language models like Claude internally form and organize thoughts. They uncovered thousands of hidden features that resemble abstract concepts and reasoning steps. This breakthrough gives us a glimpse not just into what AI predicts, but into how it thinks. Dive into this investigative research here.

📢 If your company is interested in reaching an audience of developers, technical professionals, and decision makers, you may want to advertise with us.

If you have any comments or feedback, just reply to this email. Thanks for reading and have a great day!

That's a wrap for this week's edition of AI_Distilled 🧠⚙️

We would love to know what you thought; your feedback helps us keep leveling up.
👉 Drop your rating here

Thanks for reading,
The AI_Distilled Team
(Curated by humans. Powered by curiosity.)
30 May 2025