Security Considerations for Artificial Intelligence Agents
Li, Zhang, Polley & Ma (2026). Perplexity's formal response to NIST outlines the fundamental ways that agent architectures break classical security assumptions: code-data separation collapses, authority boundaries blur, execution becomes unpredictable. This is required reading for anyone shipping agentic systems. It maps every major attack surface from prompt injection to confused-deputy attacks and proposes a layered defence stack.

Examining Reasoning LLMs-as-Judges in Non-Verifiable LLM Post-Training
Liu, Yu, Su, Wang et al. (2026). A rigorous study revealing that reasoning judges do outperform non-reasoning judges in RL-based alignment, but at a cost. Policies trained with reasoning judges learn to generate adversarial outputs that score highly on leaderboards while deceiving other LLMs. Essential context for anyone using LLM-as-judge evaluation pipelines.

Neural Thickets: Diverse Task Experts Are Dense Around Pretrained Weights
Gan & Isola (2026). A beautiful reframing of post-training: instead of iteratively fine-tuning from a single point, view pretraining as having created a distribution where task-expert solutions are already densely packed. The authors show that in large well-pretrained models, randomly sampling and ensembling perturbations is competitive with PPO and GRPO. Challenges several deeply held assumptions about optimisation.
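
To make the sample-and-ensemble idea concrete, here is a minimal sketch on a toy linear model. It is not the authors' procedure: the Gaussian noise, the top-k selection rule, the prediction averaging, and every name and hyperparameter (`top_k_ensemble`, `sigma`, `n`, `k`) are assumptions chosen only for illustration.

```python
import random

def predict(weights, x):
    """Linear model: dot product of weights and features."""
    return sum(w * xi for w, xi in zip(weights, x))

def mse(weights, data):
    """Mean squared error of a weight vector on (features, target) pairs."""
    return sum((predict(weights, x) - y) ** 2 for x, y in data) / len(data)

def sample_perturbations(base, n, sigma, rng):
    """Draw n weight vectors by adding Gaussian noise to the base weights."""
    return [[w + rng.gauss(0.0, sigma) for w in base] for _ in range(n)]

def top_k_ensemble(base, data, n=200, k=10, sigma=0.3, seed=0):
    """Sample perturbations, keep the k with lowest task loss, average their predictions."""
    rng = random.Random(seed)
    candidates = sample_perturbations(base, n, sigma, rng)
    members = sorted(candidates, key=lambda w: mse(w, data))[:k]

    def ensemble_predict(x):
        return sum(predict(w, x) for w in members) / len(members)

    return members, ensemble_predict

# Toy setup (hypothetical numbers): "pretrained" weights sit near the task optimum.
base_weights = [1.0, -0.5]
task_weights = [1.2, -0.3]
grid = (-1.0, 0.0, 1.0)
data = [([a, b], predict(task_weights, [a, b])) for a in grid for b in grid]

members, ensemble_predict = top_k_ensemble(base_weights, data)
ensemble_mse = sum((ensemble_predict(x) - y) ** 2 for x, y in data) / len(data)
```

No gradient step is taken anywhere in the sketch; the only task signal is the loss used to rank the sampled candidates.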

Sparking Scientific Creativity via LLM-Driven Interdisciplinary Inspiration
Kargupta, Mehri, Hakkani-Tur & Han (2026). Idea-Catalyst is a framework that explicitly targets the brainstorming stage of research, retrieving analogous concepts from external disciplines to avoid premature anchoring. Empirically improves average novelty by 21% and insightfulness by 16%. A practical tool with real potential for AI-assisted research ideation.

Separable Neural Architectures as a Primitive for Unified Predictive and Generative Intelligence
Batley, Sarker, Mostakim, Klichine & Saha (2026). Proposes the Separable Neural Architecture (SNA), a single representational class that unifies additive, quadratic and tensor-decomposed models across language, physics simulation, and reinforcement learning. The authors argue that separability often emerges in coordinates rather than existing in the system, a structurally elegant unification across seemingly unrelated domains.

SciMDR: Benchmarking and Advancing Scientific Multimodal Document Reasoning
Chen, Zhao, Wang, Han, Patwardhan & Cohan (2026). Introduces a 300K QA-pair training dataset built from 20K scientific papers using a two-stage synthesize-and-reground pipeline. Models fine-tuned on SciMDR show significant gains on complex document-level scientific reasoning benchmarks. A major infrastructure contribution for multimodal science AI.

Incremental Neural Network Verification via Learned Conflicts
Elsaleh, Davis, Wu & Katz (2026). A technically elegant paper that applies incremental SAT-style conflict reuse to neural network verification. Rather than solving each verification query from scratch, the verifier caches learned infeasible activation phase combinations and inherits them across related queries, yielding speedups of up to 1.9x. Directly applicable to safety-critical AI deployment.
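
The conflict-reuse idea can be sketched with a toy cache of infeasible activation phase combinations. This is an illustration only, not the paper's verifier: the brute-force enumeration, the `ConflictCache` class, the `toy_oracle` function, and the assumption that the second query is one for which earlier conflicts remain valid are all hypothetical.

```python
from itertools import product

class ConflictCache:
    """Stores partial phase assignments already proven infeasible."""

    def __init__(self):
        self._conflicts = set()

    def add(self, partial):
        self._conflicts.add(frozenset(partial.items()))

    def prunes(self, assignment):
        """True if some cached conflict is contained in this assignment."""
        items = set(assignment.items())
        return any(conflict <= items for conflict in self._conflicts)

def enumerate_feasible(neurons, oracle, cache):
    """Try every phase assignment, skipping those covered by a cached conflict.

    `oracle` returns None if the assignment is feasible, otherwise the partial
    assignment that explains the infeasibility. Returns (feasible, oracle_calls).
    """
    feasible, calls = [], 0
    for phases in product(("active", "inactive"), repeat=len(neurons)):
        assignment = dict(zip(neurons, phases))
        if cache.prunes(assignment):
            continue  # already known infeasible, no solver call needed
        calls += 1
        conflict = oracle(assignment)
        if conflict is None:
            feasible.append(assignment)
        else:
            cache.add(conflict)
    return feasible, calls

NEURONS = ["n1", "n2", "n3"]

def toy_oracle(assignment):
    # Hypothetical constraint: n1 active together with n2 inactive is infeasible.
    if assignment["n1"] == "active" and assignment["n2"] == "inactive":
        return {"n1": "active", "n2": "inactive"}
    return None

cache = ConflictCache()
feasible_first, calls_first = enumerate_feasible(NEURONS, toy_oracle, cache)
feasible_second, calls_second = enumerate_feasible(NEURONS, toy_oracle, cache)
```

The second pass inherits the cached conflict and needs fewer oracle calls than the first. In a real verifier, reuse is only sound when the later query preserves the conditions under which the conflict was derived.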

The Latent Color Subspace: Emergent Order in High-Dimensional Chaos
Pach, Bader, Bouniot, Belongie & Akata (2026). Discovers that the VAE latent space of FLUX.1 contains an interpretable structure reflecting Hue, Saturation and Lightness, and exploits this structure for training-free color control via closed-form latent manipulation. A rare combination of theoretical insight and immediately practical application in image generation.

Portfolio of Solving Strategies in CEGAR-based Object Packing and Scheduling for Sequential 3D Printing
Surynek (2026). A pragmatic demonstration of AI planning at industrial scale: parallelising the CEGAR-SEQ algorithm across a portfolio of placement strategies on modern multi-core CPUs. The portfolio consistently uses fewer printing plates than the single-strategy baseline, illustrating how algorithm selection and parallel search can substitute for hardware scaling.
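
The portfolio pattern itself (run several strategies in parallel, keep the result that uses the fewest plates) can be sketched as follows. This is not CEGAR-SEQ: a one-dimensional first-fit packer stands in for the real solver, and the strategy names, `solve_portfolio`, and the sample sizes are assumptions made for illustration. Threads are used to keep the sketch short; a CPU-bound solver would more plausibly use processes.

```python
from concurrent.futures import ThreadPoolExecutor

def first_fit(sizes, capacity):
    """Place each object on the first plate with room, opening a new plate if none fits."""
    plates = []
    for size in sizes:
        for plate in plates:
            if sum(plate) + size <= capacity:
                plate.append(size)
                break
        else:
            plates.append([size])
    return plates

# Hypothetical portfolio: each strategy is just a different placement order.
STRATEGIES = {
    "as_given": lambda sizes: list(sizes),
    "largest_first": lambda sizes: sorted(sizes, reverse=True),
    "smallest_first": lambda sizes: sorted(sizes),
}

def solve_portfolio(sizes, capacity):
    """Run every strategy concurrently and return the one using the fewest plates."""
    with ThreadPoolExecutor(max_workers=len(STRATEGIES)) as pool:
        futures = {
            name: pool.submit(first_fit, order(sizes), capacity)
            for name, order in STRATEGIES.items()
        }
        results = {name: future.result() for name, future in futures.items()}
    best = min(results, key=lambda name: len(results[name]))
    return best, results[best], results

sizes = [4, 8, 1, 4, 2, 1, 5, 5]
capacity = 10
best_name, best_plates, all_results = solve_portfolio(sizes, capacity)
```

By construction the portfolio result is never worse than any single strategy in it, at the price of running all of them.
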

WORKSWORLD: A Domain for Integrated Numeric Planning and Scheduling of Distributed Pipelined Workflows
Paul & Regli (2026). Introduces a new benchmark domain modelling the joint planning and scheduling of distributed data pipelines, a problem class that matters enormously for real infrastructure but has been under-represented in AI planning research. State-of-the-art numeric planners can solve chains of up to 14 components across 8 sites in under an hour.