TurkEmbed: Turkish Embedding Model on NLI & STS Tasks

arXiv — cs.CL · Wednesday, November 12, 2025 at 5:00:00 AM
TurkEmbed, a new Turkish embedding model, has been introduced to address the limitations of existing models that rely on machine-translated datasets, which can degrade accuracy and semantic understanding. By training on diverse datasets and applying techniques such as Matryoshka representation learning, TurkEmbed improves performance on Natural Language Inference (NLI) and Semantic Textual Similarity (STS) tasks. On the Turkish STS-b-TR benchmark, TurkEmbed surpasses the current state-of-the-art model, Emrecan, by 1-4%. The result strengthens the Turkish NLP ecosystem by providing embeddings with a more nuanced understanding of the language, supporting more robust and accurate downstream applications.
— via World Pulse Now AI Editorial System
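
For readers unfamiliar with Matryoshka representation learning, the sketch below shows how such an objective is commonly set up with the sentence-transformers library: a contrastive loss is wrapped in MatryoshkaLoss so the model is supervised at several nested embedding sizes, and a truncated prefix of the embedding can then be used for STS-style similarity. The base checkpoint, the dimension list, and the example sentences are illustrative assumptions, not details from the TurkEmbed paper.

```python
# A minimal sketch of Matryoshka representation learning with the
# sentence-transformers library. The base checkpoint, nesting dimensions,
# and example sentences are illustrative assumptions, not details taken
# from the TurkEmbed paper.
import numpy as np
from sentence_transformers import SentenceTransformer
from sentence_transformers.losses import MatryoshkaLoss, MultipleNegativesRankingLoss

# Hypothetical Turkish base encoder; sentence-transformers adds mean pooling.
model = SentenceTransformer("dbmdz/bert-base-turkish-cased")

# Wrap a standard contrastive loss (commonly used for NLI-style pairs) so the
# same objective is applied at several nested embedding sizes, making shorter
# prefixes of the embedding usable on their own.
inner_loss = MultipleNegativesRankingLoss(model)
loss = MatryoshkaLoss(model, inner_loss, matryoshka_dims=[768, 512, 256, 128, 64])

# At inference, an STS-style similarity can be computed from a truncated prefix,
# trading a little accuracy for lower storage and faster search.
sentences = ["Kedi halının üzerinde uyuyor.", "Bir kedi halıda uyuyor."]
emb = model.encode(sentences, normalize_embeddings=True)
prefix = emb[:, :256]
prefix /= np.linalg.norm(prefix, axis=1, keepdims=True)  # renormalize the prefix
print(float(prefix[0] @ prefix[1]))  # cosine similarity at 256 dimensions
```

Only the loss composition and truncated-prefix inference are shown here; actual training would attach the wrapped loss to an NLI-style paired dataset through the library's trainer.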


Recommended Readings
Evaluating Modern Large Language Models on Low-Resource and Morphologically Rich Languages: A Cross-Lingual Benchmark Across Cantonese, Japanese, and Turkish
Neutral · Artificial Intelligence
A recent study evaluates seven advanced large language models (LLMs) on low-resource, morphologically rich languages: Cantonese, Japanese, and Turkish. The benchmark covers open-domain question answering, document summarization, translation, and culturally grounded dialogue. While LLMs achieve impressive results in high-resource languages, the study notes that their effectiveness in these less-studied languages remains underexplored.