Stress Testing Factual Consistency Metrics for Long-Document Summarization

arXiv — cs.LG · Wednesday, November 12, 2025 at 5:00:00 AM
Evaluating the factual accuracy of summaries is a critical challenge in natural language processing, especially for lengthy source texts where conventional metrics fall short. This study systematically stress-tested six widely used factual consistency metrics and found that they assign inconsistent scores to semantically equivalent summaries and struggle with information-dense claims. Such inconsistency undermines the reliability of summarization tools, which are increasingly used to manage and interpret complex information across domains as diverse as science fiction, legal documents, and scientific literature. The findings point to the need for metrics that handle long-range dependencies while maintaining factual alignment, paving the way for more trustworthy summarization technology.
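
To make the paraphrase-invariance failure concrete, the sketch below scores a summary and a semantically equivalent rewrite against the same document. The `consistency_score` here is a deliberately naive token-overlap stand-in, not one of the six metrics from the paper, and the texts are invented examples; a robust metric would report a gap near zero for such a pair.

```python
# Minimal sketch of a paraphrase-invariance stress test for a factual
# consistency metric, in the spirit of the study described above.
# `consistency_score` is a toy token-overlap stand-in for real metrics
# (which typically rely on entailment or QA models).
import string


def _tokens(text: str) -> list[str]:
    """Lowercase, split on whitespace, and strip surrounding punctuation."""
    return [t.strip(string.punctuation) for t in text.lower().split()]


def consistency_score(document: str, summary: str) -> float:
    """Toy metric: fraction of summary tokens that also occur in the document."""
    doc_tokens = set(_tokens(document))
    summary_tokens = _tokens(summary)
    if not summary_tokens:
        return 0.0
    return sum(t in doc_tokens for t in summary_tokens) / len(summary_tokens)


def paraphrase_gap(document: str, summary: str, paraphrase: str) -> float:
    """Score gap between a summary and a semantically equivalent rewrite.

    A robust metric should return a gap near zero; the study found that
    widely used metrics often do not.
    """
    return abs(consistency_score(document, summary)
               - consistency_score(document, paraphrase))


if __name__ == "__main__":
    doc = "The trial began in March 2021 and concluded fourteen months later."
    summary = "The trial began in March 2021 and lasted fourteen months."
    rewrite = "Starting in March 2021, the trial ran for fourteen months."
    # The toy metric rewards verbatim overlap, so the rewrite scores lower
    # than the summary even though both say the same thing.
    print(f"gap = {paraphrase_gap(doc, summary, rewrite):.3f}")
```

Swapping a real metric into `consistency_score` turns this into the same kind of stress test the paper runs: hold the document fixed, vary only the surface form of the summary, and measure how much the score moves.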
— via World Pulse Now AI Editorial System

