CTO at NCSC Summary: week ending December 17th
Experimenting and evidencing our way to cyber resilience...
Welcome to the weekly highlights and analysis of the blueteamsec subreddit (and my wider reading). Not everything makes it in, but the best bits do.
Operationally this week nothing overly of note beyond the usual tempo.
In the high-level this week:
Further reporting related to the FSB outing last week
Russia's FSB malign activity: factsheet - The UK government has publicly attributed malign cyber activity to parts of three Russian Intelligence services: the FSB, SVR and GRU, with each having their own remits.
Australian statement on Russian cyber targeting of democratic processes - Attempts to use cyber to interfere in democratic processes are unacceptable and must stop.
Two Russian Nationals Working with Russia’s Federal Security Service Charged with Global Computer Intrusion Campaign - A federal grand jury in San Francisco returned an indictment on Tuesday charging two individuals with a campaign to hack into computer networks in the United States, the United Kingdom, other North Atlantic Treaty Organization member countries and Ukraine, all on behalf of the Russian government.
Researching the hard problems in hardware security - Introducing the next chapter of the NCSC research problem book, which aims to inspire research on the biggest impact topics in hardware cyber security.
US National Defense Authorization Act For Fiscal Year 2024 - lots and lots of cyber including Pilot program to improve cyber cooperation with covered foreign military partners in Southeast Asia.
FBI Policy Notice - entitled "Cyber Victim Requests to Delay Securities and Exchange Commission Public Disclosure Policy Notice 1297N." - How law enforcement and regulations align.
Don’t hack it alone: Calls for Australians to report ransomware attacks - The AFP is renewing calls for Australian businesses to move quickly to report ransomware attacks to law enforcement, with recent research confirming that victims who reported incidents to authorities experienced significant time and cost savings.
FCC Reminds Carriers of SIM Fraud Prevention Obligations - The new warning is consistent with the findings released by the Department of Homeland Security (DHS)’s Cyber Safety Review Board (CSRB) in August, that outlined how threat actors from a global extortion-focused cyber hacker group engaged in fraudulent SIM swaps to carry out data breaches in furtherance of ransom and extortion schemes.
Minister lashes DP World hack failure - Home Affairs Minister Clare O’Neil has lashed stevedore DP World for failing to patch its tech systems despite public warnings, leading to the hack that forced the company to shut down 40 per cent of the country’s port capacity.
HIPAA fine of $480,000 - Lafourche Medical Group, LLC Resolution Agreement and Corrective Action Plan - LMG never conducted a Security Rule risk analysis and LMG never implemented procedures to regularly review records of information system activity prior to the 2021 security [event]
Trafficking for cyber fraud an increasingly globalised crime, Interpol says - Interpol said its first operation targeting human-trafficking fueled cyber fraud showed the criminal industry was going global, spreading beyond its origins in Southeast Asia, with scam centres emerging as far away as Latin America.
CISA and ENISA enhance their Cooperation - The European Union Agency for Cybersecurity (ENISA) has signed a Working Arrangement with the Cybersecurity and Infrastructure Security Agency (CISA) of the US, in the areas of capacity-building, best practices exchange and boosting situational awareness.
SSSCIP and the German partners strengthen partnerships on cybersecurity - It is aimed at supporting Ukraine in fostering professionals in line with European cybersecurity standards and requirements to enhance global cybersecurity.
SSSCIP Members Attend a Training Event Arranged by the International Telecommunication Union - As a part of the combined team with the Spanish National Cybersecurity Institute (INCIBE), they won the first place twice in a series of practical exercises, and took the second and the seventh place in another two.
Reporting on/from China
China’s cyber army is invading critical U.S. services - The Chinese military is ramping up its ability to disrupt key American infrastructure, including power and water utilities as well as communications and transportation systems, according to U.S. officials and industry security officials.
China’s Hackers Are Expanding Their Strategic Objectives - China-based hackers are accessing U.S. infrastructure and developing methods to disrupt it in the event of conflict.
US anti-virus software "Trend Micro" R&D center withdraws from China - Cao Yongle said that China's political environment is no longer suitable for foreign Internet companies to survive, with the "Counterespionage Law", "National Security Law" and related laws becoming more and more stringent
UAE’s top AI group vows to phase out Chinese hardware to appease US - G42 of the United Arab Emirates is making the move to ensure its access to US-made chips by allaying concerns among its American partners
Artificial intelligence
Artificial Intelligence Act: deal on comprehensive rules for trustworthy AI - MEPs reached a political deal with the Council on a bill to ensure AI in Europe is safe, respects fundamental rights and democracy, while businesses can thrive and expand.
Congress and E.U. diverge on AI policy, as Brussels races to reach deal - The split-screen events on either side of the Atlantic underscore the challenges of regulating artificial intelligence, a rising priority for governments around the world
We asked a government advisor about 2 key problems (and solutions) in AI regulation - The answer, according to Jennings, is not to rely on any one country’s legislative body. To do so would be pointless anyway. AI software and technology are borderless.
Human Rights Impact Assessment and AI - this chapter suggests a model for human rights impact assessment (HRIA) as part of the broader HRESIA model.
Exploring Synthetic Data for Artificial Intelligence and Autonomous Systems: A Primer - an overview of the main opportunities and limitations of synthetic data in the training of AI systems. While synthetic data can be a proxy for real-world data and help shorten training cycles, among other benefits, there are also significant risks and challenges associated with its use.
Cyber proliferation
Reuters Takes Down Blockbuster Hacker-for-Hire Investigation After Indian Court Order - Lawfare in action..
Reflections this week come from discussing with various parties how we might approach evidencing and quantifying the efficacy of cyber security solutions, both in isolation and as part of systems. The spectrum of views, and anxiety in places, is understandable and profound. There are incentives both for and against such transparency, which is going to be one of the bigger hurdles to overcome, let alone how the experiments are designed.
All attribution is by others and not the UK Government, please see the legal text at the end.
Have a lovely Thursday
Ollie
Cyber threat intelligence
Who is doing what to whom and how.
Reporting on Russia
Fighting Ursa Aka APT28: Illuminating a Covert Campaign
Reporting on wider alleged exploitation of CVE-2023-23397 by Russia. Shows that when an actor has a capability they may use it far and wide.
[We] discovered a third, recently active campaign in which Fighting Ursa also used this vulnerability. The group conducted this most recent campaign between September-October 2023, targeting at least nine organizations in seven nations.
Of the 14 nations targeted throughout all three campaigns, all are organizations within NATO member countries, except for entities in Ukraine, Jordan and the United Arab Emirates. These organizations included critical infrastructure and entities that provide an information advantage in diplomatic, economic and military affairs.
Target organizations included those related to:
Energy production and distribution
Pipeline operations
Materiel, personnel and air transportation
Ministries of Defense
Ministries of Foreign Affairs
Ministries of Internal Affairs
Ministries of the Economy
https://unit42.paloaltonetworks.com/russian-apt-fighting-ursa-exploits-cve-2023-233397/
ITG05 operations leverage Israel-Hamas conflict lures to deliver Headlace malware
Golo Mühr, Claire Zaboeva and Joe Fasulo detail what looks like a rather targeted campaign by this threat actor. ITG05 is known to others as APT28; note the different CVE in use here.
As of December 2023, [we] uncovered multiple lure documents that predominately feature the ongoing Israel-Hamas war to facilitate the delivery of the ITG05 exclusive Headlace backdoor. The newly discovered campaign is directed against targets based in at least 13 nations worldwide and leverages authentic documents created by academic, finance and diplomatic centers. ITG05’s infrastructure ensures only targets from a single specific country can receive the malware, indicating the highly targeted nature of the campaign.
This is the first known use of the Israel-Hamas conflict by ITG05 to conduct campaigns delivering the exclusive Headlace backdoor.
The campaign leverages documents associated with the United Nations, the Bank of Israel, the United States Congressional Research Service, the European Parliament, a Ukrainian think tank and an Azerbaijan-Belarus Intergovernmental Commission.
[We] observed the deployment of Headlace and secondary payloads to be specifically targeted toward at least 13 nations.
Some of the uncovered lures are contained in a .RAR archive exploiting the CVE-2023-38831 vulnerability, others use DLL-hijacking to run Headlace.
Headlace is a multi-component malware including a dropper, a VBS launcher and a backdoor using MSEdge in headless mode to continuously download secondary payloads, likely to exfiltrate credentials and sensitive information.
Russian influence and cyber operations adapt for long haul and exploit war fatigue
Clint Watts provides an insight on how some aspects of our economy and society can be weaponised in information operations. Makes the time I got Spike from Buffy and Riker to wish happy birthday to my wife seem pedestrian by comparison.
Since July 2023, Russia-aligned influence actors have tricked celebrities into providing video messages that were then used in pro-Russian propaganda. These videos were then manipulated to falsely paint Ukrainian President Volodymyr Zelensky as a drug addict. This is one of the insights in the latest biannual report on Russian digital threats “Russian Threat Actors Dig In, Prepare to Seize on War Fatigue”
As described in more detail in the report, this campaign aligns with the Russian government’s broader strategic efforts during the period from March to October 2023, across cyber and influence operations (IO), to stall Ukrainian military advances and diminish support for Kyiv.
Ukraine's top mobile operator hit by biggest cyberattack of war so far
High-level press reporting here from Reuters. Gives validation on why the UK passed the Telecommunications Security Act to ensure separation of the management plane to privileged access workstations.
Kyivstar's IT systems 'partially destroyed'
CEO says attack connected to war with Russia
Cellular and internet connections down
Ukraine investigating possibility of Russian state involvement
Ukrainian intelligence attacks and paralyses Russia’s tax system
Quote from the Ukraine government:
During the special operation, military spies managed to break into one of the well-protected key central servers of the Federal Taxation Service (FTS of the Russian Federation), and then into more than 2,300 of its regional servers throughout Russia, as well as on the territory of temporarily occupied Crimea.
As a result of the cyberattack, all servers were infected with malware...
The Russians have been unsuccessfully trying to restore the work of the Russian tax authorities for the fourth day in a row. The experts say the paralysis in the work of the Federal Taxation Service of the Russian Federation will last at least a month. At the same time, resuscitation of the tax system of the aggressor state in full is impossible.
https://www.pravda.com.ua/eng/news/2023/12/12/7432737/
Reporting on China
Sandman APT | China-Based Adversaries Embrace Lua
Aleksandar Milenkoski details this campaign that they attribute to China. The thing of note is how the attribution was achieved beyond the technical.
The Sandman APT is likely associated with suspected China-based threat clusters known to use the KEYPLUG backdoor, in particular a cluster jointly presented by PwC and Microsoft at Labscon 2023 – STORM-0866/Red Dev 40.
The Sandman’s Lua-based malware LuaDream and the KEYPLUG backdoor were observed co-existing in the same victim environments.
Sandman and STORM-0866/Red Dev 40 share infrastructure control and management practices, including hosting provider selections, and domain naming conventions.
The implementation of LuaDream and KEYPLUG reveals indicators of shared development practices and overlaps in functionalities and design, suggesting shared functional requirements by their operators.
The use of the Lua development paradigm in the cyberespionage domain, historically associated with actors considered Western or Western-aligned, is likely being adopted by a broader range of adversaries, including those with ties to China.
https://www.sentinelone.com/labs/sandman-apt-china-based-adversaries-embrace-lua/
Reporting on North Korea
Operation Blacksmith: Lazarus targets organizations worldwide using novel Telegram-based malware written in DLang
Jungsoo An, Asheer Malhotra and Vitor Ventura detail alleged evolution in tradecraft by the hermit kingdom. Of note is the use of a lesser known language likely in an attempt to avoid detection engines, algorithms and signatures.
[We] recently discovered a new campaign conducted by the Lazarus Group we’re calling “Operation Blacksmith,” employing at least three new DLang-based malware families, two of which are remote access trojans (RATs), where one of these uses Telegram bots and channels as a medium of command and control (C2) communications. We track this Telegram-based RAT as “NineRAT” and the non-Telegram-based RAT as “DLRAT.” We track the DLang-based downloader as “BottomLoader.”
Our latest findings indicate a definitive shift in the tactics of the North Korean APT group Lazarus Group. Over the past year and a half, Talos has disclosed three different remote access trojans (RATs) built using uncommon technologies in their development, like QtFramework, PowerBasic and, now, DLang.
[We] observed an overlap between our findings in this campaign conducted by Lazarus including tactics, techniques and procedures (TTPs) consistent with the North Korean state-sponsored group Onyx Sleet (PLUTONIUM), also known as the Andariel APT group. Andariel is widely considered to be an APT sub-group under the Lazarus umbrella.
This campaign consists of continued opportunistic targeting of enterprises globally that publicly host and expose their vulnerable infrastructure to n-day vulnerability exploitation such as CVE-2021-44228 (Log4j). We have observed Lazarus target manufacturing, agricultural and physical security companies.
https://blog.talosintelligence.com/lazarus_new_rats_dlang_and_telegram/
Analysis of attack samples suspected of Lazarus (APT-Q-1) involving NPM package supply chain
Chinese reporting on alleged hermit kingdom activity; the actors are apparently continuing to use open source supply chain routes to get footholds. Of note in this reporting is the use of an encrypted payload.
[We] recently discovered a batch of relatively complex downloader samples. These samples are loaded through multiple layers of nested PE files, and finally download the subsequent payload from the C2 server and execute it. One of the C2 server IP addresses was recently disclosed for use in a software supply chain attack, where the attacker delivered malware by disguising itself as an encryption-related npm package. Combining the content of the above report and the information of the downloader sample itself, it can be confirmed that these downloader malware are related to this npm package supply chain attack.
Analysis of the activities of APT-C-28 (ScarCruft) targeting the deployment of Chinotto components in South Korea
Further Chinese reporting on potential hermit kingdom activity. The interesting aspect of this reporting was the use of emotive topics e.g. Fukushima sewage in their lures.
[We] discovered that the organization hosted malicious attack files in the backend of a website, involving ZIP and RAR type compressed package files carrying payloads. These payloads are released or loaded remotely with malicious scripts, and then load PowerShell-type files filelessly. The Chinotto Trojan carries out secret theft operations, and the number of remote control commands loaded by the Trojan has increased, indicating that the organization is constantly optimizing and updating its payloads to achieve the purpose of stealing secrets.
Reporting on Iran
Iranian Cyber Av3ngers Compromise Unitronics Systems
A situation which can only be described as not ideal here with regards to the proliferation of information.
The NewBloodProject Telegram channel, which is associated with GhostSec and describes itself as "a community for learning everything related to ethical hacking and hacktivism," posted a PDF guide to compromising ICS and SCADA systems in March. This guide contains a section dedicated to Unitronics devices
https://www.secureworks.com/blog/iranian-cyber-av3ngers-compromise-unitronics-systems
Iranian Hacktivist Proxies Escalate Activities Beyond Israel
An analysis here which implies the spill over might have been deliberate.
Expanded Cyber Frontline: Recent developments in cyber warfare reveal a shift in the activities of Iranian hacktivist proxies. Initially concentrated on Israel, these groups are now extending their cyber operations to include targets in other countries, with a particular emphasis on the United States.
Emerging Narrative from Iranian Hacktivist Groups: Analysis shows that at least four Iranian hacktivist groups are now actively claiming to be targeting U.S. entities. This shift is characterized by a mix of real successful attacks, reuse and reclaim of old attacks and leaks, and what seems to be exaggerated and falsified claims.
Strategy of Iranian Affiliated Groups: Groups such as CyberAv3ngers and Cyber Toufan appear to be adopting a narrative of retaliation in their cyberattacks. By opportunistically targeting U.S. entities using Israeli technology, these hacktivist proxies try to achieve a dual retaliation strategy—claiming to target both Israel and the U.S. in a single, orchestrated cyber assault.
Reporting on Other Actors
UTG-Q-003: Supply Chain Poisoning of 7ZIP on the Microsoft App Store
Interesting Chinese reporting that a stealer got into the Microsoft App Store claiming to be 7ZIP.
It remains unclear how the attackers managed to upload the malicious installation package to the Microsoft App Store. According to [our] big data platform, the earliest download of the 7z-soft software occurred on March 17, 2023.
Discovery
How we find and understand the latent compromises within our environments.
A (beta) Canarytoken for Active Directory Credentials
Roberto and team drop this goodness to help impose operational cost on our adversaries. Fly my canaries.. fly..
Our newest AD tokens allow you to create fake credentials that can be left in all the familiar places, but without a heavy software component. A single, light-weight script that runs on your Domain Controller lets you know when the fake credentials are used.
https://blog.thinkst.com/2023/12/a-beta-canarytoken-for-active-directory-credentials.html
MyApps and Excessive App Access
Threat actors often browse MyApps in Microsoft deployed environments excessively. Here is the hunting query to detect that activity. Love this for its simplicity.
gist.github.com/cbecks2/0fb02238829b5ea21f51a1e71b90b990
Domain and Website Attribution beyond WHOIS
Silvia Sebastián, Raluca-Georgia Diugan, Juan Caballero, Iskander Sanchez-Rola and Leyla Bilge detail an approach which appears to improve on WHOIS.
Currently, WHOIS is the main method for identifying which company or individual owns a domain or website. But, WHOIS usefulness is limited due to privacy protection services and data redaction. We present a novel automated approach for domain and website attribution. When WHOIS data does not reveal the owner, our approach leverages information from multiple other sources such as passive DNS, TLS certificates, and the analysis of website content. We propose a novel ranking technique to select the domain owner among multiple identified entities. Our approach identifies the domain owner with an F1 score of 0.94 compared to 0.54 for WHOIS. When applied on 3,001 tracker domains from the popular Disconnect list, it identifies needed updates to the list. It also attributes 84% of previously unattributed tracker domains.
Defence
How we proactively defend our environments.
Microsoft Incident Response lessons on preventing cloud identity compromise
Lessons from the vendor who gets at scale visibility of the thematics. The transparency and insight here is of note.
The team has observed common misconfigurations for both Microsoft Entra ID and on-premises Active Directory across various industry verticals. While Microsoft Entra ID differs from on-premises Active Directory in how it functions and how it is architected, similar high-level incident response and hardening principles can be applied to both. Concepts such as administrative least privilege, regularly reviewing access and application permissions and reviewing activity are important to secure both Active Directory and Microsoft Entra ID.
Incident Writeups
How they got in and what they did.
+1500 HuggingFace API Tokens were exposed, leaving millions of Meta-Llama, Bloom, and Pythia users vulnerable
Bar Lanyado showed the impact that a robust research methodology and low hanging fruit can have. If this isn’t a wake up call to most I am not sure what would be.
Uncovered an unprecedented number of 1681 valid tokens through HuggingFace and GitHub
Exposed high-valued organization accounts like Meta, Microsoft, Google, and VMware
Gained full access to Meta-Llama, Bloom, Pythia, and HuggingFace repositories
“Protecting From Within”
Transparency here.
A review commissioned by the Police Service of Northern Ireland (PSNI) and the Northern Ireland Policing Board (NIPB), into the PSNI data breach of 8th August 2023
Vulnerability
Our attack surface.
CVE-2023-45866: Unauthenticated Bluetooth keystroke-injection in Android, Linux, macOS and iOS
Long tail of vulnerability right here..
Unauthenticated Bluetooth keystroke-injection in Android, Linux, macOS and iOS
Android devices are vulnerable whenever Bluetooth is enabled
Linux/BlueZ requires that Bluetooth is discoverable/connectable
iOS and macOS are vulnerable when Bluetooth is enabled and a Magic Keyboard has been paired with the phone or computer
https://github.com/skysafe/reblog/tree/main/cve-2023-45866
SonicWall WXA: Authentication Bypass and RCE Vulnerability
Adam Crosser identifies a hard coded secret..
The root of the vulnerability lies in the fact that the appliance exposes a dispatcher web service that leverages a hardcoded secret key to authenticate users invoking the API interface. An attacker that has reverse engineered the dispatcher service can recover this hard coded secret key and leverage it to authenticate to the service on all other instances of the WXA appliance.
https://www.praetorian.com/blog/sonicwall-wxa-authentication-bypass-and-rce-vulnerability/
AutoSpill: Credential Leakage from Mobile Password Managers
Ankit Gangwal, Shubham Singh and Abhijeet Srivastava show that academia is getting really quite good at applied vulnerability research.
The majority of popular Android PMs considered in our experiments were found vulnerable to AutoSpill; even when the app hosting the WebView is not actively participating in the leak. Android intermediates in the autofill process because of its app sandboxing. Hence, the responsibility for any credential leakage is often stranded between PMs and the Android system. We investigate the root causes of AutoSpill and propose countermeasures to fundamentally fix AutoSpill for both the parties. We responsibly disclosed our findings to the affected PMs and Android security team.
https://dl.acm.org/doi/10.1145/3577923.3583658
Offense
Attack capability, techniques and trade-craft.
The Obvious, the Normal, and the Advanced: A Comprehensive Analysis of Outlook Attack Vectors
Haifei Li provides an interesting summary, of which the table is one of the more useful artefacts.
Making Okta do keylogging for you
Luke Jennings opens a new waterfront of offensive research of Living off the Cloud (LotC) - you can thank me later for that.
.. we’ll look at how Okta’s AD synchronization is pretty much SAMLjacking on steroids. We’ll also consider how it can be used as a stealthy watering-hole style lateral movement attack too.
To be clear, this isn't a vulnerability in Okta that circumvents a security boundary and needs to be patched. This is offensive use of a product feature, the SaaS version of living off the land (LOTL).
https://pushsecurity.com/blog/oktajacking/
VectorKernel: PoCs for Kernelmode rootkit techniques research
Some excellent work here which will cause pain for EDRs.
PoCs for Kernelmode rootkit techniques research or education. Currently focusing on Windows OS. All modules support 64bit OS only.
https://github.com/daem0nc0re/VectorKernel
Exploitation
What is being exploited.
Nothing this week beyond what is covered off in the Threat Intelligence section above.
Tooling and Techniques
Low level tooling and techniques for attack and defence researchers…
Blog: LLVM CFI and Cross-Language LLVM CFI Support for Rust
Ramon de C Valle closes a gap between languages with this, super powerful.
We’re pleased to share that we’ve worked with the Rust community to add LLVM CFI and cross-language LLVM CFI (and LLVM KCFI and cross-language LLVM KCFI) to the Rust compiler as part of our work in the Rust Exploit Mitigations Project Group. This is the first cross-language, fine-grained, forward-edge control flow protection implementation for mixed-language binaries that we know of.
Leaky Address Masking: Exploiting Unmasked Spectre Gadgets with Noncanonical Address Translation
Mathe Hertogh, Sander Wiebing and Cristiano Giuffrida show the arms race between offensive research and defensive research is alive and kicking.
Linear Address Masking (LAM) is a recently announced Intel feature that enables the CPU to mask off some upper bits before dereferencing a 64-bit pointer
We specifically focus on the BHI Spectre variant and show that, despite mitigations believed to eradicate the attack surface, our exploit can abuse a variety of gadgets in the latest Linux kernel and leak the root password hash within minutes from kernel memory
https://download.vusec.net/papers/slam_sp24.pdf
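To make the masking idea above concrete, here is a minimal sketch of what "mask off some upper bits" means for a tagged pointer. It is a deliberately simplified model, not the paper's exploit or Intel's exact behaviour: the choice of bits 62:57 as metadata bits is an assumption (it follows the LAM_U57 layout for user-space pointers), the names `tag_pointer` and `strip_metadata` are hypothetical, and real hardware also applies canonicality checks that are not modelled here.

```python
# Simplified model of Linear Address Masking on a 64-bit user-space pointer.
# Assumption: metadata lives in bits 62:57 (six bits), as in the LAM_U57 mode.
METADATA_SHIFT = 57
METADATA_BITS = 6
METADATA_MASK = ((1 << METADATA_BITS) - 1) << METADATA_SHIFT
U64_MASK = (1 << 64) - 1


def strip_metadata(ptr: int) -> int:
    """Clear the assumed metadata bits, leaving the address that would be dereferenced."""
    return ptr & ~METADATA_MASK & U64_MASK


def tag_pointer(ptr: int, tag: int) -> int:
    """Store a small tag in the assumed metadata bits of a pointer."""
    if not 0 <= tag < (1 << METADATA_BITS):
        raise ValueError("tag does not fit in the metadata bits")
    return strip_metadata(ptr) | (tag << METADATA_SHIFT)


# Hypothetical user-space address used purely for illustration.
address = 0x00007FFF12345678
tagged = tag_pointer(address, 0x2A)
untagged = strip_metadata(tagged)
```

In this model `tagged` differs from `address` only in the metadata bits, and `strip_metadata` recovers the original address. The paper's point is that this relaxed treatment of non-canonical pointers widens the set of usable Spectre gadgets.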
SAFTE: A self-injection based anti-fuzzing technique
Jianyi Zhang, Zhenkui Li, Yudong Liu, Zezheng Sun and Zhiqiang Wang deliver research which shows some interesting promise.
In this paper, we perform a systematic analysis of software protection techniques and design a novel self-injection based anti-fuzzing technique, called SAFTE.
As a consequence of this approach, the operational and procedural status of the program is hidden, thereby precluding the fuzzer from obtaining accurate feedback or crash-related information. Therefore, it makes the fuzzer completely useless.
https://www.sciencedirect.com/science/article/pii/S0045790623004044?via%3Dihub
Footnotes
Some other small (and not so small) bits and bobs which might be of interest.
Aggregate reporting
2023 Oct – Threat Trend Report on APT Groups - released December 8th
Cornucopia: Temporal Safety for CHERI Heaps - It extends the CheriBSD virtual-memory subsystem to track capability flow through memory and provides a concurrent kernel-resident revocation service that is amenable to multi-processor and hardware acceleration - older paper but including to address the misconception that CHERI can’t deal with temporal safety.
Artificial intelligence
Tree of Attacks: Jailbreaking Black-Box LLMs Automatically - In empirical evaluations, we observe that TAP generates prompts that jailbreak state-of-the-art LLMs (including GPT4 and GPT4-Turbo) for more than 80% of the prompts using only a small number of queries.
Mark My Words: Analyzing and Evaluating Language Model Watermarks - Kirchenbauer et al. can watermark Llama2-7B-chat with no perceivable loss in quality, the watermark can be detected with fewer than 100 tokens, and the scheme offers good tamper-resistance to simple attacks
Language Model Inversion - Next-token probabilities can reveal significant information about preceding text; proposes a method for recovering unknown prompts from the model’s current distribution output - On Llama-2 7b, our inversion method reconstructs prompts with a BLEU of 59 and token-level F1 of 78 and recovers 27% of prompts exactly.
Hashmarks: Privacy-Preserving Benchmarks for High-Stakes AI Evaluation - we propose hashmarking, a protocol for evaluating language models in the open without having to disclose the correct answers. In its simplest form, a hashmark is a benchmark whose reference solutions have been cryptographically hashed prior to publication
Chain of Code: Reasoning with a Language Model-Augmented Code Emulator - Experiments demonstrate that Chain of Code outperforms Chain of Thought and other baselines across a variety of benchmarks; on BIG-Bench Hard, Chain of Code achieves 84%, a gain of 12% over Chain of Thought.
A guide to LLM inference and performance - Calculating the operations per byte possible on a given GPU and comparing it to the arithmetic intensity of our model’s attention layers reveals where the bottleneck is: compute or memory. We can use this information to pick the appropriate GPU for model inference and, if our use case allows, use techniques like batching to better utilize our GPU resources.
Beyond Human Data: Scaling Self-Training for Problem-Solving with Language Models - Testing on advanced MATH reasoning and APPS coding benchmarks using PaLM-2 models, we find that ReSTEM scales favorably with model size and significantly surpasses fine-tuning only on human data. Overall, our findings suggest self-training with feedback can substantially reduce dependence on human-generated data.
Large Language Models for Mathematicians - Based on recent studies, we then outline best practices and potential issues and report on the mathematical abilities of language models. Finally, we shed light on the potential of LLMs to change how mathematicians work.
Nash Learning from Human Feedback - In this study, we introduce an alternative pipeline for the fine-tuning of LLMs using pairwise human feedback. Our approach entails the initial learning of a preference model, which is conditioned on two inputs given a prompt, followed by the pursuit of a policy that consistently generates responses preferred over those generated by any competing policy, thus defining the Nash equilibrium of this preference model.
Calibrated Language Models Must Hallucinate - our analysis also suggests that there is no statistical reason that pretraining will lead to hallucination on facts that tend to appear more than once in the training data (like references to publications such as articles and books, whose hallucinations have been particularly notable and problematic) or on systematic facts (like arithmetic calculations).
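The hashmarking idea in the Hashmarks entry above, publishing only hashes of the reference solutions, can be sketched in a few lines. This is a minimal illustration under stated assumptions rather than the paper's protocol: the use of SHA-256, the salt, the answer normalisation and the function names `make_hashmark` and `check_answer` are all hypothetical choices made for the example.

```python
import hashlib
import hmac


def make_hashmark(question: str, answer: str, salt: bytes) -> str:
    """Hash a reference answer so it can be published without revealing it.

    Assumptions for this sketch: SHA-256, a per-benchmark salt, and simple
    whitespace/case normalisation of the answer.
    """
    normalised = answer.strip().lower().encode("utf-8")
    material = salt + question.encode("utf-8") + b"\x00" + normalised
    return hashlib.sha256(material).hexdigest()


def check_answer(question: str, candidate: str, salt: bytes, published: str) -> bool:
    """Score a model's candidate answer against the published digest."""
    digest = make_hashmark(question, candidate, salt)
    return hmac.compare_digest(digest, published)


# Hypothetical benchmark item used purely for illustration.
salt = b"example-salt"
question = "What is 2 + 2?"
published_digest = make_hashmark(question, "4", salt)

correct = check_answer(question, " 4 ", salt, published_digest)
incorrect = check_answer(question, "5", salt, published_digest)
```

Anyone holding the published digest can check a candidate answer, but cannot read the answer off the benchmark. A scheme like this only resists guessing when the answer space is large, which is one reason the hashing details matter in practice.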
Books
None this week
Events
None this week
Unless stated otherwise, linked or referenced content does not necessarily represent the views of the NCSC and reference to third parties or content on their websites should not be taken as endorsement of any kind by the NCSC. The NCSC has no control over the content of third party websites and consequently accepts no responsibility for your use of them.
This newsletter is subject to the NCSC website terms and conditions which can be found at https://www.ncsc.gov.uk/section/about-this-website/terms-and-conditions and you can find out more about how we will treat your personal information in our privacy notice at https://www.ncsc.gov.uk/section/about-this-website/privacy-statement.