Fujitsu launches new technologies to protect conversational AI from hallucinations and adversarial attacks

Two new technologies aim to make AI more trustworthy and will be delivered through Fujitsu’s AI platform, Kozuchi (code name).

Tokyo, September 26, 2023 (JCN Newswire): Fujitsu today announced the launch of two new AI trust technologies to improve the reliability of responses from conversational AI models. The first detects hallucinations, a phenomenon in which generative AI creates incorrect or unrelated output. The second, jointly developed at the Fujitsu Small Research Lab (1) at Ben-Gurion University, detects phishing site URLs implanted in the AI’s responses through poisoning attacks that inject false information.

With the new technologies, Fujitsu aims to provide corporate and individual users with a tool to evaluate the reliability of replies from conversational AI, ultimately contributing to a more secure use of AI across a range of use cases, including for businesses aiming to implement the technology in actual operations.

Professor Yuval Elovici, Ben-Gurion University, comments: “Generative AI stands as a critical domain, and within it, the hallucination detection technology Fujitsu has developed emerges as pivotal for establishing trustworthy conversational AI systems. Researchers from Ben-Gurion University (BGU) and Fujitsu have pioneered an innovative technique to enhance the security of AI-based URL filtering against adversarial threats. Our breakthrough focuses on tabular data, resulting in a more resilient defence mechanism against adversarial attacks in the realm of AI-driven URL filtering. Moving ahead, Fujitsu and Ben-Gurion University are set to collaborate on forging novel security-centric advancements within the realm of generative AI.”

Fujitsu will include these new technologies in its conversational AI core engine provided through “Fujitsu Kozuchi (code name): Fujitsu AI Platform,” which offers users access to a wide range of powerful AI and ML technologies. The technology to detect hallucinations in conversational AI will be available to users in Japan starting September 28, 2023, and the technology to detect phishing site URLs in responses from conversational AI will be available in October 2023. Both technologies will be available to corporate users as a demo environment via Kozuchi and to individual users via a dedicated portal site (2). Fujitsu plans to roll out both technologies to the global market in the future.

Figure 1. Overview of trusted conversational AI

Newly developed technologies

1. Technology for highly accurate detection of hallucination in responses of conversational AI

When applying conversational AI to business operations, businesses often extract information related to a question from pre-registered business data and add it as reference information when putting the question to an external conversational AI. While this method yields more accurate replies and reduces hallucinations, preventing hallucinations completely remains an open issue: in some cases the conversational AI fails to extract the information related to the question correctly and therefore creates unrelated, incorrect replies. Methods exist to estimate the degree to which an AI’s reply might be a hallucination (the hallucination score), but estimating this score accurately remains difficult because conversational AI uses many different phrasings to express the same fact.
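
The reference-information workflow described above can be illustrated with a minimal sketch. It assumes a toy word-overlap ranking; the function names (`tokenize`, `retrieve`, `build_prompt`), the sample documents, and the prompt layout are all hypothetical and are not taken from Fujitsu's implementation.

```python
import re

def tokenize(text):
    """Lowercase word tokens; a deliberately simple stand-in for real text processing."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def retrieve(question, documents, top_k=1):
    """Rank pre-registered documents by word overlap with the question (toy ranking)."""
    q = tokenize(question)
    ranked = sorted(documents, key=lambda d: len(q & tokenize(d)), reverse=True)
    return ranked[:top_k]

def build_prompt(question, documents, top_k=1):
    """Attach the retrieved passages as reference information for the external AI."""
    refs = "\n".join(f"- {d}" for d in retrieve(question, documents, top_k))
    return f"Reference information:\n{refs}\n\nQuestion: {question}"

# Illustrative, made-up business data.
docs = [
    "Expense reports must be submitted by the fifth business day of each month.",
    "The cafeteria is open from 11:30 to 14:00 on weekdays.",
]
prompt = build_prompt("When must expense reports be submitted?", docs)
print(prompt)
```

If the retrieval step picks the wrong passage, the external AI receives misleading reference information, which is one way the unrelated replies mentioned above can arise.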

Based on the observation that conversational AI frequently generates incorrect information for proper nouns and numbers and that the contents of replies tend to differ with repeated questions, Fujitsu has developed a technology to identify and focus on parts of sentences where hallucinations are likely to occur.

To calculate a highly accurate hallucination score, the new technology first breaks down the AI’s reply into its constituent parts (subject, predicate, object, etc.) and then automatically identifies named entities within the reply. Next, the technology blanks out these named entities and repeatedly asks the external AI to supply the missing expressions (Figure 2).
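
The masking-and-re-asking idea can be sketched as follows. This is only an illustration under stated assumptions: the source does not specify how the score is computed, so the rule used here (the fraction of fill-in answers that disagree with the original entity) is an assumption, as are the names `hallucination_score`, `ask_model`, and `make_stub`. The named entities are passed in directly rather than detected, and the sample sentence is invented.

```python
import itertools

def hallucination_score(reply, entities, ask_model, n_samples=5):
    """Fraction of fill-in answers that disagree with the original entities.

    reply     -- text produced by the conversational AI
    entities  -- named entities found in the reply (assumed to come from a
                 named-entity recognizer that the source does not specify)
    ask_model -- hypothetical callable: masked text -> the model's fill-in
    """
    mismatches = 0
    total = 0
    for entity in entities:
        masked = reply.replace(entity, "____", 1)  # blank out one entity
        for _ in range(n_samples):                 # ask repeatedly
            answer = ask_model(masked)
            total += 1
            if answer.strip().lower() != entity.strip().lower():
                mismatches += 1
    return mismatches / total if total else 0.0

def make_stub(answers):
    """Stand-in for an external AI: cycles through canned fill-in answers."""
    it = itertools.cycle(answers)
    return lambda masked: next(it)

reply = "The report was filed in 2019."  # invented example sentence
stable = hallucination_score(reply, ["2019"], make_stub(["2019"]))
unstable = hallucination_score(
    reply, ["2019"], make_stub(["2019", "2012", "2021", "2019", "2016"])
)
print(stable, unstable)  # 0.0 0.6
```

A reply whose entities the model reproduces consistently scores near 0, while a reply whose entities change from one query to the next scores higher, which matches the observation that hallucinated proper nouns and numbers tend to vary across repeated questions.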

Fujitsu benchmarked this technology using open data, including the WikiBio GPT-3 Hallucination Dataset (3), and found that it could improve the accuracy of detection (AUC-ROC) (4) by approximately 22% compared to other state-of-the-art methods for detecting AI hallucinations, such as SelfCheckGPT (5).
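
For readers unfamiliar with the metric, AUC-ROC equals the probability that a randomly chosen positive example (here, a hallucinated reply) receives a higher score than a randomly chosen negative one, with ties counted as half. The sketch below computes it directly from that definition; the labels and scores are made-up illustrative values, not benchmark data.

```python
def auc_roc(labels, scores):
    """AUC-ROC as the probability that a positive outranks a negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))

# 1 = hallucinated reply, 0 = correct reply (invented values).
labels = [1, 1, 0, 0, 1, 0]
scores = [0.9, 0.7, 0.6, 0.2, 0.4, 0.5]
auc = auc_roc(labels, scores)
print(round(auc, 3))  # 0.778
```

In this example, seven of the nine positive/negative pairs are ordered correctly, giving 7/9.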

Figure 2. Overview of technology to detect hallucinations in conversational AI

2. Technology for detection of phishing URLs in responses of conversational AI

Because conversational AI creates responses based on its training data, hostile entities can implant malicious information in that data and thereby trick the AI into creating responses that include manipulated information, such as phishing URLs that lead to fake websites.

To address this issue, Fujitsu has developed a technology to detect manipulated URLs in the responses of conversational AI. Once the technology identifies a phishing URL, it issues a warning message to users.
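
The warning step might look roughly like the sketch below. It assumes some URL classifier already exists and is passed in as the hypothetical callable `is_phishing`; the regular expression, the warning wording, and the `.example` addresses are illustrative choices and not part of Fujitsu's product.

```python
import re

# Simple pattern for http(s) URLs; real-world URL extraction is more involved.
URL_PATTERN = re.compile(r'https?://[^\s<>"]+')

def annotate_response(response, is_phishing):
    """Prepend a warning when the response contains URLs flagged by the classifier."""
    urls = [u.rstrip(".,;:!?") for u in URL_PATTERN.findall(response)]
    flagged = [u for u in urls if is_phishing(u)]
    if not flagged:
        return response
    warning = "WARNING: this response contains suspicious URLs: " + ", ".join(flagged)
    return warning + "\n" + response

# Stub classifier for illustration: a fixed set of known-bad URLs.
known_bad = {"http://phish.example/login"}
check = lambda url: url in known_bad

risky = annotate_response("Reset your password at http://phish.example/login.", check)
clean = annotate_response("See https://docs.example/help for details.", check)
print(risky)
```

Responses without flagged URLs pass through unchanged, so the check adds nothing visible for users in the normal case.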

Fujitsu’s new technology not only detects phishing URLs but also increases the AI’s resistance to existing attacks that deliberately trick AI models into making misjudgments, helping ensure highly reliable responses from the AI. It builds on a technique jointly developed by Fujitsu and Ben-Gurion University of the Negev at the Fujitsu Small Research Lab established at Ben-Gurion University. The technique exploits the tendency of hostile entities to attack a single type of AI model: it processes the data with several different AI models and detects malicious data by evaluating differences in the rationale behind each model’s judgement.
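
One way to picture the comparison of rationales across models is sketched below. It assumes each model's rationale is available as a feature-attribution vector over the same tabular features; the cosine-distance measure, the threshold of 0.3, and all function names and numbers are assumptions made for illustration, not details of the Fujitsu and Ben-Gurion University technique.

```python
import math
from itertools import combinations

def cosine_distance(a, b):
    """1 - cosine similarity; returns 1.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm if norm else 1.0

def rationale_disagreement(attributions):
    """Mean pairwise cosine distance between the models' attribution vectors."""
    pairs = list(combinations(attributions, 2))
    if not pairs:
        raise ValueError("need attribution vectors from at least two models")
    return sum(cosine_distance(a, b) for a, b in pairs) / len(pairs)

def is_suspicious(attributions, threshold=0.3):
    """Flag an input when the models disagree strongly about why they decided."""
    return rationale_disagreement(attributions) > threshold

# Invented attribution vectors from three models over three URL features.
benign = [[0.6, 0.3, 0.1], [0.5, 0.4, 0.1], [0.7, 0.2, 0.1]]
attacked = [[0.6, 0.3, 0.1], [0.1, 0.1, 0.8], [0.5, 0.4, 0.1]]
print(is_suspicious(benign), is_suspicious(attacked))  # False True
```

The intuition is that an input crafted to fool one model type tends to shift that model's rationale while leaving the others largely unchanged, so a large disagreement is a signal worth checking.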

The technology can be used not only to detect phishing URLs but also to prevent general attacks designed to deceive AI models that use tabular data, so it can also help protect other services from such attacks.

Figure 3. Overview of technology to detect phishing URLs

(1) Fujitsu Small Research Lab: An initiative where Fujitsu researchers are embedded at technology incubators at universities in Japan and internationally to conduct joint research with some of the leading minds in their fields, including professors as well as the next generation of researchers.
(2) Individual users can also try out Fujitsu’s advanced APIs and web applications by creating an account on the Fujitsu Research Portal. Fujitsu Research Portal: a portal site that has been open to the public since June 2023 to provide registered users access to trial versions of Fujitsu’s advanced technologies. Fujitsu offers advanced technologies to corporate users via “Fujitsu Kozuchi (code name): Fujitsu AI Platform” and to individual users through this portal site.
(3) WikiBio GPT-3 Hallucination Dataset: Benchmark data based on Wikipedia for hallucination detection
(4) AUC-ROC (Area Under the Receiver Operating Characteristic Curve): The area under the curve obtained by plotting the true positive rate (vertical axis) against the false positive rate (horizontal axis) as the decision threshold applied to the anomaly score is varied. Random scoring yields 0.5 and a perfect detector yields 1.0. A value above 0.7 is generally considered to indicate a reasonable level of performance.
(5) SelfCheckGPT: A hallucination detection technology developed at the University of Cambridge, UK

Fujitsu’s Commitment to the Sustainable Development Goals (SDGs)

The Sustainable Development Goals (SDGs) adopted by the United Nations in 2015 represent a set of common goals to be achieved worldwide by 2030. Fujitsu’s purpose, “to make the world more sustainable by building trust in society through innovation,” is a promise to contribute to the vision of a better future empowered by the SDGs.

About us:

CIO News, a proprietary of Mercadeo, produces award-winning content and resources for IT leaders across any industry through print articles and recorded video interviews on topics in the technology sector such as Digital Transformation, Artificial Intelligence (AI), Machine Learning (ML), Cloud, Robotics, Cyber-security, Data, Analytics, SOC, SASE, among other technology topics.