SDF Chatter


cross-posted from: https://lemmy.sdf.org/post/32830658

[This is an op-ed by Valentin Weber, senior research fellow with the German Council on Foreign Relations. He is the author of the International Forum for Democratic Studies report “Data-Centric Authoritarianism: How China’s Development of Frontier Technologies Could Globalize Repression.” His research covers the intersection of cybersecurity, artificial intelligence, quantum technologies, and technological spheres of influence.]

[...]

While the financial, economic, technological, and national-security implications of DeepSeek’s achievement have been widely covered, there has been little discussion of its significance for authoritarian governance. DeepSeek has massive potential to enhance China’s already pervasive surveillance state, and it will bring the Chinese Communist Party (CCP) closer than ever to its goal of possessing an automated, autonomous, and scientific tool for repressing its people.

[...]

With the world’s largest public AI-surveillance networks — “smart cities” — Chinese police started to amass vast amounts of data. But some Chinese experts lamented that smart cities were not actually that smart: They could track and find pedestrians and vehicles but could not offer concrete guidance to authorities — such as providing police officers with different options for handling specific situations.

[...]

China’s surveillance-industrial complex took a big leap in the mid-2010s. Now, AI-powered surveillance networks could do more than help the CCP to track the whereabouts of citizens (the chess pawns). It could also suggest to the party which moves to make, which figures to use, and what strategies to take.

[...]

Inside China, such a network of large-scale AGI [Artificial General Intelligence] systems could autonomously improve repression in real time, rooting out the possibility of civic action in urban metropolises. Outside the country, if cities such as Kuala Lumpur, Malaysia — where China first exported Alibaba’s City Brain system in 2018 — were either run by a Chinese-developed city brain that had reached AGI or plugged into a Chinese city-brain network, they would quietly lose their governance autonomy to these highly complex systems that were devised to achieve CCP urban-governance goals.

[...]

As China’s surveillance state begins its third evolution, the technology is beginning to shift from merely providing decision-making support to actually acting on the CCP’s behalf.

[...]

The next step in the evolution of China’s surveillance state will be to integrate generative-AI models like DeepSeek into urban surveillance infrastructures. Lenovo, a Hong Kong corporation with headquarters in Beijing, is already rolling out programs that fuse LLMs with public-surveillance systems. In [the Spanish city of] Barcelona, the company is administering its Visual Insights Network for AI (VINA), which allows law enforcement and city-management personnel to search and summarize large amounts of video footage instantaneously.

[...]

The CCP, with its vast access to the data of China-based companies, could use DeepSeek to enforce laws and intimidate adversaries in myriad ways — for example, deploying AI police agents to cancel a Lunar New Year holiday trip planned by someone required by the state to stay within a geofenced area; or telephoning activists after a protest to warn of the consequences of joining future demonstrations. It could also save police officers’ time. Rather than issuing “invitations to tea” (a euphemism for questioning), AI agents could conduct phone interviews and analyze suspects’ voices and emotional cues for signs of repentance. Police operators would, however, still need to confirm any action taken by AI agents.

[...]

DeepSeek and similar generative-AI tools make surveillance technology smarter and cheaper. This will likely allow the CCP to stay in power longer, and propel the export of Chinese AI surveillance systems across the world — to the detriment of global freedom.



cross-posted from: https://lemmy.sdf.org/post/31892983

Archived

TLDR:

  • China has developed an Artificial Intelligence (AI) system that adds to its already powerful censorship machine, scanning content for a wide range of topics, including corruption, military issues, Taiwan politics, and satire
  • The discovery was accidental: security researchers found an unsecured Elasticsearch database on the open web, hosted by the Chinese company Baidu
  • Experts highlight that AI-driven censorship is evolving to make state control over public discourse even more sophisticated, especially after recent releases like China's AI model DeepSeek

A complaint about poverty in rural China. A news report about a corrupt Communist Party member. A cry for help about corrupt cops shaking down entrepreneurs.

These are just a few of the 133,000 examples fed into a sophisticated large language model that’s designed to automatically flag any piece of content considered sensitive by the Chinese government.

A leaked database seen by TechCrunch reveals China has developed an AI system that supercharges its already formidable censorship machine, extending far beyond traditional taboos like the Tiananmen Square massacre.

The system appears primarily geared toward censoring Chinese citizens online but could be used for other purposes, like improving Chinese AI models’ already extensive censorship.

Xiao Qiang, a researcher at UC Berkeley who studies Chinese censorship and who also examined the dataset, told TechCrunch that it was “clear evidence” that the Chinese government or its affiliates want to use LLMs to improve repression.

“Unlike traditional censorship mechanisms, which rely on human labor for keyword-based filtering and manual review, an LLM trained on such instructions would significantly improve the efficiency and granularity of state-led information control,” Qiang said.

[...]

The dataset was discovered by security researcher NetAskari, who shared a sample with TechCrunch after finding it stored in an unsecured Elasticsearch database hosted on a Baidu server [...] There’s no indication of who, exactly, built the dataset, but records show that the data is recent, with its latest entries dating from December 2024.
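Exposures like this are typically found by probing whether an Elasticsearch endpoint answers read requests without credentials. As a hedged illustration (the URL in the usage comment is a placeholder, not the actual Baidu-hosted server), a minimal stdlib-only check might look like:

```python
import json
from urllib.error import HTTPError, URLError
from urllib.request import urlopen

def is_publicly_readable(base_url: str, timeout: float = 3.0) -> bool:
    """Return True if an Elasticsearch endpoint answers an unauthenticated
    read; a 401/403 response or a failed connection means it does not."""
    try:
        with urlopen(f"{base_url}/_cluster/health", timeout=timeout) as resp:
            health = json.load(resp)
            # An open cluster reports its name and status to anyone who asks.
            return "cluster_name" in health and "status" in health
    except (HTTPError, URLError, ValueError):
        return False

# Only probe infrastructure you own or are authorized to test, e.g.:
# is_publicly_readable("http://localhost:9200")
```

`/_cluster/health` is a standard read-only Elasticsearch API endpoint, which makes it a common first probe in assessments like this one.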

[...]

An LLM for detecting dissent

In language eerily reminiscent of how people prompt ChatGPT, the system’s creator tasks an unnamed LLM to figure out if a piece of content has anything to do with sensitive topics related to politics, social life, and the military. Such content is deemed “highest priority” and needs to be immediately flagged.

Top-priority topics include pollution and food safety scandals, financial fraud, and labor disputes, which are hot-button issues in China that sometimes lead to public protests — for example, the Shifang anti-pollution protests of 2012.

Any form of “political satire” is explicitly targeted. For example, if someone uses historical analogies to make a point about “current political figures,” that must be flagged instantly, and so must anything related to “Taiwan politics.” Military matters are extensively targeted, including reports of military movements, exercises, and weaponry.

[...]

Inside the training data

From this huge collection of 133,000 examples that the LLM must evaluate for censorship, TechCrunch gathered 10 representative pieces of content.

Topics likely to stir up social unrest are a recurring theme. One snippet, for example, is a post by a business owner complaining about corrupt local police officers shaking down entrepreneurs, a rising issue in China as its economy struggles.

Another piece of content laments rural poverty in China, describing run-down towns that only have elderly people and children left in them. There’s also a news report about the Chinese Communist Party (CCP) expelling a local official for severe corruption and believing in “superstitions” instead of Marxism.

There’s extensive material related to Taiwan and military matters, such as commentary about Taiwan’s military capabilities and details about a new Chinese jet fighter. The Chinese word for Taiwan (台湾) alone is mentioned over 15,000 times in the data.

[...]

The dataset [...] say that it’s intended for “public opinion work,” which offers a strong clue that it’s meant to serve Chinese government goals [...] Michael Caster, the Asia program manager of rights organization Article 19, explained that “public opinion work” is overseen by a powerful Chinese government regulator, the Cyberspace Administration of China (CAC), and typically refers to censorship and propaganda efforts.

[...]

Repression is getting smarter

[...]

Traditionally, China’s censorship methods rely on more basic algorithms that automatically block content mentioning blacklisted terms, like “Tiananmen massacre” or “Xi Jinping,” as many users experienced using DeepSeek for the first time.

But newer AI tech, like LLMs, can make censorship more efficient by finding even subtle criticism at a vast scale. Some AI systems can also keep improving as they gobble up more and more data.
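The gap between the two approaches is easy to see in code. Here is a sketch of the older, keyword-based filtering the article describes (the blocklist term is taken from the article; "May 35th" is a widely documented euphemism for June 4 used to evade exactly this kind of filter):

```python
BLOCKLIST = {"tiananmen massacre"}  # illustrative term from the article

def keyword_blocked(text: str) -> bool:
    """Naive substring matching: the approach older filters rely on."""
    lowered = text.lower()
    return any(term in lowered for term in BLOCKLIST)

# Exact phrasing is caught...
assert keyword_blocked("Remember the Tiananmen Massacre.")
# ...but a well-known euphemism slips straight through. Closing this
# semantic gap is precisely what LLM-based filtering is for.
assert not keyword_blocked("What happened on May 35th?")
```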

“I think it’s crucial to highlight how AI-driven censorship is evolving, making state control over public discourse even more sophisticated, especially at a time when Chinese AI models such as DeepSeek are making headwaves,” Xiao, the Berkeley researcher, said.


cross-posted from: https://lemmy.sdf.org/post/31583546

Archived

Security researcher Tenable successfully used DeepSeek to create a keylogger that could hide an encrypted log file on disk as well as develop a simple ransomware executable.

At its core, DeepSeek can create the basic structure for malware. However, it is not capable of doing so without additional prompt engineering as well as manual code editing for more advanced features. For instance, DeepSeek struggled with implementing process hiding. "We got the DLL injection code it had generated working, but it required lots of manual intervention," Tenable writes in its report.

"Nonetheless, DeepSeek provides a useful compilation of techniques and search terms that can help someone with no prior experience in writing malicious code the ability to quickly familiarize themselves with the relevant concepts."

"Based on this analysis, we believe that DeepSeek is likely to fuel further development of malicious AI-generated code by cybercriminals in the near future."


cross-posted from: https://lemmy.sdf.org/post/31552333

A Trust Report for DeepSeek R1 by VIJIL, a security research company, indicates critical levels of risk with security and ethics, high levels of risk with privacy, stereotype, toxicity, hallucination, and fairness, a moderate level of risk with performance, and a low level of risk with robustness.

[–] Hotznplotzn 1 points 2 months ago

A study from EnkryptAI (pdf) confirms that DeepSeek is prone to delivering misinformation and harmful content. It claims that the model is:

  • 3x more biased than Claude-3 Opus
  • 4x more vulnerable to generating insecure code than OpenAI’s O1
  • 4x more toxic than GPT-4o
  • 11x more likely to generate harmful output versus OpenAI O1
  • 3.5x more likely to produce Chemical, Biological, Radiological, and Nuclear (CBRN) content​ than OpenAI O1 and Claude-3 Opus

cross-posted from: https://lemmy.sdf.org/post/31525284

Archived

[...]

Since its inception in the early 2000s, the Chinese surveillance state has undergone three evolutions. In the first, which lasted until the early 2010s, the CCP obtained situational awareness — knowledge of its citizens’ locations and behaviors — via intelligent-monitoring technology. In the second evolution, from the mid-2010s till now, AI systems began offering authorities some decision-making support. Today, we are on the cusp of a third transformation that will allow the CCP to use generative AI’s emerging reasoning capabilities to automate surveillance and hone repression.

[...]

DeepSeek [...] is this technology that would, for example, allow a self-driving car to recognize road signs even on a street it had never traveled before. [...] The advent of DeepSeek has already impelled tech experts in the United States to take similar approaches. Researchers at Stanford University managed to produce a powerful AI system for under US$50, training it on Google’s Gemini 2.0 Flash Thinking Experimental. By driving down the cost of LLMs, including for security purposes, DeepSeek will thus enable the proliferation of advanced AI and accelerate the rollout of Chinese surveillance infrastructure globally.

[...]

[–] TORFdot0@lemmy.world 4 points 2 months ago (1 children)

Haha I didn’t even notice. Yeah I’m not running deepseek on my 98 build


cross-posted from: https://lemmy.sdf.org/post/29755539

South Korea has accused Chinese AI startup DeepSeek of sharing user data with the owner of TikTok in China.

"We confirmed DeepSeek communicating with ByteDance," the South Korean data protection regulator told Yonhap News Agency.

The country had already removed DeepSeek from app stores over the weekend over data protection concerns.

...

[–] notfromhere@lemmy.ml 3 points 3 months ago

This probably refers to the Qwen 32B R1 Distill, which is DeepSeek's fine-tune of Qwen 32B — not to R1 671B.


Archived

Here is the data at Hugging Face.

A team of international researchers from leading academic institutions and tech companies upended the AI reasoning landscape on Wednesday with a new model that matched—and occasionally surpassed—one of China's most sophisticated AI systems: DeepSeek.

OpenThinker-32B, developed by the Open Thoughts consortium, achieved a 90.6% accuracy score on the MATH500 benchmark, edging past DeepSeek's 89.4%.

The model also outperformed DeepSeek on general problem-solving tasks, scoring 61.6 on the GPQA-Diamond benchmark compared to DeepSeek's 57.6. On the LCBv2 benchmark, it hit a solid 68.9, showing strong performance across diverse testing scenarios.

...

[–] Turbonics 1 points 3 months ago

This report assumes that Deepseek purchased every GPU they trained on instead of renting it. Pure garbage written by an American AI company trying to keep the bubble afloat.

[–] theturtlemoves@hexbear.net 6 points 3 months ago

When we asked it in Chinese for the Wenchuan earthquake death toll and other politically sensitive data, the model searched exclusively for “official data” (官方统计数据) to obtain “accurate information.”

DeepSeek R1 acted like a completely different model in English. It provided sources based in Western countries for facts about the Wenchuan earthquake and Taiwanese identity and addressed criticisms of the Chinese government.

People seem to forget that LLMs are basically repeating whatever they read. An LLM's training data in language X is going to include more sources from countries where X is spoken.


Here is the original report.

The research firm SemiAnalysis has conducted an extensive analysis of what's actually behind DeepSeek in terms of training costs, refuting the narrative that R1 has become so efficient that the compute resources from NVIDIA and others are unnecessary. Before we dive into the actual hardware used by DeepSeek, let's take a look at what the industry initially perceived. It was claimed that DeepSeek only utilized "$5 million" for its R1 model, which is on par with OpenAI GPT's o1, and this triggered a retail panic, which was reflected in the US stock market; however, now that the dust has settled, let's take a look at the actual figures.

...


cross-posted from: https://lemmy.sdf.org/post/29331548

Archived

[The article shows very good examples I can't paraphrase here, but they are very illuminating.]

Is Taiwan an independent country? When pointing out DeepSeek’s propaganda problems, journalists and China watchers have tended to prompt the LLM with questions like these about the “Three T’s” (Tiananmen, Taiwan, and Tibet) — obvious political red lines that are bound to meet a stony wall of hedging and silence. “Let’s talk about something else,” DeepSeek tends to respond. Alternatively, questions of safety regarding DeepSeek tend to focus on whether data will be sent to China.

Experts say this is all easily fixable. Kevin Xu has pointed out that the earlier V3 version, released in December, will discuss topics such as Tiananmen and Xi Jinping when it is hosted on local computers — beyond the grasp of DeepSeek’s cloud software and servers.

[...]

But do coders and Silicon Valley denizens know what they should be looking for? As we have written at CMP, Chinese state propaganda is not about censorship per se, but about what the Party terms “guiding public opinion” (舆论导向). “Guidance,” which emerged in the aftermath of the Tiananmen Massacre in 1989, is a more comprehensive approach to narrative control that goes beyond simple censorship. While outright removal of unwanted information is one tactic, “guidance” involves a wide spectrum of methods to shape public discourse in the Party’s favor. These can include restricting journalists’ access to events, ordering media to emphasize certain facts and interpretations, deploying directed narrative campaigns, and drowning out unfavorable information with preferred content.

Those testing DeepSeek for propaganda shouldn’t simply be prompting the LLM to cross simple red lines or say things regarded as “sensitive.” They should be mindful of the full range of possible tactics to achieve “guidance.”

[...]

We tested DeepSeek R1 in three environments: locally on our computers — using “uncensored” versions downloaded from Hugging Face — on servers hosted by Hugging Face, and on the interface most people are using DeepSeek through: the app connected to Chinese servers. The DeepSeek models were not the same (R1 was too big to test locally, so we used a smaller version), but across all three categories, we identified tactics frequently used in Chinese public opinion guidance.

[...]

The “uncensored” version of DeepSeek’s software [...] puts official messaging first, treating the government as the sole source of accurate information on anything related to China. When we asked it in Chinese for the Wenchuan earthquake death toll and other politically sensitive data, the model searched exclusively for “official data” (官方统计数据) to obtain “accurate information.” As such, it could not find “accurate” statistics for Taiwanese identity — something that is regularly and extensively polled by a variety of institutions in Taiwan. All we got is boilerplate: Taiwan “has been an inalienable part of China since ancient times” and any move toward independent nationhood is illegal.

[...]

Tailored Propaganda?

DeepSeek R1 seems to modify its answers depending on what language is used and the location of the user’s device. DeepSeek R1 acted like a completely different model in English. It provided sources based in Western countries for facts about the Wenchuan earthquake and Taiwanese identity and addressed criticisms of the Chinese government.

Chinese academics are aware that AI has this potential. In a journal under the CCP’s Propaganda Department last month, a journalism professor at China’s prestigious Fudan University made the case that China “needs to think about how the generative artificial intelligence that is sweeping the world can provide an alternative narrative that is different from ‘Western-centrism’” — namely, by providing answers tailored to different foreign audiences.

[...]

DeepSeek’s answers have been subtly adapted to different languages and trained to reflect [Chinese] state-approved views.

[...]


cross-posted from: https://lemmy.sdf.org/post/29128134

Archived

A NowSecure mobile application security and privacy assessment has uncovered multiple security and privacy issues in the DeepSeek iOS mobile app that lead us to urge enterprises to prohibit its usage in their organizations.

...

Key Risks Identified:

  • Unencrypted Data Transmission: The app transmits sensitive data over the internet without encryption, making it vulnerable to interception and manipulation.
  • Weak & Hardcoded Encryption Keys: Uses outdated Triple DES encryption, reuses initialization vectors, and hardcodes encryption keys, violating best security practices.
  • Insecure Data Storage: Username, password, and encryption keys are stored insecurely, increasing the risk of credential theft.
  • Extensive Data Collection & Fingerprinting: The app collects user and device data, which can be used for tracking and de-anonymization.
  • Data Sent to China & Governed by PRC Laws: User data is transmitted to servers controlled by ByteDance, raising concerns over government access and compliance risks.
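The second bullet describes a classic failure mode. A toy, stdlib-only illustration (this is not DeepSeek's actual code; the XOR keystream stands in for any cipher whose keystream is fixed by key + IV) shows why reusing an IV is fatal regardless of the algorithm:

```python
import os

def xor_stream(keystream: bytes, data: bytes) -> bytes:
    """Toy stream cipher: XOR the data against a keystream. In a real
    cipher, the keystream is derived from the key and the IV."""
    return bytes(k ^ d for k, d in zip(keystream, data))

keystream = os.urandom(32)  # same key + same IV => same keystream

# Two different messages encrypted under a reused key/IV pair:
m1 = b"username=alice&password=hunter2!"
m2 = b"username=mallory&password=12345!"
c1 = xor_stream(keystream, m1)
c2 = xor_stream(keystream, m2)

# XORing the ciphertexts cancels the keystream entirely, exposing the
# XOR of the plaintexts -- no key recovery needed to see structure.
leaked = bytes(a ^ b for a, b in zip(c1, c2))
assert leaked == bytes(a ^ b for a, b in zip(m1, m2))
# Wherever the plaintexts agree, the leak is a zero byte:
assert leaked[:9] == bytes(9)
```

This is also why current guidance favors AEAD modes such as AES-GCM with a fresh random nonce per message, rather than Triple DES with hardcoded keys and reused initialization vectors.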

...

How to Mitigate the DeepSeek iOS App Risks

It is difficult, if not impossible, at this time to immediately mitigate the numerous security, privacy and data risks that exist in the DeepSeek iOS app today. Over time, we hope the security issues will be remediated and that some of the practices impacting privacy could be addressed. But for US and EU based businesses and government agencies, it is difficult to mitigate the storage, analysis and processing of data in the People’s Republic of China. Of course, each organization can make this determination themselves, and hopefully the risks outlined above provide insights and a path towards a more secure and private iOS app.

In the meantime, there are immediate steps companies and government agencies can take:

  1. Immediately stop using the DeepSeek iOS app until security and privacy failures are sufficiently mitigated
  2. Determine if the data collection, privacy policy, terms of service and legal jurisdiction are issues that put your organization at risk
  3. Consider leveraging the DeepSeek open source model via hosted solutions from companies like Microsoft or via self-hosting the model (e.g. via Hugging Face)
  4. Investigate alternative AI apps that offer the DeepSeek open source model but with better security, privacy and data governance. Or consider other AI offerings that address your organization’s needs

...


cross-posted from: https://lemmy.sdf.org/post/28910537

Archived

Researchers claim they had a ‘100% attack success rate’ on jailbreak attempts against Chinese AI DeepSeek

"DeepSeek R1 was purportedly trained with a fraction of the budgets that other frontier model providers spend on developing their models. However, it comes at a different cost: safety and security," researchers say.

A research team at Cisco managed to jailbreak DeepSeek R1 with a 100% attack success rate. This means that there was not a single prompt from the HarmBench set that did not obtain an affirmative answer from DeepSeek R1. This is in contrast to other frontier models, such as o1, which blocks a majority of adversarial attacks with its model guardrails.
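Attack success rate here has a simple definition: the share of adversarial prompts that elicit an affirmative (non-refused) answer. A sketch, with made-up per-prompt outcomes for illustration (HarmBench itself is a curated set of harmful behaviors; the numbers below are hypothetical, not Cisco's raw data):

```python
def attack_success_rate(outcomes: list[bool]) -> float:
    """Fraction of adversarial prompts the model answered affirmatively
    instead of refusing."""
    if not outcomes:
        raise ValueError("need at least one outcome")
    return sum(outcomes) / len(outcomes)

# Hypothetical results over a 50-prompt subset:
model_a = [True] * 50                # nothing refused -> ASR 1.0
model_b = [True] * 7 + [False] * 43  # most attacks blocked

assert attack_success_rate(model_a) == 1.0
assert attack_success_rate(model_b) == 0.14
```

A 100% ASR therefore means every single HarmBench prompt in the test set got past the model's guardrails.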

...

In other related news, experts cited by CNBC say that DeepSeek’s privacy policy “isn’t worth the paper it is written on."

...
