US plans massive data sweep

george_again · Feb 9, 2006

http://www.csmonitor.com/2006/0209/p01s02-uspo.html?s=hns

US plans massive data sweep

Little-known data-collection system could troll news, blogs, even e-mails. Will it go too far?
By Mark Clayton | Staff writer of The Christian Science Monitor

The US government is developing a massive computer system that can collect huge amounts of data and, by linking far-flung information from blogs and e-mail to government records and intelligence reports, search for patterns of terrorist activity.

The system - parts of which are operational, parts of which are still under development - is already credited with helping to foil some plots. It is the federal government's latest attempt to use broad data-collection and powerful analysis in the fight against terrorism. But by delving deeply into the digital minutiae of American life, the program is also raising concerns that the government is intruding too deeply into citizens' privacy.

"We don't realize that, as we live our lives and make little choices, like buying groceries, buying on Amazon, Googling, we're leaving traces everywhere," says Lee Tien, a staff attorney with the Electronic Frontier Foundation. "We have an attitude that no one will connect all those dots. But these programs are about connecting those dots - analyzing and aggregating them - in a way that we haven't thought about. It's one of the underlying fundamental issues we have yet to come to grips with."

The core of this effort is a little-known system called Analysis, Dissemination, Visualization, Insight, and Semantic Enhancement (ADVISE). Only a few public documents mention it. ADVISE is a research and development program within the Department of Homeland Security (DHS), part of its three-year-old "Threat and Vulnerability, Testing and Assessment" portfolio. The TVTA received nearly $50 million in federal funding this year.

DHS officials are circumspect when talking about ADVISE. "I've heard of it," says Peter Sand, director of privacy technology. "I don't know the actual status right now. But if it's a system that's been discussed, then it's something we're involved in at some level."

Data-mining is a key technology

A major part of ADVISE involves data-mining - or "dataveillance," as some call it. It means sifting through data to look for patterns. If a supermarket finds that customers who buy cider also tend to buy fresh-baked bread, it might group the two together. To prevent fraud, credit-card issuers use data-mining to look for patterns of suspicious activity.
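As a rough illustration of the supermarket example above, the sketch below counts how often pairs of items appear in the same basket. It is not how ADVISE works (those details are not public); the `pair_counts` helper and the sample baskets are made up for illustration.

```python
from collections import Counter
from itertools import combinations

def pair_counts(baskets):
    """Count how many baskets contain each unordered pair of items."""
    counts = Counter()
    for basket in baskets:
        # Deduplicate and sort so each pair has one canonical form per basket.
        for pair in combinations(sorted(set(basket)), 2):
            counts[pair] += 1
    return counts

# Hypothetical shopping baskets, invented for this example.
baskets = [
    ["cider", "bread", "milk"],
    ["cider", "bread"],
    ["milk", "eggs"],
    ["cider", "bread", "eggs"],
]

top_pair, times = pair_counts(baskets).most_common(1)[0]
print(top_pair, times)  # ('bread', 'cider') 3
```

A real system would also weigh how common each item is on its own, since two popular items will turn up together often even with no real link between them.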

What sets ADVISE apart is its scope. It would collect a vast array of corporate and public online information - from financial records to CNN news stories - and cross-reference it against US intelligence and law-enforcement records. The system would then store it as "entities" - linked data about people, places, things, organizations, and events, according to a report summarizing a 2004 DHS conference in Alexandria, Va. The storage requirements alone are huge - enough to retain information about 1 quadrillion entities, the report estimated. If each entity were a penny, they would collectively form a cube a half-mile high - roughly double the height of the Empire State Building.
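The penny comparison can be checked with a little arithmetic. The sketch below assumes standard US cent dimensions (19.05 mm across, 1.52 mm thick) and coins stacked in a simple square grid; neither assumption comes from the report.

```python
# Rough check of the penny-cube comparison.
# Coin dimensions and packing are assumptions, not figures from the DHS report.
PENNY_DIAMETER_M = 0.01905
PENNY_THICKNESS_M = 0.00152
METERS_PER_MILE = 1609.344

def cube_side_miles(count):
    """Side length, in miles, of a cube holding `count` pennies in a square grid."""
    # Each coin takes up a diameter x diameter x thickness box.
    box_volume_m3 = PENNY_DIAMETER_M ** 2 * PENNY_THICKNESS_M
    side_m = (count * box_volume_m3) ** (1 / 3)
    return side_m / METERS_PER_MILE

side = cube_side_miles(1e15)  # 1 quadrillion entities
print(f"{side:.2f} miles")  # 0.51 miles
```

Under those assumptions the cube comes out at about half a mile on a side, which agrees with the figure quoted above.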

But ADVISE and related DHS technologies aim to do much more, according to Joseph Kielman, manager of the TVTA portfolio. The key is not merely to identify terrorists, or sift for key words, but to identify critical patterns in data that illumine their motives and intentions, he wrote in a presentation at a November conference in Richland, Wash.

For example: Is a burst of Internet traffic between a few people the plotting of terrorists, or just bloggers arguing? ADVISE algorithms would try to determine that before flagging the data pattern for a human analyst's review.

At least a few pieces of ADVISE are already operational. Consider Starlight, which along with other "visualization" software tools can give human analysts a graphical view of data. Viewing data in this way could reveal patterns not obvious in text or number form. Understanding the relationships among people, organizations, places, and things - using social-behavior analysis and other techniques - is essential to going beyond mere data-mining to comprehensive "knowledge discovery in databases," Dr. Kielman wrote in his November report. He declined to be interviewed for this article.

One data program has foiled terrorists

Starlight has already helped foil some terror plots, says Jim Thomas, one of its developers and director of the government's new National Visualization Analytics Center in Richland, Wash. He can't elaborate because the cases are classified, he adds. But "there's no question that the technology we've invented here at the lab has been used to protect our freedoms - and that's pretty cool."

As envisioned, ADVISE and its analytical tools would be used by other agencies to look for terrorists. "All federal, state, local and private-sector security entities will be able to share and collaborate in real time with distributed data warehouses that will provide full support for analysis and action" for the ADVISE system, says the 2004 workshop report.

A program in the shadows

Yet the scope of ADVISE - its stage of development, cost, and most other details - is so obscure that critics say it poses a major privacy challenge.

"We just don't know enough about this technology, how it works, or what it is used for," says Marcia Hofmann of the Electronic Privacy Information Center in Washington. "It matters to a lot of people that these programs and software exist. We don't really know to what extent the government is mining personal data."

Even congressmen with direct oversight of DHS, who favor data mining, say they don't know enough about the program.

"I am not fully briefed on ADVISE," wrote Rep. Curt Weldon (R) of Pennsylvania, vice chairman of the House Homeland Security Committee, in an e-mail. "I'll get briefed this week."

Privacy concerns have torpedoed federal data-mining efforts in the past. In 2002, news reports revealed that the Defense Department was working on Total Information Awareness, a project aimed at collecting and sifting vast amounts of personal and government data for clues to terrorism. An uproar caused Congress to cancel the TIA program a year later.

Echoes of a past controversial plan

ADVISE "looks very much like TIA," Mr. Tien of the Electronic Frontier Foundation writes in an e-mail. "There's the same emphasis on broad collection and pattern analysis."

But Mr. Sand, the DHS official, emphasizes that privacy protection would be built-in. "Before a system leaves the department there's been a privacy review.... That's our focus."

Some computer scientists support the concepts behind ADVISE.

"This sort of technology does protect against a real threat," says Jeffrey Ullman, professor emeritus of computer science at Stanford University. "If a computer suspects me of being a terrorist, but just says maybe an analyst should look at it ... well, that's no big deal. This is the type of thing we need to be willing to do, to give up a certain amount of privacy."

Others are less sure.

"It isn't a bad idea, but you have to do it in a way that demonstrates its utility - and with provable privacy protection," says Latanya Sweeney, founder of the Data Privacy Laboratory at Carnegie Mellon University. But since speaking on privacy at the 2004 DHS workshop, she now doubts the department is building privacy into ADVISE. "At this point, ADVISE has no funding for privacy technology."

She cites a recent request for proposal by the Office of Naval Research on behalf of DHS. Although it doesn't mention ADVISE by name, the proposal outlines data-technology research that meshes closely with technology cited in ADVISE documents.

Neither the proposal - nor any other she has seen - provides any funding for provable privacy technology, she adds.

-----------

Some in Congress push for more oversight of federal data-mining

Amid the furor over electronic eavesdropping by the National Security Agency, Congress may be poised to expand its scrutiny of government efforts to "mine" public data for hints of terrorist activity.

"One element of the NSA's domestic spying program that has gotten too little attention is the government's reportedly widespread use of data-mining technology to analyze the communications of ordinary Americans," said Sen. Russell Feingold (D) of Wisconsin in a Jan. 23 statement.

Senator Feingold is among a handful of congressmen who have in the past sponsored legislation - unsuccessfully - to require federal agencies to report on data-mining programs and how they maintain privacy.

Without oversight and accountability, critics say, even well-intentioned counterterrorism programs could experience mission creep, having their purview expanded to include non-terrorists - or even political opponents or groups. "The development of this type of data-mining technology has serious implications for the future of personal privacy," says Steven Aftergood of the Federation of American Scientists.

Even congressional supporters of the effort want more information about data-mining efforts.

"There has to be more and better congressional oversight," says Rep. Curt Weldon (R) of Pennsylvania, vice chairman of the House committee overseeing the Department of Homeland Security. "But there can't be oversight till Congress understands what data-mining is. There needs to be a broad look at this because they [intelligence agencies] are obviously seeing the value of this."

Data-mining - the systematic, often automated gleaning of insights from databases - is seen "increasingly as a useful tool" to help detect terrorist threats, the Government Accountability Office reported in 2004. Of the nearly 200 federal data-mining efforts the GAO counted, at least 14 were acknowledged to focus on counterterrorism.

While privacy laws do place some restriction on government use of private data - such as medical records - they don't prevent intelligence agencies from buying information from commercial data collectors. Congress has done little so far to regulate the practice or even require basic notification from agencies, privacy experts say.

Indeed, even data that look anonymous aren't necessarily so. For example: With name and Social Security number stripped from their files, 87 percent of Americans can be identified simply by knowing their date of birth, gender, and five-digit Zip code, according to research by Latanya Sweeney, a data-privacy researcher at Carnegie Mellon University.
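To illustrate the point, the sketch below measures what share of records in a "de-identified" file are unique on date of birth, gender, and Zip code alone. The field names and sample records are hypothetical, and this is only a minimal uniqueness count, not the method used in the research cited above.

```python
from collections import Counter

def unique_share(records, keys=("birth_date", "gender", "zip")):
    """Fraction of records whose combination of `keys` appears exactly once."""
    combos = Counter(tuple(r[k] for k in keys) for r in records)
    unique = sum(1 for r in records if combos[tuple(r[k] for k in keys)] == 1)
    return unique / len(records)

# Made-up records with names and Social Security numbers already stripped.
records = [
    {"birth_date": "1970-01-15", "gender": "F", "zip": "15213"},
    {"birth_date": "1970-01-15", "gender": "F", "zip": "15213"},
    {"birth_date": "1982-06-30", "gender": "M", "zip": "15213"},
    {"birth_date": "1982-06-30", "gender": "M", "zip": "99352"},
]

print(unique_share(records))  # 0.5
```

Any record that is unique on those three fields can be matched to a named person by anyone holding another list, such as a voter roll, that carries the same fields alongside names.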

In a separate 2004 report to Congress, the GAO cited eight issues that need to be addressed to provide adequate privacy barriers amid federal data-mining. Top among them was establishing oversight boards for such programs.


Laura · Feb 9, 2006

Since we know that they aren't really looking for terrorists, since THEY invented them and faked 9-11, we have to think that what they are really looking for is networks of normal people who have the potential to wake others up and stimulate a broad, general resistance to the pathocratic state.

One of the QFG researchers commented the other day:

Well something I noticed at the weekend listening to the 'serious'
radio station over here "BBC Radio 4", was in an interview with some
commentator, I forget who, who was pushing the theme of 'networks' in
relation to terrorists, "THEY have these networks..."

Put me in mind of the "Chavez = Hitler" from Rumsfeld. Network = Evil
doers (therefore be suspicious of anything that comes from networking).

They need to establish this kind of sweeping association, so that if/when
truth does start to emerge about 911 etc, the "truth movement" can be
brushed off by hanging the "network" label on it. It's a network,
therefore at some unseen level it must lead back to THEM.

Seems that another major point that Lobaczewski made is exactly so:

The actions of [pathocracy] affect an entire society, starting with the leaders and infiltrating every town, business, and institution. The pathological social structure gradually covers the entire country creating a “new class” within that nation.

This privileged class [of pathocrats] feels permanently threatened by the “others”, i.e. by the majority of normal people. Neither do the pathocrats entertain any illusions about their personal fate should there be a return to the system of normal man.

That last part is pretty chilling. Anybody and everybody who is not psychopathic is "the enemy." And of course, that is about 94% of the population. So they have to work pretty hard to develop methods to keep control over so many people when they are actually so few.

That is one reason why I have begun to rethink the idea that the internet will be shut down completely. It is such a fantastic tool for them to keep tabs on everybody, why would they throw that away? Why would they go back to listening to phone calls and opening mail exclusively when they can just create a program to do the sweeps, dump everything into a database, let a machine sort it, and have it spit out the names of people who might be resisters?

I guess what that means is that the "real resistance" is going to have to give up the net sooner or later, or develop a special language or way of talking about things that is not so easily parsed.

j0da · Feb 9, 2006

Special language - that's a good point, Laura.

For anyone interested in the subject, check Wikipedia for "Paradyzja" or "Paradise, the World in Orbit" by Janusz A. Zajdel, a Polish science-fiction novelist.

It makes me wonder... I read that book as a kid and never in my wildest dreams expected it to be really useful. How fun, I've chosen an interesting time to live on Earth :)

george_again · Feb 10, 2006

Laura said:
That is one reason why I have begun to rethink the idea that the internet will be shut down completely. It is such a fantastic tool for them to keep tabs on everybody, why would they throw that away? Why would they go back to listening to phone calls and opening mail exclusively when they can just create a program to do the sweeps, dump everything into a database, let a machine sort it, and have it spit out the names of people who might be resisters?

Now that I think about it more, I don't think they will shut down the Internet completely. Seems as though massive data sweeps started decades ago and the Internet is just another, more effective method.

Back in the early 80's personal computers started to become incredibly popular. More and more people bought them, and soon people started putting up dial-in electronic bulletin board systems (BBS). For those who never heard of a BBS, you can think of it sort of like this message forum here at Signs of the Times, except without graphics -- they used only text. They were a place to post messages back and forth with people. There was no email, no Web, no file downloads, nothing except a simple forum to post messages. And if you wanted to use a BBS you had to use a modem on your computer to dial directly to the PC running the BBS. So for example, if you wanted to visit a dozen BBS systems then you had to connect to them one at a time, and pay long distance charges too if they were outside your calling area.

Anyway, BBS systems popped up all over the world by the thousands. People started communicating more openly and felt relatively secure behind the veil of a computer screen, so many people said and did things that they wouldn't normally say or do. So snooping on BBS systems would have been a great way to find out what people think and do.

Keeping up with all the activity was probably difficult for the snoops since early BBS systems were not networked. Later, BBS systems did gain a pseudo network through a message sharing system that passed messages back and forth between participating systems. But even then there were still countless BBS systems that didn't participate in message sharing, either because it was costly (long distance bills), because they simply didn't have the hardware and software to do it, or because they didn't want to. There were a ton of private BBS systems accessible only by invitation.

At the same time, the DoD had this little thing called the Internet, which provided connectivity between all sorts of differing computers. It was closed to the public and the only outsiders (as far as I know) that could connect to it were universities (unless a person happened to glean a dial-in account somehow).

Then right about the time BBS systems were really making inroads into the mainstream awareness, suddenly the DoD decided to open up the Internet and it exploded onto the world scene. Guess what then happened to BBS systems? They faded into almost total obscurity very quickly as people opted instead for the rapidly growing Internet.

I seriously doubt the move to open the Internet to the global public was coincidental, particularly given the role of the military today. Instead, I think it was a deliberate move and the answer to a growing problem (information collection).

John Chang · Feb 10, 2006

I dunno, the RIAA is having a hard enough time shutting down file sharing services, and all of that is taking place out in the open, more or less. Nothing prevents them from tracking you now, it's just that there's too many people to track. They are trying to sue people, but the courts are so slow. And everything else is moving so fast.

As far as shutting the whole internet down, too much of our society depends on it. It has become like power and water and telephone, another utility. Every attempt before the internet to create a closed proprietary network has failed, some of them quite miserably.

Of course, maybe they're willing to risk shutting all of civilization down, turning the clock back to 1950, to maintain power. But then you're faced with a paradox: what are you actually ruling over? You might be king, but you're king of not much. Then again, maybe we're not dealing with completely rational people.

Repressive regimes instinctively know the internet is bad, and do all kinds of crazy things to keep it out of their systems. Because once it's inside, there's not much you can do to control it, one way or another.

anart · Feb 10, 2006

It seems that the information available to the RIAA and the information available to DoD or NSA are two entirely different kettles of fish. If we err on the side of caution, we would assume that the data mining technology can do much, much more than they say that it can. In that case, an encoded language is the only way to communicate, but the dissemination of the encoded language would be very tricky and the necessity of using it would virtually stop the expansion of the network. In the earlier era of the Cassiopaean transmissions, when asked if the craft over the house at the time was there to monitor the session at hand, wasn't their response something close to "imagine a screen displaying your thoughts as they occur" (or something along those lines - apologies for the paraphrase), meaning that nothing can really be hidden from the PTB if they want access to it? If this is indeed the case, then why not just continue being our honest selves and let the chips fall where they may? Perhaps when faced with an overwhelming situation, I respond by refusing to respond, thus basically giving up, and that is not a good thing - but the one thing no one and nothing can take away from us is exactly who we are. Once again, I have no answers, but I'm willing to listen if anyone else has any. =)

Justin · Feb 10, 2006

Although the NSA has expertise in code-breaking, it is possible that even a simple coded system would not get flagged in these massive dragnets of information gathering and sorting done by computers.
