Some days ago I posted information about a tool (Gephi) that can be used to represent data as graphs, which are easier to understand at a glance than plain text. I was worried it would overwhelm people, because I have no idea what level of computer proficiency people here have or what their professional backgrounds are.
(Edited Stub)
Below is a short list of tools that I have used, some more extensively than others, to analyze data and/or represent it graphically. The main criterion for including them here is ease of use (and that I have used them to some extent). My intention is to introduce some tools that will hopefully support what SOTT does (and not become a distraction).
A good starting point on the internet for people interested in data mining and related topics is the KDnuggets website --> http://www.kdnuggets.com
Warning: the site can be a bit overwhelming because of the amount of information it provides.
Basic prerequisites for using these tools are:
- Know how to use a text editor
- Know how to use a spreadsheet program, like Excel or OpenOffice/LibreOffice
- Know your data: how it is stored and what kind of information you would like to get from it <<--- Most important
- Additionally, an understanding of basic mathematical and statistical concepts like sums, averages and standard deviations, set operations (union, intersection, difference) and logic operations (and, or, not) is very helpful for anything related to the analysis of data.
- Beyond that, even basic knowledge of regular expressions is helpful.
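To make the prerequisites above a bit more concrete, here is a small sketch in Python showing a sum, an average, a standard deviation, set and logic operations, and a regular expression. The names and numbers are made up for illustration; the same ideas map onto spreadsheet functions like SUM, AVERAGE and STDEV.

```python
import re
import statistics

# Hypothetical sample data: amounts (in dollars) given by one person.
amounts = [100, 250, 50]

# Sums, averages and standard deviations.
total = sum(amounts)                  # 400
mean = statistics.mean(amounts)       # about 133.33
stdev = statistics.stdev(amounts)     # sample standard deviation, about 104.08

# Set operations on two hypothetical membership lists.
board_a = {"alice", "bob", "dave"}
board_b = {"bob", "carol", "dave"}
both = board_a & board_b              # intersection: {"bob", "dave"}
either = board_a | board_b            # union: all four names
only_a = board_a - board_b            # difference: {"alice"}

# Logic operations (and, or, not): names that appear on exactly one list.
exactly_one = sorted(
    name for name in either
    if (name in board_a or name in board_b)
    and not (name in board_a and name in board_b)
)                                     # ["alice", "carol"]

# A regular expression: pull four-digit years (1900-2099) out of free text.
text = "Founded in 1998, merged in 2004, dissolved in 2011."
years = re.findall(r"\b(?:19|20)\d{2}\b", text)   # ["1998", "2004", "2011"]

print(total, round(mean, 2), round(stdev, 2))
print(sorted(both), sorted(only_a), exactly_one, years)
```

Note that "names on exactly one list" is just the set symmetric difference (board_a ^ board_b); it is spelled out with and/or/not here only to show the logic operations.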
Basic Tools for data analysis:
- Text editors - are included with most systems; examples are Notepad (Windows) and vi (Linux). Use them to edit and organize your data. Many commercial and free plain-text editors are available.
- Spreadsheets - These work with grids of data or formulas. The best-known examples are Excel (commercial) and OpenOffice/LibreOffice (open source). Spreadsheets are universal tools used to store, organize, transform and evaluate data, and can be used to do complex calculations and to create graphs of numeric data.
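A common way to move data between a text editor and a spreadsheet is CSV (comma-separated values), which Excel and LibreOffice Calc can both open. The sketch below, with made-up data, reads CSV text, totals one column per organization (roughly what a SUMIF or a pivot table does) and writes the result back out as CSV.

```python
import csv
import io
from collections import defaultdict

# Hypothetical plain-text data, as you might type or clean it up in a text editor.
raw = """organization,year,amount
Acme Corp,2010,1200
Acme Corp,2011,800
Globex,2010,500
Globex,2011,1700
"""

# Read the rows, much like a spreadsheet reads its grid (first line = headers).
rows = list(csv.DictReader(io.StringIO(raw)))

# Equivalent of a SUMIF / pivot-table step: total amount per organization.
totals = defaultdict(int)
for row in rows:
    totals[row["organization"]] += int(row["amount"])

# Write the summary back out as CSV so it can be opened in a spreadsheet.
out = io.StringIO()
writer = csv.writer(out, lineterminator="\n")
writer.writerow(["organization", "total"])
for org, total in sorted(totals.items()):
    writer.writerow([org, total])

print(out.getvalue())
```

In practice you would read from and write to files instead of strings; the io.StringIO objects just keep the example self-contained.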
Specialized tools - Network Visualization
A lot of what I have seen on this forum and in many of the articles published on SOTT.net deals with people, organizations and the relations between them. The following tools can represent such "social networks" as pictures, making them easier to understand.
- Gephi - Visualization of graphs (networks). I think this one has been introduced already. Website: http://gephi.org
- yEd - This is another tool to edit and view graphs. It is simple and versatile and at the same time powerful. Its features partially overlap with Gephi's. yEd is distributed as a Java .jar file, which means that it can run on any system that has Java, and that you will probably need to install Java if you have Windows (download from www.java.com). The website for yEd is http://www.yworks.com/en/products_yed_about.html
- Cytoscape - a tool similar to Gephi but IMO more difficult to handle. It comes from the bioinformatics field but has many other applications. Not recommended for beginners. Website: http://cytoscape.org
- Article on Social Networks at Wikipedia --> http://en.wikipedia.org/wiki/Social_network
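To get your own data into these tools, it usually helps to write the relations out as an edge list: one line per connection between two people or organizations. The sketch below uses made-up relations and writes them two ways: as a CSV table with Source/Target/Weight headers (the column names I understand Gephi's spreadsheet import to look for; check the import dialog of your version) and as GraphML, an XML graph format that yEd and Gephi can open. It also counts each node's degree (number of connections), a simple first hint at who is "central". The function names are my own, not part of any of these tools.

```python
import csv
import io
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical relations: (from, to, strength of the connection).
edges = [
    ("Alice", "Acme Corp", 3),
    ("Bob", "Acme Corp", 1),
    ("Bob", "Globex", 2),
    ("Carol", "Globex", 1),
]

def to_edge_csv(edges):
    """Edge table with Source/Target/Weight headers, one row per connection."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["Source", "Target", "Weight"])
    writer.writerows(edges)
    return out.getvalue()

def to_graphml(edges):
    """Minimal undirected GraphML document with a node label and an edge weight."""
    root = ET.Element("graphml", xmlns="http://graphml.graphdrawing.org/xmlns")
    ET.SubElement(root, "key", {"id": "label", "for": "node",
                                "attr.name": "label", "attr.type": "string"})
    ET.SubElement(root, "key", {"id": "weight", "for": "edge",
                                "attr.name": "weight", "attr.type": "double"})
    graph = ET.SubElement(root, "graph", id="G", edgedefault="undirected")
    # GraphML ids are meant to be simple tokens, so give each name a short id
    # (n0, n1, ...) and keep the real name in the "label" attribute.
    ids = {}
    for source, target, _ in edges:
        for name in (source, target):
            if name not in ids:
                ids[name] = f"n{len(ids)}"
                node = ET.SubElement(graph, "node", id=ids[name])
                ET.SubElement(node, "data", key="label").text = name
    for source, target, weight in edges:
        edge = ET.SubElement(graph, "edge", source=ids[source], target=ids[target])
        ET.SubElement(edge, "data", key="weight").text = str(weight)
    return ET.tostring(root, encoding="unicode")

# Degree = number of connections per node.
degree = Counter()
for source, target, _ in edges:
    degree[source] += 1
    degree[target] += 1

print(to_edge_csv(edges))
print(degree.most_common())
```

To use the output, save the two strings to files (for example network.csv and network.graphml) and open them in the tool of your choice. How each tool displays the "label" and "weight" attributes varies, so you may have to map them to labels or line thickness yourself.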
Advanced universal tools
These tools can be used for a vast array of tasks related to data mining, data analysis, conversion, storage etc. Like spreadsheets, they have a very low barrier to entry, but they also have features that allow them to be used in very complex scenarios. If you understand spreadsheets and Lego, you will probably figure out how to use these tools. One example use of such tools would be to download an Excel file from the internet, apply the needed transformations and load it into a database. These tools let you do operations on data, like calculations, replacing strings, selecting values, etc., but instead of showing you the data as a grid, they let you drop "steps" (icons representing some sort of operation) on a canvas and connect them.
- KNIME - The Konstanz Information Miner. Like a spreadsheet, it offers many (several hundred) operators on data, it can read various types of data, and you can even build web spiders for various purposes with it. It should be intuitive enough for people to put together simple analysis jobs on the fly. Be warned, though, that it has many features and components which are not intuitive.
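The "connected steps" idea described above can be imitated in a few lines of plain Python: each step is a small function that takes rows and returns rows, and the workflow is just the steps applied in order. This is only an analogy for the canvas approach, not how KNIME or RapidMiner work internally, and all step names are hypothetical. To keep it self-contained, CSV text stands in for the downloaded Excel file, and an in-memory SQLite database stands in for the target database.

```python
import csv
import io
import sqlite3

# Each "step" takes a list of rows (dicts) and returns a new list of rows.

def read_csv(text):
    """Step 1: read CSV text into rows."""
    return list(csv.DictReader(io.StringIO(text)))

def replace_strings(rows, column, old, new):
    """Step 2: replace a substring in one column."""
    return [{**row, column: row[column].replace(old, new)} for row in rows]

def select_rows(rows, predicate):
    """Step 3: keep only rows that satisfy a condition."""
    return [row for row in rows if predicate(row)]

def add_calculated(rows, column, formula):
    """Step 4: add a calculated column, like a spreadsheet formula."""
    return [{**row, column: formula(row)} for row in rows]

def load_into_db(rows, conn, table):
    """Step 5: create a table (columns without declared types) and insert the rows."""
    columns = list(rows[0])
    col_sql = ", ".join(f'"{c}"' for c in columns)
    placeholders = ", ".join("?" for _ in columns)
    conn.execute(f'CREATE TABLE "{table}" ({col_sql})')
    conn.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})',
                     [tuple(row[c] for c in columns) for row in rows])
    conn.commit()

# Hypothetical input data with an inconsistent country spelling.
raw = """name,country,amount
Acme Corp.,USA,1200
Globex Inc.,U.S.A.,300
Initech,UK,50
"""

# "Connect" the steps by feeding each one's output into the next.
rows = read_csv(raw)
rows = replace_strings(rows, "country", "U.S.A.", "USA")
rows = select_rows(rows, lambda r: int(r["amount"]) >= 100)
rows = add_calculated(rows, "amount_k", lambda r: int(r["amount"]) / 1000)

conn = sqlite3.connect(":memory:")
load_into_db(rows, conn, "payments")
print(conn.execute("SELECT name, country, amount_k FROM payments").fetchall())
```

The appeal of the graphical tools is that you build the same chain by dragging icons and drawing arrows, and you can inspect the data between any two steps.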
The website for Knime is http://knime.org
- RapidMiner - This tool is very similar to KNIME, but IMO less intuitive (though it makes prettier graphics). If you think in "variables/samples" instead of "columns/rows", you might find it interesting. The website for this product is http://rapid-i.com/
I will leave this here for now, in the hope that the info is helpful.
(Edited and expanded lots)