Data Mining and Data Visualization Tools
name:
A few days ago I posted information about a tool (Gephi) that can represent data as graphs, which are easier to understand at a glance than plain text. I was worried it would overwhelm people, because I have no idea what level of computer proficiency people here have or what their professional backgrounds are.
(Edited Stub)
Below is a short list of tools that I have used more or less extensively to analyze data and/or to represent it graphically. The main criterion for posting them here is ease of use (and that I have used them to some extent). My intention is to introduce some tools that will hopefully support what SOTT does (and not become a distraction).
A good starting point on the internet for people interested in data mining and related topics is the KDNuggets website --> http://www.kdnuggets.com
Warning: the site can be a bit overwhelming because of the amount of information it provides.
Basic prerequisites for using these tools are:
- Know how to use a text editor
- Know how to use a spreadsheet program, like Excel or OpenOffice/LibreOffice
- Know your data, how it is stored, what kind of information you would like to get from it <<--- Most important
- Additionally, an understanding of basic mathematical and statistical concepts like sums, averages and standard deviations, set operations (union, intersection) and logic operations (and, or, not) is very helpful for anything related to the analysis of data.
- Beyond that, even a basic knowledge of regular expressions is helpful.
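To make these concepts a bit more concrete, here is a small Python sketch (a spreadsheet could do most of the same) that computes an average and a standard deviation, combines two lists of names with set operations, and uses a regular expression to pull dates out of free text. All names, numbers and text are made up for illustration.

```python
import re
import statistics

# Made-up sample data, for illustration only
donations = [120.0, 80.0, 100.0, 60.0, 140.0]

mean = statistics.mean(donations)     # average
stdev = statistics.pstdev(donations)  # population standard deviation
                                      # (use statistics.stdev for a sample)

# Set operations on two hypothetical lists of board members
board_a = {"Smith", "Jones", "Brown"}
board_b = {"Jones", "Brown", "Taylor"}
on_both = board_a & board_b    # intersection: on board A AND on board B
on_either = board_a | board_b  # union: on board A OR on board B
only_a = board_a - board_b     # difference: on board A but NOT on board B

# Regular expression: find ISO-style dates (YYYY-MM-DD) in free text
text = "Met on 2011-03-14, again on 2011-04-02 and later in May."
dates = re.findall(r"\b\d{4}-\d{2}-\d{2}\b", text)

print(mean, stdev)
print(sorted(on_both), sorted(on_either), sorted(only_a))
print(dates)
```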
Basic Tools for data analysis:
- Text editors - included with almost all systems. Examples are Notepad (Windows) and vi (Linux). Use them to edit and organize your data. There are many commercial and free plain-text editors available.
- Spreadsheets - These work with grids of data or formulas. The best-known examples are Excel (commercial) and OpenOffice/LibreOffice (open source). Spreadsheets are universal tools used to store, organize, transform and evaluate data, and they can be used to do complex calculations and to create graphs of numeric data.
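Spreadsheet data can also be saved as CSV (comma-separated values) text, which most other tools can read. As a sketch, assuming a made-up CSV export with organization, year and amount columns, this Python snippet does roughly what a SUMIF formula or a pivot table would do in a spreadsheet: it totals the amounts per organization.

```python
import csv
import io
from collections import defaultdict

# Hypothetical data, as it might look after saving a spreadsheet as CSV
csv_text = """organization,year,amount
Foundation X,2010,500
Foundation Y,2010,250
Foundation X,2011,700
Foundation Y,2011,300
"""

# DictReader uses the first line as the column headers
rows = list(csv.DictReader(io.StringIO(csv_text)))

# Rough equivalent of a spreadsheet SUMIF: total amount per organization
totals = defaultdict(int)
for row in rows:
    totals[row["organization"]] += int(row["amount"])

for org, total in sorted(totals.items()):
    print(f"{org}: {total}")
```

To read a real file instead, open it with open("yourfile.csv", newline="") and pass the file object to csv.DictReader.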
Specialized tools - Network Visualization
A lot of what I have seen on this forum and in many of the articles published on SOTT.net deals with people, organizations and the relations between them. The following tools can represent such "social networks" as pictures, making them easier to understand.
- Gephi - Visualization of graphs (networks) - I think that this one has been introduced already :-) Website: http://gephi.org
- yEd - This is another tool to edit and view graphs. It is simple and versatile, and at the same time powerful. Its features partially overlap with Gephi's. yEd is distributed as a Java .jar file, which means that it can run on any system that has Java, and that you will probably need to install Java if you have Windows (download from _www.java.com). The website for yEd is http://www.yworks.com/en/products_yed_about.html
- Cytoscape - a tool similar to Gephi but IMO more difficult to handle. It hails from the bioinformatics field but has many other applications. Not recommended for beginners. Website: http://cytoscape.org
- Article on Social Networks at Wikipedia --> http://en.wikipedia.org/wiki/Social_network
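Under the hood, such a network is just a list of pairs (who is connected to whom). The Python sketch below, using made-up names, stores relations as an edge list, counts each node's connections (its "degree"), and writes a minimal GraphML file. GraphML is a standard graph file format that both Gephi and yEd can open; the file name network.graphml is arbitrary, and yEd may need its properties mapper before it shows the "label" attribute as node text.

```python
from collections import Counter
import xml.etree.ElementTree as ET

NS = "http://graphml.graphdrawing.org/xmlns"  # GraphML namespace

# Hypothetical relations: each pair means "A is connected to B"
relations = [
    ("Alice", "Acme Corp"),
    ("Bob", "Acme Corp"),
    ("Alice", "Bob"),
    ("Carol", "Acme Corp"),
    ("Carol", "Think Tank Z"),
]

# Degree = number of connections of each person or organization
degree = Counter()
for source, target in relations:
    degree[source] += 1
    degree[target] += 1

def to_graphml(relations):
    """Build a minimal undirected GraphML document from (source, target) name pairs."""
    ET.register_namespace("", NS)
    root = ET.Element(f"{{{NS}}}graphml")
    # Declare a string attribute called "label" for nodes
    ET.SubElement(root, f"{{{NS}}}key", {
        "id": "label", "for": "node",
        "attr.name": "label", "attr.type": "string",
    })
    graph = ET.SubElement(root, f"{{{NS}}}graph",
                          {"id": "G", "edgedefault": "undirected"})
    ids = {}  # name -> node id, assigned in order of first appearance
    for pair in relations:
        for name in pair:
            if name not in ids:
                ids[name] = f"n{len(ids)}"
                node = ET.SubElement(graph, f"{{{NS}}}node", {"id": ids[name]})
                data = ET.SubElement(node, f"{{{NS}}}data", {"key": "label"})
                data.text = name
    for source, target in relations:
        ET.SubElement(graph, f"{{{NS}}}edge",
                      {"source": ids[source], "target": ids[target]})
    return ET.ElementTree(root)

tree = to_graphml(relations)
tree.write("network.graphml", encoding="utf-8", xml_declaration=True)
print(degree.most_common(3))
```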
Advanced universal tools
These tools can be used for a vast array of tasks related to data mining, data analysis, conversion, storage, etc. Like spreadsheets, they have a very low barrier to entry, but they also have features that allow them to be used in very complex scenarios. If you understand spreadsheets and Lego, you will likely know how to use these tools. One example use would be to download an Excel file from the internet, apply the needed transformations and load it into a database. These tools let you perform operations on data, like calculations, replacing strings, selecting values, etc., but instead of showing you the data as a grid, they let you drop "steps" (icons representing some sort of calculation) on a canvas and connect them.
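Knime and RapidMiner are graphical, but the idea of connecting "steps" can be sketched in a few lines of Python: each step takes rows in and hands rows out, and the pipeline runs them in order, like the arrows between icons on the canvas. The step functions and sample data are hypothetical; this is only an analogy, not how either tool works internally.

```python
import re

# Each "step" takes a list of rows (dicts) and returns a new list of rows,
# similar in spirit to one icon dropped on a Knime canvas.

def replace_strings(rows, column, pattern, replacement):
    """Clean a text column with a regular expression."""
    return [{**r, column: re.sub(pattern, replacement, r[column])} for r in rows]

def add_column(rows, name, func):
    """Compute a new column from existing ones."""
    return [{**r, name: func(r)} for r in rows]

def select_rows(rows, predicate):
    """Keep only the rows that match a condition."""
    return [r for r in rows if predicate(r)]

def run_pipeline(rows, steps):
    """Connect the steps: the output of one is the input of the next."""
    for step in steps:
        rows = step(rows)
    return rows

# Made-up input data, e.g. as read from a downloaded spreadsheet
rows = [
    {"name": "Acme  Corp", "price": "10", "qty": "3"},
    {"name": "Foo   Ltd", "price": "5", "qty": "0"},
    {"name": "Bar Inc", "price": "2", "qty": "7"},
]

result = run_pipeline(rows, [
    lambda rs: replace_strings(rs, "name", r"\s+", " "),  # collapse extra spaces
    lambda rs: add_column(rs, "total", lambda r: int(r["price"]) * int(r["qty"])),
    lambda rs: select_rows(rs, lambda r: r["total"] > 0),  # drop empty orders
])

for r in result:
    print(r)
```

Changing the order of the list passed to run_pipeline is the code equivalent of rewiring the icons.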
- Knime - The Konstanz Information Miner. Just like a spreadsheet, it offers many operators on data (several hundred of them), it can read various types of data, and you can even construct web spiders for various purposes with it. It should be intuitive enough for people to put together simple analysis jobs on the fly. Be warned, though, that it has many features and components which are not intuitive.
The website for Knime is http://knime.org
- RapidMiner - This tool is very similar to Knime, but IMO less intuitive (though it makes prettier graphics). If you think in "variables/samples" instead of "columns/rows", you might find it interesting. The website for this product is http://rapid-i.com/
I will leave this here for now, in the hope that the info is helpful.
(Edited and expanded lots)
Approaching Infinity:
I think this is a great idea. While reading certain books like Controversy of Zion and Family of Secrets (on the Bush family), I thought it would be great to see all the personal connections mentioned in diagram form. Now, maybe we can do it? Also, the Dutroux perps...
dant:
@ name: Please note that Gephi 0.8.1 is Beta.
I have joined the forum to complain about some things that aren't working very well. This site is moderated, so I cannot tell whether my posting will be accepted. Ugh.
For me, these are the problems I found so far:
1) Preview is simply empty.
2) Snapshot hangs (spins its wheels).
3) Export (PDF/PNG/SVG File...) seems to be unstable.
a) If one exports to PNG, selects Options and sets "Transparent", it remains in effect until one restarts Gephi. Unchecking it has no effect.
b) In the saved image file, the node text is clipped at the borders, just like in your posted image in the Fucilla thread.
c) At some point, seemingly at random, I could no longer save the image with full details: the text completely disappeared for all objects (nodes, edges, etc.) and somehow, instead of straight connecting lines, all the connecting lines were curved!
So basically, I am having trouble saving an image file.
Those who wish to use Gephi for a project ought to join the Gephi forum to keep track of updates, report bugs and so on, osit.
(Forum @ _http://forum.gephi.org)
name:
@Dant
Yes, I forgot to mention the Beta status :-( probably because I have become so accustomed to the bugs that I work around them.
I know some other tools, but Gephi is IMO the simplest one to use; that's why I introduced it here.
Re your problems:
1) Click Refresh in the Preview page, then switch to Overview and back. Repeat if the first time it does not work. Zoom in and out.
2) Have never seen this one.
3.a) I have no problems exporting transparent images.
3.b) Oh how awkward of it to clip the picture:-) To work around this, place some (unconnected) nodes a bit outside of your graph before exporting.
3.c) It looks like it forgets stuff. Try the following to correct this situation:
Set your preferences in the pane on the left:
- click "Show Labels" under Node Labels.
- click "Show Labels" under Edge Labels.
- under "Edges", de-select "Curved".
- check the fonts and choose ones that are actually installed on your system, lest it fumble those too in the output.
Save your preset by using the small button just above the "Presets" dropdown.
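The 3.b workaround (placing a few unconnected nodes just outside the graph) amounts to enlarging the bounding box of the layout. Assuming you have the x/y coordinates of your nodes (for instance from an exported file), a hypothetical Python helper could compute where to put four such padding nodes. The margin should be large enough to cover your longest labels; the value 50 used here is arbitrary.

```python
def padding_nodes(positions, margin):
    """Return four extra (unconnected) node positions just outside the
    bounding box of the existing nodes.

    positions: dict of node name -> (x, y) coordinates
    margin: how far outside the bounding box to place the padding nodes
    """
    xs = [x for x, _ in positions.values()]
    ys = [y for _, y in positions.values()]
    min_x, max_x = min(xs) - margin, max(xs) + margin
    min_y, max_y = min(ys) - margin, max(ys) + margin
    # One node per corner; "top" assumes y increases upward
    return {
        "pad_top_left": (min_x, max_y),
        "pad_top_right": (max_x, max_y),
        "pad_bottom_left": (min_x, min_y),
        "pad_bottom_right": (max_x, min_y),
    }

# Hypothetical layout coordinates
layout = {"Alice": (0.0, 0.0), "Bob": (120.0, 40.0), "Acme Corp": (60.0, -80.0)}
pads = padding_nodes(layout, margin=50.0)
for name, (x, y) in pads.items():
    print(name, x, y)
```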
It looks like a good idea to bring these things to the Gephi forum.
Here is the Fucilla graph re-exported to avoid the clipping:
dant:
@name
What I mean in 3a is that once you set it to transparent, you cannot unset it. You have to restart Gephi.
Thanks for the instructional/tutorial details; these instructions should help others as well.
I can now create the image files and keep the text from being clipped!
I wonder if the other Gephi details posted in the Fucilla thread ought to be merged into this post so that it flows better?
Thanks again!