Data Mining and Data Visualization Tools

Congratulations on the good and steady work!! Looks like I will have to look here more often :-)


Regarding DB Design:
  • If I understood the discussion about IDs in the previous postings correctly (that IDs are "optional" and that data values are used as IDs), I have posted some considerations about this to pastebin --> http://pastebin.com/UPvyc9Pd (the weird bolding of keywords is theirs).
    Please disregard this if I misunderstood your postings (seek10 & megan) and there is no need for it.
  • I have been working for some time on implementing a subset of NIEM (www.niem.gov) in Postgres. It is related to your work here. If there is interest, I could post a schema dump sometime this month, once I feel it is presentable.
  • I could also help with data design/modelling if there is need - I know Postgres and Oracle.

Update: Corrected Pastebin entry
 
I made progress today -- the "Smoking" topic loads now, although there is some message loss, possibly because SMF sometimes uses an invalid URL query syntax in which embedded equal signs are not properly escaped. I guess I am going to have to incorporate an HTML "cleaner", but I didn't have time for that today. What I did do was write a kludge that changes "& " to "&amp; " in message bodies.
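For anyone curious what that kind of kludge looks like, here is a minimal sketch in Python (not the SSIS code itself). It assumes the goal is to turn bare ampersands into "&amp;" so that an XML/HTML parser stops choking on them. It is slightly broader than the "& " replacement described above: it escapes any ampersand that does not already start an entity. The function name `escape_bare_ampersands` is made up for illustration.

```python
import re

# An ampersand is "bare" when it is NOT the start of an entity such as
# &amp; (named), &#38; (decimal) or &#x26; (hexadecimal).
BARE_AMP = re.compile(
    r"&(?![A-Za-z][A-Za-z0-9]*;|#[0-9]+;|#[xX][0-9A-Fa-f]+;)"
)


def escape_bare_ampersands(body: str) -> str:
    """Replace every bare '&' in a message body with '&amp;'."""
    return BARE_AMP.sub("&amp;", body)


if __name__ == "__main__":
    print(escape_bare_ampersands("Tom & Jerry"))
    print(escape_bare_ampersands("index.php?topic=1&msg=2"))
    print(escape_bare_ampersands("already &amp; escaped"))
```

This only papers over one class of bad markup; a proper HTML cleaner would still be the better long-term fix.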

I can make the database and SSIS packages available to anyone here I know that can use them. I don't plan to post the files publicly, but I can provide access via a private forum that I have at http://far2go.net/forum. (It's currently unused). In the future we could have some purely technical discussion over there, although we should continue with this topic here for anything else. Details about installing SQL Server and loading databases, however, probably don't really belong here. (A private board here would work too, but I don't know if that would be worth the trouble. Either way is fine with me.)

If you would like to have access to the files, create an account at the above link and then PM me here on this forum with the username/email to confirm your identity (you can use any username/email there; it need not be the same as here). This offer is for those active in this topic only, and others that I know. I am being cautious because I am not sure how this database could be used by someone with different aims. If there is a problem with doing this, somebody please let me know.

In my day job I have been working to bring up SQL Server 2012 Master Data Services (MDS--included in Developer Edition) for evaluation. It's a fairly new product and the install/configure process is buggy (as you can see from my blog at http://far2go.net). Once it is running, though, it looks like a good way to build a data warehouse. You provide the model and it builds the database tables. It manages staging and versioning. It can assign surrogate keys as well. If the data already exists and you can represent it as an Excel spreadsheet, it can use that to build the model and upload the data. For my testing so far I have simply loaded data directly into Excel via a database connection and then sent it to MDS via the Excel add-in.
 
Was at a 'data visualizing marathon' this weekend, and one of the given data sets was a compiled health threat set from healthmap.org:
HealthMap is a team of researchers at Boston Children's Hospital that utilizes “online informal sources for disease outbreak monitoring and real-time surveillance of emerging public health threats.” They collect data from multiple sources that mention disease, including official reports, news aggregators, and eyewitnesses. They then compile and organize the data, with consistent labeling and metadata, and release it through their website and a live API. We have compiled alerts from 3 months in 2012 and aggregated this data into various slices (below).

Source URL: http://healthmap.org/

Time range: 91 days, from July 8th to November 8th, 2012

diseases versus alerts on time
table (csv)
[ disease name | day1 n° alerts | day 2 n° alerts | … | day 91 n° alerts ]
approx. dimensions 95x250

countries versus diseases
table (csv)
[ country name | disease 0 n° alerts | disease 1 n° alerts | … | disease 228 n° alerts ]
approx. dimensions 150x150

single disease: Meningitis (91 days, from 8th July to 8th November)
table (csv)
[ country name | day1 n° alerts | day 2 n° alerts | … | day 91 n° alerts ]
approx. dimensions 95x25

comprehensive table
table (csv)
This file uses UTF-8 text encoding
[ country name | place name | lat. | long. | disease | date | summary | description | rating | feed | link ]
approx. dimensions 11x12000

complete feed
file (json), > 10 MB
comprehensive information of alerts associated with disease and place, for the past 90 days
contains more than 12000 alerts
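
As a rough illustration of how the comprehensive table could be reduced to the "diseases versus alerts on time" shape, here is a minimal Python sketch using only the standard library. The column names (`disease`, `date`, and so on), the ISO date format, and the sample rows are all assumptions made up for the example; the real CSV header and date format may differ and would need checking first.

```python
import csv
import io
from collections import Counter

# Made-up sample rows in the assumed 11-column layout (NOT real HealthMap data).
SAMPLE = """country,place,lat,long,disease,date,summary,description,rating,feed,link
CountryA,PlaceA,12.0,8.5,Meningitis,2012-07-08,s,d,3,f,l
CountryA,PlaceA,12.0,8.5,Meningitis,2012-07-08,s,d,3,f,l
CountryB,PlaceB,12.1,15.0,Cholera,2012-07-09,s,d,2,f,l
"""


def alerts_per_disease_per_day(csv_file):
    """Count alerts for each (disease, date) pair."""
    counts = Counter()
    for row in csv.DictReader(csv_file):
        counts[(row["disease"], row["date"])] += 1
    return counts


def to_wide_table(counts):
    """Pivot the counts into one row per disease and one column per day."""
    days = sorted({day for _, day in counts})  # ISO dates sort correctly as text
    diseases = sorted({disease for disease, _ in counts})
    header = ["disease"] + days
    rows = [[d] + [counts.get((d, day), 0) for day in days] for d in diseases]
    return header, rows


counts = alerts_per_disease_per_day(io.StringIO(SAMPLE))
header, rows = to_wide_table(counts)
```

For the real file, one would pass `open(path, newline="", encoding="utf-8")` instead of the `io.StringIO` sample, since the file is stated to be UTF-8.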

The particular compilation sets:
http://www.visualizing.org/datasets/healthmap-disease-alerts-jul-nov-2012

The running official visualisation:
http://healthmap.org/en/

The reason I post it raw is that I'm still no good at extracting and processing data, so I can't make any meaningful graphs of the disease tendencies, and I don't think healthmap.org's visualisation is very helpful. Perhaps others would know how and would find it interesting to plot outbreak tendencies.
 
parallel said:
...
The reason I post it raw is that I'm still no good at extracting and processing data, so I can't make any meaningful graphs of the disease tendencies, and I don't think healthmap.org's visualisation is very helpful. Perhaps others would know how and would find it interesting to plot outbreak tendencies.

Well, it looks a lot easier to extract and transform than forum data! What do you need done to it?
 
Megan said:
Well, it looks a lot easier to extract and transform than forum data! What do you need done to it?

I was merely thinking that it would be nice to see outbreaks (and other developments) mapped in a clearer way, showing the growth of colour-coded entities in areas over time (on Google Earth, for example). Basically having easier-to-read maps, but I personally don't have the skills to script such a thing yet, even if I had cleanly extracted data. I've been pulling my education towards mapping but still need to bridge quite a few math/coding/organisation gaps. So I can't say I need anything done with it, other than putting the info out there in case someone else has the skills and interest to work with it.
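
For anyone who wants to pick this up, a possible starting point is to write the alerts out as KML, which Google Earth can open and animate with its time slider. The sketch below is only a rough outline under assumptions: the dictionary keys (`disease`, `date`, `lat`, `long`) are hypothetical names mirroring the comprehensive table's columns, the date is assumed to be in ISO format, and colour coding per disease is left out entirely.

```python
import xml.etree.ElementTree as ET

KML_NS = "http://www.opengis.net/kml/2.2"


def _tag(name):
    """Build a namespace-qualified tag name for ElementTree."""
    return "{%s}%s" % (KML_NS, name)


def alerts_to_kml(alerts):
    """Return a KML document (as a string) with one timestamped Placemark per alert."""
    ET.register_namespace("", KML_NS)  # serialise KML as the default namespace
    kml = ET.Element(_tag("kml"))
    doc = ET.SubElement(kml, _tag("Document"))
    for alert in alerts:
        placemark = ET.SubElement(doc, _tag("Placemark"))
        ET.SubElement(placemark, _tag("name")).text = alert["disease"]
        stamp = ET.SubElement(placemark, _tag("TimeStamp"))
        ET.SubElement(stamp, _tag("when")).text = alert["date"]
        point = ET.SubElement(placemark, _tag("Point"))
        # KML expects longitude first, then latitude.
        ET.SubElement(point, _tag("coordinates")).text = "%s,%s" % (
            alert["long"],
            alert["lat"],
        )
    return ET.tostring(kml, encoding="unicode")


# Made-up example alert, not taken from the real data set.
example = [{"disease": "Meningitis", "date": "2012-07-08", "lat": "12.0", "long": "8.5"}]
kml_text = alerts_to_kml(example)
```

Saving `kml_text` to a `.kml` file and opening it in Google Earth should show the placemarks; styling them by disease or by alert count would be the next step.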

A presentation showing some visualising trends (first 5 min).
_http://www.ted.com/talks/aaron_koblin.html
 