foofighter said:
Right, and like I said, location might be broken up into several fields. That way you can find out events happening in the same place or same region at a particular time.
Agreed. That makes the filtering much easier than doing a spatial query on lat/long.
foofighter said:
If all you want to do is get data and display it in a timeline, you're absolutely right. That's not rocket science, and a relational database might work ok to do it. The moment you want to do anything else with the same data you might be in trouble though.
Not even remotely true. I'm not speaking theoretically, here. I have a *lot* of experience implementing relational databases in very high traffic sites with various types of data structures. Let's not get into that here, though. This sort of discussion will only serve to suck energy out of the task at hand.
foofighter said:
But it all really depends on how you want to use the data, and how complex the data is going to be. For example, right now it seems you only handle events. Do you want to handle people too?
As I already wrote:
Burma Jones said:
What we have in the db schema that isn't implemented on the timeline yet but will be very soon:
tags (for filtering)
people involved (this will give an option of either getting more info on the person or filtering the timeline to show events involving this person)
foofighter said:
Like, "Dude X was born on 1234, son of Y, did Z in his life, knew A,B and C" and so on. That's a completely different way to look at similar data (it has the "born on" related to the events, but everything else is different), and which will require its own table if you use a relational database. But then it will be harder to do queries that relate events and people.
No, it isn't harder.
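To make that concrete, here is a rough sketch of the usual relational approach: a link table between events and people, and one join to relate them. Everything here is hypothetical and for illustration only. The table names, column names, and sample rows are made up, this is not the actual schema, and it uses Python's built-in sqlite3 rather than mysql so it can be run without a server.

```python
import sqlite3

# Hypothetical tables for illustration; NOT the real schema.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE events (id INTEGER PRIMARY KEY, name TEXT, start_datetime TEXT);
    CREATE TABLE people (id INTEGER PRIMARY KEY, last_name TEXT);
    -- Link table: one row per (event, person) pair.
    CREATE TABLE event_people (
        event_id  INTEGER REFERENCES events(id),
        person_id INTEGER REFERENCES people(id)
    );
""")

# Made-up sample rows.
conn.executemany("INSERT INTO events VALUES (?, ?, ?)", [
    (1, "Event A", "1900-01-01"),
    (2, "Event B", "1950-06-15"),
    (3, "Event C", "1975-03-20"),
])
conn.executemany("INSERT INTO people VALUES (?, ?)", [(1, "X"), (2, "Y")])
conn.executemany("INSERT INTO event_people VALUES (?, ?)", [(1, 1), (2, 1), (3, 2)])


def events_for_person(conn, last_name):
    """Return the names of events involving a person, oldest first."""
    rows = conn.execute(
        """
        SELECT e.name
        FROM events e
        JOIN event_people ep ON ep.event_id = e.id
        JOIN people p ON p.id = ep.person_id
        WHERE p.last_name = ?
        ORDER BY e.start_datetime
        """,
        (last_name,),
    ).fetchall()
    return [row[0] for row in rows]
```

Calling events_for_person(conn, "X") gives the events linked to that person, which is the same query the "filter the timeline by person" option would need.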
foofighter said:
But again, it all depends on if you want to do something like that in the first place. What you have now is simple, and perfectly fine. If you want to expand, it might be simplistic and not so practical. That's all.
No, it isn't difficult to expand on what I have. And remember, you haven't seen the entire database schema.
foofighter said:
I wasn't really thinking of the timeline, but more the Longwell widget:
http://simile.mit.edu/wiki/Longwell
and of course the Exhibit project:
http://www.simile-widgets.org/exhibit/
But we aren't implementing those widgets, we are implementing the timeline. Though as I said, if we do implement them or something else, delivering the data in whatever form they prefer from mysql is a trivial matter.
foofighter said:
So if you have a spreadsheet with 1000 rows, and about 15 columns, you want people to put that into a web form? If one row takes about 1 minute to enter (4 secs per cell), that'll be roughly 17 hours for the data entry. Is that what you are suggesting? How is that making it easy for those collecting the data? Also, what date is this: "09/05/06"? There's no way you can know, unless the format is given.
Yes, I was engaging in a bit of hyperbole there. It would be faster for folks to adjust the dates in a spreadsheet than to enter each one into a web form, but adjusting the dates in a spreadsheet is still extra work and is prone to error. The best way to handle the situation above is simply to attach a note saying something like "Dates are in the form yy/mm/dd". With that note, parsing them is easy. If folks are data mining from the web, a given source will have a specific way in which it formats dates, so keeping each of those data sources in its own file would be very handy. Also, if the data source is from the web, we shouldn't ever have the problem of arbitrarily formatted dates.
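As a rough sketch of why the note is enough, here is how the "09/05/06" example could be parsed in Python once the format is stated. The NOTE_TO_PATTERN table and the parse_date function are hypothetical names made up for this example, not part of any existing code.

```python
from datetime import datetime

# Hypothetical mapping from the note attached to a file to a strptime pattern.
NOTE_TO_PATTERN = {
    "yy/mm/dd": "%y/%m/%d",
    "dd/mm/yy": "%d/%m/%y",
    "mm/dd/yy": "%m/%d/%y",
}


def parse_date(text, note):
    """Parse a date string using the format declared in the file's note."""
    pattern = NOTE_TO_PATTERN[note]
    # Caveat: %y guesses the century (00-68 become 2000-2068, 69-99 become
    # 1969-1999), so historical data really needs four-digit years.
    return datetime.strptime(text.strip(), pattern).date()


# The same string means three different days depending on the note.
for note in NOTE_TO_PATTERN:
    print(note, parse_date("09/05/06", note).isoformat())
```

Without the note there is no way to pick between the three results, which is the whole point of asking for it per file.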
The long and short of it is this: in order to move forward we need data. Spreadsheet form would be just lovely. XML, RDF or any other structured format would be just dandy too. Heck, mix and match if you like. Data translation from one structured format to another is an easy thing to do. Let's just get the data in one spot to play with. All of the talk doesn't really get to the heart of the matter: DO something. You can't steer a ship that isn't moving.
Here is a list of the columns needed in your spreadsheet, if that is how you collect data (absolutely necessary info is in green):
Name | Description | start_datetime | end_datetime | Image | Type | Country | Region/County | City | Postal Code | Lat | Long |
That should cover the basics of events. If I've left anything out, let me know, or just add it to your own spreadsheet and it will be accounted for in the data model.
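For what it's worth, here is a minimal sketch of how a spreadsheet with those event columns could be read, assuming it has been exported as CSV with the column names above in the first row. The read_events function is a hypothetical name for this example. It only checks that the listed columns exist and keeps any extra columns you add.

```python
import csv
import io

# Column names taken from the events list above.
EVENT_COLUMNS = [
    "Name", "Description", "start_datetime", "end_datetime", "Image", "Type",
    "Country", "Region/County", "City", "Postal Code", "Lat", "Long",
]


def read_events(csv_text):
    """Read event rows from CSV text, keeping any extra columns as-is."""
    reader = csv.DictReader(io.StringIO(csv_text))
    header = reader.fieldnames or []
    missing = [col for col in EVENT_COLUMNS if col not in header]
    if missing:
        raise ValueError("missing columns: " + ", ".join(missing))
    # Each row becomes a dict keyed by column name; empty cells are "".
    return list(reader)
```

Extra columns simply show up as extra keys in each row, so they can be folded into the data model later.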
People should be covered with:
First name | Middle Name | Last Name | Birth place | Birth date | Death Date | Description | event_ids |
I realize the first/middle/last name format doesn't really cover the majority of names in the world. One obvious solution is to put the family name into last name and the given name into first name, with everything else going into middle name. There might be a better solution to this, though. Anyone have one?
As for linking events to people, there will need to be some sort of id on the events data. You can simply use the row number from your spreadsheet. Since people would often be linked to more than one event, you can just put the spreadsheet row numbers into the event_ids column separated by commas (1,57,32058). Notice that I didn't use a comma in the number 32058. Just pointing that out since writing large numbers with commas can be a habit.
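Here is a small sketch of how the event_ids cell could be read, assuming plain row numbers separated by commas. The parse_event_ids function is a hypothetical name, and the leading-zero check is just a heuristic I'm assuming is useful: a piece like "058" is usually what is left over when someone types 32,058 with a thousands comma.

```python
def parse_event_ids(cell):
    """Turn a cell like "1,57,32058" into a list of row numbers."""
    ids = []
    for part in cell.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate a trailing comma or an empty cell
        if len(part) > 1 and part.startswith("0"):
            # Heuristic: "32,058" splits into "32" and "058".
            raise ValueError(
                "suspicious id %r: was a large number written with a comma?" % part
            )
        ids.append(int(part))
    return ids


print(parse_event_ids("1,57,32058"))
```

The check can't catch every case (32,158 would still pass as two ids), so it's still best to leave the commas out of large numbers.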
This should get us going. Please, feel free to add, suggest, etc. Let's just get this show on the road. :o)