Introduction
I thought it would be nice for the forum to have a more structured, dynamic and efficient way of searching through the C's session transcripts, so I sat down one weekend afternoon and put down some ideas that I thought would be useful for an application serving that purpose. In a nutshell, the way I envisage it, sessions would be broken down into "Question and Answer" units, or QAs for short, which would be displayed as search results (similar to the lawofone site, except.. well, better). You would be able to search these by keywords, groups of keywords (metawords), date, etc., all of which would be tabled in a database to make it a lot smoother. Another feature I thought of would be to store start and end points for sequential QAs containing the same keywords, forming QA "blocks" (QABs), which would allow one to search for complete conversations within a session on a keyword or metaword. I should add that this is not a typical "text-based" kind of search program, since we already have that on the forum (i.e. using advanced search, checking the transcript board only, and putting "Laura" in the "By User" field). This is more for quick access to all mentions of a particular word, topic, or group thereof, while excluding as much of the irrelevant text as possible. Searches use predefined terms, so you'd be selecting words from a list with various options available, rather than typing what you want in a text field.
Before I go further, allow me to explicate some definitions:
- Session: A full session transcript, identified by its date.
- QA: A typical question and answer pair, considered as a whole, uniquely identified by a serial number across all QAs, from oldest to most recent.
- QAB: A sequence of consecutive QAs that contain the same search-word.
- Result: Anything a search can return, i.e. Sessions, QAs and QABs.
- Search-word: Any word or term that is used to return results.
- Keyword: A search-word that is in the actual text or is manually added to be associated with a QA (like a "tag", though that sounds boring, even if it is logically the same thing).
- Metaword: A search-word that combines multiple keywords. It is not really a "word" per se but an abstract grouping (I'm hesitant to use a word like "concept"). It is possible to make it so that a keyword can be associated with multiple metawords; care would have to be taken with this, but it might also yield interesting results.
- Metasearch: When a metasearch is executed, it returns all the results for every keyword that belongs to the same metaword as the keywords in the search. If we allow multiple metawords per keyword, it could also be recursive to a specified depth (i.e. search all the metawords of the keywords returned by the first metawords, etc.).
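To make the metasearch definition a bit more concrete, here is a minimal Java sketch of the expansion step, assuming keywords and metawords are held in two in-memory maps. The class name `MetasearchSketch`, the methods `link` and `expand`, and the sample words are all made up for illustration; a real version would read these relationships from the database.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

class MetasearchSketch {
    // keyword -> metawords it belongs to (assumption: many-to-many is allowed)
    private final Map<String, Set<String>> metawordsOf = new HashMap<>();
    // metaword -> keywords grouped under it
    private final Map<String, Set<String>> keywordsOf = new HashMap<>();

    /** Associates a keyword with a metaword. */
    void link(String keyword, String metaword) {
        metawordsOf.computeIfAbsent(keyword, k -> new TreeSet<>()).add(metaword);
        keywordsOf.computeIfAbsent(metaword, m -> new TreeSet<>()).add(keyword);
    }

    /** Expands a keyword through its metawords, for at most `depth` rounds. */
    Set<String> expand(String keyword, int depth) {
        Set<String> found = new TreeSet<>();
        found.add(keyword);
        Set<String> frontier = Set.of(keyword);
        for (int round = 0; round < depth && !frontier.isEmpty(); round++) {
            Set<String> next = new TreeSet<>();
            for (String k : frontier) {
                for (String m : metawordsOf.getOrDefault(k, Set.of())) {
                    for (String sibling : keywordsOf.getOrDefault(m, Set.of())) {
                        // Only keywords not seen before are expanded in the next round
                        if (found.add(sibling)) {
                            next.add(sibling);
                        }
                    }
                }
            }
            frontier = next;
        }
        return found;
    }

    public static void main(String[] args) {
        MetasearchSketch s = new MetasearchSketch();
        // Hypothetical sample data, not taken from any transcript
        s.link("comet", "cosmic events");
        s.link("meteor", "cosmic events");
        s.link("meteor", "weather");
        s.link("storm", "weather");
        System.out.println(s.expand("comet", 1)); // prints [comet, meteor]
        System.out.println(s.expand("comet", 2)); // prints [comet, meteor, storm]
    }
}
```

With a depth of 1 this is the plain metasearch; higher depths follow the recursive variant, and the `found` set keeps it from looping when keywords share several metawords.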
These are what I would consider the basic features. More powerful tools that build on these, such as graphical displays of data, can always be added later (I'm a big fan of both simplicity and extensibility ;) )
- Load one or more formatted transcripts into a database from text, automatically associating existing keywords with new QAs.
- Add keywords that associate with QAs where the word exists in the body of the QA text.
- Create relationships between keywords, allowing for convenient and organized expanded searches, aka metawords.
- Display a list of results based on one or more search-words.
- Options for results by Session, QA or QAB.
- Options to exclude keywords if they appear in the text and/or from meta-search.
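As a rough idea of how the first two features could work, here is a small Java sketch that checks which existing keywords occur in the body of a QA. It assumes whole-word, case-insensitive matching, which is my guess at sensible behaviour rather than a settled rule; `KeywordTagger`, `matchKeywords` and the sample text are hypothetical.

```java
import java.util.Collection;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;
import java.util.regex.Pattern;

class KeywordTagger {
    /** Returns the keywords that occur as whole words in the QA text, ignoring case. */
    static Set<String> matchKeywords(String qaText, Collection<String> keywords) {
        Set<String> hits = new TreeSet<>();
        for (String keyword : keywords) {
            // Pattern.quote keeps any special characters in the keyword literal
            Pattern p = Pattern.compile("\\b" + Pattern.quote(keyword) + "\\b",
                    Pattern.CASE_INSENSITIVE);
            if (p.matcher(qaText).find()) {
                hits.add(keyword);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // Made-up sample text, not from an actual session
        String qa = "Q: Is the Comet related to the weather? A: Yes.";
        System.out.println(matchKeywords(qa, List.of("comet", "weather", "met")));
        // prints [comet, weather]
    }
}
```

Note that "met" is not matched inside "Comet", which is the point of the word-boundary check. Each hit would then become a row linking the QA to the keyword.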
Data Model
This is what I have so far for the database tables. I've tried to keep repetition of data to a minimum and, as far as I can tell, this is all that is needed to implement the features listed above.
Code:
QA Table
------------------
| QA* | QA text* |
------------------
Session Table
-----------------------------------------
| Session (date)* | first QA* | last QA* |
-----------------------------------------
Search Table
-----------------------------
| QA | keyword*? | metaword |
-----------------------------
QAB Table
----------------------------------
| searchword | start QA | end QA |
----------------------------------
* = unique per row
*? = maybe unique per row
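One of the implicit algorithms in this model is filling the QAB table from the Search table. Below is a minimal Java sketch of that step, assuming we already have the QA serial numbers (from a single session) that contain a given search-word. `QabBuilder`, `Qab` and `buildBlocks` are names I made up, and treating a lone QA as a block of length one is just one possible choice.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.TreeSet;

class QabBuilder {
    /** One row of the QAB table: searchword, start QA, end QA. */
    static final class Qab {
        final String searchword;
        final int startQa;
        final int endQa;

        Qab(String searchword, int startQa, int endQa) {
            this.searchword = searchword;
            this.startQa = startQa;
            this.endQa = endQa;
        }

        @Override
        public String toString() {
            return searchword + " " + startQa + "-" + endQa;
        }
    }

    /** Groups runs of consecutive QA serial numbers into blocks. */
    static List<Qab> buildBlocks(String searchword, Collection<Integer> qaNumbers) {
        List<Qab> blocks = new ArrayList<>();
        Integer start = null;
        Integer prev = null;
        // TreeSet sorts the serial numbers and drops duplicates
        for (int qa : new TreeSet<>(qaNumbers)) {
            if (start == null) {
                start = qa;
            } else if (qa != prev + 1) {
                // Gap found: close the current block and open a new one
                blocks.add(new Qab(searchword, start, prev));
                start = qa;
            }
            prev = qa;
        }
        if (start != null) {
            blocks.add(new Qab(searchword, start, prev));
        }
        return blocks;
    }

    public static void main(String[] args) {
        // Hypothetical serial numbers of QAs containing "comet"
        System.out.println(buildBlocks("comet", List.of(3, 4, 5, 9, 12, 13)));
        // prints [comet 3-5, comet 9-9, comet 12-13]
    }
}
```

To keep blocks from spilling across sessions, the same loop could also close a block whenever a QA number passes the session's last QA from the Session table.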
This is still very much a proto-design, so suggestions and ideas are welcome, though I think it would be best to keep it to details surrounding the core features and data model for now (not to say that suggestions for more core features aren't welcome :) ). GUI designs, on the other hand, are open game. Also I realize the name could be a little more creative too..
For anyone who is interested in working on this, my personally preferred language is Java with the NetBeans IDE (though if anyone knows a better one for WYSIWYG GUIs and integration, let me know!), simply because it'll run on any platform and it's pretty easy to code with. As for the database, ideally it would be hosted on the Cass server, if this is ok with those in charge of that, so whatever DB software is being used there (I would guess (hope) MySQL) would have to do. I can do simple ER design, but I'm really not proficient at all when it comes to writing SQL queries (disclaimer). I haven't gone much beyond what I've written here besides some thought on a few essential algorithms, most of which are quite implicit, as some of you may see from the data model. If there is enough interest and support for this, I can get to work on the application's architecture and class structure, which is my favorite part and kinda my specialty.. osit
And lastly.. please feel free to ask me to clarify anything that isn't making sense.