About Sessions.exe

@L: You are welcome, but please note that sessions.exe is old
and has been replaced with: here.

The online version has more comments, corrections, clarifications,
context, updates, and additional dates.
Please also note the warning message:

We strongly discourage people from reading the Cassiopaean transcripts on their own, outside of the context provided by Laura's work, as in our experience people often misinterpret them and tend to project their own ideas, beliefs, and biases onto them. Therefore we advise the reader to read them in context, that is in the Wave and Adventures with Cassiopaea series.
 
Just to give a status update on my organizer: it successfully scrapes the session text for most sessions from the forum and writes it to JSON files. It's smart enough to separate out and parse the session date, the names of the participants, the session text, and any header/footer sections. Footnotes aren't recognized yet. I'm stuck on sessions like http://cassiopaea.org/forum/index.php?topic=16438.msg141363#msg141363, where the first post holds introductory text with multiple quotations and the session itself is in the second post. If anyone who's technically savvy has any ideas, you are more than welcome to contribute. Simple Machines Forum is not the most scraper-friendly design. Alternatively, if there are plaintext versions of the sessions, that might be a lot easier.
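
For anyone who wants to pitch in, here is a rough sketch of the kind of JSON record described above. It is not the organizer's actual code: the field names, the Session class, and the date formats are all assumptions made up for illustration, and the real thread titles may be laid out differently.

```python
import json
from dataclasses import dataclass, asdict, field
from datetime import date, datetime
from typing import List, Optional

# Hypothetical date layouts; the real session titles may use others.
DATE_FORMATS = ("%d %B %Y", "%B %d, %Y", "%B %d %Y")


def parse_session_date(raw: str) -> Optional[date]:
    """Try each assumed format in turn; return None if none matches."""
    cleaned = raw.strip()
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(cleaned, fmt).date()
        except ValueError:
            continue
    return None


def split_participants(line: str) -> List[str]:
    """Split a comma-separated attendee line into trimmed, non-empty names."""
    return [name.strip() for name in line.split(",") if name.strip()]


@dataclass
class Session:
    # Illustrative field names only, not the organizer's real schema.
    session_date: str                      # ISO 8601, e.g. "2011-06-11"
    participants: List[str] = field(default_factory=list)
    text: str = ""
    header: str = ""
    footer: str = ""


def to_json(session: Session) -> str:
    """Serialize one session record to a JSON string."""
    return json.dumps(asdict(session), indent=2, ensure_ascii=False)
```

Returning None for an unrecognized date, rather than raising, makes it easy to log the odd sessions and handle them one by one.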
 
In addition to scraping to JSON format, I'm also adding a feature to first scrape sessions to simply formatted HTML files, similar to the ones included with sessions.exe. This is mainly to reduce repeated requests to the forum webserver while I write and test the program. I'm finding that the slight variances in format from session to session require writing session-specific handling instructions. I figured out how to scrape http://cassiopaea.org/forum/index.php/topic,16438.msg141363.html#msg141363 and I'm continuing to make progress slowly but surely. Almost every session has something unique that doesn't fit the "generic" session scraper rules.
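
The idea of saving pages locally to spare the forum webserver can be sketched roughly like this. It is only an illustration, not the organizer's code: the function names, the "cache" directory, and the hashed file names are assumptions, and the page is assumed to decode as UTF-8.

```python
import hashlib
from pathlib import Path
from urllib.request import urlopen


def download(url: str) -> str:
    """Fetch a page over HTTP (UTF-8 is an assumption about the forum)."""
    with urlopen(url) as response:
        return response.read().decode("utf-8", errors="replace")


def cache_path(url: str, cache_dir: str) -> Path:
    """Map a URL to a stable file name by hashing it."""
    digest = hashlib.sha256(url.encode("utf-8")).hexdigest()
    return Path(cache_dir) / f"{digest}.html"


def fetch_cached(url: str, cache_dir: str = "cache", fetch=download) -> str:
    """Return the saved copy if one exists; otherwise fetch once and save it."""
    path = cache_path(url, cache_dir)
    if path.exists():
        return path.read_text(encoding="utf-8")
    html = fetch(url)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(html, encoding="utf-8")
    return html
```

Passing the fetch function in as a parameter means the parsing rules can be tested against saved files without touching the network at all.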
 
Hi endescent,

Not sure if it will help, but there might be an advantage in scraping the mobile version of a thread. I believe if you hack the URL of a thread, you will get the mobile version, which might be easier to work with. After the thread or message identifier, add a slash and then wap2.html. For example, the June 11, 2011 thread (http://cassiopaea.org/forum/index.php/topic,23860.0.html) would become http://cassiopaea.org/forum/index.php/topic,23860/wap2.html

The result is closer to plain text and might help with parsing.
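
In case it saves some typing, the URL change could be automated along these lines. This is a minimal sketch that only covers the thread-level case from the example above; the function name is made up, it drops any message anchor, and it does not handle the index.php?topic= style of link.

```python
import re

# Matches everything up to and including the numeric topic id,
# e.g. ".../index.php/topic,23860" in ".../index.php/topic,23860.0.html".
TOPIC_RE = re.compile(r"^(?P<base>.*?/index\.php/topic,\d+)")


def to_wap2_url(url: str) -> str:
    """Rewrite a thread URL to its wap2 (mobile) form."""
    match = TOPIC_RE.match(url)
    if match is None:
        raise ValueError(f"not a recognized topic URL: {url}")
    return match.group("base") + "/wap2.html"


print(to_wap2_url("http://cassiopaea.org/forum/index.php/topic,23860.0.html"))
# prints http://cassiopaea.org/forum/index.php/topic,23860/wap2.html
```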

Fwiw,
Gonzo
 
You, Sir Gonzo, are a genius. That's just what I need! Only sorry I didn't think of it first. Thank you Gonzo, and thank you Cesar. Things should move much faster now.
 
