Session transcripts as one big HTML file

Sorry for my rant. I was reacting to the general questioning when I replied to your question, which was perfectly valid. I do not think you personally expected "a free lunch". I see that the thread is networking fine now. We can possibly learn how to create a useful tool/utility that can be used offline in times of limited internet access.

Have you ever thought about changing your avatar, goycobol? I've often thought that, even if it may be how you feel sometimes, something a bit more inspiring might be a better idea, like something that represents where you want to be rather than where you (sometimes) feel you are. Just a suggestion.
 
I was going to post the same thing! I have scraped the Cassiopaean Experiment transcripts that are available online (using a basic scraper that I wrote in Python) and stored them as individual text files on my computer. I use a feature in Notepad++ (a free application) called "Find in Files" to search for text inside the transcripts. Results are sorted by transcript date (which is the name of each session file) and include the sentences in which the search text occurs. When I click on any result, it opens the session that contains the search text.
Attaching an image below for reference:

View attachment 37087


@KS's single-page HTML with all the transcripts is also really cool. Love the clean formatting. Thanks a lot for sharing!!
You definitely need to share your work. How did you manage to transform the HTML content into plain text? Readability, by hand, or maybe both?

It is not a lack of gratitude. It was an overreaction. My apologies to @Luks for reacting to one particular question. I chose a poor way to express a general observation. There is so much technology that we use without thinking about the design effort that is required to make it efficient. I felt bad for @KS sharing a "free" utility and being flooded with questions, but that is a part of refining an application.

I suppose everyone realizes the code may need some more testing and revisions but I think it may be useful as it is for some.
No need to feel bad for me, but thanks for the support! :) I was expecting feedback, because being a new member and posting some big HTML file out of nowhere might be viewed as potentially malicious activity. Also, nobody asked for such a thing in the first place.

So far, I've identified the following issues:
  • 31 Oct 2001 - missing session
  • 3 Sep 2008 - session scraped incorrectly (quoted)
  • 22 Oct 2008 - session scraped incorrectly (quoted)
  • 3 Jan 2009 - session scraped incorrectly (quoted)
  • 9 Jun 2009 - session scraped incorrectly (quoted)
  • 22 Feb 2010 - session scraped incorrectly (2nd post)
 
This technology is not as simple as many think, and @KS has done a great service for those who just expect to have everything on a silver platter. I would just say thanks to @KS for giving his/her best. If "time" permits, @KS may find a way to tweak the application. But consider that this is a process of refinement. Believe me, I know the hours and effort it takes to deliver a quality product. Those who expect a "free lunch" will always be disappointed. I think @Scottie would understand. If we work together we can do great things, but show a little appreciation when possible.
I agree this is amazing work, thank you KS!
 
Can you elaborate more or maybe post some screenshot while encountering that issue?
I am using an iPad, here are some screenshots of trying to open the updated file on Chrome & Mega (app).

791E8058-DAE2-48FE-A8E9-25CBB9F23C97.png 500233AB-4183-4B88-BD4F-7AA909CAF5DC.png
Here is a screenshot of the older file opened in Mega.
D4545148-4855-47B1-8522-1A4635AE551F.png
Maybe the updated file is too big for Mega, but I dunno why it won’t open in Chrome for me. 🤷🏼‍♀️
And once again, thank you very much for doing this @KS!
 
@KS .

I was just checking it out and there may be some things to iron out.
Comparing the forum version to the HTML version, I noticed some odd character insertions.

Session 11 August 1996:
A: No, it would be a “discover”.

In the HTML version it is:

A: No, it would be a “discoverâ€.
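
The stray characters above look like the usual result of UTF-8 text being decoded as Windows-1252. That is only a guess at the cause, since the thread does not say what went wrong in the HTML file. A minimal Python sketch of the effect and of the common repair follows; `garble` and `repair` are hypothetical helper names, not part of any shared code.

```python
def garble(text):
    """Simulate the bug: encode as UTF-8, then wrongly decode as Windows-1252."""
    return text.encode("utf-8").decode("cp1252")


def repair(garbled):
    """Undo it: recover the original bytes, then decode them as UTF-8."""
    return garbled.encode("cp1252").decode("utf-8")


if __name__ == "__main__":
    opening_quote = "\u201c"  # the curly opening quote character
    broken = garble(opening_quote)
    print(repr(broken))          # three characters instead of one
    print(repair(broken) == opening_quote)
```

The curly closing quote ends in byte 0x9D, which Windows-1252 leaves undefined, so it may be why only "â€" is visible in the HTML version. The more reliable fix is usually to read and write the files as UTF-8 from the start (and declare that charset in the HTML) rather than repairing text afterwards.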

Doing some test searches against my own PDF versions, there seem to be some sessions/items missing from the HTML version.

It may still be useful for some.

@KS,

Whatever changes you made have fixed the inserted character problem above. Thanks. I really like what you have created.
 
You definitely need to share your work. How did you manage to transform the HTML content into plain text? Readability, by hand, or maybe both?

I used Beautiful Soup, a Python library which did most of the work of scraping and converting the HTML/XML and storing the transcripts into individual .txt files. But there were a few transcripts for which I had to change the regular expressions used to parse the session date correctly (as there are multiple formats for the dates given in the transcript) or for which I had to manually copy-paste the transcript into a text file.
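
The attached scraper is not reproduced here. As a rough illustration of the HTML-to-text step only, here is a sketch that uses Python's standard-library `html.parser` as a stand-in for Beautiful Soup (in Beautiful Soup itself, `BeautifulSoup(html, "html.parser").get_text()` does a similar job). The class name, the function name and the tag lists are assumptions made for the example.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects visible text and skips <script>/<style> contents."""

    SKIP = {"script", "style"}
    BLOCK = {"p", "div", "br", "li", "blockquote"}  # tags treated as line breaks

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1
        elif tag in self.BLOCK:
            self.parts.append("\n")

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1
        elif tag in self.BLOCK:
            self.parts.append("\n")

    def handle_data(self, data):
        if not self._skip_depth:
            self.parts.append(data)


def html_to_text(html):
    """Return the visible text of an HTML fragment, one non-empty line per block."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    lines = (line.strip() for line in "".join(parser.parts).splitlines())
    return "\n".join(line for line in lines if line)
```

Calling `html_to_text(post_html)` on the HTML of one forum post would give text that can be written straight to a .txt file; real pages will likely need extra handling for quotes and navigation elements.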

I am attaching the source code (written in Python with the Beautiful Soup library) as well as all 365 session transcripts in .txt format that I have scraped so far. The naming convention I used for a session is yyyymmdd, which makes it easier to look up text within the files using Notepad++ and also keeps the files on my hard disk in chronological order.
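
To illustrate the yyyymmdd naming and the multiple date formats mentioned above, here is a minimal sketch. It uses `datetime.strptime` with a list of candidate formats instead of the regular expressions in the attached scraper, and both the format list and the `session_filename` name are assumptions for the example.

```python
from datetime import datetime

# Candidate date layouts; an assumed list, extend it as new layouts turn up.
DATE_FORMATS = ("%d %B %Y", "%d %b %Y", "%B %d, %Y", "%Y-%m-%d")


def session_filename(date_text):
    """Turn a session date such as '11 August 1996' into '19960811.txt'."""
    cleaned = " ".join(date_text.split())  # collapse stray whitespace
    for fmt in DATE_FORMATS:
        try:
            parsed = datetime.strptime(cleaned, fmt)
        except ValueError:
            continue  # try the next layout
        return parsed.strftime("%Y%m%d") + ".txt"
    raise ValueError(f"Unrecognised session date: {date_text!r}")
```

Because the names start with the year, then month, then day, sorting the files alphabetically also sorts them chronologically. A date that matches none of the layouts raises an error, so it can be handled by hand rather than saved under a wrong name.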

If anyone wants to search the text, download the attached zip file called "Cassiopaean Experiment transcripts" and extract it to any folder on your computer. The folder contains all the transcripts as separate text files. Download and install Notepad++. Open Notepad++, click the "Search" menu at the top and select "Find in Files". Paste the location of the extracted "Cassiopaean Experiment transcripts" folder into the "Directory :" field and type the text that you want to search for in the "Find what :" field. Make sure to keep the other search options the same as in the attached picture below. A screenshot of a sample search for the text "knowledge" and its results is below:

Using Notepad++ for search within transcripts.JPG



It gives the following search results:


Cass transcript search results.JPG



If you click on any of the search results, it will open the transcript and go to the location of that text in the file. Hope this helps!


Cass transcripts search.JPG
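
For anyone without Notepad++ (on macOS or Linux, for example), the same kind of search can be sketched in a few lines of Python. This is only an illustration, assuming the transcripts are UTF-8 .txt files named yyyymmdd in a single folder; `find_in_files` is a hypothetical helper name.

```python
from pathlib import Path


def find_in_files(folder, needle):
    """Case-insensitive search of every .txt file in a folder.

    Returns (file name, line number, line text) tuples, ordered by file name,
    which is chronological when the files are named yyyymmdd.txt.
    """
    needle_lower = needle.lower()
    hits = []
    for path in sorted(Path(folder).glob("*.txt")):
        with path.open(encoding="utf-8", errors="replace") as handle:
            for line_no, line in enumerate(handle, start=1):
                if needle_lower in line.lower():
                    hits.append((path.name, line_no, line.strip()))
    return hits


if __name__ == "__main__":
    # The folder name below is an assumption; point it at the extracted zip.
    for name, line_no, text in find_in_files("Cassiopaean Experiment transcripts", "knowledge"):
        print(f"{name}:{line_no}: {text}")
```

Unlike Notepad++, this does not open the file at the match, but the printed file name and line number make it easy to jump there in any editor.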

For those interested in the scraper, its source code is also attached as a separate zip file. Any faults in the scraper are my own, and I welcome feedback on it, since I'm not a full-time programmer and only have the basic programming knowledge that I used to build it. My only intention in building the scraper was to be able to search the transcripts more quickly and easily than with the online tool on the forum, which is great as well. My sincere thanks to Laura and the crew for making the transcripts freely available to everyone.
 

