Session transcripts

KJS

Dagobah Resident
FOTCM Member
Hello everyone, some time ago I started a micro-project in this thread to merge the Cassiopaea Session Transcripts into one big file that can be read offline. Many of you found it useful, so I decided to go one step further and also provide an EPUB version for offline reading. The HTML version also has the images embedded as "data URLs", so there is no need to download a ZIP archive: the images are "baked in".
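
For readers curious what "baked in" means technically, here is a minimal sketch in Go of turning image bytes into a data URL. It is only an illustration, not the code used to build the files; the function name `dataURL` and the sample bytes are assumptions made for the example.

```go
package main

import (
	"encoding/base64"
	"fmt"
	"net/http"
)

// dataURL turns raw image bytes into a "data:" URL that can be placed
// directly in an <img src="..."> attribute, so no separate file is needed.
// (Hypothetical helper, not taken from the project.)
func dataURL(content []byte) string {
	// DetectContentType sniffs the MIME type from the leading bytes.
	mime := http.DetectContentType(content)
	return "data:" + mime + ";base64," + base64.StdEncoding.EncodeToString(content)
}

func main() {
	// The 8-byte PNG signature stands in for a real image here.
	png := []byte("\x89PNG\r\n\x1a\n")
	fmt.Println(`<img src="` + dataURL(png) + `">`)
}
```

The trade-off is size: base64 makes each image roughly a third larger, but the HTML file becomes fully self-contained.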

The files are currently published to IPFS, a global, content-addressable, peer-to-peer file system. There is no need to install any software: public gateways translate the IPFS protocol to HTTPS, so we can use them to browse the sessions:
If you have the Brave browser, you can browse the files directly:
ipfs://QmUAZhyLQTKRvEV6mD2WvtKrRHGAwpvsbxqLbBRSS2dgiw
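
As a rough illustration of how a gateway address relates to the `ipfs://` one, here is a small Go sketch that rewrites an `ipfs://` address into a path-style HTTPS gateway URL. The helper name `gatewayURL` and the choice of `ipfs.io` as the gateway host are assumptions for the example; any public gateway that serves `/ipfs/<CID>` paths should work the same way.

```go
package main

import (
	"fmt"
	"strings"
)

// gatewayURL rewrites an ipfs:// address into an HTTPS URL served by a
// path-style gateway. (Hypothetical helper for illustration only.)
func gatewayURL(addr, gatewayHost string) (string, error) {
	const scheme = "ipfs://"
	if !strings.HasPrefix(addr, scheme) || len(addr) == len(scheme) {
		return "", fmt.Errorf("not an ipfs:// address: %q", addr)
	}
	cid := strings.TrimPrefix(addr, scheme)
	return "https://" + gatewayHost + "/ipfs/" + cid, nil
}

func main() {
	// The gateway host below is an assumption; substitute any public gateway.
	url, err := gatewayURL("ipfs://QmUAZhyLQTKRvEV6mD2WvtKrRHGAwpvsbxqLbBRSS2dgiw", "ipfs.io")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(url)
}
```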

The most recent session files are the ones with the "-recent" suffix; they will be updated after each new session is published.

The next steps I'd like to take are to create a MOBI version for Kindle, clean up the scraped HTML files, and maybe do some full-text indexing. If any of you have an interesting idea for what else could be done, please post it.

Have a nice day :)
 
I was just thinking about an EPUB creator based on your work, and it's already here! Is your code open, so that others can create the files themselves?
Sure, I was intending to do that. I think that building the assets via GitHub pipelines would also add a lot to the transparency. The problem is that all of the big services are "woke" (in the case of GitHub, they even changed the default branch from "master" to "main"), and this kind of work can easily be labelled as "hate speech". But sure, I'll try to do that over the next weekend.
 
Hello KS,

Thank you very much for the work you've put in, both the html and epub files with embedded images are truly useful to me!

I've noticed that the contents of all four files are the same.
The "sessions-20210819-154349.html" and "sessions-recent.html" files even have the same IPFS address. Is this intentional? This way the files with the "-recent" suffix have no point as far as I understand.
 
Hi Rico,

Yes, both are the same. My intention was that the files suffixed with "recent" should contain the latest sessions. I haven't generated the newest one yet; I tried to employ some CI/CD services for that, but I haven't had the mental energy recently. A lot of negative things happened in my private life, which resulted in my current mild depression and inability to cope with my commitments. Also, my credit and debit cards suddenly stopped working for online payments, which was quite odd and cut me off from hosting services. I will update the project, but I need some time for mental recovery. Sorry if I let someone down :-(

Regarding IPFS, this is the essence of its coolness: IPFS is content-addressable. This means that even if a file has a different name or different attributes in the file system, it always has the same IPFS address, because the content is what matters.
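
To make the idea concrete, here is a deliberately simplified Go sketch of content addressing: the address is derived from a hash of the bytes only, so the file name plays no role. This is not how IPFS actually computes its addresses (real CIDs involve chunking, a Merkle DAG, and multihash encoding); the `file` type and `address` function are assumptions made purely for illustration.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// file is a toy stand-in for a file on disk: a name plus its bytes.
type file struct {
	name    string
	content []byte
}

// address derives an identifier from the content alone. The name is
// ignored on purpose. (Simplified: real IPFS CIDs are not a bare SHA-256.)
func address(f file) string {
	sum := sha256.Sum256(f.content)
	return hex.EncodeToString(sum[:])
}

func main() {
	body := []byte("<html>all sessions</html>")
	dated := file{name: "sessions-20210819-154349.html", content: body}
	recent := file{name: "sessions-recent.html", content: body}

	// Same bytes under two names yield one and the same address.
	fmt.Println(address(dated) == address(recent))
}
```

This is why the dated file and the "-recent" file share an address until the content of one of them changes.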
 
There have been a couple of attempts to do something similar with the sessions, including one by me a long time ago. The trouble I ran into was the variations in format, which required session-specific rules to scrape the HTML. My idea involved full-text search with tags indicating the topics discussed in each session to make navigation easier. I applaud any efforts to these ends. I never thought of using GitHub with CI/CD to automate the process…
 
Sorry if I let someone down :-(
You have let no one down, KS. Please don't think that. What you have done with this file is excellent and I have found it very helpful.

Also, I am sorry you are having a difficult time. Remember, you are not alone as you have a family right here on the forum, and, if you want to share, you can do so in the Swamp. Or not. It's up to you. Either way, I wish you the best in getting things straightened out so that you are on an even keel again. :hug:
 
Hi All,

Well, I did something :) It'll probably satisfy some of the tech folks here. I've created some repositories on GitHub and used some of their spare minutes for GitHub Actions to automate some things. Mind you, the tools are rather bare-bones, but I managed to put them together in less than a day. So, the repositories are:

cassiopaea-tools
Some tools written in Go to scrape the forum's session transcript posts and generate assets such as EPUB or HTML files.

cassiopaea-actions
Basic GitHub Actions definitions that currently scrape the forum and push artifacts to two other repositories. Ultimately, the scraping actions will be triggered by cron twice a month (currently they are triggered by pushing to the master branch).

cassiopaea-assets
Automatically updated by GitHub Actions, this repository contains all the transcripts as one HTML file per session, or as artifacts:
liberty239.github.io
GitHub Pages repository, automatically updated by GitHub Actions. Hosts all the transcripts merged into one, accessible as a website hosted on GitHub:


If any of you want to collaborate, or want to create an organization and have ownership of the repositories transferred to it, I'm happy to do that. Meanwhile, I'll be updating them in my free time, adding some features here and there, without any roadmap.

Why liberty239? I needed another GitHub account so as not to be susceptible to account termination. Not being a creative person, I took the username from a limited-run silver coin, which is also in my avatar here :)
 
Thanks @Nienna. Looking at the problems touched on in the Swamp section, mine are really insignificant and self-centered. Thanks for the uplift :)
 
looking at the problems touched in the Swamp section, mine are really insignificant and self-centered.
Problems are relative. What is hard to handle for one person is different from what is hard to handle for another. Yes, there are some catastrophic things that are just plain horrible. However, if one finds something problematic, then it usually is. Still wishing you the best.
 
Well, the way I'm currently doing it is to process each session post with Readability (in its Go implementation), then run it through one of the HTML sanitizers, also replacing line breaks with paragraphs[1]. There were basically two problems that needed special treatment: one session name was misspelled (with a triple "s"), and another session was posted in the second post of its thread (22 February 2010, IIRC). If you want, you can play with the processed HTML files located here:

1. cassiopaea-tools/sanitize.go at 89f11e090bf604dda82ceba3037ca62848d0cf9e · liberty239/cassiopaea-tools
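
For anyone wondering what "replacing line breaks with paragraphs" can look like, here is a minimal Go sketch. It is not the code from sanitize.go; the function name `breaksToParagraphs` and the regular expression are assumptions for illustration, and a robust version would work on a parsed HTML tree rather than on raw text.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// brRun matches one or more consecutive <br>, <br/> or <br /> tags
// (case-insensitive), together with any whitespace around them.
var brRun = regexp.MustCompile(`(?i)(\s*<br\s*/?>\s*)+`)

// breaksToParagraphs splits a fragment on runs of <br> tags and wraps each
// non-empty piece in <p>...</p>. (Hypothetical helper, regex-based sketch.)
func breaksToParagraphs(html string) string {
	var b strings.Builder
	for _, part := range brRun.Split(html, -1) {
		part = strings.TrimSpace(part)
		if part == "" {
			continue // skip empty pieces at the start or end
		}
		b.WriteString("<p>" + part + "</p>\n")
	}
	return b.String()
}

func main() {
	fmt.Print(breaksToParagraphs("Q: Hello?<br><br />A: Yes.<br>"))
}
```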
 
Amazing work KS!

You have no idea how much I value having all the sessions available offline and online in an easily searchable format like this! This gives me freedom and abundance, and I feel playful and excited. THANK YOU!

I particularly enjoy coming up with keywords I'm interested in right now and learning through reading, interpreting and feeling into the results. WOW! (From the last few days: vegetar, dairy, milk, jesus, meditation, psilocybin, partner, hungar, microwave, orgasm, pork, cyst, tumor, cancer, marciniak.)
I use Cool Reader from F-Droid on /e/ OS on my phone and Foliate on MX-Linux on my laptop for the EPUB files. Firefox/IceCatMobile for the HTML.
 
I did something that I had wanted to do for a long time: compile all the transcripts into a PDF and a more compact EPUB file. I've attached the files, but consider this a beta release, because the transcripts are almost 3000 pages long in total and it is really hard to spot errors or missing parts (even though I've spent a few hours doing so).

Technically, I rendered the extracted forum session transcript posts into Markdown and then used Pandoc to assemble everything into the target format. I had no prior experience with Pandoc, but I've found that it is a great tool for self-publishing. I'll release the source code soon; I just need to clean it up. I'd like to release it as a public Git repository, so that it will be possible to collectively edit and clean up the session files (there are things like links that no longer work, typos, etc.). Is it a good idea?

Edit: I'm unable to attach the EPUB file.
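
Since the source code is not published yet, here is only a rough Go sketch of how a set of Markdown files might be handed to Pandoc. The file names, the title, and the helper `pandocArgs` are all assumptions for illustration; the Pandoc options used (`--from`, `--toc`, `--metadata`, `-o`) are standard ones, and Pandoc picks the output format from the extension of the output file.

```go
package main

import (
	"fmt"
	"os/exec"
)

// pandocArgs builds the argument list for one Pandoc run that merges the
// given Markdown files into a single output document.
// (Hypothetical helper; not the author's build script.)
func pandocArgs(output, title string, inputs []string) []string {
	args := []string{
		"--from", "markdown", // input format
		"--toc",                        // generate a table of contents
		"--metadata", "title=" + title, // document title
		"-o", output, // output file; format inferred from extension
	}
	return append(args, inputs...)
}

func main() {
	// Hypothetical per-session Markdown files.
	inputs := []string{"session-001.md", "session-002.md"}
	cmd := exec.Command("pandoc", pandocArgs("sessions.epub", "Session Transcripts", inputs)...)

	// Print the command instead of running it, so the sketch works
	// without Pandoc installed. Call cmd.Run() to actually build.
	fmt.Println(cmd.Args)
}
```

Swapping "sessions.epub" for "sessions.pdf" or "sessions.docx" would target the other formats, though PDF output additionally needs a PDF engine installed.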
 

Attachments

  • sessions-beta.docx
    8.5 MB · Views: 11
  • sessions-beta.pdf
    12.6 MB · Views: 37
Hello KS,

https://liberty239.github.io says "Built by uid:1001@fv-az128-954 at Sun, 01 May 2022 01:31:29 UTC"

You wrote "used some of their spare minutes for GitHub Actions" - does this mean that the minutes for the actions ran out on 1 May?

Is there something I could do to have an up-to-date version of the EPUB and the HTML file?

Even after finding this page: GitHub Push - GitHub Marketplace, I still don't understand what GitHub Push Actions are and how to use them :/.
 