Session transcripts as one big HTML file

Yes. @KS Why the gap? :shock:

This technology is not as simple as many think, and @KS has done a great service for those who just expect to have everything on a silver platter. I would just say thanks to @KS for giving his/her best. If "time" permits, @KS may find a way to tweak the application. But consider that this is a process of refinement. Believe me, I know the hours and effort it takes to deliver a quality product. Those who expect a "free lunch" will always be disappointed. I think @Scottie would understand. If we work together we can do great things, but show a little appreciation when possible.
 

I do not expect a free lunch. I only commented on the file because I expected that a work named "the C's sessions" would be a complete collection of the C's sessions (that is the default interpretation of the words "C's sessions"). I mean, we do not write a post and then delete content from the middle of it, for instance.

Imagine the situation:
One person: "I do not understand your post. It looks like plenty of its content has been deleted. It doesn't make sense now."
Another person: "I have done a great service. If you expect a "free lunch" you will always be disappointed."

Is that normal? I do not really think so.

I opened the file in Visual Studio Code (one of many editors), and this is what I see (see attachment):

Code in HTML is condemned. And this is a simple code like that (for instance):

"<a href="#session-21-March-2008">21 March</a><br/>"

So it is really not a huge amount of work to add a line of code with the correct link to the session, and to process the specific C's session first with the ready-made Go script.

So I was wondering why such a thing happened. Did KS miss something? That is not a problem, because such things happen :-) And I am not mad at KS or anything, I just expect transparency and sincerity :-) If he needs help, let him ask others and let us do a good job, instead of sharing something incomplete. That's all :-)

I see that the main code is in the Go language from Google, and I have not learned Go yet, so I cannot help with Go right now (I would need to learn something about Go first), but I can help with HTML, as probably can many others who have some computer science knowledge. So if KS wants help from me or another person, we can help him with HTML if he asks.

BTW, I am very glad that KS helps us, and I respect his code, which I saw on GitHub :-)

And let us make things clear for others who do not know anything about computer science :-) Such people may not understand what GitHub is, or what that strange thing there (code in Go) is. They may not even know how to open the HTML file in a code editor.
 

Attachments

  • code.PNG
    114.8 KB · Views: 17
Why the gap between 1998 and 2008?
Because the last page of the search results ends with that 2008 session, and also the 'View older results' button in the bottom right doesn't work if there are many results/pages or if the results are much older (I don't know exactly why it doesn't work, but that has usually been my experience). The sessions from 1998 to 2008 might have been posted earlier, so they would appear as results further down, at the (invisible) end of the list. The post dates do not necessarily sort in the same order as the actual dates of the sessions themselves; for example, Session 24 April 1996 has a post date of May 30, 2014.
I started a similar tool once (though never finished it), and what I did was crawl cassiopaean-session-transcripts-by-date first to get the session links. But there are some individual issues to address, like sessions quoting other sessions, other members having posted the session, the actual session not being the first post in the thread, or some sessions being contained in quote blocks while others are in plain post text, etc. Some sessions also contain other data like images or quoted articles.
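Since post dates and session dates can disagree, one option is to order sessions by the date in the thread title instead. The Go sketch below assumes titles look like "Session 24 April 1996" (based on the one example above); the helper names `sessionDate` and `sortBySessionDate` are invented for illustration, and titles in other formats are simply skipped.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
	"time"
)

// sessionDate extracts the date from a title such as "Session 24 April 1996".
// The title format is an assumption based on the example in the post.
func sessionDate(title string) (time.Time, error) {
	s := strings.TrimSpace(strings.TrimPrefix(strings.TrimSpace(title), "Session"))
	return time.Parse("2 January 2006", s)
}

// sortBySessionDate returns the parseable titles ordered by session date,
// oldest first. Titles that do not match the expected format are dropped.
func sortBySessionDate(titles []string) []string {
	type item struct {
		title string
		date  time.Time
	}
	var items []item
	for _, t := range titles {
		d, err := sessionDate(t)
		if err != nil {
			continue // not a recognizable session title
		}
		items = append(items, item{title: t, date: d})
	}
	sort.Slice(items, func(i, j int) bool { return items[i].date.Before(items[j].date) })
	out := make([]string, 0, len(items))
	for _, it := range items {
		out = append(out, it.title)
	}
	return out
}

func main() {
	titles := []string{"Session 21 March 2008", "Session 24 April 1996", "Some other thread"}
	for _, t := range sortBySessionDate(titles) {
		fmt.Println(t)
	}
}
```

This does not address the other issues listed (quoted sessions, sessions that are not the first post), which would need per-thread handling.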
 
Wow, thanks for the feedback.

Why the gap between 1998 and 2008?

The scraper uses the forum search functionality and was not following the "View older results" link. I wrongly assumed that 10 pages of search results were enough :)


Thanks for the tips, I'll try to address that. Tricky cases.

@KS .

I was just checking it out and there may be some things to iron out.
Comparing the forum version to the html version I noticed some odd character insertions.

Session 11 August 1996:


In the html version it is:



Doing some test searches against my own PDF versions, there seem to be some sessions/items missing from the HTML version.

It may still be useful for some.

I've added some HTML mumbo-jumbo that should tell the browser to use the UTF-8 character encoding. I'll attach the updated file to the first post ASAP.

Edit:
I was unable to edit the first post, so I've attached the updated file to this one.
 

Attachments

  • sessions-1592643302.zip
    2.1 MB · Views: 51
Actually, 'View older results' does work; I was just confused. The 21 March 2008 session is crawled because it's in the first 10 result pages. The missing sessions are in the next 10 pages after clicking 'View older results'. It's confusing because the numbering goes back to page '1', which is then actually page 11 or something like that.
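The paging behaviour described here (follow "next" within a batch, then "View older results" to start the next batch) can be sketched in Go as a simple loop. This is only a model: the `page` type, the `crawl` function and the in-memory fake pages are all hypothetical, and a real scraper would fetch and parse the forum's HTML instead.

```go
package main

import "fmt"

// page is a simplified model of one search result page (hypothetical type).
type page struct {
	results []string // session titles found on this page
	next    string   // link to the next page in the batch, "" on the last one
	older   string   // "View older results" link, "" when nothing is older
}

// crawl follows next links within a batch and then the older link to the
// following batch, until no link is left. fetch returns false when a page
// cannot be loaded. Visited URLs are tracked so a loop cannot run forever.
func crawl(start string, fetch func(url string) (page, bool)) []string {
	var all []string
	seen := map[string]bool{}
	for url := start; url != "" && !seen[url]; {
		seen[url] = true
		p, ok := fetch(url)
		if !ok {
			break
		}
		all = append(all, p.results...)
		if p.next != "" {
			url = p.next
		} else {
			url = p.older // numbering restarts here, but the URL is new
		}
	}
	return all
}

func main() {
	// Fake pages standing in for the forum search; the keys are placeholders.
	pages := map[string]page{
		"batch1/page1": {results: []string{"newest"}, next: "batch1/page2"},
		"batch1/page2": {results: []string{"middle"}, older: "batch2/page1"},
		"batch2/page1": {results: []string{"oldest"}},
	}
	fetch := func(url string) (page, bool) {
		p, ok := pages[url]
		return p, ok
	}
	fmt.Println(crawl("batch1/page1", fetch))
}
```

Tracking visited URLs rather than page numbers avoids being misled by the numbering starting again at '1'.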
 
Nice, thanks! :thup:
Code:
c.Visit("https://cassiopaea.org/forum/search/145535/?q=Session&c[title_only]=1&c[users]=Laura&o=date")
Some sessions were posted by other members, for example

Ah yes, thanks! Multi-user search (multiple users in "users" query parameter) seems to not work as expected for me. 364 sessions scraped in the file attached to this post.
 

Attachments

  • sessions-1592646435.zip
    2.1 MB · Views: 40
Code:
    c.Visit("https://cassiopaea.org/forum/search/145535/?q=Session&c[title_only]=1&c[users]=Laura&o=date")
    c.Visit("https://cassiopaea.org/forum/search/155247/?q=Session&c[title_only]=1&c[users]=Chu&o=date")
There is one by andromeda as well
Maybe just search without a user, only for 'session' in the title within the sessions forum, without searching subforums.
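If the scraper ends up running one search per poster, the same thread could show up more than once, so the combined results may need de-duplicating. A minimal Go sketch follows; `mergeUnique` is a made-up helper and the thread paths are placeholders, not real forum URLs.

```go
package main

import "fmt"

// mergeUnique concatenates several result lists and drops repeated entries,
// keeping the first occurrence of each in its original order.
func mergeUnique(lists ...[]string) []string {
	seen := make(map[string]bool)
	var out []string
	for _, list := range lists {
		for _, u := range list {
			if !seen[u] {
				seen[u] = true
				out = append(out, u)
			}
		}
	}
	return out
}

func main() {
	// Placeholder thread paths standing in for two separate search runs.
	first := []string{"/threads/a", "/threads/b"}
	second := []string{"/threads/b", "/threads/c"}
	fmt.Println(mergeUnique(first, second))
}
```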
 
Thanks for your work, KS. I commented the way I did because it was unreliable to share something incomplete with people (you did not check the exact results of your Go script), and there are people who cannot just open the file and make changes inside the code.

As I said in my first post in this thread, I still think that you are doing a very good job. :-D And thanks for making these corrections. I downloaded the newest HTML file and checked the dates. All sessions that are on the forum appear in the HTML. I did not count every link exactly, but I looked closely and everything looks good. Thanks for your work.
 
Thank you @KS for the nice work. :thup:

After the previous session exe became outdated, I manually copied each session into its own text file (.txt) and stored them in one directory specifically for sessions. I use the Notepad++ editor to do a folder search for a string, which gives a preview of the results in a results window. This lets me see the matching lines and file names, and makes navigation between individual sessions easy. If I decide to read a specific session, I click on the line in the preview window and it takes me to that session.

This makes me wonder whether we can integrate a search tool with a preview window like that?
 