Help with creating (and proofreading) transcripts of SOTT Talk Radio shows

Bear said:
Wow! I'm going to use it to finish up a part of a show I said I would do.

Here is a tutorial on how to do captions. I'm assuming you can just copy the text that shows up to the right of the YouTube video you are captioning and paste it into a document. Then it seems to be a matter of checking the text in your document as you listen to the show. Is that what you did, Str!ke?

http://www.youtube.com/watch?v=Y7FDktLN_f8

Well kind of, yes; except that I downloaded it (you can download it in .srt, .vtt and/or .sbv formats) and did some more things than that, which I explain below.

There's also another way to get a transcript: the "Analyze Content" feature in After Effects and Premiere creates text from what it recognizes in the audio. Personally, I didn't find it as useful as YouTube's, because it got more words wrong, in my opinion. But you can try it too, if you want to.



Ok, so here's how I did it:
Summary/Workflow

I'm writing this in case it helps someone who is doing a transcript. It took me about 3 days to figure out, so hopefully it won't take you as long.

First of all, there are probably other ways to do it (like the 'Analyze Content' feature in After Effects/Premiere). This is how I did it:



1. Transcript file format .srt

· I'll assume you have downloaded the file from YouTube. Preferably download the .srt one. (If you need it, I can explain how to create a video when you only have audio, split it, upload it and download the captions file.)

If you open it, you will see something like this:

Code:
1
00:00:00,229 --> 00:00:04,710
plan you just have to join community of
like-minded people

2
00:00:04,710 --> 00:00:08,050
way we did I mention sq when I'm back
immediately

3
00:00:08,050 --> 00:00:11,400
it's a community community here where
you show some interest for getting

So I thought I would need to remove, if possible, the timecodes and line breaks, so that it would all be one big paragraph (or at least that's what I prefer).
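For illustration only, and not part of the workflow described here, the structure of the file can be made explicit with a minimal Python sketch that splits text in this format into numbered cues. The sample text is the first two cues shown above; `parse_srt` is a made-up helper name, and the sketch assumes well-formed cues separated by blank lines.

```python
import re

SAMPLE_SRT = """\
1
00:00:00,229 --> 00:00:04,710
plan you just have to join community of
like-minded people

2
00:00:04,710 --> 00:00:08,050
way we did I mention sq when I'm back
immediately
"""


def parse_srt(text):
    """Split SRT text into (index, start, end, caption_lines) tuples."""
    cues = []
    # Cues are separated by one or more blank lines.
    for block in re.split(r"\n\s*\n", text.strip()):
        lines = block.splitlines()
        if len(lines) < 2 or "-->" not in lines[1]:
            continue  # skip anything that does not look like a cue
        start, end = [part.strip() for part in lines[1].split("-->")]
        cues.append((int(lines[0].strip()), start, end, lines[2:]))
    return cues


cues = parse_srt(SAMPLE_SRT)
for index, start, end, caption in cues:
    print(index, start, end, " ".join(caption))
```

Each cue is a number, a timecode line and one or more caption lines, which is why the steps below remove the first two and keep the rest.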


2. Removing Timecodes

I found two ways to do it: with a text editor or with a command on Linux.


2.1. With a Text Editor (with RegExp)

· You will need a text editor that can 'Find & Replace' with Regular Expressions (RegExp). I used Geany; I believe LibreOffice can too (I didn't try it though).

· Then 'Find and Replace' each of these with nothing (no characters):
  • "^[0-9]+$"
  • "^[0-9][0-9]:[0-9][0-9]:[0-9][0-9],[0-9][0-9][0-9] --> [0-9][0-9]:[0-9][0-9]:[0-9][0-9],[0-9][0-9][0-9]+$"
As far as I know, it didn't remove any words. It worked for me, but you should check that it did for you too.
If you know better regular expressions, use those; these are just what I could come up with.
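The same two patterns can also be applied outside a text editor. Here is a minimal Python sketch, assuming the captions are already loaded as a string; `strip_timecodes` and the sample text are illustrative. Unlike replacing with nothing in an editor, it also removes the line break after each matched line, so no empty lines are left behind. Note that a caption line consisting only of digits would be removed as well.

```python
import re

# Cue number on a line of its own, plus its line break.
INDEX_LINE = re.compile(r"^[0-9]+$\n?", re.MULTILINE)
# "HH:MM:SS,mmm --> HH:MM:SS,mmm" on a line of its own, plus its line break.
TIMECODE_LINE = re.compile(
    r"^[0-9]{2}:[0-9]{2}:[0-9]{2},[0-9]{3} --> "
    r"[0-9]{2}:[0-9]{2}:[0-9]{2},[0-9]{3}$\n?",
    re.MULTILINE,
)

SAMPLE_SRT = """\
1
00:00:00,229 --> 00:00:04,710
plan you just have to join community of
like-minded people

2
00:00:04,710 --> 00:00:08,050
way we did I mention sq when I'm back
immediately
"""


def strip_timecodes(text):
    """Remove cue numbers and timecode lines, keeping the caption text."""
    text = INDEX_LINE.sub("", text)
    return TIMECODE_LINE.sub("", text)


cleaned = strip_timecodes(SAMPLE_SRT)
print(cleaned, end="")
```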



2.2. With a command

· On Linux, you can type this to remove the timecodes:
sed -r '/^[0-9]+$/{N;d}' filenameofyours.srt > newdesiredfilename.srt
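On a system without sed, roughly the same effect can be sketched in Python: drop every line made up only of digits, together with the line right after it (the timecode). This is an approximation of the command above rather than an exact re-implementation, and `drop_index_and_timecode` is an illustrative name.

```python
import re

SAMPLE_SRT = """\
1
00:00:00,229 --> 00:00:04,710
plan you just have to join community of
like-minded people

2
00:00:04,710 --> 00:00:08,050
way we did I mention sq when I'm back
immediately
"""


def drop_index_and_timecode(lines):
    """Drop every all-digit line together with the line that follows it."""
    kept = []
    skip_next = False
    for line in lines:
        if skip_next:
            # This is the line right after a cue number: the timecode.
            skip_next = False
            continue
        if re.fullmatch(r"[0-9]+", line.rstrip("\r\n")):
            skip_next = True
            continue
        kept.append(line)
    return kept


result = drop_index_and_timecode(SAMPLE_SRT.splitlines(keepends=True))
print("".join(result), end="")
```

To use it on a real file, read the lines with `readlines()` and write the returned list to a new file with `writelines()`, so the original .srt stays untouched.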



That's it for the timecodes. Now you would have something like this:

Code:
plan you just have to join community of
like-minded people

way we did I mention sq when I'm back
immediately

it's a community community here where
you show some interest for getting

The next step is to remove the line breaks, unless you prefer it as it is; it's up to you.


3. Removing Line Breaks

Surprisingly, I found a macro that does it for you (and much more). The only catch is that it runs in LibreOffice.
So, open your .srt file in LibreOffice and:

1. Go to Tools> Macros> Organize Macros> LibreOffice Basic...
2. In 'Macro name' field, type whatever name you like (e.g. LineBreakMacro)
3. In 'Macro from' select "My Macros" and click 'New'.
4. A new window will open. On the right side, copy & paste the macro from this site (copy all the text in the code box, in the first post)
5. Then close the window that opened.
6. Go to Tools> Macros> Run Macro...
7. Double-click 'My Macros', then 'Standard' and finally whatever name you gave the macro, e.g. 'LineBreakMacro'
8. Then click Run.

It will then show a series of prompts. Answer them in this order:

· 'Yes'
· '0'
· 'Yes'
· '2'
· '0'
· '3'
· '2'
· 'Cancel'
· Erase the character it created. (It's at the end of the text)

Then run the macro again. This time answer:

· 'Yes'
· '0'
· 'Yes'
· '3'
· '2'
· 'Cancel'
· Erase the last character it created. (It's at the end of the text)
· Save it.

And that should be it.

Now you should have something like this:

plan you just have to join community of like-minded people way we did I mention sq when I'm back immediately it's a community community here where you show some interest for getting

I like it that way, so I can concentrate on the transcript rather than on erasing timecodes, line breaks, etc.
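If you'd rather not use the macro, the single-paragraph result shown above can also be sketched in a few lines of Python. This is an alternative under stated assumptions, not what the macro itself does: it simply replaces every run of whitespace, including line breaks and blank lines, with a single space. `join_into_paragraph` is an illustrative name.

```python
CAPTIONS = """\
plan you just have to join community of
like-minded people

way we did I mention sq when I'm back
immediately

it's a community community here where
you show some interest for getting
"""


def join_into_paragraph(text):
    """Collapse line breaks and repeated spaces into single spaces."""
    # str.split() with no arguments splits on any run of whitespace.
    return " ".join(text.split())


paragraph = join_into_paragraph(CAPTIONS)
print(paragraph)
```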


4. A very useful program, Subtitle Edit

Next is the program, which is probably the most useful tip I have for transcribing.

· Subtitle Edit can help you with captioning a video, translating captions into another language, exporting to several caption formats, and many other things.

[Image: se2.jpg, screenshot of Subtitle Edit]

· I find it particularly useful because, with an .srt file (or any other caption file) loaded, the timecodes act as markers: if I want to go to a specific line, I just double-click it and the audio/video jumps to that line's time. It can also loop a particular line and move easily between lines, so I can concentrate on transcribing instead of jumping back every time to hear again what was said...

(I also tried slowing down the audio with another program, but sometimes at the slower speed I hear different words than in the original audio.)

· I had Subtitle Edit on the left side of the screen and a text editor on the right, where I typed the transcript.
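To illustrate the idea of timecodes acting as markers: an SRT timecode can be converted into a number of seconds, which is the kind of position a player needs in order to jump to a line. Here is a small Python sketch, with `timecode_to_seconds` as an illustrative helper; it says nothing about how Subtitle Edit is implemented.

```python
def timecode_to_seconds(timecode):
    """Convert an SRT timecode such as '00:00:04,710' to seconds."""
    clock, millis = timecode.split(",")
    hours, minutes, seconds = (int(part) for part in clock.split(":"))
    return hours * 3600 + minutes * 60 + seconds + int(millis) / 1000


# Start times of the three sample cues shown earlier in this post.
for start in ("00:00:00,229", "00:00:04,710", "00:00:08,050"):
    print(start, "->", timecode_to_seconds(start), "seconds")
```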





5. Final Thoughts

You could skip the automatic captions from YouTube. The only reason I use them is to have some kind of base to work from (so I don't have to type as much), and to help with words I just couldn't make out (sometimes they actually did).

With this workflow, I went from about 8 minutes of audio transcribed in 3 hours to about 5 minutes of audio per hour.

Well, it worked for me; I hope it does for you as well.


If you have any questions, don't hesitate to ask.
If nothing else, at least look at point 4; it's the most helpful one for me, imho.
It might seem like a lot, but it's really not. I could remove the timecodes and line breaks for you, if you want.

Hope it helps
 
Well, I tried the Adobe Premiere Pro trial and the transcription was a total disaster, so I wouldn't recommend trying it. The main reason I tried it is that, to use YouTube, I would need to figure out how to chop the radio cast into segments of less than 15 minutes using an editor, so I don't have to give Google my phone #. Then upload them for the transcription/caption, etc. So for me it is more about the basics of figuring out how to chop and upload to YouTube and getting the raw transcript than about fine-tuning the raw transcript.
 
Ok, to split the video I used HandBrake. In it, you need to switch to 'Seconds' and put '0' through '900' (15 mins. * 60 = 900 sec) for the first part, and the matching range for every other part you want, e.g.
For a 2-hour video split into 15-minute parts, I'd have 8 parts. So for the 1st part I put 0 - 900; for the 2nd part, 900 - 1800; for the 3rd part, 1800 - 2700.
And so on.
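The start and stop values for each part follow a simple pattern, which can be sketched in Python. `segment_ranges` is an illustrative helper that only computes the numbers to type in; it does not drive HandBrake.

```python
def segment_ranges(total_seconds, segment_seconds=900):
    """Return (start, stop) pairs, in seconds, covering the whole recording."""
    ranges = []
    start = 0
    while start < total_seconds:
        # The last part may be shorter than a full segment.
        stop = min(start + segment_seconds, total_seconds)
        ranges.append((start, stop))
        start = stop
    return ranges


# A 2-hour show split into 15-minute (900-second) parts.
parts = segment_ranges(2 * 60 * 60)
for number, (start, stop) in enumerate(parts, start=1):
    print(f"part {number}: {start} - {stop}")
```

Lowering `segment_seconds` a little gives parts that stay safely under the 15-minute mark.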

Also, set the container to 'MP4 file'.

Something like this:


And that's it :), then you just upload it. I had to wait about an hour before the automatic captions appeared on the video.
 
Great, thanks. Figure I'll give it a go this weekend.
 
The table has been updated:

https://cassiopaea.org/forum/index.php/topic,31252.msg413444.html#msg413444
 
Gandalf said:

Whoops, I just 'downdated' the database, Gandalf - I just completed the PR1 for Show #2 Gun Control USA: Do Guns Protect Civil Liberties? and checked it in. I also noticed that the original IT by Tempo is not showing as 'done' in the table you linked to.

By the way, I've updated the Transop database (the file details etc), as well as the other database (on the Yahoo SOTT group page) - should transcribers and proofreaders also use this thread to notify you of work completed?
 
Hi The Strawman,

That's ok, since that table is not the official table. The Transop database and the Yahoo database are the official tables, and they are the ones used by the members of the translation/transcription group to select what has to be done.

The transcribers and proofreaders don't have to use that thread to inform me, but they must report it in the Yahoo group and mark it in the 2 databases.

By the way, as you must have noticed, I read your post in the Yahoo group saying that you have finished the PR of the second show, and I am working on publishing it as soon as possible.

Thanks so much for your good work. :clap:
 
Hey Gandalf, I'm almost done transcribing episode # 22. Can't remember now if I mentioned it on here or not :huh:
 
Hi Turgon,

As I said previously, it is not necessary to mention it here but it is on the Yahoo group. ;)

However, if you mention it here, it does give the members of this forum an idea of which other transcriptions are coming.
 
New transcript available on SOTT (Show #2). :dance:

See the table for the direct link https://cassiopaea.org/forum/index.php/topic,31252.msg413444.html#msg413444
 
My pleasure, Gandalf. Thank you :)
 
Episode 22 is up on transop and ready to be proofread. I'll be starting on episode 29, All and Everything Part 4 in a few days.
 
After listening to Show #40: Nora Gedgaudas Interview and the information she shared about neuro-feedback therapy, I'm going to put Episode #29 on hold for the moment and focus on getting Nora's interview transcribed ASAP. I have a feeling a thread about this form of therapy is going to be posted very soon, based on the discussion on the show.
 
I want to translate Program #35, "SURVIVING THE END OF THE WORLD", into Spanish. But is there a written transcript in English? :)
Regards
Dulma
 