Str!ke
Padawan Learner
Bear said: Wow! I'm going to use it to finish up a part of a show I said I would do.
Here is a tutorial on how to do captions and I'm assuming you can just copy the text that shows up to the right of the YouTube video you are captioning and paste it into a document. Then it seems a matter of just checking the text in your document as you listen to the show. Is that what you did, Str!ke?
http://www.youtube.com/watch?v=Y7FDktLN_f8
Well kind of, yes; except that I downloaded it (you can download it in .srt, .vtt and/or .sbv formats) and did some more things than that, which I explain below.
There's also another way to get a transcript: the "Analyze Content" feature in After Effects and Premiere, which generates text from the speech it recognizes in the audio. I didn't find it as useful as YouTube's captions, because in my opinion it got more words wrong. But you can try it too, if you want to.
Ok, so here's how I did it:
Summary/Workflow
I'm writing this in case it helps someone who is doing a transcript; it took me about 3 days to figure out, so hopefully it won't take you as long.
First of all, there are probably other ways to do it (like the After Effects/Premiere 'Analyze Content' feature). This is how I did it:
1. Transcript file format .srt
· I'll assume you have downloaded the file from YouTube. Preferably download the .srt one. (I could explain if you need to: how to create a video if you have only audio, split it, upload it and download the captions file)
If you open it, you will see something like this:
Code:
1
00:00:00,229 --> 00:00:04,710
plan you just have to join community of like-minded people

2
00:00:04,710 --> 00:00:08,050
way we did I mention sq when I'm back immediately

3
00:00:08,050 --> 00:00:11,400
it's a community community here where you show some interest for getting
So I figured I needed to remove, if possible, the timecodes and line breaks, so that it would all be one big paragraph (or at least that's how I prefer it).
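To make the file layout concrete, here is a minimal Python sketch of how an .srt file breaks down: each block is an index line, a timecode line, and one or more caption lines, with a blank line between blocks. The function name `parse_srt` and the sample text are just for illustration (the sample copies the first two blocks of the excerpt above); this is not how YouTube or any particular tool reads the file.

```python
import re

# Sample text in the same shape as the excerpt above.
SAMPLE_SRT = """1
00:00:00,229 --> 00:00:04,710
plan you just have to join community of like-minded people

2
00:00:04,710 --> 00:00:08,050
way we did I mention sq when I'm back immediately
"""

def parse_srt(text):
    """Split .srt text into (index, timecode, caption) tuples."""
    cues = []
    # Blocks are separated by one or more blank lines.
    for block in re.split(r"\n\s*\n", text.strip()):
        lines = block.splitlines()
        if len(lines) < 3:
            continue  # skip blocks with no caption text
        index = int(lines[0].strip())
        timecode = lines[1].strip()
        # A caption may span several lines; join them with spaces.
        caption = " ".join(line.strip() for line in lines[2:])
        cues.append((index, timecode, caption))
    return cues

cues = parse_srt(SAMPLE_SRT)
for index, timecode, caption in cues:
    print(index, "|", timecode, "|", caption)
```

Only the third field of each tuple is the part you want to keep for the transcript; the first two are what the next steps remove.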
2. Removing Timecodes
I found two ways to do it: with a text editor or with a command on Linux.
2.1. With a Text Editor (with RegExp)
· You will need a text editor that can 'Find & Replace' with Regular Expressions (RegExp). I used Geany; I believe LibreOffice can too (I didn't try it though).
· Then 'Find and Replace' each of these with nothing (no characters):
- "^[0-9]+$"
- "^[0-9][0-9]:[0-9][0-9]:[0-9][0-9],[0-9][0-9][0-9] --> [0-9][0-9]:[0-9][0-9]:[0-9][0-9],[0-9][0-9][0-9]+$"
As far as I know, it didn't remove any words. It worked for me, but you should check that it did for you.
If you know better regular expressions that work, use those; these are just what I could come up with.
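If you don't have an editor with RegExp support, the same two find-and-replace passes can be sketched in Python. This is only an illustration under my own assumptions: `strip_timecodes` is a made-up name, the sample text is copied from the excerpt in section 1, and I shortened the timecode pattern with `{2}`/`{3}` repeat counts. One caveat that applies to the editor version too: a caption line made up only of digits would match the first pattern and be removed.

```python
import re

# Same idea as the two patterns above; re.MULTILINE makes ^ and $
# match at the start and end of every line, not just the whole text.
INDEX_LINE = re.compile(r"^[0-9]+$", re.MULTILINE)
TIMECODE_LINE = re.compile(
    r"^[0-9]{2}:[0-9]{2}:[0-9]{2},[0-9]{3} --> "
    r"[0-9]{2}:[0-9]{2}:[0-9]{2},[0-9]{3}$",
    re.MULTILINE,
)

def strip_timecodes(text):
    """Blank out index lines and timecode lines, keeping caption text."""
    text = text.replace("\r\n", "\n")  # normalize Windows line endings
    text = INDEX_LINE.sub("", text)
    text = TIMECODE_LINE.sub("", text)
    return text

sample = """1
00:00:00,229 --> 00:00:04,710
plan you just have to join community of like-minded people

2
00:00:04,710 --> 00:00:08,050
way we did I mention sq when I'm back immediately
"""

stripped = strip_timecodes(sample)
print(stripped)
```

Like the editor method, this replaces the matched lines with nothing, so empty lines are left behind where the numbers and timecodes used to be.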
2.2. With a command
· On Linux, you can type this, to remove the timecodes:
sed -r '/^[0-9]+$/{N;d}' filenameofyours.srt > newdesiredfilename.srt
That's it for the timecodes. Now you would have something like this:
Code:
plan you just have to join community of like-minded people

way we did I mention sq when I'm back immediately

it's a community community here where you show some interest for getting
Next is removing the line breaks, unless you prefer it as it is; it's up to you.
3. Removing Line Breaks
Surprisingly, I found a macro that does it for you (and a lot more). The only catch is that it runs in LibreOffice.
So, open your .srt file and:
1. Go to Tools> Macros> Organize Macros> LibreOffice Basic...
2. In 'Macro name' field, type whatever name you like (e.g. LineBreakMacro)
3. In 'Macro from' select "My Macros" and click 'New'.
4. A new window will open. On the right side, copy & paste the macro from this site (copy all the text in the code box, in the first post)
5. Then close the window that opened.
6. Go to Tools> Macros> Run Macro...
7. Double-click 'My Macros', then 'Standard', and finally whatever name you gave the macro, e.g. 'LineBreakMacro'
8. Then click Run.
It will then show a series of prompts. Answer them in this order:
· 'Yes'
· '0'
· 'Yes'
· '2'
· '0'
· '3'
· '2'
· 'Cancel'
· Erase the character it created. (It's at the end of the text)
Then run the macro again. This time answer:
· 'Yes'
· '0'
· 'Yes'
· '3'
· '2'
· 'Cancel'
· Erase the last character it created. (It's at the end of the text)
· Save it.
And that should be it.
Now you should have something like this:
plan you just have to join community of like-minded people way we did I mention sq when I'm back immediately it's a community community here where you show some interest for getting
I like it that way, so I can concentrate on the transcript rather than on erasing timecodes, line breaks, etc.
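If you'd rather not go through LibreOffice, the end result of this step (one big paragraph) can also be sketched in a few lines of Python. To be clear, this is not the macro described above, just a different way to get a similar result; `join_lines` is a name I made up, and the sample text is the excerpt from earlier in the post.

```python
def join_lines(text):
    """Collapse line breaks and runs of whitespace into single spaces."""
    # str.split() with no arguments splits on any whitespace,
    # including newlines and empty lines, and drops empty pieces.
    return " ".join(text.split())

captions = """plan you just have to join community of like-minded people

way we did I mention sq when I'm back immediately

it's a community community here where you show some interest for getting
"""

paragraph = join_lines(captions)
print(paragraph)
```

Because it treats every kind of whitespace the same way, it also swallows the empty lines left over from removing the timecodes.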
4. A very useful program, Subtitle Edit
Next is a program, which is probably the most useful tip I have for transcribing.
· Subtitle Edit, which can help you with: doing captions for a video, translating captions to another language, exporting to several caption formats, and many other things.
Image of subtitle edit.
· I find it particularly useful because, with an .srt file (or any other caption file) loaded, its timecodes act as markers: if I want to go to a specific line, I just double-click it and the audio/video jumps to that line and time. It can also loop a particular line and move easily between lines, so I can concentrate on transcribing instead of jumping backwards every time to hear again what was said.
(I also tried another program to slow down the audio, but sometimes when it's slower I hear different words than with the original audio.)
· I had on the left side of the screen, Subtitle Edit; and on the right side, a text editor, where I was just typing the transcript.
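The "timecodes as markers" idea also works outside Subtitle Edit: if you convert a caption's start time to seconds, you know where to seek in any audio/video player. Here is a minimal Python sketch of that conversion. The names `to_seconds` and `cue_start` are made up for illustration, and this is not how Subtitle Edit does it internally.

```python
import re

# An .srt timestamp looks like HH:MM:SS,mmm (comma before milliseconds).
TIMESTAMP = re.compile(r"(\d{2}):(\d{2}):(\d{2}),(\d{3})")

def to_seconds(stamp):
    """Convert an .srt timestamp such as '00:00:04,710' to seconds."""
    match = TIMESTAMP.fullmatch(stamp.strip())
    if match is None:
        raise ValueError(f"not an .srt timestamp: {stamp!r}")
    hours, minutes, seconds, millis = (int(g) for g in match.groups())
    return hours * 3600 + minutes * 60 + seconds + millis / 1000

def cue_start(timecode_line):
    """Return the start time, in seconds, of a 'start --> end' line."""
    start, _end = timecode_line.split("-->")
    return to_seconds(start)

# Start of the second caption in the excerpt from section 1.
print(cue_start("00:00:04,710 --> 00:00:08,050"))
```

Most players accept a seek position in seconds, so the number printed here is the position to jump to for that caption.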
5. Final Thoughts
You could skip the automatic captions from YouTube. The only reason I used them was to have some kind of framework or base to work from (so I don't have to type as much), and to help with some words I just couldn't understand (sometimes they actually did).
With this workflow, I went from about 3 hours to transcribe 8 minutes of audio to about 1 hour for 5 minutes.
Well it worked for me, hope it does as well for you.
If you have any doubts don't hesitate to ask.
If anything, at least look at point 4; it's the most helpful one for me, imho.
It might seem like a lot but it's really not. I could remove the timecodes and linebreaks for you, if you want to.
Hope it helps