Quick Win - Formatting YouTube Transcripts for the Web

Feb 12, 2023


In this video, I share with you a technique to easily format YouTube transcripts for use on your own website.

GitHub Gist


VS Code

DNN Platform

DNN Community

https://2sxc.org https://github.com/2sic/2sxc

0:08 thank you
0:12 hello hello it's DNN Dave I hope
0:14 everybody is doing well today I thought
0:17 I would uh share with you something I
0:19 was playing around a bit today just
0:21 having a little bit of fun with a few
0:22 things and decided I wanted to do
0:26 something to kind of enhance the content
0:29 maybe even help SEO with my DNN Dave
0:34 website and I thought it was uh kind of
0:37 fun what I was trying to do and the way
0:40 I ended up solving it I thought maybe
0:42 I'd share that and maybe it would help
0:44 some of you out there that are
0:46 looking to do some things like this so
0:48 let me share my screen here and we'll go
0:50 through a few things
0:52 so what I wanted to do was on the DNN
0:57 Dave website when you go to a video
1:00 detail page
1:02 it currently doesn't show anything as it
1:05 relates to
1:07 the transcript of the video so if you're
1:11 out on YouTube you know you can actually
1:13 see the transcript of it and so forth
1:15 and I thought let me try to bring the
1:17 transcript into here and that can bring
1:19 multiple benefits to this page but
1:22 wasn't sure exactly how I would go about
1:25 doing that
1:26 so if I were to click on this video and
1:29 go to the YouTube channel and I'll go
1:31 ahead and pause it so that it doesn't
1:33 play you know you can actually go to the
1:35 three dots over here and you can
1:37 actually see the transcript to the
1:39 entire video
1:41 so I was like well okay that sounds good
1:43 so maybe I can copy all of this text
1:46 here right
1:48 I'll just go ahead and do it scroll down
1:51 to the bottom
1:52 and I'll copy this text and I'm thinking
1:54 I could probably just easily manipulate
1:56 that and paste it into
1:58 my structured content solution that's out
2:01 there
2:02 so when I copied that I decided I would
2:05 look at it in Visual Studio Code and see
2:07 what it looks like I tried pasting it in
2:10 different formats and so forth and
2:12 you'll see that things are highlighted a
2:13 little bit here because I've got
2:15 some things already done to prepare for
2:18 this particular video
2:22 so I was like well that's not really
2:23 formatted in a great way so how can I
2:26 manipulate this to make it look
2:28 better on the site well first I want to
2:31 go ahead and go back to the site and
2:33 show you what I did as it relates to
2:36 the actual structured content solution
2:39 Let me refresh this site just to make
2:41 sure I'm logged in still yep I am still
2:43 logged in so in my view for this this
2:46 is my detail view I went ahead and
2:49 added a section here to do a collapsible
2:52 button based off of a new field that I
2:55 created so I created a field called
2:57 transcript and I'll show you where that
3:00 is if I go into my admin of the app I've
3:07 got data here and for each video I've
3:09 got a set of fields and I added a
3:12 transcript field I let that be a what
3:16 you see is what you get type input for
3:20 that so that we could just paste HTML
3:23 into there that's ultimately what I want
3:25 to do because I want a nicely formatted
3:27 transcript
3:30 let's just go ahead and do it for this
3:32 particular video here with me and
3:35 Aderson Oliveira not Anderson
3:39 mynameisnotanderson.com I believe is the
3:41 site those that have been around the DNN
3:44 community for a while will appreciate
3:46 that one we miss you Aderson
3:49 um so let's see we've got this I've
3:51 opened up the video over here I've
3:52 already copied the transcript into my
3:55 clipboard and I've pasted it into Visual
3:58 Studio Code now one thing that you'll
4:01 notice is I have a regex replace
4:05 now for those that aren't familiar uh
4:08 the default experience for find and
4:12 replace in VS Code is going to be
4:14 character based but you can click this
4:17 little asterisk over here and use
4:19 regular expressions for this so I've
4:21 already got a regex that's figured out
4:23 here and let me show you what I'm doing
4:25 so let's go back over here
4:29 I'm going to paste that same text into
4:33 here and I'm going to put a carriage
4:35 return at the end because that's part of
4:37 what my regex is going to do but let's
4:39 break this down just a little bit what
4:41 we need to do is I want a nicely
4:44 formatted table in the end where the
4:46 timestamp portion of it is in a left
4:50 column
4:51 and the text is in the right column for
4:55 each timestamp
4:57 but this is really just a bunch of
4:59 carriage returns at the end of the
5:00 timestamp and a carriage return at the
5:02 end of the text
5:03 so I needed to do a regex expression
5:06 that's going to be able to format all
5:09 this in one fell swoop and you can do
5:12 something called grouping within regex
5:16 expressions and that is done with
5:17 parentheses so you'll see that this
5:19 section here I've got a group
5:22 and then I've got a group and then I've
5:24 got another group so I end up with four
5:26 groups
5:27 and the first group is kind of the
5:30 most complicated but it's fairly basic
5:33 in the sense that it's three things
5:35 separated by colons optionally so this
5:39 first one says grab any digit characters
5:43 from zero to two characters right
5:47 because sometimes I may not have an hour
5:50 so if you think of it as an hour and a
5:54 minutes and a seconds on the time stamp
5:59 you'll see this video here doesn't go
6:01 into the hours but I do have some videos
6:03 that cross the one hour mark so I want
6:05 to make sure that I get those as well
6:08 so that's why I did zero comma two so
6:11 that says let's match any digits that
6:14 are numeric
6:16 between zero and two digits
6:19 and then a colon and then the question
6:21 mark after that says optional for that
6:24 because in the case of like 43:04 I don't
6:28 have an hour and I also don't have the
6:31 other colon so that'll help me to just
6:35 get that optionally the next one is the
6:38 same thing but instead of zero to two I
6:41 know that I'm always going to have at
6:42 least one digit here even when it is way
6:47 back at the beginning of the transcript
6:50 you'll see that it always has a zero
6:52 here so I know that I'm going to have at
6:55 least one digit
6:56 prefixed to the colon but just in case I
7:00 went ahead and put a question mark in
7:01 here anyways it doesn't hurt to have
7:04 that so that that works just fine
7:06 and then I want the colon followed by
7:10 two digits because it's always going to
7:12 have two digits at the end now I could
7:15 have done this with a curly brace two
7:18 here as well but I just chose to do
7:20 backslash D and then backslash D for
7:22 each digit
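The timestamp group described here can be tried outside the editor. What follows is a minimal Python sketch, not the exact expression from the gist: the pattern and the names `TIMESTAMP` and `is_timestamp` are assumptions made for illustration, and the hours part is written as an optional non-capturing group rather than the optional colon described in the video.

```python
import re

# Assumed pattern (not the gist's exact regex): an optional hours part,
# one or two minute digits, a colon, then exactly two second digits.
TIMESTAMP = re.compile(r"(?:\d{1,2}:)?\d{1,2}:\d{2}")


def is_timestamp(line: str) -> bool:
    """Return True when the whole line is a YouTube-style timestamp."""
    return TIMESTAMP.fullmatch(line) is not None


# Short, minutes-only, and past-the-hour timestamps, plus a caption line.
for sample in ["0:08", "43:04", "1:02:15", "hello hello"]:
    print(sample, is_timestamp(sample))
```

Keeping the hours part non-capturing means the timestamp as a whole can still be the first numbered group when the pattern is wrapped in parentheses.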
7:23 so that is my first group uh for
7:28 the regex now the second group says
7:31 okay now when you find this also look
7:35 for a new line or line feed
7:39 because I want to grab the line feed
7:41 that is after only the time stamp here
7:46 so that is the line feed that follows
7:49 this match
7:51 and then this says any characters after
7:54 that so that's going to grab any
7:57 characters that are after whatever I
7:58 match so that should get the next line
8:01 here and then I also want to look at the
8:04 new line that's after that because this
8:06 one is different than the new line
8:09 that's after the time stamp so you know
8:12 I'm not a regex expert but
8:16 this ended up I had to play around with
8:17 this quite a bit but this is what I
8:19 ended up with now come full circle back
8:22 to over here what I want to do is use
8:25 that same regular expression to find it
8:27 and what's highlighted here is what
8:29 you're going to see so if I just did a
8:31 control X here you can see that's what
8:33 my text normally looks like if I put
8:35 this in here then it's going to say okay
8:37 it's matching stuff
8:39 so I want to also go all the way down to
8:42 the end because I want to make sure that
8:44 I've got a carriage return here at the
8:46 end because I want to detect that new
8:48 line that's there
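The reason for the extra line break at the end can be shown with a small Python sketch. The pattern `PAIR` and the helper `ensure_trailing_newline` are assumptions made for illustration, not the expression from the gist; the point is only that a pattern ending in a line break cannot match a final caption that has none.

```python
import re

# Assumed pattern: a timestamp line, a line break, a caption line, a line break.
PAIR = re.compile(r"^(?:\d{1,2}:)?\d{1,2}:\d{2}\r?\n[^\r\n]*\r?\n", re.MULTILINE)


def ensure_trailing_newline(text: str) -> str:
    """Append a line break when the pasted text does not end with one."""
    return text if text.endswith("\n") else text + "\n"


# Pasted text with no line break after the last caption.
pasted = "0:08\nthank you\n0:12\nhello hello"

print(len(PAIR.findall(pasted)))                           # pairs found as pasted
print(len(PAIR.findall(ensure_trailing_newline(pasted))))  # pairs found after the fix
```

Without the added line break the last timestamp and caption are left unmatched, so they would stay unformatted after a replace.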
8:49 so now I want to take that and I want to
8:52 replace it with a nicely formatted table
8:55 row and you know that way it
8:59 iterates through all of these and we're
9:01 using a couple of key pieces here so the
9:04 dollar sign one is a special thing that
9:06 says give me the first group so that is
9:11 represented by the first set of
9:13 parentheses
9:15 so I want to inject that value that
9:18 matches
9:19 inside of a strong tag
9:22 within a table cell that's within a
9:26 table row
9:27 so then I want to create another table
9:29 cell and I want to use the third
9:32 group match so the third group match is
9:36 the text that is after the new line
9:40 after the time stamp hopefully you're
9:43 following along and that makes sense
9:45 and then I want to close my table row
9:48 and then just from a formatting
9:50 standpoint I want to add a new line
9:51 right here at the end because I want my
9:53 result here to look really nice so that
9:56 I can just paste it right into my
9:58 WYSIWYG editor in 2sxc
10:01 so I'll go ahead and replace and you'll
10:03 see voila it formats all of this
10:07 absolutely beautiful
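The find and replace step can also be sketched in Python for anyone who wants to script it. This is an illustration under stated assumptions, not the gist itself: the pattern, the names `FIND`, `to_row` and `transcript_to_rows`, and the exact inline style are all hypothetical. It keeps the four-group layout described in the video (timestamp, line break, caption text, line break) and uses groups one and three in the output. Escaping the caption text is an addition that the editor-based replace does not do.

```python
import html
import re

# Groups: 1 = timestamp, 2 = line break, 3 = caption text, 4 = line break.
# Assumed pattern, not a copy of the published gist.
FIND = re.compile(
    r"^((?:\d{1,2}:)?\d{1,2}:\d{2})(\r?\n)([^\r\n]*)(\r?\n)",
    re.MULTILINE,
)


def to_row(match: re.Match) -> str:
    """Build one table row from a timestamp/caption pair."""
    timestamp = match.group(1)
    # Escape &, < and > so caption text cannot break the markup.
    text = html.escape(match.group(3), quote=False)
    return (
        '<tr style="vertical-align: top;">'
        f"<td><strong>{timestamp}</strong></td><td>{text}</td></tr>\n"
    )


def transcript_to_rows(transcript: str) -> str:
    """Replace every timestamp/caption pair with a table row."""
    return FIND.sub(to_row, transcript)


# Input must end with a line break so the last pair is matched.
copied = "0:12\nhello hello it's DNN Dave I hope\n1:02:15\npast the one hour mark\n"
print(transcript_to_rows(copied), end="")
```

The result is one row per line, ready to be pasted between opening and closing table tags.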
10:09 um so I've got my table row I did some
10:12 uh just inline styles here to make sure
10:14 that all the text within that row
10:16 is aligned to the top I've got a table
10:19 cell opening I've got a strong around
10:21 the time stamp I've got a closing cell
10:24 I've got another cell here so it'll be a
10:27 two column table and there's my table
10:29 row and then I've got a new line here in
10:31 the editor now if I scroll down I want
10:34 to make sure that it's doing that for
10:36 other kind of time stamps as well right
10:38 so I've got two digits as a minutes here
10:41 and two digits here it looks like I
10:43 mean seconds and that looks fantastic if
10:47 this was one that had over an
10:52 hour in there this will work
10:55 with that just as well so now I can take
10:58 this
10:59 copy that into my clipboard
11:02 and I can go back over to my site and
11:05 since I have added a transcript
11:09 here I can go into the source code
11:14 View
11:15 I can do my wrapping table markup
11:20 do return paste and then close my table
11:25 and if I save that you'll see that looks
11:28 fantastic so now I can save this
11:31 save this and now because of my
11:34 conditional in the view back here
11:37 I said hey if I don't have
11:41 um
11:42 if the value of transcript
11:47 is not null or whitespace then I want
11:51 to show this markup here I want to
11:54 render this markup and inject my
11:55 transcript into there so now if I
11:58 refresh the page right I've got a view
12:01 transcript and it's a nice collapsible
12:03 and I've got a beautifully formatted
12:05 transcript here and since that is done
12:09 in the way that it's done it's still
12:10 discoverable by search engines and
12:13 that's really nice so it's a little bit
12:14 of a manual process here and I do have
12:18 to have a way to remember
12:21 what I'm doing here so what I decided to
12:23 do is go ahead and just publish a public
12:26 gist out on GitHub and you can find that
12:30 if you go to my GitHub which is
12:32 david-poindexter is my username so
12:35 github.com or excuse me
12:38 gist.github.com
12:39 my username you will see one there for
12:45 YouTube transcript replace
12:47 so this is the find characters and this
12:50 is the replace characters and hopefully
12:53 that'll help some of you out for
12:55 something like this but it's also
12:57 kind of a lesson in thinking a little
13:00 bit outside of the box you know I
13:01 could have done all that manually but
13:03 think about all the time that that would
13:05 have taken to be able to go through all
13:08 that and for every single one of these
13:11 videos so now I can just very quickly
13:13 and we'll go through another one here
13:14 real quick do it I can very quickly go
13:18 out here and manage these so I've got to
13:21 go through here and update all of mine
13:22 so I've gotten through most of them here
13:24 so let's take this one here I'll open
13:26 the detail I can click on this I'll go
13:29 ahead and pop oops
13:32 that has the wrong link in it so I need
13:35 to fix that one I'll come back to that
13:36 one in just a minute
13:38 let's grab this one
13:40 and go to the actual video I'll go ahead
13:42 and pause it so that it doesn't run
13:44 and we'll grab the transcript from this
13:46 so show transcript
13:49 I'll go ahead and swipe this down wait
13:53 till it gets to the end I'm wondering if
13:54 this one's more than an hour ah yeah
13:56 it's more than an hour so we can see an
13:58 example of that so I'll copy that into
13:59 my clipboard I'll go over to Visual
14:02 Studio Code select all this paste I'll
14:06 do a carriage return at the end just to
14:08 make sure that my regex is going to
14:09 capture that last line I'll do a replace
14:12 beautiful I can select all that copy it
14:18 and actually before I go back over to
14:20 the website we'll just scroll down to
14:21 see what's past the one hour mark see it
14:24 handles that beautifully as well
14:26 so I'll go back over to the website now
14:29 go into edit
14:32 I'll come now and enter into the source
14:35 code view for my transcript field
14:38 I'll type table
14:40 return paste and then close my table
14:45 save
14:47 save save save save
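The last two steps, wrapping the rows in a table and only rendering the section when the transcript field has content, can be sketched as follows. This is a Python stand-in for illustration only: the real view lives in 2sxc and is not shown in the video, so the function names, the `<details>` element used for the collapsible, and the "View Transcript" label placement are all assumptions.

```python
from typing import Optional


def wrap_rows(rows_html: str) -> str:
    """Wrap already-formatted table rows in opening and closing table tags."""
    return f"<table>\n{rows_html}</table>"


def render_transcript(transcript_html: Optional[str]) -> str:
    """Return collapsible markup, or an empty string when the field is empty.

    Mirrors the "not null or whitespace" check described for the view.
    """
    if transcript_html is None or not transcript_html.strip():
        return ""
    return (
        "<details>\n<summary>View Transcript</summary>\n"
        f"{transcript_html}\n</details>\n"
    )


rows = '<tr style="vertical-align: top;"><td><strong>0:12</strong></td><td>hello</td></tr>\n'
print(render_transcript(wrap_rows(rows)), end="")
print(repr(render_transcript("   ")))  # empty field renders nothing
```

Because the transcript is emitted as ordinary markup in the page rather than loaded on demand, it stays visible to search engines, which is the benefit the video points to.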
14:49 and now we've got a button with the
14:51 transcript so I just need to go through
14:53 here and update all the old videos but
14:55 as new videos come out and I publish those
14:57 I'll now have a way to put the
14:59 transcript in there and bring in that
15:01 extra SEO value so I hope this has been
15:04 beneficial uh to you I hope everybody
15:07 has a great day


