Quick Win - Formatting YouTube Transcripts for the Web

Feb 12, 2023

SUMMARY

In this video, I share with you a technique to easily format YouTube transcripts for use on your own website.

GitHub Gist
https://gist.github.com/david-poindexter/e538deaf06b1c1cb277e8314604d93ee 

RegExr
https://regexr.com

VS Code
https://code.visualstudio.com

DNN Platform
https://github.com/dnnsoftware/Dnn.Platform

DNN Community
https://dnncommunity.org

2sxc
https://2sxc.org
https://github.com/2sic/2sxc

0:08 thank you
0:12 hello hello it's DNN Dave I hope
0:14 everybody is doing well today I thought
0:17 I would uh share with you something I
0:19 was playing around a bit today just
0:21 having a little bit of fun with a few
0:22 things and decided I wanted to do
0:26 something to kind of enhance the content
0:29 maybe even help SEO with my DNN Dave
0:34 website and I thought it was uh kind of
0:37 fun what I was trying to do and the way
0:40 I ended up solving it I thought maybe
0:42 I'd share that and maybe it would help
0:44 some of you out there that are
0:46 looking to do some things like this so
0:48 let me share my screen here and we'll go
0:50 through a few things
0:52 so what I wanted to do was on the DNN
0:57 Dave website when you go to a video
1:00 detail page
1:02 it currently doesn't show anything as it
1:05 relates to
1:07 the transcript of the video so if you're
1:11 out on YouTube you know you can actually
1:13 see the transcript of it and so forth
1:15 and I thought let me try to bring the
1:17 transcript into here and that can bring
1:19 multiple benefits to this page but
1:22 wasn't sure exactly how I would go about
1:25 doing that
1:26 so if I were to click on this video and
1:29 go to the YouTube channel and I'll go
1:31 ahead and pause it so that it doesn't
1:33 play you know you can actually go to the
1:35 the three dots over here and you can
1:37 actually see the transcript to the
1:39 entire video
1:41 so I was like well okay that sounds good
1:43 so maybe I can copy all of this text
1:46 here right
1:48 I'll just go ahead and do it scroll down
1:51 to the bottom
1:52 and I'll copy this text and I'm thinking
1:54 I could probably just easily manipulate
1:56 that and paste it into
1:58 my structured content solution that's out
2:01 there
2:02 so when I copy that I've decided I would
2:05 look at it in Visual Studio code and see
2:07 what it looks like I tried pasting it in
2:10 different formats and so forth and
2:12 you'll see that things are highlighted a
2:13 little bit here because I've got
2:15 some things already done to prepare for
2:18 this particular video
2:22 so I was like well that's not really
2:23 formatted in a great way so how can I
2:26 manipulate this to make it look
2:28 better on the site well first I want to
2:31 go ahead and go back to the site and
2:33 show you what I did as it relates to
2:36 the actual structured content solution
2:39 let me refresh this site just to make
2:41 sure I'm logged in still yep I am still
2:43 logged in so in my my view for this this
2:46 is my detailed view I went ahead and
2:49 added a section here to do a collapsible
2:52 button based off of a new field that I
2:55 created so I created a field called
2:57 transcript and I'll show you where that
3:00 is if I go into my admin of the app I've
3:07 got data here and for each video I've
3:09 got a set of fields and I added a
3:12 transcript field I let that be a what
3:16 you see is what you get type input for
3:20 that so that we could just paste HTML
3:23 into there that's ultimately what I want
3:25 to do because I want a nicely formatted
3:27 transcript
3:30 let's just go ahead and do it for this
3:32 this particular video here with me and
3:35 Aderson Oliveira not Anderson
3:39 mynameisnotanderson.com I believe it's the
3:41 site those that have been around DNN
3:44 Community for a while will appreciate
3:46 that one we miss you Aderson
3:49 um so let's see we've got this I've
3:51 opened up the video over here I've
3:52 already copied the transcript into my
3:55 clipboard and I've pasted it into Visual
3:58 Studio code now one thing that you'll
4:01 notice is I have a regex replace
4:05 now for those that aren't familiar uh
4:08 the default experience for find and
4:12 replace in VS Code is going to be
4:14 character based but you can click this
4:17 little asterisk over here and use
4:19 regular expressions for this so I've
4:21 already got a regex that's figured out
4:23 here and let me show you what I'm doing
4:25 so let's go back over here
4:29 I'm going to paste that same text into
4:33 here and I'm going to put a carriage
4:35 return at the end because that's part of
4:37 what my regex is going to do but let's
4:39 break this down just a little bit what
4:41 we need to do is I want a nicely
4:44 formatted table in the end where the
4:46 time stamps portion of it are in a left
4:50 column
4:51 and the text is in the right column for
4:55 each time stamp
4:57 but this is really just a bunch of
4:59 carriage returns at the end of the
5:00 timestamp and a carriage return at the
5:02 end of the text
5:03 so I needed to do a regex expression
5:06 that's going to be able to format all
5:09 this in one fell swoop and you can do
5:12 something called grouping within regex
5:16 expressions and that is done with
5:17 parentheses so you'll see that this
5:19 section here I've got a group
5:22 and then I've got a group and then I've
5:24 got another group so I end up with four
5:26 groups
5:27 and the first group is kind of the
5:30 most complicated but it's fairly basic
5:33 in the sense that it's three things
5:35 separated by colons optionally so this
5:39 first one says grab any digit characters
5:43 from zero to two characters right
5:47 because sometimes I may not have an hour
5:50 so if you think of it as an hour and a
5:54 minutes and a seconds on the time stamp
5:59 you'll see this video here doesn't go
6:01 into the hours but I do have some videos
6:03 that cross the one hour mark so I want
6:05 to make sure that I get those as well
6:08 so that's why I did zero comma two so
6:11 that says let's match any digits that
6:14 are numeric
6:16 between zero and two digits
6:19 and then a colon and then the question
6:21 mark after that says optional for that
6:24 because in the case of like 43:04 I don't
6:28 have an hour and I also don't have the
6:31 other colon so that'll help me to just
6:35 get that optionally the next one is the
6:38 same thing but instead of zero to two I
6:41 know that I'm always going to have at
6:42 least one digit here even when it is way
6:47 back at the beginning of the transcript
6:50 you'll see that it always has a zero
6:52 here so I know that I'm going to have at
6:55 least one digit
6:56 prefix to the colon but just in case I
7:00 went ahead and put a question mark in
7:01 here anyways it doesn't hurt to have
7:04 that so that that works just fine
7:06 and then I want the colon followed by
7:10 two digits because it's always going to
7:12 have two digits at the end now I could
7:15 have done this with a curly brace two
7:18 here as well but I just chose to do
7:20 backslash D and then backslash D for
7:22 each digit
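The timestamp group described above can be sketched in Python. This is not the author's expression (that one is in the GitHub Gist linked above); it is a minimal illustration, assuming timestamps look like 0:08, 43:04 or 1:02:15, of the same idea: an optional hours part, one or two minute digits, a colon, and exactly two second digits.

```python
import re

# Hypothetical timestamp pattern (not the author's exact expression):
# an optional "H:" or "HH:" hours prefix, one or two minute digits,
# a colon, then exactly two second digits (\d{2} is equivalent to \d\d).
TIMESTAMP = re.compile(r"(?:\d{1,2}:)?\d{1,2}:\d{2}")


def is_timestamp(text: str) -> bool:
    """Return True if the whole string looks like a transcript timestamp."""
    return TIMESTAMP.fullmatch(text) is not None


for sample in ["0:08", "43:04", "1:02:15", "4304"]:
    print(sample, is_timestamp(sample))
```

Treating the hours digits and their colon as one optional unit lets a single pattern accept both 43:04 and, for videos past the one hour mark, 1:02:15, while 4304 with no colon is rejected.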
7:23 so that that is my first group uh for
7:28 the regex now the second group says
7:31 okay now when you find this also look
7:35 for a new line or line feed
7:39 because I want to grab the line feed
7:41 that is after only the time stamp here
7:46 so that is the line feed that follows
7:49 this match
7:51 and then this says any characters after
7:54 that so that's going to grab any
7:57 characters that are after whatever I
7:58 match so that should get the next line
8:01 here and then I also want to look at the
8:04 new line that's after that because this
8:06 one is different than the new line
8:09 that's after the time stamp so you know
8:12 I'm not a regex expert but
8:16 this ended up I had to play around with
8:17 this quite a bit but this is what I
8:19 ended up with now come full circle back
8:22 to over here what I want to do is use
8:25 that same regular expression to find it
8:27 and what's highlighted here is what
8:29 you're going to see so if I just did a
8:31 control X here you can see that's what
8:33 my text normally looks like if I put
8:35 this in here then it's going to say okay
8:37 it's matching stuff
8:39 so I want to also go all the way down to
8:42 the end because I want to make sure that
8:44 I've got a carriage return here at the
8:46 end because I want to detect that new
8:48 line that's there
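Putting the four groups together, here is a minimal Python sketch. The pattern is a hypothetical reconstruction (the exact one is in the gist), and it assumes the pasted text uses \n line endings and alternates timestamp lines and caption lines, as described above. It also shows why the trailing carriage return matters: without a newline after the last caption, group 4 has nothing to match and the final pair is skipped.

```python
import re

# Hypothetical four-group pattern modeled on the walkthrough:
#   1: the timestamp        2: the newline after it
#   3: the caption text     4: the newline after the caption
# Assumes "\n" line endings; "." does not match a newline, so group 3
# stops at the end of the caption line.
PAIR = re.compile(r"((?:\d{1,2}:)?\d{1,2}:\d{2})(\n)(.*)(\n)")

pasted = "0:08\nthank you\n0:12\nhello hello it's DNN Dave I hope"

# No newline after the final caption: the last pair is not matched.
print([m.group(1) for m in PAIR.finditer(pasted)])

# Adding the trailing newline (the manual carriage return) fixes it.
print([m.group(1) for m in PAIR.finditer(pasted + "\n")])
```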
8:49 so now I want to take that and I want to
8:52 replace it with a nicely formatted table
8:55 row and you know that way it
8:59 iterates through all of these and we're
9:01 using a couple of key pieces here so the
9:04 dollar sign one is a special thing that
9:06 says give me the first group so that is
9:11 represented by the first set of
9:13 parentheses
9:15 so I want to inject that value that
9:18 matches
9:19 inside of a strong tag
9:22 within a table cell that's within a
9:26 table row
9:27 so then I want to create another table
9:29 cell and I want to use the third
9:32 group match so the third group match is
9:36 the text that is after the new line
9:40 after the time stamp hopefully you're
9:43 following along makes sense
9:45 and then I want to close my table row
9:48 and then just from a formatting
9:50 standpoint I want to add a new line
9:51 right here at the end because I want my
9:53 result here to look really nice so that
9:56 I can just paste it right into my
9:58 WYSIWYG editor in 2sxc
10:01 so I'll go ahead and replace and you'll
10:03 see voila it formats all of this
10:07 absolutely beautiful
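The replace step can be sketched the same way. In VS Code the replacement string refers to the groups as $1 and $3; in this hypothetical Python version a small function reads m.group(1) and m.group(3) instead. The vertical-align style is an assumed stand-in for the author's inline styles, whose exact values are not shown, and html.escape is an extra safeguard that the plain find and replace does not perform.

```python
import html
import re

# Hypothetical reconstruction of the four-group find pattern (see the gist
# for the author's exact expression). Assumes "\n" line endings.
PAIR = re.compile(r"((?:\d{1,2}:)?\d{1,2}:\d{2})(\n)(.*)(\n)")


def row(m: re.Match) -> str:
    """Build one table row; m.group(1) and m.group(3) play the roles of $1 and $3."""
    stamp = m.group(1)
    # Escape &, < and > in the caption (not done by the VS Code replace).
    caption = html.escape(m.group(3), quote=False)
    # Assumed inline style standing in for "align the row's text to the top".
    return (
        '<tr style="vertical-align: top;">'
        f"<td><strong>{stamp}</strong></td><td>{caption}</td></tr>\n"
    )


def to_rows(pasted: str) -> str:
    """Turn alternating timestamp/caption lines into HTML table rows."""
    if not pasted.endswith("\n"):
        pasted += "\n"  # same role as the manual trailing carriage return
    return PAIR.sub(row, pasted)


rows = to_rows("0:08\nthank you\n1:02:15\nmore than an hour in")
print("<table>\n" + rows + "</table>")
```

Wrapping the rows in table tags, as the last line does, yields markup in the shape that gets pasted into the source code view of the transcript field.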
10:09 um so I've got my table row I did some
10:12 uh just inline Styles here to make sure
10:14 that all the text within that row
10:16 is aligned to the top I've got a table
10:19 cell opening I've got a strong around
10:21 the time stamp I've got a closing cell
10:24 I've got another cell here so it'll be a
10:27 two column table and there's my table
10:29 row and then I've got a new line here in
10:31 the editor now if I scroll down I want
10:34 to make sure that it's doing that for
10:36 other kind of time stamps as well right
10:38 so I've got two digits as a minutes here
10:41 and two digits here it looks like I
10:43 mean seconds and that looks fantastic if
10:47 this was one that ran over an
10:52 hour this would work
10:55 with that just as well so now I can take
10:58 this
10:59 copy that into my clipboard
11:02 and I can go back over to my site and
11:05 since I have added a transcript
11:09 here I can go into the source code
11:14 View
11:15 I can do my wrapping table markup
11:20 do a return paste and then close my table
11:25 and if I save that you'll see that looks
11:28 fantastic so now I can save this
11:31 save this and now because of my
11:37 conditional in the view back here
11:37 I said hey if if I don't have
11:41 um
11:42 if the value of transcript
11:47 is not null or white space then I want
11:51 to show this markup here I want to
11:54 render this markup and inject my
11:55 transcript into there so now if I
11:58 refresh the page right I've got a view
12:01 transcript and it's a nice collapsible
12:03 and I've got a beautifully formatted
12:05 transcript here and since that is done
12:09 in the way that it's done it's still
12:10 discoverable by search engines and
12:13 that's really nice so it's a little bit
12:14 of a manual process here and I do have
12:18 to have a way to remember
12:21 what I'm doing here so what I decided to
12:23 do is go ahead and just publish a public
12:26 gist out on GitHub and you can find that
12:30 if you go to my GitHub which is
12:32 david-poindexter is my username so
12:35 github.com or excuse me
12:38 gist.github.com
12:39 my username and you will see one there for
12:45 YouTube transcript replace
12:47 so this is the find characters and this
12:50 is the replace characters and hopefully
12:53 that'll help some of you out with
12:55 something like this but it's it's also
12:57 kind of a lesson in thinking a little
13:00 bit outside of the box you know I I
13:01 could have done all that manually but
13:03 think about all the time that that would
13:05 have taken to be able to go through all
13:08 that and for every single one of these
13:11 videos so now I can just very quickly
13:13 and we'll go through another one here
13:14 real quick do it I can very quickly go
13:18 out here and manage these so I've got to
13:21 go through here and update all of mine
13:22 so I've gotten through most of them here
13:24 so let's take this one here I'll open
13:26 the detail I can click on this I'll go
13:29 ahead and pop oops
13:32 that has the wrong Link in it so I need
13:35 to fix that one I'll come back to that
13:36 one in just a minute
13:38 let's grab this one
13:40 and go to the actual video I'll go ahead
13:42 and pause it so that it doesn't run
13:44 and we'll grab the transcript from this
13:46 so show transcript
13:49 I'll go ahead and swipe this down wait
13:53 till it gets to the end I'm wondering if
13:54 this one's more than an hour ah yeah
13:56 it's more than an hour so we can see an
13:58 example of that so I'll copy that into
13:59 my clipboard I'll go over to visual
14:02 studio code select all this paste I'll
14:06 do a carriage return at the end just to
14:08 make sure that my regex is going to
14:09 capture that last line I'll do a replace
14:12 beautiful I can select all that copy it
14:18 and actually before I go back over to
14:20 the website we'll just scroll down to
14:21 see what's past the one hour mark see it
14:24 handles that beautifully as well
14:26 so I'll go back over to the website now
14:29 go into edit
14:32 I'll come now and enter into the source
14:35 code view for my transcript field
14:38 I'll type table
14:40 return paste and then close my table
14:45 save
14:47 save save save save
14:49 and now we've got a button with the
14:51 transcript so I just need to go through
14:53 here and update all the old videos but
14:55 as new videos come out I publish those
14:57 I'll now have a way to put the
14:59 transcript in there and bring in that
15:01 extra SEO value so I hope this has been
15:04 beneficial uh to you I hope everybody
15:07 has a great day

RELATED VIDEOS

2sxc for January (2023)
Jan 06, 2023

How To Use 2sxc with Git/GitHub

Quick Win - The Rubber Duck Technique
May 01, 2022

Have you ever been working on something, and it feels like it should just be working, but it is not? You spend hours troubleshooting to find it is something simple in the end and you feel like an idiot? Trust me, you are not alone! We've all been there. In this Quick Win, I'll share with you a very simple technique that can save you time (and self-deprecation)!

Quick Win - Install DNN In Less Than 1 Minute!
Mar 20, 2022

Have you ever struggled to install DNN? Fear no more! There is a cool tool that can help you install in less than a minute - yep you heard me right!