jay_spirit's profile

1K Messages

 • 

30.2K Points

Thursday, June 9th, 2022 11:55 PM

Closed

Solved

Bot-created duplicate pages for podcasts

I'm seeing dozens of presumably bot-created duplicate pages for podcast series. See a few examples below.

This is a mess, because the two pages for each show can't be merged until all the duplicate episodes are merged first. I can't imagine anyone wanting to perform this cleanup.

Is the staff aware of this problem?

Accepted Solution

200 Messages

 • 

2.8K Points

3 years ago

Here is the list of all the podcasts in your screenshots. I will have the ones that are duplicates cleaned up. Sorry for any inconvenience this has caused you. Be sure to let me know if you find any other podcasts that might be duplicated.

Bitcoin Audible

https://www.imdb.com/title/tt20561522/?ref_=fn_al_tt_1 (Mine, clean)

https://www.imdb.com/title/tt15275612/?ref_=fn_al_tt_2

Will have the first one removed and cleaned up.

Break the Cycle w/Joshua Smith

https://www.imdb.com/title/tt14908762/?ref_=adv_li_tt

https://www.imdb.com/title/tt17535210/?ref_=adv_li_tt (Mine, clean)

This one is my fault. I do check whether the podcast or any similar episodes exist before I add it to my database. However, in this case the titles are one character apart (the space), meaning nothing popped up. I will try to be more aware next time.
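As a rough illustration of how a one-character difference like this can be caught, here is a minimal sketch that compares titles by a normalized key instead of the raw string. The function names and the second spelling of the title are hypothetical (the thread does not say where the extra space sits), and this is not the bot's actual check.

```python
import re
import unicodedata


def title_key(title: str) -> str:
    """Reduce a title to a comparison key: case-folded letters and digits only."""
    # NFKC folds compatibility forms (for example full-width letters) first.
    folded = unicodedata.normalize("NFKC", title).casefold()
    # Drop spaces, slashes and all other punctuation so they cannot hide a match.
    return re.sub(r"[\W_]+", "", folded)


def probably_same(a: str, b: str) -> bool:
    """True when two titles differ only by case, spacing or punctuation."""
    return title_key(a) == title_key(b)


# Hypothetical pair: the same title with and without a space after "w/".
existing = "Break the Cycle w/Joshua Smith"
incoming = "Break the Cycle w/ Joshua Smith"
match = probably_same(existing, incoming)
```

A key like this trades precision for recall: it flags candidates for a human to review, and two genuinely different shows could still collide on the same key.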

Church with Jesse Lee Peterson

https://www.imdb.com/title/tt9900166/?ref_=adv_li_tt 

https://www.imdb.com/title/tt18211870/?ref_=adv_li_tt (Mine, clean)

I am not quite sure how this was duplicated. However, this is my fault and I will clean it up. Most likely this is a podcast I added at the beginning of my bot program, when there were a lot of small errors that collectively let things slip through.

The Rubin Report

https://www.imdb.com/title/tt7315758/?ref_=adv_li_tt

https://www.imdb.com/title/tt16240138/?ref_=adv_li_tt (Mine, clean)

Same thing here. There should not have been any way it could have slipped through.

The Babylon Bee

One is a TV series, the other a podcast. They are not the same title.

Timcast IRL

https://www.imdb.com/title/tt11765340/?ref_=adv_li_tt 

https://www.imdb.com/title/tt16838568/?ref_=adv_li_tt (Mine, clean)

Will have the one removed.

Morning Wire

https://www.imdb.com/title/tt16549266/?ref_=adv_li_tt (Mine, clean)

https://www.imdb.com/title/tt14204720/?ref_=adv_li_tt 

Will have the first one removed.

Revolutions

https://www.imdb.com/title/tt12491298/?ref_=adv_li_tt 

https://www.imdb.com/title/tt16613392/?ref_=adv_li_tt (Mine, clean)

Will clean up tt16613392.

Candace

https://www.imdb.com/title/tt16549402/?ref_=adv_li_tt (Mine, clean)

https://www.imdb.com/title/tt14214952/?ref_=adv_li_tt

Will have it cleaned up.

The Ben Shapiro Show

https://www.imdb.com/title/tt7262670/?ref_=adv_li_tt (Mine)

https://www.imdb.com/title/tt16549254/?ref_=adv_li_tt (Clean)

For this one I am contributing to the correct page; however, I will clean up the second one.

The Matt Walsh Show

https://www.imdb.com/title/tt8356486/?ref_=adv_li_tt 

https://www.imdb.com/title/tt16549278/?ref_=adv_li_tt (Mine, clean)

This is another one of the early podcasts, will have it cleaned up.

The Charlie Kirk Show

https://www.imdb.com/title/tt13429184/?ref_=adv_li_tt 

https://www.imdb.com/title/tt16550644/?ref_=adv_li_tt (Mine, clean)

Will have it cleaned up.

Techmeme Ride Home

https://www.imdb.com/title/tt14940862/?ref_=adv_li_tt 

https://www.imdb.com/title/tt18232908/?ref_=adv_li_tt (Mine, clean)

Will have it cleaned up.

YOUR WELCOME with Michael Malice

https://www.imdb.com/title/tt7564880/?ref_=adv_li_tt 

https://www.imdb.com/title/tt16550720/?ref_=adv_li_tt (Clean)

Neither of these is in my database; however, I will do what I can to clean up tt16550720.

To summarize, I admit a lot of these are duplicates and that the fault is mine. I will do my best to clean up; thanks for letting me know. My program should be more robust, and I check whether a title already exists both manually and automatically to try to ensure it is not a duplicate. The recurring theme is that these podcasts were among my first 1000, where a few unfortunately slipped through the cracks.

1K Messages

 • 

30.2K Points

@Nomissimon10​ Thank you.

Both pages for THE BABYLON BEE are for the podcast. The external links bear this out. One of the pages simply hadn't yet been changed to "(Podcast Series)". Now it has. Those two pages need to be merged.

The reason you missed CHURCH WITH JESSE LEE PETERSON is probably because the original title was BOND SUNDAY SERVICE.

(edited)

200 Messages

 • 

2.8K Points

All of the podcast series mentioned here (not including THE BABYLON BEE) are now in the process of being cleaned up. About half are already gone. I will have a look at the remaining ones later on.

200 Messages

 • 

2.8K Points

@jay_spirit I will have a look at BABYLON BEE later. However, deletion requests for the following have been submitted and should be resolved within a short amount of time:

Bill O'Reilly's No Spin News and Analysis (tt17346308)

The Peter Attia Drive Podcast (tt17078106)

Real Coffee with Scott Adams (tt20918640)

Thanks again for letting me know.

(edited)

1K Messages

 • 

30.2K Points

@Nomissimon10​ 

The Young Turks (2005) (TV Series)

The Young Turks (2017) (Podcast Series)

Both of these pages refer to the same series. I believe (but am not certain) that THE YOUNG TURKS began as an online video series that was later released simultaneously as an audio-only podcast.

In any case, the two pages should be merged. The podcast and the subscription-based video series are the same show.

2.7K Messages

 • 

47K Points

@jay_spirit​ Are you sure that this requires a merger, though? Others in this thread have indicated that as long as a YouTube or TV show is separately released in podcast format, then it can also be separately listed on IMDb, even if the audio on the podcast format is identical to the other format. There seem to be two schools of thought and I'm not sure which one is correct.

1K Messages

 • 

30.2K Points

@keyword_expert​ 

I believe it would create a mess if we had separate pages for shows that have both a video version and an identical audio-only version. That would mean duplicate pages for thousands of shows, including most of the ones already cited on this thread, (e.g. Your Welcome with Michael Malice, The Ben Shapiro Show, The Matt Walsh Show, Real Coffee with Scott Adams, etc.).

The Young Turks, though, is a more ambiguous case. I believe it began as a video series and was released simultaneously in podcast form only later. Maybe there should be some discussion before the two are merged, but I hope they can be. Having two different pages would be confusing.

(edited)

Champion

 • 

14.5K Messages

 • 

331.1K Points

@keyword_expert​ 

When I said that Anderson Cooper 360 were not duplicates, I was mostly thinking that they were not two listings for podcasts, and that the show shouldn't end up listed only as a podcast. The podcast does seem to be audio from the TV show. It is not necessarily my opinion that the podcast needs a separate page. (I wouldn't use the word "identical" as the audio may well be edited.)

But if you want a policy discussion about this, I would suggest starting a new post.

200 Messages

 • 

2.8K Points

@Peter_pbn It could indeed be a good idea to start a new thread about that. To me it makes sense to keep them separate because, as you mentioned, the podcast audio might differ from the actual audio in the TV show. On top of that, video can add an important element that you would otherwise miss by just listening to the audio. That makes them two completely different experiences, meaning you would not necessarily give them the same rating. Therefore they should be listed separately.

2.7K Messages

 • 

47K Points

@Nomissimon10​ I don't have strong opinions on this either way, but my preference would be for allowing two separate pages: one for the TV/YouTube series, and the other for the podcast series.

Most podcasts will contain slightly edited audio from the counterpart video versions, if nothing else for the promotional segments.

I also agree one of us should start up a new thread. Anyone willing to do that?

(edited)

2.7K Messages

 • 

47K Points

@jay_spirit

It turns out that The Young Turks started as audio-only. According to these sources, The Young Turks started in 2002 on Sirius satellite radio, then became a YouTube show, then was carried on Air America.

I can't tell whether a satellite radio show qualifies as a podcast, and if not, when exactly this particular podcast started. The Young Turks calls itself "the longest running daily stream online" (I believe that's a reference to their video stream).

And as with many other podcasts, the podcast version of The Young Turks does appear to have slightly edited audio, making it different from the video version.

https://en.wikipedia.org/wiki/The_Young_Turks

https://podsearch.com/listing/the-young-turks-2.html

https://podsearch.com/listing/the-young-turks.html

(edited)

1K Messages

 • 

30.2K Points

The Archers (1950) (Podcast Series)

I created this page for the radio series that later (much later) became a podcast -- but still listed the start date as the date the radio series, not the podcast, began. The precedent I set hasn't been codified into the guidelines, but I hope it will be. It seems to me the sanest way to handle this situation, especially since IMDb is biased towards inclusion over exclusion. The more data, the better.

I'd also like to see the differences between video and audio versions of podcasts handled in the Alternate Versions section of each page. Having one page for each version would be a mess.

1.7K Messages

 • 

22.9K Points

@jay_spirit​ I remember Orson Welles' "Mercury Theatre on the Air" was deleted as a result of this thread: https://community-imdb.sprinklr.com/conversations/data-issues-policy-discussions/staff-radio-shows-old-radio-so-not-podcasts-not-allowed-but-not-deleted-either-what-is-the-policy-here/6176bfce203b6a57f4171ac6

I don't really see much difference between this "The Archers" page and that deleted "Mercury Theatre on the Air" page. As someone said there, it looks like it also became available as a podcast much later. So if this "The Archers" page is allowed in this condition (with a 1950 start date, old episodes listed and so on), then was the deletion of Orson Welles' radio show a mistake? Or is allowing The Archers in this condition, with the old date and episodes, a mistake?

(one difference may be Archers still continues but Mercury Theatre does not?)

(edited)

1K Messages

 • 

30.2K Points

@mbmb​ 

The only difference I see is that THE ARCHERS became a podcast with new episodes, while MERCURY THEATRE ON THE AIR was strictly a podcast with old episodes.

Even if old radio shows aren't eligible (yet!), it seems to me that the podcast should have remained with the current podcast dates listed.

I don't know. Maybe there has to be some extra material for the podcast to count. For instance, THE GREAT DETECTIVES OF OLD TIME RADIO plays old radio shows, but has introductions and epilogues by the host.

(edited)

Champion

 • 

4K Messages

 • 

244.1K Points

@jay_spirit Radio still isn't eligible. I would argue that radio-turned-podcast should abide by the archive footage (well, archive sound in this case) and shell episode policies.

1K Messages

 • 

30.2K Points

@MykolaYeriomin​ 

I agree if you're talking about a podcast like The Great Detectives of Old Time Radio. That program plays old radio shows, but with extra material. The current-day host has normal voice credits, but the actors in the old radio plays all have archive sound credits.

Ditto if you're talking about a podcast like The ComicWeb, which plays old radio programs without extra material. All the credits for this series are archive sound credits.

I assume you're not talking about a show like The Archers, which began as a radio program in 1950, but in this century is now both a radio program and a podcast. I assume that you would not want the 1950 episodes of this series to have archive sound credits.

(edited)

Champion

 • 

4K Messages

 • 

244.1K Points

@jay_spirit​ Makes sense, if it's a unique case like that. 

1K Messages

 • 

30.2K Points

3 years ago

There seem to be three reasons this is happening.

1. The existing page has a start date different from what the bot thinks it should be.

2. The existing page has a slightly different title from what the bot thinks it should be.

3. The existing page has an IMDb display title that is IDENTICAL to the title the bot thinks it should be, BUT the bot only sees the original title, which is different.
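The three causes above suggest what a duplicate check would need to do: ignore the start date, normalize the title, and compare against both the display title and the original title. The following is a minimal sketch under those assumptions. The `SeriesPage` fields, the IDs and the years are placeholders made up for illustration, and this is not how IMDb or the bot actually stores or searches titles.

```python
import re
from dataclasses import dataclass


@dataclass
class SeriesPage:
    page_id: str          # placeholder ID, not a real tt number
    display_title: str
    original_title: str
    start_year: int       # deliberately NOT part of the match (cause 1)


def _key(title: str) -> str:
    # Case-fold and strip spacing/punctuation so small differences match (cause 2).
    return re.sub(r"[\W_]+", "", title.casefold())


def find_possible_duplicates(candidate_title: str, pages: list[SeriesPage]) -> list[SeriesPage]:
    """Return pages whose display OR original title matches the candidate (cause 3)."""
    wanted = _key(candidate_title)
    return [
        page for page in pages
        if wanted in {_key(page.display_title), _key(page.original_title)}
    ]


# Illustrative catalog; titles come from the thread, IDs and years are placeholders.
catalog = [
    SeriesPage("example-1", "Church with Jesse Lee Peterson", "Bond Sunday Service", 2000),
    SeriesPage("example-2", "Louder with Crowder", "Louder with Crowder", 2014),
    SeriesPage("example-3", "Louder with Crowder", "Louder with Crowder", 2015),
]

by_display = find_possible_duplicates("church with jesse lee peterson", catalog)
by_original = find_possible_duplicates("Bond Sunday Service", catalog)
across_years = find_possible_duplicates("Louder With Crowder", catalog)
```

Returning every match rather than a yes/no answer leaves the final decision to a reviewer, which matters for cases where two pages legitimately share a title.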

1 Message

 • 

64 Points

3 years ago

This bot is operated by @Nomissimon10 .

He didn't check duplicate titles/episodes and release dates before submitting. THIS MUST STOP RIGHT NOW!

Someone already noticed in January, but no action was taken by staff members.

100,000+ podcast episodes have been submitted with incorrect country name. Possibly done by Nomissimon10.
https://community-imdb.sprinklr.com/conversations/data-issues-policy-discussions/100000-podcast-episodes-have-been-submitted-with-incorrect-country-name-possibly-done-by-nomissimon10/61eb681523c1a32f12dd2ac9

2.7K Messages

 • 

47K Points

@Nomissimon10  Care to comment?

2.7K Messages

 • 

47K Points

@sarz​ I wonder if @Nomissimon10 is also responsible for these poorly formatted keywords?

scandinavia-eastern-europe (120 titles)

sweden-eastern-europe (119 titles)

norway-eastern-europe (3 titles)

finland-northern-europe (2 titles)

200 Messages

 • 

2.8K Points

@keyword_expert I don't have anything to do with keywords, or with any of these titles at all. Also, one ping is enough; tagging me a second and third time while I am asleep does not get me to reply any faster.

(edited)

200 Messages

 • 

2.8K Points

@sarz Please link me the podcasts in question and I will check whether I am the one responsible. If so, I will make a program that cleans them up. (I will clean up regardless, but especially if it is my wrongdoing.)

Also, the response in the post you linked is "Release Date is not the same thing as 'country of origin'". So there is no wrongdoing.

(edited)

2.7K Messages

 • 

47K Points

3 years ago

@Nomissimon10:  Can you please look into whether you are also responsible for these duplicate podcast submissions discussed in these threads started by @tom_wake and @Vande?

Someone keeps adding duplicate episodes to podcast series

Duplications from new Podcast Series category

(edited)

2.7K Messages

 • 

47K Points

3 years ago

@Nomissimon10

I may have found a couple more duplicates. 

Anderson Cooper 360 (2021) (Podcast Series)
Anderson Cooper 360° (2003) (TV Series)

Louder with Crowder (2015) (Podcast Series)
Louder with Crowder (2014) (Podcast Series)

Champion

 • 

14.5K Messages

 • 

331.1K Points

Anderson Cooper 360 is a TV show and an audio podcast, so no duplicate.

200 Messages

 • 

2.8K Points

@keyword_expert​ The duplicate Louder with Crowder (2014) - tt16549258 has been submitted for deletion. So it should be fixed relatively soon.

2.7K Messages

 • 

47K Points

3 years ago

@Nomissimon10 

Some more potential duplicates (some of these say "TV" and the counterpart one "Podcast," but I'm not sure I buy that distinction):

Turley Talks (2020) (Podcast Series)

The Rubin Report (2013) (Podcast Series)
The Rubin Report (2015) (Podcast Series)

Poster Dan Boningo Show (2017) (Podcast Series)

Dan Boningo Show (2017) (Podcast Series)

Slate Daily Feed (2020) (Podcast Series)

Slate Plus Podcasts (2021) (Podcast Series)

That last one has two different names, but the same exact episodes on both series.

https://www.imdb.com/title/tt17372102/episodes?year=2022

https://www.imdb.com/title/tt20429390/episodes?year=2022

And you might also want to have a look at this one, which has many of the same episodes as the prior two titles.

Slate News (2016) (Podcast Series)

(edited)

200 Messages

 • 

2.8K Points

3 years ago

Currently AFK, but will have a look at more of the posted links here once I get back online. For now the duplicate podcast Bitcoin Audible has been removed. Expect the rest to be gone by the end of next week.

2.7K Messages

 • 

47K Points

@Nomissimon10

Have you noticed that the titles added by your bots often have text glitches in the names/titles of the episode?

Here are some examples:

Mon. 01/03 âEUR" NFTs On Smart TVs?

Wed. 01/05 âEUR" SonyâEURs VR Version 2 Deets

Thu. 01/06 aEU\' Google Brings aEUoeIt Just WorksaEU To Everything

Ok, GroomerâEUR¦

The Devolution of Justice: From Scalia to KBJâEUR"The First BlackâEUR¦Woman?

I am guessing this has something to do with your bots' inability to properly detect, capture, and/or display certain punctuation marks within titles. Some examples are em-dashes, ellipses, and quotation marks.

For example, the last title shown above should be displayed like this (the correct title includes an em-dash in the middle and an ellipsis near the end):

The Devolution of Justice: From Scalia to KBJ—The First Black…Woman? 

And the third title listed above should simply have quotation marks in the middle:

Google Brings “It Just Works” To Everything
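Glitches of this shape are typical of UTF-8 text being decoded as Windows-1252 somewhere along the way, although the thread does not confirm that this is what happened here. The sketch below reproduces that fault on the corrected title and reverses it. The function names are hypothetical. Note that the samples in this thread show "EUR" where a euro sign would normally appear, which suggests a further ASCII transliteration step that this round trip cannot undo.

```python
def garble(text: str) -> str:
    """Reproduce the fault: UTF-8 bytes misread as Windows-1252."""
    return text.encode("utf-8").decode("cp1252")


def repair(text: str) -> str:
    """Reverse the misread; return the input unchanged if it does not round-trip."""
    try:
        return text.encode("cp1252").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Text that was never garbled this way (or was damaged further) is left alone.
        return text


# \u2014 is the em dash and \u2026 the ellipsis from the corrected title above.
correct = "The Devolution of Justice: From Scalia to KBJ\u2014The First Black\u2026Woman?"
broken = garble(correct)

# The em dash has become three characters: a-circumflex, euro sign, right double quote.
em_dash_garbled = "\u00e2\u20ac\u201d" in broken
restored = repair(broken) == correct
```

Capturing the feed as bytes and decoding it explicitly as UTF-8 at the point of ingestion would avoid the problem entirely, which is generally preferable to repairing titles afterwards.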

Then there is stuff like this. I don't know if this is yours or not, but wow!

dY~!dY~!dY~!dY~!dY~!dY~ dY~ dY~ dY~ dY~ dY~ dY~^dY~^dY~^dY~^dY~^dY~^dY~$?dY~$?dY~$?dY~$?dY~$?dY$?!dY$?!dY$?!dY$?!dY$?!

In recent weeks, I have noticed a lot (like, tens of thousands) of these podcast episodes with glitched titles. Some of them can be spotted in these title searches:

Title Matching "âEUR", Podcast Episode (Sorted by Popularity Ascending)

Title Matching "aEU", Podcast Episode (Sorted by Popularity Ascending)

Title Matching "dY!", Podcast Episode (Sorted by Popularity Descending)

2.7K Messages

 • 

47K Points

@Nomissimon10 

By the way, one of the reasons why I think this one might be the work of your bots is that the series page for that particular title uses the idiosyncratic phrase "The link to the Spotify," which also appears on many of the podcast series that you have claimed in this thread.

Curiously, your links that read "The link to the Spotify" don't even go to the actual podcast series, but rather just the Spotify domain (https://www.spotify.com/). 

I see you have also often added links to the actual podcast series that read "Link to the podcast on Spotify." 

As for the links to the Spotify webpage generally, there is no reason to do that on specific podcast series. When I first saw those links a few weeks ago, I assumed this is someone being paid by Spotify to take advantage of linking abilities on IMDb to run de facto advertisements for Spotify.

According to a recent Google search, there are as many as 11,000 podcast series on IMDb with your phrase "The link to the Spotify" displaying on the series title page.

(edited)

2.7K Messages

 • 

47K Points

Still another thing I've noticed is that for some series, your bots are adding to the correct series title, but they are adding duplicates of episodes, and the duplicates have slight deviations in the episode name (for example, apostrophes are appended to the beginning and end of the name).

Here's an example:

Official episode: Rebreyerment

Bot-added episode: 'Rebreyerment.'

I don't know if that example is from your bots, but I see other episodes in that same series that have the same above-discussed "âEUR" and "aEU" glitches in the episode title names (like this one, for example), so I think it's a fair assumption that your bots are adding to this series as well.

(edited)

200 Messages

 • 

2.8K Points

@keyword_expert Thanks for the feedback. I am still not back at my computer, but I will have a proper look at all of your posts once I have the time. An easier way to keep this conversation going would be to add me on Discord, if you have it. My Discord tag is: Nomissimon10#6948. Otherwise you can reach me by email: simonbjornstad4@gmail.com.

 

(edited)

200 Messages

 • 

2.8K Points

@keyword_expert 11,000 is probably not the correct number. I have contributed ~4,500 podcasts (probably a bit fewer: 4,138 according to my DB, excluding podcasts from NRK and other Norwegian and Swedish podcast pages), and a third of them do not have "The link to the Spotify...". However, after inspecting the code I found the error that led to the link just going to https://spotify.com/ as opposed to

This is now being corrected for all podcasts; expect all links to lead to the correct URL by tomorrow. (Probably faster, to be honest, as updates get processed within the minute.)

(edited)

200 Messages

 • 

2.8K Points

@keyword_expert​ Update: I am removing all of my Official Links and replacing them with Miscellaneous Links pointing to the correct url.

2.7K Messages

 • 

47K Points

@Nomissimon10​ Sounds good. Most of the titles with your Official Links already did have your Miscellaneous Links.

1K Messages

 • 

30.2K Points

3 years ago

The reason you missed MORNING WIRE may be that the original title is -- inexplicably -- listed as THE CANDACE OWENS SHOW.

That is false. The original title was and still is MORNING WIRE. I've tried to correct this false title twice, but both submissions were rejected. Maybe you'll have better luck when you merge the two pages for MORNING WIRE.

My guess is that someone took a duplicate page for THE CANDACE OWENS SHOW (which is an actual podcast and has its own page) and transformed it into a page for MORNING WIRE. The duplicate page was probably an empty shell, with little more than a title. Rather than creating a new title for MORNING WIRE, someone may have repurposed an unnecessary page.

(edited)

2.7K Messages

 • 

47K Points

3 years ago

@Nomissimon10 

1. Dark Woods (I) (2021 Podcast Series)

Dark Woods

2. Dark Woods (II) (2021 Podcast Series)

____

True Crime Daily the Podcast

1. True Crime Daily the Podcast (2020 Podcast Series)

True Crime Daily

2. True Crime Daily (2019 Podcast Series)

____

This next one does not look like your work, but it is a duplicate podcast series nonetheless.

Cold Case Files

1. Cold Case Files (2017 Podcast Series)

Cold Case Files: The Podcast

2. Cold Case Files: The Podcast (2017 Podcast Series)

8.6K Messages

 • 

176.8K Points

@keyword_expert​ 

Display tt numbers to see order of entries

1. Dark Woods (I) (2021 Podcast Series)

https://www.imdb.com/title/tt16383414/

2. Dark Woods (II) (2021 Podcast Series)

https://www.imdb.com/title/tt17079556/

1. True Crime Daily the Podcast (2020 Podcast Series)

https://www.imdb.com/title/tt17371892/

2. True Crime Daily (2019 Podcast Series)

https://www.imdb.com/title/tt11959084/

1. Cold Case Files (2017 Podcast Series)

https://www.imdb.com/title/tt16365754/

2. Cold Case Files: The Podcast (2017 Podcast Series)

https://www.imdb.com/title/tt14973052/

I will delete this later...

(edited)

Employee

 • 

17.6K Messages

 • 

314.4K Points

3 years ago

Hi @Nomissimon10 -

Thanks for posting the details regarding the recent data clean-ups. We reached out to you directly for some additional information; you are welcome to reply to our email directly with any questions.

Cheers!

200 Messages

 • 

2.8K Points

@Michelle​ I have replied to the questions you had in the mail.

2.7K Messages

 • 

47K Points

3 years ago

@Nomissimon10

Do you have any comment on the text/character glitches in your podcast titles?

Here are a few more examples:

1853 - TBT: aEUoeNot Your MotheraEUs ResumA(c)aEU Service Doubles Business

1831 - Q&A: aEUoeIaEUd like to sell an Apple Watch guideaEU-aEU 

1829 - First $1,000: aEUoeDonaEUt give upaEU\'your future self will thank you!aEU

There are tens of thousands more examples like that. 

200 Messages

 • 

2.8K Points

@keyword_expert As I mentioned, I would prefer that matters like that be discussed by email or on Discord, so that this thread does not get spammed. I only need the unique occurrences of a faulty character. One example is –, which is supposed to be a -: because the creator of the episode used a special dash character rather than a regular hyphen, it converts wrongly. Every time I get a new occurrence and find out what it is supposed to be, I can fix ALL of the wrongly formatted titles with that fault. I also have a proposition if you want to help me keep duplicates low.

As for the Official Links, 3,426 of 4,138 have been removed, so expect them all to be gone by the end of the day.
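The per-character fix described here amounts to a lookup table from each garbled sequence to the character it should have been, applied to every affected title. A minimal sketch follows. The table and function names are hypothetical, and the three entries are only the ones whose intended characters were spelled out earlier in this thread.

```python
# Hypothetical table: garbled sequence -> intended character.
REPLACEMENTS = {
    'âEUR"': "\u2014",  # em dash, as in the "KBJ" title
    "âEUR¦": "\u2026",  # ellipsis, as in the same title
    "aEUoe": "\u201c",  # opening double quotation mark
}


def fix_title(title: str, table: dict[str, str] = REPLACEMENTS) -> str:
    """Replace every known garbled sequence in a title."""
    # Longest sequences first, so a short entry never pre-empts a longer one.
    for bad in sorted(table, key=len, reverse=True):
        title = title.replace(bad, table[bad])
    return title


garbled = 'The Devolution of Justice: From Scalia to KBJâEUR"The First BlackâEUR¦Woman?'
fixed = fix_title(garbled)
```

Adding one entry to the table then corrects every title containing that sequence, which matches the workflow described above. Short or ambiguous sequences deserve caution, because a blind replacement could also alter titles that contain those letters legitimately.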

(edited)

200 Messages

 • 

2.8K Points

@keyword_expert Do you have a place where I can contact you privately?