jay_spirit's profile

986 Messages

 • 

29.3K Points

Thursday, June 9th, 2022 11:55 PM

Closed

Solved

Bot-created duplicate pages for podcasts

I'm seeing dozens of presumably bot-created duplicate pages for podcast series. See a few examples below.

This is a mess, because the two pages for each show can't be merged until all the duplicate episodes are merged first. I can't imagine anyone wanting to perform this cleanup.

Is the staff aware of this problem?

Accepted Solution

193 Messages

 • 

2.7K Points

2 years ago

Here is the list of all the podcasts you took a picture of. I will have the ones that are duplicates cleaned up. Sorry for any inconvenience this has caused you. Be sure to let me know if you find any other podcasts that might be duplicated.

Bitcoin Audible

https://www.imdb.com/title/tt20561522/?ref_=fn_al_tt_1 (Mine, clean)

https://www.imdb.com/title/tt15275612/?ref_=fn_al_tt_2

Will have the first one removed and cleaned up.

Break the Cycle w/Joshua Smith

https://www.imdb.com/title/tt14908762/?ref_=adv_li_tt

https://www.imdb.com/title/tt17535210/?ref_=adv_li_tt (Mine, clean)

This one is my fault. I do check whether a podcast, or any similar episodes, already exist before I add it to my database. However, in this case the titles are one character apart (the space), so nothing popped up. I will try to be more aware next time.
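A one-character difference like that can be caught by comparing normalized keys instead of raw titles. The following is only a minimal sketch, not the bot's actual check: the function names are hypothetical, and the exact position of the extra space in the two real titles is assumed for illustration.

```python
import re
import unicodedata


def title_key(title: str) -> str:
    """Build a comparison key that ignores case, spacing, and punctuation."""
    # Fold compatibility forms and case so visually identical titles compare equal.
    folded = unicodedata.normalize("NFKC", title).casefold()
    # Drop every non-word character (spaces, slashes, quotes) and underscores.
    return re.sub(r"[\W_]+", "", folded)


def is_probable_duplicate(new_title: str, existing_titles: list[str]) -> bool:
    """Return True if any existing title shares the new title's key."""
    key = title_key(new_title)
    return any(title_key(t) == key for t in existing_titles)


# Assumed spellings: the thread only says the titles differ by one space.
existing = ["Break the Cycle w/ Joshua Smith"]
print(is_probable_duplicate("Break the Cycle w/Joshua Smith", existing))
```

A key match is best treated as a prompt for manual review rather than proof, since two different shows can share a name.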

Church with Jesse Lee Peterson

https://www.imdb.com/title/tt9900166/?ref_=adv_li_tt 

https://www.imdb.com/title/tt18211870/?ref_=adv_li_tt (Mine, clean)

I am not quite sure how this was duplicated. However, this is my fault and I will clean it up. Most likely this is a podcast I added at the beginning of my bot program, when there were a lot of small errors that collectively let things slip through.

The Rubin Report

https://www.imdb.com/title/tt7315758/?ref_=adv_li_tt

https://www.imdb.com/title/tt16240138/?ref_=adv_li_tt (Mine, clean)

Same thing here. There should not have been any way it could have slipped through.

The Babylon Bee

One is a TV series, the other a podcast. They are not the same.

Timcast IRL

https://www.imdb.com/title/tt11765340/?ref_=adv_li_tt 

https://www.imdb.com/title/tt16838568/?ref_=adv_li_tt (Mine, clean)

Will have the one removed.

Morning Wire

https://www.imdb.com/title/tt16549266/?ref_=adv_li_tt (Mine, clean)

https://www.imdb.com/title/tt14204720/?ref_=adv_li_tt 

Will have the first one removed.

Revolutions

https://www.imdb.com/title/tt12491298/?ref_=adv_li_tt 

https://www.imdb.com/title/tt16613392/?ref_=adv_li_tt (Mine, clean)

Will clean up tt16613392.

Candace

https://www.imdb.com/title/tt16549402/?ref_=adv_li_tt (Mine, clean)

https://www.imdb.com/title/tt14214952/?ref_=adv_li_tt

Will have it cleaned up.

The Ben Shapiro Show

https://www.imdb.com/title/tt7262670/?ref_=adv_li_tt (Mine)

https://www.imdb.com/title/tt16549254/?ref_=adv_li_tt (Clean)

For this one I am contributing to the correct page; however, I will clean up the second one.

The Matt Walsh Show

https://www.imdb.com/title/tt8356486/?ref_=adv_li_tt 

https://www.imdb.com/title/tt16549278/?ref_=adv_li_tt (Mine, clean)

This is another one of the early podcasts, will have it cleaned up.

The Charlie Kirk Show

https://www.imdb.com/title/tt13429184/?ref_=adv_li_tt 

https://www.imdb.com/title/tt16550644/?ref_=adv_li_tt (Mine, clean)

Will have it cleaned up.

Techmeme Ride Home

https://www.imdb.com/title/tt14940862/?ref_=adv_li_tt 

https://www.imdb.com/title/tt18232908/?ref_=adv_li_tt (Mine, clean)

Will have it cleaned up.

YOUR WELCOME with Michael Malice

https://www.imdb.com/title/tt7564880/?ref_=adv_li_tt 

https://www.imdb.com/title/tt16550720/?ref_=adv_li_tt (Clean)

Neither of these is in my database; however, I will do what I can to clean up tt16550720.

To summarize, I admit that a lot of these are duplicates and that the fault is mine. I will do my best to clean up; thanks for letting me know. My program should be more robust, and my checks for whether a podcast already exists are done both manually and automatically to try to ensure it is not a duplicate. The recurring theme for these podcasts is that they were among my first 1,000 podcasts, where a few unfortunately slipped through the cracks.

986 Messages

 • 

29.3K Points

2 years ago

There seem to be three reasons this is happening.

1. The existing page has a start date different from what the bot thinks it should be.

2. The existing page has a slightly different title from what the bot thinks it should be.

3. The existing page has an IMDb display title that is IDENTICAL to the title the bot thinks it should be, BUT the bot only sees the original title, which is different.
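The three reasons above suggest a lookup that compares against both stored titles and leaves the start year out of the match. This is a rough sketch under stated assumptions: the `ExistingPage` record, its field names, and the placeholder ID are invented for illustration and do not describe IMDb's data model or the bot's code.

```python
from dataclasses import dataclass


@dataclass
class ExistingPage:
    tt_id: str            # placeholder identifier, not a real title ID
    original_title: str
    display_title: str
    start_year: int


def _key(title: str) -> str:
    # Case- and whitespace-insensitive comparison key (covers small title drift).
    return "".join(title.casefold().split())


def find_candidates(feed_title: str, pages: list[ExistingPage]) -> list[str]:
    """Return IDs of pages whose original OR display title matches the feed title.

    The start year is deliberately not part of the match: a differing year
    should prompt a manual check, not the creation of a new page.
    """
    wanted = _key(feed_title)
    return [
        p.tt_id
        for p in pages
        if wanted in {_key(p.original_title), _key(p.display_title)}
    ]


pages = [ExistingPage("tt0000001", "Some Older Name", "Example Podcast", 2019)]
print(find_candidates("example podcast", pages))
```

Larger title differences than case or spacing would need fuzzy matching on top of this, which the sketch does not attempt.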

8.4K Messages

 • 

175.2K Points

@jay_spirit​ 

? ?

Add links to check status later

🙄

.

1 Message

 • 

64 Points

2 years ago

This bot is operated by @Nomissimon10 .

He didn't check duplicate titles/episodes and release dates before submitting. THIS MUST STOP RIGHT NOW!

Someone already noticed this in January, but no action was taken by staff members.

100,000+ podcast episodes have been submitted with incorrect country name. Possibly done by Nomissimon10.
https://community-imdb.sprinklr.com/conversations/data-issues-policy-discussions/100000-podcast-episodes-have-been-submitted-with-incorrect-country-name-possibly-done-by-nomissimon10/61eb681523c1a32f12dd2ac9

2.7K Messages

 • 

47K Points

@Nomissimon10  Care to comment?

2.7K Messages

 • 

47K Points

@sarz​ I wonder if @Nomissimon10 is also responsible for these poorly formatted keywords?

scandinavia-eastern-europe (120 titles)

sweden-eastern-europe (119 titles)

norway-eastern-europe (3 titles)

finland-northern-europe (2 titles)

193 Messages

 • 

2.7K Points

@keyword_expert I don't have anything to do with keywords, or with any of these titles at all. Also, one ping is enough; tagging me a second and third time while I am asleep does not get me to reply faster.

(edited)

193 Messages

 • 

2.7K Points

@sarz Please link me the podcasts in question and I will check whether I am the one responsible, and if so make a program that cleans them up. (I will clean up regardless, but especially if it is my wrongdoing.)

Also, in the post you linked, the response is "Release Date is not the same thing as 'country of origin'". So there is no wrongdoing.

(edited)

2.7K Messages

 • 

47K Points

2 years ago

@Nomissimon10:  Can you please look into whether you are also responsible for these duplicate podcast submissions discussed in these threads started by @tom_wake and @Vande?

Someone keeps adding duplicate episodes to podcast series

Duplications from new Podcast Series category

(edited)

2.7K Messages

 • 

47K Points

2 years ago

@Nomissimon10

I may have found a couple more duplicates. 

Anderson Cooper 360 (2021) (Podcast Series)
Anderson Cooper 360° (2003) (TV Series)

Louder with Crowder (2015) (Podcast Series)
Louder with Crowder (2014) (Podcast Series)

Champion

 • 

14.3K Messages

 • 

328.6K Points

Anderson Cooper 360 is a TV show and an audio podcast, so no duplicate.

193 Messages

 • 

2.7K Points

@keyword_expert​ The duplicate Louder with Crowder (2014) - tt16549258 has been submitted for deletion. So it should be fixed relatively soon.

2.7K Messages

 • 

47K Points

2 years ago

@Nomissimon10 

Some more potential duplicates (some of these say "TV" and the counterpart one "Podcast," but I'm not sure I buy that distinction):

Turley Talks (2020) (Podcast Series)

The Rubin Report (2013) (Podcast Series)
The Rubin Report (2015) (Podcast Series)

Dan Boningo Show (2017) (Podcast Series)

Dan Boningo Show (2017) (Podcast Series)

Slate Daily Feed (2020) (Podcast Series)

Slate Plus Podcasts (2021) (Podcast Series)

That last one has two different names, but the same exact episodes on both series.

https://www.imdb.com/title/tt17372102/episodes?year=2022

https://www.imdb.com/title/tt20429390/episodes?year=2022

And you might also want to have a look at this one, which has many of the same episodes as the prior two titles.

Slate News (2016) (Podcast Series)
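One way to quantify "the same exact episodes on both series" is to measure how much two episode lists overlap. The snippet below is only an illustrative sketch; the function name and the sample episode names are made up, and nothing here reflects how IMDb or the bot compares series.

```python
def episode_overlap(a: set[str], b: set[str]) -> float:
    """Jaccard similarity of two sets of episode names (0.0 to 1.0)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


# Made-up episode names, for illustration only.
series_a = {"Episode 1", "Episode 2", "Episode 3"}
series_b = {"Episode 2", "Episode 3", "Episode 4"}
print(episode_overlap(series_a, series_b))
```

A score near 1.0 between two differently named series would be a strong hint that they are the same feed listed twice.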

(edited)

193 Messages

 • 

2.7K Points

2 years ago

Currently AFK, but will have a look at more of the posted links here once I get back online. For now the duplicate podcast Bitcoin Audible has been removed. Expect the rest to be gone by the end of next week.

2.7K Messages

 • 

47K Points

@Nomissimon10

Have you noticed that the titles added by your bots often have text glitches in the episode names?

Here are some examples:

Mon. 01/03 âEUR" NFTs On Smart TVs?

Wed. 01/05 âEUR" SonyâEURs VR Version 2 Deets

Thu. 01/06 aEU\' Google Brings aEUoeIt Just WorksaEU To Everything

Ok, GroomerâEUR¦

The Devolution of Justice: From Scalia to KBJâEUR"The First BlackâEUR¦Woman?

I am guessing this has something to do with your bots' inability to properly detect, capture, and/or display certain punctuation marks within titles. Some examples are em-dashes, ellipses, and quotation marks.

For example, the last title shown above should be displayed like this (the correct title includes an em-dash in the middle and an ellipsis near the end):

The Devolution of Justice: From Scalia to KBJ—The First Black…Woman? 

And the third title listed above should simply have quotation marks in the middle:

Google Brings “It Just Works” To Everything
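The pattern in these examples is consistent with a common encoding fault: text encoded as UTF-8 but decoded as Windows-1252, with the result then reduced to plain ASCII (so that the euro sign becomes "EUR"). That is an inference from the examples, not something confirmed about the bot. A minimal sketch of the first step, with hypothetical function names:

```python
def garble(text: str) -> str:
    """Reproduce the suspected fault: UTF-8 bytes read as Windows-1252."""
    return text.encode("utf-8").decode("cp1252")


def repair(text: str) -> str:
    """Reverse the fault, provided no bytes were lost along the way."""
    return text.encode("cp1252").decode("utf-8")


title = "The Devolution of Justice: From Scalia to KBJ\u2014The First Black\u2026Woman?"
garbled = garble(title)

# The em-dash (U+2014) turns into three characters: U+00E2, U+20AC, U+201D.
# Writing the euro sign as "EUR" and the curly quote as '"' gives the
# five-character sequence seen in the glitched titles.
print(garbled)
print(repair(garbled) == title)
```

The round trip only works while the intermediate characters are intact. Once they have been flattened to ASCII, the original punctuation has to be restored from a lookup of known sequences instead.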

Then there is stuff like this. I don't know if this is yours or not, but wow!

dY~!dY~!dY~!dY~!dY~!dY~ dY~ dY~ dY~ dY~ dY~ dY~^dY~^dY~^dY~^dY~^dY~^dY~$?dY~$?dY~$?dY~$?dY~$?dY$?!dY$?!dY$?!dY$?!dY$?!

In recent weeks, I have noticed a lot (like, tens of thousands) of these podcast episodes with glitched titles. Some of them can be spotted in these title searches:

Title Matching "âEUR", Podcast Episode (Sorted by Popularity Ascending)

Title Matching "aEU", Podcast Episode (Sorted by Popularity Ascending)

Title Matching "dY!", Podcast Episode (Sorted by Popularity Descending)

2.7K Messages

 • 

47K Points

@Nomissimon10 

By the way, one of the reasons why I think this one might be the work of your bots is that the series page for that particular title uses the idiosyncratic phrase "The link to the Spotify," which also appears on many of the podcast series that you have claimed in this thread.

Curiously, your links that read "The link to the Spotify" don't even go to the actual podcast series, but rather just the Spotify domain (https://www.spotify.com/). 

I see you have also often added links to the actual podcast series that read "Link to the podcast on Spotify." 

As for the links to the Spotify webpage generally, there is no reason to do that on specific podcast series. When I first saw those links a few weeks ago, I assumed this is someone being paid by Spotify to take advantage of linking abilities on IMDb to run de facto advertisements for Spotify.

According to a recent Google search, there are as many as 11,000 podcast series on IMDb with your phrase "The link to the Spotify" displaying on the series title page.

(edited)

2.7K Messages

 • 

47K Points

Still another thing I've noticed is that for some series, your bots are adding to the correct series title, but they are adding duplicates of episodes, and the duplicates have slight deviations in the episode name (for example, apostrophes are appended to the beginning and end of the name).

Here's an example:

Official episode: Rebreyerment

Bot-added episode: 'Rebreyerment.'
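Duplicates of this kind could be caught by comparing episode names after stripping wrapping quotes and trailing periods. This is a sketch of the idea only; `episode_key` is a hypothetical helper, not part of any existing tool.

```python
def episode_key(name: str) -> str:
    """Strip wrapping quotes and trailing periods before comparing episode names."""
    quotes = "'\"\u2018\u2019\u201c\u201d"
    cleaned = name.strip()
    # Peel quotes and trailing periods repeatedly: "'Rebreyerment.'" has both.
    while True:
        stripped = cleaned.strip(quotes).rstrip(".").strip()
        if stripped == cleaned:
            break
        cleaned = stripped
    return cleaned.casefold()


print(episode_key("'Rebreyerment.'") == episode_key("Rebreyerment"))
```

The key is meant for comparison only; the stored episode name should keep whatever punctuation the official title uses.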

I don't know if that example is from your bots, but I see other episodes in that same series that have the same above-discussed "âEUR" and "aEU" glitches in the episode title names (like this one, for example), so I think it's a fair assumption that your bots are adding to this series as well.

(edited)

193 Messages

 • 

2.7K Points

@keyword_expert​ thanks for the feedback, still not back to my computer. Will have a proper look at all of your posts once I have the time to. However an easier way to keep this conversation going would be if you added me on for example Discord. (If you have Discord) My Discord tag is: Nomissimon10#6948. Otherwise you can reach me on mail: simonbjornstad4@gmail.com.

 

(edited)

193 Messages

 • 

2.7K Points

@keyword_expert 11,000 is probably not the correct number. I have contributed ~4,500 podcasts (probably a bit fewer: 4,138 according to my DB, excluding podcasts from NRK and other Norwegian and Swedish podcast pages), and a third of them do not have the "The link to the Spotify..." text. However, after inspecting the code I found the error that led to the link just going to https://spotify.com/ as opposed to

This is now being corrected for all podcasts; expect all links to lead to the correct URL by tomorrow. (Probably faster, to be honest, as the updates get processed within the minute.)

(edited)

193 Messages

 • 

2.7K Points

@keyword_expert​ Update: I am removing all of my Official Links and replacing them with Miscellaneous Links pointing to the correct url.

2.7K Messages

 • 

47K Points

@Nomissimon10​ Sounds good. Most of the titles with your Official Links already did have your Miscellaneous Links.

986 Messages

 • 

29.3K Points

2 years ago

The reason you missed MORNING WIRE may be that the original title is -- inexplicably -- listed as THE CANDACE OWENS SHOW.

That is false. The original title was and still is MORNING WIRE. I've tried to correct this false title twice, but both submissions were rejected. Maybe you'll have better luck when you merge the two pages for MORNING WIRE.

My guess is that someone took a duplicate page for THE CANDACE OWENS SHOW (which is an actual podcast and has its own page) and transformed it into a page for MORNING WIRE. The duplicate page was probably an empty shell, with little more than a title. Rather than creating a new title for MORNING WIRE, someone may have repurposed an unnecessary page.

(edited)

2.7K Messages

 • 

47K Points

2 years ago

@Nomissimon10 

1. Dark Woods (I) (2021 Podcast Series)

2. Dark Woods (II) (2021 Podcast Series)

____

True Crime Daily the Podcast

1. True Crime Daily the Podcast (2020 Podcast Series)

True Crime Daily

2. True Crime Daily (2019 Podcast Series)

____

This next one does not look like your work, but it is a duplicate podcast series nonetheless.

Cold Case Files

1. Cold Case Files (2017 Podcast Series)

Cold Case Files: The Podcast

2. Cold Case Files: The Podcast (2017 Podcast Series)

8.4K Messages

 • 

175.2K Points

@keyword_expert​ 

? ?

Display tt numbers to see order of entries ??

1. Dark Woods (I) (2021 Podcast Series)

https://www.imdb.com/title/tt16383414/

2. Dark Woods (II) (2021 Podcast Series)

https://www.imdb.com/title/tt17079556/

1. True Crime Daily the Podcast (2020 Podcast Series)

https://www.imdb.com/title/tt17371892/

2. True Crime Daily (2019 Podcast Series)

https://www.imdb.com/title/tt11959084/

1. Cold Case Files (2017 Podcast Series)

https://www.imdb.com/title/tt16365754/

2. Cold Case Files: The Podcast (2017 Podcast Series)

https://www.imdb.com/title/tt14973052/

I will delete this later...

(edited)

Employee

 • 

17.2K Messages

 • 

310.5K Points

2 years ago

Hi @Nomissimon10 -

Thanks for posting the details regarding the recent data clean-ups. We reached out to you directly for some additional information; you are welcome to reply to our email directly with any questions.

Cheers!

193 Messages

 • 

2.7K Points

@Michelle​ I have replied to the questions you had in the mail.

2.7K Messages

 • 

47K Points

2 years ago

@Nomissimon10

Do you have any comment on the text/character glitches in your podcast titles?

Here are a few more examples:

1853 - TBT: aEUoeNot Your MotheraEUs ResumA(c)aEU Service Doubles Business

1831 - Q&A: aEUoeIaEUd like to sell an Apple Watch guideaEU-aEU 

1829 - First $1,000: aEUoeDonaEUt give upaEU\'your future self will thank you!aEU

There are tens of thousands more examples like that. 

193 Messages

 • 

2.7K Points

@keyword_expert As I mentioned, I would prefer that matters like this be discussed by email or on Discord, so that this thread does not get spammed. I only need the unique occurrences of a faulty character. An example of this is –, which is supposed to be a -. Because the creator of the episode used a special character as opposed to a regular -, it gets converted wrongly. Every time I get a new occurrence and find out what it is supposed to be, I can fix ALL of the wrongly formatted titles with that fault. I also have a proposition if you want to help me keep duplicates to a minimum.
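The "one entry per faulty character" approach described here amounts to a lookup table applied to every title. The following is a hedged sketch of that idea, not the actual program: the table name, the function name, and the choice of entries are assumptions, with the three mappings taken from the corrected examples earlier in this thread.

```python
# Hypothetical lookup table: garbled sequence -> intended character.
# Each entry is added once, the first time that sequence is identified.
REPLACEMENTS = {
    "\u00e2EUR\"": "\u2014",      # a-circumflex + EUR + straight quote -> em-dash
    "\u00e2EUR\u00a6": "\u2026",  # a-circumflex + EUR + broken bar -> ellipsis
    "aEUoe": "\u201c",            # -> opening double quotation mark
}


def fix_title(title: str) -> str:
    """Apply every known replacement, longest garbled sequence first."""
    for bad in sorted(REPLACEMENTS, key=len, reverse=True):
        title = title.replace(bad, REPLACEMENTS[bad])
    return title


garbled = "The Devolution of Justice: From Scalia to KBJ\u00e2EUR\"The First Black\u00e2EUR\u00a6Woman?"
print(fix_title(garbled))
```

Applying longer sequences first matters because short fragments such as "aEU" are prefixes of longer ones and would otherwise be replaced too early. Ambiguous fragments are left out of the table on purpose, since the same fragment can stand for more than one original character.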

As for the Official Links, 3,426 of 4,138 have been removed, so expect them all to be gone by the end of the day.

(edited)

193 Messages

 • 

2.7K Points

@keyword_expert do you have a place where I can contact you privately?