Bot-created duplicate pages for podcasts
I'm seeing dozens of presumably bot-created duplicate pages for podcast series. See a few examples below.
This is a mess, because the two pages for each show can't be merged until all the duplicate episodes are merged first. I can't imagine anyone wanting to perform this cleanup.
Is the staff aware of this problem?
Accepted Solution
Nomissimon10
2 years ago
Here is the list of all the podcasts in your screenshots. I will have the ones that are duplicates cleaned up. Sorry for any inconvenience this has caused you. Be sure to let me know if you find any other podcasts that might be duplicated.
Bitcoin Audible
https://www.imdb.com/title/tt20561522/?ref_=fn_al_tt_1 (Mine, clean)
https://www.imdb.com/title/tt15275612/?ref_=fn_al_tt_2
Will have the first one removed and cleaned up.
Break the Cycle w/Joshua Smith
https://www.imdb.com/title/tt14908762/?ref_=adv_li_tt
https://www.imdb.com/title/tt17535210/?ref_=adv_li_tt (Mine, clean)
This one is my fault. I do check whether the podcast or any similar episodes exist before I add it to my database. However, in this case the titles are one character apart (a space), so nothing came up. I will try to be more careful next time.
Church with Jesse Lee Peterson
https://www.imdb.com/title/tt9900166/?ref_=adv_li_tt
https://www.imdb.com/title/tt18211870/?ref_=adv_li_tt (Mine, clean)
I am not quite sure how this was duplicated. However, this is my fault and I will clean it up. Most likely this is a podcast I added in the early days of my bot program, when there were a lot of small errors that collectively let things slip through.
The Rubin Report
https://www.imdb.com/title/tt7315758/?ref_=adv_li_tt
https://www.imdb.com/title/tt16240138/?ref_=adv_li_tt (Mine, clean)
Same thing here. There should not have been any way it could have slipped through.
The Babylon Bee
One is a TV series, the other a podcast. They are not the same title.
Timcast IRL
https://www.imdb.com/title/tt11765340/?ref_=adv_li_tt
https://www.imdb.com/title/tt16838568/?ref_=adv_li_tt (Mine, clean)
Will have the one removed.
Morning Wire
https://www.imdb.com/title/tt16549266/?ref_=adv_li_tt (Mine, clean)
https://www.imdb.com/title/tt14204720/?ref_=adv_li_tt
Will have the first one removed.
Revolutions
https://www.imdb.com/title/tt12491298/?ref_=adv_li_tt
https://www.imdb.com/title/tt16613392/?ref_=adv_li_tt (Mine, clean)
Will clean up tt16613392.
Candace
https://www.imdb.com/title/tt16549402/?ref_=adv_li_tt (Mine, clean)
https://www.imdb.com/title/tt14214952/?ref_=adv_li_tt
Will have it cleaned up.
The Ben Shapiro Show
https://www.imdb.com/title/tt7262670/?ref_=adv_li_tt (Mine)
https://www.imdb.com/title/tt16549254/?ref_=adv_li_tt (Clean)
For this one I am contributing to the correct page; however, I will clean up the second one.
The Matt Walsh Show
https://www.imdb.com/title/tt8356486/?ref_=adv_li_tt
https://www.imdb.com/title/tt16549278/?ref_=adv_li_tt (Mine, clean)
This is another one of the early podcasts. I will have it cleaned up.
The Charlie Kirk Show
https://www.imdb.com/title/tt13429184/?ref_=adv_li_tt
https://www.imdb.com/title/tt16550644/?ref_=adv_li_tt (Mine, clean)
Will have it cleaned up.
Techmeme Ride Home
https://www.imdb.com/title/tt14940862/?ref_=adv_li_tt
https://www.imdb.com/title/tt18232908/?ref_=adv_li_tt (Mine, clean)
Will have it cleaned up.
YOUR WELCOME with Michael Malice
https://www.imdb.com/title/tt7564880/?ref_=adv_li_tt
https://www.imdb.com/title/tt16550720/?ref_=adv_li_tt (Clean)
Neither of these is in my database; however, I will do what I can to clean up tt16550720.
To summarize, I admit that a lot of these are duplicates and that the fault is mine. I will do my best to clean up; thanks for letting me know. My program should be more robust now, and I check whether a title already exists both manually and automatically to try to ensure it is not a duplicate. The recurring theme for these podcasts is that they were among my first 1000 podcasts, where a few unfortunately slipped through the cracks.
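As an illustration of the "one character apart" case mentioned for Break the Cycle w/Joshua Smith, here is a minimal sketch of a comparison that ignores case, spacing and punctuation. It is not the bot's actual check; the function names and the exact spelling of the two example titles are assumptions made for the example.

```python
import unicodedata


def title_key(title: str) -> str:
    """Reduce a title to a key that ignores case, spacing and punctuation."""
    # Fold compatibility characters and case so look-alike titles compare equal.
    folded = unicodedata.normalize("NFKC", title).casefold()
    # Keep only letters and digits; spaces, slashes and dots are dropped.
    return "".join(ch for ch in folded if ch.isalnum())


def find_possible_duplicates(new_title: str, existing_titles: list) -> list:
    """Return the existing titles whose key equals the key of new_title."""
    key = title_key(new_title)
    return [t for t in existing_titles if title_key(t) == key]


# Hypothetical existing titles; the spelling with the extra space is assumed.
existing = ["Break the Cycle w/ Joshua Smith", "The Rubin Report"]
print(find_possible_duplicates("Break the Cycle w/Joshua Smith", existing))
```

A key like this is deliberately loose, so any hit would still need a manual look before merging or skipping a title.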
jay_spirit
2 years ago
There seem to be three reasons this is happening.
1. The existing page has a start date different from what the bot thinks it should be.
2. The existing page has a slightly different title from what the bot thinks it should be.
3. The existing page has an IMDb display title that is IDENTICAL to the title the bot thinks it should be, BUT the bot only sees the original title, which is different.
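The three cases above could be handled by a check along the following lines. This is only a sketch under stated assumptions: `TitleRecord` is a hypothetical record shape (not IMDb's data model), and the sample title ID and years are made up. It compares a candidate against both the display title and the original title, and reports a differing start year as something to review instead of treating it as proof that the pages are different.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class TitleRecord:
    # Hypothetical shape of an existing page; field names are assumptions.
    title_id: str
    display_title: str
    original_title: Optional[str]
    start_year: Optional[int]


def _key(title: str) -> str:
    # Ignore case, spacing and punctuation (covers slightly different titles).
    return "".join(ch for ch in title.casefold() if ch.isalnum())


def match_reasons(candidate_title: str, candidate_year: Optional[int],
                  record: TitleRecord) -> List[str]:
    """Return reasons to review `record` as a possible duplicate (empty if none)."""
    reasons = []
    key = _key(candidate_title)
    if key == _key(record.display_title):
        reasons.append("display title matches")
    if record.original_title and key == _key(record.original_title):
        reasons.append("original title matches")
    # A year mismatch is reported, but it does not cancel a title match.
    if (reasons and candidate_year is not None
            and record.start_year is not None
            and candidate_year != record.start_year):
        reasons.append("start year differs")
    return reasons


# Made-up record: display and original titles differ, as in reason 3.
record = TitleRecord("example-id", "Morning Wire", "The Candace Owens Show", 2021)
print(match_reasons("Morning Wire", 2022, record))
```

Returning reasons instead of a yes/no answer keeps a human in the loop for the ambiguous cases.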
sarz
2 years ago
This bot is operated by @Nomissimon10.
He didn't check for duplicate titles/episodes and release dates before submitting. THIS MUST STOP RIGHT NOW!
Someone already noticed this in January, but no action was taken by staff members.
100,000+ podcast episodes have been submitted with incorrect country name. Possibly done by Nomissimon10.
https://community-imdb.sprinklr.com/conversations/data-issues-policy-discussions/100000-podcast-episodes-have-been-submitted-with-incorrect-country-name-possibly-done-by-nomissimon10/61eb681523c1a32f12dd2ac9
keyword_expert
2 years ago
@Nomissimon10: Can you please look into whether you are also responsible for these duplicate podcast submissions discussed in these threads started by @tom_wake and @Vande?
Someone keeps adding duplicate episodes to podcast series
Duplications from new Podcast Series category
keyword_expert
2 years ago
@Nomissimon10
I may have found a couple more duplicates.
keyword_expert
2 years ago
@Nomissimon10
Some more potential duplicates (some of these say "TV" and the counterpart one "Podcast," but I'm not sure I buy that distinction):
That last one has two different names, but the same exact episodes on both series.
https://www.imdb.com/title/tt17372102/episodes?year=2022
https://www.imdb.com/title/tt20429390/episodes?year=2022
And you might also want to have a look at this one, which has many of the same episodes as the prior two titles.
Nomissimon10
2 years ago
Currently AFK, but will have a look at more of the posted links here once I get back online. For now the duplicate podcast Bitcoin Audible has been removed. Expect the rest to be gone by the end of next week.
jay_spirit
2 years ago
The reason you missed MORNING WIRE may be that the original title is -- inexplicably -- listed as THE CANDACE OWENS SHOW.
That is false. The original title was and still is MORNING WIRE. I've tried to correct this false title twice, but both submissions were rejected. Maybe you'll have better luck when you merge the two pages for MORNING WIRE.
My guess is that someone took a duplicate page for THE CANDACE OWENS SHOW (which is an actual podcast and has its own page) and transformed it into a page for MORNING WIRE. The duplicate page was probably an empty shell, with little more than a title. Rather than creating a new title for MORNING WIRE, someone may have repurposed an unnecessary page.
keyword_expert
2 years ago
@Nomissimon10
1. Dark Woods (I) (2021 Podcast Series)
Short, Sci-Fi
2. Dark Woods (II) (2021 Podcast Series)
____
1. True Crime Daily the Podcast (2020 Podcast Series)
2. True Crime Daily (2019 Podcast Series)
____
This next one does not look like your work, but it is a duplicate podcast series nonetheless.
1. Cold Case Files (2017 Podcast Series)
2. Cold Case Files: The Podcast (2017 Podcast Series)
Michelle
Employee
2 years ago
Hi @Nomissimon10 -
Thanks for posting the details regarding the recent data clean-ups. We reached out to you directly for some additional information; you are welcome to reply to our email with any questions.
Cheers!
keyword_expert
2 years ago
@Nomissimon10
Do you have any comment on the text/character glitches in your podcast titles?
Here are a few more examples:
1853 - TBT: aEUoeNot Your MotheraEUs ResumA(c)aEU Service Doubles Business
1831 - Q&A: aEUoeIaEUd like to sell an Apple Watch guideaEU-aEU
1829 - First $1,000: aEUoeDonaEUt give upaEU\'your future self will thank you!aEU
There are tens of thousands more examples like that.
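Glitches of this kind typically appear when UTF-8 text is decoded with a single-byte encoding such as Windows-1252. The sketch below reproduces and reverses that mistake; it is an assumption about the cause, not a confirmed diagnosis of the bot. The titles quoted above also look as if they were transliterated to ASCII afterwards (for example "Ã©" becoming "A(c)"), and that second step cannot be undone this way.

```python
def garble(text: str) -> str:
    """Reproduce the glitch: UTF-8 bytes wrongly decoded as Windows-1252."""
    return text.encode("utf-8").decode("cp1252")


def repair(text: str) -> str:
    """Undo the glitch when it is reversible; otherwise return text unchanged."""
    try:
        return text.encode("cp1252").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Text was not garbled this way (or is already correct): leave it alone.
        return text


original = "Resum\u00e9"          # "Resumé"
broken = garble(original)         # "ResumÃ©"
print(broken, "->", repair(broken))

# "Donâ€™t" is the garbled form of "Don’t" (right single quotation mark).
print(repair("Don\u00e2\u20ac\u2122t give up"))
```

Running the repair on the stored titles before submission, or reading the feed as UTF-8 in the first place, would avoid the problem at the source.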