jay_spirit's profile

1.1K Messages

 • 

31.7K Points

Tuesday, October 25th, 2022

Closed

Answered

Duplicate episodes | Coffee with Scott Adams

Every day I add the new episode of Coffee with Scott Adams (a seven-day-a-week podcast). Later, a bot adds the same episode.

The series has hundreds of duplicate episodes, requiring a cleanup I'm not willing to do.


2.7K Messages

 • 

47K Points

3 years ago

200 Messages

 • 

2.8K Points

3 years ago

This is indeed my work. Usually, episodes like these get filtered out automatically by a warning in the contributor tab. However, the warning often does not trigger if the same episode is added twice in close succession, or before the first submission has been processed. I am not quite sure what to do about this. (However, it is probably the reason why 306 episodes from my db have not been uploaded.) If you are going to keep uploading episodes for this podcast manually, I will at least remove this podcast from my database to avoid future duplicates. @jay_spirit 

Thanks for the tag @keyword_expert 

1.1K Messages

 • 

31.7K Points

@Nomissimon10​ 

The reason you don't get a warning is that you are adding something to the title that I do not add and that I believe should not be added.

You add:

Episode [number] Scott Adams:

That is part of the title of each episode, but my understanding of the IMDb rules is that it should not be included.

(Compare the duplicate episodes on the image above.)

There is an automatic warning against adding an episode number to a title, which is why I exclude it.

Also: it feels to me like the name "Scott Adams" is just a reiteration of the series title and not part of the episode title proper.

If there's any way for your bot to exclude everything up to and including the colon, the problem with you duplicating titles should end.
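A minimal Python sketch of that idea, assuming the prefix always has the form "Episode [number] Scott Adams:" at the very start of the title. The name `strip_prefix` and the sample title are hypothetical; this is not how the bot is actually written.

```python
import re

# Matches "Episode <digits> Scott Adams:" only at the very start of the title.
PREFIX = re.compile(r"^Episode\s+\d+\s+Scott Adams:\s*")

def strip_prefix(title: str) -> str:
    """Drop the leading 'Episode N Scott Adams:' part, if present."""
    return PREFIX.sub("", title, count=1).strip()

cleaned = strip_prefix("Episode 1906 Scott Adams: Sample Title")
untouched = strip_prefix("Sample Title")
```

Because the pattern is anchored to the start of the title, a title without the prefix passes through unchanged.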

(edited)

200 Messages

 • 

2.8K Points

@jay_spirit I see. The episodes and their titles are retrieved from his podcast on Spotify: https://open.spotify.com/show/1l4XIhsppHl2vmoT7pWbGe I could of course add an exception for this one podcast alone, but on a large scale that would be neither efficient nor desirable. (The reason being that there are tens of thousands of podcasts out there, and most use different formats.) I would make an attempt to go through all the duplicates and get rid of them, but that is made very hard by the fact that titles have to be merged.

It would be nice if IMDb added a "report duplicate" button so that titles with little to no large-scale interaction could be removed swiftly, as opposed to the agonizing process of merging.

I could of course try to add a test for things like that, checking for patterns. However, that could lead to other issues, for instance when a list of guests appears in the title. Example: On Corona Crisis with Nikolai Tesla, Elon Musk and Scott Adams Episode 2000

That could lead to the title being cut down to an empty string.
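To make the concern concrete, here is a small Python sketch. The pattern `GREEDY` is a deliberately careless, hypothetical rule (not taken from the bot) that removes everything up to the host name and a following episode number wherever they occur. The helper `strip_or_keep` is likewise an assumption: it shows one possible guard, falling back to the original title when stripping leaves nothing.

```python
import re

# Careless rule: delete everything up to the last "Scott Adams", plus an
# optional trailing "Episode N" and colon, wherever that sits in the title.
GREEDY = re.compile(r"^.*Scott Adams(\s+Episode\s+\d+)?:?\s*")

def strip_or_keep(title: str) -> str:
    """Apply the careless rule, but keep the original title if nothing is left."""
    stripped = GREEDY.sub("", title).strip()
    return stripped if stripped else title

made_up = ("On Corona Crisis with Nikolai Tesla, Elon Musk "
           "and Scott Adams Episode 2000")

careless_result = GREEDY.sub("", made_up)   # the whole title is swallowed
guarded_result = strip_or_keep(made_up)     # falls back to the original
```

With the made-up title, `careless_result` is the empty string, while `guarded_result` is the unchanged title.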

My point is that in most cases, using the title straight off the bat with minor adaptations (removing or replacing emojis, special characters, etc.) works best, because it is usually easier to edit an existing title to add the episode number and season number than to add the episode from scratch, especially when podcasts have hundreds or thousands of episodes. A quick edit to remove the start of a title is faster than adding the entire thing.

2.7K Messages

 • 

47K Points

@jay_spirit​ You should cite and quote the rule(s) that forbid including the episode number and series name within the episode name.

@Nomissimon10 I am not convinced that reconfiguring your bot in the manner suggested by @jay_spirit would lead to empty strings. Just configure it to check for the series title on IMDb and then to remove the episode number and series title within your episode titles, unless those details appear later in the episode name on Spotify (i.e., not at the beginning of the episode name). I don't even see a real episode named "On Corona Crisis with Nikolai [sic] Tesla, Elon Musk and Scott Adams Episode 2000." So you made that episode title up? You should use some real-world examples, cite them here, and we can help you work through how to make the bot avoid these duplications (and ensure compliance with the applicable IMDb guidelines). 

(edited)

200 Messages

 • 

2.8K Points

@keyword_expert​ Of course my example was made up on the spot as I did not want to spend time looking through thousands of episodes to find a real example. My point was not to explain why I could not reconfigure it to work with this podcast in particular, but rather to explain that making general changes to the title being uploaded could lead to worse titles on other Podcasts I am uploading for.

But here are a few examples of different setups that I found by filtering for "Episode" in the title.

‘1619,’ Episode 5: The Land of Our Fathers, Part 1 (Crime Junkie, tt13824018)

'Rabbit Hole,' Episode 5: The Accidental Emperor (Crime Junkie, tt13824018)

^ Both not uploaded, but it is crucial that the episode number is kept, as it refers to the episode within the substory of the podcast.

Roscoe Arbuckle & Virginia Rappe – Episode 4: De tre rettssakene (Truecrimepodden Dokumentar, tt17077930) - tt17081940

^ All of the episodes in this podcast series start with the substory and then the episode. If you cut off the episode number you would only be able to go by the title of each episode, which can be similar or identical once hundreds of stories stack up, and crime pods often have related titles.

My Favorite Murder Presents: That's Messed Up: An SVU Podcast - Episode 1: Bully (My Favorite Murder with Karen Kilgariff and Georgia Hardstark, tt17077950) - tt17081794

^ A few of the episodes start with the beginning of the podcast name: "My Favorite Murder Presents". It could be cut, but where would it be natural to cut the text? If the first word of the podcast is "The" (between 1,000 and 2,054 of the podcasts I am monitoring start with it), that could trigger a cut at the beginning of episode names, leaving them worse off.
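One way to look at the problem in Python is to count how many leading words an episode title shares with the podcast name. This is only an illustrative sketch: `shared_prefix_words` and the "The Example Show" titles are hypothetical, and the count alone does not say where a natural cut would be.

```python
def shared_prefix_words(series: str, episode: str) -> int:
    """Count how many leading words the two titles have in common."""
    count = 0
    for s_word, e_word in zip(series.lower().split(), episode.lower().split()):
        if s_word != e_word:
            break
        count += 1
    return count

mfm = shared_prefix_words(
    "My Favorite Murder with Karen Kilgariff and Georgia Hardstark",
    "My Favorite Murder Presents: That's Messed Up: An SVU Podcast - Episode 1: Bully",
)
the_case = shared_prefix_words("The Example Show", "The First Episode")
```

Here `mfm` is 3 and `the_case` is 1. A one-word overlap such as "The" carries no information, and even the three-word overlap would leave "Presents: ..." behind if it were cut, which is the ambiguity described above.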

Those are just a few examples that I could dig up in a few minutes. And just to make it clear, I do not have access to any database (other than the public dataset of movies that is accessible to students, which would not be of much help). I have also been told by IMDb that I am not allowed to scrape or gather any info from their site beyond uploading, so checking that automagically would not be allowed. On top of that, large pages often won't load or require manual interaction to retrieve the entire page. (Occasionally large pages also crash, which makes this process unreliable as well.) Using the search functionality and filters could work, but that leaves short titles with familiar phrases in a blind spot: a lot of similar episode titles would be picked up, which would either give false positives for duplicates or take too long to check across all pages.

Here are some of my data

PS: I have mentioned this before, but if you (or any other person interested in my contributions, or quality assurance in general) want more info feel free to reach out to my mail. Simonbjornstad4@gmail.com
For the time being I am mainly focused on new podcasts that do not exist on IMDb, trying to make the process of adding podcasts safer to avoid duplicates like the ones you informed me about a while back. I am also automating processes that I currently have to start manually when I have time, along with trying to build cool features I think IMDb should adopt and add to their website.

1.1K Messages

 • 

31.7K Points

@keyword_expert​ 

Below is the rule regarding episode numbers in the title.

There's no specific rule that would prevent a contributor from adding the Scott Adams: part to the episode title.

I'm wondering if I should just submit the episodes the way @Nomissimon10 does.

After all, he is submitting the episode titles exactly the way they appear on the Spotify feed and all other feeds.

The rule may be saying only that you cannot add an episode number if it does not already appear in the title. It may not imply that if the episode title includes the number the contributor must remove it.

I don't know.

It felt to me like the Episode [number] Scott Adams: part did not belong in the episode title, even though it technically appears there.

But maybe I'm wrong.

(edited)

2.7K Messages

 • 

47K Points

Seems like IMDb staff should weigh in on this. The core question is whether episode numbers and series titles (in part or in full) can (and should) be included in episode titles on IMDb when that is exactly how they appear in their original podcast formatting. This may be a case where it's okay to deviate from the guidelines. (And the guidelines may need some revisions to account for these situations.)

200 Messages

 • 

2.8K Points

@jay_spirit I agree that "Episode 1906 Scott Adams:", "Episode 1907 Scott Adams:" etc. do not belong in the title in this specific case. Like you, I am unsure whether it is a clear rule break, but I hope my explanation of why it has been uploaded the way it has gives some insight :)

(edited)

1.1K Messages

 • 

31.7K Points

@Nomissimon10​ 

Are you able to get your bot to add the episode number?

That way you would not have to alter the title. IMDb's system would prevent the bot from adding a duplicate episode if the episode numbers were the same.

200 Messages

 • 

2.8K Points

@jay_spirit Unfortunately this sparks the same issue as before: True Crime pods, for example, use sub-episode numbers for series within the podcast, while other podcasts like Coffee with Scott Adams use the actual episode number in the title. Differentiating between them would be hard and could lead to some issues.
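The difficulty can be sketched in Python. The heuristic below is an assumption made for illustration, not something the bot does: it reads a number as the series-level episode number only when the title opens with "Episode [number]", and ignores numbers that appear later in the title. The name `series_episode_number` and the sample Scott Adams title are hypothetical.

```python
import re
from typing import Optional

# Only a number that opens the title is treated as the series-level number.
LEADING_NUMBER = re.compile(r"^Episode\s+(\d+)\b")

def series_episode_number(title: str) -> Optional[int]:
    """Return the leading episode number, or None if the title does not start with one."""
    match = LEADING_NUMBER.match(title)
    return int(match.group(1)) if match else None

scott = series_episode_number("Episode 1906 Scott Adams: Sample Title")
rabbit = series_episode_number("'Rabbit Hole,' Episode 5: The Accidental Emperor")
arbuckle = series_episode_number(
    "Roscoe Arbuckle & Virginia Rappe – Episode 4: De tre rettssakene"
)
```

Here `scott` is 1906, while `rabbit` and `arbuckle` are `None`. The heuristic still misreads any podcast whose sub-story episodes start with "Episode [number]", so it narrows the problem rather than solving it.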