bderoes's profile
Champion

 • 

5.1K Messages

 • 

118.7K Points

Thursday, October 15th, 2020

Closed

finding librett*

I would like to identify every person on IMDb who's got a credit for librettist or libretto.

I downloaded the dataset title.crew.tsv, but I don't have software that can handle the volume of data.

LibreOffice loaded ~1,048k records, but I suspect that's a drop in the bucket.

OpenOffice couldn't even load the file at all.

 

Is there some other (free) software that can handle this job?

All I need to do is sort that file by name id, and search for "librett".

Unless someone who does this a lot can extract those "librett" records for me.

@ljdoncel , what software do you use?


Accepted Solution

Champion

 • 

1.3K Messages

 • 

53.7K Points

5 years ago

Hi, @bderoes :

 

You are correct in saying that most spreadsheets and DBMSs can't handle some of the very large tsv files. I have noticed that even programs like SPSS, which in theory allows an unlimited number of records, cannot open the largest files directly without first shortening or pre-filtering them. To do that, I use a text editor capable of handling large files (EmEditor), where, depending on the query I'm working on, I do one (or both) of the following:

  1. Filtering the cases (rows) I'm interested in to another smaller tsv file, which can be opened in Excel, LibreOffice, OpenOffice, SPSS, etc... ← THIS IS WHAT YOU NEED FOR THIS JOB
  2. Splitting the variables (columns), while keeping all the cases, into two (or more) separate "long but narrow" tsv files, which can be opened separately in SPSS, and then cross-reference between them.
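
For anyone without a large-file editor, the two steps above can be roughly sketched in Python. This is only a minimal sketch, not what EmEditor does internally: the function names `filter_rows` and `split_columns` are made up for illustration, the file names in the comment are placeholders, and it assumes a plain tab-separated file with a header row and no quoted fields. Both functions read one line at a time, so the file size should not matter much.

```python
def filter_rows(src, dst, needle="librett"):
    """Step 1: copy the header plus every row that contains `needle`.

    `src` and `dst` are open text files. Matching is case-insensitive and
    looks at the whole line, not at one specific column.
    Returns the number of data rows written.
    """
    header = next(src, None)
    if header is None:          # empty input: nothing to copy
        return 0
    dst.write(header)
    needle = needle.lower()
    kept = 0
    for line in src:            # streamed, never loaded whole
        if needle in line.lower():
            dst.write(line)
            kept += 1
    return kept


def split_columns(src, dst, columns):
    """Step 2: keep every row but only the named columns ("long but narrow").

    `columns` must be names found in the header row.
    Returns the number of data rows written.
    """
    header = next(src, None)
    if header is None:
        return 0
    names = header.rstrip("\r\n").split("\t")
    wanted = [names.index(c) for c in columns]
    dst.write("\t".join(columns) + "\n")
    rows = 0
    for line in src:
        fields = line.rstrip("\r\n").split("\t")
        dst.write("\t".join(fields[i] for i in wanted) + "\n")
        rows += 1
    return rows


# Hypothetical usage (placeholder file names):
# with open("input.tsv", encoding="utf-8") as src, \
#         open("librett.tsv", "w", encoding="utf-8") as dst:
#     print(filter_rows(src, dst), "rows kept")
```

The filtered output should then be small enough to open in Excel, LibreOffice, OpenOffice or SPSS.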

I performed a quick search and there are around 13,000 cases of librett* in the database:

https://jpst.it/2jjOY

 

Cheers!

Champion

 • 

5.1K Messages

 • 

118.7K Points

ljdoncel,

Thank you *AGAIN* SO MUCH.

Champion

 • 

1.3K Messages

 • 

53.7K Points

You're most welcome!

2.4K Messages

 • 

81.2K Points

Thanks a lot to both of you for these tips!!

I have worked with MS Access for ages. Its main constraint is that accdb files cannot be larger than 2 GB (though I do have tables with 10+ million rows).

So far I have found workarounds for the tsv file size, but these will eventually stop working (because of these ever-increasing damned episodes...)

 

Any hint for an open/free DBMS with good performance that I could plug into my Access "front office"?
I have spent too much time developing forms, controls, VBA/SQL, etc. to start all over again!!

Anyway, thanks again for the tips above :)

Champion

 • 

1.3K Messages

 • 

53.7K Points

Hi, @Vincent :

 

In the hospital where I work, we had the same problem a few years ago. Since 2000 we have used (and sometimes still use) Microsoft Access to produce all our patient reports, including admissions, complementary tests, appointments, etc. When the database reached the then-current Office 2003 limit of 1 GB, we had to extract some of the larger tables into another mdb file and link them to the original file. In fact, we recently exceeded the current Office 2017 limit of 2 GB (combining all database files), and we are still using it without any problem. So, short answer: try linking rather than embedding.

2.4K Messages

 • 

81.2K Points

Hi @ljdoncel,

 

I am already using external Access accdb/mdb files, especially for the IMDb tsv files, but even so, the upload fails for the larger ones; e.g. the title.principals data is already 1.7 GB, which leaves no room for indexes.

I am using a workaround: uploading only filtered rows through an external join. But this is only temporary, so I am investigating alternative DBMSs that can connect to Access through ODBC (but I am still no IT guru!)

8.8K Messages

 • 

179.5K Points

5 years ago

Go to the folder with the db file in File Explorer

and search there for the librettist or libretto text?

This may not be helpful ...  :-(

- - -

 

Advanced Name Search

https://www.imdb.com/search/name/

No option for job title?

 

Biographies

https://www.imdb.com/search/name/?bio=librettist - 59 names.

https://www.imdb.com/search/name/?bio=libretto  - 93 names.


Champion

 • 

5.1K Messages

 • 

118.7K Points

An option for job title would not help much here, since the term usually appears as an attribute on the Writer job.

 

I tried the File Explorer thing... no results. Besides, I need the data sorted by name constant; otherwise each person's librett* credits will be shuffled randomly among everyone else's, and I'll end up doing two or three times as many lookups as I need. And there are hundreds of these folks.

 

In addition, Hispanic and Asian cultures use the term librettist for "ordinary" writers. I found several telenovela series with hundreds of episodes each where the writers have attribute librettist. I need to be able to skip those large clusters, hence the need to sort by name constant.
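
The "sort by name constant, then skip the big clusters" idea could look something like the Python sketch below. It is a rough illustration under assumptions, not a tested workflow: the column name `nconst` is assumed to be the header of the name-constant column, the functions `group_by_name` and `split_by_cluster_size` are invented for this example, and the `max_rows=50` threshold is arbitrary. It expects the already-filtered librett* rows, which should be small enough to hold in memory.

```python
from collections import defaultdict


def group_by_name(lines, key_column="nconst"):
    """Group TSV rows by name constant and return them sorted by that key.

    `lines` is any iterable of tab-separated lines whose first line is the
    header. Returns a list of (name_id, rows) pairs, where each row is a
    list of field strings. Sorting is plain string order, which is enough
    to keep each person's credits together.
    """
    rows = iter(lines)
    header = next(rows, None)
    if header is None:
        return []
    key = header.rstrip("\r\n").split("\t").index(key_column)
    groups = defaultdict(list)
    for line in rows:
        fields = line.rstrip("\r\n").split("\t")
        groups[fields[key]].append(fields)
    return sorted(groups.items())


def split_by_cluster_size(groups, max_rows=50):
    """Separate people with few credits from large clusters.

    Returns (small, large): `small` keeps the full rows for manual review,
    `large` lists only (name_id, row_count) so the cluster can be skipped
    or checked once. The threshold is a hypothetical default.
    """
    small = [(name, rows) for name, rows in groups if len(rows) <= max_rows]
    large = [(name, len(rows)) for name, rows in groups if len(rows) > max_rows]
    return small, large
```

With something like this, a telenovela writer with hundreds of episode credits would show up once in the `large` list instead of hundreds of times in the review list.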

10.7K Messages

 • 

226.1K Points

5 years ago

What about GNU grep (the regular-expression search tool)?
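
If grep isn't installed, a similar regular-expression search can be approximated with Python's standard `re` module. This is only a sketch of the idea, not a replacement for grep: the function names `grep_lines` and `count_variants` and the default pattern `librett\w*` are assumptions chosen for this thread.

```python
import re
from collections import Counter


def grep_lines(lines, pattern=r"librett\w*"):
    """Yield every line that matches `pattern`, ignoring case.

    `lines` can be an open text file, so large files are read line by line.
    """
    rx = re.compile(pattern, re.IGNORECASE)
    for line in lines:
        if rx.search(line):
            yield line


def count_variants(lines, pattern=r"librett\w*"):
    """Count each distinct matched term (lower-cased).

    Useful for seeing which spellings actually occur. The pattern should
    not contain capture groups, since findall() would then return the
    groups rather than the whole match.
    """
    rx = re.compile(pattern, re.IGNORECASE)
    counts = Counter()
    for line in lines:
        counts.update(match.lower() for match in rx.findall(line))
    return counts
```

Counting the variants first might show whether terms other than librettist and libretto turn up in the data.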