AITA for making this? A public dataset of Reddit posts about moral dilemmas

Nice. It would be good to mention that the data file itself is not checked in to Git but tracked by DVC, which is why it is listed in .gitignore.

Hi, I’m trying to import this with pandas, but I keep getting an input/output error. How are you importing this dataset?

Hi @mgntprg, have you downloaded the dataset to your local machine using dvc get? As in:

    $ dvc get https://github.com/iterative/aita_dataset aita_clean.csv

I can’t tell whether you’ve done this step yet. If you haven’t, the file won’t be in your local workspace and pandas won’t be able to read it. After running dvc get, you should be able to open Python and run:

    import pandas as pd
    df = pd.read_csv("aita_clean.csv")
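Since a missing file is the most likely cause of the error, here is a minimal sketch of a loader that checks for the CSV first and, if it is absent, fails with a hint to run the dvc get command above. The helper name load_aita and the DVC_GET_HINT constant are hypothetical, introduced here only for illustration.

```python
from pathlib import Path

import pandas as pd

# The fetch command from this thread (hypothetical constant name).
DVC_GET_HINT = "dvc get https://github.com/iterative/aita_dataset aita_clean.csv"


def load_aita(path: str = "aita_clean.csv") -> pd.DataFrame:
    """Read the AITA CSV, failing early with a hint if the file was never fetched."""
    csv_path = Path(path)
    if not csv_path.is_file():
        # The CSV is tracked by DVC and listed in .gitignore, so it is only
        # present locally after it has been fetched (e.g. with `dvc get`).
        raise FileNotFoundError(
            f"{csv_path} not found; fetch it first with:\n  {DVC_GET_HINT}"
        )
    return pd.read_csv(csv_path)
```

A clear FileNotFoundError that names the fix is easier to act on than a generic I/O error raised deep inside pandas.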

Hi, I finally fixed it. I had the file, but I needed to change my pd.read_csv parameters to make it work:

    import pandas as pd
    df = pd.read_csv('gdrive/My Drive/aita_clean.csv', on_bad_lines='skip', nrows=10, encoding='ISO-8859-1', engine='python')
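If you load the file repeatedly, the options that worked here can be wrapped in a small helper. This is a sketch, and read_aita_csv is a hypothetical name. It uses on_bad_lines="skip", the spelling pandas has used since 1.3 (error_bad_lines=False was deprecated in 1.3 and removed in 2.0). Latin-1 maps every byte to some character, so decoding never fails, but text that is actually UTF-8 may come out garbled.

```python
import io

import pandas as pd


def read_aita_csv(source, nrows=None) -> pd.DataFrame:
    """Read a possibly messy CSV with the options that worked in this thread."""
    return pd.read_csv(
        source,
        encoding="ISO-8859-1",  # Latin-1: any byte decodes, so no UnicodeDecodeError
        on_bad_lines="skip",    # drop rows with too many fields (pandas >= 1.3)
        engine="python",        # pure-Python parser, as in the original call
        nrows=nrows,            # None reads every row; e.g. 10 for a quick look
    )


# Example: a Latin-1 byte (0xE9, "é") and one row with an extra field.
sample = io.BytesIO(b"a,b\n\xe9,2\n3,4,5\n6,7\n")
df = read_aita_csv(sample)
```

Here the malformed "3,4,5" row is skipped, so df keeps two rows. Leaving nrows as None by default avoids silently analysing only the first few rows once you move past a quick test.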

I’ll try some analysis on the dataset and get back to you with my results.

Cool, please get in touch with any results. I always like hearing about them :slight_smile:

Also, you might be interested in the DVC Python API: dvc.api.open loads a file from DVC storage directly into your Python environment. https://dvc.org/doc/api-reference/open#:~:text=Description,by%20DVC%20or%20by%20Git.
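A minimal sketch of that approach, assuming DVC is installed (pip install dvc) and reusing the repo URL and file name from the dvc get command earlier in the thread. dvc.api.open accepts repo, rev and encoding arguments and is used as a context manager. The function read_tracked_csv and its opener parameter are hypothetical, added only so the function can be exercised without DVC or network access.

```python
import pandas as pd


def read_tracked_csv(
    path="aita_clean.csv",
    repo="https://github.com/iterative/aita_dataset",
    rev=None,
    encoding="ISO-8859-1",
    opener=None,
    **read_csv_kwargs,
):
    """Stream a DVC-tracked CSV into pandas without a local `dvc get` copy."""
    if opener is None:
        # Imported lazily so the function can be tested without DVC installed.
        import dvc.api

        opener = dvc.api.open
    # dvc.api.open yields a file-like object that pandas can read directly.
    with opener(path, repo=repo, rev=rev, encoding=encoding) as f:
        return pd.read_csv(f, **read_csv_kwargs)
```

With DVC installed, read_tracked_csv(on_bad_lines="skip", engine="python") should read the file with the same parsing options that worked for the local copy. Pinning rev to a Git tag or commit keeps an analysis tied to one version of the dataset.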