
Assignment 5: Long Abstract

In scientific research, efficient data management is crucial for effective collaboration, data sharing, and accessibility. This project improves dataset management for researchers by migrating datasets to CKAN, a more accessible data platform, while also standardizing their metadata format and reducing the storage space they require. First, to migrate the datasets, I wrote a program in the Python programming language that retrieves datasets and their metadata from the current platform and registers the same datasets in CKAN. This migration was important because the previous platform was not accessible to the general public. Second, to store datasets more efficiently, I wrote another Python script that converts the CSV (Comma-Separated Values) files in each dataset into Parquet files, which require less storage space while preserving the data. These Parquet files can then be uploaded to CKAN. Another part of my project is describing datasets in the Croissant metadata format. Croissant is a standardized way of formatting metadata, so using it makes the datasets easier for other researchers to use by improving data interoperability. The last part of my project is a user interface that lets researchers easily use my Python scripts to manage datasets. This interface is essentially the culmination of the project because it brings all of my Python scripts and their functionality together in one place.
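The migration step could look roughly like the following minimal sketch. It is not the project's actual script. It assumes the target is a CKAN instance reached through CKAN's Action API endpoint `package_create` with an API token. It also assumes each record from the original platform, which is not named here, has already been fetched into a Python dict. The record keys (`title`, `description`, `keywords`, `files`) and the helper names `to_ckan_name`, `build_package`, and `create_package` are hypothetical.

```python
import json
import re
import urllib.request


def to_ckan_name(title: str) -> str:
    """Turn a title into a URL-safe CKAN dataset name.

    CKAN names must be lowercase and use only letters, digits, '-' and '_';
    this helper also truncates to CKAN's 100-character limit.
    """
    slug = re.sub(r"[^a-z0-9_-]+", "-", title.lower()).strip("-")
    return slug[:100]


def build_package(source: dict, owner_org: str) -> dict:
    """Map a (hypothetical) source-platform record to a package_create payload."""
    return {
        "name": to_ckan_name(source["title"]),
        "title": source["title"],
        "notes": source.get("description", ""),  # CKAN's dataset description field
        "owner_org": owner_org,  # many CKAN sites require an owning organization
        "tags": [{"name": kw} for kw in source.get("keywords", [])],
        "resources": [
            {"url": f["url"], "name": f["name"], "format": f.get("format", "")}
            for f in source.get("files", [])
        ],
    }


def create_package(ckan_url: str, api_token: str, package: dict) -> dict:
    """POST the payload to CKAN's Action API and return the created dataset."""
    request = urllib.request.Request(
        f"{ckan_url.rstrip('/')}/api/3/action/package_create",
        data=json.dumps(package).encode("utf-8"),
        headers={"Content-Type": "application/json", "Authorization": api_token},
        method="POST",
    )
    # HTTP errors (e.g. a duplicate name) raise urllib.error.HTTPError here.
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    if not body.get("success"):
        raise RuntimeError(body.get("error"))
    return body["result"]
```

Keeping the metadata mapping (`build_package`) separate from the network call lets the mapping be checked without a live CKAN server. Depending on a CKAN instance's configuration and extensions, additional fields may be required.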
