Apache CouchDB to store text
So far the application can find parts of speech and named entities. To go further, we need a way to work with a much larger piece of text, such as a novel. I have chosen Apache CouchDB for this purpose: it is a simple JSON-based document database.
It offers handy bulk insert operations, so all 5419 paragraphs of the book can be uploaded in about a second.
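```python
# `db` is the CouchDB database handle and `cleaned_paragraphs` the list of paragraph strings.
docs = []
for index, content in enumerate(cleaned_paragraphs):
    # "id" is an ordinary field; CouchDB still generates each document's "_id".
    docs.append({"id": index, "content": content})
db.bulk_docs(docs)  # one request uploads all documents
```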
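For reference, a bulk insert like this corresponds to a single POST to CouchDB's `_bulk_docs` endpoint, with a JSON body of the form `{"docs": [...]}`. The sketch below uses only the Python standard library and assumes a CouchDB server at `http://localhost:5984`, an existing database named `novel` (a hypothetical name) and no authentication. Unlike the snippet above, it stores the paragraph index as a string `_id`; that is an illustrative choice, not the author's method.

```python
import json
import urllib.request

# Hypothetical server URL and database name; adjust for your own setup.
COUCH_URL = "http://localhost:5984"
DB_NAME = "novel"


def build_bulk_payload(paragraphs):
    """Build the JSON body for CouchDB's POST /{db}/_bulk_docs endpoint.

    Each paragraph becomes one document. The paragraph index is used as the
    document `_id` (an assumption for illustration); CouchDB requires `_id`
    to be a string, hence str(i).
    """
    docs = [{"_id": str(i), "content": text} for i, text in enumerate(paragraphs)]
    return {"docs": docs}


def bulk_insert(paragraphs, base_url=COUCH_URL, db_name=DB_NAME):
    """Send every paragraph to CouchDB in a single HTTP request (no auth assumed)."""
    body = json.dumps(build_bulk_payload(paragraphs)).encode("utf-8")
    request = urllib.request.Request(
        f"{base_url}/{db_name}/_bulk_docs",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        # CouchDB replies with a list holding one result entry per document.
        return json.loads(response.read())


if __name__ == "__main__":
    # Only builds and prints the payload; call bulk_insert() against a live server.
    payload = build_bulk_payload(["First paragraph.", "Second paragraph."])
    print(json.dumps(payload))
```

Sending everything in one request is what makes the upload fast: the per-request overhead is paid once rather than 5419 times.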