Journal Entry Retrieval Program

Course: LING 508 – Computational Techniques for Linguists

Institution: University of Arizona

Project Summary

This project is a web-based application that allows users to query a database of journal entries by author name. When a name is submitted via an HTML form or API call, the system returns all journal entries associated with that author. The aim was to build a functional, scalable, and modular system that showcases a fundamental understanding of full-stack development principles in the context of Human Language Technology (HLT).

I was the sole developer on this project, which I completed as part of a course in the University of Arizona’s Human Language Technology (HLT) program. This experience enabled me to design, implement, and evaluate a text-based retrieval system grounded in real-world database structures. It reflected my growth as an HLT student by allowing me to work with structured language data, explore relational databases, and bridge backend logic with a front-end interface. While this project may not push the boundaries of HLT research, it demonstrates my ability to independently plan and execute a technically sound and professional application.

The application was developed with an emphasis on modularity and scalability, featuring:

A relational database to store entries
A service layer to abstract database access
A RESTful API to expose the data in JSON format
An HTML interface to facilitate user input

Goal:
To create a searchable journal entry retrieval system that can scale for broader applications, such as digital archives or note-taking systems.

Approach:

Design and populate a SQL database with sample journal entries
Implement Python classes to interact with the database
Create a service layer to abstract logic from data
Build a simple REST API for querying entries
Develop an HTML frontend for direct interaction

Technologies & Libraries Used

Python 3.10
Flask – Web framework for building the REST API
MySQL – Relational database for storing journal data
HTML/CSS – Front-end design
Postman – For API testing
Docker – For containerization

Challenges & Solutions

Problem	Solution
Keeping logic and data separate	Introduced a service layer to decouple classes from database logic
Ensuring future scalability	Chose SQL for flexibility; designed system to allow database swapping
Displaying results clearly	Used clear design for readable HTML output and JSON format for API
Handling missing or misspelled author names	Implemented basic error handling and case-insensitive queries

State of Project at Completion

The API could successfully handle author-based queries
The HTML frontend worked as expected for input and display
Code was tested with dummy data and modularized for future improvements

Code Structure and Architecture

I followed a microservices-inspired architecture to design the system, with clear separation between the service, repository, and API layers to support modularity, scalability, and independent testing.

The following are excerpts. The full code and clone instructions are available on my GitHub.

Database Layer (`db/`)

The database layer is responsible for interacting directly with the SQL-based data store. It includes:

repository.py – an abstract repository interface that defines the contract for interacting with journal data, allowing different database backends to be plugged in without modifying core application logic.
mysql_repository.py – an implementation specifically tailored for MySQL.

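# repository.py
import abc
from app.journal_classes import JournalEntry

class Repository(metaclass=abc.ABCMeta):
    """Abstract interface that every database backend must implement."""

    @abc.abstractmethod
    def load_journals(self) -> list[JournalEntry]:
        """Return all journal entries held in the backing store."""
        raise NotImplementedError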

By abstracting these interactions, I ensured that the backend database could be swapped or extended in the future without rewriting core application logic. This approach supports portability and flexibility in deployment scenarios. See the GitHub repository for mysql_repository.py.
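As a minimal sketch of what swapping the backend could look like, the block below implements the same abstract interface with an in-memory list. `InMemoryRepository` and the two-field `JournalEntry` stand-in are assumptions made for illustration; the real classes live in the repository and may differ.

```python
import abc
from dataclasses import dataclass


@dataclass
class JournalEntry:
    """Simplified stand-in for the project's JournalEntry (fields are assumed)."""
    author: str
    text: str


class Repository(metaclass=abc.ABCMeta):
    @abc.abstractmethod
    def load_journals(self) -> list[JournalEntry]:
        raise NotImplementedError


class InMemoryRepository(Repository):
    """Hypothetical backend that serves entries from a Python list."""

    def __init__(self, entries: list[JournalEntry]):
        self._entries = list(entries)

    def load_journals(self) -> list[JournalEntry]:
        # Return a copy so callers cannot mutate the stored entries
        return list(self._entries)


repo = InMemoryRepository([JournalEntry("Ada", "First entry")])
journals = repo.load_journals()
```

Because callers only depend on `load_journals()`, code written against `Repository` would not need to change if this class replaced the MySQL one, for example in tests.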

Service Layer (`app/services.py`)

The service layer acts as a bridge between the application’s API and the underlying database implementation, abstracting the data access logic from the rest of the system. It contains the core business logic that:

Verifies and sanitizes input
Calls repository functions to fetch data
Prepares structured results for API consumption

# services.py
import db.mysql_repository


class Services:
    def __init__(self):
        self.repo = db.mysql_repository.MysqlRepository()
        # Load every entry once at startup; queries filter this in-memory list
        self.journals = self.repo.load_journals()

    # Use case 1: Recall journals based on an author
    def recall_author(self, target_author: str):
        """Return the journal entries of target_author, or None if there are none."""

        # Filter entries by the target author (exact match)
        filtered_entries = [entry for entry in self.journals if entry.get_author() == target_author]

        # Return the filtered entries; None signals that nothing matched
        if filtered_entries:
            return filtered_entries
        return None

By centralizing these tasks in the service layer, I maintained a clean separation of concerns and made it easier to debug and test specific pieces of logic.
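The sketch below shows one way the input checks and case-insensitive matching described in this write-up could sit in the service layer. It is not the project's code: `InjectableServices`, `StubRepository`, and `StubEntry` are hypothetical names, and passing the repository in as an argument is an assumption made so the example runs without a database.

```python
class StubEntry:
    """Stand-in for JournalEntry exposing the get_author() accessor."""

    def __init__(self, author: str, text: str):
        self.author = author
        self.text = text

    def get_author(self) -> str:
        return self.author


class StubRepository:
    """Hypothetical repository returning fixed data instead of querying MySQL."""

    def load_journals(self) -> list[StubEntry]:
        return [StubEntry("Ada Lovelace", "Notes on the engine"),
                StubEntry("Alan Turing", "On computable numbers")]


class InjectableServices:
    """Variant of Services that receives its repository as an argument."""

    def __init__(self, repo):
        self.journals = repo.load_journals()

    def recall_author(self, target_author: str) -> list[StubEntry]:
        # Sanitize: reject non-strings, trim whitespace, normalize case
        if not isinstance(target_author, str):
            return []
        wanted = target_author.strip().casefold()
        if not wanted:
            return []
        return [e for e in self.journals if e.get_author().casefold() == wanted]


services = InjectableServices(StubRepository())
matches = services.recall_author("  ada lovelace ")
```

Returning an empty list rather than `None` is a design choice in this sketch; a caller that checks `if not data` treats both the same way.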

Web/API Layer (`app.py`)

The app.py file acts as the Flask application controller and manages API routing. It handles incoming HTTP requests, communicates with the service layer, returns the results as JSON, and serves the HTML front-end interface. I also configured CORS to allow cross-origin access from the client side during development.

The main route returns the HTML interface, and the /get_data/<author> endpoint allows users to query journal entries by author name. The data is returned in structured JSON format.

from flask import Flask, jsonify, send_from_directory
from flask_cors import CORS, cross_origin
from app.services import Services

app = Flask(__name__)
app.config['CORS_HEADERS'] = 'Content-Type'
# The pattern must cover /get_data/<author>, not just the bare /get_data path
cors = CORS(app, resources={r"/get_data/*": {"origins": 'http://localhost:5000'}})

services = Services()

@app.route("/")
def index():
    # Serve the HTML front end
    return send_from_directory('web', 'journal-retriever.html')

@app.route("/get_data/<string:author>", methods=["GET"])
@cross_origin(origins='http://localhost:5000', allow_headers=['Content-Type', 'Authorization'])
def get_data(author):
    data = services.recall_author(author)

    if not data:
        return jsonify({"error": "No journals found for the given author"}), 404

    # JournalEntry objects are not JSON serializable, so convert each to a dict
    result = [entry.__dict__ for entry in data]
    return jsonify({"result": result})

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=5000)

This route structure allows users to either open the HTML interface in a browser or make direct HTTP GET requests to /get_data/<author> to retrieve JSON-formatted results.
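For direct API access, a small client might look like the sketch below. It uses only the standard library; `BASE_URL`, `build_query_url`, and `fetch_entries` are hypothetical names, and the base URL assumes the server is running locally on port 5000 as configured above.

```python
import json
import urllib.error
import urllib.parse
import urllib.request

BASE_URL = "http://localhost:5000"  # assumed: local server on port 5000


def build_query_url(author: str, base_url: str = BASE_URL) -> str:
    # Percent-encode the name so spaces and reserved characters are valid in a URL path
    return f"{base_url}/get_data/{urllib.parse.quote(author, safe='')}"


def fetch_entries(author: str) -> list[dict]:
    """Return the entries under "result", or an empty list on a 404."""
    try:
        with urllib.request.urlopen(build_query_url(author)) as response:
            return json.load(response)["result"]
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return []
        raise


url = build_query_url("Jane Doe")
```

Calling `fetch_entries("Jane Doe")` requires the Flask server to be running; `build_query_url` can be checked on its own.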

Frontend Layer (`web/`)

The HTML interface allows users to submit an author’s name and view matching journal entries. It is a simple, form-based interface that sends requests to the backend and displays results in a user-friendly format. The full front-end code is available in the GitHub repository.

Docker Integration

To streamline development and ensure environment consistency across systems, I used Docker and Docker Compose to containerize the application.

The Dockerfile defines a Python-based image that installs the app’s dependencies and runs the Flask server.
The docker-compose.yml file spins up both the Flask app container and a MySQL container, allowing seamless interaction without local database configuration.

This setup ensured that:

Developers or evaluators can run the project in an isolated environment.
The database and API are networked together via Docker’s internal networking.
The application can be deployed or replicated with a single command (docker-compose up).

Using Docker is an important step toward professional DevOps practices and reproducible workflows in software development.

Development Process and Highlights

Modular Code and Reusability

From the beginning, I prioritized clean, reusable code by applying software engineering best practices. Each module in the project was designed around the single-responsibility principle. The service layer (services.py) orchestrates logic and communicates with the repository classes, keeping the code easy to manage and scale.

Functions such as recall_author(target_author) were abstracted and clearly defined, which made them not only reusable but also testable in isolation. I maintained consistent naming conventions, descriptive variable names, and clear function scopes throughout the project to improve code readability and maintainability.

Debugging and Testing

During development, I faced several technical challenges—particularly when configuring and interacting with a MySQL database inside a containerized environment. Establishing and maintaining a reliable connection required multiple rounds of configuration, including specifying correct hostnames for Docker (e.g., using "db" instead of "localhost") and validating user credentials.
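One way to avoid hard-coding the hostname is to read connection settings from environment variables, as in the sketch below. The variable names (`DB_HOST`, `DB_PORT`, and so on), the default credentials, and the `journals` database name are assumptions for illustration, not the project's actual configuration.

```python
import os


def get_db_config(env=None) -> dict:
    """Build connection settings from environment variables.

    The variable names and defaults here are assumed, not taken from the project.
    """
    env = os.environ if env is None else env
    return {
        # Inside Docker Compose, a service name such as "db" resolves as a hostname
        "host": env.get("DB_HOST", "db"),
        "port": int(env.get("DB_PORT", "3306")),  # 3306 is MySQL's default port
        "user": env.get("DB_USER", "root"),
        "password": env.get("DB_PASSWORD", ""),
        "database": env.get("DB_NAME", "journals"),
    }


container_config = get_db_config({})                    # defaults suit a Compose network
local_config = get_db_config({"DB_HOST": "localhost"})  # override when running locally
```

With this pattern the same code can run inside a container or on the host, and only the environment changes.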

Ensuring that each layer of the architecture (database, repository, service, and API) continued working cohesively throughout development was a priority. I applied an incremental, test-oriented development process, validating each component through regular unit and integration tests.

Occasional issues with data formatting, server restarts, and mismatched expectations between layers also arose. For example, early tests failed when the API attempted to serialize objects that weren’t properly handled as dictionaries. This was addressed by ensuring all returned JournalEntry instances were converted with .__dict__ before being sent in JSON format.
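The serialization problem can be reproduced with the standard `json` module alone, as the sketch below shows. The `JournalEntry` class here is a simplified stand-in with assumed attributes; the real class may hold different fields.

```python
import json


class JournalEntry:
    """Simplified stand-in; the real class's attributes may differ."""

    def __init__(self, author: str, text: str):
        self.author = author
        self.text = text


entry = JournalEntry("Ada", "First entry")

try:
    json.dumps(entry)  # custom objects are not JSON serializable by default
    serialized_directly = True
except TypeError:
    serialized_directly = False

# The instance's attribute dictionary holds only plain strings here,
# so it serializes cleanly
payload = json.dumps({"result": [entry.__dict__]})
```

This approach works as long as every attribute is itself a JSON-compatible value; an attribute such as a `datetime` would need its own conversion first.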

All tests were grouped in the tests/ directory. These included unit tests for validating individual classes like JournalEntry, integration tests for the MySQL repository and service layer, and endpoint tests using requests to simulate real-world API calls. These tests were essential not only for validating correctness but also for revealing subtle bugs, like type mismatches or incorrect assumptions in the filtering logic.
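A unit test for the filtering logic might look like the following sketch, written with the standard `unittest` module. `FakeEntry` and `filter_by_author` are hypothetical names introduced here; the actual tests in tests/ may be organized differently.

```python
import unittest


class FakeEntry:
    """Minimal test double exposing get_author()."""

    def __init__(self, author):
        self.author = author

    def get_author(self):
        return self.author


def filter_by_author(entries, target_author):
    """Exact-match author filter, extracted as a standalone function."""
    return [e for e in entries if e.get_author() == target_author]


class FilterByAuthorTests(unittest.TestCase):
    def setUp(self):
        self.entries = [FakeEntry("Ada"), FakeEntry("Alan"), FakeEntry("Ada")]

    def test_returns_all_entries_for_author(self):
        self.assertEqual(len(filter_by_author(self.entries, "Ada")), 2)

    def test_unknown_author_gives_empty_list(self):
        self.assertEqual(filter_by_author(self.entries, "Grace"), [])


# Run the suite programmatically so the example works outside a test runner
suite = unittest.defaultTestLoader.loadTestsFromTestCase(FilterByAuthorTests)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

Because the test double needs no database, checks like these can run before the MySQL container is even started.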

Testing played a key role in keeping the system robust and cohesive, allowing for a smoother development process and clearer diagnosis when unexpected behavior occurred.

Folder Structure and File Organization

To ensure long-term clarity and scalability, the codebase is divided into logical folders:

app/ – contains business logic and service classes
db/ – contains data access classes and database implementations
web/ – contains HTML frontend files
documents/ – includes project documentation, such as API.md, UseCases.md, and a UML class diagram illustrating the relationships between the main components of the system—including the service layer, repository interface, and data models.
tests/ – contains test cases for core components

This layout made it easy to isolate functionality, swap out components, and onboard potential collaborators.

Documentation Practices

I used multiple layers of documentation to make the codebase easy to understand and navigate:

Docstrings in service and repository methods to describe input parameters, return types, and exceptions
README.md to provide an overview of the project, its purpose, and setup instructions
API.md to detail each API route, its method, parameters, and response formats
UseCases.md to outline practical usage scenarios
UML_class_diagram.png to visualize the structure and flow between components

This documentation lets anyone reviewing or building upon the project understand its structure and get it running without reading through the source first.

Version Control with Git

Throughout development, I used Git and GitHub to track changes and manage the project lifecycle. I aimed to commit code frequently, using descriptive messages to document what was added or fixed. Remembering to commit regularly was itself part of the learning process. Version control helped me:

Revert code to previous versions during testing
Compare diffs to locate bugs
Maintain a clear timeline of development milestones

Although I worked independently, I practiced industry-standard Git workflows to demonstrate professional habits expected in collaborative tech environments.

Reflections and Takeaways

This project provided a valuable opportunity to apply HLT concepts in a full-stack environment. It helped me connect the dots between structured language data, database design, service logic, and end-user interaction. I gained confidence in my ability to manage a multi-layered codebase, debug common errors, and produce professional-quality documentation.

It also reinforced my understanding of modular design and the importance of abstraction, both of which are critical in larger HLT systems. I feel better prepared for future roles in the tech industry that require both backend knowledge and the ability to work with language data systems.

Moises Coronel