Astronomy Picture of the Day (APOD) is like the universe’s Instagram account. It’s a website where a new awe-inspiring image of the universe has been posted every day since 1995.

As I was building a project using APOD’s official API, I found that requests would periodically time out, or take a surprisingly long time to return.

Curious and a bit confused (the data being returned was simple, shouldn’t require much computation, and should be easy to cache), I decided to poke around the API’s repo to see if I could find the cause, and perhaps even fix it.

I was fascinated to find that there was no database. The API was parsing data out of the APOD website’s HTML using BeautifulSoup, live per request.

Then I remembered: this website was created in 1995. MySQL would have been released mere weeks before the first APOD photo on June 16th.

This wasn't great for performance though, as every day of data the API returned required an additional network request. It also looked like requests for date ranges were made serially rather than in parallel, so asking for even a month of data took a long time to come back. A year's data took over half a minute, when the request didn’t just time out or send back a server error instead.

The official API also didn’t seem to do any caching: a request that took 30 seconds to load the first time would take another 30 seconds the second time.

One of the main reasons NASA’s API is slow is that data scraping and parsing happen live, adding significant overhead to each request. We can separate the data extraction step from the handling of API requests.

I ended up writing a script to dump the website’s data into a single 12MB JSON file. Pretty chunky for a JSON file, but given that a free tier Vercel function can have an unzipped size of 250MB and has 1024MB of memory, it’s still small enough to be loaded directly without needing to bother with a database.

getDataByDate(date: DateTime) is a function that, given a particular date, fetches the corresponding APOD webpage for that day, parses pieces of data out of the HTML using cheerio (JavaScript's equivalent to BeautifulSoup), and returns structured data in the form of a JavaScript object.

extractData.ts calls getDataByDate with days from a date range (initially "every day between today and June 16th, 1995") using the async library's eachLimit method to make multiple requests in parallel. It stores each day's result as a separate JSON file on the filesystem, and finally combines all the daily JSON data into one single data.json.

You might wonder: why not fetch all the data first and save just one file at the end? When making 9000+ network requests, some of them are bound to fail, and you really don't want to have to start back from zero. Saving each day's data as it runs allows us to continue from where the failure happened.

[Figure: comparison of timings before and after on-demand scraping]

The bulk of time on APOD’s official API was spent waiting for the server to send the first byte. Since historical data doesn’t change, and new entries are added once a day, the actual application server doesn't need to be hit most of the time.

We can use headers to tell the Content Delivery Network (CDN) to aggressively cache the response of our cloud function. I’m hosting on Vercel, but this should work with Netlify and Cloudflare as well.

The function handler sends two Cache-Control directives on its response, max-age and s-maxage.

max-age tells browsers how long to cache a response. If a request for a resource is within the max-age, the cached response would be used instead. We set max-age to 0, following Vercel's advice, to prevent browsers from caching API responses locally. That way clients will still get new data as soon as it updates.

s-maxage tells shared caches (servers such as CDNs) how long to cache a response. So when a request for a resource is within the s-maxage, the server (in our case, Vercel's CDN) will send the cached response. This is really powerful since this cache is shared across all users and devices. We set s-maxage to a variable amount of time, because for requests that ask for dates using a relative time ("today's data", or "the last 10 days' data"), we only want the CDN to cache the response for roughly an hour, since the result might change when the next APOD comes out.
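As a rough sketch of the caching described above, the TypeScript below builds a Cache-Control value with max-age=0 and a variable s-maxage. This is not the actual handler. The helper names (buildCacheControl, applyCacheHeaders), the HeaderSink interface, and the 30-day duration for fixed historical dates are assumptions made for illustration; only the "roughly an hour" for relative-date requests comes from the text.

```typescript
// Assumed durations: "roughly an hour" for relative-date requests is from the
// post; the 30-day value for fixed historical dates is a hypothetical choice.
const ONE_HOUR_SECONDS = 60 * 60;
const THIRTY_DAYS_SECONDS = 30 * 24 * 60 * 60;

// Minimal shape of a Node-style response object (hypothetical interface).
// Node's http.ServerResponse provides a compatible setHeader method.
interface HeaderSink {
  setHeader(name: string, value: string): unknown;
}

// Build the Cache-Control value: browsers never cache (max-age=0), while the
// CDN caches for a duration that depends on the kind of request.
function buildCacheControl(usesRelativeDates: boolean): string {
  const sMaxAge = usesRelativeDates ? ONE_HOUR_SECONDS : THIRTY_DAYS_SECONDS;
  return `max-age=0, s-maxage=${sMaxAge}`;
}

// Set the header on the response before the body is sent.
function applyCacheHeaders(response: HeaderSink, usesRelativeDates: boolean): void {
  response.setHeader("Cache-Control", buildCacheControl(usesRelativeDates));
}
```

A handler would call applyCacheHeaders(response, true) for a request like "today's data" and applyCacheHeaders(response, false) for a fixed date in the past, so that only results that can still change get the short CDN lifetime.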
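To illustrate the resumable extraction described earlier, here is a minimal sketch that saves one JSON file per day and skips days already on disk when re-run. It makes several assumptions: dates are plain ISO strings rather than DateTime objects, the fetchDay parameter stands in for getDataByDate, and a small hand-rolled worker pool replaces the async library's eachLimit so the example needs only Node's built-in modules. All function names here are hypothetical.

```typescript
import { existsSync, mkdirSync, readdirSync, readFileSync, writeFileSync } from "node:fs";
import { join } from "node:path";

type DayData = Record<string, unknown>;
// Stand-in for getDataByDate: resolves with one day's structured data.
type FetchDay = (isoDate: string) => Promise<DayData>;

// Resume support: keep only the dates that have no saved file yet.
function pendingDates(dates: string[], outDir: string): string[] {
  return dates.filter((date) => !existsSync(join(outDir, `${date}.json`)));
}

// Fetch all pending dates with at most `limit` requests in flight, saving each
// result as soon as it arrives. Returns the dates that failed.
async function extractAll(
  dates: string[],
  fetchDay: FetchDay,
  outDir: string,
  limit: number
): Promise<string[]> {
  mkdirSync(outDir, { recursive: true });
  const pending = pendingDates(dates, outDir);
  const failed: string[] = [];
  let next = 0;

  // Each worker repeatedly claims the next unclaimed date until none remain.
  const worker = async (): Promise<void> => {
    while (next < pending.length) {
      const date = pending[next++];
      try {
        const data = await fetchDay(date);
        writeFileSync(join(outDir, `${date}.json`), JSON.stringify(data));
      } catch {
        failed.push(date); // no file is written, so a re-run retries this day
      }
    }
  };

  await Promise.all(Array.from({ length: Math.max(1, limit) }, worker));
  return failed;
}

// Combine the per-day files (sorted by file name) into one JSON array file.
// `target` should live outside `outDir` so it is not picked up on a re-run.
function combine(outDir: string, target: string): number {
  const days = readdirSync(outDir)
    .filter((file) => /\.json$/.test(file))
    .sort()
    .map((file) => JSON.parse(readFileSync(join(outDir, file), "utf8")) as DayData);
  writeFileSync(target, JSON.stringify(days));
  return days.length;
}
```

Because a failed day leaves no file behind, running extractAll again with the same arguments only requests the days that are still missing, which is the property the per-day files are meant to provide.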