| | Commit message | Author | Age | Files | Lines |
|---|---|---|---|---|---|
| * | Indexer+Frontend: Integrate with actix | Baitinq | 2022-11-05 | 1 | -1/+1 |
| | | |||||
| * | Misc: Cargo fmt | Baitinq | 2022-10-30 | 1 | -6/+6 |
| | | |||||
| * | Crawler: Set 4 as the maximum "crawl depth" | Baitinq | 2022-10-30 | 1 | -0/+1 |
| | It's not really crawl depth, as we just count path segments (sketch after the table). | | | | |
| * | Crawler: Accept max_queue_size as an argument for crawler() | Baitinq | 2022-10-30 | 1 | -3/+5 |
| | We also now set the queue size to the maximum of the root url list length and max_queue_size. Previously, if the root url list had more entries than max_queue_size, the crawler would hang (sketch after the table). | | | | |
| * | Frontend: Move app-specific code to app.rs | Baitinq | 2022-10-30 | 1 | -0/+1 |
| | | |||||
| * | Misc: Add local lib crate to share common structs | Baitinq | 2022-10-30 | 1 | -7/+1 |
| | | |||||
| * | Crawler+Indexer+Frontend: Rename structs to follow logical relations | Baitinq | 2022-10-29 | 1 | -2/+2 |
| | Resource is now CrawledResource, as it is created by the crawler, and the previous CrawledResource is now IndexedResource, as it is created by the indexer. | | | | |
| * | Crawler: Only accept HTTP_STATUS_CODE: 200 as success in crawl_url() | Baitinq | 2022-10-28 | 1 | -3/+4 |
| | | |||||
| * | Misc: Add TODOs | Baitinq | 2022-10-28 | 1 | -1/+0 |
| | | |||||
| * | Crawler: Replace String::from with .to_string() | Baitinq | 2022-10-27 | 1 | -3/+6 |
| | | |||||
| * | Crawler: Fix bad error handling with match handling | Baitinq | 2022-10-25 | 1 | -6/+9 |
| | | |||||
| * | Crawler: Use async Client | Baitinq | 2022-10-25 | 1 | -6/+11 |
| | | |||||
| * | Crawler: Shuffle crawled urls | Baitinq | 2022-10-25 | 1 | -3/+2 |
| | | |||||
| * | Crawler: Add "correct" error handling | Baitinq | 2022-10-25 | 1 | -21/+23 |
| | | |||||
| * | Crawler: Parse urls with the "url" crate | Baitinq | 2022-10-25 | 1 | -25/+24 |
| | This fixes relative url handling, improves url filtering and validation, and brings many other improvements (sketch after the table). | | | | |
| * | Crawler: Add crawled url filter | Baitinq | 2022-10-24 | 1 | -1/+8 |
| | This filters out hrefs such as "/", "#", or "javascript:" (sketch after the table). | | | | |
| * | Crawler: Set queue size to 2222 | Baitinq | 2022-10-24 | 1 | -1/+1 |
| | | |||||
| * | Crawler+Indexer: Rust cleanup | Baitinq | 2022-10-23 | 1 | -3/+2 |
| | Getting more familiar with the language, so I fixed some non-optimal into_iter() usage, unnecessary .clone()s, and an unnecessary hack where we could just take a &mut for inserting into the indexer url database. | | | | |
| * | Crawler: Replace println! with dbg! | Baitinq | 2022-10-23 | 1 | -7/+7 |
| | | |||||
| * | Crawler: Remove prepending of https:// to each url | Baitinq | 2022-10-23 | 1 | -5/+5 |
| | We now prepend it to the top-1000-urls list instead. This fixes crawled urls ending up with a doubled https:// prefix. | | | | |
| * | Crawler: Only crawl 2 urls per url | Baitinq | 2022-10-23 | 1 | -0/+6 |
| | This keeps us from getting rate limited by websites (sketch after the table). | | | | |
| * | Crawler: Change blockingqueue to channels | Baitinq | 2022-10-23 | 1 | -11/+11 |
| | We now use the async-channel crate, which gives us bounded async channels (sketch after the table). | | | | |
| * | Crawler: Implement basic async functionality | Baitinq | 2022-10-22 | 1 | -39/+45 |
| | | |||||
| * | Crawler: Add basic indexer communication | Baitinq | 2022-10-21 | 1 | -10/+46 |
| | | |||||
| * | Crawler: Add Err string in the crawl_url method | Baitinq | 2022-10-20 | 1 | -3/+3 |
| | | |||||
| * | Crawler: Add indexer interaction skeleton | Baitinq | 2022-10-20 | 1 | -1/+5 |
| | | |||||
| * | Crawler: Wrap crawl response in Result type | Baitinq | 2022-10-20 | 1 | -18/+23 |
| | | |||||
| * | Crawler: Normalise relative urls | Baitinq | 2022-10-20 | 1 | -2/+17 |
| | We now normalise urls starting with "/" (relative to the root) and "//" (relative to the protocol); see the sketch after the table. | | | | |
| * | Crawler: Remove duplicate parsed urls | Baitinq | 2022-10-20 | 1 | -0/+3 |
| | | |||||
| * | Crawler: Add basic html parsing and link-following | Baitinq | 2022-10-20 | 1 | -9/+34 |
| | Extremely basic implementation (sketch after the table). Still needs a max queue size, error handling, and formatting of parsed links. | | | | |
| * | Crawler: Add skeleton crawler implementation | Baitinq | 2022-10-20 | 1 | -0/+40 |
| | Starts by filling a queue with the top 1000 most-visited sites, "crawls" each one (currently an empty fn), and blocks waiting for new elements on the queue. | | | | |
| * | Misc: Separate OSSE into components | Baitinq | 2022-10-19 | 1 | -0/+3 |
| | We now have a cargo workspace with the Crawler, Client, and Indexer packages (sketch after the table). | | | | |
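
A few of the commits above describe mechanisms concisely enough to sketch. First, the "crawl depth" cap from 2022-10-30: depth is approximated by counting path segments with the url crate, not by tracking how many links were followed. This is a minimal sketch; the function name and threshold handling are hypothetical, not the project's actual code.

```rust
use url::Url;

// Hypothetical helper: "depth" here is just the number of path
// segments in the url, as the commit message notes.
fn exceeds_max_crawl_depth(url: &Url, max_depth: usize) -> bool {
    url.path_segments()
        .map(|segments| segments.count())
        .unwrap_or(0)
        > max_depth
}

fn main() {
    let deep = Url::parse("https://example.com/a/b/c/d/e").unwrap();
    assert!(exceeds_max_crawl_depth(&deep, 4)); // 5 segments > 4
    let shallow = Url::parse("https://example.com/a/b").unwrap();
    assert!(!exceeds_max_crawl_depth(&shallow, 4));
}
```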
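The max_queue_size fix bounds the channel by the larger of the configured size and the root url list, so the full root list can always be enqueued without blocking. A minimal sketch, assuming the async-channel crate mentioned later in the log; all names are illustrative.

```rust
// Illustrative names; the point is the max() when sizing the queue.
fn effective_queue_size(root_urls: &[String], max_queue_size: usize) -> usize {
    std::cmp::max(root_urls.len(), max_queue_size)
}

fn main() {
    let root_urls: Vec<String> =
        (0..3000).map(|i| format!("https://example{i}.com")).collect();
    let (tx, _rx) =
        async_channel::bounded::<String>(effective_queue_size(&root_urls, 2222));
    // Every root url fits even though the list is longer than 2222.
    for url in root_urls {
        tx.try_send(url).expect("queue sized to hold all root urls");
    }
}
```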
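The move to the url crate fixes relative urls because Url::join resolves an href against the page it was found on, including protocol-relative links. A minimal sketch:

```rust
use url::Url;

fn main() {
    let base = Url::parse("https://example.com/blog/post.html").unwrap();

    // Relative href resolved against the current page.
    assert_eq!(
        base.join("../about").unwrap().as_str(),
        "https://example.com/about"
    );

    // Protocol-relative href inherits the base scheme.
    assert_eq!(
        base.join("//cdn.example.com/app.js").unwrap().as_str(),
        "https://cdn.example.com/app.js"
    );
}
```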
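The crawled-url filter boils down to a predicate over hrefs; the function name here is hypothetical.

```rust
// Hypothetical predicate: drop hrefs that are not followable links.
fn should_crawl(href: &str) -> bool {
    !(href == "/" || href.starts_with('#') || href.starts_with("javascript:"))
}

fn main() {
    let hrefs = ["/", "#top", "javascript:void(0)", "https://example.com"];
    let crawlable: Vec<&str> = hrefs.into_iter().filter(|h| should_crawl(h)).collect();
    assert_eq!(crawlable, ["https://example.com"]);
}
```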
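"Only crawl 2 urls per url" caps how many parsed links are followed from each page, which avoids hammering any single site. A minimal sketch with illustrative data:

```rust
fn main() {
    let parsed_urls = vec![
        "https://a.example",
        "https://b.example",
        "https://c.example",
    ];
    // Follow at most two of the links found on this page.
    let to_crawl: Vec<&str> = parsed_urls.into_iter().take(2).collect();
    assert_eq!(to_crawl, ["https://a.example", "https://b.example"]);
}
```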
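The switch from a blocking queue to async-channel gives a bounded channel whose send suspends the task instead of blocking the thread when the queue is full. A minimal sketch, assuming a tokio runtime; the task wiring is illustrative.

```rust
#[tokio::main]
async fn main() {
    // Bounded async channel; 2222 matches the earlier queue-size commit.
    let (tx, rx) = async_channel::bounded::<String>(2222);

    tokio::spawn(async move {
        // send().await suspends (rather than blocks) once the queue is full.
        tx.send("https://example.com".to_string()).await.unwrap();
    });

    // recv() returns Err once all senders are dropped and the queue drains.
    while let Ok(url) = rx.recv().await {
        println!("crawling {url}");
    }
}
```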
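The earlier manual normalisation (before the url crate took over) treated "//" hrefs as protocol-relative and "/" hrefs as root-relative. A minimal sketch; the function name and the hardcoded https scheme are assumptions.

```rust
// Assumed shape of the manual normalisation; real code would reuse
// the scheme of the page being crawled rather than hardcode https.
fn normalise_url(origin: &str, href: &str) -> String {
    if href.starts_with("//") {
        format!("https:{href}") // protocol-relative
    } else if href.starts_with('/') {
        format!("{origin}{href}") // relative to the site root
    } else {
        href.to_string()
    }
}

fn main() {
    let origin = "https://example.com";
    assert_eq!(normalise_url(origin, "//cdn.example.com/x"), "https://cdn.example.com/x");
    assert_eq!(normalise_url(origin, "/about"), "https://example.com/about");
    assert_eq!(normalise_url(origin, "https://other.example"), "https://other.example");
}
```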
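Basic html parsing and link-following is essentially "extract every href from the page, then crawl each one". A sketch using the scraper crate; whether the project actually uses scraper is an assumption.

```rust
use scraper::{Html, Selector};

// Extract the href attribute of every <a> element in a page.
fn extract_hrefs(body: &str) -> Vec<String> {
    let document = Html::parse_document(body);
    let selector = Selector::parse("a[href]").unwrap();
    document
        .select(&selector)
        .filter_map(|a| a.value().attr("href"))
        .map(str::to_string)
        .collect()
}

fn main() {
    let html = r#"<a href="/about">About</a> <a href="https://example.com">Home</a>"#;
    assert_eq!(extract_hrefs(html), ["/about", "https://example.com"]);
}
```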
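Finally, the workspace split corresponds to a root manifest along these lines; the member directory names are assumed from the package names in the commit message.

```toml
# Root Cargo.toml (sketch): one workspace, one crate per component.
[workspace]
members = ["crawler", "indexer", "client"]
```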