| Date | Commit message | Author | Files | Lines |
|---|---|---|---|---|
| 2022-10-25 | Crawler: Fix bad error handling with match handling | Baitinq | 1 | -6/+9 |
| 2022-10-25 | Crawler: Use async Client | Baitinq | 4 | -48/+152 |
| 2022-10-25 | Indexer: Use CrawledResource structure as values in the reverse index db | Baitinq | 3 | -11/+45 |
| | This will allow us to integrate priorities and other improvements (see the CrawledResource sketch below). | | | |
| 2022-10-25 | Indexer: Add "correct" error handling | Baitinq | 1 | -7/+7 |
| 2022-10-25 | Crawler: Shuffle crawled urls | Baitinq | 3 | -4/+5 |
| 2022-10-25 | Crawler: Add "correct" error handling | Baitinq | 1 | -21/+23 |
| 2022-10-25 | Crawler: Parse urls with the "url" crate | Baitinq | 3 | -25/+26 |
| | This fixes relative urls and improves url filtering and validation, among other improvements (see the url-crate sketch below). | | | |
| 2022-10-24 | Crawler: Add crawled url filter | Baitinq | 1 | -1/+8 |
| | This filters hrefs such as "/", "#", or "javascript:" (see the filter sketch below). | | | |
| 2022-10-24 | Flake: Add rust-analyzer package | Baitinq | 1 | -0/+1 |
| 2022-10-24 | Crawler: Set queue size to 2222 | Baitinq | 1 | -1/+1 |
| 2022-10-24 | Misc: Update build/run instructions | Baitinq | 1 | -2/+4 |
| | The instructions now show how to run each module plus the yew frontend. | | | |
| 2022-10-24 | Client->Frontend: Create yew frontend skeleton | Baitinq | 8 | -14/+238 |
| | We have replaced the client with a yew frontend (see the yew sketch below). | | | |
| 2022-10-23 | Crawler+Indexer: Rust cleanup | Baitinq | 2 | -14/+6 |
| | We are getting more familiar with the language, so we fixed some non-optimal into_iter() usage, unnecessary .clone()s, and an unnecessary hack where we could simply take a &mut for inserting into the indexer url database. | | | |
| 2022-10-23 | Crawler: Replace println! with dbg! | Baitinq | 1 | -7/+7 |
| 2022-10-23 | Crawler: Remove prepending of https:// to each url | Baitinq | 2 | -1006/+1006 |
| | We now prepend it to the top-1000-urls list instead. This fixes crawled urls having a doubled https:// prefix. | | | |
| 2022-10-23 | Crawler: Only crawl 2 urls per url | Baitinq | 1 | -0/+6 |
| | This keeps us from getting rate-limited by websites. | | | |
| 2022-10-23 | Crawler: Change blockingqueue to channels | Baitinq | 3 | -19/+45 |
| | We now use the async-channel crate, which allows us to have bounded async channels (see the channel sketch below). | | | |
| 2022-10-23 | Indexer: Listen on 0.0.0.0 | Baitinq | 1 | -1/+1 |
| 2022-10-22 | Indexer: Implement basic reverse index searching and adding | Baitinq | 3 | -15/+163 |
| | Very inefficient but kind of functional :) (see the reverse-index sketch below). | | | |
| 2022-10-22 | Crawler: Implement basic async functionality | Baitinq | 3 | -93/+285 |
| 2022-10-21 | Crawler: Add basic indexer communication | Baitinq | 2 | -11/+48 |
| 2022-10-21 | Indexer: Add skeleton http rest endpoint functionality | Baitinq | 3 | -1/+539 |
| | Adds the /search and /resource endpoints (see the endpoint sketch below). | | | |
| 2022-10-20 | Crawler: Add Err string in the craw_url method | Baitinq | 1 | -3/+3 |
| 2022-10-20 | Crawler: Add indexer interaction skeleton | Baitinq | 1 | -1/+5 |
| 2022-10-20 | Crawler: Wrap crawl response in Result type | Baitinq | 1 | -18/+23 |
| 2022-10-20 | Crawler: Normalise relative urls | Baitinq | 1 | -2/+17 |
| | We now normalise urls starting with / (relative to the root) and // (relative to the protocol); see the normalisation sketch below. | | | |
| 2022-10-20 | Crawler: Remove duplicate parsed urls | Baitinq | 3 | -0/+20 |
| 2022-10-20 | Crawler: Add basic html parsing and link-following | Baitinq | 3 | -9/+1561 |
| | Extremely basic implementation. Still needs a max queue size, error handling, and formatting of parsed links. | | | |
| 2022-10-20 | Crawler: Add skeleton crawler implementation | Baitinq | 4 | -0/+1051 |
| | Starts by filling a queue with the top 1000 most-visited sites, "crawls" each one (currently an empty fn), and blocks waiting for new elements on the queue (see the crawler-loop sketch below). | | | |
| 2022-10-19 | Misc: Change to use "oxalica/rust-overlay" for the nix development shell | Baitinq | 3 | -26/+90 |
| | This fixes VS Code not being able to find rust-analyzer and rust-src. | | | |
| 2022-10-19 | Misc: Separate OSSE into components | Baitinq | 9 | -10/+56 |
| | We now have a cargo workspace with the Crawler, Client, and Indexer packages. | | | |
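
The 2022-10-20 "Add skeleton crawler implementation" commit can be pictured roughly as below: a minimal sketch assuming a plain blocking queue built on std::sync::mpsc; the seed urls and the crawl_url name are illustrative, not the repository's exact code.

```rust
use std::sync::mpsc;

// Placeholder for the real crawl step, which was still an empty fn at this commit.
fn crawl_url(url: &str) {
    println!("crawling {url}");
}

fn main() {
    let (tx, rx) = mpsc::channel::<String>();

    // Seed the queue; the real crawler loads the top-1000 most-visited sites.
    for url in ["https://example.com", "https://example.org"] {
        tx.send(url.to_string()).unwrap();
    }

    // recv() blocks until a new element appears on the queue; since `tx` stays
    // alive, the loop waits indefinitely for more urls, as the commit describes.
    while let Ok(url) = rx.recv() {
        crawl_url(&url);
    }
}
```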
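
The 2022-10-20 "Normalise relative urls" commit predates the url crate, so the normalisation was presumably string-based. A rough sketch of just the two rules named in the log; the function name and parsing details are guesses:

```rust
// Normalise an href found on `origin`. Only handles the two cases from the
// commit: "//host/x" (relative to the protocol) and "/x" (relative to the
// root); everything else is returned unchanged.
fn normalise(origin: &str, href: &str) -> String {
    let scheme = origin.split("//").next().unwrap_or("https:"); // "https:"
    let host = origin
        .split("//")
        .nth(1)
        .and_then(|rest| rest.split('/').next())
        .unwrap_or_default(); // "example.com"

    if let Some(rest) = href.strip_prefix("//") {
        format!("{scheme}//{rest}")
    } else if href.starts_with('/') {
        format!("{scheme}//{host}{href}")
    } else {
        href.to_string()
    }
}

fn main() {
    let origin = "https://example.com/dir/page.html";
    assert_eq!(normalise(origin, "//cdn.example.com/a.js"), "https://cdn.example.com/a.js");
    assert_eq!(normalise(origin, "/about"), "https://example.com/about");
    println!("ok");
}
```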
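
The 2022-10-21 indexer skeleton exposes /search and /resource. The log does not say which web framework is used, so the sketch below uses actix-web purely as an illustration; the request shape and the port are assumptions (only the 0.0.0.0 bind address comes from the 2022-10-23 commit).

```rust
use actix_web::{get, post, web, App, HttpServer, Responder};
use serde::Deserialize;

// Hypothetical request body; the real resource fields aren't shown in the log.
#[derive(Deserialize)]
struct Resource {
    url: String,
    content: String,
}

#[get("/search/{query}")]
async fn search(query: web::Path<String>) -> impl Responder {
    let q = query.into_inner();
    format!("results for {q}")
}

#[post("/resource")]
async fn add_resource(body: web::Json<Resource>) -> impl Responder {
    format!("indexed {} ({} bytes)", body.url, body.content.len())
}

#[actix_web::main]
async fn main() -> std::io::Result<()> {
    // Listen on 0.0.0.0 (per the 2022-10-23 commit); the port is an assumption.
    HttpServer::new(|| App::new().service(search).service(add_resource))
        .bind(("0.0.0.0", 4444))?
        .run()
        .await
}
```

With this shape, the crawler's "basic indexer communication" would amount to POSTing each crawled page to /resource, and the frontend would GET /search/{query}.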
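
The 2022-10-22 "basic reverse index searching and adding" could look like the sketch below: a word-to-urls map with naive whitespace tokenisation. The function names and the union-based search are assumptions; "very inefficient but kind of functional" suggests something of this flavour.

```rust
use std::collections::{HashMap, HashSet};

// word -> urls of the resources containing it
type ReverseIndex = HashMap<String, HashSet<String>>;

fn add_resource(index: &mut ReverseIndex, url: &str, content: &str) {
    for word in content.split_whitespace() {
        index
            .entry(word.to_lowercase())
            .or_default()
            .insert(url.to_string());
    }
}

fn search(index: &ReverseIndex, query: &str) -> HashSet<String> {
    // Naive: the union of the hits for every word in the query.
    query
        .split_whitespace()
        .filter_map(|word| index.get(&word.to_lowercase()))
        .flatten()
        .cloned()
        .collect()
}

fn main() {
    let mut index = ReverseIndex::new();
    add_resource(&mut index, "https://example.com", "open source search engine");
    println!("{:?}", search(&index, "search"));
}
```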
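
For the 2022-10-23 switch from blockingqueue to channels, the log names the async-channel crate. A minimal sketch of a bounded channel; using tokio as the executor is an assumption (the log does not name one), and the 2222 capacity comes from the 2022-10-24 "Set queue size to 2222" commit.

```rust
use async_channel::bounded;

#[tokio::main]
async fn main() {
    // Bounded: send().await suspends the task (instead of blocking a thread)
    // once 2222 urls are in flight, giving the crawler backpressure.
    let (tx, rx) = bounded::<String>(2222);

    tokio::spawn(async move {
        tx.send("https://example.com".to_string()).await.unwrap();
        // dropping `tx` closes the channel, so the loop below terminates
    });

    while let Ok(url) = rx.recv().await {
        println!("crawling {url}");
    }
}
```

The bound is the design point: an unbounded queue would let the crawler discover links far faster than it drains them and grow without limit.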
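
The 2022-10-24 yew frontend skeleton might start out like the sketch below: a hypothetical root component, not the repository's actual markup, using the yew 0.19-era start_app entry point (current yew versions use yew::Renderer instead).

```rust
use yew::prelude::*;

// Hypothetical root component; the real frontend's markup is not in the log.
#[function_component(App)]
fn app() -> Html {
    html! {
        <div>
            <h1>{ "OSSE" }</h1>
            <input placeholder="Search..." />
        </div>
    }
}

fn main() {
    yew::start_app::<App>();
}
```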
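
The 2022-10-24 crawled-url filter drops hrefs such as "/", "#", or "javascript:". A minimal sketch; the function name and any rejection rules beyond those three examples are assumptions:

```rust
// Drop hrefs that can't (or shouldn't) be crawled.
fn should_crawl(href: &str) -> bool {
    !(href == "/" || href.starts_with('#') || href.starts_with("javascript:"))
}

fn main() {
    let hrefs = ["/", "#top", "javascript:void(0)", "https://example.com/about"];
    let crawlable: Vec<&str> = hrefs.into_iter().filter(|h| should_crawl(h)).collect();
    assert_eq!(crawlable, ["https://example.com/about"]);
    println!("{crawlable:?}");
}
```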
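
The 2022-10-25 move to the url crate replaces hand-rolled normalisation: Url::join resolves relative hrefs against the page they were found on, and parsed urls are easy to validate. A small illustration (the example urls are mine):

```rust
use url::Url;

fn main() -> Result<(), url::ParseError> {
    let base = Url::parse("https://example.com/a/b.html")?;

    // join() handles same-directory, parent-relative, and protocol-relative hrefs.
    assert_eq!(base.join("c.html")?.as_str(), "https://example.com/a/c.html");
    assert_eq!(base.join("../d.html")?.as_str(), "https://example.com/d.html");
    assert_eq!(base.join("//cdn.example.com/e.js")?.as_str(), "https://cdn.example.com/e.js");

    // Parsed urls also make filtering/validation easy, e.g. by scheme.
    assert!(matches!(base.scheme(), "http" | "https"));
    println!("ok");
    Ok(())
}
```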
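
The 2022-10-25 CrawledResource change stores a structure, rather than a bare url, as the reverse-index value so that priorities can be attached later. The field names below are hypothetical; the log only gives the struct's name and purpose:

```rust
use std::collections::{HashMap, HashSet};

// Hypothetical shape: the log only says the values became a structure so that
// priorities "and other improvements" can be integrated.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
struct CrawledResource {
    url: String,
    priority: u32, // e.g. how relevant this resource is for the word
}

// word -> resources containing that word (previously, presumably word -> urls)
type ReverseIndex = HashMap<String, HashSet<CrawledResource>>;

fn main() {
    let mut index = ReverseIndex::new();
    index.entry("search".to_string()).or_default().insert(CrawledResource {
        url: "https://example.com".to_string(),
        priority: 1,
    });
    println!("{index:?}");
}
```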